The following glossary contains terms and definitions that might be helpful for researchers and members of the public who have an interest in active public involvement in research.

### absolute risk reduction^{2}

(ARR) or Absolute Risk Difference (ARD). In comparative studies, it is the difference in risk of a particular event between two groups. Unlike the risk ratio, which only expresses the relative benefits of one treatment compared to another (e.g. twice as many patients died on treatment A compared with treatment B), the ARR is dependent on the risk of the event in the control group (or baseline risk). The benefits of one treatment compared with the other can therefore be expressed in absolute terms. Using the example above, it is possible to state how many deaths are prevented by treatment B in comparison with treatment A. The absolute risk reduction is used to compute the number needed to treat, which conveys the same idea.
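The arithmetic behind this entry can be sketched in a few lines of Python. The function names and the risks used are hypothetical and chosen only for illustration; the sketch assumes the number needed to treat is taken as the reciprocal of the ARR.

```python
def absolute_risk_reduction(control_risk: float, treated_risk: float) -> float:
    """Difference in risk between the comparison group and the treated group."""
    return control_risk - treated_risk


def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """Reciprocal of the absolute risk reduction."""
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr == 0:
        raise ValueError("no difference in risk, so NNT is undefined")
    return 1 / arr


# Hypothetical figures: in both scenarios twice as many patients die on
# treatment A as on treatment B, but the baseline risk differs.
for risk_a, risk_b in [(0.20, 0.10), (0.02, 0.01)]:
    risk_ratio = risk_a / risk_b
    arr = absolute_risk_reduction(risk_a, risk_b)
    nnt = number_needed_to_treat(risk_a, risk_b)
    print(f"risk ratio = {risk_ratio:.1f}, ARR = {arr:.2f}, NNT = {nnt:.0f}")
```

The risk ratio is the same in both scenarios, while the ARR and the number needed to treat change with the baseline risk.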

### accuracy^{2}

In the context of clinical measurement, it refers to whether the measurement made is correct. A correct measurement should be both accurate and precise (1). In most instances precision is less important than accuracy. For example, if the true weight of a patient is 67.567 Kg, it is better to have it measured as 68 Kg rather than 70.432 Kg. The reliability of a measurement method depends on (among other factors) its accuracy.

### action research^{3}

Usually employs a qualitative approach. The essence of action research is that it is problem-centred and problem-solving. Action research is cyclical and includes a number of stages where the results of the research are fed back to the participants and acted upon on an ongoing basis.

### additive model^{1}

A model in which the combined effect of several factors is the sum of the effects produced by each of the factors. For example, if one factor increases risk by a and a second factor by b, the combined effect of the two factors is a + b. See also multiplicative model.

### adjusted estimates^{2}

As opposed to crude estimates. For example, when comparing populations with different age structures, it is not appropriate to compare the death rates observed in each population without taking account of the age differences. Methods such as stratification, standardisation and multiple regression are used so that age-adjusted comparisons can be made. These methods take confounding factors into account (such as age in the above example), producing adjusted estimates which are less biased. For example, higher crude mortality rates in Bournemouth compared with Harrogate would reflect the older population in Bournemouth. After age standardisation any differences found in mortality rates can be attributed to factors other than age.

### aetiology^{1}

The natural history of a disease. How pathogenic agents and environmental influences interact with the human population. American English spelling, etiology.

### allocation concealment^{1}

See concealment of allocation.

### analytic induction^{3}

In qualitative research: the use of the constant comparative method to develop hypotheses which are then tested in further data collection and analysis.

### analytical research methods^{1}

Research methods which aim to compare two or more groups. These include observational (case-control study and cohort study) and intervention methods (randomised controlled trials and non-randomised trials).

### ANCOVA^{2}

Or analysis of covariance. Statistical method for comparing the means of a quantitative variable between groups (analysis of variance) whilst taking into account measurements made for another, possibly influential, quantitative variable or covariate. Since this will often be a confounder or a baseline measurement, ANCOVA produces adjusted estimates. For example, the blood pressures of smokers and non-smokers can be compared, adjusting the analysis for any differences in weight, or other factors which have an effect on blood pressure, and are associated with smoking. ANCOVA is commonly carried out using regression analysis with dummy variables.

### ANOVA^{2}

Or analysis of variance. Statistical test for comparing the means of a quantitative variable between two or more groups. It is an extension of the independent samples t test. In summary, ANOVA takes the total variability found in an outcome variable of interest and divides it into a between-groups component and a within-groups component (each of these further divided by the appropriate number of degrees of freedom to produce a mean square, MS). The significance test for differences between groups is based on the comparison of these two components of variability, under the assumption that there are no differences between groups (Null Hypothesis). If this hypothesis is true the two MSs will be similar, and their ratio close to 1. This is known as the F test or variance ratio test. Depending on the study design, one-way or two-way ANOVA will be used. Results from ANOVA can be reproduced, with some advantages, by regression methods using dummy or indicator variables.
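As a rough illustration of how the variance ratio is built up in the one-way case, the following Python sketch computes the two mean squares and their ratio. The function name and the data are hypothetical, and the sketch stops at the F statistic: turning it into a P-value needs the F distribution, which is not covered here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic: between-groups MS divided by within-groups MS."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    return ms_between / ms_within


# Hypothetical measurements for three groups of three subjects each.
f_statistic = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_statistic:.2f}")
```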

### applicability^{1}

The degree to which the results of an observation, study or review hold true in other settings.

### area under the curve (AUC)^{2}

Summary measure used in the context of repeated measurements analysis and diagnostic tests using quantitative measurements. In the latter context, it is the area below the ROC (receiver operating characteristics) curve. Plotting ROC curves for different diagnostic tools enables a comparison of their diagnostic ability to be made: the bigger the AUC the better the diagnostic test at correctly identifying individuals with and without a given disease. The area under the curve can thus be interpreted as the probability of correctly identifying the diseased individual and the non-diseased individual, given that one is presented with two subjects randomly selected from a population, where one of them is diseased and the other is not. In repeated measurements analysis the AUC is frequently used instead of the mean to convey the idea of response over time, especially when measurements have not been made at equal time intervals. See Altman (1991) for a worked example.
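The probability interpretation in the diagnostic context can be sketched directly: compare every diseased subject with every non-diseased subject and count how often the diseased one has the higher test value. The function name and the scores are hypothetical, and the sketch assumes that higher values point towards disease and that ties count as half.

```python
def auc_from_scores(diseased, non_diseased):
    """Proportion of (diseased, non-diseased) pairs that the test ranks correctly."""
    pairs = 0
    correct = 0.0
    for d in diseased:
        for h in non_diseased:
            pairs += 1
            if d > h:
                correct += 1.0    # diseased subject correctly ranked higher
            elif d == h:
                correct += 0.5    # a tie counts as half (assumption)
    return correct / pairs


# Hypothetical test values for three diseased and three non-diseased subjects.
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"AUC = {auc:.3f}")
```

A value of 0.5 corresponds to a test that does no better than chance, and 1.0 to a test that separates the two groups perfectly.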

### assumptions^{2}

Specific conditions required by statistical tests in order to produce valid results. Parametric methods of analysis are particularly dependent on assumptions. Usual assumptions are: Normality of distributions, independence of observations, linear relationship between two variables which are associated, constant variance or homoscedasticity etc., depending on the statistical test being used. For example, an independent samples t test (for comparing means between two groups) assumes the variable being compared has the same variance or variability in each of the groups. If this is not true, the test results may be flawed.

### attributable risk^{2}

Same as absolute risk reduction or difference. This term is frequently used in the context of epidemiological studies. Can also be expressed as a proportion of the risk in the exposed, the proportional attributable risk or population attributable fraction.

### attrition bias^{1}

Systematic differences between comparison groups in withdrawals or exclusions of participants from the results of a study. For example, patients may drop out of a study because of side effects of the intervention. Excluding these patients from the analysis could result in an overestimate of the effectiveness of the intervention.

### audit

See clinical audit.

### Bartlett's test^{2}

A significance test for comparing the variances of two or more populations. This test is an extension of the F test.

### Bayes' theorem^{1}

A probability theorem used to obtain the probability of a condition in a group of people with some characteristic (e.g. exposed to an intervention of interest, or with a specified result on a diagnostic test) on the basis of the overall rate of that condition (the prior probability) and the likelihoods of that characteristic in people with and without the condition.
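For the diagnostic test case, the calculation can be sketched as follows. The function and parameter names are hypothetical, and the prevalence, sensitivity and false positive rate are invented figures used only to show the arithmetic.

```python
def posterior_probability(prior, p_char_given_condition, p_char_given_no_condition):
    """Probability of the condition among people who have the characteristic."""
    with_condition = prior * p_char_given_condition
    without_condition = (1 - prior) * p_char_given_no_condition
    return with_condition / (with_condition + without_condition)


# Hypothetical figures: 1% of people have the condition (prior probability),
# 90% of those with the condition test positive, and 5% of those without it do.
posterior = posterior_probability(0.01, 0.90, 0.05)
print(f"probability of the condition given a positive test = {posterior:.3f}")
```

With these figures most positive results come from people without the condition, because the condition is rare to begin with.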

### Bayesian analysis^{1}

An approach that can be used in single studies or meta-analysis which incorporates a prior probability distribution based on subjective opinion and objective evidence, such as the results of previous research. Bayesian analysis uses Bayes' theorem to update the prior distribution in light of the results of a study, producing a posterior distribution. Statistical inferences (point estimates, confidence intervals, etc.) are based on this posterior distribution. The posterior distribution also acts as the prior distribution for the next study. This approach has many attractive features, but is controversial because it depends on opinions, and frequently they will vary considerably.

### Berkson's fallacy^{2}

Common type of bias in case-control studies, in particular hospital-based and practice-based studies. It occurs due to differential admission rates between cases and controls. This leads to positive (and spurious) associations between exposure and the case-control status with the lowest admission rate. For Berkson's fallacy to occur, the exposure of interest must itself be an 'admittable' condition.

### bias

Deviation of results from the truth, due to systematic error(s) in the methods used. Bias does not necessarily imply prejudice, such as the investigators' desire for particular results. Bias can occur in a descriptive study where groups are not being compared, if measurements made on individuals are systematically in error; for example, bias would occur in a survey of children's weights carried out using weighing scales that always read too high. In an analytical study, bias comes in two main forms: (a) selection bias, which occurs when the two (or more) groups being studied differ systematically in some way, and (b) observer (or information) bias, which occurs when there are systematic differences in the way information is collected for the groups being studied. In studies of the effects of healthcare, bias can also arise from systematic differences in the care that is provided, or exposure to other factors apart from the intervention of interest (performance bias), withdrawals or exclusions of people entered into the study (attrition bias) or how outcomes are assessed (detection bias). Interviewer bias is a classic example of observer/information bias; if an interviewer is not blinded to the group to which a subject belongs, questions may be asked in a subtly different way for one group compared to another. You will probably detect a strong similarity between the definition of selection bias and the definition of confounding; if older subjects are more likely to be allocated to one group than another, this is selection bias, and age should also be considered to be a confounding factor. However, confounding can occur in the absence of selection bias. A distinction is often made in epidemiology between things which can be measured and which therefore can potentially be controlled for (confounding), and things which cannot be measured and of which the researcher is frequently unaware (bias).
The latter can be a particular problem in a case-control study if cases and controls are recruited in radically different ways, e.g. cases from hospital and controls randomly from an electoral register. In order to be a case, the subject has to have been in contact with the NHS, whereas a proportion of people on the electoral register may not even be registered with a GP, let alone attend with any regularity. See also methodological quality, validity.

Bias (in diagnostic tests). The deviation of results from the truth, due to systematic errors in the methods used. There are a number of specific forms of bias when considering the accuracy of diagnostic tests.

### binary variable^{2}

A categorical variable which takes only two possible values. E.g. yes or no, dead or alive.

### blinding

The process of keeping secret from the study participants or investigators to which group (e.g. to treatment or control) a subject belongs. Studies are often referred to as single or double blind. In a single-blind study the subjects do not know to which group they have been allocated, but the researcher carrying out the trial does, or vice versa; in a double-blind trial, neither subject nor researcher knows to which groups subjects belong. Blinding is one way of trying to prevent observer and interviewer biases (information biases). Blinding is used to protect against the possibility that knowledge of assignment may affect patient response to treatment, provider behaviours (performance bias) or outcome assessment (detection bias). Blinding is not always practical (e.g. when comparing surgery to drug treatment). The importance of blinding depends on how objective the outcome measure is; blinding is more important for less objective outcome measures such as pain or quality of life. In trials comparing an active treatment with no treatment, placebos are usually administered to patients in the control group to maintain blindness. See also single blind, double blind and triple blind. (synonym: masking). Blinding in diagnostic studies refers to keeping the assessors of a diagnostic test result blind to (i.e. unaware of) the result of the comparison test.

### Bonferroni correction^{2}

Frequently used in the context of multiple significance testing, i.e. when several significance tests are carried out simultaneously, as a method of keeping the overall probability of wrongly rejecting the Null Hypothesis (type I error) below a specified level, usually 0.05 or 5 per cent. The correction is applied by multiplying each P-value obtained by the number of tests performed. If, for example, two groups of patients are compared with respect to three different outcomes (say, blood pressure, weight and heart rate) and a P-value of 0.04 (statistically significant using the conventional cut-off point: 0.05) is obtained for each of these comparisons, then the value for P becomes 0.04 × 3 = 0.12, which is no longer significant. The method tends to give overcorrected P-values.
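The correction itself is a one-line calculation, sketched here in Python. The function name is hypothetical, and capping the corrected value at 1 is an added assumption (a probability cannot exceed 1) rather than something stated in the definition above.

```python
def bonferroni_adjust(p_values):
    """Multiply each P-value by the number of tests, capping the result at 1."""
    number_of_tests = len(p_values)
    return [min(1.0, p * number_of_tests) for p in p_values]


# The three comparisons from the example: blood pressure, weight and heart rate.
adjusted = bonferroni_adjust([0.04, 0.04, 0.04])
for p in adjusted:
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"corrected P = {p:.2f} ({verdict})")
```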

### bootstrapping^{2}

Empirical method of obtaining confidence intervals for estimates by taking repeated samples from a single data set, usually using a computer. For example, to obtain a confidence interval for a mean, the mean for each 'sample' is calculated. The confidence interval is based on the distribution of these sample means, and can be constructed by calculating the 2.5th and the 97.5th percentiles of this distribution.
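A minimal sketch of the percentile approach described here, using only the Python standard library. The function name, the number of resamples, the fixed random seed and the data are all assumptions made for illustration; resampling is done with replacement, and each 'sample' has the same size as the original data set.

```python
import random
from statistics import mean


def bootstrap_mean_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    n = len(data)
    # Draw each 'sample' with replacement and record its mean.
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_resamples))
    tail = (1 - level) / 2      # 0.025 in each tail for a 95% interval
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - tail) * n_resamples))]
    return lower, upper


# Hypothetical weights (kg) of ten patients.
weights = [66, 68, 70, 71, 73, 75, 77, 80, 82, 90]
low, high = bootstrap_mean_ci(weights)
print(f"mean = {mean(weights):.1f} kg, 95% CI from {low:.1f} to {high:.1f} kg")
```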

### carry over effect^{2}

In the context of crossover trials. When assessing treatment effects in the second period of a trial, it is important to evaluate to what extent the measurement observed in this period (and attributed to any treatments given) is a result of or a response to the treatment given in the first period. Carry-over effects may give rise to treatment-period interactions. Appropriate washout periods are important in preventing carry-over effects.

### case^{1}

A subject who experiences the outcome of interest.

### case-control study^{3}

An observational research method which aims (a) to detect new cases of a disease (or some other outcome) and (b) to study differences between cases and controls with respect to their past experiences and exposure to risk factors. It compares cases (people with the disease or outcome of interest) and a suitable control group (those without the disease or outcome). The relationship of an attribute (intervention, exposure or risk factor) to the outcome of interest is examined by comparing the frequency or level of the attribute in the cases and controls. The comparison of results between the two groups is expressed as an odds ratio. It is very important that cases and controls should be drawn from the same population; failure to ensure that this is the case will result in selection bias. (Note that outcomes can be desirable events, e.g. discharge to home for an elderly person, as well as undesirable, e.g. death, disability, hospital admission.) Case-control studies are sometimes described as being retrospective as they are always performed looking back in time. Differential recall of exposure in the two groups and difficulties in selecting appropriate cases and controls are common sources of bias in this type of study (selection bias, recall bias). Berkson's fallacy is also a known cause of spurious associations found in this type of study. Case-control studies are particularly helpful in the study of rare conditions and infectious disease outbreaks. Sometimes called a retrospective study since patients who have already been diagnosed are usually used as cases, although some case-control studies are conducted prospectively, i.e. using new, rather than existing, cases. (synonyms: case referent study, retrospective study)

### case series^{3}

An uncontrolled observational study involving an intervention and outcome for more than one person.

### case study^{1, 3}

In qualitative research: this method of qualitative study focuses the collection of data on a single case, which may be one or more units. It involves several methods of data collection, and the gathering of several kinds of data. It may include both quantitative and qualitative strategies. A case study is also a quantitative uncontrolled observational study involving an intervention and outcome for a single person. (synonyms: anecdote, case history, single case report)

### categorical variable^{2}

A variable whose values represent different categories or classes of the same feature e.g. ethnicity, blood group and eye colour, which can also be called nominal variables. When the variable has only two categories it is termed binary (e.g. sex). If there is an inherent ordering or where a quantitative variable has been categorised, it is called an ordinal variable.

### cause effect relationship^{2}

This describes the relationship between two factors which are associated, whenever it is possible to establish that one of the factors causes the other. Several criteria must be met before such a conclusion can be reached.

### censoring^{2}

In the context of follow-up studies: the outcome (e.g. death) of a subject is said to be censored if, for that particular individual, the outcome was not observed within the follow-up period for the individual. Loss to follow-up frequently leads to censoring.

### chi squared test^{2}

A significance test for comparing two or more proportions from independent samples. It can also be used to test for an association between two nominal variables (e.g. ethnicity and blood group) or between a nominal and an ordinal variable (e.g. gender (binary) and degree of pain experienced after surgery). In the latter case the Chi-squared test for trend should be used. When carrying out the Chi-squared test, the observed frequencies (O) are displayed in a contingency table and the expected frequencies (E) calculated. This is done for each cell in the table. The test is based on the differences between observed and expected frequencies across the cells: the greater the differences the smaller the P-value produced by the test. The statistical significance of the results also depends on the size of the table, i.e. on the number of categories of the two variables involved, represented by the degrees of freedom (d.f.) of the test. The larger the table the greater the differences need to be for statistical significance to be achieved. The assumptions for the Chi-squared test are independence of the observations, at least 80 per cent of the cells with expected frequencies greater than five and all cells with expected frequencies greater than one. When these assumptions are not met other tests such as Fisher's exact test should be used. McNemar's test is indicated when analysing paired (non-independent) proportions. When the Chi-squared test is used with small samples (roughly fewer than 30) in analysing two by two tables, a correction (Yates' correction) should be applied to the Chi-squared statistic to avoid incorrect results. The Mantel-Haenszel test is another well-known Chi-squared test.
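How the observed and expected frequencies combine can be sketched as below. The function name and the table are hypothetical. The sketch uses the usual Pearson form, the sum of (O − E)²/E over all cells, without Yates' correction, and it stops at the statistic and its degrees of freedom; obtaining the P-value requires the Chi-squared distribution, which is not covered here.

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom, expected frequencies) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Expected frequency for each cell: row total x column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom, expected


# Hypothetical two by two table: rows are groups, columns are outcome yes / no.
statistic, df, expected = chi_squared([[20, 30], [30, 20]])
print(f"chi-squared = {statistic:.2f} on {df} d.f.")
```

The returned expected frequencies can also be inspected to check the assumptions listed above before relying on the result.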

### CI

See confidence interval (CI).

### CINAHL^{3}

Cumulative Index of Nursing and Allied Health Literature. A database of research literature in the fields of nursing and allied health professions.

### clinical audit^{3}

The process of measuring local clinical performance against agreed standards of care. The Department of Health, 1993 described it as "...the systematic and critical analysis of the quality of clinical care. This includes the procedures used for diagnosis, treatment and care of patients, the associated use of resources and the effect of care on the outcome and quality of life of the patient". Are we doing the right thing? How far are we from where we want to be?

### clinical effectiveness^{1}

The extent to which a treatment, procedure or service does patients more good than harm - the extent to which the outcome differs. Ideally, the determination of clinical effectiveness is based on the results of a randomised controlled trial (RCT). Clinical effectiveness is also known simply as 'effectiveness'.

### clinical epidemiology^{1}

"The application of epidemiological principles and methods to problems encountered in clinical medicine" (Fletcher et al. 1982).

### clinical significance^{2}

Refers to the magnitude of a treatment effect, expressed in terms such as relative risk, absolute risk difference or number needed to treat. Unlike statistical significance, it is not dependent on sample size. It requires a clinical or public health judgement on what is a large effect. A result which is statistically significant may nonetheless be too small to warrant any changes in treatment or other policies, in which case it is not clinically significant. Confidence intervals can help assess the clinical significance of a given study result.

### clinical trial

A trial that tests out a drug or other intervention to assess its effectiveness and safety. This general term encompasses randomised controlled trials and controlled clinical trials. (synonyms: therapeutic trial, intervention study). Investigators intervene in the natural course of a disease by administering drugs or other treatments (interventions) to at least one of the study groups and then assess the effect of treatment.

### clinically important difference^{1}

The extent of difference in clinical effectiveness that would be likely to be acted on by most people involved in delivering healthcare. For example, a 5% reduction in the admission rate for a particular condition might be considered clinically important, whereas a 1-2% reduction might not. Consideration of what is a clinically important difference is necessary in order to carry out a sample size calculation. Contrast it with a statistically significant difference. Inevitably, when deciding what is a clinically important difference, there is a tendency to consider the other things as well, such as the cost of the intervention that is required to achieve the difference.

### closed question

A question where respondents are offered a set of answers and are invited to select the one closest to their views.

### cluster analysis^{2}

Multivariate method also referred to as unsupervised pattern recognition (in artificial intelligence language). Profiles for subjects being studied are compared, and subjects who are 'close' together are classified as being in the same cluster or group. The term 'profile' refers to a set of measurements pertaining to a single subject. These may be repeated measurements of a single variable (e.g. pain scores over a 5-hour period, after 30 minutes exercise for patients with arthritis), measurements on a variety of factors (pain, flexibility, depression, haematological parameters), or a combination of both.

### cluster sampling

A type of sampling which involves first selecting clusters or groupings and then selecting sampling units from each of the selected clusters. Groups of subjects are treated as the sampling units, as opposed to simple random sampling, where individuals are the sampling units. Typically, entire households, schools or General Practices are sampled. If the study in question is a randomised controlled trial all individuals in a particular unit will be given the same treatment or intervention. This is done for practical and ethical reasons. For example, in a study on the relationship of vitamin C supplements and incidence of flu in school children, parents of children not receiving vitamin C may find it unacceptable that other children in the same school are receiving a potentially beneficial intervention. They may even decide to give vitamin C tablets to their children, which would result in serious contamination of the control group. When calculating the sample size required in a study where clusters are the sampling units, it is necessary to make adjustments to the formulae commonly used. The effective sample size of the study will be less than the total number of individuals in the study.

### coding

Assigning codes (numbers, letters or other symbols) to each category for each variable.

### coefficient of variation (CV)^{1}

A measure of the repeatability of a measurement method – particularly for intra-rater reliability. The standard deviation of a measurement divided by the mean.
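As a small illustration, the calculation can be sketched in Python. The function name and the readings are hypothetical, and the sketch assumes the sample standard deviation (dividing by n − 1) is the one intended.

```python
from statistics import mean, stdev


def coefficient_of_variation(measurements):
    """Sample standard deviation divided by the mean."""
    return stdev(measurements) / mean(measurements)


# Hypothetical repeated readings of the same quantity by one rater.
readings = [10, 12, 11, 9, 13]
cv = coefficient_of_variation(readings)
print(f"CV = {cv:.3f} ({cv:.1%} of the mean)")
```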

### cohort^{2}

A group of subjects sharing some common characteristic, which is followed up for a specified period of time.

### cohort studies

See cohort study.

### cohort study

A follow-up or longitudinal (done over a period of time) study. An analytical observational study which investigates the relationship between an exposure or risk factor and one or more outcomes, by following-up a cohort over time. Aims (a) to identify a group of subjects who have been exposed to a suspected risk factor, (b) to identify a second group of subjects who have not been exposed to the risk factor and (c) to compare the rates of a specified outcome (e.g. incidence of disease, cause-specific mortality) in the two groups. A cohort study is sometimes called a prospective study but they can be retrospective (identified from past records and followed forward from that time up to the present: a "historical cohort study") or prospective (assembled in the present and followed into the future: a "concurrent cohort study"). The advantage of the former is a reduction in cost and time needed to carry out the study, although it can be very difficult to obtain detailed measures of exposure in the past; prospective cohort studies can collect exposure information as time goes by, but are very expensive and time consuming because of the delay between exposure and outcome. There is little practical difference, in the context of health services research, between a prospective cohort study and a non-randomised trial. (Cohort studies sometimes do not have a completely unexposed group and, instead, compare two or more groups with different levels of exposure.) Random allocation is not used, so matching or statistical adjustment must be used to ensure that the comparison groups are as similar as possible. Loss to follow-up and surveillance bias (information bias) are two common sources of bias in this type of study. (synonyms: follow-up, incidence, longitudinal, prospective study).

### co intervention

In a randomised controlled trial, the application of additional diagnostic or therapeutic procedures to members of either or both the experimental and the control groups.

### concealment of allocation^{1}

The process used to prevent foreknowledge of group assignment in a randomised controlled trial, which should be seen as distinct from blinding. The allocation process should be impervious to any influence by the individual making the allocation, by having the randomisation process administered by someone who is not responsible for recruiting participants; for example, a hospital pharmacy, or a central office. Methods of assignment such as date of birth and case record numbers (see quasi random allocation) are open to manipulation. Adequate methods of allocation concealment include: centralised randomisation schemes; randomisation schemes controlled by a pharmacy; numbered or coded containers in which capsules from identical-looking, numbered bottles are administered sequentially; on-site computer systems, where allocations are in a locked unreadable file; and sequentially numbered opaque, sealed envelopes.

### conditional logistic regression^{2}

A regression method for paired binary data. A common application of this type of logistic regression is the analysis of case-control studies where cases and controls have been individually matched. An example would be a study where the relationship between use of oral contraceptives and breast cancer is investigated in women aged between 20 and 60. Women with breast cancer are individually matched for age with a control (since the risk of breast cancer increases with age), which results in paired non-independent data.

### confidence interval (CI)

A range of values within which the true population summary measure is believed to be found with a given level of confidence, usually 95 per cent. The summary measure can be a mean, a proportion, a difference between means or proportions, an odds ratio, a regression coefficient, a correlation coefficient, a relative risk etc.; the value calculated from the sample is termed a point estimate. The rationale for calculating CIs is the uncertainty which is always associated with using samples to obtain information about the populations these samples originate from. A single estimate is likely to be inaccurate, so the 95% CI provides additional information about the population value: we can be 95% confident the population value lies within its limits. Different levels of confidence can be placed on a CI, so 90% or 99% CIs can also be calculated. A 99% CI will be wider than the corresponding 95% CI. The width of a CI also depends on the sample size, larger samples providing narrower CIs. CIs are extremely useful in assessing the clinical significance of a given result. The lower and/or upper boundaries of a CI may cast some doubt on rather promising point estimates if, on examination, these boundaries fail to show clinical significance.
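One common way of calculating a CI for a mean can be sketched as follows. The function name and the data are hypothetical, and the sketch assumes the Normal approximation (point estimate plus or minus a multiple of the standard error); for small samples a method based on the t distribution would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, level=0.95):
    """Normal-approximation confidence interval for the mean of `data`."""
    # Multiplier for the chosen level of confidence (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    standard_error = stdev(data) / sqrt(len(data))
    point_estimate = mean(data)
    return point_estimate - z * standard_error, point_estimate + z * standard_error


# Hypothetical systolic blood pressure readings (mmHg) from twelve patients.
readings = [118, 122, 125, 127, 130, 131, 133, 136, 138, 141, 145, 150]
for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(readings, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f}")
```

Running the sketch with the three levels shows the interval widening as the level of confidence increases.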

### confirmability^{3}

In qualitative research: involves recognising bias in a study and attempting to minimise it. To do this, qualitative researchers need to recognise their biases, and seek to fault their own assumptions or 'pet theories' about what they are researching. Bringing in colleagues to offer alternative readings, and feeding back results of an analysis to the original respondents can help to reduce these biases.

### confounding^{3}

When groups being compared in a study are different in relation to important prognostic factors other than the factor under investigation. The result of such a study is likely to be biased. Certain study designs are more prone to confounding, in particular the case-control study. Randomised trials eliminate confounding in principle by making groups comparable in relation to known and unknown prognostic factors. A confounding factor is some aspect of a subject which is associated both with the outcome of interest and with the intervention (or exposure) of interest. For example, if older subjects are less likely to receive a new treatment, and are also more likely to experience the outcome of interest (e.g. admission to hospital), then any observed relationship between the intervention and the likelihood of experiencing the outcome would be confounded by age. If the age of subjects is known, then this problem can be addressed at the stage of data analysis, although you can never be sure that you have taken account of all confounding factors. In many circumstances, confounding is very similar to selection bias (see the definition of bias).

### consensus methods^{3}

In qualitative research: provide a way of synthesising information and dealing with conflicting evidence, with the aim of determining extent of agreement from within a selected group.

### constant comparison^{3}

In qualitative research: this is a grounded theory method in which the researcher simultaneously codes and analyses data in order to develop concepts. By continually comparing specific incidents in the data, the researcher refines these concepts, identifies their properties, explores their relationships to one another and integrates them into a coherent theory. See *analytic induction*.

### construct validity

The extent to which a measure that has been developed accords with other measures in the same area of study.

### consumer

Someone who uses, is affected by, or who is entitled or compelled to use a health related service. (healthcare consumer)

### consumer advocate or representative

Consumer who is actively involved with other consumers and able to represent the perspectives and concerns of that broader group of people. A consumer advocate or representative should be linked with other consumers, accountable to them, and should not have a conflict of interest in that role.

### contamination

Contamination occurs when the intervention treatment (or aspects of it) is given inappropriately to control subjects. This is most likely to happen when the intervention involves a change in practice, e.g. as the result of a training programme, and where healthcare is delivered by teams of practitioners rather than individuals. Practitioners who learn a new skill may find it difficult to 'switch off' the skill for some subjects and to use it when treating others; practitioners may also exchange their knowledge at social occasions. Contamination can also be the inadvertent failure to apply the intervention to people assigned to the intervention group.

### content analysis^{3}

In qualitative research: a procedure for organising narrative, qualitative data into emerging themes and concepts.

### contingency table^{1, 2}

A tabular cross-classification of data such that subcategories of one characteristic are indicated horizontally (in rows) and subcategories of another characteristic are indicated vertically (in columns). Tests of association between the characteristics can be readily applied. The simplest contingency table is the fourfold, or 2x2, table, which is used in clinical trials to compare dichotomous outcomes, such as death, for an intervention and control group or two intervention groups. Contingency tables are used to summarise the association between two categorical variables. The rows represent the different levels of one of the variables and the columns the different levels of the other variable. The cells contain the observed frequencies resulting from the cross-tabulation of the two variables. The cells are mutually exclusive: each subject in a study can be in one and only one of the cells. Totals must always be presented. Most commonly the Chi-squared test and related methods are appropriate to analyse contingency tables. In all cases degrees of freedom are calculated as (r-1)x(c-1), where r is the number of rows and c is the number of columns. When the two variables are ordinal (e.g. level of pain and cancer stage), rank correlation methods are indicated. When one variable is ordinal and the other is binary (e.g. degree of smoking and heart disease), a trend test should be used. The Kruskal-Wallis test is indicated when one variable is ordinal and the other has three or more categories.
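As an illustration of the Chi-squared test and the degrees of freedom rule above, here is a minimal sketch in plain Python. The function name and the trial counts are hypothetical; the statistic is the usual sum over cells of (observed - expected)² / expected, with expected frequencies computed from the row and column totals.

```python
def chi_squared(table):
    """Return the Chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in this cell if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 2x2 table: rows = intervention / control, columns = died / survived
table = [[15, 85],
         [30, 70]]
statistic, df = chi_squared(table)
print(round(statistic, 2), df)  # 6.45 on 1 degree of freedom
```

With one degree of freedom, a statistic above 3.84 corresponds to P < 0.05, so in this made-up example the association would be judged statistically significant at the 5% level.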

### continuous variable^{2}

A quantitative variable with a potentially infinite number of possible values along a continuum (e.g. height, weight and blood pressure).

### control

See controls

### controlled clinical trial^{1}

Refers to a study that compares one or more intervention groups to one or more comparison (control) groups. Whilst not all controlled studies are randomised, all randomised trials are controlled.

### controls

Controls are subjects used in comparative studies to act as the standard against which new treatments or interventions are to be tested (as in randomised controlled trials), or against which the risks connected with a particular exposure are evaluated (as in case-control studies). Controls can be concurrent or historical, depending on whether these subjects are investigated at the same time/place as those not acting as controls. Crossover trials use just a single group of participants where each individual acts as her/his own control. Any analytical study should include at least two groups: the exposed or intervention group (who have the suspected risk factor, or who are given the new treatment) and a control group who are given a different control treatment. The control group is a comparison group, so that it is possible to estimate the additional risk/effectiveness of an exposure/new treatment. Think carefully about what is an appropriate control group; in many trials, the control will be 'standard care' rather than nothing, because it is important to know whether a new treatment is better than the current one, rather than whether the treatment is better than nothing. Sometimes a study will include more than one control group.

1. In clinical trials comparing two or more interventions, a control is a person in the comparison group that receives a placebo, no intervention, usual care or another form of care.
2. In case-control studies, a control is a person in the comparison group without the disease or outcome of interest.
3. In statistics, control means to adjust for or take into account extraneous influences or observations.
4. When applied to communicable (infectious) diseases, control can also mean programmes aimed at reducing or eliminating the disease.

### core category^{2}

In qualitative research: the central category that is used to integrate all the categories identified in grounded theory research.

### correlation^{2}

Linear association between two quantitative or ordinal variables. It can be assessed by parametric or non-parametric methods. Both involve the computation of correlation coefficients: Pearson's and rank (Spearman's or Kendall's) correlation coefficients, respectively.

### correlation coefficient^{2}

Measure of association between quantitative or ordinal variables. Can be obtained by parametric (Pearson's) or non-parametric methods (rank correlation). Values can range from -1 (perfect negative association) to +1 (perfect positive association), with 0 representing lack of linear association. (Note: for rank correlation, it is the linear association between the ranks given to the data values in each variable.)
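The difference between the parametric and rank-based coefficients can be seen in a short sketch. It computes Pearson's coefficient from its textbook definition and Spearman's coefficient as Pearson's coefficient applied to the ranks; the function names and the dose/response data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: the response always rises with dose, but not in a straight line
dose = [1, 2, 3, 4, 5]
response = [1, 8, 27, 64, 125]
print(round(pearson(dose, response), 3))   # below 1: the relationship is not linear
print(round(spearman(dose, response), 3))  # 1.0: the ranks agree perfectly
```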

### cost benefit analysis^{1}

An economic analysis that converts effects into the same monetary terms as the costs and compares them.

### cost effectiveness

A comparison of the difference in costs and effectiveness between two or more treatments (e.g. control and intervention). Note that the units in which effectiveness (outcome) is measured must be the same for both treatments. For each treatment, a cost per effectiveness unit (e.g. year of life gained) can be calculated; the cost for different treatments for the same condition can then be compared directly. Because it can often be difficult to obtain precise costs, a cost-effectiveness study should also include a sensitivity analysis.

### cost-utility analysis

An economic analysis that converts effects into personal preferences (or utilities) and describes how much it costs for some additional quality gain (e.g. cost per additional quality-adjusted life-year).

### covering letter

A letter accompanying a self-completion questionnaire inviting participation and giving explanations.

### Cox regression^{2}

A regression method for modelling survival times. Also called the proportional hazards model, since it assumes the ratio of the risks (or hazard ratio) of the event (e.g. death) at any particular time, between any two groups being compared, to be constant. The outcome variable is whether or not the event of interest has occurred and, if so, after what period of time; if not, how long the subject was followed for. The model predicts the hazard or risk of the event in question (commonly death) at any given time. The predictor variables are prognostic factors, as with any other type of regression model. [In fact, the method specifies an additive model for the log of the hazard. This implies working on the log scale and then exponentiating the regression results so that hazards (on the original scale) can be obtained.] Cox regression can be considered a 'semi-parametric' method, since no other assumptions, namely about the distribution of survival times, are made.
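In symbols, the additive model for the log of the hazard can be sketched as follows. The notation is an assumption introduced here for illustration: h(t | x) is the hazard at time t for a subject with predictor values x_1, ..., x_p, h_0(t) is an unspecified baseline hazard, and the β_i are the regression coefficients.

```latex
\log h(t \mid x) = \log h_0(t) + \beta_1 x_1 + \dots + \beta_p x_p
\quad\Longleftrightarrow\quad
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
```

Under this form, two subjects who differ by one unit in x_1 and are otherwise alike have a hazard ratio of exp(β_1) at every time t, which is the proportional hazards assumption. The baseline hazard h_0(t) is left unspecified, which is why the method is described as semi-parametric.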

### credibility^{3}

In qualitative research: this is concerned with the accuracy of description in a piece of qualitative research and is equivalent to internal validity. The study should state the precise parameters of the study - who was studied, where and when, and by what methods.

### critical appraisal^{1}

The process of reading, assessing and interpreting evidence, by systematically considering its validity, results and relevance to your own work or situation.

### Cronbach's alpha^{2}

A measure of the reliability of a composite rating scale made up of several items or variables. Psychological and mental health tests are common examples of this type of scale.

### cross overs^{2}

In the context of clinical trials, it refers to subjects who do not take or receive the treatment they were allocated to, but rather the alternative treatment being compared in the trial. Intention-to-treat analysis is commonly used in this event to minimise the resulting bias.

### crossover design

A type of clinical trial comparing two or more interventions in which the participants, upon completion of the course of one treatment, are switched to another. For example, for a comparison of treatments A and B, half the participants are randomly allocated to receive them in the order A, B and half to receive them in the order B, A. A problem with this design is that the effects of the first treatment may carry over into the period when the second is given. With this study design, all patients are given the two or more treatments under investigation, such that each patient acts as her/his own control. As a result, the sample size required is smaller than with a parallel design, given the lesser degree of variability within the same subjects. Randomisation is used to assign the order in which the treatments are to be administered, mainly to avoid period effects. The main limitations of this type of design are that it cannot be used with diseases that can be cured, with acute conditions, or when treatment periods are so long that patients may be prone to drop out. There is also a potential for carryover effects resulting in treatment-period interaction. The latter should be given careful consideration in the planning stages of a trial so that it can be avoided by introducing appropriate washout periods. In the presence of a treatment-period interaction, data for the second period are usually discarded. The result is a parallel design trial that is unlikely to be large enough to ensure adequate power.

### cross-over trial^{1}

A type of clinical trial comparing two or more interventions in which the participants, upon completion of the course of one treatment are switched to another. For example, for a comparison of treatments A and B, half the participants are randomly allocated to receive them in the order A, B and half to receive them in the order B, A. A problem with this design is that the effects of the first treatment may carry over into the period when the second is given.

### cross sectional study^{2}

A type of observational study. As opposed to a follow-up study, subjects are observed on just one occasion. It is thus very difficult to infer a cause-effect relationship from such a study design. Descriptive cross-sectional studies are usually referred to as surveys. Common problems with this type of study are the choice of study sample (random sampling being the only way of ensuring a representative sample), non-response and volunteer bias, all of which lead to selection bias. A cross-sectional design gives estimates of prevalence rather than incidence.

### cross tabulation

A table showing the relationship between two or more variables. A contingency table is an example.

### crude estimates^{2}

As opposed to adjusted estimates, these estimates are obtained without taking confounding factors into account. If confounding is suspected, the estimates may be misleading when making comparisons between populations. For example, different populations often have different age structures. Thus, comparisons made between these populations (e.g. mortality from heart disease) will often be biased.

### cumulative hazard^{2}

In the context of survival analysis, it summarises the risk of an event over a specified period of time or follow-up period. The cumulative hazard or failure rate is calculated from the Kaplan-Meier estimate of cumulative survival [S(t)] using the following formula: H(t) = -log S(t).
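The calculation can be sketched in a few lines of plain Python. The Kaplan-Meier estimate is computed as the running product of (1 - events / number at risk) over the distinct event times, and the formula above is then applied with the natural logarithm. The function names and the follow-up data are hypothetical.

```python
from math import log

def kaplan_meier(times, events):
    """Return [(time, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event occurred at that time, 0 if the subject was censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        occurred = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - occurred / at_risk
        curve.append((t, survival))
    return curve

def cumulative_hazard(survival):
    """H(t) = -log S(t), using the natural logarithm; undefined when S(t) is 0."""
    return -log(survival)

# Hypothetical follow-up times (months); event 0 marks a censored observation
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3), round(cumulative_hazard(s), 3))
```

As survival falls from 1 towards 0, the cumulative hazard rises from 0 without an upper limit.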

### cumulative meta analysis^{1}

In cumulative meta-analysis, studies are added one at a time in a specified order (e.g. according to date of publication or quality) and the results are summarised as each new study is added. In a graph of a cumulative meta-analysis, each horizontal line represents the summary of the results as each study is added, rather than the results of a single study.

### data saturation^{3}

In qualitative research: this is the point at which data collection can cease. This point of closure is arrived at when the information that is being shared with the researcher becomes repetitive and contains no new ideas, so the researcher can be reasonably confident that the inclusion of additional participants is unlikely to generate any new ideas. (Sometimes simply referred to as saturation).

### database^{1}

A collection of organised information, usually held on a computer. In some ways a database is similar to a filing system, but with important advantages: the information can be revised and kept up to date easily, and the computer can retrieve information from it very quickly. Electronic databases such as MEDLINE, EMBASE and the CDSR can be distributed on disk, CD-ROM or via the Internet.

### decision analysis^{1}

A technique used to aid decision-making under conditions of uncertainty by systematically representing and examining all of the relevant information for a decision and the uncertainty around that information. The available choices are plotted on a decision tree. At each branch, or decision node, the probabilities of each outcome that can be predicted are estimated. The relative worth or preferences of decision-makers for the various possible outcomes for a decision can also be estimated and incorporated in a decision analysis.

### degrees of freedom (d.f.)^{1, 2}

The number of independent comparisons that can be made between the members of a sample. It refers to the number of independent contributions to a sampling distribution (such as chi-square distribution). In a contingency table it is one less than the number of row categories multiplied by one less than the number of column categories; e.g. a 2 x 2 table comparing two groups for a dichotomous outcome, such as death, has one degree of freedom. In the analysis of quantitative data, test results will depend on the sample size(s) of the study group(s). The larger the sample size(s) the greater the power to prove a given result as statistically significant. For categorical data test results depend on the size of contingency tables used to summarise the association between two variables. Tables with excessive numbers of categories may have reduced power, compared to smaller tables (fewer observations in each cell). Sample and table sizes are expressed in terms of degrees of freedom. The way these are calculated depends on the statistical test in question.

### Delphi technique^{3}

A method for obtaining expert or consensus opinion on a particular topic, by using multiple 'rounds' or waves of questions whereby the results from the previous rounds are continually fed back to the same respondents to bring about a group consensus.

### dependability^{3}

In qualitative research: this is equivalent to reliability in quantitative work. Whereas positivist research has to assume an unchanging world, so that if an identical study were to be performed the assumption would be that the same findings would emerge, the naturalistic paradigm acknowledges that the world, especially the social world, is constantly changing. The researcher might demonstrate the dependability of a study by discussing how and why issues change.

### descriptive research methods

These include qualitative and quantitative methods. They are mainly used to describe the distribution of characteristics in a population, and they are not designed to test a hypothesis. These methods are often used when there is limited previous research on the topic of interest, and they may generate new hypotheses for subsequent investigation by analytical methods. Qualitative methods include focus groups, ethnographic and anthropological studies; quantitative methods include ecological and cross-sectional studies. Case studies or series can be either qualitative or quantitative.

### detection bias^{1}

Systematic differences between comparison groups in how outcomes are ascertained, diagnosed or verified. (synonym: ascertainment bias). See selection bias.

### detection rate^{2}

See sensitivity.

### deviance

A statistic used to assess the goodness of fit of models fitted by the method of maximum likelihood (in fact, the badness of fit, since the greater the deviance the worse the fit of a model). Models that have many predictor variables may be simplified, provided important information is not lost in the process. This is tested by the difference in deviance between any two models being compared (likelihood ratio test): the smaller the difference in deviance the smaller the impact of the variables removed.

### diagnostic accuracy

The extent to which a particular diagnostic test correctly classifies "suspect" patients into diagnostic categories. It cannot be easily summarised in a single figure, but is the combination of a number of measures described here.

### diagnostic test

A test that classifies "suspect" patients into diagnostic categories.

### dichotomous data

Observations with two possible categories such as dead/alive, smoker/non-smoker, present/not present. (synonym: binary data)

### discrete variable^{2}

A quantitative variable which, unlike a continuous variable, can only take certain values, usually integers (whole numbers), for example the number of children.

### discriminant analysis^{2}

A multivariate method of classifying subjects into known groups, on the basis of their profile of measurements (e.g. symptoms). It is a form of computerised diagnosis, also known as supervised pattern recognition (in artificial intelligence language). Linear discriminant analysis and logistic regression are the methods commonly used for this purpose. In addition to finding a discriminant rule or model it is important to assess its performance, i.e., its misclassification rate (or proportion of subjects incorrectly classified). Ideally, an independent sample of subjects should be used to estimate this error rate. The search for the best subset of predictor or explanatory variables can be done by a stepwise procedure.

### disproportionate stratified sampling

Stratified sampling where the number of units sampled within each stratum is not proportional to the size of the stratum.

### distributions^{2}

Probability distributions are used to calculate the theoretical probability of different values occurring under various assumed distributions of known theoretical form: Normal, Binomial, Poisson. These distributions are defined mathematically by one or more parameters. Parametric statistical methods rely quite strongly on the assumption that the data have an empirical distribution that approximates the theoretical one. For quantitative variables, which are frequently required to have a Normal distribution, this can be checked by looking at a histogram depicting the shape of the relative frequency distribution, or at a Normal plot. When the required assumptions cannot be met, it is possible to resort to data transformations or to non-parametric methods.

### double blind^{1}

Neither the participants in a trial nor the investigators (outcome assessors) are aware of which intervention the participants are given. The purpose of blinding the participants (recipients and providers of care) is to prevent performance bias. The purpose of blinding the investigators (outcome assessors, who might also be the care providers) is to protect against detection bias. See also blinding, single blind, triple blind, concealment of allocation. (synonym: double masked).

### double-barrelled questions

In a questionnaire: where two issues are raised in a single question.

### dropouts

See withdrawals.

### dummy variable^{2}

In the context of regression, dummy or indicator variables are created whenever a categorical variable needs to be incorporated in a model. If this step is not taken, a categorical variable such as blood group, whose levels are coded with labels, say, from 1 to 4 (O, A, B, AB), will be interpreted as a quantitative variable, and a numeric meaning will be given to the labels. In the above example four new dummy variables are created, one for each blood group. (In a model that includes an intercept, only three of them are entered; the remaining group serves as the reference category.)
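A minimal sketch of dummy coding for the blood group example follows. The function name, its `reference` argument and the sample data are hypothetical; each dummy variable is a column of 0s and 1s marking membership of one category.

```python
def dummy_code(values, levels, reference=None):
    """Return one 0/1 indicator column per level, leaving out the reference level if given."""
    kept = [level for level in levels if level != reference]
    return {level: [1 if v == level else 0 for v in values] for level in kept}

# Hypothetical blood groups recorded for six subjects
blood_groups = ["O", "A", "B", "AB", "A", "O"]
levels = ["O", "A", "B", "AB"]

all_four = dummy_code(blood_groups, levels)
print(all_four["A"])  # [0, 1, 0, 0, 1, 0]

# Dropping one level as the reference category, as is usual when the
# regression model also contains an intercept
with_reference = dummy_code(blood_groups, levels, reference="O")
print(sorted(with_reference))  # ['A', 'AB', 'B']
```

Coding the groups this way avoids treating the labels 1 to 4 as if group 4 were "four times" group 1.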

### ecological study

Similar to a cross-sectional study, but the unit of observation is a population (e.g. of a country, town, Health Authority, GP practice, or of the same geographical area at different times), rather than an individual person. This type of study is even easier to carry out than a cross-sectional study because it is often possible to use data at a population level which has been collected routinely for other purposes (e.g. census information, general health survey, Trade and Industry statistics). Its major additional disadvantage is that routine data are unlikely to exist for important confounding factors, so confounding cannot be excluded as an alternative explanation for the results.

### economic analysis^{1}

Comparison of the costs and outcomes of alternative health care interventions. See cost-benefit analysis, cost-effectiveness analysis and cost-utility analysis. (synonym: economic evaluation)

### economic evaluation

See economic analysis.

### effect size^{1}

1. A generic term for the estimate of effect for a study. 2. A dimensionless measure of effect that is typically used for continuous data when different scales (e.g. for measuring pain) are used to measure an outcome and is usually defined as the difference in means between the intervention and control groups divided by the standard deviation of the control or both groups. See standardised mean difference.
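The second definition can be made concrete with a short sketch. It divides the difference in means by either the pooled standard deviation of both groups or the standard deviation of the control group alone; the function name, the `pooled` switch and the pain scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standardised_mean_difference(intervention, control, pooled=True):
    """Difference in means divided by a standard deviation (pooled or control-group)."""
    difference = mean(intervention) - mean(control)
    if pooled:
        n1, n2 = len(intervention), len(control)
        s1, s2 = stdev(intervention), stdev(control)
        sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    else:
        sd = stdev(control)
    return difference / sd

# Hypothetical pain scores on a 0-10 scale
intervention = [3, 4, 2, 5, 3, 4]
control = [5, 6, 4, 7, 5, 6]

smd = standardised_mean_difference(intervention, control)
print(round(smd, 2))  # negative: lower pain scores in the intervention group
```

Because the result is expressed in standard deviation units, studies that measured pain on different scales can be compared or combined.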

### effectiveness^{1}

The extent to which a specific intervention, when used under ordinary circumstances, does what it is intended to do. Clinical trials that assess effectiveness are sometimes called management trials. See also intention to treat analysis. See clinical effectiveness.

### efficacy^{1}

The extent to which an intervention produces a beneficial result under ideal conditions. Clinical trials that assess efficacy are sometimes called explanatory trials and are restricted to participants who fully co-operate.

### emic and etic^{3}

In qualitative research: these terms refer to the type of information being reported and written into an ethnography, whether the researcher reports the views of the informants (emic) or his or her own personal views (etic).

### empirical^{1}

Empirical results are based on experience (or observation) rather than on reasoning alone.

### epidemiology^{1}

The quantitative study of the distribution and determinants of health-related states and events in populations, and the application of this study to the control of health problems.

### epistemology^{3}

The theory of knowledge. It includes methods of scientific procedure that lead to the acquisition of sociological knowledge.

### estimate of effect^{1}

In studies of the effects of healthcare, the observed relationship between an intervention and an outcome expressed as, for example, a number needed to treat, odds ratio, risk difference, relative risk, standardised mean difference, or weighted mean difference. (synonym: treatment effect)

### estimates^{2}

Summary measures calculated from samples. Estimates can be means, proportions, regression coefficients, relative risks, etc. These can be more precisely termed 'point estimates'. Estimates are used to make inferences about target populations whose 'true' values or parameters are not known. Estimates should be quoted with their corresponding SEs (standard errors), usually translated into confidence intervals ('interval estimates') for ease of interpretation.

### ethnographic

An anthropological study. These are qualitative research methods, where the researcher aims to become 'immersed' in or part of the population being studied, so that he/she can develop a detailed understanding of the values and beliefs held by members of the population. This type of method might be appropriate to generate hypotheses for health promotion interventions or, after a quantitative trial, to explore why such an intervention worked or why it worked in some groups and not others. For example, a researcher might define a population with a high risk behaviour with respect to their health and, by being constantly with the group, try to understand why the population is so resistant to health promotion information.

### ethnography^{3}

(Greek: "writing the culture") a type of qualitative research that is used to study other cultures and is intended to provide scientific description of phenomena within their specific natural contexts or settings that are native to the phenomena. First developed by anthropologists.

### ethnomethodology^{3}

An arm of sociology first developed in the 1940s. It is based on a critique of how the social world is constructed. In ethnomethodology nothing is taken for granted and the minutiae of daily life are broken down and examined in detail. Conversation analysis is also part of ethnomethodological study.

### event rate^{1}

The proportion of participants in a group in whom an event is observed. Thus, if out of 100 patients the event (e.g. a stroke) is observed in 32, the event rate is 0.32.

### expected frequencies^{2}

As opposed to observed frequencies. In a contingency table they are the numbers or frequencies expected in each cell under the assumption that the Null Hypothesis is true, i.e., no relationship between exposure and outcome.

### experimental bias

A systematic distortion of the findings introduced by the action of a researcher.

### exposure

This is a term often used by epidemiologists to describe the suspected risk factor of key interest in a study designed to investigate whether or not there is an association between the risk factor and some outcome. Exposure can be defined as any of a subject's attributes or any agent with which he or she may come in contact that may be relevant to his or her health. For example, use of oral contraceptives would be considered to be an exposure in a study investigating whether there is an association between contraceptive use and some outcome, e.g. thrombosis or breast cancer. Exposures can be agents that: cause physiological effects, cause or protect from disease, confound the association, modify the effect of other agents, or determine outcome (e.g. screening or treatment). Exposures may act: cumulatively over a lifetime, immediately before disease onset, at critical time periods, or only if they are above a certain level or threshold.

### external validity

The degree to which the results of an observation hold true in other settings. See also validity. (synonyms: generalisability, relevance, transferability)

### face validity

In a survey: when a measure seems relevant and sensible to those completing it.

### factor analysis^{2}

A multivariate method where correlations between sets of observed measurements are analysed, with a view to estimating the number of different factors that explain these correlations. For example, correlations between components of a composite intelligence rating scale are inferred to arise from the fact that they (the sub-scales or sets of measurements) are all measures of intelligence (the factor). An exploratory factor analysis looks at these correlations, assesses the number of factors which might need to be postulated to provide an explanation for the correlations, and decides what variables might be indicators of what factors. A confirmatory factor analysis assesses whether a set of correlations can be adequately explained by a factor model specified a priori.

### factorial design^{1}

Most trials only consider a single factor, where an intervention is compared with one or more alternatives, or a placebo. In a trial using a 2x2 factorial design, participants are allocated to one of four possible combinations. For example, in a 2x2 factorial RCT of nicotine replacement and counselling, participants would be allocated to: nicotine replacement alone, counselling alone, both, or neither. In this way it is possible to test the independent effect of each intervention on smoking cessation and the combined effect of (interaction between) the two interventions.

### factual question

An attempt to elicit information which can be said to be correct or not.

### field notes^{3}

In qualitative research: these are taken by researchers to record unstructured observations they make 'in the field' and their interpretation of those observations.

### filter question

A question designed to ensure that people answer questions appropriately and do not try to answer questions where their experience or knowledge is lacking.

### Fisher's exact test^{2}

A statistical test for comparing proportions. Used as an alternative to the Chi-squared test whenever the assumption regarding expected frequencies is not met or the total sample size is too small (< 30). The test gives exact probabilities (P-values) under a special distribution (the hypergeometric distribution).
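For a 2x2 table, the exact probabilities can be sketched with the standard library alone. The function below is a hypothetical illustration: it holds the row and column totals fixed, computes the hypergeometric probability of every possible table, and sums the probabilities of all tables that are no more likely than the one observed (one common definition of the two-sided P-value).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def probability(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so that tables exactly as likely as the observed one are included
    return sum(p for p in map(probability, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical small trial: 3 of 4 improved on treatment, 1 of 4 on control
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

With only eight subjects, even a 75% versus 25% difference in proportions is compatible with chance, which is the situation the exact test is designed for.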

### fixed effects^{2}

As opposed to random effects. This term is used in the context of meta-analysis when results from individual studies are combined (producing a single estimate) by weighting individual results, usually according to study size. The combined estimate is then applied to any subject originating from any of the target populations covered by the individual studies. The assumption is that the underlying 'true' population values are the same in each of the studies. Tests of heterogeneity are used to decide on the choice of a random or a fixed effects model. In the context of analysis of variance (ANOVA) the term is used to identify factors all of whose categories may be identified (e.g. gender, ethnicity, blood group), in contrast to random effects factors (e.g. patient, observer). A fixed effect model is a statistical model that stipulates that the units under analysis (e.g. people in a trial or studies in a meta-analysis) are the ones of interest, and thus constitute the entire population of units. Only within-study variation is taken to influence the uncertainty of results (as reflected in the confidence interval) of a meta-analysis using a fixed effect model. Variation between the estimates of effect from each study (heterogeneity) does not affect the confidence interval in a fixed effect model. See random effects model.
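
The weighting described above can be sketched as follows. This example assumes inverse-variance weights (a common choice that gives larger, more precise studies more weight); the function name and the study figures are hypothetical.

```python
from math import sqrt


def fixed_effect_pool(estimates, standard_errors):
    """Pool study estimates under a fixed effect model (inverse-variance weights)."""
    # Each study is weighted by the inverse of its within-study variance.
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Only within-study variation enters the pooled standard error.
    pooled_se = sqrt(1 / sum(weights))
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Two hypothetical studies reporting a log odds ratio and its standard error.
pooled, pooled_se, ci = fixed_effect_pool([0.2, 0.5], [0.1, 0.2])
print(round(pooled, 2), round(pooled_se, 4))  # 0.26 0.0894
```

Note that the spread between the two study estimates plays no part in the confidence interval, which is the defining feature of the fixed effect approach.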

### focus group

This is a qualitative research method, where a researcher brings together a small number of subjects to discuss the topic of interest. The researcher usually 'chairs' the group, to ensure that a range of aspects of the topic are explored. The discussion is frequently recorded, then transcribed and subjected to content analysis (searching for key themes, for example on the basis of the frequency with which they occur).

### focus groups^{3}

A method of collecting qualitative data from a group of people (ideally 6-8 people). The group holds a concentrated discussion, normally on one topic or a particular area of the participants' experience. By focusing on one topic over the time period of the discussion, it aims to promote self-exposure, to get beyond the public self.

### follow up period^{2}

The length of time a subject is kept under observation in a particular study. A distinction is sometimes made between the actual follow-up period and the accrual period in which patients are recruited to the study. In a trial comparing survival times it is particularly important to count follow-up from randomisation, and not from the actual time treatments are given, since this may lead to differential follow-up between the comparison groups, and thus to bias. For example, if patients are randomised to receive either medical (drugs) or surgical treatment for the management of unstable angina, patients having surgery may have to wait longer for their treatment. Thus, if follow-up time is counted from beginning of treatment, patients receiving the medical treatment are given a clear advantage.

### follow up study

A longitudinal, prospective study: collects the relevant information by following subjects over a period of time, thus allowing temporal relationships to be investigated.

### fourfold table^{1}

A contingency table with two rows and two columns used in clinical trials to compare dichotomous outcomes, such as death, for an intervention and control group or two intervention groups. (synonym: 2x2 table)

### framework analysis^{3}

An approach to qualitative analysis first developed by Ritchie and Spencer in 1994. It has been used extensively in applied policy research. It is based on thematic analysis and uses *a priori* concepts as well as emerging concepts.

### funnel plot^{1}

A graphical display of sample size plotted against effect size that can be used to investigate publication bias. When many studies have been located that estimate the same effect, the distribution of points should resemble a funnel shape with a widening in the spread of effect sizes as sample size decreases. A gap on one side of the wide part of the funnel indicates that some studies have not been published or located. Funnel plots are not presently available in the Cochrane software.

### Gaussian distribution

See Normal distribution.

### generalisability

The extent to which the findings of one study can be applied more widely - across different routine health-care settings (e.g. hospitals, GP practices, the community), across different patient groups (e.g. age, sex, co-morbidity), and geographic regions.

### geometric mean^{2}

The anti-log of a mean calculated from observations which have been transformed to a log scale. Quantitative variables which display a positive skew may sometimes have a lognormal distribution i.e., their logarithmic transform has a Normal distribution (e.g. serum triglycerides). Parametric methods of estimation and/or significance testing can be applied to the log values, and the results back transformed to their original scale.
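
A minimal sketch of the calculation, using natural logarithms; the function name and the data values are hypothetical.

```python
from math import exp, log


def geometric_mean(values):
    """Anti-log of the mean of the log-transformed (strictly positive) values."""
    logs = [log(v) for v in values]
    return exp(sum(logs) / len(logs))


# Hypothetical positively skewed measurements.
data = [1, 10, 100]
print(round(geometric_mean(data), 6))  # 10.0
print(sum(data) / len(data))           # 37.0 (arithmetic mean, pulled up by the largest value)
```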

### gold standard^{1, 2}

The method, procedure or measurement that is widely accepted as being the best available against which new interventions should be compared. It is particularly important in studies of the accuracy of diagnostic tests. For example, *handsearching* is sometimes used as the gold standard for identifying trials against which electronic searches of *databases* such as *MEDLINE* are compared. In the context of diagnostic tests, it refers to a reliable and valid (1) diagnostic tool which consistently gives the correct diagnosis. Gold standard tests are often invasive or expensive diagnostic methods, but can be used in studies to assess the performance (sensitivity, specificity) of simpler and/or cheaper methods. Gold standard = reference standard in diagnostic tests.

### goodness of fit

Refers to how well a theoretical distribution or a specified model fits a set of data. It is based on the comparison of observed and expected frequencies. Many statistical tests such as the Chi-squared test and the likelihood ratio test (deviance) are in fact goodness of fit tests. Common applications of goodness of fit tests are the assessment of Normality and the Hosmer and Lemeshow Chi-squared test used in the context of logistic regression models, to assess the predictive ability of a given model.

### grounded theory^{3}

In qualitative research: this is a technique for analysing qualitative data and generating concepts and theories inductively, using a constant comparative method. This approach was developed by Glaser and Strauss in 1967. Grounded theory involves hypothesising inductively from data using the subjects' own categories.

### Hawthorne effect^{3}

The changes that occur in a subject's behaviour or attitude as a result of being included in the study and being placed under observation. The term derives from industrial psychological studies that were carried out at the Hawthorne plant of the Western Electric Corporation in Illinois in the 1920s and were reported by Mayo. He found that whatever experimental environmental conditions were tried out on the workers, productivity always went up. He realised that it was the effect of actually being under study that resulted in a change of behaviour and so increased productivity.

### hazard ratio^{2}

Measure of relative risk used in survival studies. It is calculated as: HR = (O1/E1)/(O2/E2), where: O1 is the observed number of subjects with the event in group 1; E1 is the expected number of subjects with the event in group 1, under the hypothesis (Null Hypothesis) that the two groups being compared experience the same event hazard, i.e., the overall risk applied to total number of subjects in this subgroup; O2 and E2 as above, for group 2. An HR of 1 suggests that the hazard or risk of the event is the same in the two groups being compared. An HR greater than 1 means that group 1 is more likely to experience the event. The converse is true for an HR less than 1.
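
The formula can be written out as a short sketch. The function name and the observed and expected counts below are hypothetical; in practice the expected numbers would come from a log rank calculation.

```python
def hazard_ratio(o1, e1, o2, e2):
    """HR = (O1/E1) / (O2/E2) from observed and expected event counts."""
    return (o1 / e1) / (o2 / e2)


# Hypothetical counts: 30 events in total, with more than expected in group 1.
hr = hazard_ratio(o1=20, e1=12.5, o2=10, e2=17.5)
print(round(hr, 2))  # 2.8, i.e. group 1 is more likely to experience the event
```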

### heterogeneity^{1}

In systematic reviews, heterogeneity, or lack of homogeneity, refers to variability or differences between studies in the estimates of effects (when the results or estimates from individual studies appear to have different magnitude if not different sign or direction). In the presence of marked heterogeneity, a single summary of these individual results should not be produced. Statistical tests of heterogeneity are used to assess whether the observed variability in study results (effect sizes) is greater than that expected to occur by chance. However, these tests have low statistical power and can be misleading. Heterogeneity is best assessed by using one's judgement. A distinction is sometimes made between "statistical heterogeneity" (differences in the reported effects), "methodological heterogeneity" (differences in study design) and "clinical heterogeneity" (differences between studies in key characteristics of the participants, interventions or outcome measures). See also homogeneity.

### hierarchy of evidence^{3}

Study designs are often grouped into a hierarchy according to their validity, or degree to which they are not susceptible to bias. The hierarchy indicates which studies should be given most weight in a synthesis. Usually well-designed randomised controlled trials are seen as being at the top of the hierarchy, whereas observational studies or expert opinion are seen as low down.

### historical control^{1}

Person or group for whom data were collected earlier than for the group being studied. Because of changes over time in risks, prognosis, healthcare, etc. there is a large risk of bias (in studies that use historical controls) due to systematic differences between the comparison groups.

### homogeneity^{1}

In systematic reviews homogeneity refers to the degree to which the results of studies included in a review are similar. "Clinical homogeneity" means that, in trials included in a review, the participants, interventions and outcome measures are similar or comparable. Studies are considered "statistically homogeneous" if their results vary no more than might be expected by the play of chance. See heterogeneity.

### homoscedasticity^{2}

Or equality of variances. This term is used in the context of t tests, ANOVA or regression analysis, to refer to the assumption of equal variances among groups being compared, or to the assumption that the variability of the outcome variable is about the same for all values of a predictor variable.

### Hosmer and Lemeshow statistic^{2}

A statistic which assesses the predictive ability of a logistic regression model. To perform this test, expected probabilities of a particular event are derived for each observation, using the model obtained. These probabilities are grouped into regular intervals (e.g. <10%, 10%+, 20%+, ..., 90%+) and an r x 2 contingency table is produced with the columns representing the outcome (yes/no type) and the rows representing risk of event categories as indicated above. The cells of this table contain the observed frequencies for each cross-tabulation. A Chi-squared test on r-2 degrees of freedom tests whether or not predicted values are close to observed values (r is the number of risk of event categories). Small P-values indicate poor predictive models.

### hypotheses

See hypothesis

### hypothesis

A formal statement of the research question of interest.

### incidence^{2}

As opposed to prevalence, is a measure of the number of new cases of a disease occurring during a specified period of time. It can be expressed as incidence rate or incidence risk. A measure of disease (or other outcome) frequency - the number of new cases of a disease (or some other event or outcome) occurring in a population in a defined period of time. Incidence is usually expressed as the number of new cases per unit of population (e.g. per 100,000 people) per unit time (usually one year).

### incidence rate^{2}

A measure of morbidity. It is the number of new cases of a disease during a specified period of time related to the person-time at risk during that period. Usually multiplied by 1000 and expressed per 1000 person-time at risk (or, if event is rare, per 10,000 or 100,000).

### incidence risk^{2}

Measure of morbidity. It is the number of new cases of a disease during a specified period of time related to the number of persons at risk of contracting the disease at the beginning of that period. Usually expressed as a percentage. For rare diseases the incidence rate and the incidence risk will be approximately the same (provided average length of follow-up is similar).
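
The difference between incidence risk and incidence rate can be illustrated with a short sketch. The function names and the cohort figures are hypothetical.

```python
def incidence_risk(new_cases, persons_at_risk_at_start):
    """New cases as a percentage of those at risk at the start of the period."""
    return 100 * new_cases / persons_at_risk_at_start


def incidence_rate(new_cases, person_time_at_risk, per=1000):
    """New cases per `per` units of person-time at risk."""
    return per * new_cases / person_time_at_risk


# Hypothetical cohort: 1000 people followed for up to 3 years,
# 30 new cases, 2900 person-years at risk in total.
print(incidence_risk(30, 1000))            # 3.0 (percent over the whole period)
print(round(incidence_rate(30, 2900), 2))  # 10.34 (per 1000 person-years)
```

The two measures have different denominators (people versus person-time), so they are only directly comparable once the length of follow-up is taken into account.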

### independence^{2}

Of observations. Two observations or measurements made on the same subject or unit (or on individually matched subjects) should not be counted as two independent observations, but the 'pairing' between them should be acknowledged and taken into account. Independence is an assumption required by many statistical tests. In simple cases non-independence is dealt with by using tests such as the paired t test or the McNemar's test but in more complex cases, when there are several dimensions to the pairing, summary statistics or formal methods for repeated measurements should be employed.

### in-depth interview^{3}

In qualitative research: face to face conversation with the purpose of synthesising information and dealing with conflicting evidence, with the aim of determining extent of agreement within a selected group. It takes an unstructured, qualitative approach. The questions asked will be mostly open-ended and overall the degree of control over both the order and content of the interview is less than in a structured interview.

### individual patient data^{1}

In systematic reviews, this term refers to the availability of raw data for each study participant in each included trial, as opposed to aggregate data (summary data for the comparison groups in each study). Reviews using individual patient data require collaboration of the investigators who conducted the original trials, who must provide the necessary data.

### induction^{3}

The process of inference from the Particular to the General. Inductive reasoning begins with empirical observations, which form the basis of theory building. Qualitative research often takes an inductive approach. It is the process by which the truth of a proposition is made more probable by the accumulation of confirming evidence, a common pattern in sociological and scientific research.

### information bias^{2}

A general type of bias which can occur in all types of study design due to systematic errors in measuring exposures or responses (outcomes), which leads to misclassification problems. Information bias can be caused by inadequate questionnaires (tiresome, difficult or biased questions), observer or interviewer errors (lack of blinding; surveillance bias due to differential follow-up of exposed and non-exposed in a follow-up study), respondent errors (recall bias due to different memory of past exposures between cases and controls in a case-control study; lack of blinding; fear or shame) and instrument errors (e.g. a diagnostic test with poor performance).

### intention to treat analysis^{2}

In the context of RCTs. Patients are expected to receive the treatments to which they were allocated, unless they drop out or are withdrawn from the trial. Some patients ('crossovers') receive treatments other than the ones to which they were randomised. To minimise the bias arising from these situations patients should be analysed in the groups to which they were randomised. Including these patients in the treatment group they ended up joining, or ignoring them altogether, may result in severe bias, and may even lead to an apparent reversal of treatment effects. An intention-to-treat analysis is one in which all the participants in a trial are analysed according to the intervention to which they were allocated, whether they received it or not. Intention-to-treat analyses are favoured in assessments of effectiveness as they mirror the non-compliance and treatment changes that are likely to occur when the intervention is used in practice, and because of the risk of attrition bias when participants are excluded from the analysis.

### interaction^{2}

An interaction between two or more factors or variables is said to exist if the effect of one variable is not constant across levels of the other. For example, smoking and obesity are risk factors for several diseases. A common scenario is for the effect of one of them, say smoking, to be greater among obese than non-obese people. Thus, adding to the independent effects of each of these two risk factors, there is a 'penalty' for being both a smoker and overweight, the end effect being greater than the sum of each effect. In this situation the two risk factors have a synergistic effect. In other situations risk factors can be antagonistic, their simultaneous presence resulting in an end effect which is smaller than the sum of the independent effects. (In regression analysis, a model with two main effects which interact with each other can be written as: y = a + B1X1 + B2X2 + B3X1X2 where B3 represents the regression coefficient for the interaction term. This will be positive if the interaction is synergistic or negative if it is antagonistic. See multiple regression for an explanation of the model.)

### intercept

In the context of regression models. The intercept is a constant value, specific to any given model, which represents the estimated value of the outcome variable when the predictor(s) is equal to zero.

### interim analyses^{2}

In the context of clinical trials. Interim analyses are carried out before the end of the study period, in order to assess whether the accumulating data are starting to demonstrate benefit of one treatment over the other with sufficient certainty. This can avoid having extra patients randomised to an inferior treatment. One problem with interim analyses is the increased risk of false positive findings due to multiple significance testing. Sequential designs - group or continuous - are used to deal with this problem. In the case of group sequential designs, and depending on the number of interim analyses planned, nominal significance levels (constant or varying) are specified so that the overall chance of a Type I error is kept at an acceptable level. Interim analyses raise many problems, and should always be carefully planned before commencement of a study. See Pocock (1983) for examples and discussion.

### internal validity

See validity.

### interquartile range (IQR)^{2}

A measure of the variability of a set of measurements. It represents the interval delimited by the 25th and the 75th percentiles (also called lower and upper quartiles), and comprises 50% of the observations in a dataset. Used to describe the data when the standard deviation is not appropriate. The IQR is a robust measure in that it is not influenced by extreme observations.
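
A short sketch of the calculation, using the standard library `statistics.quantiles` function with its default settings. Several conventions exist for computing percentiles, so other software may give slightly different quartiles for the same data; the function name and data are hypothetical.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return (lower quartile, upper quartile, IQR) for a set of measurements."""
    q1, _median, q3 = quantiles(values, n=4)
    return q1, q3, q3 - q1


data = list(range(1, 12))                 # 1, 2, ..., 11
with_outlier = list(range(1, 11)) + [1000]  # largest value replaced by an extreme one

print(interquartile_range(data))          # (3.0, 9.0, 6.0)
print(interquartile_range(with_outlier))  # (3.0, 9.0, 6.0)
```

The second call shows the robustness mentioned above: replacing the largest value with an extreme observation leaves the IQR unchanged.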

### interval variable^{2}

A quantitative variable which does not possess a true zero, and which allows negative values. For these variables, unlike for ratio variables, the ratio between two values has a different meaning depending on the scale on which measurements are made. A well known example of an interval variable is temperature measured in degrees Fahrenheit or degrees Celsius. A 10 percent increase in temperature from, say, 50F to 55F does not represent a 10 percent increase on the Celsius scale: it represents a 28 percent increase from 10C to 12.8C.
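
The temperature example can be checked with a few lines of arithmetic. The function names are chosen for this illustration only.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def percent_increase(old, new):
    """Percentage change from old to new, relative to old."""
    return 100 * (new - old) / old


low_c = fahrenheit_to_celsius(50)
high_c = fahrenheit_to_celsius(55)

print(percent_increase(50, 55))                   # 10.0 on the Fahrenheit scale
print(round(low_c, 1), round(high_c, 1))          # 10.0 12.8
print(round(percent_increase(low_c, high_c), 1))  # 27.8 on the Celsius scale
```

The same physical change gives different percentages on the two scales because neither scale has a true zero.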

### intervention study

See Clinical trial.

### intraclass correlation coefficient (ICC)^{2}

A measure of reliability or agreement for quantitative measurements. It is used when replicate measurements have no time sequence (e.g. two white blood cell counts made on the same blood sample). The ICC is calculated using a similar but modified procedure to that used to calculate the Pearson's correlation coefficient. Like the latter, the ICC has an ideal value of 1. The ICC is more appropriate than the Pearson's correlation coefficient for assessing agreement. When the measurement in question can take only two values or categories, the ICC is equivalent to the kappa statistic.

### jackknifing^{2}

A method of validating or assessing the fit of a model using the same sample which was used to derive the model, as opposed to using an independent sample. In assessing the fit of a model, residuals are analysed in a number of ways. Using the residuals from a given model to assess the goodness of fit of the same model leads to overoptimistic results. Thus, each residual is calculated from a model which includes all but its corresponding observation. Jackknifed residuals are sometimes called studentised residuals.

### Kaplan Meier method^{2}

A method of determining survival by calculating survival probabilities at the exact points in time where an event of interest has occurred. This information can be used to construct a survival curve, in which the probability of survival remains the same between events, only dropping to coincide with the occurrence of a new event, thus giving the appearance of 'steps'. Censored observations should ideally be marked on the curve at the times at which they occur. The number of subjects still at risk can be shown at regular time intervals. The method can be used to calculate an estimate of cumulative survival, which is used to compute the cumulative hazard rate. The survival curves of two separate groups can be formally compared using the log rank test. The Kaplan-Meier method differs from the life table method in which the 'time' variable is grouped.
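
The step-by-step calculation can be sketched as follows. The function name and the follow-up data are hypothetical, and the sketch assumes the usual convention that subjects censored at an event time are still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event occurred at that time, 0 if the subject was censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e)
        # Survival only drops at times where an event actually occurs.
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


# Six hypothetical subjects; two are censored (at times 3 and 8).
curve = kaplan_meier(times=[2, 3, 3, 5, 8, 9], events=[1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
# 2 0.833
# 3 0.667
# 5 0.444
# 9 0.0
```

Plotting these values as a step function, holding each probability constant until the next event time, gives the characteristic Kaplan-Meier curve.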

### kappa statistic (*K*)^{2}

A measure of agreement for categorical variables. It can be used to assess the extent of agreement between two (or more) raters, or to assess the agreement between two alternative classification or diagnostic methods. As with reliability (R), K measures chance-corrected proportional agreement, i.e., the proportion of agreement over and above that which might be expected by chance alone: K = (observed agreement - chance agreement)/(1 - chance agreement) = 1 - (observed disagreement)/(chance-expected disagreement). Chance agreement is calculated using the method to calculate expected frequencies for contingency tables. The expected frequencies in the cells denoting agreement can then be added up and divided by the total number of observations to give the proportion of agreement which is attributed to chance. K has a maximum of one when there is perfect agreement; zero represents agreement no better than by chance alone, and negative values, agreement worse than expected by chance. With ordinal variables the weighted kappa statistic can be calculated (Altman, 1991). A common and wrong practice is to test for an association when measuring agreement. These two concepts are not the same, so methods such as the Chi-squared test or rank correlation are not appropriate. Kappa is dependent on the proportion of subjects in each category and also the bias between raters (or methods), i.e., the fact that different raters may have a different assessment of the frequency of occurrence of the condition or feature in question. Thus, it is difficult to give values of K which reflect poor, moderate or good agreement.
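
A minimal sketch of the unweighted calculation for two raters. The function name and the agreement table are hypothetical.

```python
def cohen_kappa(table):
    """Kappa for a square agreement table.

    table[i][j] is the number of subjects placed in category i by the
    first rater and in category j by the second rater.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Expected frequency of each agreement cell is (row total x column total) / n;
    # summing these and dividing by n gives the chance agreement.
    chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - chance) / (1 - chance)


# Hypothetical ratings of 50 subjects by two raters (present / absent).
ratings = [[20, 5],
           [10, 15]]
print(round(cohen_kappa(ratings), 3))  # 0.4 (observed agreement 0.7, chance agreement 0.5)
```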

### Kendall's tau^{2}

A non-parametric measure of association between quantitative or ordinal variables. Based on ranks (like Spearman's rho), it is particularly appropriate for small sample sizes.

### key informants^{3}

In qualitative research: these are the individuals with whom the researcher begins in data collection because they are well informed, are accessible and can provide leads about other information.

### Kruskal Wallis test^{2}

A nonparametric statistical test used to compare two or more groups when the assumptions for analysis of variance (ANOVA) cannot be met. It is an extension to the Mann-Whitney U test which applies when there are only two groups.

### least squares^{2}

Frequently used in regression analysis to find the line of best fit, i.e., the line (or model) which best describes the relationship between a quantitative outcome and one or more predictor variables. The method seeks to minimise the sum of squared residuals (i.e., vertical distances from each observation to the regression line). These are used to assess the goodness of fit of regression models.

### life expectancy^{2}

Or average length of survival from beginning of follow-up. Calculated using a life table. Life expectancy = 0.5x Σ(number of time units in interval x cumulative chance of survival).

### life table^{2}

A table in which the survival (or failure) experience of a group of people or cohort over a follow-up period is recorded. The cumulative chance of survival at the various time intervals can then be used to construct a survival curve. It can also be used to calculate life expectancy.

### likelihood ratio (LR)^{2}

In the context of diagnostic tests. The layout of the results of a diagnostic test is in the form of a contingency table with test result (+ or -) as rows and disease (+ or -) as columns. Several quantities can be estimated from such a table: sensitivity, specificity and the test predictive values. The likelihood ratio expresses how likely it is to find a given test result in a patient with the disease in question, in comparison with the likelihood of finding the same result in a patient without the condition. A positive LR expresses the probability of finding a positive test result in a patient with the disease in question, in comparison with the probability of finding a positive result in a patient without the condition. A negative LR is the likelihood of finding a negative test in patients with the condition relative to the likelihood of the same result in patients without the condition. Likelihood ratios do not have the drawbacks of the other quantities mentioned above: they are not affected by changes in the prevalence of disease and they can be used when the test results are grouped into more than two categories. Another desirable property is the fact that they can be converted into the post-test probability of disease by knowledge of the pre-test probability of disease.
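
The following sketch shows how both likelihood ratios follow from a 2x2 table, and how a likelihood ratio converts a pre-test probability into a post-test probability via the odds. The function names and counts are hypothetical.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Positive and negative likelihood ratios from a 2x2 diagnostic table.

    tp, fn -- diseased patients with a positive / negative test
    fp, tn -- non-diseased patients with a positive / negative test
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, lr):
    """Convert pre-test probability to post-test probability via the odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical study: sensitivity 90%, specificity 80%.
lr_pos, lr_neg = likelihood_ratios(tp=90, fp=20, fn=10, tn=80)
print(round(lr_pos, 2), round(lr_neg, 3))            # 4.5 0.125
print(round(post_test_probability(0.2, lr_pos), 3))  # 0.529 after a positive test
```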

### likelihood ratio test^{2}

A significance test used in the context of logistic and Poisson regression. Often used to assess the statistical significance of one or more predictor variables in a model. The deviance is the statistic used.

### Likert scale

A rating scale consisting of usually 5 points where respondents indicate their degree of agreement with an attitude statement presented to them.

### literature review (or search)

Comprehensive search of published (and, if possible, unpublished) evidence about the subject of interest to you. You may find that evidence about your research question already exists, and you can then critically appraise the evidence to decide whether further research is justified - this avoids re-inventing the wheel! Knowing the literature about your subject area also helps you to put your own research in a broader context.

### log odds ratio

The (natural) log of the odds ratio. It is used in statistical calculations and in graphical displays of odds ratios in systematic reviews.

### log rank test^{2}

A significance test to compare the survival experience of two or more distinct groups, as expressed by their survival curves. It is a special application of the Mantel Haenszel Chi-squared test where an overall comparison of the groups is obtained by summarising the significance of the differences in survival in each of the time intervals which form the follow-up period, thus producing a single test statistic. The number of degrees of freedom for the test is the number of groups minus 1. The test can be adjusted for confounders but, in this situation, regression methods for survival data (Cox regression) may be preferable. A log rank test for trend can also be performed.

### logistic model^{1}

A statistical model of an individual's risk (probability of disease or some other outcome) as a function of a risk factor or intervention. This model has attractive statistical features and is widely used as a regression model for dichotomous outcomes. In meta-analysis (or meta-regression) the logistic model can be used to explore the relationship between study characteristics and study results.

### logistic regression^{2}

A regression method for modelling proportions, i.e. categorical outcomes. The technique is especially useful when dealing with confounding or when assessing interactions, with the advantage that continuous predictor variables can also be included in the model. The outcome variable in logistic regression is a binary variable (yes/no; alive/dead). The predicted outcome, however, is not a binary variable or a proportion, but the logit transformation of the latter (i.e., the natural logarithm of the odds). This prevents models from predicting impossible values for a proportion, i.e., outside the range 0 to 1. Results from these analyses are frequently presented as odds ratios. If data are individually matched, conditional logistic regression should be used, as an extension to the McNemar's test for paired (non-independent) proportions. Polytomous logistic regression and ordered logistic regression are used for nominal and ordinal outcomes. In systematic reviews logistic regression can be used to explore the relationship between key characteristics of the included studies and the results (observed effects) for each study.
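
The logit transformation and its inverse can be sketched as below. The model coefficients are invented for illustration and do not come from any real analysis.

```python
from math import exp, log


def logit(p):
    """Natural logarithm of the odds for a proportion p (0 < p < 1)."""
    return log(p / (1 - p))


def inverse_logit(x):
    """Map any value on the logit scale back to a proportion between 0 and 1."""
    return 1 / (1 + exp(-x))


# Hypothetical fitted model: logit(risk) = -2.0 + 0.7 * smoker
intercept, coefficient = -2.0, 0.7

for smoker in (0, 1):
    linear_predictor = intercept + coefficient * smoker
    print(smoker, round(inverse_logit(linear_predictor), 3))
# 0 0.119
# 1 0.214

# The exponentiated coefficient is the odds ratio for smokers versus non-smokers.
print(round(exp(coefficient), 2))  # 2.01
```

However large or small the linear predictor becomes, the back-transformed value stays between 0 and 1, which is why the model cannot predict an impossible proportion.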

### lognormal^{2}

A positively skewed distribution whose log values display a Normal distribution.

### Mann Whitney U test^{2}

A significance test for comparing the distribution of a given variable between two groups. The test is a non-parametric alternative to the independent samples t test. It is used when the data are ordinal or when the requirement for Normality is not met. For paired data the Wilcoxon matched pairs signed rank test should be used. These tests are based on ranks (ordering of the data) and not on the actual values. The Wilcoxon rank sum T test for independent samples is an equivalent to the Mann Whitney U test.

### MANOVA^{2}

A multivariate equivalent to analysis of variance (ANOVA). Commonly used in psychological research. It is used to test for group differences in profiles of measurements as opposed to the use of ANOVA to test for group differences in single measurements. MANOVA provides the significance test for linear discriminant analysis.

### Mantel Haenszel Chi^{2} test^{2}

A significance test for comparing proportions or odds in the presence of confounding factors. For example, we may be interested in the risk (proportion) of cervical cancer in women taking the contraceptive pill for 10 years or more, compared to those who have taken it for less than 10 years. Age is related to the risk of cervical cancer. Also, older women are more likely to have been on the pill for longer. It is therefore necessary to separate the effects of the contraceptive pill on cervical cancer, from the effects of age. After stratification by categories of the confounding variable - age, results in each stratum are pooled together to produce a single summary test across all strata. The number of degrees of freedom for this test is always one. It is therefore a method which combines the relative risk estimates of several two-by-two tables to produce a single summary, which is a weighted average across the individual tables. The analysis of case-control studies and meta-analysis are common applications of the method. Weights are usually directly proportional to the precision (2) of the individual estimates (or inversely proportional to their variance). Thus, larger studies are usually given more weight than smaller ones.
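
As an illustration of pooling across strata, the sketch below computes the Mantel-Haenszel summary odds ratio (the summary estimate, not the significance test itself). The function name and the stratified counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across strata.

    strata -- list of 2x2 tables given as (a, b, c, d), where
              a = exposed with the outcome,   b = exposed without it,
              c = unexposed with the outcome, d = unexposed without it.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data stratified by age group (younger, older).
younger = (10, 20, 5, 40)   # stratum odds ratio = (10*40)/(20*5) = 4.0
older = (30, 10, 20, 20)    # stratum odds ratio = (30*20)/(10*20) = 3.0

print(round(mantel_haenszel_odds_ratio([younger, older]), 3))  # 3.348
```

The pooled value lies between the two stratum-specific odds ratios, as expected of a weighted average.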

### masking

see blinding.

### matched funding

Agreement between two or more organisations to contribute equal shares to fund a project. This is the basis on which R&D Support Units in the S&W Region are supported.

### matching^{2}

The selection of controls in case-control studies to ensure a similar distribution of important prognostic factors (frequently age and sex) in the two study groups (cases and controls). Matching can be (a) individual or pairwise, or (b) by stratum group or frequency matching. In the latter, an equal distribution of prognostic factors is achieved by ensuring that cases and controls have, overall, similar numbers of subjects with the same characteristics, for relevant prognostic factors (e.g. similar proportion of males, or of people over 65). As with paired proportions, McNemar's test and conditional logistic regression are appropriate for the analysis of individually matched designs. The Chi-squared test, Mantel-Haenszel estimates, the Mantel-Haenszel Chi-squared test and logistic regression are indicated in stratum matching. When matching, it is important to be aware of the risk of overmatching.

### maximum likelihood^{2}

An alternative method of fitting regression models. It is especially indicated for Cox, logistic and Poisson regression, where the least squares method is not appropriate. [The term likelihood measures the probability of a body of data given that certain values are chosen as the model's parameters (Clayton and Hills, 1993). The values which maximise this probability are said to produce the maximum likelihood model for the data.]

### McNemar's test^{2}

Statistical test which is a special form of the Chi-squared test used in the analysis of paired proportions.
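
A minimal sketch of the usual form of the statistic, assuming no continuity correction: only the discordant pairs enter the calculation. The names `b` and `c` (pairs where the two members disagree in one direction or the other) and the counts are hypothetical.

```python
def mcnemar_chi2(b, c):
    """McNemar statistic from the two discordant-pair counts (1 degree of freedom)."""
    return (b - c) ** 2 / (b + c)

# Hypothetical example: 15 pairs discordant one way, 5 the other way.
chi2 = mcnemar_chi2(15, 5)
print(chi2)
```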

### mean

The average value, calculated by adding all the observations and dividing by the number of observations. (synonyms: arithmetic mean, average)

### median^{2}

A measure of the centre of a distribution. As opposed to the mean, it is said to be a robust measure, given that it is not greatly affected by the presence of outliers. When the data are sorted according to increasing values of the variable of interest, the median is the middle value, i.e., the value which divides the data in half: 50% of observations have values lower than the median and 50% have values greater than the median. If the total number of observations is an even number, then the median is the average of the two central values. The median is also referred to as the 50th percentile. It should be used when it is inappropriate to use the mean (e.g. skewed distributions). When the median is used the spread of the observations can be expressed by relevant percentiles, and commonly by the interquartile range (IQR).
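
The robustness of the median can be seen in a small sketch using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

clean = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]   # same data with one extreme value

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

# With an even number of observations the two central values are averaged.
median_even = statistics.median([1, 3, 5, 7])

print(mean_clean, mean_outlier, median_clean, median_outlier, median_even)
```

The outlier shifts the mean substantially while the median is unchanged.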

### meta analysis^{2}

A statistical analysis which combines the results of individual studies used in a systematic review producing a quantitative summary across the different studies. It uses methods such as Mantel-Haenszel estimates and Peto's method to calculate these summaries. Meta-analysis has the virtue of increasing the sample size available to estimate, say, the benefits of a given treatment. The technique is commonly used for randomised controlled trials of therapies or interventions. However, it can also be used for studies on risk factors or diagnostic tests, for example. Issues around meta analysis are publication bias, heterogeneity (which also involves decisions on the use of fixed or random effects models), use of individual data from all studies involved (if obtainable) or aggregated data, i.e. data summaries such as odds ratios obtained in individual studies (commonly used).

### meta regression^{1}

Multivariate meta-analytic techniques, such as logistic regression, used to explore the relationship between study characteristics (e.g. allocation concealment, baseline risk, timing of the intervention) and study results (the magnitude of effect observed in each study) in a systematic review.

### methodological quality^{1}

The extent to which the design and conduct of a trial are likely to have prevented systematic errors (bias). Variation in quality can explain variation in the results of trials included in a systematic review. More rigorously designed (better 'quality') trials are more likely to yield results that are closer to the 'truth'. See also external validity, validity. (synonyms: validity, internal validity)

### minimisation^{1}

A method of allocation used, particularly in small trials, to provide comparison groups that are closely similar for several variables. It can be done with or without a component of randomisation. It is best performed centrally with the aid of a computer program to ensure allocation concealment. It is a quasi-random method of allocating patients to the different treatments in a clinical trial. The rationale behind minimisation is the need to produce treatment groups that have similar distributions for important prognostic factors. The method is especially useful when dealing with small trials, where simple randomisation often produces unbalanced groups.

### model^{2}

In the context of regression analysis, a model is an equation which summarises the relationship between an outcome variable and one or more predictor variables. When there is a single predictor variable the general form of the equation is: y = a + bx, i.e. constant value + (rate of change in y per unit of x) × (value of x), where: y is the predicted value of the outcome variable, x is the value of the predictor variable, a is the intercept and b is the regression coefficient. The difference between observed and predicted values is the residual. When more than one predictor needs to be included in a regression model, a multiple regression model is obtained.
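
The equation y = a + bx can be illustrated with a short least squares sketch. The function name `fit_line` and the data are hypothetical; the fitting method shown is ordinary least squares for a single predictor.

```python
def fit_line(xs, ys):
    """Return (a, b): intercept and regression coefficient of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                # rate of change in y per unit of x
    a = mean_y - b * mean_x      # constant value (intercept)
    return a, b

xs = [1, 2, 3, 4]
ys = [2.9, 5.2, 6.9, 9.0]        # made-up observations
a, b = fit_line(xs, ys)

predicted = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]   # observed minus predicted
print(a, b, residuals)
```

For a least squares fit with an intercept, the residuals sum to zero (up to rounding).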

### multiple regression^{2}

As opposed to simple regression, it is the process of fitting a regression model with more than one predictor variable. In particular, multiple regression is used in cases where it is necessary to adjust for confounders or check for the presence of interactions. The general form of the equation is: y = a + b1x1 + b2x2 + b3x3 + ... where: y is the predicted value for the outcome variable, x1, x2, x3 ... are the values of the predictor variables, b1, b2, b3 ... are the regression coefficients and a is the intercept or constant. Each predictor is now associated with a regression coefficient defining its relationship with the outcome variable.

### multiple significance testing^{2}

The process of conducting multiple significance tests on the same body of data. An example of this is subgroup analysis where an overall test may be performed and then repeated for sub-groups of subjects sharing similar characteristics. For example, in a clinical trial comparing an active drug versus a placebo for the treatment of hypertension, one may be interested in making this comparison within different age groups, since the drug in question could be effective if used say, in younger patients, but not in older patients. If three age groups are defined, three statistical tests will be performed, the likelihood of a type I error increasing with the number of tests carried out. Two approaches to dealing with this problem are the use of corrections (e.g. Bonferroni) or the adoption of a more stringent cut-off point for acceptance of statistical significance (e.g. 0.01 instead of the conventional 0.05). Ideally, such analyses should be planned a priori to avoid spurious findings.
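
The inflation of the type I error rate, and the Bonferroni correction, can be sketched as follows. The calculation of the overall error rate assumes that the tests are independent, which is a simplification; the function names are chosen for this example.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Three age-group comparisons, each tested at the conventional 0.05 level.
overall = familywise_error(0.05, 3)
per_test = bonferroni_threshold(0.05, 3)
print(round(overall, 4), round(per_test, 4))
```

With three tests the overall chance of a spurious finding rises from 5% to roughly 14%, and the corrected per-test threshold is about 0.017.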

### multiplicative model^{1}

A model in which the joint effect of two or more factors is the product of their effects. For example, if one factor multiplies risk by a and a second factor by b, the combined effect of the two factors is a × b. See also additive model.

### multistage sampling^{2}

Method where the selection of study units is done in more than one stage, going from the larger to the smaller units in a population. For example, in two stage sampling, a random sample of General Practices may be taken in the first stage (first-stage units) and patients (second-stage units) subsequently chosen (also at random) from the selected practices. The method should be performed with 'probability proportional to size' with replacement, if the first-stage units have different sizes, in order to give second-stage units the same probability of being selected. Multistage sampling differs from cluster sampling, in which all units in the first-stage units selected are studied.

### multivariate methods^{2}

A term frequently used to refer to statistical methods used in any analysis involving more than one predictor variable. In the strict sense, it refers to methods for analysing two or more outcome variables simultaneously. Methods commonly used are multiple regression, logistic regression, cluster analysis, discriminant analysis, factor analysis, MANOVA and principal components analysis. The analysis of repeated measurements can be seen as a special application of multivariate methods.

### naturalistic research^{3}

This is the systematic study of phenomena in their natural context without alteration of the phenomena or the context for research purposes.

### negative predictive value^{2}

(NPV) (in diagnostic tests) The probability of not having the disease given that the test is negative. See predictive values.

### negative study^{1}

A term used to refer to a study that does not have "statistically significant" (positive) results indicating a beneficial effect of the intervention being studied. The term can generate confusion because it refers to both statistical significance and the direction of effect, studies often have multiple outcomes, the criteria for classifying studies as "negative" are not always clear and, in the case of studies of risk or undesirable effects, "negative" studies are ones that do not show a harmful effect.

### NNT^{1}

Number-needed-to-treat: a measure of the impact of a treatment or intervention. It states how many patients need to be treated with the treatment in question in order to prevent an event that would otherwise occur. The risk of the event in question in patients given the new intervention, and in patients given standard treatment (or no treatment), needs to be known. The NNT is calculated as: NNT = 1/(risk2 - risk1) = 1/ARR where: risk2 is the risk in the control group and risk1 is the risk in the intervention group (both expressed as proportions). Because it is based on the absolute difference in risks it is possible for a treatment of only moderate or little efficacy (in relative terms) to have a small NNT, and therefore considerable impact when used to treat common diseases. This term is becoming preferred to odds ratios because of its more straightforward clinical interpretation. It is the inverse of the risk difference (absolute risk reduction).

### nominal variable^{1}

Categorical variable whose categories are not ordered (e.g. eye colour, nationality, blood group).

### non parametric methods^{2}

These are statistical methods for the analysis of data which do not conform with the requirements for parametric methods. Common non-parametric tests are the Mann-Whitney U test, the Wilcoxon matched pairs signed rank test, the Kruskal-Wallis test and rank correlation. Interquartile ranges are an example of a descriptive measure which is not based on any assumptions about the distribution of the data. These methods are based on ranks rather than the actual observations. Data transformations may be an alternative to using non-parametric methods.

### nonparametric

see non parametric methods.

### normal distribution^{2}

Or Gaussian distribution. Theoretical distribution which has the form of a bell-shaped curve and is perfectly symmetrical about its centre. A histogram of a variable with an approximately Normal distribution has a shape close to the bell-shaped theoretical Normal distribution. The Normal distribution is totally defined by two parameters: the mean (reflecting its centre) and the standard deviation (reflecting the spread of individual observations). Due to the mathematical properties of this curve, the probabilities of having a measurement above or below any given value can be obtained. Tables of the Normal distribution give these probabilities for the Unit Normal Curve (i.e. a Normal distribution with mean 0 and standard deviation 1). For Normal distributions with other values for the mean and standard deviation, the above probabilities can be obtained by converting the original observations into z scores. Biological and other measurements frequently follow an approximate (or empirical) Normal distribution. Since many statistical methods (parametric methods) are based on the properties and/or assumption of a Normal distribution, this is a desirable property for such measurements.
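
The conversion to z scores and the lookup of a probability can be sketched with the standard library alone. The function names and the example values (mean 100, standard deviation 15) are assumptions for illustration; the cumulative probability is computed from the error function rather than read from tables.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def normal_cdf(z):
    """Probability of a value below z on the Unit Normal Curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(130, mean=100, sd=15)
prob_below = normal_cdf(z)
prob_above = 1 - prob_below
print(z, round(prob_below, 4), round(prob_above, 4))
```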

### normal plot^{2}

Graphical display of an interval/ratio variable used to visually assess the assumption of Normality for a given variable. A Normal plot for a variable which is Normally distributed differs from that for a variable displaying a positive skew. The vertical (y) axis represents the values of the variable in the original scale, and the horizontal (x) axis gives the inverse normal for the same variable (i.e., assuming it follows a perfect Normal distribution). If the variable in question has a Normal (or approximately Normal) distribution, the plot results in a fairly straight line. A concave curve is produced for variables with a positive skew, and a convex curve for variables presenting a negative skew. A more formal way of assessing Normality is by the Wilk or Shapiro-Wilk test.

### null hypothesis^{2}

In the context of statistical tests, this hypothesis (H0 or NH) states that there is no relationship between variables (e.g. caloric intake and body mass index), or no difference between groups (e.g. aspirin and placebo equally effective in preventing death in patients with acute myocardial infarction), in the relevant populations. Significance tests are carried out on the assumption that this hypothesis is true. It is then necessary to assess how far the data obtained deviate from what would be expected if the H0 were true. This is expressed as a probability or P-value. The smaller the P-value, the lower the likelihood of obtaining the result observed in the sample (or a more extreme one) if the H0 is in fact true. In simplest terms, the null hypothesis states that the results observed in a study are no different from what might have occurred as a result of the play of chance.

### number needed to treat (NNT)^{2}

A measure of the impact of a treatment or intervention. It states how many patients need to be treated with the treatment in question in order to prevent an event that would otherwise occur. The risk of the event in question in patients given the new intervention, and in patients given standard treatment (or no treatment), needs to be known. The NNT is calculated as: NNT = 1/(risk2 - risk1) = 1/ARR where: risk2 is the risk in the control group and risk1 is the risk in the intervention group (both expressed as proportions). Because it is based on the absolute difference in risks it is possible for a treatment of only moderate or little efficacy (in relative terms) to have a small NNT, and therefore considerable impact when used to treat common diseases. This term is becoming preferred to odds ratios because of its more straightforward clinical interpretation. It is the inverse of the risk difference (absolute risk reduction).
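
A worked sketch of the formula, using made-up risks of 20% in the control group and 15% in the intervention group. The function name is hypothetical.

```python
def number_needed_to_treat(risk_control, risk_intervention):
    """NNT = 1 / ARR, with both risks expressed as proportions."""
    arr = risk_control - risk_intervention   # absolute risk reduction
    return 1 / arr

# risk2 (control) = 0.20, risk1 (intervention) = 0.15, so ARR = 0.05.
nnt = number_needed_to_treat(0.20, 0.15)
print(round(nnt))
```

Here about 20 patients would need to be treated to prevent one event. The same relative reduction applied to a rarer event (say 0.020 versus 0.015) gives an NNT of about 200, which illustrates the dependence on baseline risk.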

### observational study^{1, 3}

An observational study is one in which nature is allowed to take its course; where subjects are not submitted to actual interventions. Instead, subjects can be kept under observation for a given period of time, during which measurements are taken and events registered, or they can be interviewed and/or examined at a particular point in time. Examples of this type of study are cross-sectional, cohort and case-control studies. Observational methods provide the strongest evidence of clinical effectiveness in situations where the researcher is unable to allocate subjects to intervention (exposure) and control groups - they include case-control and cohort studies. They are central to epidemiological studies because such studies usually focus on exposures which arise through individual choice (e.g. smoking, alcohol) or occupational circumstances (e.g. asbestos, organo-phosphate sheep dips). It is clearly not possible or ethical to allocate subjects to be exposed to suspected risk factors! There is a greater risk of selection bias than in experimental studies (randomised controlled trials).

### odds

The probability of an event happening divided by the probability of it not happening, or the ratio of the number of times an event occurs to the number of times it does not occur, out of a given number of chances. Odds are used to convey the idea of 'risk' although the two are only approximately the same when considering rare events (e.g. winning the lottery). When the probability of an event is small (less than 0.1 or 10%), odds are essentially the same as a risk. (Odds are frequently used instead of risks because they have attractive statistical properties.) For a common event, such as a newborn baby being a boy or a girl, the risk is roughly 0.5 or 50%, but the odds are 1 (50:50).
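
The relationship between risk and odds can be checked numerically. The function name and the probabilities are chosen for illustration.

```python
def odds(probability):
    """Odds = probability of the event / probability of no event."""
    return probability / (1 - probability)

common = odds(0.5)     # e.g. a newborn being a boy: risk 0.5
rare = odds(0.05)      # a rarer event: risk 0.05
print(common, round(rare, 4))
```

For the rare event the odds (about 0.053) are close to the risk (0.05), whereas for the common event the odds (1) are twice the risk (0.5).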

### odds ratio (OR)

Literally, the odds of something happening in one group (usually the exposed or intervention group) divided by the odds of it happening in another group (usually a control group). It is used as a measure of how much more likely an outcome is in one group compared to another; hence, as a measure of relative risk (in an observational study) or clinical effectiveness (in a trial). If it is equal to 1, then the likelihood of the outcome in the exposure/intervention group is no different to the likelihood in the control group. If the OR is greater (or less) than 1, then the outcome is more (or less) likely in the exposed/intervention group than in the control group. If an intervention gives an OR of 1.8, this means that the outcome of interest is about 80% more likely in the intervention than in the control group; if the OR is 0.6, the outcome is about 40% less likely in the intervention than in the control group. When interpreting an odds ratio (or any measure of relative risk or effectiveness), note that the outcome being measured may be either undesirable (e.g. death, disability) or desirable (e.g. stopping smoking). For undesirable outcomes an OR that is less than one indicates that the intervention was effective in reducing the risk of that outcome. In the context of a trial, number needed to treat may be a more clinically relevant way of thinking about clinical effectiveness. When the event rate is small, odds ratios are very similar to relative risks. Odds ratios are often used in epidemiological studies (in particular case-control studies) or clinical trials as a measure of relative risk to compare the odds in exposed vs non-exposed or in intervention vs control groups. If the odds are the same in the two groups their ratio will be 1. OR = (odds in exposed)/(odds in non-exposed) = ad/bc, where a, b, c and d are the cell frequencies of the corresponding two-by-two table. For rare diseases the OR and the risk ratio will be very similar.
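
The formula OR = ad/bc, and its closeness to the risk ratio for rare events, can be sketched as below. The cell labelling is an assumption consistent with the formula (a = exposed with the event, b = exposed without, c = non-exposed with, d = non-exposed without), and the counts are invented.

```python
def odds_ratio(a, b, c, d):
    """(odds in exposed) / (odds in non-exposed) = (a/b) / (c/d) = ad/bc."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """(risk in exposed) / (risk in non-exposed)."""
    return (a / (a + b)) / (c / (c + d))

# Common event: risks of 20% vs 10%.
or_common = odds_ratio(20, 80, 10, 90)
rr_common = risk_ratio(20, 80, 10, 90)

# Rare event: risks of 0.2% vs 0.1%.
or_rare = odds_ratio(2, 998, 1, 999)
rr_rare = risk_ratio(2, 998, 1, 999)
print(or_common, rr_common, round(or_rare, 3), round(rr_rare, 3))
```

With the common event the OR (2.25) overstates the risk ratio (2.0); with the rare event the two are almost identical.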

### one sided test^{2}

A significance test which only explores one of the alternatives to the Null Hypothesis. In making a comparison between, say, the risk of disease experienced by the intervention and placebo groups in a clinical trial, the Null Hypothesis states that the risk of disease is the same in the two groups. In a one-sided test, as opposed to a two-sided test, the alternative hypothesis tested is that the risk in the intervention group is less than the risk in the control group. The possibility that the risk in the intervention group could be greater than in the control group is disregarded.

### one-way ANOVA^{2}

Analysis of variance of data classified according to a single factor or characteristic. For example, one may wish to compare the mean birth weight among different ethnic groups (classifying factor). As with the independent samples t test, the assumptions of Normality of distributions and similar variances in the groups being compared are required. The Null Hypothesis of no difference between the groups is tested by the F test.

### open clinical trial^{1}

1. A clinical trial in which the investigator is aware which intervention is being given to which participant (random allocation may or may not be used). 2. A clinical trial in which the investigator decides which intervention is to be given (non-random allocation). Also called open label design. 3. A clinical trial with an open sequential design.

### open label design^{1}

A trial in which the investigator decides who receives which intervention rather than using random allocation. See also open clinical trial.

### open-ended question

In a survey: a question where no preset categories of response are given. The respondent’s reply is recorded in full. It allows the respondent the freedom to give their own answer to a question, rather than forcing them to select one from a limited choice. Open-ended questions are commonly used in in-depth interviews, but they can also be used in quantitative structured interviews.

### observation^{3}

In qualitative research: the systematic watching of behaviour and talk in a naturally occurring setting. When used as a qualitative strategy the researcher watches behaviour, interactions and other factors in an environment, and records them by describing them in words. *Participant observation* - observation in which the researcher also occupies a role or part in the setting in addition to observing.

### ordered logistic regression

Logistic regression which is used when the outcome variable is an ordinal variable.

### ordinal variable^{2}

A categorical variable where the different levels of the variable are ordered. Going through successive levels represents successive 'increases' or 'decreases' in the characteristic conveyed by the variable. Examples of an ordinal variable are 'severity of pain': no pain, minimal pain, moderate pain, severe pain; and 'smoking status': non-smokers, ex-smokers, light smokers and heavy smokers. Unlike interval or ratio variables, the differences between any two consecutive levels of an ordinal variable do not necessarily represent increments of the same magnitude. For example, the difference between no pain and minimal pain is unlikely to be the same as between minimal pain and moderate pain. Nonetheless, it is important to analyse this type of data with special methods (e.g. Chi-squared test for trend, Kruskal-Wallis test) and not with methods for nominal variables. Ordinal data are often reduced to two categories to simplify analysis and presentation, which may result in a considerable loss of information.

### outcome variable^{2}

Also dependent or response variable. It represents the characteristic or measurement that is used to test the main hypothesis in an investigation. In the context of regression it is usually called the y variable and represents the observed values of y. The predicted values of y are those obtained by any given regression model. (Health Outcome)

### outliers^{2}

In a set of observations, outliers are those values which are far away from the 'average'. The identification of outliers is important and can be done by simple graphical methods. An important consequence of the presence of outliers is that data will not be Normally distributed. Thus, when comparing groups using a t test or when constructing a reference range it may be necessary to transform the data or use non-parametric methods. In the context of regression and correlation outliers tend to dominate and affect measures of association between two variables, as expressed by the regression coefficient (and intercept), and the correlation coefficient. However, outlying observations should never be discarded without careful consideration. On a graphical display of a regression analysis, the regression line may be estimated without taking outlier observations into account. The effect of these outliers on the regression line would be as follows: 1. outlier in Y. Intercept: slightly increased, slope: slightly decreased, its own residual: moderate size. 2. outlier in X. Intercept: moderately increased, slope: moderately decreased, its own residual: small size.

### overmatching^{2}

In the context of matching, it occurs when cases and controls are matched for variables which are not confounding factors. For example, if cases and controls are matched for parental alcohol consumption in a case-control study investigating the relationship between parental smoking and asthma in children, they will be 'made' to have similar smoking exposures, since smoking is frequently associated with alcohol consumption. However, asthma in children is not related to parental drinking habits, and as a result, it may be wrongly concluded that parental smoking and asthma in children are not related.

### overview

see systematic review.

### P value^{2}

In the context of significance tests, the P-value represents the probability that a given difference (or even a more extreme one) is observed in a study sample (between means, proportions etc.), when in reality such a difference does not exist in the relevant population. The probability (ranging from zero to one) that the observed results in a study, or results more extreme, could have occurred by chance. Small P-values indicate stronger evidence to reject the Null Hypothesis (NH). For example, a P-value = 0.004 can be interpreted as a 4 in 1000 chance of observing a difference of a given magnitude (or a more extreme one), say between two means, when, in the population, the two groups being compared have the same mean. Conventionally, a difference is said to be significant if its P-value is <0.05. However, it is preferable to report exact P-values rather than the usual 'NS' (non-significant) or 'P<0.05': it is clear that the difference between P=0.049 and P=0.051 is too small to deserve such dichotomy. When looking at correlation or regression the NH being tested is that the correlation or regression coefficients are equal to 0 (no relationship between the variables in question). The P-value may also be thought of as the probability that a type I error has occurred. In a meta-analysis the P-value for the overall effect assesses the overall statistical significance of the difference between the treatment and control groups, whilst the P-value for the heterogeneity statistic assesses the statistical significance of differences between the effects observed in each study.

### paired data

See independence.

### paired design^{1}

A study in which participants or groups of participants are matched (e.g. based on prognostic factors) and one member of each pair is allocated to the experimental (intervention) group and the other to the control group.

### paired t test^{2}

Special form of the t test which is used to compare the means of two paired variables (i.e., not independent). A common example of paired data are measurements taken in the same group of subjects before and after some treatment or intervention. The number of degrees of freedom (d.f.) for the paired t test is n-1, where n is the number of pairs. An assumption of the paired t test is that the 'differences' (for example, 'after'-'before' difference) are Normally distributed. The Wilcoxon test for matched pairs can be used as an alternative or with paired ordinal data.
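
A minimal sketch of the calculation: the test statistic is the mean of the within-pair differences divided by its standard error. The function name and the before/after measurements are hypothetical, and the P-value lookup from the t distribution is omitted.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    diffs = [a - b for a, b in zip(after, before)]   # 'after' - 'before'
    n = len(diffs)                                   # number of pairs
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)      # standard error of the mean difference
    return mean_diff / se, n - 1

before = [140, 150, 160, 155]   # made-up measurements
after = [135, 148, 150, 151]
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```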

### parallel design^{2}

In the context of clinical trials, and as opposed to crossover designs, in a parallel design two (or more) separate groups of subjects each receive just one of the treatments being compared, one of them acting as the control group. Some parallel trials have more than two comparison groups and some compare different interventions without including a non-intervention control group. (synonym: independent group design)

### parametric methods^{2}

Statistical methods of data analysis which rely on one or more distributional assumptions for the data being analysed, commonly Normality and homoscedasticity (i.e., constant variability). Examples are t tests and Pearson's correlation.

### Pearson's correlation coefficient (r)^{2}

Or product-moment correlation coefficient. It measures the strength of the linear relationship between two quantitative variables. r can take any value between -1 and +1, where r = -1 represents a perfect negative correlation, r = +1 a perfect positive correlation, and r = 0 no linear relationship. Thus, the absolute value of r indicates the strength of the linear relationship and its sign the direction of the relationship. It should be noted that r = 0 does not imply no relationship at all, but the absence of a linear one. Another way of assessing the strength of a relationship is to compute the square of r, i.e., r-squared (r^{2}). The statistical significance of the correlation coefficient can be assessed by computing an associated P-value. It should be noted that the latter cannot give any information on the strength of the relationship itself: a small P-value is not synonymous with strong association. An assumption of parametric correlation is that one (significance testing) or both (confidence intervals) variables are Normally distributed; outliers may also have a marked effect on r. When required assumptions cannot be met, rank correlation can be used instead.
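
The point that r = 0 means no linear relationship, rather than no relationship, can be shown with a short sketch. The function name and the data are chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient between two variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])   # perfect positive linear relationship
r_curved = pearson_r(xs, [x ** 2 for x in xs])      # exact but non-linear relationship
print(r_linear, r_curved)
```

The second variable is completely determined by the first (y is the square of x), yet r is zero because the relationship is not linear.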

### percentiles^{2}

For a given variable sorted in ascending order, percentiles (or quantiles) are the values of the variable below which a certain percentage of the observations is found. Thus, for a given set of measurements, 10% of the observations have values below the value corresponding to the 10th percentile. Special percentiles are the median (50th percentile), and the 25th and 75th percentiles, also known as quartiles.

### performance bias^{1}

Systematic differences in care provided apart from the intervention being evaluated. For example, if patients know they are in the control group they may be more likely to use other forms of care, patients who know they are in the experimental (intervention) group may experience placebo effects, and care providers may treat patients differently according to what group they are in. Blinding of study participants (both the recipients and providers of care) is used to protect against performance bias.

### period effect^{2}

In the context of crossover trials. It refers to the effect of time on disease, as measured by relevant outcomes. This acknowledges the fact that disease, and therefore patients' responses, may vary from one period to the next, regardless of any concomitant treatments. This may be due to learning effects (for example, learning to cope with pain). The presence of a period effect is not as serious as the presence of a treatment-period interaction since treatment order is randomly allocated in crossover trials.

### person time at risk^{2}

The sum of the individual lengths of time each subject is under observation in a follow-up study (observational or experimental). It can be estimated as the number at risk of the event of interest (contracting a disease or dying) multiplied by the average length of the study period. Usually, time is expressed as years - person-years at risk (PYAR), but it can also be expressed as days, weeks, etc. Person time at risk is the denominator in the computation of rates.
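
As a sketch of how person time at risk serves as the denominator of a rate, the example below sums made-up individual follow-up times and divides a hypothetical number of events by the total.

```python
# Hypothetical follow-up time, in years, for four subjects.
follow_up_years = [2.0, 3.5, 1.0, 4.5]
events = 2                              # made-up number of events observed

person_years = sum(follow_up_years)     # person-years at risk (PYAR)
rate_per_1000 = 1000 * events / person_years
print(person_years, round(rate_per_1000, 1))
```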

### Peto method^{2}

A way of combining odds ratios that has become widely used in meta-analysis. The method is particularly indicated for the analysis of estimates from clinical trials where the sample sizes of the groups being compared tend to be fairly similar. The method can lead to biased results if the estimated OR is far from 1. Peto's method is based on observed and expected frequencies when the results are displayed in a two-by-two table. Mantel-Haenszel estimates are an alternative way of combining results in meta-analyses. The calculations are straightforward and understandable. In some circumstances the Peto odds ratio can differ substantially from the exact odds ratio. It is a fixed effect model.

### phase I studies^{1}

The first stage in testing a new drug in humans. Usually performed on healthy volunteers without a comparison group.

### phase II studies^{1}

Second stage in testing a new drug in humans. Often performed on healthy volunteers. These are sometimes randomised controlled trials.

### phase III studies^{1}

Studies that are a full-scale evaluation of treatment. After a drug has been shown to be reasonably effective, it is essential to compare it to the current standard treatments for the same condition. Phase III studies are often randomised controlled trials.

### phase IV studies^{1}

Studies that are concerned with post-marketing surveillance. They are often promotional exercises aimed at bringing a new drug to the attention of a large number of clinicians, and may be of limited scientific value.

### phenomenology^{3}

In qualitative research: the analysis and description of everyday life - the life-world and its associated states of consciousness.

### placebo^{1}

A biologically inert treatment (substance or procedure), often given to control subjects in a clinical trial, which is indistinguishable from the active intervention, so that subjects in the trial do not know whether they are in the control or intervention group. This process of disguising which group a subject has been allocated to is usually called blinding. A placebo helps prevent information biases since it enables both patients and researchers to remain blind to the treatments given.

### placebo effect^{1}

A favourable response to an intervention, regardless of whether it is the real thing or a placebo, attributable to the expectation of an effect, i.e. the power of suggestion. The effects of many healthcare interventions are attributable to a combination of both placebo and "active" (non-placebo) effects.

### point estimate^{1}

The results (e.g. mean, weighted difference, odds ratio, relative risk or risk difference) obtained in a sample (a study or a meta-analysis) which are used as the best estimate of what is true for the relevant population from which the sample is taken. A confidence interval is a measure of the uncertainty (due to the play of chance) associated with that estimate.

### Poisson regression^{2}

A regression method for the analysis of counts (e.g. number of cases of a rare disease in different geographical areas) and rates.

### polytomous logistic regression^{2}

Logistic regression which is used when the categorical outcome has more than two unordered categories. Also termed multinomial logistic regression.

### population

The total set of units (people, events, objects) for which answers are required. The population to whom the results of a given research study are to be generalised. In this case, a sample of subjects drawn from the population in question has been studied.

### positive predictive value

PPV (in diagnostic tests). The probability of actually having a disease or condition given that the test is positive. See predictive values.

### positive study

A term used to refer to a study with results indicating a beneficial effect of the intervention being studied. The term can generate confusion because it can refer to both statistical significance and the direction of effect, studies often have multiple outcomes, the criteria for classifying studies as negative or positive are not always clear and, in the case of studies of risk or undesirable effects, "positive" studies are ones that show a harmful effect.

### post test probability^{2}

In the context of diagnostic tests, it represents an individual's probability of having a given disease or condition, given a particular test result. It depends not only on the prevalence of the condition in question, but also on the likelihood ratio (LR) for that test result. It is calculated as: PostProb = (post-test odds)/(post-test odds + 1), where post-test odds = LR x pre-test odds (see pre-test probability). The post-test probability of disease following a test result is also known as the predictive value of the same result. Likelihood ratios are used to convert pre-test probabilities into post-test probabilities. Note that pre- and post-test probabilities are not absolute. Having calculated the post-test probability for one test, this value can be the pre-test probability for the next.
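
As a worked illustration of the formula above, the following sketch converts a pre-test probability into a post-test probability via odds. The numbers (a pre-test probability of 10%, likelihood ratios of 9 and 0.5) are invented for the example, and the function names are hypothetical.

```python
def probability_to_odds(p):
    """odds = p / (1 - p)"""
    return p / (1 - p)


def odds_to_probability(odds):
    """p = odds / (odds + 1)"""
    return odds / (odds + 1)


def post_test_probability(pre_test_prob, likelihood_ratio):
    """post-test odds = LR x pre-test odds, then convert back to a probability."""
    post_odds = likelihood_ratio * probability_to_odds(pre_test_prob)
    return odds_to_probability(post_odds)


# Hypothetical first test: pre-test probability 10%, LR for the result = 9.
after_first = post_test_probability(0.10, 9)

# The result becomes the pre-test probability for a second, hypothetical test
# whose observed result has LR = 0.5.
after_second = post_test_probability(after_first, 0.5)
print(round(after_first, 3), round(after_second, 3))
```

The chaining in the last step assumes the two tests provide independent information, which may not hold in practice.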

### power^{2}

Probability of finding a difference, which actually exists, to be statistically significant. For example 80% power in a clinical trial of size N, represents an 80% chance of detecting a true difference in proportions - equal to some pre-specified value - with a small associated P-value. In many studies, observed differences may be simply due to chance, but real differences may not reach statistical significance in small sample size trials. The power of a particular study is increased by increasing its sample size. With quantitative outcomes, the greater the variability of the individual measurements, the lower is the power of the study. The level of significance required for the results (P-value or type I error) also determines the power of a study. The complement of the power is the type II error (power = 100% - beta).

### pre-test probability^{2}

In the context of diagnostic tests, it is the probability that an individual patient has got a disease or condition prior to the undertaking of relevant diagnostic procedures. The pre-test probability of having a disease is usually estimated by the prevalence of the disease. The following formula is used to convert pre-test probabilities (p) into pre-test odds: odds = p/(1 - p) (and, conversely, p = odds/(1 + odds)). Likelihood ratios are used to convert pre-test odds into post-test odds, which in turn can give the post-test probability of disease.

### precision^{2}

The number of significant digits obtained for a measurement. It is important to note that a precise measurement is not necessarily an accurate one. Precision in the context of estimation refers to the magnitude of standard errors (SE). This is reflected in the width of the confidence intervals (CI) constructed around the same estimates. Wide CIs reflect a lot of uncertainty about the population values, and stem from small sample sizes and/or large variability. A precise estimate is only useful if it is unbiased. Precision can also be defined as: 1. A measure of the likelihood of random errors in the results of a study, meta-analysis or measurement. Confidence intervals around the estimate of effect from each study are a measure of precision, and the weight given to the results of each study in a meta-analysis (typically the inverse of the variance of the estimate of effect) is a measure of precision (i.e. the degree to which a study influences the overall estimate of effect in a meta-analysis is determined by the precision of its estimate of effect). 2. The proportion of relevant citations located using a specific search strategy, i.e. the number of relevant studies (meeting the inclusion criteria for a trials register or a review) divided by the total number of citations retrieved.

### prediction^{2}

A forecast of the value for a variable, based on knowledge of the value of at least one other variable, and a model which links one variable (outcome) to the other (predictor).

### predictive values^{2}

(In diagnostic tests) measure how useful a test is in practice. The positive predictive value (PPV) of a test is the probability of actually having a disease or condition given that the test is positive. The negative predictive value (NPV) is the probability of not having the disease given that the test is negative.

PPV = (all testing positive and diseased)/(all testing positive) = a/(a + b)

NPV = (all testing negative and non-diseased)/(all testing negative) = d/(c + d)

Predictive values are affected by changes in the prevalence of a condition. A lower prevalence results in a decreased PPV, and a higher prevalence results in an increased PPV. The converse is true for the NPV. Thus, good diagnostic tools (in terms of sensitivity and specificity) may lead to a large number of false positive diagnoses in circumstances of low disease prevalence.
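
The dependence of predictive values on prevalence can be checked numerically. The sketch below assumes the usual two-by-two layout implied by the formulas (a = test positive and diseased, b = test positive and non-diseased, c = test negative and diseased, d = test negative and non-diseased); the counts, sensitivity and specificity are invented for illustration, and the function names are hypothetical.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from the cells of a two-by-two table.

    Assumed layout: a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    ppv = a / (a + b)
    npv = d / (c + d)
    return ppv, npv


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV for a test applied in a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


ppv, npv = predictive_values(a=90, b=10, c=10, d=90)

# Same hypothetical test (sensitivity and specificity both 90%)
# applied at high and at low prevalence.
ppv_high_prev = ppv_from_prevalence(0.9, 0.9, 0.50)
ppv_low_prev = ppv_from_prevalence(0.9, 0.9, 0.01)
print(round(ppv_high_prev, 3), round(ppv_low_prev, 3))
```

With these made-up figures the PPV falls sharply at 1% prevalence, so most positive results would be false positives even though the test itself is unchanged.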

### predictor variable^{2}

Explanatory or independent variable. In the context of regression it refers to a variable used to determine or predict the values of another variable called the outcome.

### prevalence^{2}

A measure of morbidity or of disease (or other outcome) frequency. As opposed to incidence, it is the total number of existing cases of a disease or condition at a particular point in time (point prevalence) or during some specified period (period prevalence), divided by the total population or by the total population at midpoint of the specified interval. Usually expressed as a percentage or per 1000, 10,000 or 100,000 if very small. In the context of diagnostic tests, the prevalence is often used as an estimate of the pre-test probability of disease. Prevalence is relatively quick and cheap to measure, but may be misleading. It may be a very unstable measure, since it can be dramatically affected by changes in (a) methods of detecting a disease, (b) mortality from the disease, or (c) the rate of cure from the disease.

### prevalence study

### principal components analysis^{2}

A multivariate method in which the original measurements are replaced by their weighted averages, called the principal components. These are termed first, second, etc., and should be uncorrelated with each other. The first principal component is the one that maximises differences between subjects. The eigenvalue is a measure of the amount of variation explained by each principal component.

### probability distribution

The function that gives the probabilities that a variable equals each of a sequence of possible values. Examples include the binomial, chi square, normal and Poisson distributions.

### probing

In questionnaires and interviews, a follow-up to an answer given but in a systematic way.

### prognostic factors^{2}

Patient or disease characteristics which influence the course of a particular condition.

### proportion^{2}

The ratio of the number of subjects with a given characteristic to the total number of subjects in a group.

### proportional hazards model (Cox model)^{1}

A statistical model in survival analysis that asserts that the effect of the study factors (e.g. the intervention of interest) on the hazard rate (the risk of occurrence of an event, such as death, at a point in time) in the study population is multiplicative and does not change over time.

### proportionate stratified sampling^{1}

Stratified sampling where the number of units sampled within each stratum is proportional to the size of the stratum. See disproportionate stratified sampling.

### prospective study

Longitudinal study where subjects are followed up and data collected forward in time from the start of the study. This term can be meaningfully applied to a prospective cohort study, and clinical trials. Prospective studies mean that the researcher can usually obtain much more valid and complete data, but it can be an expensive exercise if subjects need to be followed for long periods of time. Case control studies are never prospective. In epidemiology a prospective study is sometimes used as a synonym for cohort study. Concurrent cohort studies are prospective studies, whereas historical cohort studies are not (see cohort study). See retrospective study. The term prospective is also used to refer to a study that uses newly collected data, as opposed to existing information. In this sense, case control studies can sometimes be carried out prospectively, in particular case control studies that are nested in cohort studies.

### protocol^{1, 3}

A research protocol is a plan or set of steps to be followed in a study. A clinical protocol is a formal description of a procedure for patient care. An audit protocol describes the steps involved in an audit. A protocol for a systematic review should describe the rationale for the review; the objectives; and the methods that will be used to locate, select and critically appraise studies, and to collect and analyse data from the included studies.

### psychometric properties

These are the desirable features of a questionnaire, quality of life measure or other instrument: validity, reliability and sensitivity.

### publication bias^{2}

A type of bias which arises due to selective publication in medical journals of articles which report statistically significant results. Given that statistical significance is not synonymous with quality, validity (2), or clinical significance this practice can cause studies of poor quality and misleading results to have much greater impact on clinical and policy decisions than they merit. Also, good studies that have conclusively demonstrated a lack of treatment effect or a lack of association may never get to be published. The importance of such studies is often underestimated. If a study has been planned and conducted in an appropriate way, to provide answers to important questions, the results it produces are both reliable and important, regardless of the magnitude or significance of the same. The issue of publication bias is central to systematic reviews, which should not be conducted without an exhaustive search for all published and unpublished studies on any particular subject.

### purposeful/purposive or systematic sampling^{3}

In qualitative research: this is the deliberate choice of respondents, subjects or settings as opposed to *statistical sampling*, which is concerned with the representativeness of a sample in relation to a total population. *Theoretical sampling* links this to previously developed hypotheses or theories.

### qualitative research^{3}

This research paradigm deals with the human experience and is based on analysis of words rather than numbers. Qualitative research methods seek to explore rich information usually collected from a fairly small sample. It includes methods such as in-depth interviews, focus groups, action research and ethnographic studies. These are in-depth studies of relatively few individuals using unstructured and open-ended data collection methods (e.g. tape and video recordings) to provide the greatest possible insight into the underlying beliefs, needs and opinions of the subjects studied. Qualitative research may be an invaluable way of exploring the factors which one might choose to measure quantitatively in a larger sample of subjects. It can also be a fruitful way of generating hypotheses to test by quantitative methods. It may be helpful to think of qualitative and quantitative research paradigms complementing each other at different stages, as knowledge is acquired in a particular field. Thus qualitative research can inform quantitative research at an early stage of knowledge, or help to understand the findings of a quantitative study at a more advanced stage; the latter understanding may then lead on to a further quantitative study.

### quality

### quality of life (QoL)

"A concept representing individual responses to the physical, mental and social effects of illness on daily living which influence the extent to which personal satisfaction with life circumstances can be achieved" (Bowling, 1991). Measured using generic instruments such as the Nottingham Health Profile or the SF36 or disease-specific instruments.

### quality score^{1}

A value assigned to represent the validity of a study either for a specific criterion, such as allocation concealment, or overall. Quality scores can use letters (A, B, C) or numbers. An advantage of using letters is that the order of best to worst may be more obvious than for numbers.

### quantitative research^{3}

Is essentially concerned with numerical measurement and numerical data. All experimental research is based on a quantitative approach. Quantitative research tends to be based on larger sample sizes in order to produce results that can be generalised to a wider population.

### quantitative variable

As opposed to a categorical variable. A count (discrete variable e.g. number of children, number of times visited GP, etc.) or measurement (continuous variable e.g. weight, length of hospital stay, etc.). For such variables, there is usually a true zero representing the absence of a quantity or a zero count, and in addition, it is sensible to talk about doubling or halving the measurements or counts. See also interval and ratio variables.

### quartiles^{2}

For a given variable sorted in ascending order, the 25th percentile is the value below which 25% of all observations fall, and the 75th percentile is the value below which 75% of the observations fall. The range of values falling between the quartiles is known as the interquartile range.
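
A brief sketch of how quartiles and the interquartile range might be computed, using `statistics.quantiles` from the Python standard library. The data are invented, and it is worth noting that several conventions for interpolating percentiles exist, so other software may return slightly different quartiles for the same small sample.

```python
import statistics

# Hypothetical measurements.
values = [7, 1, 4, 9, 2, 5, 8, 3, 6]

# n=4 asks for the three cut points that split the sorted data into quarters.
q1, median, q3 = statistics.quantiles(values, n=4)

# The interquartile range expressed as a single width.
iqr_width = q3 - q1
print(f"lower quartile = {q1}, median = {median}, upper quartile = {q3}")
print(f"interquartile range: {q1} to {q3} (width {iqr_width})")
```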

### quasi-random allocation^{1}

A method of allocating participants to different forms of care that is not truly random; for example, allocation by date of birth, day of the week, medical record number, month of the year, or the order in which participants are included in the study (e.g. alternation). A quasi randomised trial is one using a quasi-random method of allocating participants to different forms of care. There is a greater risk of selection bias in quasi-random trials where allocation is not adequately concealed compared with randomised controlled trials with adequate allocation concealment.

### quota sample

Stratification of the population whereby the number of units selected within each stratum is in proportion to the size of each stratum but sample units within them are selected non-randomly.

### r squared^{2}

The square of Pearson's correlation coefficient. Used in the context of correlation and regression. It represents the proportion of total variability in a variable (the outcome in regression) which is explained by another variable or variables (predictors in regression). In other words, it states how much of the variation in one of the variables can be attributed to the value of the other variable. It is a useful way of assessing the clinical significance of the association between two or more variables. A better measure is the adjusted r squared, which is corrected for chance predictions, thus enabling the comparison of models with different numbers of predictor variables.
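
To make the definition concrete, the sketch below computes Pearson's r, its square, and an adjusted value for a small invented data set. The adjustment shown is the commonly used formula 1 - (1 - r²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors; this is an assumption about which adjustment is meant here, since the glossary does not give a formula.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def adjusted_r_squared(r_squared, n, p):
    """Commonly used adjustment for sample size n and p predictors (assumed)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r2 = r ** 2
r2_adj = adjusted_r_squared(r2, n=len(x), p=1)
print(round(r, 3), round(r2, 3), round(r2_adj, 3))
```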

### random allocation^{1}

A method that uses the play of chance to assign participants to comparison groups in a trial, e.g. by using a random numbers table or a computer-generated random sequence. Random allocation implies that each individual or unit being entered into a trial has the same chance of receiving each of the possible interventions. It also implies that the probability that an individual will receive a particular intervention is independent of the probability that any other individual will receive the same intervention. See also concealment of allocation, quasi-random allocation, randomisation.

### random effects^{2}

As opposed to fixed effects. This term is used in the context of meta-analysis, when results from individual studies are combined (producing a single estimate). Confidence intervals for these estimates are computed by adding extra uncertainty (random effect) to that which is always associated with estimation. Both within-study sampling error (variance) and between-studies variation are included in the assessment of the uncertainty (confidence interval) of the results of a meta-analysis. If there is significant heterogeneity among the results of the included studies, random effects models will give wider confidence intervals than fixed effect models. The assumption is that the studies being summarised are just a random sample of all possible studies, the underlying 'true' value for the population varying from study to study. Tests of heterogeneity are used to decide on the choice of a random or a fixed effects model. In the context of analysis of variance (ANOVA), the term is used to refer to factors (e.g. subject or observer) whose values do not take fixed values (unlike gender, for example). See Fixed effect model.

### random error^{1}

Error due to the play of chance. Confidence intervals and P-values represent the probability of random errors, but not systematic errors (bias). (synonym: sampling error)

### random permuted blocks^{1}

A method of randomisation that ensures that, at any point in a trial, roughly equal numbers of participants have been allocated to all the comparison groups. Permuted blocks are often used in combination with stratified randomisation.
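
One way such an allocation list might be generated is sketched below. It is an illustration only, with two hypothetical arms labelled "A" and "B" and a block size of 4; real trials would normally use validated software, and often vary the block size so that the end of a block is harder to predict.

```python
import random


def permuted_block_sequence(n_participants, treatments=("A", "B"),
                            block_size=4, seed=None):
    """Build an allocation list from randomly permuted, balanced blocks."""
    if block_size % len(treatments) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(treatments)
    sequence = []
    while len(sequence) < n_participants:
        # Each block contains every treatment equally often, in random order.
        block = list(treatments) * per_arm
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


allocation = permuted_block_sequence(12, seed=2024)
print(allocation)
```

Because every complete block is balanced, the group sizes can never differ by more than half a block at any point during recruitment.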

### random sample

A sample that is not biased, thus not displaying any patterns or trends which are different from those displayed by its source population. Governed by chance. Having the quality of something which has no defined pattern. See randomisation.

### random sampling

The process of obtaining a sample that is not biased, thus not displaying any patterns or trends which are different from those displayed by its source population. Governed by chance. Having the quality of something which has no defined pattern. See random sample.

### random selection^{1}

A method of obtaining a representative, unbiased group of people from a larger population. Every member of the population has an equal chance of being included in the sample. Random selection which is not related to how participants are allocated to comparison groups is frequently used in cross-sectional and cohort studies, which are not randomised controlled trials, and it is frequently not used in randomised controlled trials. In older trial reports, however, the term is occasionally used instead of random allocation or randomisation. (synonym: random sampling)

### randomisation^{1, 2}

The process of allocating treatment units (patients) to the alternative treatments in a clinical trial. Usually, a sequential list of numbers is prepared in advance of a project starting, and a treatment code is randomly allocated to each number. As subjects are recruited, they are given the next number on the list (to identify them anonymously) and are allocated a group according to the treatment code. The list and associated treatment codes should not be prepared by anyone actively involved in the trial, and the codes should be 'hidden' until a subject is definitely recruited. (Treatment codes are sealed in envelopes.) Random allocation of a large sample of subjects (in excess of 100) is the only way to ensure that all possible confounding factors are balanced between groups. Note that 'pseudo' random methods of allocation, like alternation, coin-tossing when the subject is present, or any system which allows the researcher to have prior knowledge of the allocation for a subject are frowned upon, because they can all allow bias to creep in. (For example, if a researcher knows that the next subject should be allocated to the intervention, he or she can alter the order in which eligible subjects are considered, or reject a subject as ineligible, so that subjects with certain characteristics don't receive the intervention - resulting in bias!) The method of randomisation should be distinguished from concealment of allocation because of the risk of selection bias despite the use of randomisation, if there is not adequate allocation concealment. For instance, a list of random numbers may be used to randomise participants, but if the list is open to the individuals responsible for recruiting and allocating participants, those individuals can influence the allocation process, either knowingly or unknowingly. Randomisation is one of the main ways of avoiding selection biases; its purpose is to produce comparable treatment groups.
Its main advantage is that treatment allocation can be carried out blindly before patient entry into a trial, i.e., without knowledge of who the patients may be, the order in which patients will appear or the treatments they are being allocated to. Simple random allocation does not always produce the desired effects, especially when sample sizes are small. Modifications to the simple procedure are sometimes necessary. Minimisation is a quasi-random allocation procedure that ensures similar distribution of important prognostic factors in the treatment groups, and is especially good for small samples. Stratified random allocation is used to the same effect, especially with larger samples. Random allocation may sometimes produce groups with unequal sample sizes. This problem may be eliminated by using restricted randomisation (with random permuted blocks). See concealment of allocation.

### randomised controlled trial^{2}

(RCT) A clinical trial where at least two treatment groups are compared, one of them serving as the control group, and treatment allocation is carried out using a random, unbiased method. The results are assessed by comparing outcomes in the treatment and control groups. NOTE: when using randomized controlled trial as a search term (publication type) in MEDLINE, the US spelling (randomized) must be used.

### range^{2}

The interval that goes from the minimum to the maximum value in a set of quantitative measurements. It is commonly reported as a single figure (e.g. 6), but preferably both the minimum and maximum should be quoted (e.g. 11 to 17).

### rank correlation^{2}

A nonparametric method of assessing the association between quantitative or between ordinal variables. Spearman's and Kendall's rank correlation are the methods commonly employed. The resulting coefficients (rho and tau) are to be interpreted in the same way as the Pearson's correlation coefficient. However, rank correlation methods assess linear relationships between the ranks given to the values of the variables in question.

### ranks^{2}

The relative position of the observations of a given variable (ordinal or interval/ratio). For example, if one had the variable 'age' with 5 observations: 65, 49, 31, 57 and 49 (reordered: 31, 49, 49, 57, 65), these would be given the ranks: 1, 2.5, 2.5, 4 and 5. When values are ordered according to size into a 'league table', the rank of a given value represents its position in the table. Non-parametric methods of analysis are frequently based on ranks.
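
The ranking rule used in the example above, where tied values share the average of the ranks they would otherwise occupy, can be sketched as follows. The function name is hypothetical; the data are the ages from the example.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of observations tied with the current one.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions are 0-based, ranks are 1-based; ties share the mean rank.
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


ages = [65, 49, 31, 57, 49]
ranks = average_ranks(ages)
print(ranks)  # ranks are returned in the original order of the observations
```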

### rate^{2}

A summary measure which conveys the idea of risk over time. The denominator is expressed as person-time at risk and the numerator is the number of occurrences of a particular event. Rates can be used as measures of mortality (mortality rates) or morbidity (incidence rates).

### rate ratio^{1}

In epidemiology, the ratio of the rate of disease in the exposed population compared to that in the unexposed population. (See also: hazard ratio.) (synonym: relative rate)

### ratio variable^{2}

A quantitative variable which has a true zero. Unlike interval variables, the ratio of two values has the same meaning regardless of the scale used to make the measurements. An example of this type of variable is weight: a 10% increase in weight from 30 to 33 pounds represents the same 10% increase when measurements are expressed in kilograms (approximately from 13.6 Kg to 15 Kg).

### RCT

See randomised controlled trial.

### recall bias

See information bias.

### reference range^{2}

A range of values which measures the variability of a given measurement among 'normal' individuals (thus, sometimes called 'normal range'). 'Normal' usually refers to non-diseased subjects, but the definition of 'normal' may vary with the context in which it is used. Thus, a clear description of the characteristics of the sample used to construct any reference range is very important. Within a 95% reference range we find 95% of all individual observations for a given measurement, with 2.5% lying outside each end of the range. To be sure that this range is calculated with a fair degree of certainty, it is important to use a large enough sample (some authors suggest at least 200). If the measurements follow an approximately Normal distribution their mean and standard deviation can be used to construct the reference range: 95% Reference range = mean +/- 1.96 x SD
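
The Normal-based calculation can be sketched as below, using the conventional multiplier 1.96 for a 95% range. The three data values are invented purely to keep the arithmetic easy to follow; as noted above, a real reference range would need a far larger sample. The function name is hypothetical.

```python
import statistics


def reference_range_95(values):
    """95% reference range as mean +/- 1.96 x SD (assumes approximate Normality)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return mean - 1.96 * sd, mean + 1.96 * sd


# Toy data with mean 5 and SD 1, for illustration only.
low, high = reference_range_95([4.0, 5.0, 6.0])
print(f"95% reference range: {low:.2f} to {high:.2f}")
```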

### regression^{2}

A statistical method used especially for the purpose of prediction. In simple linear regression, the relationship between the outcome variable (y) and the predictor variable (x), both interval/ratio, is summarised by means of a model or line. The regression model is used to predict the value of the outcome variable given the value of the predictor. In other words, the model specifies by how much the value of y will go up (or down) for each unit increase in the value of x. 'By how much' is given by the regression coefficient or slope of the best fit line. Another characteristic of this line is its intercept. The line of best fit is found using the least squares method, which seeks to minimise the total sum of the squared differences (i.e., vertical distances or residuals) from each observation to any given straight line going through the data points. Residuals are used to assess the goodness-of-fit of regression models. Categorical predictors may also be used, either on their own (equivalent to ANOVA) or together with quantitative predictors (ANCOVA). Generally, multiple regression allows the use of more than one predictor variable. When the outcomes are categorical, logistic regression is indicated. Cox regression is used for the analysis of survival times and Poisson regression to analyse counts and rates.
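
For simple linear regression, the least squares slope and intercept have closed-form solutions, sketched below for a small invented data set. The function name and data are hypothetical; in practice a statistics package would be used, which also reports standard errors and residual diagnostics.

```python
def fit_line(x, y):
    """Least squares estimates of slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical predictor (x) and outcome (y) values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)

# Residuals: vertical distances from each observation to the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"y = {intercept:.2f} + {slope:.2f} x")
```

Here the slope is the predicted change in y for each unit increase in x, and the residuals sum to zero (up to rounding), which is a property of any least squares line with an intercept.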

### regression coefficient^{2}

Or slope of the line of best fit. It represents the increments predicted in the outcome variable for each unit increase in the predictor variable. When the predictor is a categorical variable the regression coefficient represents the average difference between any given level of the variable and the level taken as the baseline or standard (e.g. smokers vs non-smokers). As with correlation coefficients a slope of 0 represents no relationship between the variables. However, regression coefficients are not restricted to take values between -1 and +1.

### regression diagnostics^{1}

Checks carried out after developing a regression model in order to detect problem areas which could result in its wrong interpretation. One important aspect concerns the assumption of a linear relationship between outcome and predictor variable. The presence of outlying observations may exert undue influence on a regression model. Plots of residuals are widely used to detect these situations. Data transformations (e.g. to a logarithmic scale) can often deal with many of the problems encountered.

### regression model^{1}

A mathematical representation of the relationship of a dependent variable (outcome) to a combination of explanatory variables (sometimes called predictor variables or covariates).

### relative risk (RR)

A term which is often used (rather imprecisely) to describe how much more likely an outcome is in one group compared to another (see odds ratio); like an odds ratio, it takes values above or below 1, with the value of 1 representing no difference between groups. For undesirable outcomes, an RR that is less than one indicates that the intervention was effective in reducing the risk of that outcome. Different measures of the differences between groups (e.g. odds ratio, risk ratio, rate ratio) may all sometimes be referred to as relative risk; the imprecise use of relative risk has arisen because, when an outcome is rare, all of these measures take approximately the same value. (synonym: risk ratio)

### relative risk reduction

(RRR) An alternative way of expressing relative risk (RR). It is calculated as: RRR = (1 - RR) x 100%. The RRR can be interpreted as the proportion of the initial or baseline 'risk' which was eliminated by a given treatment or intervention, or by avoidance of exposure to a risk factor. When the RR is greater than 1, what is calculated is the 'excess relative risk': ERR = (RR - 1) x 100%.
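
As a rough illustration of the two formulas above, the following Python sketch converts a relative risk into a relative risk reduction or an excess relative risk. The function names and the example RR values are assumptions made for illustration only.

```python
def relative_risk_reduction(rr: float) -> float:
    """Relative risk reduction as a percentage: (1 - RR) x 100 (for RR < 1)."""
    return (1 - rr) * 100


def excess_relative_risk(rr: float) -> float:
    """Excess relative risk as a percentage: (RR - 1) x 100 (for RR > 1)."""
    return (rr - 1) * 100


# Hypothetical values: RR = 0.75 (protective) and RR = 1.5 (harmful).
rrr = relative_risk_reduction(0.75)
err = excess_relative_risk(1.5)
print(f"relative risk reduction = {rrr:.0f}%")
print(f"excess relative risk = {err:.0f}%")
```

With these invented inputs, a quarter of the baseline risk is removed in the first case and the risk is raised by half in the second.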

### reliability^{3}

This is concerned with the consistency and dependability of a measuring instrument, i.e. it is an indication of the degree to which it gives the same answers over time, across similar groups and irrespective of who administers it. A reliable measuring instrument will always give the same result on different occasions assuming that what is being measured has not changed during the intervening period. In the context of clinical measurement, it refers to a method of measurement which is not only accurate but also consistently so. Thus, reliability requires repeatability, reproducibility and accuracy. An index of reliability can be calculated from the variability of the repeated (or paired) measurements (see repeatability): R = 1 - (observed disagreement)/(chance-expected disagreement), where observed disagreement = variance of errors (see repeatability) and chance-expected disagreement = variance of all measurements, ignoring the pairing. R takes values from 0 (no reliability) to 1 (perfect reliability). There is a parallel between this measure of 'agreement' for quantitative measurements and the kappa statistic used to assess agreement between measurements on a categorical scale: R measures the proportion of the observed variability in the measurements which is over and above that due to measurement error (i.e., the proportion which is due to variability in the subjects being studied). R (like kappa) is population dependent. For the same measuring device or method, the value of R will vary according to the variance of the measurements in different populations. Greater variability has the effect of increasing the value of R. The reliability of a measuring method gives information on how good the method is at ascribing the correct measurement value to individuals in a population. Another measure of reliability is the intraclass correlation coefficient.
Lack of reliability can arise from divergences between observers or measurement instruments, or instability in the attribute being measured.

### repeatability^{2}

In the context of clinical measurement, it refers to the variability of repeated measurements taken under similar conditions. Repeatability can be expressed by the standard deviation of the measurement errors (sometimes called 'standard error of measurement') where: difference = measurement1 - measurement2. The SD of errors can be used to calculate, say, 95% 'limits of agreement' for the repeatability of measurements. Their interpretation is similar to that of reference ranges. The estimate of the SD of errors should therefore come from a large unbiased sample of individuals. Repeatability is important in the assessment of reliability (in the formula for R, observed disagreement = variance of errors).
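
A minimal sketch of 95% limits of agreement for paired measurements follows. It assumes the commonly used calculation of mean difference plus or minus 1.96 times the SD of the differences; the glossary does not spell out this formula, so treat that choice, and the invented data, as assumptions.

```python
import statistics

# Hypothetical paired readings on the same six subjects.
measurement1 = [67.5, 70.1, 82.3, 59.8, 91.0, 75.4]
measurement2 = [68.0, 69.5, 82.9, 60.4, 90.2, 75.0]

# difference = measurement1 - measurement2, as in the definition above.
differences = [m1 - m2 for m1, m2 in zip(measurement1, measurement2)]

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD of the differences

# Assumed convention: 95% limits = mean difference +/- 1.96 SD.
lower = mean_diff - 1.96 * sd_diff
upper = mean_diff + 1.96 * sd_diff
print(f"mean difference = {mean_diff:.2f}, SD = {sd_diff:.2f}")
print(f"95% limits of agreement: {lower:.2f} to {upper:.2f}")
```

Six pairs are used only to keep the sketch short; as noted above, a real estimate should come from a large unbiased sample.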

### repeated measurements analysis^{2}

An analysis of measurements taken on one or more groups of subjects, where more than one measurement per subject is taken, usually over a period of time. The main issue here is the lack of independence of observations pertaining to a single subject. Data of this sort are commonly analysed using inadequate methods (including incorrect graphical displays), such as multiple significance testing (multiple comparisons at different time points), analyses of variance in which the lack of independence of the observations is not taken into account, and graphs showing the average for each group at the different time points, thus 'hiding' possibly important individual patterns. Although special repeated measures analysis of variance methods do exist, which deal with the above problems, other straightforward and effective methods may also be used, requiring solely the choice of sensible summary measures. These summaries reduce the multiplicity of data to fewer 'observations' (the chosen summaries), which in turn may be analysed by simple methods. For example, heart rate measurements over a period of 3 hours (measured at 10 minute intervals - 18 measurements per patient), following the administration of two different anxiolytic drugs, may be averaged, producing a single post-treatment measurement for each patient in the trial.
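
The summary-measure approach described above might be sketched as follows in Python. The patient identifiers, drug labels and heart rates are invented, and the series are shortened to four readings per patient to keep the example small.

```python
import statistics

# Hypothetical heart rates (beats/min) at successive time points.
heart_rates = {
    "patient_01": {"drug": "A", "readings": [88, 84, 80, 76]},
    "patient_02": {"drug": "A", "readings": [92, 90, 85, 81]},
    "patient_03": {"drug": "B", "readings": [90, 89, 88, 85]},
    "patient_04": {"drug": "B", "readings": [86, 86, 84, 84]},
}

# Step 1: reduce each patient's series to one summary measure (the mean).
summaries = {
    pid: statistics.mean(rec["readings"]) for pid, rec in heart_rates.items()
}

# Step 2: analyse the summaries with a simple method, here group means.
by_drug = {}
for pid, rec in heart_rates.items():
    by_drug.setdefault(rec["drug"], []).append(summaries[pid])

group_means = {drug: statistics.mean(vals) for drug, vals in by_drug.items()}
print(summaries)
print(group_means)
```

Each patient now contributes a single independent observation, so the groups can be compared with an ordinary two-sample method.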

### reproducibility^{2}

In the context of clinical measurement, it refers to the variability of repeated measurements taken under different conditions, for example, the comparison of two alternative methods of measurement. Repeatability within each method is an important determinant of reproducibility, and should always be assessed.

### residuals^{2}

In the context of regression, residuals are the numerical differences between observed and predicted values. The analysis of the pattern of residuals is useful in determining the appropriateness of a particular model to the data it is meant to describe (regression diagnostics).
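
For illustration, the sketch below fits a simple least-squares line to invented data and computes the residuals as observed minus predicted values. The data and variable names are assumptions.

```python
# Hypothetical data: predictor x and outcome y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares estimates for the line y = intercept + slope * x.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value - predicted value.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, ri in zip(x, residuals):
    print(f"x = {xi:.0f}, residual = {ri:+.2f}")
```

In practice the residuals would be plotted against the predictor or the fitted values to look for curvature or outlying points.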

### respondent validity^{3}

In qualitative research: this is the process by which a researcher will double-check a respondent's view or understanding after the interview. The respondent is offered the opportunity to make changes. This is also known as member checking.

### retrospective study^{2, 3}

An observational study where information on outcome or presence or absence of disease is first collected and subjects are investigated for possible past exposure to a risk factor of interest. Case control studies are in this category. The term is also used to refer to a study that uses data collected prior to the set up of the investigation, as opposed to newly collected data. In this sense, cohort studies can sometimes be carried out retrospectively. In a retrospective study, data for subjects are collected from historical sources, i.e. retrospectively (e.g. hospital or GP case records, by asking subjects to recall past experiences and exposures); note that the subjects in the study may still be recruited as they are identified, i.e. prospectively. This term is only meaningful in the context of case studies and series, case-control studies, and retrospective cohort studies. Because no follow-up is involved, such studies are relatively quick and cheap to carry out compared to prospective ones, but there can be major problems with incomplete data and biased recall of information. See prospective study.

### review^{1}

1. A systematic review. 2. A review article in the medical literature which summarises a number of different studies and may draw conclusions about a particular intervention. Review articles are often not systematic. Review articles are also sometimes called overviews. 3. To referee a paper. See referee, referee process, external peer reviewer.

### risk

The probability of a given event happening.

### risk difference

The absolute difference in the event rate between two comparison groups. A risk difference of zero indicates no difference between comparison groups. For undesirable outcomes an RD that is less than zero indicates that the intervention was effective in reducing the risk of that outcome. (synonym: absolute risk reduction)
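
A small Python sketch of the calculation, using invented event counts, is given below. The direction of subtraction (intervention minus control) is an assumption chosen to match the statement that a value below zero favours the intervention.

```python
def risk(events: int, total: int) -> float:
    """Risk = proportion of the group in which the event occurred."""
    return events / total


# Hypothetical trial: 15/100 events with intervention, 25/100 with control.
risk_intervention = risk(15, 100)
risk_control = risk(25, 100)

# Risk difference, taken here as intervention minus control, so a negative
# value means fewer undesirable events in the intervention group.
risk_difference = risk_intervention - risk_control
print(f"RD = {risk_difference:+.2f}")
```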

### risk factor^{1}

An aspect of a person's condition, lifestyle or environment that increases the probability of occurrence of a disease. For example, cigarette smoking is a risk factor for lung cancer.

### risk ratio^{2}

The ratio of the risk of an event in one group (exposure or intervention) to that in another group (control). The term relative risk is sometimes used as a synonym of risk ratio. If there is no difference in risk between the two groups, the risk ratio will be 1. A risk ratio greater than 1 suggests a greater risk of the event in the exposure group. The converse is true if the risk ratio is less than 1. Risk ratio = risk in exposed group/risk in control group = (a/(a + c))/(b/(b + d)).

### robust method

A descriptive term for a measure, significance test or method of estimation which is not grossly affected by influential outlying observations. Medians, confidence intervals based on ranks and non-parametric tests are common examples.
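
The idea of a robust measure can be seen by comparing how the mean and the median react to a single outlying observation. The lengths of stay below are invented for illustration.

```python
import statistics

# Hypothetical lengths of stay (days); the added value is an outlier.
stays = [3, 4, 4, 5, 6]
stays_with_outlier = stays + [60]

mean_before = statistics.mean(stays)
mean_after = statistics.mean(stays_with_outlier)
median_before = statistics.median(stays)
median_after = statistics.median(stays_with_outlier)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before:.1f} -> {median_after:.1f}")
```

The mean roughly triples while the median moves by half a day, which is why the median is usually described as robust.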

### ROC

Receiver Operating Characteristic plot.

### routine data collection

The collection of data systematically and routinely, e.g. information on length of stay for hospital in-patients (and other details), cancer and notifiable disease registrations, GP prescribing (in some practices!); beware that routine data may be biased. These data can be a useful way to test hypotheses (analytical cross-sectional, ecological or retrospective cohort study) cheaply and quickly; e.g. after controlling for the reason for admission and the age of patients, is there a trend for elective surgical patients who live further away from a hospital to stay in hospital longer? Note that interesting findings should then be pursued using a method which is less susceptible to bias.

### run in period

A period before a trial is commenced when no treatment is given. The data from this stage of a trial are only occasionally of value but can serve a valuable role in screening out ineligible or non-compliant participants, in ensuring that participants are in a stable condition, and in taking baseline observations. A run-in period is sometimes called a washout period if treatments that participants were using before entering the trial are discontinued.

### sample^{2}

A group of subjects selected or sampled from a wider group or population according to some pre-specified criteria. A sub-set of the total population.

### sample size^{2}

A calculation performed in advance of carrying out a study to estimate the sample size required to allow the study to detect a clinically important difference between the groups being compared. A sample size calculation should even be carried out for a descriptive study, e.g. to estimate the prevalence of a disease; in this case, the sample size should be chosen to allow the study to achieve the required size of confidence interval. The sample size is the number of participants required in a study so that differences thought to be clinically important can be detected as statistically significant (at a given level of alpha or type I error), if indeed they do exist. In some instances, sample sizes are calculated for the purpose of estimation, in which case the issue is not power but the precision (2) (or width) of confidence intervals constructed around the observed quantities (means, proportions, differences, etc.). Such calculations produce larger required sample sizes, as compared to 'power calculations'.
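
A power-based calculation of the kind described above can be sketched as follows for a comparison of two means. The Normal-approximation formula, the function name `n_per_group` and the input values are assumptions of this sketch rather than anything specified in the glossary; statistical packages typically use slightly more exact methods.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate participants per group to detect a mean difference.

    Uses the Normal approximation n = 2 * sd^2 * (z_a + z_b)^2 / delta^2,
    which is an assumption of this sketch, not a formula from the glossary.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    n = 2 * (sd ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)


# Hypothetical inputs: detect a 5-unit difference when the SD is 10.
required = n_per_group(delta=5, sd=10)
print(required)
```

Raising the power or shrinking the difference to be detected increases the required number, which is a quick way to explore different planning scenarios.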

### sampling^{2}

The process of selecting a group of subjects (sample) from a population. Information provided by the sample allows conclusions to be drawn about the population. In surveys random and non-random methods of sampling are used. Among the former, common methods are simple random sampling, stratified sampling, cluster sampling and multistage sampling. Systematic sampling and quota sampling are examples of non-random methods, frequently used in market research. The ability to generalise from sample to population relies on its representativeness or lack of bias.

### sampling error

See random error.

### sampling frame

A list or other collection of members of the population which can be used in the selection of a sample. An ideal sampling frame contains every member of the target population once and once only.

### SD

See standard deviation.

### SE

See standard error.

### selection bias^{1, 2}

1. In assessments of the validity of studies of healthcare interventions, selection bias refers to systematic differences between comparison groups in prognosis or responsiveness to treatment. Random allocation with adequate concealment of allocation protects against selection bias. Other means of selecting who receives the intervention of interest, particularly leaving it up to the providers and recipients of care, are more prone to bias because decisions about care can be related to prognosis and responsiveness to treatment. 2. Selection bias is sometimes used to describe a systematic error in reviews due to how studies are selected for inclusion. Publication bias is an example of this type of selection bias. 3. Selection bias, confusingly, is also sometimes used to describe a systematic difference in characteristics between those who are selected for study and those who are not. This affects the generalisability (external validity) of a study but not its (internal) validity. In the context of surveys, selection bias refers to systematic differences between a sample and its source population. It is usually caused by inappropriate sampling (sampling bias). Conclusions drawn from such a sample are unlikely to be generalisable to the entire population. Case control studies are also prone to selection bias, for example if cases are more frequently 'suspected of being cases' (and therefore diagnosed and included in studies) because their unusually high levels or high prevalence of exposure to a particular risk factor result in more intensive investigation (detection bias). In the context of clinical trials selection biases occur due to methods of treatment allocation which lead to imbalances between treatment groups, with respect to important prognostic factors. Problems occurring after patient entry into a trial may also lead to selection biases (dropouts, 'crossovers', withdrawals etc.).

### self-completion questionnaire

A questionnaire completed by the respondents without assistance from an interviewer or researcher, other than that contained in the covering letter and other instructions supplied. Usually administered by mail but may be handed out to respondents.

### semi-structured questionnaire

A pre-determined list of questions which may be varied, re-phrased and supplemented depending on the responses during the course of an interview. Probing is often permitted. Questions are usually open-ended. Not suitable for self-completion.

### sensitivity (detection rate)^{2}

In the context of psychometric properties, this is the ability of an instrument to measure change over time (e.g. it is sensitive enough to demonstrate that someone is recovering from depression, or has improved in their social life). In the context of diagnostic tests, it measures how good a test is in detecting those individuals who are truly diseased or have some condition (true positives). Sensitivity = (all testing positive and diseased)/(all diseased) = a/(a + c). The complement of sensitivity is the false negative rate: c/(a + c). Like specificity, sensitivity is usually not affected by changes in prevalence. However, it can be affected by spectrum bias. When a test has a high sensitivity, a negative test value rules the diagnosis out (SnNout).

### sensitivity analysis^{1, 2}

A 'what if' exercise to explore how robust the conclusions of a study are; the repetition of a particular procedure under different assumptions, with the intention to assess their impact on the results of a study or on logistic requirements. For example, in follow-up studies statistical analysis will be first carried out with the data available, which excludes data from patients lost to follow-up, and then repeated to include all subjects originally in the study. Outcomes for the missing observations are imputed, allowing for the best or worst scenarios, as appropriate to the aims of the study in question. This second analysis of the data may reveal results that are not consistent with the former analysis. In this case, the results first obtained should be carefully considered. This can be especially important if you are worried about the quality of your data or any assumptions that you may have made - which is often the situation when trying to collect information about the cost of different treatments. The principle is to see what happens if, instead of basing your conclusions on the particular assumptions you have made and the actual estimates of critical variables which you observed, e.g. clinical effectiveness or cost, you change your assumptions or substitute the lowest and highest plausible estimates for the critical variables (e.g. lower and upper ends of the confidence interval). Sensitivity analyses are best done using a spreadsheet programme (Excel or Lotus) that allows you to set up calculations and then systematically vary the assumptions/numbers on which the calculation is based. Sensitivity analysis is also used to calculate the sample sizes required given different scenarios, where any of the following may change: type I error, power of the study, ratio between number of unexposed and exposed, expected differences between groups, degree of variability of measurements, etc.
In meta-analysis, where some individual studies may be of lower quality than others, i.e., less valid (2), sensitivity analysis is used to assess the impact of removing such studies from the analysis.
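
The follow-up example above (analysing available data, then imputing best and worst outcomes for participants lost to follow-up) could be sketched in Python as below. The trial counts are invented, and using the risk difference as the outcome measure is an assumption made for the sake of a short example.

```python
def risk_difference(events_t, n_t, events_c, n_c):
    """Risk in the treatment group minus risk in the control group."""
    return events_t / n_t - events_c / n_c


# Hypothetical trial: 100 randomised per arm, some lost to follow-up.
treat = {"events": 20, "followed": 90, "lost": 10}
ctrl = {"events": 30, "followed": 92, "lost": 8}

# Analysis 1: available data only (losses excluded).
available = risk_difference(treat["events"], treat["followed"],
                            ctrl["events"], ctrl["followed"])

# Analysis 2: best case for treatment (lost treated patients had no event,
# lost controls all had the event).
best = risk_difference(treat["events"], treat["followed"] + treat["lost"],
                       ctrl["events"] + ctrl["lost"],
                       ctrl["followed"] + ctrl["lost"])

# Analysis 3: worst case for treatment (the reverse imputation).
worst = risk_difference(treat["events"] + treat["lost"],
                        treat["followed"] + treat["lost"],
                        ctrl["events"],
                        ctrl["followed"] + ctrl["lost"])

print(f"available: {available:+.3f}, best: {best:+.3f}, worst: {worst:+.3f}")
```

Here the apparent benefit disappears in the worst-case scenario, so conclusions drawn from the available data alone would need to be treated with caution.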

### sequential trial^{1}

A trial in which the data are analysed after each patient's results become available, and the trial continues until a clear benefit is seen in one of the comparison groups, or it is unlikely that any difference will emerge. The main advantage of sequential trials is that they will be shorter than fixed length trials when there is a large difference in the effectiveness of the interventions being compared. Their use is restricted to conditions where the outcome is known relatively quickly.

### Shapiro Wilk test^{2}

A significance test used to assess departures from a Normal distribution. If the Shapiro Wilk test for a given variable gives a small P-value (say, <0.05), the assumption of Normality is usually rejected. A statistic frequently reported in addition to the test statistic W is V, which takes the value of 1 if a variable has a Normal distribution, or greater than 1 if not. An equivalent test is the Shapiro Francia test.

### significance tests

See statistical tests.

### simple random sample

A sample selected by random sampling.

### simple regression^{2}

A regression in which a single predictor variable is used in a model predicting an outcome.

### single blind

The investigator is aware of the treatment/intervention the participant is getting, but the participant is unaware. See also blinding, double blind, triple blind. (synonym: single masked)

### skewness^{2}

The quality of a distribution which has a relatively long left (negatively skewed) or right (positively skewed) hand tail. Positively skewed distributions can sometimes be converted into Normal distributions by taking logs of the original values. Such variables are said to have a lognormal distribution. Alternatively, data that display a skew can be analysed using non-parametric methods.

### slope

See regression coefficient.

### SMR (standardised mortality ratio)^{1, 2}

This is the ratio between observed and expected numbers of an event (death or other), multiplied by 100: SMR = (observed/expected) x 100. SMRs can be computed by direct or, more commonly, indirect standardisation. An SMR of 100 suggests the rate of occurrence of the event in the study population is the same as in the standard or base population.

### snowballing^{3}

This is a non-probability method of sampling commonly employed in qualitative research. Recruited subjects nominate other potential subjects for inclusion in the study.

### social anthropology^{3}

This encompasses social scientific studies of peoples, cultures and societies; particularly associated with the study of traditional cultures.

### Spearman's rho^{2}

A nonparametric correlation coefficient based on ranks. It can be used on interval variables/ratio variables or ordinal variables.
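
One common way to obtain Spearman's rho is to rank each variable (averaging the ranks of tied values) and then compute the ordinary Pearson correlation of the ranks. The sketch below follows that approach; the helper names and the data are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: dose level against an ordinal pain score (0-10).
dose = [1, 2, 3, 4, 5]
pain = [9, 8, 8, 3, 1]
rho = spearman_rho(dose, pain)
print(round(rho, 3))
```

Because only the ranks are used, any monotonic relationship (for example y = x cubed) gives a rho of exactly 1 or -1 even when it is far from linear.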

### specificity^{2}

In the context of diagnostic tests, it measures how good a test is in detecting those individuals who are not diseased or do not have some condition (true negatives). Specificity = (all testing negative and non-diseased)/(all non-diseased) = d/(b + d). The complement of specificity is the false positive rate: b/(b + d). Like sensitivity, specificity is not usually affected by changes in prevalence. However, it can be affected by spectrum bias. When a test has a high specificity, a positive test value rules the diagnosis in (SpPin).
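
The sketch below works through sensitivity, specificity and their complements for an invented 2x2 table. The layout in the comment is inferred from the formulas in this glossary (a and c are the diseased, b and d the non-diseased) and the counts are hypothetical.

```python
# Hypothetical 2x2 table, using the cell labels of the formulas above:
#                 diseased   non-diseased
# test positive      a            b
# test negative      c            d
a, b, c, d = 90, 30, 10, 170

sensitivity = a / (a + c)          # true positives / all diseased
specificity = d / (b + d)          # true negatives / all non-diseased
false_negative_rate = c / (a + c)  # complement of sensitivity
false_positive_rate = b / (b + d)  # complement of specificity

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```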

### spectrum bias^{2}

A bias which occurs when estimating sensitivity and specificity in patients with different manifestations (severity) of the disease for which a given diagnostic test is intended. Spectrum bias may explain why different studies which evaluate the same diagnostic test, give different results. This type of bias is sometimes overlooked given the widespread belief that the sensitivity and specificity of a diagnostic test are immutable properties of the test.

### standard deviation (SD)^{2}

This reflects the spread of individual observations in a distribution. It is the square root of the variance or can be described as the average distance of individual observations from the mean. It is usually employed in conjunction with the mean to describe interval or ratio data.

### standard error (SE)^{2}

A statistic which indicates the degree of uncertainty in calculating a summary estimate from a sample. The sample size and the variability of individual measurements are the main determinants of the magnitude of standard errors. Standard errors are more easily interpreted if used to construct confidence intervals.
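
To show how the standard deviation and the standard error relate, the sketch below computes both for an invented sample and builds an approximate 95% confidence interval for the mean. It assumes the standard error of a mean (SD divided by the square root of the sample size) and a Normal multiplier of 1.96.

```python
import math
import statistics

# Hypothetical sample of systolic blood pressures (mmHg).
sample = [118, 125, 130, 122, 141, 135, 128, 119]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)        # spread of individual observations
se = sd / math.sqrt(len(sample))     # uncertainty of the sample mean

# Approximate 95% confidence interval, assuming a Normal sampling
# distribution (a t-multiplier would be wider for a sample this small).
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(f"mean = {mean:.1f}, SD = {sd:.1f}, SE = {se:.1f}")
print(f"approx. 95% CI: {ci[0]:.1f} to {ci[1]:.1f}")
```

The SD describes the patients, while the SE describes the estimate: collecting more data shrinks the SE but leaves the SD roughly unchanged.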

### standardisation^{2}

A statistical method used to compare rates in different populations. The rationale for standardisation is the potential for confounding that may lead to biased results. Standardisation is usually performed to adjust for different age and sex distributions in populations being compared. There are two methods: direct and indirect. Direct standardisation is used when studying large populations, and it involves the calculation of standardised event rates (commonly age standardised). These are calculated by applying the age specific rates observed in the study population (for example, the population in a particular country, region or town) to the age structure of some predetermined standard population (for example, the population in England & Wales is often used as the standard population in studies looking at different Regional Health Authorities). In the indirect method, the number of expected events in the study population is calculated, under the assumption that each age group in this population has experienced the same mortality or morbidity rates as the standard population. The ratio between observed and expected number of events produces a standardised event ratio (e.g. SMR). The study population can now be compared either to the standard population, or to another study population whose SMR has also been computed. This method is especially appropriate for the study of small study populations, since the study population's event rates for the different age groups will not be estimated with enough precision (2). In these cases it is preferable to work with the event rates for the standard population.
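
The two methods described above can be sketched side by side. All populations, death counts and rates below are invented, and three age groups are used only to keep the arithmetic visible.

```python
# Hypothetical data for three age groups (0-44, 45-64, 65+).
study_pop = [5000, 3000, 2000]          # study population by age group
study_deaths = [5, 30, 120]             # observed deaths in study population
standard_pop = [60000, 25000, 15000]    # standard population by age group
standard_rates = [0.001, 0.008, 0.050]  # death rates in standard population

# Direct method: apply the study population's age-specific rates to the
# age structure of the standard population.
study_rates = [dth / pop for dth, pop in zip(study_deaths, study_pop)]
expected_in_standard = sum(r * p for r, p in zip(study_rates, standard_pop))
direct_rate = expected_in_standard / sum(standard_pop)

# Indirect method: apply the standard population's rates to the study
# population to get expected deaths, then form the SMR.
expected = sum(r * p for r, p in zip(standard_rates, study_pop))
smr = sum(study_deaths) / expected * 100

print(f"directly standardised rate = {direct_rate * 1000:.1f} per 1000")
print(f"SMR = {smr:.0f}")
```

An SMR above 100, as here, would suggest more deaths in the study population than expected from the standard rates.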

### standardised event rate^{2}

The adjusted mortality or morbidity rate (commonly age and sex-adjusted) obtained using direct standardisation methods.

### standardised mean difference^{1}

The difference between two means divided by an estimate of the within-group standard deviation. When an outcome (such as pain) is measured in a variety of ways across studies (using different scales) it may not be possible directly to compare or combine study results in a systematic review. By expressing the effects as a standardised value the results can be combined since they have no units. Standardised mean differences are sometimes referred to as a d index.
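
A minimal sketch of the calculation is shown below. It assumes the pooled within-group standard deviation as the divisor, which is one common choice among several; the pain scores are invented.

```python
import math
import statistics


def standardised_mean_difference(group1, group2):
    """Difference in means divided by the pooled within-group SD."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


# Hypothetical pain scores from one study (0-10 scale).
treated = [3, 4, 2, 5, 4, 3]
control = [5, 6, 4, 7, 6, 5]
smd = standardised_mean_difference(treated, control)
print(round(smd, 2))
```

Rescaling both groups (say, to a 0-100 scale) leaves the result unchanged, which is what allows studies using different scales to be combined.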

### statistical inference

See statistical tests.

### statistical power^{1}

The probability that the null hypothesis will be rejected if it is indeed false. In studies of the effectiveness of healthcare interventions, power is a measure of the certainty of avoiding a false negative conclusion that an intervention is not effective when in truth it is effective. The power of a study is determined by how large it is (the number of participants), the number of events (e.g. strokes) or the degree of variation in a continuous outcome (such as weight), how small an effect one believes is important (i.e. the smallest difference in outcomes between the intervention and the control groups that is considered to be important), and how certain one wants to be of avoiding a false positive conclusion (i.e. the cut-off that is used for statistical significance).

### statistical significance^{1, 2}

An estimate of the probability of an association (effect) as large or larger than what is observed in a study occurring by chance, usually expressed as a P-value. It is generally assumed that a result from a statistical test is statistically significant when the P-value associated with the same result is below a pre-determined (but arbitrary) cut off point, conventionally set at P=0.05. For example, a P-value of 0.049 for a risk difference of 10% means that there is less than a one in 20 (0.05) chance of an association that is as large or larger having occurred by chance, and it could be said that the results are "statistically significant" at P = 0.05. The cut-off for statistical significance is usually taken at 0.05, but sometimes at 0.01 or 0.10. These cut-offs are arbitrary and have no specific importance. Although it is often done, it is inappropriate to interpret the results of a study differently according to whether the P-value is, say, 0.055 or 0.045 (which are quite similar values, not diametrically opposed ones). For correct interpretation, it is good practice to avoid expressions such as P<0.05 or "NS" when reporting the results of a test, and to quote exact P-values instead. In addition, confidence intervals should always be obtained for a better assessment of the clinical significance of the results.

### statistical tests^{2}

Tests performed with the purpose of assessing the plausibility of a given hypothesis. These hypotheses stem from questions such as "are smokers at greater risk of having lung cancer?" and "is drug A better than drug B in treating depression?" etc. The test will assess the compatibility of the data with the Null Hypothesis. A P value is produced.

### statistically significant difference^{1}

The size of difference in clinical effectiveness (or other outcome) between groups which would be unlikely to have been observed if, in truth, there were no difference. The crucial word here is unlikely - this is described in terms of probability, and the conventional definition of unlikely is a probability of 5% (0.05). So, if a difference between groups is statistically significant at p=0.05, this means that there is only a 5% chance, or a probability of 0.05, of obtaining a difference at least as large as the one observed if there were no difference between the groups.

### stepwise regression^{2}

A method of selection of variables to be included as predictors in multiple regression models. This can be carried out as forward or backward selection, and most statistical packages will perform the procedure in an automated way. The rationale behind it is the need to find predictors that relate independently to the outcome variable and to simplify explanatory or prediction models, thus avoiding having highly correlated predictor variables in the same model. When researchers have collected information on several potentially explanatory variables, they may start by finding which of these is most strongly associated with the response variable (forward selection). The residuals resulting from fitting a model with just this one variable are then correlated with the other predictor variables in turn, the one most strongly correlated with the residuals being added to the model. These steps are repeated until no more variables are found to make a statistically significant contribution to the model. In backward elimination all variables are included and subsequently dropped from the model if found to make no contribution to it. Best subsets regression is possibly a better alternative to stepwise methods, but fewer statistical packages perform this method. Because models are developed on the basis of data observed in a particular sample they need to be validated against an independent set of data, or against a random subset of the study data (not used to develop the model).

### strata^{2}

Levels of a categorical variable (or categorised quantitative variable). Each stratum corresponds to a single level or to a combination of levels of two or more factors.

### stratification^{2}

Computation of estimates or statistical tests for each stratum of a classifying variable. The rationale for stratification is confounding. Results from each stratum are summarised to produce single estimates or single test statistics across all strata. The Mantel-Haenszel chi squared test and Mantel-Haenszel estimates are methods used to obtain overall tests of significance and to pool estimates across strata.
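
As an illustration of pooling across strata, the sketch below computes a Mantel-Haenszel pooled odds ratio from two invented 2x2 tables. The cell layout described in the comments is an assumption of this sketch and differs from the layout used for the diagnostic test formulas elsewhere in this glossary.

```python
# Each stratum is a hypothetical 2x2 table (a, b, c, d):
#   a = exposed with event,    b = exposed without event,
#   c = unexposed with event,  d = unexposed without event.
strata = [
    (10, 90, 5, 95),    # e.g. younger participants
    (30, 70, 20, 80),   # e.g. older participants
]


def mantel_haenszel_odds_ratio(tables):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


for a, b, c, d in strata:
    print(f"stratum odds ratio = {(a * d) / (b * c):.2f}")
pooled = mantel_haenszel_odds_ratio(strata)
print(f"Mantel-Haenszel pooled odds ratio = {pooled:.2f}")
```

The pooled estimate lies between the stratum-specific odds ratios, weighted towards the stratum that carries more information.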

### stratified random sampling

Aims to produce a sample that is representative of all strata in a given population. Usually, this is done by choosing the same proportion of individuals from each stratum so that the structure in the population is replicated in the sample. The population is first divided into strata and random sampling is conducted within each stratum (see proportionate stratified sampling and disproportionate stratified sampling)

### stratified randomisation^{1}

In any randomised trial it is desirable that the comparison groups should be as similar as possible as regards participant characteristics that might influence the response to the intervention. Stratified randomisation is used to ensure that equal numbers of participants with a characteristic thought to affect prognosis or response to the intervention will be allocated to each comparison group. For example, in a trial of women with breast cancer, it may be important to have similar numbers of pre-menopausal and post-menopausal women in each comparison group. Stratified randomisation could be used to allocate equal numbers of pre- and post-menopausal women to each treatment group. Stratified randomisation is performed either by performing separate randomisation (often using random permuted blocks) for each stratum, or by using minimisation.
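
A rough sketch of stratified randomisation with random permuted blocks follows. The function names, the block size of 4, the fixed seed and the participant list are all assumptions chosen for illustration; a real trial would use a validated and concealed allocation system.

```python
import random


def permuted_block(treatments, block_size, rng):
    """Return one block with each treatment repeated equally, shuffled.

    block_size is assumed to be a multiple of the number of treatments.
    """
    block = list(treatments) * (block_size // len(treatments))
    rng.shuffle(block)
    return block


def allocate(participants, treatments=("A", "B"), block_size=4, seed=0):
    """Stratified randomisation: a separate block sequence per stratum."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    pending = {}               # stratum -> allocations left in current block
    allocation = {}
    for pid, stratum in participants:
        if not pending.get(stratum):
            pending[stratum] = permuted_block(treatments, block_size, rng)
        allocation[pid] = pending[stratum].pop()
    return allocation


# Hypothetical participants in order of entry, with menopausal status.
participants = [(i, "pre" if i % 3 else "post") for i in range(1, 25)]
allocation = allocate(participants)

for stratum in ("pre", "post"):
    arms = [allocation[pid] for pid, s in participants if s == stratum]
    print(stratum, {t: arms.count(t) for t in ("A", "B")})
```

Because every completed block contains each treatment equally often, the two arms stay balanced within each stratum whatever order participants arrive in.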

### strength of evidence

The confidence that you should attach to a study result. This depends on 2 things: (a) the size of difference observed between groups (usually this will be reflected in the 'p' value, increasingly small p values corresponding to stronger evidence) and (b) the rigour of the method used. If the rigour of the method used is poor (as may be the case for a descriptive or observational study), you can't exclude the possibility that the observed result arises from bias - unless the observed difference is huge! So these 2 things need to be considered together. They also need to be considered when choosing a research method for a study - if you suspect that the effect you are investigating is huge, then you may be able to convince people of its importance using a less rigorous study. Conversely, if you suspect that the effect of interest may be small, you will probably need to use an RCT to convince people of its importance. (Case studies/series of the clinical effectiveness of penicillin are the classic example of what constitutes a huge effect. No formal comparison was ever carried out, but before penicillin was available most patients with severe injuries died from infection, while most of those who were treated with penicillin survived.)

### structured questionnaire

A questionnaire with a uniform set of questions. Self-completion questionnaires must be structured. Structured questionnaires may be used with interviews and interviewers are normally only allowed to re-phrase or interpret questions within tightly defined limits. Questions are often closed-ended, but may be open-ended.

### study design^{2}

Chosen method of collecting the information necessary to answer a particular research question. It involves decisions on whether to intervene actively (clinical trial) or simply describe what is observed (observational study), on the timing for collecting information on exposure and outcome (follow-up or case-control study), on the choice of controls (parallel or crossover design), the required sample size etc.

### study validity

See validity.

### subgroup analyses^{2}

Analyses of subsets of data, composed of subjects with specific characteristics (e.g., females, elderly, etc.), with a view to assessing the treatment effects for that particular subgroup, or to making comparisons with other subgroups. Although there is a justifiable clinical interest in doing so, such analyses are seldom carried out in an appropriate way. In particular, the issue of multiple significance testing should be considered at the planning stages. Also, the comparison of subgroups based on the comparison of P-values (obtained from within-group analyses) should be discouraged in favour of correctly testing for interaction (i.e., interaction between the 'treatment' and the characteristic defining the 'subgroups').

### summary measures^{2}

Or summary statistics, such as means, proportions, standard deviation etc., which summarise the information contained in several data values with a single value. Summary measures are frequently used in the analysis of repeated measurements. In this context, the choice of summary will depend on the way the variable of interest changes with time.

### surveillance bias

See information bias.

### surrogate endpoints^{1}

Outcome measures that are not of direct practical importance but are believed to reflect outcomes that are important; for example, blood pressure is not directly important to patients but it is often used as an outcome in clinical trials because it is a risk factor for stroke and heart attacks. Surrogate endpoints are often physiological or biochemical markers that can be relatively quickly and easily measured, and that are taken as being predictive of important clinical outcomes. They are often used when observation of clinical outcomes requires long follow-up. (synonym: intermediary outcomes; surrogate outcomes)

### surveys^{2}

Observational studies aimed at describing one or more characteristics of a given population. These can be the prevalence of a disease or the average value of a given measurement. Surveys are usually conducted by studying a cross section of the target population. Random sampling is of paramount importance in the conduct of surveys.

### survival analysis^{2}

Survival studies (where the outcome may be death or any other event of interest) are usually concerned with predicting length of survival given a number of characteristics or prognostic factors or with comparing the survival experiences of two or more groups of individuals. Censoring occurs frequently in this type of study. Duration of follow-up may also be different from subject to subject. Thus, methods used for proportions (e.g. "how many people died after 3 years?") or methods for quantitative data (e.g. "what was the mean length of survival?") should not be used. The log rank test is used for making comparisons between groups. In addition, Cox regression produces predictive models. Other methods of analysis include the construction of life tables and survival curves (Kaplan-Meier method). When planning a survival study, the sample size required depends upon the number of events and the rate at which they are expected to occur in the accrual and follow-up periods.
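The Kaplan-Meier method mentioned above can be sketched in a few lines. This is a minimal illustration with made-up follow-up times, not a replacement for a statistical package; the function name and the data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the proportion of those at risk who survive time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored subjects leave the risk set after t
    return curve


# Hypothetical data: the subject followed to time 2 is censored
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

The censored subject contributes to the number at risk up to time 2 but never counts as an event, which is why simple proportions would misrepresent these data.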

### systematic error

See bias.

### systematic random sampling

A sample in which every kth unit is selected. The starting point should be selected randomly from the first k units.
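A minimal sketch of this procedure, assuming the sampling frame is held as an ordered list (the function name and the population of 100 numbered units are illustrative):

```python
import random


def systematic_sample(units, k, rng=None):
    """Select every kth unit, starting at a random point within the first k units."""
    rng = rng or random.Random()
    start = rng.randrange(k)  # random index from 0 to k - 1
    return units[start::k]


population = list(range(1, 101))  # hypothetical sampling frame of 100 units
sample = systematic_sample(population, k=10, rng=random.Random(1))
print(sample)
```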

### systematic review^{1}

A review of a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant research, and to collect and analyse data from the studies that are included in the review. The term encompasses the whole process of doing a systematic review, i.e., identification and selection of studies, assessment of their validity, and description of results. It may also include the use of special statistical methods to obtain overall single estimates (for example, of the effect of a particular treatment versus a placebo), known as meta-analysis. Publication bias is a common problem when conducting systematic reviews or overviews. See also Cochrane Review. (synonym: systematic overview)

### t test^{2}

Statistical tests used to compare the means of two groups. Paired or independent samples t tests are appropriate depending on study design. The number of degrees of freedom for the independent samples test is N-2, where N is the total sample size, n1 + n2. Both the test statistic and the corresponding degrees of freedom are referred to tables of the t distribution for assessment of statistical significance. An assumption of the independent samples t test is that the variable of interest has a Normal distribution and equal variances in the two groups being compared. Modifications to the standard test allow for unequal variances. (synonym: Student t-test).
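For illustration, the independent samples test statistic under the equal-variance assumption, together with its N-2 degrees of freedom, can be computed as below. The function name and the data are hypothetical; the result would still need to be referred to the t distribution to obtain a P value.

```python
from math import sqrt
from statistics import mean, variance


def independent_samples_t(x, y):
    """Return (t statistic, degrees of freedom) using a pooled variance."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error, df


# Hypothetical measurements in two independent groups
group_1 = [5.1, 4.9, 5.6, 5.8, 6.0]
group_2 = [4.2, 4.8, 4.4, 5.0, 4.6]
t_statistic, df = independent_samples_t(group_1, group_2)
print(t_statistic, df)
```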

### target population^{1, 2}

The total set of units (people, events, objects) for which answers are required. The population to whom the results of a given investigation (which studied a sample of subjects drawn from the population in question) are to be generalised.

### tender

An invitation to submit a proposal to carry out research in an area specified by the funding body (i.e. commissioned research). There are a number of commissioned research programmes funded directly by the NHS Executive R&D Directorate, e.g. mother and child health, health technology assessment.

### test for trend^{2}

When assessing the relationship between two categorical variables the Chi-squared test is normally used. If, however, one of the variables is a binary variable and the other is on an ordinal scale (2xc contingency table), it is of interest to compare not just the proportions but also to look for a trend, i.e. whether the proportions with a particular outcome increase or decrease linearly across levels of the ordered variable. This is called the chi squared test for trend, which has 1 degree of freedom. An example would be a cross-tabulation where chronic respiratory disease (yes/no) is the outcome and smoking (non, light smoker, moderate smoker, heavy smoker) represents exposure. The questions asked are a) is the risk of chronic respiratory disease different for different types of smokers and b) does the risk of chronic respiratory disease increase from non-smokers to heavy smokers. The chi squared trend statistic is always smaller than the standard chi squared statistic. The difference between the two statistics, on c-2 degrees of freedom, tests the departure from the assumption of linear trend (i.e., differences that are not explained by the linear trend). For quantitative data, Cuzick's test, ANOVA (with linear contrasts) or regression can all be used. Non-linear trend (U shaped curves, for example) can be assessed using regression methods.
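The partition described above can be sketched using the Cochran-Armitage form of the trend statistic. The counts are invented for the smoking example, the equally spaced scores 0, 1, 2, 3 are an assumption, and the function name is illustrative.

```python
def chi_squared_trend(events, totals, scores=None):
    """Return (overall chi squared, trend chi squared, departure from trend)
    for a 2 x c table with ordered columns.

    events -- number with the outcome in each column
    totals -- number of subjects in each column
    scores -- numeric score per column (assumed 0, 1, 2, ... if omitted)
    """
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p = sum(events) / n_total  # overall proportion with the outcome

    # Standard chi squared statistic on c - 1 degrees of freedom
    overall = sum((r - n * p) ** 2 / (n * p * (1 - p))
                  for r, n in zip(events, totals))

    # Trend statistic on 1 degree of freedom
    t = sum(x * (r - n * p) for x, r, n in zip(scores, events, totals))
    sum_nx = sum(n * x for x, n in zip(scores, totals))
    sum_nx2 = sum(n * x * x for x, n in zip(scores, totals))
    trend = t ** 2 / (p * (1 - p) * (sum_nx2 - sum_nx ** 2 / n_total))

    # The remainder, on c - 2 degrees of freedom, tests departure from linearity
    return overall, trend, overall - trend


# Hypothetical counts: non, light, moderate and heavy smokers, 100 in each group
overall, trend, departure = chi_squared_trend([10, 15, 20, 30], [100, 100, 100, 100])
print(overall, trend, departure)
```

Each statistic would then be referred to the chi squared distribution with the corresponding degrees of freedom.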

### test of association

### theoretical sampling^{3}

In qualitative research, this is a sampling method whereby the sample is selected on the basis of the theory and the needs of the emerging theory. It does not seek to be representative. It is a sampling method in which the researcher selects new cases to study according to their potential to expand on or refine the concepts and theory that have already been developed. Data collection and analysis proceed together.

### theoretical saturation^{3}

In qualitative research: this is the point at which the researcher is not discovering new information from the theoretical sampling of cases.

### thick description^{3}

In qualitative research: this refers to capturing the meanings and experiences that have occurred in a problematic situation. It reports meanings, intentions, history, biography and relevant relational, interactional and situational processes in a rich, dense and detailed manner. It creates the conditions for interpretation and understanding, in contrast to *thin description*, which is factual.

### thin description^{3}

In qualitative research: a description lacking detail; a simple reporting of acts, independent of intentions or the circumstances that organise an action; a gloss.

### theme^{3}

This is a recurring issue that emerges during the analysis of qualitative data.

### therapeutic trial

See clinical trial.

### topic guide^{3}

In qualitative research: this is a list of topics to act as an aide-memoire for a qualitative researcher when conducting an in-depth interview or focus group. Qualitative researchers would normally use a topic guide rather than a structured questionnaire.

### transferability^{3}

In qualitative research: the extent to which a qualitative study is transferable to a similar setting elsewhere i.e. transferability is equivalent to generalisability. It is often argued that no claims can be made about the applicability of the findings of a qualitative study to other settings. If other researchers wish to generalise from a study to other situations, the onus must be on them rather than the original researcher to demonstrate a study's applicability elsewhere.

### transformations^{2}

Of data which are not Normally distributed can often be performed so that this assumption can be satisfied. Parametric methods of analysis can then be used on the transformed data. In particular, data with a positive skew may benefit from a logarithmic transformation. Geometric means are calculated after a log transformation, by back transforming the arithmetic mean obtained on the log values. Another common transformation is the logit, in the context of logistic regression.
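As a small illustration of back transformation, the geometric mean can be obtained by averaging the logged values and exponentiating the result. The function name and the data are made up for the example.

```python
from math import exp, log
from statistics import mean


def geometric_mean(values):
    """Back transform the arithmetic mean of the log values."""
    log_values = [log(v) for v in values]  # requires strictly positive data
    return exp(mean(log_values))


skewed = [1, 10, 100]  # hypothetical positively skewed measurements
print(mean(skewed), geometric_mean(skewed))
```

For positively skewed data such as these, the geometric mean is pulled much less towards the largest value than the arithmetic mean.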

### treatment period interaction^{2}

In the context of crossover trials, this type of interaction occurs when the difference observed between any two treatments varies, depending on whether the comparison is made in the first or in the second period of the trial. This is usually due to the effect of one of the treatments given in the first period being carried over into the second period. Thus, in planning crossover trials, it is important to allow for sufficiently long washout periods.

### triangulation^{3}

In qualitative research: the term used in a research context to describe the use of a variety of data sources or methods to examine a specific phenomenon either simultaneously or sequentially in order to produce a more accurate account of the phenomenon under investigation.

### triple blind^{1}

An expression that is sometimes used to indicate that knowledge of which study participants are in which comparison group is kept secret from the statistician doing the analysis as well as from the study participants and investigators (outcome assessors). See also blinding, single blind, double blind. (synonym: triple masked)

### two-sided test^{2}

A significance test which explores both alternatives to the Null Hypothesis. For example, if making a comparison between the means of two groups, the Null Hypothesis is that the means are equal. The Alternative Hypothesis, in contrast to that of a one-sided test, is that the mean in one of the groups can be either greater or smaller than the mean in the other group.

### two-way ANOVA^{2}

Analysis of variance of data classified according to two factors or characteristics (e.g. ethnic group and gender). Here, the total sum of squares is partitioned between main effects (the factors), and residual. When measurements (e.g. blood pressure) are replicated for each subject, it is also possible to check whether there is an interaction between the two factors.

### two-way table

Contingency table in which observations are classified by two variables. The simplest case has two rows and two columns (i.e. 4 cells).

### type I error

See P value. The probability of rejecting a true Null Hypothesis.

### type II error^{2}

The probability of failing to reject the Null Hypothesis when the latter is false. This probability becomes smaller with increasing sample size. The greater the probability of a type II error, the weaker the power of a study to detect differences as statistically significant when such differences exist.

### unit of allocation^{1}

The entity that is assigned to different comparison groups in a trial. Most commonly, individuals are allocated, but in some trials people are assigned to the intervention and control groups in groups to avoid contamination or for convenience; for example, practices, hospitals or communities can be allocated. See unit of analysis error.

### unit of analysis error^{1}

In some studies people are allocated in groups instead of individually (e.g. by practice, by hospital or by community). Often when this is done the unit of allocation is different from the unit of analysis, i.e. people are allocated by groups and analysed as though they had been allocated individually. This is sometimes called a unit of analysis error. Effectively, using individuals as the unit of analysis when groups of people are allocated increases the power of the studies by increasing the degrees of freedom. This can result in overly narrow confidence intervals and false positive conclusions that the intervention had an effect when in truth there is greater uncertainty than what is reflected by the P-value. In the context of a review, it can result in studies having narrower confidence intervals and receiving more weight than is appropriate.

### units

Individual members of the target population in a survey. Units may be people, events, objects etc.

### utility^{1}

In economic and decision analysis, the desirability of an outcome, usually expressed as being between zero and one (e.g. death typically has a utility value of zero and a full healthy life has a value of one).

### validity^{1, 2, 3}

Literally, the truth of a study - hence, its soundness or rigour. A study is valid if the way it is designed and carried out means that the results are unbiased - i.e. it gives you a true estimate of clinical effectiveness. (Note that the study gives a true estimate, but not the truth, because of sampling error - see confidence interval above.) The term "validity" is also used in the context of a measure successfully assessing what it sets out to measure, usually accompanied by a qualifying word or phrase; for example, expressions such as "construct validity", "content validity" and "criterion validity" are used. The expression "internal validity" is sometimes used to distinguish validity (the extent to which the observed effects are true for the people in a study) from external validity or generalisability (the extent to which the effects observed in a study truly reflect what can be expected in a target population beyond the people included in the study). See also methodological quality, random error. (synonym: internal validity). In the context of clinical measurement, this term refers to whether a particular measurement does in fact measure the characteristic which is of interest (for example, does forced expiratory volume in 1 second reflect lung function?). A valid measurement must be accurate and reliable in order to be useful. However, these are necessary but not sufficient conditions for validity. Validity is also used to refer to a measurement or assessment that is not biased. In surveys validity is achieved mainly by random sampling, and in clinical trials by randomisation. Randomisation ensures the internal validity of the results, whereas the composition of the study sample determines the generalisability or external validity of the results. In diagnostic tests, validity is the soundness or rigour of a study. A study is valid if the way it is designed and carried out means that the results are unbiased - that is, it gives you a 'true' estimate.
In qualitative research: validity refers to the representativeness of the sample chosen in relation to the population under investigation. How applicable are the findings to those outside the location of the study (external validity), and is the researcher faithfully representing the views of those studied (internal validity)? These are some of the issues raised by participant observation, including how reliable and replicable it is. Yet for exponents the key issue is how to get close enough to the group studied to understand their meanings properly, without changing their behaviour.

### variability^{2}

Variability is present when differences are observed among different individuals or within the same subject, with respect to any characteristic or feature which can be assessed or measured. The main purpose of statistics is to unravel underlying patterns which may be obscured by natural and random variation. Commonly used measures of variability or spread include the standard deviation, variance, range and interquartile range, among others.

### variable^{1}

Any quantity that varies. A factor that can have different values.

### variance^{2}

In the context of quantitative measurements, it is a measure of the spread, variability, or variation shown by a set of observations, defined by the sum of the squares of deviations from the mean, divided by the number of degrees of freedom in the set of observations. The square root of the variance is the standard deviation.
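The definition can be written out directly, taking the degrees of freedom for a sample of n observations to be n - 1. The function name and the data are illustrative.

```python
from math import sqrt
from statistics import mean


def sample_variance(observations):
    """Sum of squared deviations from the mean, divided by n - 1."""
    centre = mean(observations)
    sum_of_squares = sum((x - centre) ** 2 for x in observations)
    return sum_of_squares / (len(observations) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
var = sample_variance(data)
sd = sqrt(var)                   # the standard deviation
print(var, sd)
```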

### Venn diagram

A pictorial presentation of the extent to which two or more quantities or concepts are mutually inclusive and mutually exclusive.

### volunteer bias^{2}

A type of bias which occurs particularly in cross-sectional studies, when it is left to participants to provide the information being collected. For example, in a study where questionnaires are sent to all residents of a particular area, or to all patients registered with a given General Practice, some people will return the study questionnaires (responders or volunteers) and some people will not (non-responders). Studies have shown volunteers to be different from non-responders, in terms of demographic characteristics and risk factors for disease (and therefore, likely outcomes). Thus, in this type of study, non-response should be kept at very low levels.

### wash out period^{2}

In the context of crossover trials, it refers to the period of time allowed between two consecutive treatments, to prevent the effect of treatments given in one period being carried over into the next period. The effect of the treatments given in the second period can then be assessed independently, without contamination. Washout periods are usually necessary because of the possibility that the intervention administered first can affect the outcome variable for some time after treatment ceases. A run-in period before a trial starts is sometimes called a washout period if treatments that participants were using before entering the trial are discontinued.

### weighted least squares regression^{1}

In meta-analysis, a meta-regression technique for estimating the parameters of a multiple regression model, wherein each study's contribution to the sum of products of the measured variables (study characteristics) is weighted by the precision of that study's estimate of effect.

### weighted mean difference^{1}

In meta-analysis, a method used to combine measures on continuous scales (such as weight), where the mean, standard deviation and sample size in each group are known. The weight given to each study (i.e. how much influence each study has on the overall results of the meta-analysis) is determined by the precision of its estimate of effect and, in the statistical software in RevMan and CDSR, is equal to the inverse of the variance. This method assumes that all of the trials have measured the outcome on the same scale. See also standardised mean difference.
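A minimal sketch of inverse-variance weighting for mean differences follows, assuming a fixed effect model and the usual variance of a difference between two independent means. The study values are invented and the function name is illustrative; it is not intended to reproduce any particular software.

```python
from math import sqrt


def weighted_mean_difference(studies):
    """Pool mean differences with inverse-variance weights.

    studies -- list of ((mean, sd, n) for treatment, (mean, sd, n) for control)
    Returns (pooled mean difference, standard error of the pooled estimate).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for (m1, sd1, n1), (m2, sd2, n2) in studies:
        difference = m1 - m2
        var = sd1 ** 2 / n1 + sd2 ** 2 / n2  # variance of the mean difference
        weight = 1 / var                      # more precise studies weigh more
        weighted_sum += weight * difference
        total_weight += weight
    return weighted_sum / total_weight, sqrt(1 / total_weight)


# Two hypothetical trials measuring body weight in kg
studies = [
    ((70.0, 10.0, 50), (73.0, 10.0, 50)),  # mean difference -3, larger trial
    ((68.0, 12.0, 30), (72.0, 12.0, 30)),  # mean difference -4, smaller trial
]
pooled, se = weighted_mean_difference(studies)
print(pooled, se)
```

The pooled estimate lies between the two study results and closer to that of the larger, more precise trial.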

### Wilcoxon matched pairs signed rank test^{2}

A non-parametric significance test used to compare paired ordinal or interval/ratio variables when the assumption of Normality for the paired t test cannot be met.

### Wilcoxon rank sum test^{2}

A significance test which has the same purpose and is mathematically equivalent to the Mann Whitney U test.

### withdrawals^{2}

In the context of clinical trials, withdrawals are subjects who do not follow the trial protocol, either because their clinicians have decided to remove them from the trial, or because the patients themselves choose to drop out (possibly due to factors associated with the intervention). 'Crossovers' are another example of protocol violation. Intention to treat analysis minimises the potential bias that arises from these situations.

### x variable^{2}

Also termed independent, explanatory or predictor variable. In scatter diagrams it is plotted on the horizontal axis (abscissa).

### y variable

Also termed dependent, response or outcome variable. In scatter diagrams it is plotted on the vertical axis (ordinate).

### z scores^{2}

Measurements (e.g. height) which are expressed in units of standard deviation (SD). For example, if the mean height of a group of people is 172 cm with SD 10 cm, a person measuring 182 cm has a z score of 1 (i.e., 1 SD away from the mean). When the mean is subtracted from individual measurements that follow a Normal distribution, and the result divided by the SD, the measurements are converted into a distribution with mean 0 and SD 1: z score = (observation - mean)/SD
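The height example can be checked with a few lines of code. The function names and the list of heights are illustrative; the sample standard deviation is used when standardising the list.

```python
from statistics import mean, stdev


def z_score(observation, mean_value, sd):
    """z score = (observation - mean) / SD"""
    return (observation - mean_value) / sd


def standardise(values):
    """Convert a set of measurements to z scores using their own mean and SD."""
    m, s = mean(values), stdev(values)
    return [z_score(v, m, s) for v in values]


print(z_score(182, 172, 10))  # the example from the definition

heights = [160, 168, 172, 175, 185]  # hypothetical heights in cm
z_values = standardise(heights)
print(mean(z_values), stdev(z_values))
```

After standardising, the values have mean 0 and SD 1, apart from rounding error.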

### z test

A significance test which is used for comparing means or proportions between two groups.