Is the Research Believable?
The issues around determining the validity of studies differ from those for syntheses. Therefore, you need to be clear about what you are evaluating. If you have not yet done so, read the section on Classification of Medical Research to delineate these types of research, and then, depending on what you are evaluating, read the individual section on Studies or Syntheses below.
For studies we first need to evaluate what the intervention is compared against. Ideally, an intervention should be compared against another intervention, or at least placebo, to evaluate whether it works. If no comparison group is included (such as in case reports or case series) one cannot be sure whether the findings are really due to the intervention or the natural history of the disease.
If a comparison group is included we next eliminate other factors that could explain the findings, in particular chance, bias or confounding. If these explanations are eliminated the findings are believable and more likely due to the intervention itself.
What we want to know is whether the reported differences or lack of differences are likely due to chance. This is where statistics are helpful. However, few of us are statisticians. At the very least, understanding the parameters of p-value and power helps determine whether findings are due to chance.
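As a rough illustration of where a p-value comes from, the sketch below compares the proportion of patients who improved in two study arms using a two-proportion z-test. This normal approximation is only one of several tests a study might use. The function name `two_proportion_p_value` and the patient counts are hypothetical and are not drawn from any study.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference between two proportions (z-test)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the assumption that the arms do not differ
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical trial: 30 of 100 improved on the drug, 15 of 100 on placebo
p = two_proportion_p_value(30, 100, 15, 100)
print(f"p-value = {p:.4f}, significant at 0.05: {p < 0.05}")
```

With these made-up counts the p-value falls below 0.05. The same 15-point difference in a much smaller trial might not, which is where the question of power comes in.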
In brief, when one sees a finding, it should be associated with a statistic called a p-value, which represents the probability of seeing a difference at least as large as the one reported if chance alone were at work. It is generally accepted that when the p-value is < 0.05 the result is considered statistically significant. This means one agent is different from another (i.e., either more effective, safer or better tolerated) and that this difference is unlikely to be due to chance (i.e., in simplest terms, a difference this large would arise by chance alone less than 5% of the time).
Bias happens in the design or implementation of the study, making the findings look bigger or smaller than they really are.
Bias can occur in 5 main areas of a study:
When subjects are studied they are assigned to or observed in different groups. Thus, the first question relating to bias is, “Are the study groups well randomized?” In other words, did the allocation of individuals to their respective groups result in their having similar characteristics? You evaluate this by looking at the table reporting the demographic information of participants in the study (usually presented in Table 1). If the groups appear to be dissimilar, for example in their disease severity, the findings could be biased by these differences rather than being due to the intervention itself.
Just because the groups are randomized, this doesn’t mean they will still be equal when the study is analyzed. Patients drop out of studies. Therefore, we next ask, “How many participants dropped out of each arm of the study and were dropouts sicker in one arm than the other?” In general, if more than 20% of the study participants drop out, this raises concern about bias, especially if the dropouts in one arm of the study are sicker than those in the other, which could result in the groups ultimately having different severity of disease when the study is analyzed.
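The dropout arithmetic can be sketched as a small check. The helper name `dropout_report`, the arm names and the participant counts below are hypothetical. The 20% figure is the rule of thumb from the text, applied here both overall and per arm as an assumption.

```python
def dropout_report(arms, threshold=0.20):
    """Summarize dropout per arm and overall.

    arms maps an arm name to (number randomized, number who completed).
    """
    report = {}
    total_randomized = 0
    total_completed = 0
    for name, (randomized, completed) in arms.items():
        rate = (randomized - completed) / randomized
        report[name] = {"dropout_rate": rate, "exceeds_threshold": rate > threshold}
        total_randomized += randomized
        total_completed += completed
    overall = (total_randomized - total_completed) / total_randomized
    report["overall"] = {"dropout_rate": overall, "exceeds_threshold": overall > threshold}
    return report

# Hypothetical trial: 200 randomized to each arm
example = dropout_report({"treatment": (200, 150), "placebo": (200, 180)})
for arm, stats in example.items():
    print(arm, f"{stats['dropout_rate']:.1%}", stats["exceeds_threshold"])
```

In this made-up example the overall dropout stays under 20% while the treatment arm alone exceeds it, so per-arm figures are worth checking. The code cannot tell you whether the dropouts were sicker in one arm; that still has to be read from the paper.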
Here we look at issues around how the interventions were given and taken in each group. The issue of “blinding” is important here. In “double-blinded” studies, neither the patient nor the researcher knows what intervention was given. In “single-blinded” studies the patient does not know what was given but the researcher does, which can lead to bias. Thus, with single-blinded studies we need to ask, “Were there any other interventions (co-interventions) given to a particular study group?” For example, did one group receive additional education around managing their disease or more intensive coaching on how to take their medications? Another question we need to ask is, “What was the compliance and was it equal in each study group?” If more participants in one group took their medication than in the other, this would also bias the results.
Again, the issue of blinding is relevant here. In single-blinded studies we ask, “Were the outcomes measured any differently in one study group than the other?” The measurement of outcomes is not always clear-cut and often relies on the interpretation of the researchers. In single-blinded studies, researchers’ knowledge of who received which intervention can sway their interpretation of outcomes and bias the findings. Double-blinding prevents such bias here.
Here we are somewhat at the mercy of the statisticians. Unless we have a strong background in statistics, it will be difficult to determine whether an incorrect approach was used to analyze the data. This is one of the reasons to see if the study was published in a reputable journal, where statisticians are often asked to review the study before it is published to ensure it was analyzed properly. However, one question that is important, especially in pharmaceutical industry-sponsored trials, is, “Was the study triple-blinded?” Here we want to make sure that not only did patients and researchers not know to which groups the participants were assigned, but also that the statisticians analyzing the data were unaware. This decreases the likelihood of post-study analytic decisions being made that can bias the findings.
As you identify each of the above biases it’s also important to assess its direction of effect. To determine this you ask, “In the absence of the bias would the findings be stronger or weaker than those reported?” In doing so you may discover that, rather than undermining the study, the bias makes the reported findings even more compelling.
Unlike bias, which makes the result look bigger or smaller than it really is, in confounding the result is real. It’s just that the confounder, another factor that is associated with the intervention and is itself a cause of the outcome, is what really explains the findings.
For example, imagine a study on whether there is a relationship between carrying matches and lung cancer, and the study found that individuals who carried matches were 18 times more likely to have lung cancer than those who did not. Assuming the result is statistically significant and unlikely due to chance and that bias did not affect the finding, we can assume this finding is indeed true. However, in this case, something else is clearly responsible for the outcome. Smoking tobacco, which is associated with carrying matches and the real cause of the lung cancer, is the confounder.
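The matches example can be made concrete by comparing the crude association with the association inside each smoking stratum. The counts below are invented purely for illustration: they do not come from any study and are not meant to reproduce the 18-fold figure. The function names are hypothetical.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# Invented counts: (lung cancer cases, people) by match carrying, within smoking strata
strata = {
    "smokers":     {"matches": (90, 900), "no_matches": (10, 100)},
    "non_smokers": {"matches": (1, 100),  "no_matches": (9, 900)},
}

def crude_and_stratified_rr(strata):
    """Return the crude risk ratio and the risk ratio within each stratum."""
    totals = {"matches": [0, 0], "no_matches": [0, 0]}
    per_stratum = {}
    for name, groups in strata.items():
        (ce, ne), (cu, nu) = groups["matches"], groups["no_matches"]
        per_stratum[name] = risk_ratio(ce, ne, cu, nu)
        totals["matches"][0] += ce
        totals["matches"][1] += ne
        totals["no_matches"][0] += cu
        totals["no_matches"][1] += nu
    crude = risk_ratio(*totals["matches"], *totals["no_matches"])
    return crude, per_stratum

crude, per_stratum = crude_and_stratified_rr(strata)
print(f"crude risk ratio: {crude:.2f}")
for name, rr in per_stratum.items():
    print(f"{name}: risk ratio {rr:.2f}")
```

In these made-up data, match carriers have several times the lung cancer risk overall, yet among smokers alone and among non-smokers alone the risk is the same whether or not matches are carried. The crude association is real but is explained entirely by smoking.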
Researchers attempt to eliminate or adjust for confounders, in the latter case using statistics such as logistic regression or multivariate analysis. However, under this method, only known confounders can be adjusted for. There is a possibility that an unknown confounder may be lurking out there and that, rather than the intervention, this factor is the true cause of the outcome. This explains the strength of a randomized-controlled trial (RCT) over all other study designs. In a well-done RCT, where the groups are equal in their characteristics at the time of analysis, both known and unknown confounders are randomized equally to the individual study groups, thereby eliminating confounding from explaining the findings.
Thus, when it comes to evaluating for confounding we need to ask, “Was this a well-done randomized-controlled trial?” If so, confounding is unlikely. If not, we need to also ask, “Did the study adjust for all known confounders?” (Consider all the factors you can think of that could be associated with the interventions and be a cause of the outcome, and check whether these were included.) Yet, even if the study did account for all known confounders you would still be left with the possibility of unknown confounders.
In summary, if the study is unlikely to be explained by chance, bias or confounding, the findings are believable. In addition, if the study is supported by numerous other studies showing similar findings and there is a good physiologic rationale (i.e., biological plausibility) to explain what was found, this further elevates a belief in the findings.
If so, this provides some reassurance that the researchers have not simply cherry picked articles that support their position or opinion.
Ideally the search was conducted in multiple databases and using other approaches too. What makes the Cochrane Collaboration impressive is that their researchers not only search the MEDLINE database, but also others like EMBASE and CINAHL. Further, they hand search the literature and then contact the pharmaceutical industry and experts in the field to identify all published and unpublished literature. The latter is particularly important because negative trials may be less likely to be published and their absence in syntheses can result in publication bias that makes the intervention look more impressive than it really is.
A conclusion of a synthesis is only as good as the research it incorporates (i.e., garbage in, garbage out). It is therefore important to look at whether and how the researchers evaluated the strength of the studies they incorporated into the synthesis.
Researchers may differ in the way they interpret and extract information from studies. To ensure this process is reliable check to see whether multiple researchers independently extracted the data, whether there was general agreement, and if not, how they resolved their disagreements.
This is important to ensure that recommendations made are balanced and take multiple perspectives into account, for example those of healthcare providers, allied healthcare professionals and patients.
Overviews and Meta-Analyses synthesize information from studies to provide summary findings. Practice Guidelines synthesize information from studies and other syntheses to provide recommendations. The issues to consider in whether this was done appropriately are a little different for each.
For Overviews and Meta-Analyses: It’s important to ensure that the summary finding is not simply a matter of counting positive and negative studies. Rigorously done meta-analyses summarize their findings with mathematical formulas that give more weight to the largest studies. (For example, they incorporate the standard deviation into the formulas used to combine the data.)
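One common way to weight studies is fixed-effect inverse-variance pooling, in which each study's weight is the reciprocal of its squared standard error. The sketch below contrasts this with simple vote counting. It is an illustration only: a given meta-analysis may use a different model (for example random effects), and the effect sizes, standard errors and function names are hypothetical.

```python
def vote_count(studies):
    """Count studies with positive and negative effects, ignoring their size."""
    positive = sum(1 for effect, _ in studies if effect > 0)
    negative = sum(1 for effect, _ in studies if effect < 0)
    return positive, negative

def pooled_effect(studies):
    """Fixed-effect inverse-variance pooled estimate and its standard error.

    studies is a list of (effect, standard_error) pairs.
    """
    weights = [1 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    pooled_se = (1 / total) ** 0.5
    return pooled, pooled_se

# Hypothetical data: one large, precise study and two small, imprecise ones
studies = [(-0.20, 0.05), (0.30, 0.40), (0.25, 0.50)]
print("positive vs negative studies:", vote_count(studies))
pooled, pooled_se = pooled_effect(studies)
print(f"pooled effect: {pooled:.3f} (standard error {pooled_se:.3f})")
```

Here two of the three studies point in the positive direction, but the pooled estimate is negative because the large study, with its small standard error, carries almost all of the weight.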
A good meta-analytic synthesis should combine apples and apples, not apples and oranges. There are specific statistical tests that evaluate for this. For example, when you read about “tests of heterogeneity” and the authors’ comments that the findings were “homogeneous” you can feel some reassurance that the studies were sufficiently similar that they could be combined.
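Two widely used heterogeneity statistics are Cochran's Q and I². The sketch below computes both around a fixed-effect pooled estimate. The study data and the function name `heterogeneity` are hypothetical, and the authors of any particular synthesis may report different statistics.

```python
def heterogeneity(studies):
    """Return Cochran's Q, its degrees of freedom, and I-squared (percent).

    studies is a list of (effect, standard_error) pairs.
    """
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * e for w, (e, _) in zip(weights, studies)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled effect
    q = sum(w * (e - pooled) ** 2 for w, (e, _) in zip(weights, studies))
    df = len(studies) - 1
    # I-squared: share of variation beyond what chance would produce, floored at 0
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical sets of studies: one consistent, one widely scattered
similar = [(0.30, 0.10), (0.32, 0.12), (0.28, 0.11)]
dissimilar = [(0.30, 0.10), (-0.40, 0.10), (0.90, 0.10)]
for label, data in (("similar", similar), ("dissimilar", dissimilar)):
    q, df, i2 = heterogeneity(data)
    print(f"{label}: Q = {q:.2f} on {df} df, I-squared = {i2:.0f}%")
```

An I² near 0% suggests the studies are telling a consistent story, while a high value suggests apples are being combined with oranges. Turning Q into a p-value requires a chi-square distribution, which is left out here to keep the sketch dependency-free.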
Recommendations by healthcare providers and patient choices are influenced by the risk and benefit of a particular course of action. The strength of the recommendation should reflect whether the benefits outweigh the risk of harm.
It’s important to ensure that the strength of the recommendation is tied to the strength of the underlying evidence and not simply a matter of expert opinion. There are numerous rating schemes (which can make this a little confusing). In general, they all rank rigorously performed meta-analyses and randomized-controlled trials as the strongest type of evidence; observational studies, like cohort and case-control studies, as weaker; and expert opinion as the weakest type of evidence.
Once you are satisfied that what you are reading is believable, you are ready to consider what the important findings are.