CS 594: Empirical Methods in HCC
Introduction to Meta Analysis
Dr. Debaleena Chattopadhyay
Department of Computer Science
[email protected]
debaleena.com
hci.cs.uic.edu

Agenda
What is a meta-analysis?
Why study meta-analysis in HCI?
When to do a meta-analysis?
How to do a meta-analysis?

Reality check
Consider the following results:
1. Planned contrasts revealed that Flow menu (3466 ms) took significantly more time than the Finger Count menu (3095 ms), p < .001. Planned contrasts revealed that Marking menu (3646 ms) took significantly more time than the Finger Count menu (3395 ms), p < .05.
2. Planned contrasts revealed that both Flow menu (3466 ms) and Marking menu (3646 ms) took significantly more time than the Finger Count menu (3335 ms), p < .01.
How would your interpretation of the results differ in each case (about which menu is more efficient)?

Both cases are interpreted the same way: there is a high probability that the Flow menu and the Marking menu will be significantly slower than the Finger Count menu when used by a participant randomly chosen from the population.

p-value does NOT measure how much
Case 1 would NOT mean that the Flow menu is less efficient than the Marking menu relative to the Finger Count menu.

Beyond p-value
1. Planned contrasts revealed that Flow menu (3466 ms) took significantly more time than the Finger Count menu (3095 ms), p < .001, with a high effect size, d = 0.8. Planned contrasts revealed that Marking menu (3646 ms) took significantly more time than the Finger Count menu (3395 ms), p < .05, with a low effect size, d = 0.2.
2. Planned contrasts revealed that both Flow menu (3466 ms) and Marking menu (3646 ms) took significantly more time than the Finger Count menu (3335 ms), p < .01, with a low effect size, d = 0.3.
How would your interpretation of the results differ in each case (about which menu is more efficient)?

Case 1 would mean that the Flow menu is less efficient than the Marking menu relative to the Finger Count menu. More formally, the difference between FM and FCM is larger (stronger) than the difference between MM and FCM. Case 2 would not.
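To see why a p-value alone does not say how large a difference is, the minimal sketch below holds a standardized mean difference fixed and varies only the per-group sample size. It assumes two independent groups of equal size and substitutes the normal distribution for the t distribution when computing the two-sided p-value. The function name and the sample sizes are illustrative and are not taken from the menu studies above.

```python
import math

def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a two-group comparison.

    Assumes equal group sizes and equal variances, so the test
    statistic is d * sqrt(n / 2). The normal distribution stands in
    for the t distribution, which is only an approximation.
    """
    z = abs(d) * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))

# Same small effect (d = 0.2), increasingly large samples (hypothetical).
for n in (20, 200, 2000):
    print(n, approx_two_sided_p(0.2, n))
```

The effect size is identical in all three runs, yet the p-value shrinks as the sample grows, which is why the two numbers answer different questions.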
What is a meta-analysis?
Meta-analysis refers to the statistical synthesis of results from a series of studies. While the statistical procedures used in a meta-analysis can be applied to any set of data, the synthesis will be meaningful only if the studies have been collected systematically.
If the effect size is consistent across the series of studies, these procedures enable us to report that the effect is robust across the kinds of populations sampled, and also to estimate the magnitude of the effect more precisely than we could with any of the studies alone.
Meta-analyses are conducted to synthesize evidence on the effects of interventions and to support evidence-based policy or practice.

Narrative reviews to systematic reviews to meta-analysis
Prior to the 1990s, the task of combining data from multiple studies had been primarily the purview of the narrative review. An expert in a given field would read the studies that addressed a question, summarize the findings, and then arrive at a conclusion.
One limitation of the narrative review is the subjectivity inherent in this approach, coupled with the lack of transparency. A second limitation is that narrative reviews become less useful as more information becomes available.
Beginning in the mid-1980s and taking root in the 1990s, researchers in many fields have been moving away from the narrative review and adopting systematic reviews and meta-analysis.
For systematic reviews, a clear set of rules is used to search for studies, and then to determine which studies will be included in or excluded from the analysis. Not all systematic reviews are meta-analyses.
Unlike the narrative review, where reviewers implicitly assign some level of importance to each study, in meta-analysis the weights assigned to each study are based on mathematical criteria that are specified in advance.
While the reviewers and readers may still differ on the substantive meaning of the results (as they might for a primary study), the statistical analysis provides a transparent, objective, and replicable framework for this discussion.
Meta-analysis is commonly used in medicine, pharmaceutical studies, education, psychology, criminology, and business. For example, in the field of education, meta-analysis has been applied to topics such as the comparison of distance education with traditional classroom learning.
HCI is catching up.

What is a meta-analysis?
A statistical analysis which combines the results of several independent studies considered by the analyst to be combinable --- Huque, 1988
The statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings --- Glass, 1976

Meta-Analysis Presentation: Forest Plot

What is a meta-analysis?
How do we choose these studies?
How do we calculate the weight?
How do we calculate the summary estimate?

Why study meta-analysis?
What is the direction of effect?
What is the size of effect?
Is the effect consistent across studies?
What is the strength of evidence for the effect?
Why study meta-analysis in HCI?
Although HCI or HCC is not equivalent to medicine, i.e., we do not deal with life and death on a daily basis, computing technologies are becoming more integral to daily life than before. Consider the autonomous car, wearables that supplement drug therapy, or immersive technologies for improving educational outcomes. Evidence-based adoption will become more important than ever before in human-centered computing very soon.

When to do a meta-analysis?
When you want to know the strength of evidence
When you want to combine results quantitatively
When more than one study has estimated an effect
When the differences in the study characteristics are unlikely to affect the intervention effect
When the treatment effect has been measured and reported in similar ways (or when the data are available)

When not to do a meta-analysis?
A meta-analysis is only as good as the studies in it
Beware of reporting biases
Studies must address the same question, though the question can, and usually must, be broader
Mixing apples with oranges: not useful for learning about apples or oranges, although useful for learning about fruit!
An analysis of *only* your own prior studies is not a meta-analysis.

How to do a meta-analysis? Overview of steps
Frame your research question
Develop search protocol
Run search strategy
Retrieve and de-duplicate citations
Screen titles/abstracts
Conduct qualitative synthesis
Conduct meta-analyses
Write report of systematic review

How to do a meta-analysis? Broadly speaking
Identify research question
Search and identify a set of studies
Conduct qualitative synthesis
Conduct quantitative meta-analyses

Identify research question
Conceptualize
Operationalize
RQ

Type          Example question
Prevalence    What is the incidence of autonomous vehicle use by older adults in urban areas compared to rural areas?
Intervention  Are fitness trackers effective in reducing obesity?
Diagnosis     Are avatar-based screening tests effective in detecting social anxiety disorder?
Etiology      Is social media use causally associated with teen depression?

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.

Search and identify a set of studies
Ways similar studies can differ:
User population
Intervention composition/timing
Outcome definition
Experimental design and execution
Analysis

Components of well-constructed questions:
Population
Intervention
Comparison group(s)
Outcome
Time
Settings

Conduct qualitative synthesis
Document your protocol
Conduct quantitative meta-analyses
First, we work with effect sizes (not p-values) to determine whether or not the effect size is consistent across studies.
The terms treatment effects and effect sizes are used in different ways by different people.
You will need to know the effect size and the precision of the observed effect size.

How to choose an effect size?
The effect sizes from the different studies should be comparable to one another in the sense that they measure (at least approximately) the same thing.
Estimates of the effect size should be computable from the information that is likely to be reported in published research reports. That is, it should not require the reanalysis of the raw data (unless these are known to be available).
The effect size should have good technical properties. For example, its sampling distribution should be known so that variances and confidence intervals can be computed.
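As a small illustration of the last criterion: once an effect size estimate and its sampling variance are known, an approximate confidence interval follows directly. The sketch below assumes a normally distributed estimate and a 95% level (z = 1.96); the function name and the example numbers are hypothetical.

```python
import math

def confidence_interval(effect, variance, z=1.96):
    """Return (lower, upper) as effect +/- z * standard error."""
    se = math.sqrt(variance)
    return effect - z * se, effect + z * se

# Hypothetical effect size 0.5 with sampling variance 0.04.
low, high = confidence_interval(0.5, 0.04)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```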
What parameters to use?
If the summary data reported by the primary study are based on means and standard deviations in two groups, the appropriate effect size will usually be either the raw difference in means, the standardized difference in means, or the response ratio.
If the summary data are based on a binary outcome such as events and non-events in two groups, the appropriate effect size will usually be the risk ratio, the odds ratio, or the risk difference.
If the primary study reports a correlation between two variables, then the correlation coefficient itself may serve as the effect size.

Effect sizes based on means
Use Cohen's d or Hedges' g. The standardized mean difference (d or g) transforms all effect sizes to a common metric.
Factors affecting precision of the effect size:
Sample size
Study design
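The standardized mean difference and its precision can be sketched as follows. This is a minimal example for two independent groups using the pooled standard deviation, the usual large-sample variance approximation for d, and the common small-sample correction factor J for Hedges' g. The function names are illustrative, and the standard deviations and sample sizes in the usage lines are made up, since they are not reported above.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d and its approximate variance for two independent
    groups, standardized by the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    d = (mean1 - mean2) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var_d

def hedges_g(d, var_d, n1, n2):
    """Apply the small-sample correction J = 1 - 3 / (4 * df - 1)."""
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return j * d, j**2 * var_d

# Hypothetical summary data: mean task times in ms, with assumed
# standard deviations (500, 480) and group sizes (20, 20).
d, var_d = cohens_d(3466, 500, 20, 3095, 480, 20)
g, var_g = hedges_g(d, var_d, 20, 20)
print(f"d = {d:.3f} (variance {var_d:.4f})")
print(f"g = {g:.3f} (variance {var_g:.4f})")
```

Larger samples shrink the variance, so the same d estimated from more participants is a more precise estimate.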
Studies that yield more precise estimates of the effect size carry more information and are assigned more weight in the meta-analysis.

Fixed effect vs. Random Effects
A study's true effect size is the effect size in the underlying population, and is the effect size that we would observe if the study had an infinitely large sample size (and therefore no sampling error). A study's observed effect size is the effect size that is actually observed.
Under the fixed-effect model we assume that there is one true effect size (hence the term fixed effect) which underlies all the studies in the analysis, and that all differences in observed effects are due to sampling error.
Under the random-effects model we allow that the true effect could vary from study to study. For example, the effect size might be higher (or lower) in studies where the participants are older, or more educated, or healthier than in others, or when a more intensive variant of an intervention is used, and so on. The effect sizes in the studies that actually were performed are assumed to represent a random sample of these effect sizes (hence the term random effects).
Keep in mind that the summary effect is nothing more than the mean of the effect sizes, with more weight assigned to the more precise studies.

Measuring the weight for each study (RE)
tau-squared is the between-studies variance
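A minimal sketch of how inverse-variance weights and the summary effect can be computed under both models. It assumes the DerSimonian-Laird method-of-moments estimator for tau-squared, which is one common choice and may not be the one intended here. Function names and the example data are hypothetical.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted mean and the variance of that mean.
    Each study's weight is 1 / (its within-study variance)."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, 1 / total

def tau_squared_dl(effects, variances):
    """Between-studies variance (DerSimonian-Laird, an assumed choice),
    truncated at zero. Returns 0.0 when fewer than two studies exist."""
    if len(effects) < 2:
        return 0.0
    weights = [1 / v for v in variances]
    total = sum(weights)
    mean, _ = fixed_effect(effects, variances)
    q = sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w**2 for w in weights) / total
    df = len(effects) - 1
    return max(0.0, (q - df) / c)

def random_effects(effects, variances):
    """Summary effect with weights 1 / (within variance + tau-squared)."""
    tau2 = tau_squared_dl(effects, variances)
    mean, var = fixed_effect(effects, [v + tau2 for v in variances])
    return mean, var, tau2

# Hypothetical effect sizes and their within-study variances.
effects = [0.8, 0.2, 0.3]
variances = [0.04, 0.02, 0.05]
print("fixed effect  :", fixed_effect(effects, variances))
print("random effects:", random_effects(effects, variances))
```

When tau-squared is zero the two models give the same weights. As tau-squared grows, the random-effects weights become more similar across studies, so small studies count relatively more than they do under the fixed-effect model.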
Measuring the summary effect

Meta-Analysis Presentation: Forest Plot

What is a meta-analysis?
How do we choose these studies?
How do we calculate the weight?
How do we calculate the summary estimate?

More to read
Introduction to Systematic Review and Meta-Analysis