7 Experiments

By S.R. Gubitz

7.1 Introduction

Perhaps you have wondered what might have happened along a path you did not take. Or perhaps you have speculated about how differently history would turn out if some key event did not occur the way it did in reality. For example, if the Supreme Court of the United States in 2000 had allowed presidential election recount efforts to continue in Florida, might Al Gore have beaten George W. Bush to become president? Or, if you had not skipped breakfast this morning, might you feel a little less groggy right now? In practice, a path not taken is the same as one that never existed; we do not get to run history twice like a computer simulation to observe the path not taken. But, in some scientific contexts, you can observe both paths at once. This is the nature of an experiment; we get to cheat history, time, and space to observe the otherwise un-observable.

Political science finds experiments especially useful, because oftentimes the path not taken has serious political or societal ramifications. For example, what happens to voter turnout rates when political parties decide to ignore communities of color, and what might happen if they do not? In politics, many wish that they can turn back the clocks and run history twice. In political science, that is sometimes entirely possible in an experimental research design.

7.2 Background

The first recorded experiment occurred in 1835, in Nuremberg, Germany (Jamison 2019). The researchers conducting this experiment were interested in the effects of a certain homeopathic medicine: the inclusion of small amounts of salt in water. The researchers divided 100 local residents into a treatment group of 50 that received salt in a vial of water, and a control group of 50 that only received a vial of water. Participants were later examined to see the effect of the salt water on any physical ailments; the researchers found that the intervention had no effects.

It took nearly 100 years for political science to attempt its first recorded experiment (although the discipline was fairly new at the time, having just split off from history and economics). Harold Gosnell conducted an experiment around the 1924 US presidential election to test the effects of mailed postcards on voter turnout (Gosnell 1927). Gosnell sent out postcards to certain wards, randomizing which half of the ward would get the postcards and which would not. He found modest effects, setting the stage for future work on randomized get-out-the-vote (GOTV) efforts.

But Gosnell’s work was not immediately appreciated by political scientists, and it took nearly another 60 years before serious experimental work began to be taken seriously in the discipline. Shanto Iyengar, Mark Peters, and Donald Kinder invited participants to watch television news programs at the University of North Carolina, Chapel Hill (Iyengar, Peters, and Kinder 1982). What the participants did not know, however, is that they were randomly assigned to watch some newscasts that had been edited to emphasize certain stories over others. These manipulations resulted in the control group assessing the president’s performance differently, based on the issues they were exposed to in the television news program.

And from that point, experiments began steadily becoming a mainstay research method in political science. The figure below shows the number of experiments published every year in the American Political Science Review, one of the top publications in political science. As the figure shows, following Iyengar, Peters, and Kinder’s work in the 1980s, experiments started to be published in this journal at an increasing rate. However, it is important to keep in mind that experiments are not even close to being the dominant research method. If you have already read ch06, then you already know a good deal about the most dominant quantitative research method: surveys.

totals are aggregated per decade

Figure 7.1: totals are aggregated per decade

7.3 Method: setup/overview

Experiments, or RCTs (randomized control trials) as they are often called, involve the randomized assignment of individuals into one of two groups (in the most basic design): a treatment group that receives some intervention; and a control group that does not. This design allows the researcher to determine the effect of some intervention (e.g., individualized tutoring) by comparing the value of some outcome (e.g., test scores) in the treatment group to the control group that did not receive the intervention. Do note, however, that most experiments have several treatment groups and some even have several control groups depending on to what they need to make their comparison.

It is important to differentiate the random assignment necessary to conduct an experiment from the random sampling you learned about in ch06. In that chapter, it was explained that random sampling is when you randomly select individuals from some population you are interested in studying to be a part of your survey. But random assignment has nothing to do with that sampling technique. Rather, random assignment means taking your sample, however it was collected, and randomly assigning them to your treatment group(s) or control group(s). This random assignment is necessary to ensure that the treatment and control groups have nearly the same odds of being comparable to each other in terms of demographic characteristics of your overall sample. So, if you have 50 African Americans in your sample of 250 people, and you have four treatment groups and one control group, then random assignment should result in groups of near 50 people each, with 10 African Americans per group.

A consequence of this requirement that the sample be randomly assigned to experimental groups means that how the sample was collected is less important. This, again, is a deviation from what you learned about surveys in ch06, where sampling mattered a great deal. But a good experiment can rely on what is called a convenience sample, or a sample that is made up of easy to reach people. For instance, for much of the 20th century, researchers often placed newspaper advertisements to construct their samples. Nowadays, however, there are entire online services built around getting researchers participants for their experiments. But these convenience samples do not undermine the legitimacy of the experiment, unless the researchers are trying to generalize to a certain population. If you had a convenience sample of your friends and family, you would be hard pressed to say that any results from an experiment on that sample could generalize to the population of a country.

Check-in Question 1: Is “random assignment” another way of saying “random sampling” and vice versa? If not, how are they distinct from one another?

7.4 Method: detail (types of experiments)

As you can probably gather from the few examples provided so far, there are many different types of experiments. Each offer their own unique way of answering certain scientific questions that the others cannot. In the following sections, we will review four types of experiments and example from political science for each.

7.4.1 Surveys vs Survey Experiments

The growth of online survey platforms, such as Survey Monkey and Qualtrics, have resulted in a similar growth in the use of survey experiments. While you might already be familiar with surveys thanks to ch06, it is important to differentiate survey experiments. That is, survey experiments are experiments that are embedded in surveys. Within such a survey, participants answer questions and read materials just as they would in any survey. But at some point, respondents are randomly assigned to a treatment or control group.

What is advantageous about experiments embedded in a survey like this is that it is extremely easy to do. Because the surveys are often disseminated online, that means the researcher does not need a physical space where the experiment will be administered. Further, survey respondents are quite easy to obtain while providing modest compensation. Survey firms like YouGov provide samples to respond to researcher-provided surveys for a few dollars a respondent; the samples can even be constructed in ways that resemble national representativeness without true random sampling. For the more-frugal researchers, survey experiments can be administered on Amazon’s Mechanical Turk (MTurk) service, which people perform various tasks online for monetary compensation; these are usually menial tasks like testing website functions. But, in recent years, political scientists have been using MTurk to disseminate their survey experiments, as the service is far more affordable than a professional survey firm.

For all of its advantages, survey experiments suffer from the limitations of their medium; that is, a researcher can only craft treatment interventions that can be disseminated via survey materials. Oftentimes, this means text-based treatments that require the respondent to read (a burden that any undergraduate student can sympathize with); this can be troublesome when the researchers cannot prove the respondents read the treatment text, leading to false conclusions about the effectiveness (or lack thereof) of the experiment. Also, survey experiments rely on self-reported outcomes to draw their conclusions, which means that the respondent was allowed to report how they felt or thought at that time. This is problematic if there is greater incentive to not be entirely forthcoming on the survey, or if people are simply bad at assessing certain psychological states (e.g., "how angry are you right now on a scale of 1-7?").

For example, Dingding Chen, Chao-yo Cheng, and Johannes Urpelainen (Chen, Cheng, and Urpelainen 2016) study how different ways of framing renewable energy in China affected Chinese citizens’ support for such programs. They disseminated a survey online to over 2,000 Chinese citizens collected by a professional company (i.e., this was a convenience sample). Respondents were assigned to one of eight groups (six treatment, two control) that varied the argument being used to support or oppose greater investment in renewable energy in China. They measured support for greater investment on a 1-5 scale, with 5 being the highest support. The researchers find that support was at its highest when respondents were exposed to a frame that presented greater investment in renewable energy as a means of energy security, rather than as a means of combating global warming or air pollution.

While this finding is only possible thanks to an experimental design, it is also only feasible with the sample size they had (n = 2,000) because of how cheap and efficient survey experiments can be. But, we should ask whether or not the weaknesses of survey experiments harmed the study in question. For instance, since support for renewable energy was self-reported, might there have been some degree of bias toward greater support, regardless of the respondents’ true feelings? But, because this is a survey experiment, we can never be entirely sure that the respondents were truthful in their responses. But simply, survey experiments remove a great deal of control from the experimenters.

7.4.2 Laboratory Experiments

While survey experiments lack control of the environment in which the experiment takes place, laboratory experiments exercise near-complete control. These experiments take place in controlled environments, or a laboratory. Usually, for university professors, this means some room that their department or university provided for that purpose. But, oftentimes, labs can be made out of just about any room, so long as the experiment is not disturbed. In fact, recent research has tried taking mobile labs to communities that have been traditionally difficult to reach, all in order to better study and understand those communities and the people living in them (Lewis Jr 2019).

The greatest advantage of the laboratory experiment is that the researchers have a greater degree of control than in survey experiments. They can ensure that treatment is administered correctly, and that the people participating in the experiment are actually people and not automated survey takers (a serious concern for some online surveys). Further, while survey experiments are limited to mostly text-based treatment designs, laboratory experiments can be far more inventive. One of the classic reasons to conduct a laboratory experiment is because the research question requires studying some complex social interaction. A lab allows for researchers to create a physical space in which they can observe nearly real-life social interaction. And one last benefit of this method is that researchers can directly observe and record real behaviors and speech, and are not limited to self-reported attitudes in the same way survey experiments are.

That being said, the inherent problem with lab experiments is finding the people to fill the lab. As opposed to ease in which a survey can be filled out, participants in a lab experiment must actually travel to and from the lab in order to participate. This may sound trivial, but imagine the difficulty and expense for some to travel to lab sites. In some international contexts, lab experiments can be extremely costly because proper compensation must sometimes cover people’s missed wages from a day’s work. The end result of this is that lab experiments usually have rather small sample sizes, usually somewhere between 100 and 200 participants. Compare that to the thousands of participants that a survey experiment can garner for roughly the same cost and you begin to realize the issue with wanting to exercise greater control. Smaller sample sizes often mean fewer people per experimental group, which means less statistical power to calculate meaningful effects.

Again, it is worth noting that some questions are answerable in a lab setting. For example, Ismail White, Chryl Laird, and Troy Allen (White, Laird, and Allen 2014) conducted a lab experiment two months before the 2012 presidential election between Barack Obama, a Democrat, and Mitt Romney, a Republican. This experiment took place in a historically black college/university (HBCU), which gave the researchers access to a somewhat unique sample of exclusively young African American living in a Black institution of higher learning. This was a boon for the study because they were interested in seeing whether economic self-interest could undermine these young people’s partisanship. African Americans in the United States are largely members of the Democratic Party (over 80 percent), but these researchers were interested in testing the limits of this partisan loyalty.

Participants were brought to the lab and instructed after a brief interview to allocate $100 between the two candidates running for President, Obama and Romney. What they did not tell the participants (aside from the money not actually being donated) was that some were randomly assigned to a cue that told them that for every $10 they gave to the Romney campaign, the participant would receive $1 for themselves, paid in cash. This meant that participants in this condition, if they gave all $100 to Romney, thus undermining their likely Democratic Party loyalty, they would receive $10. But what made this lab experiment unique was the inclusion of another treatment group, identical to the economic incentive condition, that stipulated that their donation amounts would be made public in the university’s student newspaper, thus revealing their behavior to their peers; this is the type of social interaction that is often impractical to replicate in a survey setting. Ultimately, those in the control group donated an average of $90 to Obama, while those in the economic incentive condition’s average was $68, and those with the additional stipulation had an average of $86. In short, because of the lab setting, the researchers could leverage the presence of a Black institution in the minds of young African Americans to potentially dissuade them from giving into their economic self-interest.

Check-in Question 2: What is the primary disadvantage of conducting a lab experiment, compared to a survey experiment?

7.5 Field Experiments

Oftentimes, researchers do not have the liberty of controlling where and when they want to conduct their experiments. For instance, if you want to conduct an experiment that tries to increase Asian American voter turnout in Los Angeles, that means that you necessarily have to conduct the experiment weeks or months before the election in question. These experiments are what are called field experiments, or experiments that take place in the physical location you are interested in studying.

The biggest advantage of this sort of experiment is that sometimes they are the only option and offer a unique means of gleaning certain information about the real world. That is, these experiments are much closer to observing real world behaviors and outcomes than survey and lab experiments, in most cases. However, a serious downside of a field experiment is that they are incredibly expensive to run, even more so than a lab experiment. These experiments often involve many researchers who must be compensated, and treatment materials that often bear an additional cost than simply an online survey. For instance, even if you have disseminating a survey in the field, you either have to print it and collect those finished surveys via pre-paid mail, or bring a tablet device for participants to use there. Needless to say, the costs quickly add up and the sample sizes can vary quite a bit depending on available resources.

But in those instances where there is no other option, a field experiment can find incredible results. For example, Victoria Anne Shineman (Shineman 2018) conducted a field experiment in San Francisco, CA during a 2011 local municipal election. Shineman wanted to study how voter mobilization efforts not only increased voter turnout, but also voter knowledge. She invited 178 subjects to complete two surveys, one before the election and one afterward, in exchange for $25. It was during the first survey that Shineman randomly assigned participants to receive different types of mobilization assistance (or none at all for the control), some of which included the necessary forms to register to vote. After the election, Shineman found that not only had mobilization been increased as is typical in these GOTV experiments, but that those exposed to mobilization efforts also exhibited greater political knowledge than those in the control, as measured by the second survey conducted after the election. Without being able to go into the field like this, Shineman could not answer the question of whether mobilization effects spilled over to political knowledge, as this would be impossible to glean from just a survey or in a lab. However, it is worth noting just how expensive this study was for the researcher; the amount of resources required to conduct a field experiment is a serious disadvantage.

7.6 Natural Experiments

The final major type of experiment to review is one that is the most infrequently used by political scientists, and not for lack of trying. Natural experiments are experiments that are not exactly conducted by the researcher; that is, the randomization process necessary to call it an experiment was done by someone or something other than the researcher. This could be an "act of God," like a natural disaster’s effects on voter turnout, or a local municipality’s property tax’s effects on desegregation efforts in the local school system. No matter how the randomization happened, if it was not the researcher who did it, then it is a natural experiment.

The advantage of analyzing a natural experiment (again, it is not accurate to say one "conducts" a natural experiment) is that the outcomes observed could not be any more realistic. In social science contexts, natural experiments produce effects on real people in the real world; the stakes are at their highest. However, there are a whole host of problems that come attached to a natural experiment. First and foremost, because natural experiments are conducted by nature or some third party, that means that you have to find the natural experiments that have already happened. This entails identifying the cause, some manner of measuring who was affected by the cause, and some manner of measuring the outcomes you are interested in. If any of that information is unavailable for any reason, you cannot analyze the natural experiment. Further, and again because of the nature of natural experiments, the randomization process may not be truly random, especially when it comes to policy decisions, which are informed by political processes that are hardly random.

Consider, for example, Maimonides’ Rule and its effects on educational outcomes. Maimonides was a rabbinic scholar in the 12th century who posited that the maximum class size was 40 students per instructor, as any more than that and the single instructor would be overwhelmed. Israel adopted this informal rule and codified it in its public school system such that any class with 41 students or more received an additional instructor. Joshua Angrist and Victor Lavy (Angrist and Lavy 1999) identified this as a possible natural experiment. After all, the difference between classes of 40 and 41 students was essentially random, but it had the potential to affect educational outcomes like test scores. It stands to reason that going from a 40:1 student-teacher ratio to a 20:1 ratio is a meaningful difference. the researchers, when comparing the test scores of these classes just on the cusp of the 40-student cutoff, found that test scores were higher for the classes just over the limit who had a better student-teacher ratio.

But, let us consider the potential issues with this research design. First, we need to ask ourselves, was the assignment treatment truly random? In this case, treatment was receiving the extra teacher while only gaining a small increase in the total number of students. What would happen if certain parents were able to take advantage of Maimonides’ Rule and bend the rules of their public school to get their child into these classrooms with a better student-teacher ratio? Randomization would be broken because students being assigned to the treatment group would likely be from families from better socioeconomic backgrounds (i.e., parents capable of moving their children to advantageous classes are more than likely well to do, financially). This means that the natural experiment was not really an experiment at all and that the researchers were finding a spurious relationship, or a relationship between treatment and outcome that was better explained by some confounding factor. Indeed, the same researchers went back to replicate their research and found that recent research that tried to replicate the original study have found artifacts of such manipulation of an otherwise clever natural experiment, leading to null effects once accounted for (Angrist et al. 2017).

Check-in Question 3: What steps must be taken to conduct a natural experiment?

7.7 Advantages of Method

What is hopefully made clear in the above review of the major types of experiments is that experiments are versatile; as long as you can randomize your participants into different groups and measure outcomes, you can conduct an experiment. And perhaps the greatest advantage of experiments over other methods of social inquiry is that experiments are the best at causal inference, bar none. Because of the randomized assignment process, you can often be confident that an analysis that compares the outcomes between treatment and control groups is measuring the causal effect of the treatment (the dependent variable) on the outcome (the independent variable). This means that experiments often have very good internal validity, or answering the question of whether the independent variable is actually affected by the dependent variable and not some unseen confound. However, as noted in the section on natural experiments, this is not always the case.

In short, no other research method in this textbook is quite as good as a simple experiment at assessing causality or at achieving good internal validity. Better yet, there are no complicated statistics necessary to analyze the results of experiments, in most cases anyway; just a simple comparison of averages.

7.8 Disadvantages of Method

That being said, there are plenty of disadvantages to be aware of when it comes to experiments. While they are often seemingly easy to design, the reality is less so. A great deal of work must be taken on the front-end to ensure good construct validity, or the ability of the experiment to actually speak to the theory or research question at hand. Just because you can design an experiment easily does not mean that it is necessarily the best approach or that it will provide good evidence for your hypotheses. The study of the effects of Maimonides’ Rule on educational outcomes is a great example of a clever experimental design that, upon closer inspection, is not actually measuring the effect of this rule on test scores; rather, it is testing the effects of wealthy parents’ ability to get their children into ideal classrooms.

Further, and most importantly, a flaw of experiments is their often weak external validity, or whether the experiment’s results can be generalized beyond the case being studied. Sometimes we have to seriously worry about the artificial settings we place participants in during an experiment. Consider the modal survey experiment: how often are you really assessing your own attitudes on certain political subjects on a 1-5 scale, or reading news articles about issues you may not really care about, like oil pipelines? Or, better yet, consider the lab experiment example discussed above: how often do you go into a strange room, and donate $100 given to you by strangers among two different presidential candidates, and how often are you being given cash payouts for supporting a particular candidate over another?

Few of the activities asked of experiment participants are realistic in any sense, but some types of experiments are inherently better suited to good external validity than others. As discussed above, field experiments and natural experiments usually have much better external validity than their counterparts because the effects being measured are on actual human behavior in real-world situations. Further, recent work finds that external validity may not be a serious concern for some survey experiment designs, as some findings from artificial settings have been found to better approximate real-world equivalents (Hainmueller, Hangartner, and Yamamoto 2015). In Table 1, you can compare and contrast how each major type of experiment we reviewed performs on construct validity, internal validity, and external validity; note how across all of them, construct validity varies because it is largely incumbent on the researchers to design good experiments that speak to what they are interested in studying.

Lab Experiment Field Experiment Survey Experiment Natural Experiment
Construct Validity depends depends depends depends
Internal Validity high depends high low*
External Validity low high high high*

Note: * denotes a "maybe," as assessing these types of validity for natural experiments depends on a case-by-case basis.

Check-in Question 4: What is the difference between external and internal validity?

7.9 Broader significance/use in political science

Experiments, as you have seen throughout this chapter, are a flexible research method with some limitations. It is important to note how you will likely encounter experiments in your studies and in the real world. While experiments can vary in sample size, experiments are often only conducted once, and even when they are conducted again, it is rarely on the same sample as the first time. This means that experiments provide snapshots of political processes, results that are very likely time-bound in the moment and political situation in which they are captured. All of this means that you are unlikely to see the same experiment repeated over time. Some researchers mitigate this feature of experiments by using them to complement their other research methods conducted to answer the same question. For example, you could conduct a focus group to understand how a group of women engage with news media, and subsequently conduct an experiment to verify their stated behaviors and preferences. This means that experiments do not always need to be the only research method used in order to answer complex questions about politics.

7.10 Conclusion

We do not get to run history twice to see what might be different along a path not taken; that is just common sense. But, in some scientific contexts, we can effectively cheat history and observe both paths at once to determine the effect of some cause. This is thanks to the experiment, the best research method available to us to assess causal effects. It is a research design with as many forms as there are minds to imagine them, and with little exaggeration. However, that does not mean that experiments are always the best research method for the question at hand, and it does not mean that other research methods cannot perform better in some areas than an experiment. Experiments are deceptively easy to design and conduct, but great care must be taken in order to design a meaningful experiment that actually measures the effects it is supposed to, and can be generalized to the world outside an artificial setting like a lab or survey. What we, as scientists, are trying to do is study complex social interactions and the messy process of politics. Experiments allow us to answer some questions about those messy processes, but it is not a supplement for good theory and brilliant thinking.

7.11 Application Questions

Application Question 1:

Given what you know about the different types of experiments, what type of experiment was the first recorded experiment (the one on homeopathic medicine at the start of the chapter)?

Application Question 2:

Suppose you wanted to provide evidence that huge changes in average temperatures affected people’s perceptions about climate change. Briefly describe how you would design an experiment using one of the four types discussed in this chapter in order to do so.

7.12 Key Terms

Totally fine to add/subtract terms – just check with me as there are pre-designed quizzes to accompany the text!

  • control group (x)

  • construct validity (x)

  • convenience sample (x)

  • external validity (x)

  • field experiments (x)

  • internal validity (x)

  • lab experiment (x)

  • natural experiment (x)

  • spurious relationship (x)

  • survey experiment (x)

  • treatment group (x)

7.13 Answers to Application Questions

Answer 1

The experiment on the effects of homeopathic medicine was primarily a field study, but one could argue that it was a lab experiment as well because treatment and control were administered in a controlled environment. So, in other words, this was a so-called "lab in the field" experiment. These are common in political science, especially in recent years as the need to study difficult-to-reach populations has increased.

Answer 2

Example responses:

Survey experiment: providing some respondents with information about above average summer temperatures and below average winter temperatures and comparing their attitudes toward climate change to those who did not receive that information.

Lab experiment: put people in a particularly warm room on a hot summer day and see if their attitudes toward climate change are different than those in a different, air-conditioned room.

Field/natural experiment: go to areas experiencing huge shifts in average temperatures and compare people’s attitudes toward climate change in these areas to those in areas whose temperatures have remained stable over time.

References

Angrist, Joshua D., and Victor Lavy. 1999. “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement.” The Quarterly Journal of Economics 114 (2): 533–75. https://doi.org/10.1162/003355399556061.

Angrist, Joshua D, Victor Lavy, Jetson Leder-Luis, and Adi Shany. 2017. “Maimonides Rule Redux.” Working Paper 23486. National Bureau of Economic Research. https://doi.org/10.3386/w23486.

Chen, Dingding, Chao-yo Cheng, and Johannes Urpelainen. 2016. “Support for Renewable Energy in China: A Survey Experiment with Internet Users.” Journal of Cleaner Production 112 (January): 3750–8. https://doi.org/10.1016/j.jclepro.2015.08.109.

Gosnell, Harold F. 1927. Getting Out the Vote: An Experiment in the Stimulation of Voting. The University of Chicago press.

Hainmueller, Jens, Dominik Hangartner, and Teppei Yamamoto. 2015. “Validating Vignette and Conjoint Survey Experiments Against Real-World Behavior.” Proceedings of the National Academy of Sciences 112 (8): 2395–2400. https://doi.org/10.1073/pnas.1416587112.

Iyengar, Shanto, Mark D. Peters, and Donald R. Kinder. 1982. “Experimental Demonstrations of the ‘Not-so-Minimal’ Consequences of Television News Programs.” American Political Science Review 76 (4): 848–58. https://doi.org/10.1017/S000305540018966X.

Jamison, Julian C. 2019. “The Entry of Randomized Assignment into the Social Sciences.” Journal of Causal Inference 7 (1). https://doi.org/10.1515/jci-2017-0025.

Lewis Jr, Neil A. 2019. “Studying People in Their Local Environments.” APS Observer 32 (3). https://www.psychologicalscience.org/observer/studying-people-in-their-local-environments.

Shineman, Victoria Anne. 2018. “If You Mobilize Them, They Will Become Informed: Experimental Evidence That Information Acquisition Is Endogenous to Costs and Incentives to Participate.” British Journal of Political Science; Cambridge 48 (1): 189–211. https://doi.org/http://dx.doi.org.turing.library.northwestern.edu/10.1017/S0007123416000168.

White, Ismail K., Chryl N. Laird, and Troy D. Allen. 2014. “Selling Out?: The Politics of Navigating Conflicts Between Racial Group Interest and Self-Interest.” American Political Science Review 108 (4): 783–800. https://doi.org/10.1017/S000305541400046X.