Jim Russell

James A. Russell, Department of Psychology, Boston College

August 2015 – People frown, smile, laugh, grimace, wince, scowl, pout, sneer, and so on. In turn, observers interpret facial muscle movements, inferring what the expresser is doing (thinking, feeling, perceiving, faking, and so on). Basic Emotion Theory (BET) offered an account of certain facial movements and their interpretation in terms of discrete emotions. Here I offer a skeptical view of BET’s prospects.

BET is an elaboration of a folk theory that dates back at least to Aristotle. As such, it captures many of our commonsense, taken-for-granted presuppositions about facial expressions – presuppositions that underlie the way those of us in the Western tradition think about and perceive facial movements and that make certain claims seem obvious. Adding an evolutionary account, a neural mechanism, and a famous trek in the highlands of Papua New Guinea made BET a highly influential and plausible theory. BET became the dominant research program in the field of affective science and stimulated much valuable research.

A scientific theory often begins with a folk theory, but then changes as probing nature turns up unpredicted facts and anomalies and as conceptual problems in the theory come to light. A clear example of this development comes from physics. Aristotle based his physics on the folk theory of the four elements, but observations and analyses led eventually to the qualitatively different physics of today. How far from obvious are nature’s ways.

BET suffers from unresolved conceptual issues, and observations and experiments have uncovered unpredicted facts and anomalies about faces that demand a similar move beyond folk theory and BET. Today, researchers must examine non-emotional aspects of facial movements and their interpretations (see Figure 1).


Figure 1: Neither production nor recognition of facial expressions conforms to Basic Emotion Theory. In panel A is a photograph of a spontaneous facial movement that would be scored as an expression of surprise by Basic Emotion Theory. In panel B is the same facial expression shown in its actual context. The photograph is of LeBron James, a professional basketball player, looking up at the basketball hoop. It is unlikely that James is surprised to find a basketball hoop on the court, and it is unlikely that observers would interpret James as surprised when they see the context in which his facial expression occurred.

Even with respect to the role of emotion, researchers must choose between revising BET and, as I suggest, taking a different approach entirely. We must separate issues of the sender’s production of facial movements from the issues of an onlooker’s interpretation of those movements. After all, we perceive melancholy in the baying of wolves and joy in birdsong; we cannot always use what we perceive to infer the true cause.

The Sender’s Production of Facial Movements

Faces move, obviously. We need a descriptive system of facial movements. Ekman, Friesen, and Hager’s (2002) elaboration of Hjortsjö’s (1969) anatomically-based catalog of facial movements was a major advance. Still, much (but not all) of the research inspired by BET has focused on a small number of exaggerated facial configurations. How often the facial configurations seen in, for example, Ekman and Friesen’s (1975) Pictures of Facial Affect actually occur remains unknown, but they are likely rare.

Gaspar and Esteves (2012) recorded the facial behavior of 3-year-olds during emotional episodes. They found much facial movement, but rarely the configurations seen in the prototypical BET faces. Configurations “matching the prototypical expression of joy/happiness are the highest, reaching 27% . . . The surprise matching proportion is 5%, anger 0%, and fear 11%” (p. 353). Carroll and Russell (1997) found similar results with adults. We need to go beyond the facial configurations seen in Pictures of Facial Affect.

We also need an account of what produces facial movements. BET follows folk wisdom in presupposing that, except in cases of deliberate deception, happiness makes us smile, anger makes us frown, fear makes us gasp, disgust makes us scrunch our noses, and so on. Surprisingly little evidence supports the production side of BET’s account.

Reporting the “first evidence” alleged to support the emotion-face link, Rosenberg and Ekman (1994, p. 223) wrote, “Our results provide the first evidence that there is coherence between facial expression and self-report of emotion at specific moments.” Viewers of one (of four) film clips of a disgusting event had a significantly higher probability (.50 vs .30) of showing a facial expression of a specific emotion at the moment in the film that they reported having felt that emotion than at other moments. Analysis of a second film clip failed to replicate this result. The study (a) was correlational (thereby unable to test causality), (b) failed to include an analysis of two of the four clips shown, and (c) failed to specify precisely which facial expressions were scored as corresponding to which emotion.

Improved research on the emotion-face link followed, but continued to find evidence at odds with folk wisdom and BET. Reisenzein, Studtmann, and Horstmann (2013) reviewed the laboratory evidence, Fernández-Dols and Crivelli (2013) the field evidence. In brief, happy people do not always smile, and smiles occur without happiness. Smiles are easily posed, do not always correlate with the smiler’s emotional state (Fridlund, 1991; Krumhuber & Manstead, 2009), and can be caused by negative experiences such as losing a game (Schneider & Josephs, 1991), being embarrassed (Keltner), or being in pain (Kunz, Prkachin, & Lautenbacher, 2009). Similar problems arise for other emotion-face associations.

We need to explore other possible sources of facial movement both for the complete story of how facial expressions are produced and as a way to test BET. As BET agrees, the sources of human facial movements are many. As we talk, eat, breathe, exert effort, smell, feel pain, or reach orgasm, our faces move. Our faces move as part of certain reflexes (gag, orienting, startle, and so on), of perception (looking, tasting, and so on), and of social interaction (social greeting, threatening, exerting dominance or submission). Our faces move as we unconsciously imitate others. Our faces move as part of information processing and of subsequent behavior. All such sources of movement are potential confounds when testing BET’s assumption that discrete emotions cause facial movements.

Besides BET, there are various possible accounts of the production of facial movement, including the following: (1) Perception involves bodily movements (reaching to feel, turning to look), and facial movements are part of this process. For example, BET’s “fear expression” might enhance visual exposure (Susskind et al., 2008). (2) Cognitions (appraisals of current events) might produce facial movement (Scherer, 1992). Ortony and Turner (1990) noted that a frown (brow contraction) often occurs when one is uncertain or puzzled. (3) Frijda proposed that facial movements are part of the preparation for action. (4) We are social animals, and a large part of our behavior is negotiating social interaction. Fridlund (1994) suggested that facial movements signal to an audience projected plans and goals including contingencies. (5) Facial movements are part of paralanguage. Chovil (1991) offered a taxonomy for paralanguage in which facial movements are part of speech communication. An example is substituting a “disgust face” for the words “that stinks.” (6) Core affect – a neurophysiological state consciously accessible as simply feeling good or bad, energized or quiescent – might produce facial movement.

Conceivably, all these accounts, including BET, are complementary. Certainly, some of the six proposals listed above maintain some link to emotion (appraisal, action preparation, and core affect have been listed as components of an emotion). But science requires a more critical stance. In evaluating BET, the question is whether one or more of these alternative sources better account for facial movements. Some of these suggested sources of facial movement are likely correlated with emotional state, and the question is what happens when they are disentangled.

Kraut and Johnston’s (1979) study of smiling bowlers is the prototype. They agreed that people often smile when happily interacting with others, but asked what happens when happiness and interacting are teased apart. Smiles when happy but not interacting were found to be rare (see also Fernández-Dols & Ruiz-Belda, 1995, and Ruiz-Belda, Fernández-Dols, Carrera, & Barchard, 2003). (The objection that we smile at pleasant thoughts when alone was answered by Fridlund, 1994, who provided various reasons to believe that the smiler may be physically but not psychologically alone.) Thus, from a scientific perspective, research on alternative sources of facial movement provides a needed test of BET. And, so far, BET’s prospects are not good.

In short, the production of facial expressions is sometimes correlated with discrete emotions, although weakly, but there are alternative explanations to the theory that the emotions are causal, since emotions are confounded with other sources. Thus, we have no convincing evidence that emotions cause facial movements: the (weak) correlation between emotions and facial movements may have other underlying causes.

The Observer’s Interpretation of Facial Movements

We open our eyes and see that this person is happy, that one angry, and so on. BET articulated the common belief that people “recognize” happiness, anger, disgust, and so on in the faces of others. Many studies purported to demonstrate consensual recognition by asking people to match a photo of a static facial expression to one of BET’s predicted emotion terms. Such demonstrations, even if reliable, would not show that people spontaneously recognize the predicted emotion but that, once told that one of a number of emotions is expressed, they can select the predicted one. In other words, “emotion recognition” scores are merely matching-to-sample scores.

Even more troubling, the high matching scores found may be partly due to design methods that favored finding them. No single design problem need be fatal, but cumulatively they combine to push scores in the predicted direction: within-subjects designs, posed exaggerated facial expressions (devoid of voice, motion, body, and information about the expresser’s context), and a forced-choice response format (Russell, 1994). For example, when observers see spontaneous rather than posed faces, matching scores plummet. We recently found that people can achieve a high matching score between a label and a face without recognizing any emotion. Instead, they used an elimination strategy: after matching several standard faces with standard labels, both children and adults chose a non-word, “pax,” from the list as the emotion expressed by a novel face (DiGirolamo & Russell, 2014; Nelson & Russell, 2014). If so, then such an elimination strategy may account for the high matching found for some (but not all) emotion labels.
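The logic of the elimination strategy can be made concrete with a small sketch. The Python below is purely illustrative and is not the procedure or analysis used in the cited studies: the list of response options, the face descriptions, and the function name choose_by_elimination are all hypothetical. It shows a chooser that never inspects the novel face, yet still ends up with the one leftover label.

```python
def choose_by_elimination(labels, already_used):
    """Return the labels not yet matched to any earlier face.

    The chooser never looks at the novel face: it only tracks which
    labels it has already assigned on earlier trials.
    """
    return [label for label in labels if label not in already_used]


# Hypothetical response options; "pax" is the non-word mentioned in the text.
labels = ["happy", "sad", "angry", "scared", "pax"]

# Hypothetical earlier trials: standard faces matched to standard labels.
earlier_matches = {
    "smiling face": "happy",
    "crying face": "sad",
    "scowling face": "angry",
    "gasping face": "scared",
}

remaining = choose_by_elimination(labels, set(earlier_matches.values()))

# If exactly one label is left over, it is chosen for the novel face.
novel_face_choice = remaining[0] if len(remaining) == 1 else None
print(novel_face_choice)  # pax
```

Scored against any prediction that pairs the novel face with the leftover label, this chooser would count as a "hit" without having recognized anything in the face.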

Outside the laboratory, the observer does not use someone’s facial expression alone to infer that person’s emotion, but facial expression in light of the expresser’s situation and other aspects of the face’s context including the expresser’s body (Fantoni & Gerbino, 2014). So, removing the context in a “recognition” experiment stands in the way of understanding how observers typically interpret facial expressions. More important, specifying context as well as face in such experiments can provide a test of BET.

BET implies that the facial expression is more powerful for “recognition” of emotion than is its context because BET theorizes that the facial expression is an automatic signal of the specific emotion (or that facial expression is part of that emotion), whereas context can provide only probabilistic information because different individuals respond differently to the same situation. To the contrary, in determining the emotion seen by the observer, context is more powerful than the face (Carroll & Russell, 1996): a person in an anger-inducing situation who showed BET’s “fear face” was interpreted as angry rather than afraid.

I, too, followed folk theory: my valence-based theory of facial expressions predicted that face trumps context on the judgment of valence (whether the expresser’s emotion is seen as pleasant or unpleasant). Alas, I was wrong: context trumps face even on judgments of valence (Aviezer et al., 2008; Kayyal, Widen, & Russell, 2015).

In the “universality thesis,” BET emphasized the uniformity of recognition: basic emotions were claimed to be easily recognized from the predicted facial expressions by all people whatever their culture, language, or education. Yet, meta-analyses have found that matching scores vary with culture, language, and education (Nelson & Russell, 2013; Trauffer, Widen, & Russell, 2013). Jack et al. (2012) used a psychophysical technique and again found cultural differences in what facial configurations were matched to specific emotions.

BET presupposed that the English words fear, anger, disgust, and so on express universal categories in terms of which recognition proceeds; evidence indicates that the way in which emotions are categorized is not universal: emotion categories expressed in different languages are in some ways similar to but in some ways different from those in English (Russell, 1991; Wierzbicka, 1999).

As Ekman and Friesen (1971) emphasized, the most telling test of universality involves societies remote from Western culture and media. The few such studies showed a large cultural difference in matching scores (Russell, 1994). In a recent study of a remote society, Gendron, Roberson, van der Vyver, and Barrett (2014) similarly found weak to non-existent support for BET’s prediction of uniformity of interpretation of facial expressions. Diversity needs our attention as much as uniformity does.

Some writers emphasize that BET’s hypotheses are supported to a statistically significant degree: observers select BET’s predicted emotion label more often than they would if they chose emotion labels randomly. But then no one predicts that humans are random in interpreting faces. Ruling out the null hypothesis of random responding does not rule in the experimenter’s hypothesis. There are many ways to explain non-random responding; Russell (1994) offered eight alternative accounts, and surely there are more. (Aristotle’s physics based on the four elements makes some valid predictions: put earth, water, and air in a beaker, shake, and watch the elements settle: earth at the bottom, water in the middle, and air at the top – just as his theory predicts.) Folk theories and the scientific theories inspired by them provide first approximations, not random associations.

What the classic BET studies called “recognition” is interpretation. Observers may use facial information to make inferences not just about emotion but about any psychological state. The interpretation of the face is influenced by many factors, some rarely studied (color of the sclera), some more studied: by the observer’s situation (state, interests, motives), by the face’s context (the expresser’s context, words, gaze, vocal prosody, body position and proxemics, motor behavior, underlying physiognomy), and by features of the experimental method. Further, the observer does more than interpret. Rendall, Owren, and Ryan (2009) suggested that some facial movements influence the emotional state of the observer directly: receiving a smile might simply make you feel better.

I suggested an alternative account – called minimal universality – of an onlooker’s interpretation of facial expressions (Russell, 1995). Universally, humans perceive others in simple general terms (valence and arousal): Is the person feeling good or bad, energized or quiescent? This part of the proposal is consistent with above-chance matching of faces with emotion labels, because the meaning of an emotion label includes, among other elements, valence and arousal. (This part of the proposal is also consistent with Osgood’s theory that we perceive everything in terms of simple affective dimensions of evaluation and activity. And he may well be correct that we perceive facial movements in terms of potency as well.)
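A worked example may help show why above-chance matching does not by itself require recognition of discrete emotions. The sketch below is hypothetical and rests on loud simplifications: a five-label forced-choice task, a pleasant/unpleasant coding of each label that I have assigned only for illustration, and an imagined observer who reads valence perfectly from the face (an assumption questioned elsewhere in this essay) and then picks at random among labels of that valence. None of the numbers are data.

```python
from fractions import Fraction

# Hypothetical valence coding of each emotion label (illustration only).
label_valence = {
    "happiness": "pleasant",
    "fear": "unpleasant",
    "anger": "unpleasant",
    "sadness": "unpleasant",
    "disgust": "unpleasant",
}


def valence_only_hit_rate(predicted_label):
    """Probability that an observer who perceives only valence picks the
    predicted label, choosing uniformly among labels of that valence."""
    valence = label_valence[predicted_label]
    candidates = [l for l, v in label_valence.items() if v == valence]
    return Fraction(1, len(candidates))


# Chance level if labels were chosen completely at random.
chance = Fraction(1, len(label_valence))

# Hit rate for each face (one face per predicted label) and their mean.
hit_rates = {label: valence_only_hit_rate(label) for label in label_valence}
mean_hit_rate = sum(hit_rates.values()) / len(hit_rates)

print(chance)         # 1/5
print(mean_hit_rate)  # 2/5
```

Under these assumptions the valence-only observer matches every face at a rate above the 1/5 chance level (1 for the pleasant face, 1/4 for each unpleasant face) and doubles chance on average, without using any discrete emotion category.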

Young children interpret faces in terms of valence (Widen & Russell, 2008). For example, the typical young 3-year-old uses one and the same label (typically angry) for four of BET’s canonical faces: those for fear, anger, sadness, and disgust. As children develop, they add new emotion concepts by differentiating: feeling bad is divided into feeling bad because of loss vs. feeling bad because of receiving a hostile action. The end product is a set of adult emotion concepts, which are similar but not uniform across individuals, languages, and cultures (Russell, 1991). In interpreting facial expressions, older children and adults go beyond valence and arousal, including categorization by discrete emotions. On my initial proposal, the face is typically relied on to provide the values of valence and arousal, but context provides the specific emotion. The hypothesis that the face provides valence, however, was recently found wanting, as I reported above.

In short, sufficient evidence has now accumulated to conclude that BET’s claims about universal recognition of a specific discrete emotion from its facial expression are unwarranted. Research should shift to the broader topic of how a person’s facial movement influences an observer, including, but not limited to, the interpretation that the observer makes for the face and the many factors that influence that interpretation. We need to study not just English folk terms for emotion (happiness, anger, disgust, etc.) but many more psychological categories and how their accessibility or even existence varies with language and culture.

Can Basic Emotion Theory Be Salvaged?

One response to the evidence mentioned here might be to revise BET. This tack appears less viable in light of evidence on other aspects of the theory. There is no consensually agreed upon confirmatory evidence for emotion-specific signatures in the autonomic nervous system (Cacioppo et al., 2000) or specific behavioral responses (Baumeister, Vohs, DeWall, & Zhang, 2007). BET predicts tight coherence among each emotion’s components, but such components turn out to be surprisingly weakly correlated (Reisenzein, 2000).

Caution is also warranted when revising BET because the revision may introduce problems as much as solutions. For example, evidence of cultural differences led Ekman (1972) to embrace Klineberg’s (1938) hypothesis of cultural rules prescribing or proscribing facial expressions. On Ekman’s treatment, these display rules render his theory immune to evidence: Happiness leads to smiles, except when it doesn’t, in which case a display rule intervened. Without a prior specification of the display rules, no evidence could falsify the theory.

There are also deeper problems with BET. Modern understanding of evolution by natural selection raises doubts about BET and provides an alternative (Buss, 2014; Fridlund, 1994). Automatic signaling of one’s true emotion to enemies would incur heavy costs, and evolution is likely to have produced deceptive as well as veridical signs because of conflict of interest between expresser and observer.

BET’s problems are deeper still. I do not know exactly how BET defines emotion. On one interpretation, emotion is a package of components. At least in the Western cultural tradition, we tend to “see” emotions by packaging together various components. Indeed, the key concepts in BET (anger, fear, etc.) originated in folk psychology, concepts that are vaguely defined, heterogeneous, culture-specific, and permeated with questionable assumptions. A similar tendency can be seen when ancient astronomers “saw” constellations made up of stars that were actually unrelated cosmologically. Packaging disparate phenomena into a discrete emotion may make the world seem simpler and serve cognitive economy, but the packages may be merely convenient fictions.

On another interpretation, an emotion is an entity that causes the components (e.g., Tomkins’, 1962-63, affect program): Emotion makes us flee, makes our heart race, makes us feel a certain way, and moves our faces. The “affect program” is simply a metaphor from computers to the brain. If the affect program is a hypothesized brain circuit dedicated to a specific emotion and only that emotion, then it is relevant that neuroscientists are abandoning the notion of hardwired emotion-specific brain circuits (LeDoux, 2012, 2014; Lindquist et al., 2012). The theory that an observable emotional component is explained by an affect program is reminiscent of faculty psychology in which an observable event is explained by a faculty of the same name: remembering is explained by the memory faculty, imagining by the imagination faculty, and moral behavior by the morality faculty.

In short, BET initially seemed plausible, even obvious, built as it was on our intuitive folk theory about emotions and faces, combined with an early understanding of brain mechanisms and of evolution by natural selection. Subsequent scientific scrutiny, however, has not supported its predictions. Its evolutionary presuppositions and neural basis lack support. Hypotheses about peripheral physiology and instrumental behavior also lack support. Co-occurrence of emotional components has been found to be much less frequent than predicted.

Alternative Approaches to Emotion

As often happens with scientific progress, conceptual alternatives to BET begin with different assumptions and tend to be more complex and less intuitive, because they part ways with folk theory. Examples are Fridlund’s (1994) behavioral ecology view based on modern evolutionary theory and Scherer’s (1992) and Ortony and Turner’s (1990) appraisal theories based on links between perception-cognition and specific muscle movements.

In psychological construction (Russell, 2003; Barrett & Russell, 2014), I offer an alternative account of emotion and other affective phenomena that explicitly abandons certain commonsense presuppositions, although it retains all the observable facts. People get angry or scared, obviously. Such folk terms as “emotion,” “anger,” and “fear” point to important phenomena, and the terms express concepts that often play a role in those phenomena. All the same, the question is how to develop a scientific account of those phenomena.

On my proposal, the term “emotion” is treated as a folk rather than a scientific term. Episodes called “emotional” consist of changes in various component processes (peripheral physiological changes, appraisals and attributions, expressive and instrumental behavior, subjective experiences), no one of which is itself an emotion or necessary or sufficient for an emotion to be instantiated. Emotion is not invoked as the cause of the components nor as the mechanism that coordinates the components. Each component has its own semi-independent causal process.

One hypothesis, for example, is that the production of facial expressions is accounted for by one or more of the six alternative sources discussed above, not by a discrete emotion or affect program dedicated exclusively to emotion or to a specific emotion. Facial “expression” is at most modestly correlated with other components of the emotional episode. The components are coordinated, as are all human processes, but, again, not by an affect program. Although emotion is not an entity causing the components, still, a witness, scientist, or the person having the emotion might categorize the episode as a specific emotion: we see emotions in others and experience emotions in ourselves. That categorization too is a process to be studied, and is neither necessary nor sufficient for an emotion to be instantiated.

Psychological construction abandons the assumption that emotional episodes are pre-fabricated; it proposes instead that they are assembled in the moment to suit current circumstances. An emotional episode is not qualitatively different from any other behavioral episode, and it is assembled in the same way as is any other behavioral episode, although often with a more extreme dose of valence and arousal.


