Facial Behavioral Analysis: A Case Study in Deception Detection

Aims: To establish a rich Facial Action Coding System (FACS)-coded database and to investigate the use of facial visual cues for deception detection. Study Design: A within-participants design experiment was conducted, using immigration as a scenario for asking questions of participants in controlled experimental conditions. The study design required participants to answer questions on two topics, one as themselves and one based on a learned scenario. Data regarding visible images of facial movement were collected and analyzed against cues identified as indicative of deceit. Place and Duration of Study: With ethical approval from the University of Bradford, 32 volunteer undergraduate students and research assistants took part in the study, from March 2011 to June 2011. Methodology: We included 32 participants (27 men,


INTRODUCTION
Detecting human behaviors that reveal an individual's intent is an emerging theme of interest for security agencies. Deliberate intents include attempting to deceive authorities in order to enter a country illegally, smuggling goods, being involved in a malicious act such as a terrorist bombing, or harboring the intention to carry out such an act at a later time. Detecting such hotspots in an individual will aid in the apprehension of suspects before they are able to carry out malicious acts. The relevant literature was reviewed to establish behaviors that might plausibly be used for the operational identification of malicious intent: modeling these behaviors, patterns or cues will provide a significant basis for a tool for detecting suspicious individuals.
Evidence from psychology experiments shows that, on average, people discriminate liars from truth tellers in only about 54% of cases [1,2]. This performance is not a meaningful improvement over chance [3,4]. However, evidence shows that deception detection performance is higher in high-stakes scenarios [5]. Researchers [6,7] suggest that liars behave differently from truth tellers because the process of lying involves three psychological constructs: emotion [8,9], content complexity [9,10] and attempted control [10]. For instance, people who are lying might be expected to experience 'emotions' including guilt, fear and duping delight [8]. They will also experience 'content complexity' because they have to 'check their story' to ensure its consistency and believability. This includes thinking of plausible answers to questions, avoiding contradictions, making sure lies are compatible with other available information and remembering what they have said so that they can repeat it later. These demands increase the cognitive workload in comparison with someone telling the truth [9-11]. Liars will also be concerned about behaviors that could give them away, so they need to control their actions. Research shows that this often leads to overcompensation [3,10,12], which might be detectable and which adds to the cognitive load associated with lying. Indicators that an individual is experiencing any one of these psychological constructs might therefore indicate an attempt to deceive and so identify them for further questioning. Moreover, it is likely that the dominance of each construct over the others will vary through the narrative of a security process. Appreciation of this variation will vastly enhance the effectiveness of any tool used to detect those with malicious intent.
Cues related to anxiety, for example, may be more difficult to detect in less trait-anxious individuals [13] or in those who are experienced at deception. Furthermore, innocent individuals may display signs of anxiety, since emotions are likely to 'run high' in security settings for a variety of reasons. In terms of emotion expression within the face, some researchers believe that specific elements of expressions correspond to specific emotions [14]. Others argue for more general aspects [15]. Cultural display rules affect the relationship between feeling and display: people can exaggerate or hide expressions to conform to accepted patterns [8], and there are questions about whether emotions can be expected to have basic links to expressions or whether the face is simply a tool for communicating intentions [16,17]. Research [18,19] suggests that rich media, or multiple modalities, provide more clues in terms of the synchronicity and consistency of communication. In communication theory, deception principles have been merged with interpersonal communication principles [20].
An experiment was constructed to establish a baseline of the specified behaviors in truthful and deceitful conditions [21]. A rich FACS-coded (Facial Action Coding System) database was established from the baseline data to support the future development of a tool for the operational detection of cues to malicious intent. FACS and the annotation procedure are described in detail in section 3.2. The authors have presented the collection of the database in a workshop [21]. Please note that the focus of this paper is on the analysis of the FACS data in the visual domain.

Protocol
The experiment was constructed as two interview scenarios. Participants were interviewed by an 'Examiner' who was introduced by the 'Facilitator' as having recently trained in techniques to detect lies. Participants were told it was important that they appear honest throughout. For one session, they were asked to answer questions as themselves. For the other, they were given a character profile to learn and were asked to answer the questions as if they were the character in the profile. Some questions went beyond the information in the profile, requiring participants to create plausible answers.
Each session consisted of an introduction period, followed by a series of five baseline questions (for example, 'what is your name?') asked by the Facilitator, followed by an interview with the second experimenter: the 'Examiner' who asked 10 set questions on the relevant topic. Throughout the experiment, certified FACS coders coded the data regarding visible images of facial behavior.
A within-subjects approach was employed with two independent variables: interview topic (university study and career; dwelling, hobbies, personality and family) and honesty (self, character). Condition orders were counterbalanced and the interviewer was blind to the condition to prevent bias. Participants were invited for two interview sessions, one in the morning and one in the afternoon of the same day. This provided separation between the two topics and between the truthful and deceitful conditions. The questions were designed to elicit answers of 2 to 10 seconds in the majority of cases. It was anticipated that this would be sufficient, combined with measurement of facial behavior during the question period, to represent the range of facial behavior satisfactorily.

Equipment Setup
The experiment was conducted in a darkened room with controlled lighting conditions. Fig. 1(a) illustrates the session while the facilitator was giving the instructions to the participant. Fig. 1(b) illustrates the environment during the interview stage. The participants' facial activities were recorded using a high-definition visual camera, as illustrated in Fig. 1(c). The camera used in this experiment was a JVC-GY-HM100E with a resolution of 1280 by 720 pixels.

Facilitator and Examiner
The experiment used a scripted participant introduction and instructions. The facilitator mentioned the 'examiner' and informed the participant that the examiner had been trained in techniques for detecting lies. Then the facilitator explained that the examiner would interview the participant on two topics and informed the participant that the trial was designed to investigate methods for detecting when someone is lying.
During the interview, the examiner dressed formally to reinforce the impression of authority. The examiner was blinded to which condition a participant would be in. He was not involved in the day-to-day running of the project. To enable rewards to be given to participants as an incentive, the examiner recorded his judgment as to whether each participant was telling the truth but was not told whether his judgment was correct. Although not the focus of the experiment, it may be noteworthy that the examiner who took part in the study is an expert in crime scene reconstruction and forensic science.
Finally, the facilitator reminded the participant of the importance of presenting themselves as honest throughout the entire interview and, if appropriate, of staying consistent and in character for the relevant topic. The facilitator informed the participant that a small reward was available for participants who convinced the examiner that they were truthful throughout the interview.

Participants
With ethical approval from the University of Bradford, 32 volunteer undergraduate students and research assistants took part in the study. Of these, 27 were male and 5 were female. Their ages ranged from 18 to 33 years.

Self-report
At the end of each session, the participant was asked to confirm whether they had followed the instructions correctly and answered as themselves or the character (as appropriate) for each question. The facilitator also thanked the participant for their participation, informed the participant of the examiner's judgment and provided a small reward if the participant was successful in convincing the examiner that they were truthful throughout the interview.
For the detailed design of the experiment, please refer to Yap et al. [21]. In the following section, we discuss the results of the data analysis.

RESULTS AND DISCUSSION
We discuss the results from two perspectives: first, an analysis based on human judgment (the examiner's judgment) using verbal and non-verbal cues; second, the preparation of the database and the performance of computer algorithms on the FACS-coded database as an aid to human decision-making.

Analysis on Examiner's Score
Research has shown that the average person spots liars with approximately 54% accuracy [1], while specialized groups (trained psychologists, police, etc.) achieve approximately 60% accuracy in identifying deception [22]. The confusion matrix of the examiner's judgments is presented in Table 1, which shows that the Examiner achieved 56.25% accuracy in detecting truth tellers and 56.25% in detecting deceit. A sensitivity and specificity of 56.25% illustrates the weakness of human judgment in deception detection.
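As a minimal sketch of how the figures in a confusion matrix such as Table 1 relate to one another, the Python snippet below computes accuracy, sensitivity and specificity from four cell counts. The counts are an assumption: they are back-calculated from the reported 56.25% on the supposition that all 32 participants contributed one truthful and one deceptive session (18/32 = 56.25%), and are not copied from Table 1. The function name is illustrative only.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions.

    'Positive' here means a deceptive session:
      tp = deceptive judged deceptive, fn = deceptive judged truthful,
      tn = truthful judged truthful,   fp = truthful judged deceptive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Assumed counts (not taken from Table 1): 18 of 32 correct in each condition.
acc, sens, spec = confusion_metrics(tp=18, fn=14, tn=18, fp=14)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Under these assumed counts, all three measures coincide at 56.25%, which is only slightly above the 50% expected from guessing on a balanced set of sessions.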

FACS Coding Annotation
The facial actions were coded using FACS [23]. FACS provides a comprehensive and objective way to decompose expressions into elementary components and has been used widely in the behavioral sciences. All the action units (AUs) were coded by certified FACS coders. In our investigation, the duration of an action unit is the total time from onset through apex to offset. Besides the standard AUs, we also analyzed behaviors related to anxiety such as gaze, stuttering, swallowing and lip biting. For FACS annotation, we used ELAN (The Language Archive) [24,25]. Fig. 2 illustrates the annotation software, with a video of a subject in the top left corner and the coded AUs below the video. After annotation, the data were exported to an Excel spreadsheet, as shown in Fig. 3.

Result Analysis and Discussion
From the 32 subjects, we filtered out those who were confused by the instructions or uncertain about their own intention in the interview sessions. After filtering, 28 subjects were available for analysis. We found 70 facial AUs in our dataset: 56 AUs from the standard FACS coding and another 14 AUs defined to match the clues from the literature review. Table 2 lists the AUs with their respective meanings. The first 56 AUs are the standard AUs in Ekman & Friesen's guidelines [18] and the last 14 AUs (shown in italic and bold) are our additional labeled AUs representing other cues found in the dataset. Unusual behavior found in our study included: cough, eyes moving regularly to the left and right, face turning red, hand on face, quick blink, head tilt left and right, hand on neck, heavy breath, forehead muscle movement, lip pucker to the left, lip pucker to the right, scratching, quivering lips and stutter. Some of the unusual behaviors listed are culture-related; for instance, head tilt left and right was only observed in a specific ethnic group. To further interpret the data, we ran three analyses. In Analysis I, we analyzed the facial AUs statistically. In Analysis II, we applied machine learning methods to measure the accuracy of classifying truth tellers and liars. Finally, Analysis III looked for the best threshold for the machine learning classification, given the trade-off between the cost and the risk of missing the target.

Analysis I: Statistical analysis
We summarized the frequencies of the facial AUs for 28 subjects, which contain 280 questions and 280 answers. We examined the following research question: were there any differences in facial actions between the questioning states (preparing to lie and preparing to be truthful) and between the answering states (lying, being truthful, telling lies with the intention of being truthful, and telling the truth with the intention of lying)?
We observed that the total number of AUs in the deceitful condition was slightly lower than in the truthful condition. The reduced number of AUs for liars is consistent with the finding that liars attempt to control their cues [22]. At a glance, we also observed that AU4, AU7, AU9, AU10, AU24, AU32, AU43, AU51, AU52, AU55, AU82, AU84 and most of the additional unusual behaviors occurred more often in the deceitful condition than in the truthful condition. These observations might be useful cues for distinguishing lies from truth. For further justification, we ran a statistical analysis to examine the significance of the cues.
The occurrence of AU97, AU98, AU101, AU102 and AU108 indicates a lie; however, these rare events are not sufficient for monitoring targets. For instance, a selective system that filters out suspects based on these five cues would produce a sensitivity of 57.14% and 35.71% false positives. Such a system is not reliable, as these AUs might also indicate anxiety. A nonparametric test on a set of 10 AUs (AU9, AU23, AU24, AU32, AU82, AU97, AU98, AU101, AU102 and AU108) was conducted. The primary measure used was the frequency of occurrence of the facial visual cues, i.e. the number of times each was exhibited. Using the nonparametric sign test, the result for the set of 10 AUs was statistically significant (p<0.05). This result indicates that more subjects exhibited these 10 AUs in the deceitful condition than when they were being honest.
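To make the sign test concrete, the following Python sketch computes an exact two-sided p-value from paired per-subject frequencies. It is an illustration under stated assumptions, not the analysis code used in the study: the function names and the example counts are hypothetical, ties are dropped (a common convention), and the text does not state whether a one-sided or two-sided test was applied.

```python
from math import comb


def count_signs(deceit, truth):
    """Count subjects with more (pos) or fewer (neg) cue occurrences when lying.

    Subjects with equal counts in both conditions (ties) are ignored.
    """
    pos = sum(d > t for d, t in zip(deceit, truth))
    neg = sum(d < t for d, t in zip(deceit, truth))
    return pos, neg


def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign test p-value under H0: P(pos) = P(neg) = 0.5."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # Probability of observing k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-subject frequencies of the cue set (not data from the study).
deceit_counts = [3, 2, 4, 1, 0, 5, 2, 3, 1, 2]
truth_counts = [1, 2, 1, 0, 0, 2, 1, 1, 2, 0]

pos, neg = count_signs(deceit_counts, truth_counts)
p_value = sign_test_p(pos, neg)
print(pos, neg, round(p_value, 4))
```

The sign test uses only the direction of each within-subject difference, which suits rare, low-count cues whose frequencies are unlikely to be normally distributed.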

Analysis II: Machine learning methods in classification
To determine whether there are useful predictors of deception, we performed classification using machine learning on the coded facial AU dataset. The classifications were based on 72 features: the 69 AUs (AU50 Speech is excluded), asymmetry, duration and ground truth. Each AU feature represented the frequency of that AU for each question. Hence, for each participant, there are 10 × 72-dimensional features for the 10 truths and 10 × 72-dimensional features for the 10 lies. The ground truth is provided for each question for the purpose of training and to calculate the prediction accuracy automatically. To find the best machine learning classifier for our dataset, we used five popular classifiers implemented in the WEKA software package [26], namely: Logistic Regression (LR), Multilayer Perceptron (MLP), Naïve Bayes (NB), Radial Basis Function (RBF) and Support Vector Machine (SVM). The default evaluation method in WEKA, 10-fold cross-validation, was used. Table 3 shows the comparison of the classification accuracy of the different machine learning algorithms. The best result was achieved by LR, with a sensitivity of 47.9%, a specificity of 71.2% and an ROC area of 0.638. The poorest result was achieved by SVM, which produced high specificity and poor sensitivity.
Overall, LR outperformed the other approaches in classification. However, the machine learning algorithms tend to be biased toward predicting truth, as shown in Table 3 by the low sensitivity and high specificity. This is not acceptable in real-life applications, as such a classifier misses many deceptive cases and is not much better than chance. Hence, we propose a new classification threshold to increase the sensitivity, which is presented in the following section.
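The per-question feature representation described above can be sketched as follows. This is a hypothetical illustration only: the label list is a small made-up subset (the study used 69 AUs plus asymmetry and duration), and the event list stands in for the AU annotations exported for one answer; neither is taken from the dataset.

```python
from collections import Counter

# Hypothetical subset of AU labels, used here only to keep the example short.
AU_LABELS = ["AU4", "AU7", "AU9", "AU24", "AU43"]


def question_features(coded_events, labels=AU_LABELS):
    """Map the AU events coded during one question to a frequency vector."""
    counts = Counter(coded_events)
    return [counts.get(label, 0) for label in labels]


# Hypothetical annotation for a single answer: each entry is one coded AU event.
events = ["AU4", "AU43", "AU4", "AU24", "AU43", "AU43"]

row = question_features(events)
ground_truth = 1  # assumed encoding: 1 = lie, 0 = truth
print(row, ground_truth)
```

Stacking ten such rows per session gives the per-participant feature blocks on which a classifier can be trained, with the ground-truth label kept alongside each row.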

Analysis III: Threshold of lie and truth
An interesting question about the definition of lying in our study was: what percentage of a subject's answers must be lies for the session to be considered deceptive? In a fair game, 50% is normally the cut-off point for a classification decision. We first ran an experiment using this standard classification threshold of 50%.
Since LR performed best among the classifiers, it was chosen for the rest of our experiments. By cross-validating over the participants with five folds (with 22 subjects as the training set and 6 subjects as the testing set in each fold), we obtained the results shown in Table 4. Please note that the split between training set and testing set was done randomly. This produced 30 predictions (5 folds of 6 test subjects) rather than 28. The overall accuracy is 53.3%, with a sensitivity of 36.7% and a specificity of 70.0%. Poor sensitivity with high specificity was expected, as the machine learning algorithms favor truth prediction. To overcome this bias, we made one assumption: that a lie is not tolerable, and that a subject who lied in more than three questions in a session is categorized as deceptive. This implies that we reduced the classification threshold to a lower value, i.e. 35%. The main purpose of this assumption is to reduce false alarms and misses, as we cannot risk these. The experimental results are presented in Table 5, which shows an improvement in the overall accuracy to 66.7%. More importantly, sensitivity improved from 36.7% to 70.0%, while specificity fell from 70.0% to 63.3%.
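The session-level decision rule discussed above can be sketched in a few lines of Python. This is a simplified illustration under assumptions, not the implementation used in the study: it supposes that the classifier yields one binary prediction per question (1 = predicted lie), that a session is flagged when the fraction of predicted lies strictly exceeds the threshold, and the prediction lists are made up.

```python
def session_is_deceptive(question_predictions, threshold=0.5):
    """Flag a session as deceptive when the share of predicted lies exceeds the threshold.

    question_predictions: list of 0/1 values, one per question (1 = predicted lie).
    """
    lie_fraction = sum(question_predictions) / len(question_predictions)
    return lie_fraction > threshold


# Hypothetical sessions of 10 questions each.
four_lies = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # 4 of 10 flagged (40%)
three_lies = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]  # 3 of 10 flagged (30%)

print(session_is_deceptive(four_lies, threshold=0.5))    # False
print(session_is_deceptive(four_lies, threshold=0.35))   # True
print(session_is_deceptive(three_lies, threshold=0.35))  # False
```

With ten questions per session, a 35% threshold falls between three and four predicted lies, so it matches the "more than three questions" rule, whereas the 50% threshold requires at least six.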

CONCLUSION
A problem with laboratory studies of deceptive facial behavior is that they contextualize human actions and choices [27]. It is necessary to analyze real-life data, and caution is needed when putting experimental studies into real-life application.
The future challenge is how to detect deception behavior within the context of complex social interactions and how to develop paradigms in which subjects have a real choice as to whether and when to lie. The real intention of a subject to deceive the examiner is crucial.
Instructing subjects to lie eliminates the voluntary intention to deceive. There are no negative consequences for the subjects' actions and no harm can come to anyone, so we do not achieve a valid representation of the process of deceptive acts. In the future, we have to consider the pragmatics of human communication [28] in our experimental design.
The literature review identified those psychological behaviors that might plausibly be used to detect malicious intent and deceit in the context of port immigration and customs. In particular, it addressed the behaviors that are detectable in the visual domain of facial behavior. Our research established a rich FACS-coded database that is expected to contribute to future research developments. In addition, we showed that, in order to increase the detection rate, it is worthwhile to consider machine learning algorithms as a tool to aid human decision-making in human behavioral analysis.
In future work, we will investigate the use of multiple modalities, combining facial behavioral analysis, body language, voice analysis, verbal content and physiological methods (thermal analysis). Recently, researchers have also looked at self-deception [29]. Humans are poor at detecting deception; therefore, automated detection tools that augment human judgment can greatly increase detection accuracy. More research in a variety of contexts will determine which indicators and systems are the most reliable.