
    Automatic emotion detector results.

    The overall accuracy of the speech emotion detector is 62.58% (95% CI: 61.5%–63.6%). The concordance matrix of predicted versus labeled emotions is presented on the left of Figure 7. The diagonal gives the accuracy of each emotional class (predicted emotion = actual emotion). Off-diagonal cells give the percentages of false recognition (e.g., anxious accuracy was 72%, with 14% of anxious recordings falsely categorized as okay or neutral, 8% as happy, 4% as sad, and 2% as angry). The heat map on the right graphically depicts the concordance matrix, with correct predictions on the diagonal (the predicted-class axis is flipped upside down).
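
    As a minimal sketch of how the diagonal of such a concordance matrix yields per-class accuracy, the Python fragment below row-normalises a matrix of counts. The counts are illustrative placeholders, not the study's data; only the anxious row mirrors the percentages quoted above.

        # Illustrative concordance (confusion) matrix arithmetic. Counts are
        # hypothetical; only the anxious row follows the percentages in the text.
        import numpy as np

        emotions = ["neutral", "happy", "sad", "angry", "anxious"]

        # rows = actual emotion, columns = predicted emotion (invented counts)
        counts = np.array([
            [70, 10,  8,  6,  6],
            [12, 65,  8,  7,  8],
            [ 9,  7, 68,  9,  7],
            [ 8,  9,  7, 66, 10],
            [14,  8,  4,  2, 72],   # anxious row: 14/8/4/2/72 as in the text
        ])

        concordance = counts / counts.sum(axis=1, keepdims=True)  # row-normalise
        per_class_accuracy = np.diag(concordance)   # the diagonal of the matrix
        overall_accuracy = counts.trace() / counts.sum()

        for emotion, acc in zip(emotions, per_class_accuracy):
            print(f"{emotion}: {acc:.0%}")
        print(f"overall: {overall_accuracy:.2%}")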

    Significant differences in emotional empathy across groups.

    Interestingly, and as can be seen in Figure 11, the SUBX patients were more empathic to the neutral emotional state (76.5%; CI: 72.3–80.2) than AA members (71.7%; CI: 68.9–74.3; p = 0.022). AA members were less empathic to anxiety (90.4%; CI: 86.7–93.1) than both the GP (93.5%; CI: 91.8–94.8; p = 0.022) and SUBX patients (93.5%; CI: 90.3–95.7; p = 0.048).
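
    For illustration, one common way such group comparisons of proportions can be tested is a two-proportion z-test; the sketch below uses invented sample sizes that only approximate the reported rates, and the paper's actual analysis (multilevel, repeated measures) may well differ.

        # Hypothetical two-proportion z-test for a difference in empathic
        # accuracy between two groups. Sample sizes are invented; this is a
        # sketch of the technique, not the paper's analysis.
        from math import sqrt
        from scipy.stats import norm

        def two_proportion_z(successes_a, n_a, successes_b, n_b):
            p_a, p_b = successes_a / n_a, successes_b / n_b
            pooled = (successes_a + successes_b) / (n_a + n_b)
            se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
            z = (p_a - p_b) / se
            return z, 2 * norm.sf(abs(z))   # two-sided p-value

        # e.g., SUBX vs AA empathy for the neutral state (hypothetical counts
        # approximating the reported 76.5% vs 71.7%)
        z, p = two_proportion_z(382, 500, 717, 1000)
        print(f"z = {z:.2f}, p = {p:.3f}")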

    Activation-Evaluation Emotional space.

    The activation dimension (a.k.a. the arousal dimension) refers to the degree of intensity (loudness, energy) in the emotional speech, and the evaluation dimension refers to how positively or negatively the emotion is perceived [37]. Emotional states with high and low levels of arousal are hardly ever confused, but it is difficult to determine the emotion of a person with flat affect [36]. Emotions that are close together in the activation-evaluation space (flat affect) tend to be confused [37].
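
    A small sketch of the idea, with guessed coordinates that are not values from the paper: placing each emotion as a point (evaluation, activation) and measuring pairwise distances shows which pairs are likely to be confused.

        # Illustrative activation-evaluation (arousal-valence) space.
        # Coordinates are guesses for illustration, not values from the paper.
        from itertools import combinations
        from math import dist

        emotion_space = {                  # (evaluation, activation)
            "angry":   (-0.7,  0.8),      # negative, high arousal
            "anxious": (-0.5,  0.6),
            "happy":   ( 0.8,  0.5),      # positive, fairly high arousal
            "sad":     (-0.6, -0.5),      # negative, low arousal
            "neutral": ( 0.0,  0.0),      # flat affect: near the origin
        }

        # Emotions close together in this space are the ones most often
        # confused; widely separated arousal levels rarely are.
        for (a, pa), (b, pb) in combinations(emotion_space.items(), 2):
            print(f"{a:7s} - {b:7s}: distance {dist(pa, pb):.2f}")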

    An Interactive Voice Response dialogue.

    The Voice User Interface (VUI) dialogue was carefully crafted to (1) capture a patient’s emotional expression, emotional self-assessment, and empathic assessment of another human’s emotional expression, and (2) avoid subject burden and training. The average call length is 12 seconds, alleviating subject burden (post-collection surveys indicate ease of use). Call completion rates were 40% (95% CI: 33.6–46.7) (p = 0.003). Emotional expression in speech is elicited by asking the quintessential question “How do you feel?” It is human nature to colour our response to this question with emotion [27]. Emotional self-assessment is captured by asking the patient to identify their emotional state from the emotion set (Neutral, Happy, Sad, Angry, and Anxious) by selecting the corresponding choice on their DTMF telephone keypad. The system captures empathy by prompting the patient with “guess the emotion of the following speaker”, followed by the playback of a randomly selected, previously captured speech recording from another patient. The patient listens to the emotionally charged speech recording and registers an empathy assessment by selecting the corresponding choice from the emotion set on their DTMF telephone keypad.
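
    The dialogue's control flow can be sketched as follows. The telephony primitives on the session object (play_prompt, record_audio, read_dtmf, play_audio) are hypothetical stand-ins; the deployed system was built on CCXML/VoiceXML rather than Python.

        # Sketch of the three-step VUI dialogue; telephony calls are
        # hypothetical placeholders for the CCXML/VoiceXML platform.
        import random

        EMOTIONS = ["neutral", "happy", "sad", "angry", "anxious"]  # keys 1-5

        def run_call(session, prior_recordings):
            # 1. Elicit emotional expression in speech.
            session.play_prompt("How do you feel?")
            expression = session.record_audio(max_seconds=10)

            # 2. Emotional self-assessment via the DTMF keypad.
            session.play_prompt("Press 1 for neutral, 2 for happy, 3 for sad, "
                                "4 for angry, 5 for anxious.")
            self_assessment = EMOTIONS[session.read_dtmf() - 1]

            # 3. Empathy: guess the emotion of another patient's recording.
            session.play_prompt("Guess the emotion of the following speaker.")
            session.play_audio(random.choice(prior_recordings))
            empathy_guess = EMOTIONS[session.read_dtmf() - 1]

            return expression, self_assessment, empathy_guess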

    Frequency of emotional states collected per participant.

    Trial data capture is multilevel, with momentary emotional state samples grouped within patients. The frequencies of samples per patient are skewed towards a Poisson distribution, as is typical of ESM data collections: the mean is 64.4 and the median is 36.5 momentary emotional states per patient. On average, participants answered 41% of emotion collection calls. SUBX patients answered significantly fewer calls (18.6%) than the General Population (56.4%) and members of Alcoholics Anonymous (49.3%).
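
    As an illustration of why a right-skewed distribution of samples per patient puts the mean well above the median, the simulation below draws invented counts from a geometric distribution matched to the reported mean of 64.4; the geometric draw is purely a convenient right-skewed stand-in, and the participant count is hypothetical.

        # Simulated per-patient sample counts (not the study's data) showing
        # mean > median under a right-skewed count distribution.
        import numpy as np

        rng = np.random.default_rng(0)
        samples_per_patient = rng.geometric(p=1 / 64.4, size=170)  # hypothetical n

        print(f"mean:   {samples_per_patient.mean():.1f}")
        print(f"median: {np.median(samples_per_patient):.1f}")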

    Two stages of emotion detection: model training and real time detection.

    Figure 6 depicts the model training and detection stages of the emotion detector. Models are trained in the left pane of the figure. Emotion detection classification computes the most likely emotion using the trained models, as shown in the right pane. Speech Activity Detection and Feature Extraction are identical in both model training and classification.
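
    A minimal sketch of this two-stage structure, with a generic scikit-learn classifier standing in for the paper's models: speech_activity_detect and extract_features below are crude hypothetical placeholders for the front end shared by both stages.

        # Two-stage pipeline sketch: shared SAD + feature extraction, then
        # training (left pane) and most-likely-emotion detection (right pane).
        import numpy as np
        from sklearn.svm import SVC

        def speech_activity_detect(audio):
            """Placeholder SAD: keep samples above an energy threshold."""
            return audio[np.abs(audio) > 0.01]

        def extract_features(audio):
            """Placeholder: a crude fixed-length feature vector."""
            speech = speech_activity_detect(audio)
            return np.array([speech.mean(), speech.std(), float(len(speech))])

        def train(recordings, labels):
            """Model training stage: fit on labeled recordings."""
            X = np.array([extract_features(r) for r in recordings])
            return SVC(probability=True).fit(X, labels)

        def detect(model, audio):
            """Real-time detection stage: return the most likely emotion."""
            return model.predict(extract_features(audio).reshape(1, -1))[0]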

    Significant differences in emotional expressiveness across groups.

    Figure 12 shows that the SUBX group had significantly less emotional expressiveness, as measured by length of speech, than both the GP group and the AA group (p<0.0001). It may be difficult to determine the emotion of SUBX patients, both by humans and by the automatic detector, due to their flatter affect. The average audio response to “How are you feeling?” was 3.07 seconds (CI: 2.89–3.25). SUBX patients’ responses were significantly shorter (2.39 seconds; CI: 2.05–2.78) than those of both the GP (3.46 seconds; CI: 3.15–3.80; p<0.0001) and AA members (3.31 seconds; CI: 2.97–3.68; p<0.0001). In terms of emotional expressiveness as measured by confidence scores, the SUBX group also scored significantly lower than both the GP and AA groups: confidence in SUBX patients’ audio responses (72%; CI: 69–74%) was significantly lower than for the GP (74%; CI: 73–76%; p = 0.038) and AA members (75%; CI: 73–77%; p = 0.018).
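
    One plausible way such a difference in mean response length could be tested is a Welch two-sample t-test; the durations below are simulated around the reported group means, and the paper's actual test may differ.

        # Welch t-test sketch on simulated response durations (seconds);
        # group sizes and spreads are invented for illustration.
        import numpy as np
        from scipy.stats import ttest_ind

        rng = np.random.default_rng(1)
        subx_durations = rng.normal(loc=2.39, scale=1.0, size=120)
        gp_durations = rng.normal(loc=3.46, scale=1.2, size=300)

        t, p = ttest_ind(subx_durations, gp_durations, equal_var=False)
        print(f"t = {t:.2f}, p = {p:.2g}")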

    Patient Momentary Emotional State collection through the Interactive Voice Response system.

    Patient-reported outcome (PRO) Experience Sampling Method (ESM) data collection places considerable demands on participants. The success of an ESM data collection depends upon participant compliance with the sampling protocol. Participants must record an ESM at least 20% of the time when requested to do so; otherwise the validity of the protocol is questionable. The problem of “hoarding” – where reports are collected and completed at a later date – must be avoided. Stone et al. confirmed this concern in a study and found that only 11% of pen-and-pencil diaries were compliant; 89% of participants missed entries, or hoarded entries and bulk-entered them later [58]. IVR systems overcome hoarding by time-sampling and improve compliance by allowing researchers to actively place outgoing calls to participants in order to more dynamically sample their experience. Rates of compliance in the IVR sampling literature vary from as high as 96% to as low as 40% [59]. Subject burden has also been studied as a factor affecting compliance rates. At least six different aspects affect participant burden: density of sampling (times per day), length of PRO assessments, the user interface of the reporting platform, the complexity of PRO assessments (i.e., the cognitive load, or effort, required to complete them), duration of monitoring, and stability of the reporting platform [59]. Researchers have been known to improve compliance through extensive training of participants [58]; however, extensive training is impractical for automated ESM systems. Patients were called by the IVR system at designated times, thus overcoming hoarding. A simple, intuitive prompt – “How are you feeling?” – elicited an emotional state response (e.g., “I am angry!”); no training was required. The audio response is recorded on the web server for analysis. The IVR system was implemented through the W3C standards CCXML and VoiceXML on a Linux-Apache-MySQL-PHP (LAMP) server cluster.
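
    As a sketch of the compliance bookkeeping implied above (the fraction of requested samplings a participant actually completed, with 20% as the validity floor), assuming a hypothetical call-log structure:

        # Per-participant compliance from a hypothetical (participant, answered)
        # call log; 20% is the validity floor cited in the text.
        from collections import defaultdict

        call_log = [  # illustrative records, not study data
            ("p1", True), ("p1", False), ("p1", True), ("p1", False),
            ("p2", False), ("p2", False), ("p2", False), ("p2", True),
        ]

        totals, answered = defaultdict(int), defaultdict(int)
        for participant, ok in call_log:
            totals[participant] += 1
            answered[participant] += ok

        for participant in totals:
            rate = answered[participant] / totals[participant]
            status = "ok" if rate >= 0.20 else "below 20% floor"
            print(f"{participant}: {rate:.0%} compliance ({status})")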