Glottal Source Features for Automatic Speech-based Depression Assessment
Title | Glottal Source Features for Automatic Speech-based Depression Assessment |
Publication type | Conference Paper |
Year of Publication | 2017 |
Authors | Simantiraki O, Charonyktakis P, Pampouchidou A, Tsiknakis M, Cooke M |
Conference Name | 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION |
Publisher | Int Speech Commun Assoc; Stockholm Univ; KTH Royal Inst Technol; Karolinska Inst; Amazon Alexa; DiDi; Furhat Robot; Microsoft; EZ Alibaba Grp; CIRRUS LOGIC; CVTE; Google; Baidu; IBM Res; YAHOO Japan; Nuance; Voice Provider; ASM Solut Ltd; Mitsubishi Elect |
Conference Location | C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE |
ISBN Number | 978-1-5108-4876-4 |
Keywords | binary classification, glottal source, machine learning, Phase Distortion Deviation |
Abstract | Depression is one of the most prominent mental disorders, with an increasing rate that makes it the fourth leading cause of disability worldwide. The field of automated depression assessment has emerged to aid clinicians in the form of a decision support system. Such a system could serve as a pre-screening tool, or even as a means of monitoring high-risk populations. Related work most commonly involves multimodal approaches, typically combining audio and visual signals to identify depression presence and/or severity. The current study explores categorical assessment of depression using audio features alone. Specifically, since depression-related vocal characteristics impact the glottal source signal, we examine Phase Distortion Deviation, which has previously been applied to the recognition of voice qualities such as hoarseness, breathiness and creakiness, some of which are thought to be features of depressed speech. The proposed method uses DCT coefficients of the Phase Distortion Deviation for each frequency band as features. An automated machine learning tool, Just Add Data, is used to classify speech samples. The method is evaluated on a benchmark dataset (AVEC2014) in two conditions: read speech and spontaneous speech. Our findings indicate that Phase Distortion Deviation is a promising audio-only feature for automated detection and assessment of depressed speech. |
DOI | 10.21437/Interspeech.2017-1251 |
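
The feature step named in the abstract (DCT coefficients of the Phase Distortion Deviation for each frequency band) could be sketched roughly as follows. This is an illustrative reading only, not the authors' implementation: the abstract does not say how the PDD is computed, along which axis the DCT is taken, or how many coefficients are kept. The sketch assumes the PDD has already been computed elsewhere as a frames-by-bands matrix, applies an unnormalised DCT-II along time within each band, and keeps the first few coefficients. The names `dct_ii`, `band_dct_features`, `pdd` and `num_coeffs` are hypothetical.

```python
import math


def dct_ii(signal):
    """Unnormalised DCT-II of a 1-D sequence of floats."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def band_dct_features(pdd, num_coeffs):
    """Concatenate the first num_coeffs DCT-II coefficients of each band.

    pdd is assumed to be a list of frames, each frame a list holding one
    PDD value per frequency band (a hypothetical layout, not from the paper).
    """
    if not pdd:
        raise ValueError("pdd must contain at least one frame")
    num_bands = len(pdd[0])
    if any(len(frame) != num_bands for frame in pdd):
        raise ValueError("every frame must have the same number of bands")
    if not 0 < num_coeffs <= len(pdd):
        raise ValueError("num_coeffs must be between 1 and the number of frames")

    features = []
    for band in range(num_bands):
        # Time trajectory of the PDD within this single frequency band.
        trajectory = [frame[band] for frame in pdd]
        features.extend(dct_ii(trajectory)[:num_coeffs])
    return features
```

Under these assumptions, each speech sample yields one fixed-length vector of `num_bands * num_coeffs` values, which would then be passed to a classifier. The record names Just Add Data for that stage; it is not reproduced here.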