Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches
Depression has become one of the most prevalent mental health issues, affecting more than 300 million people worldwide. However, due to factors such as limited medical resources and limited access to health care, a large number of patients remain undiagnosed. In addition, traditional approaches to depression diagnosis have limitations: they are usually time-consuming and depend on clinical experience, which varies across clinicians. From this perspective, automatic depression detection can make the diagnosis process much faster and more accessible. In this thesis, we explore the possibility of using speech for automatic depression detection. This is based on findings in neuroscience that depressed patients have abnormal cognitive mechanisms, which cause their speech to differ from that of healthy people.
Therefore, in this thesis, we show two ways of benefiting from automatic depression detection, i.e., identifying speech markers of depression and constructing novel deep learning models to improve detection accuracy.
The identification of speech markers tries to capture measurable depression traces left in speech. From this perspective, speech markers such as speech duration, pauses and correlation matrices are proposed. Speech duration and pauses take speech fluency into account, while correlation matrices represent the relationship between acoustic features and aim at capturing psychomotor retardation in depressed patients. Experimental results demonstrate that these proposed markers are effective at improving the performance in recognizing depressed speakers. In addition, such markers show statistically significant differences between depressed patients and non-depressed individuals, which explains the possibility of using these markers for depression detection and further confirms that depression leaves detectable traces in speech.
In addition to the above, we propose an attention mechanism, Multi-local Attention (MLA), to emphasize depression-relevant information locally. We then analyse the effect of MLA on performance and efficiency. According to the experimental results, such a model can significantly improve performance and confidence in the detection while reducing the time required for recognition. Furthermore, we propose Cross-Data Multilevel Attention (CDMA), which uses multiple attention mechanisms to emphasize different types of depression-relevant information, i.e., information specific to each type of speech and information common to both. Experimental results demonstrate that the proposed model is effective at integrating different types of depression-relevant information in speech, significantly improving depression detection performance.
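The speech markers described in this abstract (pauses and correlation matrices between acoustic features) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the energy-threshold definition of a pause, and the use of Pearson correlation are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long feature trajectories."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant trajectory is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(features):
    """features: one list of per-frame values for each acoustic feature."""
    k = len(features)
    return [[pearson(features[i], features[j]) for j in range(k)] for i in range(k)]


def pause_ratio(frame_energy, threshold):
    """Share of frames whose energy falls below a (hypothetical) silence threshold."""
    silent = sum(1 for e in frame_energy if e < threshold)
    return silent / len(frame_energy)


if __name__ == "__main__":
    # Toy trajectories standing in for, e.g., pitch, energy and a spectral feature.
    feats = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
    for row in correlation_matrix(feats):
        print(row)
    print(pause_ratio([0.1, 0.5, 0.05, 0.9], threshold=0.2))
```

In such a setup, the matrix entries and the pause ratio would serve as input features to a classifier; how the thesis actually computes and uses them is not stated in the abstract.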
Predicate Matrix: an interoperable lexical knowledge base for predicates
183 p. The Predicate Matrix is a new lexical-semantic resource resulting from the integration of multiple knowledge sources, including FrameNet, VerbNet, PropBank and WordNet. The Predicate Matrix provides an extensive and robust lexicon that improves the interoperability between these semantic resources. Its creation is based on the integration of Semlink with new mappings obtained through automatic methods that link semantic knowledge at the lexical and role levels. We have also extended the Predicate Matrix to cover nominal predicates (English, Spanish) and predicates in other languages (Spanish, Catalan and Basque). As a result, the Predicate Matrix provides a multilingual lexicon that enables interoperable semantic analysis in multiple languages
Comparing the production of a formula with the development of L2 competence
This pilot study investigates the production of a formula with the development of L2 competence over proficiency levels of a spoken learner corpus. The results show that the formula
in beginner production data is likely being recalled holistically from learners’ phonological
memory rather than generated online, identifiable by virtue of its fluent production in absence
of any other surface structure evidence of the formula’s syntactic properties. As learners’ L2
competence increases, the formula becomes sensitive to modifications which show structural
conformity at each proficiency level. The transparency between the formula’s modification
and learners’ corresponding L2 surface structure realisations suggests that it is the independent
development of L2 competence which integrates the formula into compositional language,
and ultimately drives the SLA process forward.
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment
Social media is awash with hateful content, much of which is often veiled
with linguistic and topical diversity. The benchmark datasets used for hate
speech detection do not account for such divagation as they are predominantly
compiled using hate lexicons. However, capturing hate signals becomes
challenging in neutrally-seeded malicious content. Thus, designing models and
datasets that mimic the real-world variability of hate warrants further
investigation.
To this end, we present GOTHate, a large-scale code-mixed crowdsourced
dataset of around 51k posts for hate speech detection from Twitter. GOTHate is
neutrally seeded, encompassing different languages and topics. We conduct
detailed comparisons of GOTHate with the existing hate speech datasets,
highlighting its novelty. We benchmark it with 10 recent baselines. Our
extensive empirical and benchmarking experiments suggest that GOTHate is hard
to classify in a text-only setup. Thus, we investigate how adding endogenous
signals enhances the hate speech detection task. We augment GOTHate with the
user's timeline information and ego network, bringing the overall data source
closer to the real-world setup for understanding hateful content. Our proposed
solution HEN-mBERT is a modular, multilingual, mixture-of-experts model that
enriches the linguistic subspace with latent endogenous signals from history,
topology, and exemplars. HEN-mBERT transcends the best baseline by 2.5% and 5%
in overall macro-F1 and hate class F1, respectively. Inspired by our
experiments, in partnership with Wipro AI, we are developing a semi-automated
pipeline to detect hateful content as a part of their mission to tackle online
harm.
Comment: 15 pages, 4 figures, 11 tables. Accepted at SIGKDD'2
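The two metrics reported in this abstract, overall macro-F1 and hate-class F1, can be computed as in the following sketch. The label names and toy predictions are illustrative assumptions and are not taken from GOTHate or HEN-mBERT.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels for four posts.
    gold = ["hate", "hate", "none", "none"]
    pred = ["hate", "none", "none", "none"]
    print(f1_per_class(gold, pred)["hate"])
    print(macro_f1(gold, pred))
```

Because macro-F1 weights every class equally, a gain on a rare class such as hate moves it more than accuracy would, which may be why the abstract reports both figures.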
Workshop Proceedings of the 12th edition of the KONVENS conference
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies that such interaction, cooperation and integrated views can produce. This topic, at the crossroads of different research traditions which deal with natural language as a container of knowledge and with methods to extract and manage knowledge that is linguistically represented, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years
Formulaic sequences in Early Modern English: A corpus-assisted historical pragmatic study
This doctoral project identifies formulaic sequences (hereinafter FS, plural FSs) in Early Modern English (hereinafter EModE) and investigates the functions they serve in communication in two text types, namely EModE dialogues and letters. The main contributions of the study are as follows. Firstly, the study provides solid arguments and further evidence that FSs are constructions in the sense of Construction Grammar rather than exceptions in the traditional grammar-dictionary model. Within this theoretical framework, I propose a new working definition of FSs that is inclusive, descriptive, and methodologically neutral. The study also argues that there are fundamental differences between FSs and lexical bundles (LBs), although the latter are often treated as an alternative term for FSs or as a sub-group of FSs. Nevertheless, after a thorough review of the characteristics of the two multi-word units, the study argues that, despite the differences, LBs can be upgraded to FSs as long as they fulfill certain semantic, syntactic, and pragmatic criteria. This forms the foundation of the methodological design of the study. Secondly, the study enhances the corpus-assisted approach to the identification of FSs, especially in EModE texts. The approach consists of three steps: preparation, identification, and generalisation. The identification step is conducted in two phases: automatic generation of LBs for a corpus and manual identification of FSs from the LBs. Specifically, in the preparation step, the dissertation critically discusses how spelling variation in EModE texts should be dealt with in investigations of FSs. I designed a series of criteria for the two-phase identification of FSs. For one thing, I disagree with previous research that excludes two-word LBs from examination, arguing that many of them are formulaic and cannot be captured from longer LBs, and that the workload of processing the massive number of two-word LBs is actually manageable.
For another, the study contributes an easy-to-follow flow chart demonstrating the procedure for the manual identification of FSs from LBs and listing the criteria that guide the decision-making process. Thirdly, the study provides systematic and comprehensive accounts of FSs in EModE dialogues and letters, especially of how their forms are conventionally mapped to their functions. Data analysis was conducted from aspects such as degree of fixedness, grammatical structures, distribution across function categories, multi-functional FSs, genre-specific FSs, etc. General findings suggest that EModE dialogues and letters have many similarities regarding the form and function of FSs and the general trends of distribution across function categories. However, marked differences between the two text types can be observed too. From the perspective of form, the distinction lies in word choice in realisations of certain FSs. From the perspective of meaning/function, the distinction lies in the kinds of functions that need FSs the most or the least and in common function combinations. More importantly, the study observed two types of relationships among FSs themselves and the discourse, namely horizontal networks and vertical networks, which reflect the complexity of FSs and their identity as constructions. Specifically, the three types of horizontal networks of FSs are embedding, attaching, and joining. A pair of new concepts is proposed to describe the vertical networks: superordinate FSs and subordinate FSs. As a result of the vertical networks, three types of functional deviation are observed: function extension, shifting, and specification
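The first phase of the identification step, automatic generation of lexical bundles from a corpus, can be sketched as follows. This is only an illustration: the whitespace tokenisation, the frequency and dispersion thresholds (`min_freq`, `min_texts`), and the sample sentences are assumptions, and the study's own criteria and its handling of EModE spelling variation are not reproduced here.

```python
from collections import Counter


def lexical_bundles(texts, n=2, min_freq=2, min_texts=2):
    """Return n-word sequences that recur often enough and in enough texts.

    min_freq  : minimum total occurrences across the corpus (hypothetical cut-off)
    min_texts : minimum number of different texts containing the sequence
    """
    freq = Counter()    # total occurrences of each n-gram
    spread = Counter()  # number of texts in which each n-gram appears
    for text in texts:
        tokens = text.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        spread.update(set(grams))
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and spread[gram] >= min_texts
    }


if __name__ == "__main__":
    corpus = [
        "I pray you tell me",
        "I pray you come hither",
        "tell me now",
    ]
    print(lexical_bundles(corpus, n=2))
    print(lexical_bundles(corpus, n=3))
```

The output of such a step is only a candidate list; in the approach described above, each bundle would still be checked manually against semantic, syntactic, and pragmatic criteria before being counted as an FS.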
Language Ideologies of Multilingual Learners in an Intensive English Program
Despite some rises and falls in the numbers due to various reasons, including the political climate in the Trump era and the COVID-19 pandemic (Laws & Ammigan, 2020), each year universities in the United States host a large number of multilingual international students from different parts of the world. Based on their TOEFL scores, many are required to enroll in an accelerated course of study in academic English, commonly known as the Intensive English Program (IEP) before they can begin their mainstream academic programs. Where there is language, there are language ideologies. Yet, often in monolingual, English-only classrooms, little is known by the instructors and, at times, by the learners themselves, about their linguistic and cultural repertoire and its potential influence on their language learning. This multilayered qualitative analysis explores the language ideologies and conceptualizations of multilingual learners in an IEP. The themes that emerged from the data include ideologies about multilingualism and English, language teaching and learning, raciolinguistic experiences, and the participants’ practice and ideologies pertaining to translanguaging. A critical metaphor analysis was also conducted to explore the participants’ subconscious conceptualizations about language. This analysis reveals the differences in the participants’ conceptualizations of their mother tongues and English. The study highlights the ways in which the language ideologies of multilingual learners in the IEP influence their acquisition of English and offers an insight into how they use their multilingual repertoire to learn English. The work concludes with practical implications for supporting multilingual learners in IEPs.
Advisor: Theresa Catalan
Examining the Relationships Between Distance Education Students’ Self-Efficacy and Their Achievement
This study aimed to examine the relationships between students’ self-efficacy (SSE) and students’ achievement (SA) in distance education. To gather data, the instruments were administered to 100 undergraduate students at a distance university who work as migrant workers in Taiwan, while their SA scores were obtained from the university. Semi-structured interviews with 8 participants consisted of questions that showed the specific conditions of SSE and SA. The findings of this study were as follows: There was a significantly positive correlation between targeted SSE (overall scales and general self-efficacy) and SA. Targeted students' self-efficacy effectively predicted their achievement, and general self-efficacy had the most significant influence. In the qualitative findings, four themes were extracted for students with lower self-efficacy but higher achievement: physical and emotional condition, teaching and learning strategy, positive social interaction, and intrinsic motivation. Moreover, three themes were extracted for students with moderate or higher self-efficacy but lower achievement: more time for leisure (not hard-working), less social interaction, and external excuses. Providing effective learning environments, social interactions, and teaching and learning strategies is suggested for distance education
A Qualitative Research Study Exploring the Lived Experiences of K-12 School Teachers Who Serve African American Students Who Speak African American Vernacular English
The purpose of the proposed transcendental phenomenological study will be to explore the perceptions and lived experiences of K-12 schoolteachers working with African American students and their usage of AAVE in the State of Maryland. The theories guiding this study are critical race theory and Krashen’s second language acquisition theory. This qualitative transcendental phenomenological study will collect data from 12 participants in the state of Maryland using individual interviews, focus groups, and letters. The data will then be analyzed using Braun and Clarke's qualitative descriptive research data analysis plan. The data showed that K-12 schoolteachers mainly perceived AAVE as a community dialect. Although some teachers believed that AAVE was an acceptable academic dialect, there was a common theme among the participants preferring African American students to code-switch when necessary
Language variation, automatic speech recognition and algorithmic bias
In this thesis, I situate the impacts of automatic speech recognition systems in relation to sociolinguistic theory (in particular drawing on concepts of language variation, language ideology
and language policy) and contemporary debates in AI ethics (especially regarding algorithmic
bias and fairness). In recent years, automatic speech recognition systems, alongside other
language technologies, have been adopted by a growing number of users and have been embedded in an increasing number of algorithmic systems. This expansion into new application
domains and language varieties can be understood as an expansion into new sociolinguistic
contexts. In this thesis, I am interested in how automatic speech recognition tools interact
with this sociolinguistic context, and how they affect speakers, speech communities and their
language varieties.
Focussing on commercial automatic speech recognition systems for British Englishes, I first
explore the extent and consequences of performance differences of these systems for different user groups depending on their linguistic background. When situating this predictive bias
within the wider sociolinguistic context, it becomes apparent that these systems reproduce and
potentially entrench existing linguistic discrimination and could therefore cause direct and indirect harms to already marginalised speaker groups. To understand the benefits and potentials
of automatic transcription tools, I highlight two case studies: transcribing sociolinguistic data
in English and transcribing personal voice messages in isiXhosa. The central role of the sociolinguistic context in developing these tools is emphasised in this comparison. Design choices,
such as the choice of training data, are particularly consequential because they interact with existing processes of language standardisation. To better understand the impacts of these choices, and
the role of the developers making them, I draw on theory from language policy research
and critical data studies. These conceptual frameworks are intended to help practitioners and
researchers in anticipating and mitigating predictive bias and other potential harms of speech
technologies. Beyond looking at individual choices, I also investigate the discourses about language variation and linguistic diversity deployed in the context of language technologies. These
discourses put forward by researchers, developers and commercial providers not only have a
direct effect on the wider sociolinguistic context, but they also highlight how this context (e.g.,
existing beliefs about language(s)) affects technology development. Finally, I explore ways of
building better automatic speech recognition tools, focussing in particular on well-documented,
naturalistic and diverse benchmark datasets. However, inclusive datasets are not necessarily
a panacea, as they still raise important questions about the nature of linguistic data and language variation (especially in relation to identity), and may not mitigate or prevent all potential
harms of automatic speech recognition systems as embedded in larger algorithmic systems
and sociolinguistic contexts
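The performance differences between speaker groups discussed in this abstract are commonly quantified with word error rate (WER). The sketch below assumes WER as the metric, which the abstract does not state, and uses hypothetical group labels and transcripts; it is an illustration and not the thesis's evaluation code.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        curr = [i]
        for j, h in enumerate(hyp_words, 1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference transcript, system output).

    Returns the pooled WER per group: total word errors / total reference words.
    """
    errors, words = {}, {}
    for group, reference, hypothesis in samples:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[group] = errors.get(group, 0) + edit_distance(ref, hyp)
        words[group] = words.get(group, 0) + len(ref)
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


if __name__ == "__main__":
    # Hypothetical groups "A" and "B" with toy transcripts.
    data = [
        ("A", "the cat sat", "the cat sat"),
        ("B", "the cat sat on the mat", "the cat sat on mat"),
        ("B", "a b", "a x"),
    ]
    print(wer_by_group(data))
```

A gap between the per-group values is one way to express the predictive bias the abstract refers to, although a single number does not capture the indirect harms it also discusses.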