398 research outputs found

    Methods and applications of automatic speech recognition

    Abstract. This thesis examines automatic speech recognition through a narrative literature review, covering both past and present methods as well as applications. The sources consist of a wide variety of technical conference papers and journal articles on automatic speech recognition methods, a field that has advanced considerably over the years, together with books and literature reviews that compile knowledge on both methods and applications. Three of the seemingly most significant methods were examined: dynamic time warping, hidden Markov models, and deep neural networks. Of these, deep neural networks appeared to be the most advanced and the most widely used at present. Applications were grouped by the kind of communication they aim to improve: human-human or human-machine. From the first group, two popular applications were examined: speech-to-speech translation and speech summarization. From the second group, virtual assistants were examined as an application group of their own, the term covering a general software agent that performs tasks in response to human speech. The research presented in this thesis can serve as a basis for future research on automatic speech recognition. Suggested avenues include a quantitative analysis of either the performance of different methods or the privacy aspects of different applications, or a design science approach that documents the construction of an automatic speech recognition application using modern methods.

    Tiivistelmä (translated from Finnish). This thesis examined automatic speech recognition in the form of a narrative literature review. It looked at the best-known past and present methods of automatic speech recognition, as well as its best-known applications from two categories. The prior research used as sources consisted of a wide selection of different types of material. Material on methods was found mainly in conference papers and scientific journal articles. The development of method-related technologies over the years also meant that the available research material spans many decades. Information on applications, in turn, was drawn mainly from books and from other literature reviews in the field. Of the methods, the three historically most popular were examined: dynamic time warping, hidden Markov models, and deep neural networks. The last of these, deep neural networks, appeared to be the most advanced and most popular method today. Applications were examined in two categories. The first category contains applications that aim to improve communication and interaction between people. From this category, two popular applications were examined: speech-to-speech translation, that is, real-time translation of speech, and speech summarization. The second category contained applications that aim to improve communication and interaction between people and devices. From this category, the thesis examined what is perhaps the most popular application type of automatic speech recognition, virtual assistants. Virtual assistants were treated as a general type of software whose main feature and purpose is to perform various functions in response to spoken commands given by a person. The information presented in the thesis can also support further research in the future. One example would be a quantitative study of either the efficiency of different automatic speech recognition methods or different aspects of the information security of automatic speech recognition applications. Another possibility would be constructive research on the topic, for example building an automatic speech recognition application using modern methods.

    Type prediction in RDF knowledge bases using hierarchical multilabel classification

    Large Semantic Web knowledge bases are often noisy, incorrect, and incomplete with respect to type information. Automatic type prediction can help reduce such incompleteness, and, as previous works show, statistical methods are well-suited for this kind of data. Since most Semantic Web knowledge bases come with an ontology defining a type hierarchy, in this paper we rephrase the type prediction problem as a hierarchical multilabel classification problem. We propose SLCN, a modification of the local classifier per node approach, which performs feature selection, instance sampling, and class balancing for each local classifier. Our approach improves scalability, facilitating its application to large Semantic Web datasets with high-dimensional feature and label spaces. We compare the performance of our proposed method with a state-of-the-art type prediction approach and popular hierarchical multilabel classifiers, and report on experiments with large-scale RDF datasets.

    Social Media Analysis for Social Good

    Data on social media is abundant and offers valuable information that can be utilised for a range of purposes. Users share their experiences and opinions on various topics, ranging from their personal life to the community and the world, in real time. In comparison to conventional data sources, social media data is cost-effective to obtain, is up to date, and reaches a larger audience. Analysing this rich data source can contribute to solving societal issues and promote social impact in an equitable manner. In this thesis, I present my research exploring innovative applications of natural language processing (NLP) and machine learning to identify patterns and extract actionable insights from social media data, with the ultimate aim of making a positive impact on society. First, I use social media data to evaluate the impact of an intervention program aimed at promoting inclusive and equitable learning opportunities for underrepresented communities. Second, I develop EmoBERT, an emotion-based variant of the BERT model, for detecting fine-grained emotions to gauge the well-being of a population during significant disease outbreaks. Third, to improve public health surveillance on social media, I demonstrate how emotions expressed in social media posts can be incorporated into health mention classification using an intermediate task fine-tuning and multi-feature fusion approach. I also propose a multi-task learning framework to model the literal meanings of disease and symptom words to enhance the classification of health mentions. Fourth, I create a new health mention dataset to address the imbalance in health data availability between developing and developed countries, providing a benchmark alternative to the traditional standards used in digital health research. Finally, I leverage the power of pretrained language models to analyse religious activities, recognised as social determinants of health, during disease outbreaks.