5 research outputs found

    Schizophrenia Detection Using Machine Learning Approach from Social Media Content

    No full text
    Schizophrenia is a severe mental disorder that ranks among the leading causes of disability worldwide. However, many cases of schizophrenia remain untreated due to failure to diagnose, self-denial, and social stigma. With the advent of social media, individuals suffering from schizophrenia share their mental health problems and seek support and treatment options. Machine learning approaches are increasingly used for detecting schizophrenia from social media posts. This study aims to determine whether machine learning could be effectively used to detect signs of schizophrenia in social media users by analyzing their social media texts. To this end, we collected posts from the social media platform Reddit focusing on schizophrenia, along with non-mental health related posts (fitness, jokes, meditation, parenting, relationships, and teaching) for the control group. We extracted linguistic features and content topics from the posts. Using supervised machine learning, we classified posts belonging to schizophrenia and interpreted important features to identify linguistic markers of schizophrenia. We applied unsupervised clustering to the features to uncover a coherent semantic representation of words in schizophrenia. We identified significant differences in linguistic features and topics including increased use of third person plural pronouns and negative emotion words and symptom-related topics. We distinguished schizophrenic from control posts with an accuracy of 96%. Finally, we found that coherent semantic groups of words were the key to detecting schizophrenia. Our findings suggest that machine learning approaches could help us understand the linguistic characteristics of schizophrenia and identify schizophrenia or otherwise at-risk individuals using social media texts

    K-EPIC: Entity-Perceived Context Representation in Korean Relation Extraction

    No full text
    Relation Extraction (RE) aims to predict the correct relation between two entities from the given sentence. To obtain the proper relation in Relation Extraction (RE), it is significant to comprehend the precise meaning of the two entities as well as the context of the sentence. In contrast to the RE research in English, Korean-based RE studies focusing on the entities and preserving Korean linguistic properties rarely exist. Therefore, we propose K-EPIC (Entity-Perceived Context representation in Korean) to ensure enhanced capability for understanding the meaning of entities along with considering linguistic characteristics in Korean. We present the experimental results on the BERT-Ko-RE and KLUE-RE datasets with four different types of K-EPIC methods, utilizing entity position tokens. To compare the ability of understanding entities and context of Korean pre-trained language models, we analyze HanBERT, KLUE-BERT, KoBERT, KorBERT, KoELECTRA, and multilingual-BERT (mBERT). The experimental results demonstrate that the F1 score increases significantly with our K-EPIC and that the performance of the language models trained with the Korean corpus outperforms the baseline

    There is no rose without a thorn: Finding weaknesses on BlenderBot 2.0 in terms of Model, Data and User-Centric Approach

    Full text link
    BlenderBot 2.0 is a dialogue model that represents open-domain chatbots by reflecting real-time information and remembering user information for an extended period using an internet search module and multi-session. Nonetheless, the model still has room for improvement. To this end, we examine BlenderBot 2.0 limitations and errors from three perspectives: model, data, and user. From the data point of view, we highlight the unclear guidelines provided to workers during the crowdsourcing process, as well as a lack of a process for refining hate speech in the collected data and verifying the accuracy of internet-based information. From a user perspective, we identify nine types of limitations of BlenderBot 2.0, and their causes are thoroughly investigated. Furthermore, for each point of view, we propose practical improvement methods and discuss several potential future research directions.Comment: English and Extention Version of "Empirical study on BlenderBot 2.0 errors analysis in terms of model, data and dialogue" (Journal of the Korea Convergence Society

    Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC

    No full text
    The machine translation system aims to translate source language into target language. Recent studies on MT systems mainly focus on neural machine translation. One factor that significantly affects the performance of NMT is the availability of high-quality parallel corpora. However, high-quality parallel corpora concerning Korean are relatively scarce compared to those associated with other high-resource languages, such as German or Italian. To address this problem, AI Hub recently released seven types of parallel corpora for Korean. In this study, we conduct an in-depth verification of the quality of corresponding parallel corpora through Linguistic Inquiry and Word Count (LIWC) and several relevant experiments. LIWC is a word-counting software program that can analyze corpora in multiple ways and extract linguistic features as a dictionary base. To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT. Our findings suggest the direction of further research toward obtaining the improved quality parallel corpora through our correlation analysis in LIWC and NMT performance

    Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC

    No full text
    The machine translation system aims to translate source language into target language. Recent studies on MT systems mainly focus on neural machine translation. One factor that significantly affects the performance of NMT is the availability of high-quality parallel corpora. However, high-quality parallel corpora concerning Korean are relatively scarce compared to those associated with other high-resource languages, such as German or Italian. To address this problem, AI Hub recently released seven types of parallel corpora for Korean. In this study, we conduct an in-depth verification of the quality of corresponding parallel corpora through Linguistic Inquiry and Word Count (LIWC) and several relevant experiments. LIWC is a word-counting software program that can analyze corpora in multiple ways and extract linguistic features as a dictionary base. To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT. Our findings suggest the direction of further research toward obtaining the improved quality parallel corpora through our correlation analysis in LIWC and NMT performance
    corecore