5 research outputs found

    Forensic authorship classification by paragraph vectors of speech transcriptions

    In forensic comparison, document classification techniques are used mainly for authorship classification and author profiling. In the present study, we aim to introduce paragraph vector modelling (by Doc2Vec) into the likelihood-ratio framework of forensic evidence comparison. Transcriptions of spontaneous speech recordings are used as input for training the paragraph vector extraction model. Logistic regression models are trained on the cosine distances of paragraph vector pairs to predict the probability of same-author versus different-author origin. Results are evaluated separately for different speaking styles (the transcriptions of the speech tasks available in the dataset). The Cllr (log-likelihood-ratio cost) and equal error rate values (lowest 0.47 and 0.11, respectively) show that the method can be useful as a feature for forensic authorship comparison and may extend voice comparison methods for speaker verification.
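
The scoring pipeline described above (cosine distance between paragraph vectors, a logistic regression over that distance, conversion to a likelihood ratio, then Cllr and EER) can be sketched as follows. This is a minimal, stdlib-only illustration and not the authors' implementation: the distances are invented toy numbers standing in for distances between Doc2Vec vectors, the one-feature logistic regression is fitted by plain gradient descent, the LR is obtained by dividing posterior odds by the training prior odds, and the EER is a crude threshold sweep.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two paragraph vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit P(same author | d) = sigmoid(w*d + b) by batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def likelihood_ratio(d, w, b, prior_odds):
    """Posterior odds exp(w*d + b) divided by the training-set prior odds."""
    return math.exp(w * d + b) / prior_odds

def cllr(lr_same, lr_diff):
    """Log-likelihood-ratio cost; 1.0 corresponds to an uninformative system."""
    c_ss = sum(math.log2(1.0 + 1.0 / x) for x in lr_same) / len(lr_same)
    c_ds = sum(math.log2(1.0 + x) for x in lr_diff) / len(lr_diff)
    return 0.5 * (c_ss + c_ds)

def eer(lr_same, lr_diff):
    """Crude EER: sweep observed LRs as thresholds, keep the one where the
    miss rate and false-alarm rate are closest, and average the two."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(lr_same + lr_diff):
        miss = sum(x < t for x in lr_same) / len(lr_same)
        fa = sum(x >= t for x in lr_diff) / len(lr_diff)
        if abs(miss - fa) < best_gap:
            best_gap, best_eer = abs(miss - fa), (miss + fa) / 2
    return best_eer

# Toy cosine distances (invented for illustration, not from the study).
same = [0.10, 0.20, 0.25, 0.55]   # same-author pairs
diff = [0.45, 0.60, 0.70, 0.80]   # different-author pairs
w, b = fit_logistic(same + diff, [1] * len(same) + [0] * len(diff))
prior_odds = len(same) / len(diff)
lr_same = [likelihood_ratio(d, w, b, prior_odds) for d in same]
lr_diff = [likelihood_ratio(d, w, b, prior_odds) for d in diff]
print(f"Cllr = {cllr(lr_same, lr_diff):.3f}, EER = {eer(lr_same, lr_diff):.3f}")
```

Dividing out the prior odds matters because the logistic model's output reflects the same/different ratio of the training pairs, whereas a forensic LR should not depend on that ratio. In a real setup, `same` and `diff` would be produced by `cosine_distance` over held-out vector pairs, and calibration would be evaluated on data not used for fitting.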

    Classifying Challenging Behaviors in Autism Spectrum Disorder with Neural Document Embeddings

    The understanding and treatment of challenging behaviors in individuals with Autism Spectrum Disorder (ASD) is paramount to the success of behavioral therapy; an essential step in this process is labeling the challenging behaviors demonstrated in therapy sessions. These manifestations differ across individuals and within individuals over time, so the appropriate classification of a challenging behavior can be unclear when only qualitative factors are considered. In this thesis we seek to add quantitative depth to this otherwise qualitative task of classifying challenging behaviors. We do so by applying natural language processing techniques to behavioral descriptions extracted from the CARD Skills dataset. Specifically, we construct three sets of 50-dimensional document embeddings to represent the 1,917 recorded instances of challenging behaviors demonstrated in Applied Behavior Analysis therapy. These embeddings are learned through three processes: a TF-IDF weighted sum of Word2Vec embeddings, Doc2Vec embeddings that use hierarchical softmax as the output layer, and Doc2Vec embeddings that optimize the original Doc2Vec architecture through negative sampling. The embeddings are first used as input to a Support Vector Machine classifier to demonstrate binary classification within this problem set. This preliminary exploration achieves promising accuracies ranging from 78.2% to 100% and establishes the separability of challenging behaviors given their neural embeddings. We then construct a multi-class classification model using a Gaussian Process Classifier fitted with the Laplace approximation. This model, trained on an 80/20 stratified split of the seven most frequently occurring behaviors in the dataset, achieves an accuracy of 82.7%.
Through this exploration we demonstrate that the semantic cues derived from the language of challenging behavior descriptions, modeled using natural language processing techniques, can be successfully leveraged in classification architectures. This study is the first of its kind, providing a proof of concept for applying machine learning to observations of challenging behaviors demonstrated in ASD, with the ultimate goal of improving the efficacy of behavioral treatments that intrinsically rely on accurate identification of these behaviors.
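
The first of the three embedding methods in this abstract, a TF-IDF weighted sum of Word2Vec vectors, can be sketched in a few lines. This is a hedged, stdlib-only illustration: the tokenized descriptions and 2-dimensional word vectors are hypothetical toy data (the abstract reports 50-dimensional embeddings), and the smoothed IDF formula is one common variant assumed here, not necessarily the one used in the thesis.

```python
import math
from collections import Counter

def smoothed_idf(docs):
    """IDF per token: log((1 + N) / (1 + df)) + 1 (a common smoothed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf_weighted_sum(doc, idf, word_vectors, dim):
    """Sum of word vectors, each scaled by its tf-idf weight in `doc`.
    Tokens without a vector (out-of-vocabulary) are skipped."""
    out = [0.0] * dim
    for tok, count in Counter(doc).items():
        vec = word_vectors.get(tok)
        if vec is None:
            continue
        weight = (count / len(doc)) * idf.get(tok, 0.0)
        for i in range(dim):
            out[i] += weight * vec[i]
    return out

# Hypothetical tokenized behavior descriptions and toy 2-d word vectors;
# a real setup would use trained Word2Vec vectors.
docs = [["hits", "peer"], ["hits", "self"], ["throws", "object"]]
word_vectors = {
    "hits": [1.0, 0.0], "peer": [0.0, 1.0], "self": [0.5, 0.5],
    "throws": [0.2, 0.8], "object": [0.9, 0.1],
}
idf = smoothed_idf(docs)
embeddings = [tfidf_weighted_sum(d, idf, word_vectors, dim=2) for d in docs]
print(embeddings[0])
```

The IDF weighting lets rarer, more distinctive words (here "peer") dominate a description's vector over words shared across many descriptions (here "hits"); the resulting vectors could then be fed to an SVM or another classifier.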

    Detecting Popularity of Ideas and Individuals in Online Community

    Research in the last decade has prioritized the effects of online texts and online behaviors on predicting user information. However, previous research has overlooked the overall meaning of online texts and more detailed features of users' online behaviors. The purpose of this research is to detect adopted ideas, the popularity of ideas, and the popularity of individuals by identifying the overall meaning of online texts and centrality features derived from users' interactions within an online community. To address these research questions, this research examines the online discussions on the MyStarbucksIdea website. Launched in 2008, MyStarbucksIdea encouraged people to submit new ideas for improving Starbucks' products and services, and Starbucks adopted hundreds of ideas from this crowdsourcing platform. Using the MyStarbucksIdea community as a case, this research combined a new document representation approach, Doc2Vec, with users' centrality features. Additionally, the research examines the surface-level features of online texts, their sentiment features, and the features of users' online behaviors, which are also essential for determining idea adoption as well as the popularity of ideas and individuals in the online community. Supervised machine learning classifiers, including Logistic Regression, Support Vector Machine, and Random Forest, adjusted for imbalanced classes, were used in the experiments. The experiments showed that idea adoption, the popularity of ideas, and the popularity of individuals were all classified successfully. The overall meaning of idea texts combined with users' centrality features was most accurate in detecting adopted ideas and the popularity of ideas.
The overall meaning of idea texts combined with the features of users' online behaviors was most accurate in detecting the popularity of individuals. These results are consistent with previous studies that used behavioral and textual features to predict user information, and they extend those studies' results by introducing the new document embedding approach and the centrality features. The models used in this research could serve as a much-needed tool for popularity prediction in future research.
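
Combining a document vector with a user centrality feature, and adjusting for imbalanced classes, might look like the sketch below. It is a hedged, stdlib-only illustration under stated assumptions: the abstract does not say which centrality measures were used, so normalized degree centrality over an undirected interaction graph is assumed here; the interaction pairs and idea vectors are hypothetical toy data; and the class-weight formula shown is the one scikit-learn uses for `class_weight="balanced"`, not necessarily the study's adjustment.

```python
from collections import Counter, defaultdict

def degree_centrality(edges):
    """Normalized degree centrality on an undirected interaction graph:
    distinct neighbours / (n - 1). Self-interactions are ignored."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    n = len(nbrs)
    if n < 2:
        return {node: 0.0 for node in nbrs}
    return {node: len(s) / (n - 1) for node, s in nbrs.items()}

def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical interactions: (commenter, idea author) pairs.
edges = [("ann", "bob"), ("ann", "cy"), ("bob", "cy"), ("dee", "ann")]
centrality = degree_centrality(edges)

# Hypothetical idea records: (document vector, author, adopted?).
ideas = [([0.1, 0.3], "ann", 0), ([0.4, 0.2], "bob", 0),
         ([0.0, 0.9], "cy", 0), ([0.7, 0.7], "dee", 1)]
# Feature row = document vector + author's centrality.
X = [vec + [centrality.get(author, 0.0)] for vec, author, _ in ideas]
y = [label for _, _, label in ideas]
weights = balanced_class_weights(y)
sample_weights = [weights[label] for label in y]
print(X[0], weights)
```

With these weights, the single adopted idea counts as much in the training loss as the three non-adopted ones together, which is one simple way a classifier such as logistic regression can be kept from ignoring the minority class.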

    XVIII. Magyar Számítógépes Nyelvészeti Konferencia (18th Hungarian Conference on Computational Linguistics)
