15 research outputs found

    TwistBytes : hierarchical classification at GermEval 2019 : walking the fine line (of recall and precision)

    We present our approach to GermEval 2019 Task 1, the shared task on hierarchical classification of German blurbs. We achieved first place in the hierarchical subtask B and second place in subtask A, the flat classification of root nodes. In subtask A, we applied a simple multi-feature TF-IDF extraction method, using different n-gram ranges and stopword removal in each feature extraction module. The classifier on top was a standard linear SVM. For the hierarchical classification, we used a local approach that was more lightweight than, but otherwise similar to, the one used in subtask A. The key point of our approach was a post-processing step to cope with the multi-label aspect of the task, which increased recall without letting it surpass precision.

    Scalable Multi-label Classification

    Multi-label classification is relevant to many domains, such as text, image and other media, and bioinformatics. Researchers have already noticed that in multi-label data, correlations exist between labels, and a variety of approaches, drawing inspiration from many spheres of machine learning, have been able to model these correlations. However, data sources from the real world are growing ever larger and the multi-label task is particularly sensitive to this due to the complexity associated with multiple labels and the correlations between them. Consequently, many methods do not scale up to large problems. This thesis deals with scalable multi-label classification: methods which exhibit high predictive performance, but are also able to scale up to larger problems. The first major contribution is the pruned sets method, which is able to model label correlations directly for high predictive performance, but reduces overfitting and complexity over related methods by pruning and subsampling label sets, and can thus scale up to larger datasets. The second major contribution is the classifier chains method, which models correlations with a chain of binary classifiers. The use of binary models allows for scalability to even larger datasets. Pruned sets and classifier chains are robust with respect to both the variety and scale of data that they can deal with, and can be incorporated into other methods. In an ensemble scheme, these methods are able to compete with state-of-the-art methods in terms of predictive performance as well as scale up to large datasets of hundreds of thousands of training examples. This thesis also puts a special emphasis on multi-label evaluation; introducing a new evaluation measure and studying threshold calibration. 
    With one of the largest and most varied collections of multi-label datasets in the literature, extensive experimental evaluation shows the advantage of these methods in terms of predictive performance, computational efficiency, and scalability.
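
The pruning idea behind the pruned sets method can be illustrated with a minimal sketch. It assumes that each distinct label set is treated as a single class, that label sets seen fewer than `min_count` times are pruned, and that each pruned example is reintroduced once for every frequent label set that is a proper subset of its original labels. The function name `prune_label_sets`, the parameter `min_count`, and the toy data are hypothetical and are not taken from the thesis, whose exact subsampling strategy may differ.

```python
from collections import Counter


def prune_label_sets(examples, min_count=2):
    """Prune infrequent label sets (illustrative sketch, not the thesis code).

    examples: list of (features, frozenset_of_labels) pairs.
    Examples with a frequent label set are kept unchanged. Examples with an
    infrequent label set are reintroduced once per frequent proper subset.
    """
    counts = Counter(labels for _, labels in examples)
    frequent = {ls for ls, n in counts.items() if n >= min_count}

    kept = []
    for features, labels in examples:
        if labels in frequent:
            kept.append((features, labels))
            continue
        # Largest subsets first; ties broken alphabetically for determinism.
        subsets = sorted(
            (s for s in frequent if s < labels),
            key=lambda s: (-len(s), sorted(s)),
        )
        kept.extend((features, s) for s in subsets)
    return kept


# Hypothetical toy data: the label set {a, b, c} occurs only once.
data = [
    ("doc1", frozenset({"a"})),
    ("doc2", frozenset({"a"})),
    ("doc3", frozenset({"a", "b"})),
    ("doc4", frozenset({"a", "b"})),
    ("doc5", frozenset({"a", "b", "c"})),
]
pruned = prune_label_sets(data, min_count=2)
```

After this transformation every remaining label set is frequent, so a standard single-label classifier could be trained with each label set as one class. That is the sense in which pruning reduces the number of classes and the risk of overfitting to rare label combinations.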

    A Context Aware Classification System for Monitoring Driver’s Distraction Levels

    Understanding the safety measures involved in developing futuristic self-driving cars is a concern for decision-makers, civil society, consumer groups, and manufacturers. Researchers are trying to thoroughly test and simulate various driving contexts to make these cars fully secure for road users. Including the vehicle’s surroundings offers an ideal way to monitor context-aware situations and incorporate the various hazards. In this regard, different studies have analysed drivers’ behaviour under different case scenarios and scrutinised the external environment to obtain a holistic view of vehicles and the environment. Studies have shown that the primary cause of road accidents is driver distraction, and that a thin line separates careless from dangerous driving. While there has been significant improvement in advanced driver assistance systems, current measures detect neither the severity of the distraction nor the context, both of which can aid in preventing accidents. Also, no compact study provides a complete model for transitioning control from the driver to the vehicle when a high degree of distraction is detected. The current study proposes a context-aware severity model to detect safety issues related to driver distraction, considering physiological attributes, activities, and context-aware situations such as the environment and the vehicle. First, a novel three-phase Fast Recurrent Convolutional Neural Network (Fast-RCNN) architecture addresses the physiological attributes. Secondly, a novel two-tier FRCNN-LSTM framework is devised to classify the severity of driver distraction. Thirdly, a Dynamic Bayesian Network (DBN) is employed for the prediction of driver distraction. The study further proposes the Multiclass Driver Distraction Risk Assessment (MDDRA) model, which can be adopted in a context-aware driving distraction scenario.
    Finally, a 3-way hybrid CNN-DBN-LSTM multiclass model of the degree of driver distraction by severity level is developed. In addition, a Hidden Markov Driver Distraction Severity Model (HMDDSM) is proposed for transitioning control from the driver to the vehicle when a high degree of distraction is detected. This work tests and evaluates the proposed models using the multi-view TeleFOT naturalistic driving study data and the American University of Cairo dataset (AUCD). The developed models were evaluated using cross-correlation, hybrid cross-correlations, and K-fold validation. The results show that the technique effectively learns and adopts safety measures related to the severity of driver distraction. The results also show that, while a driver is in a dangerous distraction state, control can be shifted from the driver to the vehicle in a systematic manner.

    Challenges in machine learning for predicting psychological attributes from smartphone data

    Predicting psychological attributes using psychometric approaches is a complex task that involves estimating latent constructs that cannot be directly measured. Psychometrics focuses on the measurement and assessment of psychological attributes, such as personality traits, behavioral patterns, or psychological disorders. Traditionally, personality assessment relied on self-report questionnaires, but advancements in technology have opened up new possibilities for assessment, particularly through the analysis of digital footprints. Smartphone sensor data has become particularly valuable in this context. By analyzing data related to movement, conversation patterns, activities, and interests, it is possible to gather insights that can contribute to predicting psychological attributes. Machine learning techniques are commonly employed to develop predictive models in this field. However, it is essential to ensure that the predictions are meaningful, accepted, and interpretable to gain trust from users. Interpreting machine learning models is crucial in the context of psychometric prediction. Interpreting the models helps identify biases, understand their operations, and determine the variables they rely on. This process enhances the accuracy of the models, establishes trust in their predictions, and promotes fairness in the prediction process. Given the large datasets involved in using smartphone sensor data, the issue of multicollinearity arises, making it challenging to identify which features are truly essential for predicting psychological attributes. To address this challenge, this thesis focuses on grouping similar features and quantifying their importance, aiming to reduce data complexity and highlight the most relevant factors. 
    Additionally, visualizing the impact of these feature groups can provide a deeper understanding of the behavior of the predictive models.

    Psychometrics refers to the measurement of psychological attributes such as personality traits, behavioral patterns, or psychological disorders. Self-report questionnaires are usually used for this purpose, since psychological attributes are often not directly measurable. Thanks to technological advances, however, modern possibilities for predicting psychological attributes are opening up, particularly through the analysis of digital footprints. Smartphone sensor data are especially relevant in this context. By evaluating data on movement patterns, conversation behavior, activities, and interests, insights can be gained that can contribute to predicting psychological attributes. Machine learning methods are frequently used for this. It is important to ensure that the predictions are meaningful, accepted, and interpretable. The interpretation of machine learning methods plays a decisive role in predicting psychological attributes. It helps to understand how the models work and to identify important variables. Using smartphone data produces large datasets, which brings with it the problem of multicollinearity. This makes it harder to determine which features are actually relevant for predicting psychological attributes. To meet this challenge, this thesis focuses on grouping similar features and quantifying their importance. This reduces the complexity of the data and highlights the most relevant factors. In addition, visualizing the effects of these feature groups can provide a better understanding of the behavior of the predictive models.

    CLINICAL AND SOCIAL PATHWAYS TO CARE: A COMPUTATIONAL EXAMINATION OF SOCIAL MEDIA FOR MENTAL HEALTH CARE

    In the last decade, powered by connectivity to large social networks and advances in collecting and analyzing digital traces of individuals from social media platforms, researchers have gleaned rich insights into individuals’ and populations’ mental health states and experiences, including their moods, emotions, social interactions, language, and communication patterns. Using these inferences, researchers have been able to study support-seeking behaviors, distinguishing patterns, risk markers, and diagnosis states for mental illnesses from social media data, promising a fundamental change in mental health care. What we need next in this line of work is for data and algorithms based on social media to be contextualized in people’s pathways to mental health care. However, there are several challenges and unanswered questions that present hurdles. First, gaps exist in the psychometric validity of social media based measurements of behaviors and the utility of these inferences in predicting clinical outcomes in patient populations. Second, if social media can act as an intervention platform, outside of discrete events, a holistic understanding of its role in people’s lives along the course of a mental illness is crucial. Lastly, several questions remain around the ethical implications of research practices in engaging with a vulnerable population subject to this research. This thesis charts out empirical and critical understandings and develops novel computational techniques to ethically and holistically examine how social media can be employed to support mental health care. 
    Focusing on schizophrenia, one of the most debilitating and stigmatizing of mental illnesses, this thesis contributes a deeper understanding of pathways to care via social media along three themes: 1) prediction of clinical mental health states from social media data to support clinical interventions, 2) understanding online self-disclosure and social support as pathways to social care, and 3) the intersection of social and clinical pathways to care along the course of mental illness. In doing so, this work combines theories from social psychology, computer-mediated communication, and clinical literature with machine learning, statistical modeling, and natural language analysis methods applied to large-scale behavioral data from social media platforms. Together, this work contributes novel methodologies and human-centered algorithmic design frameworks to understand the efficacy of social media as a mental health intervention platform, informing clinicians, researchers, and designers who engage in developing and deploying interventions for mental health and well-being. Ph.D.

    Information Reliability on the Social Web - Models and Applications in Intelligent User Interfaces

    The Social Web is undergoing continued evolution, changing the paradigm of information production, processing and sharing. Information sources have shifted from institutions to individual users, vastly increasing the amount of information available online. To overcome the information overload problem, modern filtering algorithms have enabled people to find relevant information in efficient ways. However, noisy, false and otherwise useless information remains a problem. We believe that the concept of information reliability needs to be considered along with information relevance to adapt filtering algorithms to today's Social Web. This approach helps to improve information search and discovery and can also improve user experience by communicating aspects of information reliability. This thesis first shows the results of a cross-disciplinary study into perceived reliability by reporting on a novel user experiment. This is followed by a discussion of modeling, validating, and communicating information reliability, including its various definitions across disciplines. A selection of important reliability attributes such as source credibility, competence, influence and timeliness are examined through different case studies. Results show that perceived reliability of information can vary greatly across contexts. Finally, recent studies on visual analytics, including algorithm explanations and interactive interfaces, are discussed with respect to their impact on the perception of information reliability in a range of application domains.