347 research outputs found

    Learning Better Clinical Risk Models.

    Full text link
    Risk models are used to estimate a patient’s risk of suffering particular outcomes throughout clinical practice. These models are important for matching patients to the appropriate level of treatment, for effective allocation of resources, and for fairly evaluating the performance of healthcare providers. The application and development of methods from the field of machine learning has the potential to improve patient outcomes and reduce healthcare spending with more accurate estimates of patient risk. This dissertation addresses several limitations of currently used clinical risk models, through the identification of novel risk factors and through the training of more effective models. As wearable monitors become more effective and less costly, the previously untapped predictive information in a patient’s physiology over time has the potential to greatly improve clinical practice. However translating these technological advances into real-world clinical impacts will require computational methods to identify high-risk structure in the data. This dissertation presents several approaches to learning risk factors from physiological recordings, through the discovery of latent states using topic models, and through the identification of predictive features using convolutional neural networks. We evaluate these approaches on patients from a large clinical trial and find that these methods not only outperform prior approaches to leveraging heart rate for cardiac risk stratification, but that they improve overall prediction of cardiac death when considered alongside standard clinical risk factors. We also demonstrate the utility of this work for learning a richer description of sleep recordings. Additionally, we consider the development of risk models in the presence of missing data, which is ubiquitous in real-world medical settings. We present a novel method for jointly learning risk and imputation models in the presence of missing data, and find significant improvements relative to standard approaches when evaluated on a large national registry of trauma patients.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113326/1/alexve_1.pd

    Adaptive sentiment analysis

    Get PDF
    Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods have decent performance if they are targeted at a specific domain and writing style, they do not usually work well with texts that are originated outside of their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (aka word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexiconbased algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end nearly-unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can — in a nearlyunsupervised manner—adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate particularly the temporal dynamics of sentiment in such contexts

    Multilingual sentiment analysis in social media.

    Get PDF
    252 p.This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language a tool providing only analysis for Basque would not be enough for a real world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.The thesis addresses the following challenges to build such a system:- Analysing methods for creating Sentiment lexicons, suitable for less resourced languages.- Analysis of social media (specifically Twitter): Tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well known social media benchmarks.- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations

    Multilingual sentiment analysis in social media.

    Get PDF
    252 p.This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language a tool providing only analysis for Basque would not be enough for a real world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.The thesis addresses the following challenges to build such a system:- Analysing methods for creating Sentiment lexicons, suitable for less resourced languages.- Analysis of social media (specifically Twitter): Tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well known social media benchmarks.- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations

    Music emotion recognition: a multimodal machine learning approach

    Get PDF
    Music emotion recognition (MER) is an emerging domain of the Music Information Retrieval (MIR) scientific community, and besides, music searches through emotions are one of the major selection preferred by web users. As the world goes to digital, the musical contents in online databases, such as Last.fm have expanded exponentially, which require substantial manual efforts for managing them and also keeping them updated. Therefore, the demand for innovative and adaptable search mechanisms, which can be personalized according to users’ emotional state, has gained increasing consideration in recent years. This thesis concentrates on addressing music emotion recognition problem by presenting several classification models, which were fed by textual features, as well as audio attributes extracted from the music. In this study, we build both supervised and semisupervised classification designs under four research experiments, that addresses the emotional role of audio features, such as tempo, acousticness, and energy, and also the impact of textual features extracted by two different approaches, which are TF-IDF and Word2Vec. Furthermore, we proposed a multi-modal approach by using a combined feature-set consisting of the features from the audio content, as well as from context-aware data. For this purpose, we generated a ground truth dataset containing over 1500 labeled song lyrics and also unlabeled big data, which stands for more than 2.5 million Turkish documents, for achieving to generate an accurate automatic emotion classification system. The analytical models were conducted by adopting several algorithms on the crossvalidated data by using Python. As a conclusion of the experiments, the best-attained performance was 44.2% when employing only audio features, whereas, with the usage of textual features, better performances were observed with 46.3% and 51.3% accuracy scores considering supervised and semi-supervised learning paradigms, respectively. As of last, even though we created a comprehensive feature set with the combination of audio and textual features, this approach did not display any significant improvement for classification performanc

    Genre and Domain Dependencies in Sentiment Analysis

    Get PDF
    Genre and domain influence an author\''s style of writing and therefore a text\''s characteristics. Natural language processing is prone to such variations in textual characteristics: it is said to be genre and domain dependent. This thesis investigates genre and domain dependencies in sentiment analysis. Its goal is to support the development of robust sentiment analysis approaches that work well and in a predictable manner under different conditions, i.e. for different genres and domains. Initially, we show that a prototypical approach to sentiment analysis -- viz. a supervised machine learning model based on word n-gram features -- performs differently on gold standards that originate from differing genres and domains, but performs similarly on gold standards that originate from resembling genres and domains. We show that these gold standards differ in certain textual characteristics, viz. their domain complexity. We find a strong linear relation between our approach\''s accuracy on a particular gold standard and its domain complexity, which we then use to estimate our approach\''s accuracy. Subsequently, we use certain textual characteristics -- viz. domain complexity, domain similarity, and readability -- in a variety of applications. Domain complexity and domain similarity measures are used to determine parameter settings in two tasks. Domain complexity guides us in model selection for in-domain polarity classification, viz. in decisions regarding word n-gram model order and word n-gram feature selection. Domain complexity and domain similarity guide us in domain adaptation. We propose a novel domain adaptation scheme and apply it to cross-domain polarity classification in semi- and unsupervised domain adaptation scenarios. Readability is used for feature engineering. We propose to adopt readability gradings, readability indicators as well as word and syntax distributions as features for subjectivity classification. Moreover, we generalize a framework for modeling and representing negation in machine learning-based sentiment analysis. This framework is applied to in-domain and cross-domain polarity classification. We investigate the relation between implicit and explicit negation modeling, the influence of negation scope detection methods, and the efficiency of the framework in different domains. Finally, we carry out a case study in which we transfer the core methods of our thesis -- viz. domain complexity-based accuracy estimation, domain complexity-based model selection, and negation modeling -- to a gold standard that originates from a genre and domain hitherto not used in this thesis

    Modelling Uncertainty in Black-box Classification Systems

    Get PDF
    [eng] Currently, thanks to the Big Data boom, the excellent results obtained by deep learning models and the strong digital transformation experienced over the last years, many companies have decided to incorporate machine learning models into their systems. Some companies have detected this opportunity and are making a portfolio of artificial intelligence services available to third parties in the form of application programming interfaces (APIs). Subsequently, developers include calls to these APIs to incorporate AI functionalities in their products. Although it is an option that saves time and resources, it is true that, in most cases, these APIs are displayed in the form of blackboxes, the details of which are unknown to their clients. The complexity of such products typically leads to a lack of control and knowledge of the internal components, which, in turn, can drive to potential uncontrolled risks. Therefore, it is necessary to develop methods capable of evaluating the performance of these black-boxes when applied to a specific application. In this work, we present a robust uncertainty-based method for evaluating the performance of both probabilistic and categorical classification black-box models, in particular APIs, that enriches the predictions obtained with an uncertainty score. This uncertainty score enables the detection of inputs with very confident but erroneous predictions while protecting against out of distribution data points when deploying the model in a productive setting. In the first part of the thesis, we develop a thorough revision of the concept of uncertainty, focusing on the uncertainty of classification systems. We review the existingrelated literature, describing the different approaches for modelling this uncertainty, its application to different use cases and some of its desirable properties. Next, we introduce the proposed method for modelling uncertainty in black-box settings. Moreover, in the last chapters of the thesis, we showcase the method applied to different domains, including NLP and computer vision problems. Finally, we include two reallife applications of the method: classification of overqualification in job descriptions and readability assessment of texts.[spa] La tesis propone un método para el cálculo de la incertidumbre asociada a las predicciones de APIs o librerías externas de sistemas de clasificación
    • …
    corecore