
    Text Mining Methods for Analyzing Online Health Information and Communication

    The Internet provides an alternative way to share health information. Specifically, social network systems such as Twitter, Facebook, Reddit, and disease-specific online support forums are increasingly being used to share information on health-related topics. This could be in the form of personal health information disclosure to seek suggestions, or answering other patients' questions based on one's own history. This social media uptake gives a new angle for improving the current health communication landscape with consumer-generated content from social platforms. With these online modes of communication, health providers can offer more immediate support to people seeking advice. Non-profit organizations and federal agencies can also diffuse preventative information in such networks for better outcomes. Researchers in health communication can mine user-generated content on social networks to understand themes and derive insights into patient experiences that may be impractical to glean through traditional surveys. The main difficulty in mining social health data is separating the signal from the noise. Social data is characterized by the informal nature of its content, typos, emoticons, tonal variations (e.g., sarcasm), and ambiguities arising from polysemous words, all of which make it difficult to build automated systems for deriving insights from such sources. In this dissertation, we present four efforts to mine health-related insights from user-generated social data. In the first effort, we build a model to identify marketing tweets on electronic cigarettes (e-cigs) and assess different topics in marketing and non-marketing messages on e-cigs on Twitter. In our next effort, we build ensemble models to classify messages on a mental health forum, triaging posts whose authors need immediate attention from trained moderators to prevent self-harm. The third effort deals with models from our participation in a shared task on identifying tweets that discuss adverse drug reactions and those that mention medication intake. In the final task, we build a classifier that identifies whether a particular tweet about the popular Juul e-cig indicates that the tweeter actually uses the product. Our methods range from linear classifiers (e.g., logistic regression) and classical nonlinear models (e.g., nearest neighbors) to recent deep neural networks (e.g., convolutional neural networks) and ensembles of all these models, using different supervised training regimens (e.g., co-training). The focus is more on task-specific system building than on building specific individual models. Overall, we demonstrate that it is possible to glean insights from social data on health-related topics through natural language processing and machine learning, with use cases from substance use and mental health.
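
    As a rough illustration of the linear-classifier baseline named above (not the dissertation's actual system), the following minimal Python sketch trains a TF-IDF logistic-regression model to separate marketing from non-marketing tweets; the CSV path and column names are hypothetical.

        # Minimal sketch of a logistic-regression tweet classifier;
        # the CSV path and column names are hypothetical.
        import pandas as pd
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import f1_score
        from sklearn.model_selection import train_test_split

        df = pd.read_csv("ecig_tweets.csv")  # hypothetical labeled data
        X_train, X_test, y_train, y_test = train_test_split(
            df["text"], df["is_marketing"], test_size=0.2, random_state=0)

        vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2)  # uni- and bigrams
        clf = LogisticRegression(max_iter=1000)
        clf.fit(vec.fit_transform(X_train), y_train)
        print("F1:", f1_score(y_test, clf.predict(vec.transform(X_test))))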

    Robust Anomaly Detection with Applications to Acoustics and Graphs

    Our goal is to develop a robust anomaly detector that can be incorporated into pattern recognition systems that may need to learn, but will never be shunned for making egregious errors. The ability to know what we do not know is a concept often overlooked when developing classifiers to discriminate between different types of normal data in controlled experiments. We believe that an anomaly detector should be used to produce warnings in real applications when operating conditions change dramatically, especially when other classifiers only have a fixed set of bad candidates from which to choose. Our approach to distributional anomaly detection is to gather local information using features tailored to the domain, aggregate all such evidence to form a global density estimate, and then compare it to a model of normal data. A good match to a recognizable distribution is not required. By design, this process can detect the "unknown unknowns" [1] and properly react to the "black swan events" [2] that can have devastating effects on other systems. We demonstrate that our system is robust to anomalies that may not be well defined or well understood, even if they have contaminated the training data that is assumed to be non-anomalous. In order to develop a more robust speech activity detector, we reformulate the problem to include acoustic anomaly detection and demonstrate state-of-the-art performance using simple distribution modeling techniques that can be run at very high speed. We begin by demonstrating our approach when training on purely normal conversational speech, then remove all annotation from our training data and demonstrate that our techniques can robustly accommodate anomalous training data contamination. When comparing continuous distributions in higher dimensions, we develop a novel method of discarding portions of a semi-parametric model to form a robust estimate of the Kullback-Leibler divergence. Finally, we demonstrate the generality of our approach by using the divergence between distributions of vertex invariants as a graph distance metric, achieving state-of-the-art performance when detecting graph anomalies with neighborhoods of excessive or negligible connectivity.
    [1] D. Rumsfeld. (2002) Transcript: DoD news briefing - Secretary Rumsfeld and Gen. Myers.
    [2] N. N. Taleb, The Black Swan: The Impact of the Highly Improbable. Random House, 2007.
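
    As a toy illustration of the distributional comparison described above, the sketch below fits Gaussians to normal and test feature sets and thresholds their closed-form KL divergence; the single-Gaussian model and the threshold stand in for the dissertation's semi-parametric estimates and are not its actual method.

        # Toy distributional anomaly detector: fit a Gaussian to features of
        # known-normal data, then flag test segments whose fitted Gaussian is
        # far from it in KL divergence. The threshold is illustrative.
        import numpy as np

        def gaussian_kl(mu0, cov0, mu1, cov1):
            """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
            k = mu0.shape[0]
            inv1 = np.linalg.inv(cov1)
            diff = mu1 - mu0
            return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                          + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

        def is_anomalous(normal_feats, test_feats, threshold=5.0):
            """Both inputs are (n_frames, n_features) arrays."""
            mu_n, cov_n = normal_feats.mean(0), np.cov(normal_feats.T)
            mu_t, cov_t = test_feats.mean(0), np.cov(test_feats.T)
            return gaussian_kl(mu_t, cov_t, mu_n, cov_n) > threshold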

    Computational Sarcasm Analysis on Social Media: A Systematic Review

    Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone. Because of the obscure nature of sarcasm in textual data, detecting it is difficult and of great interest to the sentiment analysis research community. Though research in sarcasm detection spans more than a decade, some significant advancements have been made recently, including employing unsupervised pre-trained transformers in multimodal environments and integrating context to identify sarcasm. In this study, we aim to provide a brief overview of recent advancements and trends in computational sarcasm research for the English language. We describe relevant datasets, methodologies, trends, issues, challenges, and tasks relating to sarcasm beyond detection. Our study provides well-summarized tables of sarcasm datasets, sarcastic features and their extraction methods, and performance analysis of various approaches, which can help researchers in related domains understand current state-of-the-art practices in sarcasm detection.
    Comment: 50 pages, 3 tables, submitted to 'Data Mining and Knowledge Discovery' for possible publication.
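
    For readers unfamiliar with the transformer-based setups the survey covers, a minimal sketch of applying a pre-trained text classifier is shown below; the checkpoint name is a placeholder, not a model from the review.

        # Applying a pre-trained transformer classifier to candidate text;
        # the checkpoint name below is a placeholder, not a real model.
        from transformers import pipeline

        detector = pipeline("text-classification",
                            model="some-org/sarcasm-roberta")  # hypothetical
        print(detector("Oh great, another Monday. Just what I needed."))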

    Domain adaptation with minimal training

    The performance of a machine learning model trained on labeled data of a (source) domain degrades severely when it is tested on a different (target) domain. Traditional approaches deal with this problem by training a new model for every target domain. In natural language processing, top-performing systems often use multiple interconnected models; therefore, training all of them for every target domain is computationally expensive. Moreover, retraining the model for the target domain requires access to the labeled data from the source domain, which may not be available to end users due to copyright issues. This thesis is a study of how to adapt to a target domain using the system trained on the source domain, avoiding the cost of retraining and the need for access to the source labeled data. This thesis identifies two key ingredients for adaptation without training: broad-coverage resources and constraints. We show how resources like Wikipedia, VerbNet, and WordNet, which contain comprehensive coverage of entities, semantic roles, and words in English, can help a model adapt to the target domain. For the task of semantic role labeling, we show that in the decision phase we can replace a linguistic unit (e.g., verb, word) with another equivalent linguistic unit residing in the same cluster defined in these resources (e.g., VerbNet, WordNet) such that, after replacement, the text becomes more like the text on which the model was trained. We show that the model's output is more accurate on the transformed text than on the original text. In another instance, we show how to use a system for linking mentions to Wikipedia concepts for adaptation of a named entity recognition system. Since Wikipedia has broad domain coverage, the linking system is robust across domain variations. Therefore, jointly performing entity recognition and linking improves the accuracy of entity recognition on the target domain without requiring training of a new system for the new domain. In all cases, we show how to use intuitive constraints to guide the model into making coherent predictions. We show how incorporating prior knowledge about a new domain as declarative constraints into the decision phase can improve performance of a model on the new domain. When such prior knowledge is unavailable, we show how to acquire knowledge automatically from unlabeled text from the new domain and from domains similar to both source and target domains.
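
    A toy sketch of the substitution idea using NLTK's WordNet interface: replace a target-domain word with a same-synset alternative that the source-trained model has seen more often. The frequency-lookup table and the verb default are illustrative simplifications, not the thesis's exact procedure.

        # Toy version of cluster-equivalent substitution: swap a word for a
        # WordNet synset-mate that the source-trained model saw more often.
        # `source_vocab_counts` (word -> frequency) is assumed to be supplied.
        from nltk.corpus import wordnet as wn

        def cluster_equivalents(word, pos=wn.VERB):
            """All lemmas sharing a WordNet synset with `word`."""
            alts = set()
            for syn in wn.synsets(word, pos=pos):
                for lemma in syn.lemmas():
                    alts.add(lemma.name().replace("_", " "))
            alts.discard(word)
            return alts

        def adapt_token(word, source_vocab_counts, pos=wn.VERB):
            """Pick the in-cluster alternative most frequent in source data."""
            candidates = cluster_equivalents(word, pos) | {word}
            return max(candidates, key=lambda w: source_vocab_counts.get(w, 0))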

    Framework for Human Computer Interaction for Learning Dialogue Strategies using Controlled Natural Language in Information Systems

    Spoken language systems are going to have a tremendous impact on real-world applications, be it healthcare enquiry, public transportation, or airline booking, maintaining the language ethnicity for interaction among users across the globe. These systems have the capability of interacting with the user in the different languages that the system supports. Normally, when a person interacts with another person, there are many non-verbal cues which guide the dialogue, and all the utterances have a contextual relationship which manages the dialogue as it is mixed by the two speakers. Human-computer interaction has a wide impact on the design of applications and has become one of the emerging areas of interest for researchers. All of us are witness to an explosive electronic revolution in which gadgets and gizmos surround us, advanced not only in power, design, and applications, but also in ease of access: user-friendly interfaces are designed so that we can easily use and control all the functionality of the devices. Speech is one of the most intuitive forms of interaction that humans use; it provides potential benefits such as hands-free access to machines, ergonomics, and greater efficiency of interaction. Yet speech-based interface design has been an expert job for a long time. A great deal of research has been done on building real spoken dialogue systems which can interact with humans using voice and help in performing various tasks as done by humans. The last two decades have seen advanced research in automatic speech recognition, dialogue management, text-to-speech synthesis, and natural language processing for various applications, with positive results. This dissertation proposes to apply machine learning (ML) techniques to the problem of optimizing dialogue management strategy selection in spoken dialogue system prototype design. Although automatic speech recognition and system-initiated dialogues, where the system expects an answer in the form of 'yes' or 'no', have already been applied to spoken dialogue systems (SDS), no real attempt to use those techniques to design a new system from scratch has been made. In this dissertation, we propose some novel ideas in order to ease the design of spoken dialogue systems and allow novices to have access to voice technologies. A framework for simulating and evaluating dialogues and learning optimal dialogue strategies in a controlled natural language is proposed. The simulation process is based on a probabilistic description of a dialogue and on the stochastic modelling of both the artificial NLP modules composing an SDS and the user. This probabilistic model is based on a set of parameters that can be tuned from prior knowledge of the discourse or learned from data. The evaluation is part of the simulation process and is based on objective measures provided by each module. Finally, the simulation environment is connected to a learning agent that uses the supplied evaluation metrics as an objective function in order to generate optimal behaviour for the SDS.
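
    The simulate-and-learn loop described above can be compressed into a toy sketch: a tabular Q-learner picks dialogue acts against a stochastic user simulator and uses the reward as its objective function. The states, actions, and reward scheme are illustrative placeholders, not the proposed framework's actual modules.

        # Tabular Q-learning against a stochastic user simulator; the states,
        # dialogue acts, and rewards are illustrative placeholders.
        import random
        from collections import defaultdict

        ACTIONS = ["ask_slot", "confirm", "close"]
        Q = defaultdict(float)                 # (state, action) -> value
        alpha, gamma, eps = 0.1, 0.95, 0.1

        def simulate_user(state, action):
            """Stochastic user model: returns (next_state, reward, done)."""
            if action == "close":
                return state, (10 if state == "filled" else -10), True
            filled = random.random() < 0.7     # user answers 70% of the time
            return ("filled" if filled else state), -1, False

        for episode in range(5000):
            state = "empty"
            for _ in range(50):                # cap episode length
                action = (random.choice(ACTIONS) if random.random() < eps
                          else max(ACTIONS, key=lambda a: Q[(state, a)]))
                nxt, reward, done = simulate_user(state, action)
                best_next = max(Q[(nxt, a)] for a in ACTIONS)
                target = reward + (0.0 if done else gamma * best_next)
                Q[(state, action)] += alpha * (target - Q[(state, action)])
                state = nxt
                if done:
                    break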

    Discriminative, generative, and imitative learning

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002. Includes bibliographical references (leaves 201-212).
    I propose a common framework that combines three different paradigms in machine learning: generative, discriminative, and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars. Conversely, discriminative algorithms adjust a possibly non-distributional model to data, optimizing for a specific task such as classification or prediction. This typically leads to superior performance yet compromises the flexibility of generative modeling. I present Maximum Entropy Discrimination (MED) as a framework to combine both discriminative estimation and generative probability densities. Calculations involve distributions over parameters, margins, and priors and are provably and uniquely solvable for the exponential family. Extensions include regression, feature selection, and transduction. SVMs are also naturally subsumed and can be augmented with, for example, feature selection to obtain substantial improvements. To extend to mixtures of exponential families, I derive a discriminative variant of the Expectation-Maximization (EM) algorithm for latent discriminative learning (or latent MED). While EM and Jensen lower-bound log-likelihood, a dual upper bound is made possible via a novel reverse-Jensen inequality. The variational upper bound on latent log-likelihood has the same form as EM bounds, is computable efficiently, and is globally guaranteed. It permits powerful discriminative learning with a wide range of contemporary probabilistic mixture models (mixtures of Gaussians, mixtures of multinomials, and hidden Markov models). We provide empirical results on standardized data sets that demonstrate the viability of the hybrid discriminative-generative approaches of MED and reverse-Jensen bounds over state-of-the-art discriminative techniques or generative approaches. Subsequently, imitative learning is presented as another variation on generative modeling which also learns from exemplars from an observed data source. However, the distinction is that the generative model is an agent interacting in a much more complex surrounding external world. It is not efficient to model the aggregate space in a generative setting. I demonstrate that imitative learning (under appropriate conditions) can be adequately addressed as a discriminative prediction task which outperforms the usual generative approach. This discriminative-imitative learning approach is applied with a generative perceptual system to synthesize a real-time agent that learns to engage in social interactive behavior.
    by Tony Jebara. Ph.D.
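
    In schematic form (notation simplified; this follows the published MED formulation rather than the thesis verbatim), MED seeks the distribution over model parameters and margins closest to a prior while classifying each training pair in expectation:

        % Schematic MED objective: the solution distribution over model
        % parameters \Theta and margins \gamma stays close to a prior P_0
        % while classifying each training pair (X_t, y_t) in expectation.
        \min_{P(\Theta,\gamma)} \ \mathrm{KL}\!\left(P(\Theta,\gamma)\,\|\,P_0(\Theta,\gamma)\right)
        \quad \text{s.t.} \quad
        \int P(\Theta,\gamma)\left[\, y_t\,\mathcal{L}(X_t;\Theta) - \gamma_t \,\right]
        d\Theta\, d\gamma \;\ge\; 0, \qquad t = 1,\dots,T.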

    Discriminative Methods for Model Optimization in Speaker Verification

    The growing need for secure authentication systems has motivated interest in effective Speaker Verification (SV) algorithms. This need for high-performance algorithms capable of achieving low error rates has opened several branches of research. In this work we propose to investigate, from a discriminative point of view, a set of methodologies to improve the performance of state-of-the-art SV systems. In a first approach, we investigate the optimization of hyper-parameters to explicitly consider the trade-off between false acceptance and false rejection errors. The optimization objective can be achieved by maximizing the area under the Receiver Operating Characteristic (ROC) curve. We believe that this parameter optimization should not be limited to a single operating point; a more robust strategy is to optimize the parameters to increase the area under the curve (AUC) so that all points are maximized. We study how to optimize the parameters using the mathematical representation of the area under the ROC curve based on the Wilcoxon-Mann-Whitney (WMW) statistic, with the computation carried out using the generalized probabilistic descent algorithm. In addition, we analyze the effect on, and improvements in, metrics such as the detection error tradeoff (DET) curve, the equal error rate (EER), and the minimum value of the detection cost function (minDCF). In a second approach, we investigate the speech signal as a combination of attributes containing information about the speaker, the channel, and the noise. Conventional verification systems train single generic models for all cases and handle variations in these attributes either by using factor analysis or by not considering those variations explicitly. We propose a new methodology to partition the data space according to these characteristics and train separate models for each partition. The partitions can be obtained according to each attribute. In this research we show how to effectively train the models discriminatively to maximize the separation between them. Furthermore, the design of algorithms robust to noise conditions plays a key role in allowing SV systems to operate in real conditions. We propose to extend our methodologies to mitigate the effects of noise in those conditions. For our first approach, in a situation where noise is present, the operating point may not be a single point, or it may shift unpredictably. We show how our ROC-AUC maximization methodology is more robust than that used by conventional classifiers, even when the noise is not explicitly considered. Moreover, noise may occur at different signal-to-noise ratios (SNR), which can degrade system performance. It is therefore feasible to consider an efficient decomposition of the speech signals that takes into account different attributes such as the SNR, the noise, and the channel type. We consider that, instead of addressing the problem with a unified model, a decomposition into partitions of the feature space based on special attributes can provide better results. Those attributes can represent different channels and noise conditions. We have analyzed the potential of these methodologies to improve the performance of state-of-the-art systems by reducing the error, and also to control the operating points and mitigate the effects of noise.
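
    As a sketch of the WMW connection described above: the empirical AUC is the fraction of correctly ordered positive-negative score pairs, and replacing the indicator with a sigmoid gives a differentiable surrogate that gradient methods such as generalized probabilistic descent can optimize. The temperature beta is an illustrative choice, not a value from this work.

        # Empirical AUC as the Wilcoxon-Mann-Whitney statistic, plus a smooth
        # sigmoid surrogate usable with gradient-based training.
        import numpy as np

        def auc_wmw(pos_scores, neg_scores):
            """Exact AUC: fraction of correctly ordered (pos, neg) pairs."""
            diffs = pos_scores[:, None] - neg_scores[None, :]
            return (diffs > 0).mean()

        def auc_surrogate(pos_scores, neg_scores, beta=1.0):
            """Differentiable approximation: indicator -> sigmoid."""
            diffs = pos_scores[:, None] - neg_scores[None, :]
            return (1.0 / (1.0 + np.exp(-diffs / beta))).mean()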