
    Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

    Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags. Comment: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 201
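
The bootstrapping step described in this abstract (following URLs included in tweets and linking each page back to its tweet) might begin along these lines. This is a minimal sketch, not the authors' pipeline: the URL pattern is a deliberate simplification, and `extract_urls`, `link_pages_to_tweets`, and the `(tweet_id, text)` input shape are hypothetical names chosen for illustration.

```python
import re

# Simplified URL pattern (an assumption for illustration): match http(s)
# links up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet_text):
    """Return the URLs found in a tweet, with trailing punctuation removed."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_PATTERN.findall(tweet_text)]


def link_pages_to_tweets(tweets):
    """Map each URL to the ids of the tweets that mention it.

    `tweets` is a hypothetical list of (tweet_id, text) pairs.
    """
    url_to_tweets = {}
    for tweet_id, text in tweets:
        for url in extract_urls(text):
            url_to_tweets.setdefault(url, []).append(tweet_id)
    return url_to_tweets
```

Fetching each page and fitting the temporal topic model would follow as separate steps; they are left out here because the abstract does not specify them.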

    Modeling of the Temporal Patterns of Fluoxetine Prescriptions and Suicide Rates in the United States

    BACKGROUND: To study the potential association of antidepressant use and suicide at a population level, we analyzed the associations between suicide rates and dispensing of the prototypic SSRI antidepressant fluoxetine in the United States during the period 1960–2002. METHODS AND FINDINGS: Sources of data included Centers for Disease Control and US Census Bureau age-adjusted suicide rates since 1960 and numbers of fluoxetine sales in the US since its introduction in 1988. We conducted statistical analysis of age-adjusted population data and prescription numbers. Suicide rates fluctuated between 12.2 and 13.7 per 100,000 for the entire population from the early 1960s until 1988. Since then, suicide rates have gradually declined, with the lowest value of 10.4 per 100,000 in 2000. This steady decline is significantly associated with increased numbers of fluoxetine prescriptions dispensed, from 2,469,000 in 1988 to 33,320,000 in 2002 (r(s) = −0.92; p < 0.001). Mathematical modeling of what suicide rates would have been during the 1988–2002 period based on pre-1988 data indicates that, from the introduction of fluoxetine in 1988 through 2002, there has been a cumulative decrease in expected suicide mortality of 33,600 individuals (posterior median, 95% Bayesian credible interval 22,400–45,000). CONCLUSIONS: The introduction of SSRIs in 1988 has been temporally associated with a substantial reduction in the number of suicides. This effect may have been more apparent in the female population, which we postulate might have particularly benefited from SSRI treatment. While these types of data cannot lead to conclusions on causality, we suggest here that, in the context of untreated depression being the major cause of suicide, antidepressant treatment could have had a contributory role in the reduction of suicide rates in the period 1988–2002.
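
Assuming that r(s) in this abstract denotes Spearman's rank correlation, the statistic can be sketched in plain Python as the Pearson correlation of the ranks. The function names are illustrative, and the sketch says nothing about how the authors computed their p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A strongly negative value, as reported above, means that years with more prescriptions tend to rank lower on suicide rate; it describes monotone association only, which is consistent with the abstract's caution about causality.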

    Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM)

    Electronic medical records (EMR) offer promise for novel analytics. However, manual feature engineering from EMR is labor intensive because EMR is complex: it contains temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness EMR with minimal human supervision via a restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto a 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduced two constraints into model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate-risk and 0.36 for high-risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines.

    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
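
The core idea of recursive partitioning can be sketched in a few lines. The example below is an illustrative CART-style classifier in plain Python that uses Gini impurity as the split criterion; it is not one of the R implementations the paper refers to, and the function names and the `max_depth` stopping rule are assumptions made for the sketch.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send data to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def grow_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until no split helps or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": grow_tree([rows[i] for i in left], [labels[i] for i in left],
                          depth + 1, max_depth),
        "right": grow_tree([rows[i] for i in right], [labels[i] for i in right],
                           depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree
```

Bagging and random forests build many such trees on bootstrap samples of the data (random forests additionally restrict each split to a random subset of predictors) and aggregate their predictions.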

    Suicide Risk Modeling with Uncertain Diagnostic Records

    Motivated by the pressing need for suicide prevention through improving behavioral healthcare, we use medical claims data to study the risk of subsequent suicide attempts for patients who were hospitalized due to suicide attempts and later discharged. Understanding the risk behaviors of such patients at elevated suicide risk is an important step towards the goal of "Zero Suicide". An immediate and unconventional challenge is that the identification of suicide attempts from medical claims contains substantial uncertainty: almost 20% of "suspected" suicide attempts are identified from diagnostic codes indicating external causes of injury and poisoning with undetermined intent. It is thus of great interest to learn which of these undetermined events are more likely actual suicide attempts and how to properly utilize them in survival analysis with severe censoring. To tackle these interrelated problems, we develop an integrative Cox cure model with regularization to perform survival regression with uncertain events and a latent cure fraction. We apply the proposed approach to study the risk of subsequent suicide attempt after suicide-related hospitalization for the adolescent and young adult population, using medical claims data from Connecticut. The identified risk factors are highly interpretable; more intriguingly, our method distinguishes the risk factors that are most helpful in assessing either susceptibility or timing of subsequent attempt. The predicted statuses of the uncertain attempts are further investigated, leading to several new insights on suicide event identification.
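
The distinction this abstract draws between susceptibility and timing matches the structure of a standard mixture cure model, in which population survival is (1 - p) + p * S_u(t), where p is the probability of being susceptible and S_u is the survival function of susceptible subjects. The sketch below illustrates that generic structure only; it is not the authors' integrative, regularized model. The logistic susceptibility form, the constant baseline hazard (a Cox model leaves the baseline unspecified), and all function and parameter names are assumptions.

```python
import math


def susceptibility(x, gamma, intercept=0.0):
    """Logistic model for the probability of being susceptible (not cured)."""
    z = intercept + sum(g * xi for g, xi in zip(gamma, x))
    return 1.0 / (1.0 + math.exp(-z))


def latency_survival(t, x, beta, baseline_rate=1.0):
    """Proportional-hazards survival for susceptible subjects.

    Assumes a constant baseline hazard, so the cumulative baseline
    hazard is baseline_rate * t.
    """
    linear = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-baseline_rate * t * math.exp(linear))


def population_survival(t, x, gamma, beta, intercept=0.0, baseline_rate=1.0):
    """Mixture cure survival: cured fraction plus susceptible survivors."""
    p = susceptibility(x, gamma, intercept)
    return (1.0 - p) + p * latency_survival(t, x, beta, baseline_rate)
```

In this form the coefficients `gamma` act on whether an event can occur at all, while `beta` acts on when it occurs, and the survival curve levels off at the cured fraction 1 - p instead of falling to zero.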