170 research outputs found

    A Recurrent Neural Network Survival Model: Predicting Web User Return Time

    Full text link
    The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state of the art approaches to solve this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but can not be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state of the art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset with a superior ability to discriminate between returning and non-returning users than either method applied in isolation.Comment: Accepted into ECML PKDD 2018; 8 figures and 1 tabl

    Statistical Relational Learning with Formal Ontologies

    Full text link

    Location Dependent Dirichlet Processes

    Full text link
    Dirichlet processes (DP) are widely applied in Bayesian nonparametric modeling. However, in their basic form they do not directly integrate dependency information among data arising from space and time. In this paper, we propose location dependent Dirichlet processes (LDDP) which incorporate nonparametric Gaussian processes in the DP modeling framework to model such dependencies. We develop the LDDP in the context of mixture modeling, and develop a mean field variational inference algorithm for this mixture model. The effectiveness of the proposed modeling framework is shown on an image segmentation task

    Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

    Full text link
    Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and treating each web page as a document linked to the original tweet show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.Comment: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 201

    Variable selection for large p small n regression models with incomplete data: Mapping QTL with epistases

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC).</p> <p>Results</p> <p>We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large <it>p </it>small <it>n</it>" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets.</p> <p>Conclusion</p> <p>The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size.</p

    Assessing the clinical utility of cancer genomic and proteomic data across tumor types

    Get PDF
    Molecular profiling of tumors promises to advance the clinical management of cancer, but the benefits of integrating molecular data with traditional clinical variables have not been systematically studied. Here we retrospectively predict patient survival using diverse molecular data (somatic copy-number alteration, DNA methylation and mRNA, miRNA and protein expression) from 953 samples of four cancer types from The Cancer Genome Atlas project. We found that incorporating molecular data with clinical variables yielded statistically significantly improved predictions (FDR < 0.05) for three cancers but those quantitative gains were limited (2.2–23.9%). Additional analyses revealed little predictive power across tumor types except for one case. In clinically relevant genes, we identified 10,281 somatic alterations across 12 cancer types in 2,928 of 3,277 patients (89.4%), many of which would not be revealed in single-tumor analyses. Our study provides a starting point and resources, including an open-access model evaluation platform, for building reliable prognostic and therapeutic strategies that incorporate molecular data

    Satisfaction with web-based training in an integrated healthcare delivery network: do age, education, computer skills and attitudes matter?

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Healthcare institutions spend enormous time and effort to train their workforce. Web-based training can potentially streamline this process. However the deployment of web-based training in a large-scale setting with a diverse healthcare workforce has not been evaluated. The aim of this study was to evaluate the satisfaction of healthcare professionals with web-based training and to determine the predictors of such satisfaction including age, education status and computer proficiency.</p> <p>Methods</p> <p>Observational, cross-sectional survey of healthcare professionals from six hospital systems in an integrated delivery network. We measured overall satisfaction to web-based training and response to survey items measuring Website Usability, Course Usefulness, Instructional Design Effectiveness, Computer Proficiency and Self-learning Attitude.</p> <p>Results</p> <p>A total of 17,891 healthcare professionals completed the web-based training on HIPAA Privacy Rule; and of these, 13,537 completed the survey (response rate 75.6%). Overall course satisfaction was good (median, 4; scale, 1 to 5) with more than 75% of the respondents satisfied with the training (rating 4 or 5) and 65% preferring web-based training over traditional instructor-led training (rating 4 or 5). Multivariable ordinal regression revealed 3 key predictors of satisfaction with web-based training: Instructional Design Effectiveness, Website Usability and Course Usefulness. Demographic predictors such as gender, age and education did not have an effect on satisfaction.</p> <p>Conclusion</p> <p>The study shows that web-based training when tailored to learners' background, is perceived as a satisfactory mode of learning by an interdisciplinary group of healthcare professionals, irrespective of age, education level or prior computer experience. Future studies should aim to measure the long-term outcomes of web-based training.</p
    • …
    corecore