3 research outputs found

    Statistical learning methods for mining marketing and biological data

    The value of data is now broadly recognized: more and more decisions are based on data and analysis rather than solely on experience and intuition. With the rapid development of networking, data storage, and data collection capacity, data volumes have grown dramatically across industry, science, and engineering, bringing both great opportunities and challenges. Taking advantage of this flood of data demands new computational methods to process, analyze, and understand these datasets. This dissertation develops statistical learning methods for online advertising and bioinformatics that model real-world data with temporal or spatial changes. First, a collaborative online change-point detection method is proposed to identify change-points in sparse time series. It leverages signals from auxiliary time series, such as engagement metrics, to compensate for the sparsity of the revenue data and to improve detection efficiency and accuracy through this collaboration. Second, a task-specific multi-task learning algorithm is developed to model ever-changing video viewing behaviors. With ℓ1-regularized task-specific features and jointly estimated shared features, it allows different models to seek common ground while reserving their differences. Third, an empirical Bayes method is proposed to identify 3′ and 5′ alternative splicing in RNA-seq data. It formulates alternative 3′ and 5′ splice site selection as a change-point problem and provides, for the first time, a systematic framework to pool information across genes and to integrate additional information when available, in particular junction reads, to obtain better performance.
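
    The first contribution summarized above, borrowing strength from an auxiliary series to detect change-points in a sparse primary series, can be illustrated with a minimal sketch. This is not the dissertation's actual algorithm: the CUSUM statistic, the fixed weighting between the two series, and the threshold are assumptions made purely for illustration.

    # Minimal sketch only: pool CUSUM evidence from a sparse primary series
    # (e.g., revenue) and a denser auxiliary series (e.g., engagement).
    # The weighting scheme and threshold are illustrative assumptions.
    import numpy as np

    def cusum(x):
        """Standardized CUSUM statistic for a mean shift at each candidate index."""
        n = len(x)
        s = np.cumsum(x - x.mean())
        return np.abs(s[:-1]) / (x.std(ddof=1) * np.sqrt(n))

    def collaborative_changepoint(primary, auxiliary, weight=0.5, threshold=1.0):
        """Return the most likely shared change index, or None if evidence is weak."""
        stat = (1 - weight) * cusum(primary) + weight * cusum(auxiliary)
        k = int(np.argmax(stat))
        return k + 1 if stat[k] > threshold else None

    # Toy example: revenue is sparse and noisy, engagement shows the shift clearly.
    rng = np.random.default_rng(0)
    revenue = np.concatenate([rng.poisson(0.2, 50), rng.poisson(0.6, 50)]).astype(float)
    engagement = np.concatenate([rng.normal(10, 1, 50), rng.normal(13, 1, 50)])
    print(collaborative_changepoint(revenue, engagement))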

    Untranslated Parts of Genes Interpreted: Making Heads or Tails of High-Throughput Transcriptomic Data via Computational Methods
    Alternative title: Computational methods to discover and quantify isoforms with alternative untranslated regions

    In this review we highlight the importance of defining the untranslated parts of transcripts, and present a number of computational approaches for the discovery and quantification of alternative transcription start and poly‐adenylation events in high‐throughput transcriptomic data. The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by the positions at which transcription starts and ends at a genomic locus. Although the extent of alternative transcription starts and alternative poly‐adenylation sites has been revealed by sequencing methods focused on the ends of transcripts, these methods are not yet widely adopted by the community. We suggest that computational methods applied to standard high-throughput technologies are a useful, albeit less accurate, alternative to the expertise-demanding 5′ and 3′ sequencing, and that they are the only option for analysing legacy transcriptomic data. We review these methods here, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.

    Untranslated parts of genes interpreted: making heads or tails of high-throughput transcriptomic data via computational methods

    The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by where transcription starts and ends on a genomic locus. The extent of alternative transcription start and alternative poly-adenylation has been revealed by sequencing methods focused on the ends of transcripts, but the application of these methods is not yet widely adopted by the community. In this review we highlight the importance of defining the untranslated parts of transcripts and suggest that computational methods applied to standard high-throughput technologies are a useful alternative to the expertise-demanding 5′ and 3′ sequencing. We present a number of computational approaches for the discovery and quantification of alternative transcription start and poly-adenylation events, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.
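
    As a rough illustration of how such events might be detected from standard RNA-seq coverage alone (not a description of any specific method covered by the review), the sketch below finds the single breakpoint in a 3′ UTR coverage profile that best separates it into two mean levels, a crude proxy for a proximal poly-adenylation site, and reports the ratio of the two levels as an estimate of distal-site usage. The coverage profile and all names are hypothetical.

    # Minimal sketch only: locate a candidate proximal poly-adenylation site
    # as the best two-segment split of a 3' UTR coverage profile, and estimate
    # distal-site usage from the ratio of the segment means.
    import numpy as np

    def proximal_site_and_usage(coverage):
        """Return (split_index, distal_usage) for a 1-D per-base coverage profile."""
        cov = np.asarray(coverage, dtype=float)
        best_k, best_cost = 1, np.inf
        for k in range(1, len(cov)):                      # candidate breakpoints
            left, right = cov[:k], cov[k:]
            cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if cost < best_cost:
                best_k, best_cost = k, cost
        left_mean, right_mean = cov[:best_k].mean(), cov[best_k:].mean()
        distal_usage = right_mean / left_mean if left_mean > 0 else 0.0
        return best_k, distal_usage

    # Toy 3' UTR: every transcript covers the first 300 nt, ~40% extend to the distal site.
    profile = np.concatenate([np.full(300, 50.0), np.full(200, 20.0)])
    print(proximal_site_and_usage(profile))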