3 research outputs found

    Statistical learning methods for mining marketing and biological data

    The value of data is now broadly recognized: more and more decisions are based on data and analysis rather than solely on experience and intuition. With the rapid development of networking, data storage, and data collection capacity, data volumes have grown dramatically across industry, science, and engineering, bringing both great opportunities and challenges. Taking advantage of this flood of data demands new computational methods to process, analyze, and understand these datasets. This dissertation develops statistical learning methods for online advertising and bioinformatics that model real-world data with temporal or spatial changes. First, a collaborative online change-point detection method is proposed to identify change-points in sparse time series. It leverages signals from auxiliary time series, such as engagement metrics, to compensate for the sparsity of the revenue data and to improve detection efficiency and accuracy through this collaboration. Second, a task-specific multi-task learning algorithm is developed to model ever-changing video viewing behaviors. With ℓ1-regularized task-specific features and jointly estimated shared features, it allows different models to seek common ground while reserving their differences. Third, an empirical Bayes method is proposed to identify 3′ and 5′ alternative splicing in RNA-seq data. It formulates alternative 3′ and 5′ splice site selection as a change-point problem and provides, for the first time, a systematic framework to pool information across genes and to integrate additional information when available, in particular junction reads, to obtain better performance.
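
    The first contribution summarized above, borrowing strength from an auxiliary series to detect change-points in a sparse primary series, can be illustrated with a minimal sketch. This is not the dissertation's actual algorithm: the CUSUM statistic, the fixed weighting between the two series, and the threshold are assumptions made purely for illustration.

    # Minimal sketch only: pool CUSUM evidence from a sparse primary series
    # (e.g., revenue) and a denser auxiliary series (e.g., engagement).
    # The weighting scheme and threshold are illustrative assumptions.
    import numpy as np

    def cusum(x):
        """Standardized CUSUM statistic for a mean shift at each candidate index."""
        n = len(x)
        s = np.cumsum(x - x.mean())
        return np.abs(s[:-1]) / (x.std(ddof=1) * np.sqrt(n))

    def collaborative_changepoint(primary, auxiliary, weight=0.5, threshold=1.0):
        """Return the most likely shared change index, or None if evidence is weak."""
        stat = (1 - weight) * cusum(primary) + weight * cusum(auxiliary)
        k = int(np.argmax(stat))
        return k + 1 if stat[k] > threshold else None

    # Toy example: revenue is sparse and noisy, engagement shows the shift clearly.
    rng = np.random.default_rng(0)
    revenue = np.concatenate([rng.poisson(0.2, 50), rng.poisson(0.6, 50)]).astype(float)
    engagement = np.concatenate([rng.normal(10, 1, 50), rng.normal(13, 1, 50)])
    print(collaborative_changepoint(revenue, engagement))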

    Untranslated Parts of Genes Interpreted: Making Heads or Tails of High-Throughput Transcriptomic Data via Computational Methods
    Alternative title: Computational methods to discover and quantify isoforms with alternative untranslated regions

    In this review we highlight the importance of defining the untranslated parts of transcripts, and present a number of computational approaches for the discovery and quantification of alternative transcription start and poly‐adenylation events in high‐throughput transcriptomic data. The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by the positions at which transcription starts and ends at a genomic locus. Although the extent of alternative transcription starts and alternative poly‐adenylation sites has been revealed by sequencing methods focused on the ends of transcripts, these methods are not yet widely adopted by the community. We suggest that computational methods applied to standard high-throughput technologies are a useful, albeit less accurate, alternative to the expertise-demanding 5′ and 3′ sequencing, and that they are the only option for analysing legacy transcriptomic data. We review these methods here, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.

    Untranslated parts of genes interpreted: making heads or tails of high-throughput transcriptomic data via computational methods

    The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by where transcription starts and ends on a genomic locus. The extent of alternative transcription start and alternative poly-adenylation has been revealed by sequencing methods focused on the ends of transcripts, but the application of these methods is not yet widely adopted by the community. In this review we highlight the importance of defining the untranslated parts of transcripts and suggest that computational methods applied to standard high-throughput technologies are a useful alternative to the expertise-demanding 5′ and 3′ sequencing. We present a number of computational approaches for the discovery and quantification of alternative transcription start and poly-adenylation events, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.
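
    As a rough illustration of how such events might be detected from standard RNA-seq coverage alone (not a description of any specific method covered by the review), the sketch below finds the single breakpoint in a 3′ UTR coverage profile that best separates it into two mean levels, a crude proxy for a proximal poly-adenylation site, and reports the ratio of the two levels as an estimate of distal-site usage. The coverage profile and all names are hypothetical.

    # Minimal sketch only: locate a candidate proximal poly-adenylation site
    # as the best two-segment split of a 3' UTR coverage profile, and estimate
    # distal-site usage from the ratio of the segment means.
    import numpy as np

    def proximal_site_and_usage(coverage):
        """Return (split_index, distal_usage) for a 1-D per-base coverage profile."""
        cov = np.asarray(coverage, dtype=float)
        best_k, best_cost = 1, np.inf
        for k in range(1, len(cov)):                      # candidate breakpoints
            left, right = cov[:k], cov[k:]
            cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if cost < best_cost:
                best_k, best_cost = k, cost
        left_mean, right_mean = cov[:best_k].mean(), cov[best_k:].mean()
        distal_usage = right_mean / left_mean if left_mean > 0 else 0.0
        return best_k, distal_usage

    # Toy 3' UTR: every transcript covers the first 300 nt, ~40% extend to the distal site.
    profile = np.concatenate([np.full(300, 50.0), np.full(200, 20.0)])
    print(proximal_site_and_usage(profile))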