13 research outputs found
Looking under the hood of RNA-Seq algorithms: Cufflinks
<p>Brief introduction to cufflinks model for gene expression estimation.</p>
<p>Doubts, issues and problems.</p>
Method for comparing microarray data in investigating cell type specific genes
<p>An iterative LiMMa approach to noise removal that allows comparative analysis of gene expression data from multiple sources.<br><br>The presented approach significantly decreases the influence of the originating lab on the gene expression data.</p>
On Brownian Distance Covariance
<p>Brief introductory slides on Brownian distance covariance.</p>
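<p>For orientation, a minimal sketch of the sample (V-statistic) distance covariance for one-dimensional samples, using the standard double-centering construction. It is not taken from the slides; the function names are illustrative only.</p>

```python
import math


def _centered_distances(values):
    """Pairwise |x_i - x_j| matrix with row, column and grand means removed."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance of two equal-length 1-D samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation; returns 0.0 if either sample is constant."""
    dcov2 = distance_covariance_sq(x, y)
    denom = math.sqrt(distance_covariance_sq(x, x) * distance_covariance_sq(y, y))
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# y = x^2 on a symmetric grid has zero Pearson correlation with x,
# yet its distance correlation is clearly positive.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
print(distance_correlation(x, y))
```

<p>Unlike Pearson correlation, the population distance correlation is zero only under independence, which is why the quadratic example above still registers as dependent.</p>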
A Comparison of Euclidean metrics in spike train space
<p>Spike trains are the observables in studies of neural activity: they represent the response of a neuron to stimuli and are often modeled as realizations of stochastic point processes. The spike train space is non-Euclidean; recently, however, two L2-like distances were introduced on that space: the Elastic distance and the Generalized Victor-Purpura (GVP) distance.</p>
<p>On this poster we briefly review these two distances and run several comparisons, including the construction of summary statistics that correspond in spirit to the mean and variance, as well as classification capabilities. To allow comparisons between the metrics, we propose an efficient algorithm for the GVP summary statistics.</p>
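<p>As background, the sketch below computes the classic Victor-Purpura spike train distance by dynamic programming. It is the original edit-cost metric, not the GVP generalization or the summary-statistics algorithm from the poster; the function name and the cost parameter <code>q</code> are illustrative.</p>

```python
def victor_purpura_distance(train_a, train_b, q):
    """Classic Victor-Purpura distance between two spike trains.

    Adding or deleting a spike costs 1; shifting a spike by dt costs q * |dt|.
    """
    a, b = sorted(train_a), sorted(train_b)
    # prev[j]: distance between the first i-1 spikes of a and the first j of b.
    prev = [float(j) for j in range(len(b) + 1)]
    for i, spike_a in enumerate(a, start=1):
        curr = [float(i)]  # matching i spikes against an empty train
        for j, spike_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                              # delete spike_a
                curr[j - 1] + 1.0,                          # insert spike_b
                prev[j - 1] + q * abs(spike_a - spike_b),   # shift spike_a
            ))
        prev = curr
    return prev[-1]


# A small shift is cheaper than delete-plus-insert; a large one is not.
print(victor_purpura_distance([1.0], [1.5], q=1.0))
print(victor_purpura_distance([1.0], [5.0], q=1.0))
```

<p>The parameter <code>q</code> sets the time scale: for small <code>q</code> the distance mostly counts spikes, for large <code>q</code> it penalizes any timing mismatch.</p>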
SRVF mean consistency
The slides outline the proof of consistency and robustness of SRVF mean estimates in L2 and SRVF spaces.
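For readers unfamiliar with the transform, here is a minimal sketch of the square-root velocity function, q(t) = sign(f'(t)) * sqrt(|f'(t)|), approximated by finite differences on a sampled function, together with the plain L2 distance between two such representations. The warping (alignment) step that the SRVF framework builds on top of this is omitted, and the function names are illustrative, not taken from the slides.

```python
import math


def srvf(values, times):
    """Piecewise-constant SRVF: one value per interval between samples."""
    q = []
    for k in range(len(values) - 1):
        dt = times[k + 1] - times[k]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        slope = (values[k + 1] - values[k]) / dt
        q.append(math.copysign(math.sqrt(abs(slope)), slope))
    return q


def l2_distance(q1, q2, times):
    """L2 distance between two piecewise-constant SRVFs on the same grid."""
    total = 0.0
    for k in range(len(q1)):
        dt = times[k + 1] - times[k]
        total += (q1[k] - q2[k]) ** 2 * dt
    return math.sqrt(total)


times = [0.0, 0.5, 1.0]
q_line = srvf([0.0, 0.5, 1.0], times)   # f(t) = t
q_flat = srvf([0.0, 0.0, 0.0], times)   # f(t) = 0
print(q_line, l2_distance(q_line, q_flat, times))
```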
Stochastic point process models for Next Generation Sequencing
<p>Next Generation Sequencing (NGS) has revolutionized the quality and quantity of available genetic data. To extract the full benefit of the new technique, there is an urgent need for precise inference rules built on a strong theoretical basis. In this presentation I provide a novel, extended way of looking at NGS data. An NGS experiment can be interpreted as a process of mapping short sequence fragments (short reads) to a genomic region of interest (an exon, gene, gene family or even a whole chromosome), and the activity of the region is derived from the number of successful mappings. The increased reliability and the design of NGS experiments allow for a more sophisticated mathematical framework that uses not only the intensity of expression but also the positions of the individual reads aligned to the genomic region. To account for both aspects, I introduce a Poisson point process framework for NGS experiments. In this approach, the reference genome coordinates of the mapped reads imply that differences in activity can also arise from changes in read positioning. Using inference tools for stochastic point processes combined with functional data analysis, I provide a method to quantify activity differences in terms of both intensity and positioning through phase-amplitude separation. As a consequence, I revisit the problem of variability in NGS data and indicate how it can be understood through the phase-amplitude dichotomy. Finally, I show that the new approach can reveal additional information in the genetic data. The proposed method can be used to detect events such as alternative splicing, exon blocking and exon skipping, and can also be thought of as a new setting for inference on NGS data.</p>
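<p>To make the point process view concrete, the sketch below treats mapped read start positions as events and estimates an intensity function over a genomic region with a Gaussian kernel. The kernel choice, the bandwidth and the toy coordinates are assumptions made for illustration; the presentation does not specify its estimator here.</p>

```python
import math


def kernel_intensity(read_positions, grid, bandwidth):
    """Gaussian-kernel estimate of a point process intensity on a grid.

    Each read contributes a unit-mass bump, so the estimate integrates
    (approximately) to the number of reads in the region.
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((t - x) / bandwidth) ** 2)
            for x in read_positions)
        for t in grid
    ]


# Hypothetical read start coordinates within a 400 bp region.
reads = [100, 105, 110, 300]
grid = list(range(0, 401))
intensity = kernel_intensity(reads, grid, bandwidth=10.0)

# Total mass (amplitude) versus where the mass sits (positioning).
print(sum(intensity))                  # Riemann sum with unit step
print(intensity[105], intensity[300])
```

<p>Two regions with the same read count can produce differently shaped intensity curves, which is the positional information that a count-only summary discards.</p>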
Improving statistical models for discovering cell type specific genes
<p>To reveal the genetic control of various subsets of CD4 T cells, we compared the gene expression profiles of resting and activated conventional CD4 T cells, resting and activated natural Treg cells, and adaptive Treg cells. RNA was isolated from the respective T cell populations and hybridized to Affymetrix GeneChip M430 2.0 Plus microarrays. Three individual samples of each kind were processed.</p>
<p>To make our data set more representative, we included microarrays of the respective CD4 T cell subsets from other laboratories. These data were obtained from the GEO database: www.ncbi.nlm.nih.gov/geo.</p>
<p>Using the LiMMa framework improved by NMF gene selection, we presented ≈ 100 genes divided into 5 clusters, each specific to one of the 5 investigated cell types. We pointed out that applying the same analysis to data from multiple sources requires extreme care, as differences between sources can strongly influence the results. We proposed a framework (the Iterative LiMMa procedure) that reduces, to some extent, the source effect in each sample. Nevertheless, a complete solution for removing the effect of the data's origin remains unknown.</p>
SRVF functional data analysis for Next Generation Sequencing Data
<b>Motivation:</b> Sequencing-based methods to examine fundamental features of the genome, such as gene expression and chromatin structure, rely on inferences from the abundance and distribution of reads derived from Illumina sequencing. Drawing sound inferences from such experiments relies on appropriate mathematical methods to model the distribution of reads along the genome, which has been challenging due to the scale and nature of these data.<br><b>Results:</b> We propose a new framework (SRSFseq) based on Square Root Slope Function shape analysis to analyse Illumina sequencing data for count-based assays using point process filtering. In the new approach, the mapped reads are interpreted as realizations of a stochastic Poisson point process over genomic regions of interest. The Poisson assumption is used to fit an intensity function, and the new generative model makes it possible to account for shape variability of the intensities in addition to standard L2 differences. An equivalent of a Fisher test is used to quantify the significance of shape differences in read distribution patterns between groups of intensity functions in different experimental conditions. We evaluated the performance of this new framework on RNA-seq data at the exon level, which enabled the detection of variation in read distributions and abundances between experimental conditions that was not detected by other methods. The variety of intensity representations and the flexibility of the mathematical design allow the model to be easily adapted to other data types or problems in which the distribution and count are to be tested. The functional interpretation and the SRSF phase-amplitude separation technique give an efficient noise reduction procedure that improves the sensitivity and specificity of the method.
Functional interpretation of RNA sequencing uncovers new patterns in genomic activity
A crucial component of modern gene regulation analysis is the proper interpretation of experimental results. The rapid development of sequencing technologies has left the statistical and mathematical tools lagging behind. In this poster we show how to use functional data modeling to unlock the potential of sequencing methods.