24 research outputs found

    Precalculus

    598 pp., 24 c

    An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data.

    The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. How to identify and manage outlier observations in RNA-seq data is therefore an emerging topic of interest. One of the main objectives of this research is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. Improving the accuracy of outlier detection is an important precursor to that goal. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. We therefore propose a univariate algorithm that uses a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data, and we implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial distributed data.
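The leave-one-out idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the method-of-moments negative binomial fit, the Poisson fallback when the remaining counts show no overdispersion, the fixed probability cutoff `threshold`, and the `min_keep` stopping rule are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / size."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def poisson_pmf(k, mu):
    """Poisson pmf, used when the remaining data are not overdispersed."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def prob_under_rest(value, rest):
    """Probability of `value` under a distribution fitted to `rest` only."""
    mu = mean(rest)
    if mu == 0:
        return 1.0 if value == 0 else 0.0
    var = variance(rest)
    if var <= mu:  # assumption: fall back to Poisson without overdispersion
        return poisson_pmf(value, mu)
    size = mu * mu / (var - mu)  # method-of-moments dispersion estimate
    return nb_pmf(value, mu, size)


def iterative_loo_outliers(counts, threshold=1e-4, min_keep=3):
    """Return indices flagged as outliers for one feature's read counts.

    Each pass leaves one observation out, fits the rest, and flags the
    observation if its probability falls below `threshold` (hypothetical
    cutoff). Flagged counts are dropped and the pass repeats until no new
    outlier appears or only `min_keep` observations remain.
    """
    active = list(range(len(counts)))
    flagged = []
    while len(active) > min_keep:
        new = []
        for i in active:
            rest = [counts[j] for j in active if j != i]
            if prob_under_rest(counts[i], rest) < threshold:
                new.append(i)
        if not new:
            break
        flagged.extend(new)
        active = [i for i in active if i not in new]
    return sorted(flagged)
```

Calling `iterative_loo_outliers([10, 12, 9, 11, 13, 10, 500])` flags only the last observation, because the count of 500 is extremely improbable under the distribution fitted to the other six counts, while each of those six remains plausible even when 500 is still in the comparison set.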

    Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery

    Background: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. Results: The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when sample size is small. Conclusion: The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
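The stability criterion mentioned in the Results can be sketched as follows. The abstract does not give the exact formula, so this assumes the common set-based form of the Tanimoto distance, 1 - |A ∩ B| / |A ∪ B|, averaged over all pairs of feature sets selected in repeated runs. The function names are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Set-based Tanimoto distance: 0 for identical sets, 1 for disjoint sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:  # two empty selections are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(feature_sets):
    """Average Tanimoto distance over all pairs of selected feature sets.

    Lower values mean the selection is more stable across runs.
    """
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, `tanimoto_distance({"g1", "g2", "g3"}, {"g2", "g3", "g4"})` is 0.5, because the two selections share two of four distinct features.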

    Evaluating the Stability of RNA-Seq Transcriptome Profiles and Drug-Induced Immune-Related Expression Changes in Whole Blood.

    Methods were developed to evaluate the stability of rat whole blood expression obtained from RNA sequencing (RNA-seq) and assess changes in whole blood transcriptome profiles in experiments replicated over time. Expression was measured in globin-depleted RNA extracted from the whole blood of Sprague-Dawley rats given either saline (control) or neurotoxic doses of amphetamine (AMPH). The experiment was repeated four times (paired control and AMPH groups) over a 2-year span. The transcriptome of the control and AMPH-treated groups was evaluated on: 1) transcript levels for ribosomal protein subunits; 2) relative expression of immune-related genes; 3) stability of the control transcriptome over 2 years; and 4) stability of the effects of AMPH on immune-related genes over 2 years. All but one of the 70 genes that encode the 80S ribosome had levels that ranked in the top 5% of all mean expression levels. Deviations in sequencing performance led to significant changes in the ribosomal transcripts. The overall expression profile of immune-related genes and genes specific to monocytes, T-cells or B-cells were well represented and consistent within treatment groups. There were no differences between the levels of ribosomal transcripts in time-matched control and AMPH groups, but there were significant differences in the expression of immune-related genes between control and AMPH groups. AMPH significantly increased expression of some genes related to monocytes but down-regulated those specific to T-cells. These changes were partially due to changes in the two types of leukocytes present in blood, which indicate an activation of the innate immune system by AMPH. Thus, the stability of the RNA-seq whole blood transcriptome can be verified by assessing ribosomal protein subunits and immune-related gene expression. Such stability enables the pooling of samples from replicate experiments to carry out differential expression analysis with acceptable power.
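The ribosomal check described in this abstract (how many genes of a reference set rank in the top 5% of mean expression) can be sketched as below. This is an illustration under assumptions, not the authors' pipeline: the function name, the dictionary input, and rounding the cutoff up to a whole number of genes are all hypothetical choices.

```python
import math


def share_in_top(mean_expression, gene_set, top_fraction=0.05):
    """Fraction of `gene_set` whose mean expression ranks in the top share.

    mean_expression: dict mapping gene name -> mean expression level.
    gene_set: genes to check (for example, ribosomal protein genes).
    Genes absent from `mean_expression` are ignored.
    """
    ranked = sorted(mean_expression, key=mean_expression.get, reverse=True)
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    top = set(ranked[:n_top])
    present = [g for g in gene_set if g in mean_expression]
    if not present:
        raise ValueError("none of the requested genes were measured")
    return sum(g in top for g in present) / len(present)
```

With 100 genes whose expression increases from `g0` to `g99`, the top 5% are `g95` through `g99`, so `share_in_top(expr, ["g99", "g98", "g97", "g50"])` returns 0.75.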

    Venn diagram of the number of features with outliers detected by iLOO and edgeR-robust.

    The totals give the number of (a) features with a single detected outlier and (b) features with two detected outliers identified by iLOO and edgeR-robust in the control group of rat RNA-seq data.

    Scatterplot of read counts observed in real data for a sample of features.

    Scatterplot of raw counts for six representative features displaying counts identified as outliers by iLOO (purple diamond), edgeR-robust (red diamond), and both methods (blue diamond) in the control group of rat RNA-seq data.

    Number of features with 0 through 4 detected outliers in the control group of rat RNA-seq data.

    Number of features with 0 through 4 detected outliers in the control group of rat RNA-seq data.

    Expression of Ribosomal Subunits in the Control versus Amphetamine Groups.

    Pearson correlation and scatterplot matrix of log2-normalized expression of the ribosomal subunit proteins comparing each control group to its time-matched AMPH group. The red solid line is the identity line.