130 research outputs found

    The identification of informative genes from multiple datasets with increasing complexity

    Background: In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicate the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selecting a Bayesian classifier with an appropriate level of complexity by evaluating predictive performance on independent datasets; (3) comparing the different gene selections and the influence of increasing the model complexity; (4) functional analysis of the informative genes.
    Results: In this paper, we identify the most appropriate model complexity using cross-validation and independent test set validation for predicting gene expression in three published datasets related to myogenesis and muscle differentiation. Furthermore, we demonstrate that models trained on simpler datasets can be used to identify interactions among genes and select the most informative genes. We also show that these models can explain the myogenesis-related genes (genes of interest) significantly better than others (P < 0.004), since the improvement in their rankings is much more pronounced. Finally, after further evaluating our results on synthetic datasets, we show that our approach outperforms a concordance method by Lai et al. in identifying informative genes from multiple datasets with increasing complexity, whilst additionally modelling the interaction between genes.
    Conclusions: We show that Bayesian networks derived from simpler controlled systems have better performance than those trained on datasets from more complex biological systems. Furthermore, we show that highly predictive and consistent genes, from the pool of differentially expressed genes, across independent datasets are more likely to be fundamentally involved in the biological process under study. We conclude that networks trained on simpler controlled systems, such as in vitro experiments, can be used to model and capture interactions among genes in more complex datasets, such as in vivo experiments, where these interactions would otherwise be concealed by a multitude of other ongoing events.

    Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

    Background: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data.
    Results: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection.
    Conclusions: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
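The contrast between total-count scaling and quantile-based scaling can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy counts, and the choice of upper-quartile scaling as the quantile-based procedure are assumptions made for illustration. RPKM is computed with the standard formula, 10^9 × reads / (total lane reads × gene length in bp).

```python
from statistics import quantiles

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads, scaling by total lane count."""
    total = sum(counts)
    return [1e9 * c / (total * length) for c, length in zip(counts, lengths_bp)]

def upper_quartile_scale(counts):
    """Scale counts by the 75th percentile of the lane's nonzero counts.

    Upper-quartile scaling is one possible quantile-based choice (an assumption here).
    """
    nonzero = [c for c in counts if c > 0]
    q75 = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / q75 for c in counts]

# Toy data (made up): two lanes identical except for one very highly expressed gene.
lengths = [1000] * 8
lane_a = [10, 20, 30, 40, 50, 60, 70, 80]
lane_b = [10, 20, 30, 40, 50, 60, 70, 800]

rpkm_a, rpkm_b = rpkm(lane_a, lengths), rpkm(lane_b, lengths)
uq_a, uq_b = upper_quartile_scale(lane_a), upper_quartile_scale(lane_b)

# Gene 0 has the same raw count in both lanes, yet total-count scaling makes it
# look three times lower in lane B, because the single dominant gene inflates
# the lane total. The upper-quartile factor is unaffected by that gene.
print(rpkm_a[0] / rpkm_b[0])
print(uq_a[0], uq_b[0])
```

In this toy case, the seven unchanged genes keep identical values across lanes under upper-quartile scaling, whereas total-count scaling shifts all of them.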

    Second-Generation Sequencing Supply an Effective Way to Screen RNAi Targets in Large Scale for Potential Application in Pest Insect Control

    The success of RNAi for potential insect pest control depends mainly on careful target selection and a convenient delivery system. We adopted second-generation sequencing technology to screen RNAi targets. Illumina's RNA-seq and digital gene expression tag profile (DGE-tag) technologies were used to screen optimal RNAi targets from Ostrinia furnacalis. A total of 14,690 stage-specific genes were obtained that can be considered potential targets, and 47 were confirmed by qRT-PCR. Ten genes with larval-stage-specific expression were selected for RNAi testing. When 50 ng/µl dsRNAs of the genes DS10 and DS28 were sprayed directly on newly hatched larvae placed on filter paper, larval mortality was around 40-50%, whereas when the dsRNAs of the ten genes were sprayed on the larvae along with artificial diet, mortality reached 73% to 100% at 5 d after treatment. The qRT-PCR analysis verified the correlation between larval mortality and the down-regulation of target gene expression. Topically applied fluorescent dsRNA confirmed that dsRNA did penetrate the body wall and circulate in the body cavity. It seems likely that the combination of DGE-tag with RNA-seq is a rapid, high-throughput, low-cost, and easy way to select candidate target genes for RNAi. More importantly, it demonstrated that dsRNAs are able to penetrate the integument and cause stunted larval development and/or death in a lepidopteran insect. This finding greatly broadens target selection for RNAi from just gut-specific genes to targets in the whole insect and may lead to new strategies for designing RNAi-based technology against insect damage.

    Next-generation sequencing of vertebrate experimental organisms

    Next-generation sequencing technologies are revolutionizing biology by allowing for genome-wide transcription factor binding-site profiling, transcriptome sequencing, and more recently, whole-genome resequencing. While it is currently not possible to generate complete de novo assemblies of higher-vertebrate genomes using next-generation sequencing, improvements in sequence read lengths and throughput, coupled with new assembly algorithms for large data sets, will soon make this a reality. These developments will in turn spawn a revolution in how genomic data are used to understand genetics and how model organisms are used for disease gene discovery. This review provides an overview of the current next-generation sequencing platforms and the newest computational tools for the analysis of next-generation sequencing data. We also describe how next-generation sequencing may be applied in the context of vertebrate model organism genetics.

    Increasing incidence and mortality of infective endocarditis: a population-based study through a record-linkage system

    Background: Few population-based studies provide epidemiological data on infective endocarditis (IE). The aim of the study is to analyze the incidence and outcomes of IE in the Veneto Region (North-Eastern Italy).
    Methods: Residents with a first hospitalization for IE in 2000-2008 were extracted from discharge data and linked to mortality records to estimate 365-day survival. Etiology was retrieved in subsets of this cohort by discharge codes and by linkage to a microbiological database. Risk factors for mortality were assessed through logistic regression.
    Results: 1,863 subjects were hospitalized for IE, with a corresponding crude rate of 4.4 per 100,000 person-years, increasing from 4.1 in 2000-2002 to 4.9 in 2006-2008 (p = 0.003). Median age was 68 years; 39% of subjects were hospitalized in the three preceding months. 23% of patients underwent a cardiac valve procedure in the index admission or in the following year. In-hospital mortality was 14% (19% including hospital transfers); 90-day and 365-day mortality rose through the study years. Mortality increased with age and the Charlson comorbidity index, in subjects with previous hospitalizations for heart failure, and (in the subcohort with microbiological data) in IE due to Staphylococci (40% of IE).
    Conclusions: The study demonstrates an increasing incidence and mortality for IE over the last decade. Analyses of electronic archives provide a region-wide picture of IE, overcoming referral biases affecting single-clinic or multicentric studies, and therefore represent a first fundamental step to detect critical issues related to IE.
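The crude rate quoted above follows the usual definition: cases divided by person-years at risk, times 100,000. The sketch below is a minimal illustration of that arithmetic. The abstract does not report the person-years denominator, so the value used here is back-calculated from the reported case count and rate and is an assumption, not a figure from the study. The function names are hypothetical.

```python
def crude_rate_per_100k(cases, person_years):
    """Crude incidence rate expressed per 100,000 person-years."""
    return cases / person_years * 100_000

def implied_person_years(cases, rate_per_100k):
    """Denominator implied by a reported case count and crude rate."""
    return cases / rate_per_100k * 100_000

# Back-calculated denominator (assumption): about 42.3 million person-years
# over the nine-year study period.
person_years = implied_person_years(1863, 4.4)
rate = crude_rate_per_100k(1863, person_years)

print(round(person_years))
print(round(rate, 1))
```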

    Non-nosocomial healthcare-associated infective endocarditis in Taiwan: an underrecognized disease with poor outcome

    Background: Non-nosocomial healthcare-associated infective endocarditis (NNHCA-IE) is a new category of IE of increasing importance. This study described the clinical and microbiological characteristics and outcome of NNHCA-IE in Taiwan.
    Methods: A retrospective study was conducted of all patients with IE admitted to the Kaohsiung Veterans General Hospital in Kaohsiung, Taiwan over a five-year period from July 2004 to July 2009. The clinical and microbiological features of NNHCA-IE were compared to those of community-acquired and nosocomial IE. Predictors for in-hospital death were determined.
    Results: Two hundred episodes of confirmed IE occurred during the study period. These included 148 (74%) community-acquired, 30 (15%) non-nosocomial healthcare-associated, and 22 (11%) nosocomial healthcare-associated IE. Staphylococcus aureus was the most frequent pathogen. Compared to patients with community-acquired IE, patients with NNHCA-IE were older (median age, 67 vs. 44 years, p < 0.001), had more MRSA (43.3% vs. 9.5%, p < 0.001), more comorbid conditions (median Charlson comorbidity index [interquartile range], 4 [2-6] vs. 0 [0-1], p < 0.001), a higher in-hospital mortality (50.0% vs. 17.6%, p < 0.001) and were less frequently recognized by clinicians on admission (16.7% vs. 47.7%, p = 0.002). The overall in-hospital mortality rate for all patients with IE was 25%. Shock was the strongest risk factor for in-hospital death (odds ratio 7.8, 95% confidence interval 2.4-25.2, p < 0.001).
    Conclusions: NNHCA-IE is underrecognized and carries a high mortality rate. Early recognition is crucial to provide optimal management and improve outcome.

    Novel Protein-Protein Interactions Inferred from Literature Context

    We have developed a method that predicts Protein-Protein Interactions (PPIs) based on the similarity of the context in which proteins appear in literature. This method outperforms previously developed PPI prediction algorithms that rely on the conjunction of two protein names in MEDLINE abstracts. We show significant increases in coverage (76% versus 32%) and sensitivity (66% versus 41% at a specificity of 95%) for the prediction of PPIs currently archived in 6 PPI databases. A retrospective analysis shows that PPIs can efficiently be predicted before they enter PPI databases and before their interaction is explicitly described in the literature. The practical value of the method for discovery of novel PPIs is illustrated by the experimental confirmation of the inferred physical interaction between CAPN3 and PARVB, which was based on frequent co-occurrence of both proteins with concepts like Z-disc, dysferlin, and alpha-actinin. The relationships between proteins predicted by our method are broader than PPIs, and include proteins in the same complex or pathway. Depending on the type of relationships deemed useful, the precision of our method can be as high as 90%. The full set of predicted interactions is available in a downloadable matrix and through the webtool Nermal, which lists the most likely interaction partners for a given protein. Our framework can be used for prioritizing potential interaction partners, hitherto undiscovered, for follow-up studies and to aid the generation of accurate protein interaction maps.
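The idea of scoring protein pairs by the similarity of their literature context can be sketched as follows. This is a minimal illustration, not the published method: it assumes each protein is summarized as a weighted profile of co-occurring concepts and uses cosine similarity as the comparison, and the protein names and weights are made-up placeholders.

```python
import math

def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two concept-weight profiles (dicts)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_partners(query, profiles):
    """Rank all other proteins by context similarity to the query protein."""
    scores = {
        name: cosine_similarity(profiles[query], profile)
        for name, profile in profiles.items()
        if name != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical profiles with invented weights; only the concept labels echo the abstract.
profiles = {
    "protein_A": {"Z-disc": 0.8, "dysferlin": 0.5, "alpha-actinin": 0.6},
    "protein_B": {"Z-disc": 0.7, "dysferlin": 0.3, "alpha-actinin": 0.4},
    "protein_C": {"ribosome": 0.9, "translation": 0.7},
}

ranking = rank_partners("protein_A", profiles)
print(ranking)
```

Two proteins can score highly here without ever being named in the same abstract, which is the property that distinguishes a context-based approach from direct co-mention counting.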