284 research outputs found

    VariantDB: A flexible annotation and filtering portal for next generation sequencing data


    The identification of informative genes from multiple datasets with increasing complexity

    Background: In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicate the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selection of a Bayesian classifier with an appropriate level of complexity by evaluation of predictive performance on independent datasets; (3) comparing the different gene selections and the influence of increasing the model complexity; (4) functional analysis of the informative genes.
    Results: In this paper, we identify the most appropriate model complexity using cross-validation and independent test set validation for predicting gene expression in three published datasets related to myogenesis and muscle differentiation. Furthermore, we demonstrate that models trained on simpler datasets can be used to identify interactions among genes and select the most informative genes. We also show that these models can explain the myogenesis-related genes (genes of interest) significantly better than others (P < 0.004), since the improvement in their rankings is much more pronounced. Finally, after further evaluating our results on synthetic datasets, we show that our approach outperforms a concordance method by Lai et al. in identifying informative genes from multiple datasets with increasing complexity, whilst additionally modelling the interaction between genes.
    Conclusions: We show that Bayesian networks derived from simpler controlled systems have better performance than those trained on datasets from more complex biological systems. Further, we show that highly predictive and consistent genes, from the pool of differentially expressed genes, across independent datasets are more likely to be fundamentally involved in the biological process under study. We conclude that networks trained on simpler controlled systems, such as in vitro experiments, can be used to model and capture interactions among genes in more complex datasets, such as in vivo experiments, where these interactions would otherwise be concealed by a multitude of other ongoing events.

    Filtering Genes for Cluster and Network Analysis

    Background: Prior to cluster analysis or genetic network analysis it is customary to filter, or remove, genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias.
    Results: This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks.
    Conclusion: The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.

    Non-destructive evaluation techniques and what they tell us about wood property variation

    To maximize utilization of our forest resources, detailed knowledge of wood property variation and the impacts this has on end-product performance is required at multiple scales (within and among trees, regionally). As many wood properties are difficult and time-consuming to measure, our knowledge regarding their variation is often inadequate, as is our understanding of their responses to genetic and silvicultural manipulation. The emergence of many non-destructive evaluation (NDE) methodologies offers the potential to greatly enhance our understanding of the forest resource; however, it is critical to recognize that any technique has its limitations, and it is important to select the appropriate technique for a given application. In this review, we discuss the following technologies for assessing wood properties in the field: acoustics, Pilodyn, Resistograph and Rigidimeter; and in the lab: computer tomography (CT) scanning, DiscBot, near infrared (NIR) spectroscopy, radial sample acoustics and SilviScan. We discuss these techniques, explore their utilization, and list applications that best suit each methodology. As an end goal, NDE technologies will help researchers worldwide characterize wood properties, develop accurate models for prediction, and utilize field equipment that can validate the predictions. The continued advancement of NDE technologies will also allow researchers to better understand the impact of wood properties on product performance.

    An Alternate Method of Classifying Allergic Bronchopulmonary Aspergillosis Based on High-Attenuation Mucus

    Allergic bronchopulmonary aspergillosis (ABPA) is classified radiologically based on the findings of central bronchiectasis (CB) and other radiologic features (ORF). However, the long-term clinical significance of these classifications remains unknown. We hypothesized that the immunological activity and outcomes of ABPA could be predicted from the HRCT chest finding of high-attenuation mucus (HAM), a marker of inflammatory activity. In this study, we evaluate the severity and clinical outcomes of ABPA with different radiological classifications. specific IgE levels, eosinophil count) severity of the disease and clinical outcomes in various classifications were analyzed. Of the 234 (123 males, 111 females; mean age, 34.1 years) patients, 55 (23.5%) had normal HRCT, 179 (76.5%) had CB, 49 (20.9%) had HAM, and 27 (11.5%) had ORF. All immunological markers were consistently higher in the HAM classification, while in other classifications these findings were inconsistent. On multivariate analysis, the factors predicting frequent relapses were presence of HAM (OR 7.38; 95% CI, 3.21–17.0) and CB (OR 3.93; 95% CI, 1.63–9.48) after adjusting for ORF. The classification scheme based on HAM most consistently predicts immunological severity in ABPA. Central bronchiectasis and HAM are independent predictors of recurrent relapses in ABPA. Hence, HAM should be employed in the radiological classification of ABPA.