
    The INCF Digital Atlasing Program: Report on Digital Atlasing Standards in the Rodent Brain

    The goal of the INCF Digital Atlasing Program is to provide the vision and direction necessary to make the rapidly growing collection of multidimensional data of the rodent brain (images, gene expression, etc.) widely accessible to and usable by the international research community. The Digital Brain Atlasing Standards Task Force was formed in May 2008 to investigate the state of rodent brain digital atlasing and to formulate standards, guidelines, and policy recommendations.

Our first objective has been the preparation of a detailed document that includes the vision and a specific description of an infrastructure, systems, and methods capable of serving the scientific goals of the community, as well as the practical issues involved in achieving those goals. This report builds on the 1st INCF Workshop on Mouse and Rat Brain Digital Atlasing Systems (Boline et al., 2007, _Nature Precedings_, doi:10.1038/npre.2007.1046.1) and includes a more detailed analysis of both the current and desired states of digital atlasing, along with specific recommendations for achieving these goals.

    Meta Analysis of Gene Expression Data within and Across Species

    Since the second half of the 1990s, a large number of genome-wide analyses have been described that study gene expression at the transcript level. To this end, two major strategies have been adopted: a first relying on hybridization techniques such as microarrays, and a second based on sequencing techniques such as serial analysis of gene expression (SAGE), cDNA-AFLP, and analysis based on expressed sequence tags (ESTs). Although both types of profiling experiments have become routine techniques in many research groups, their application remains costly and laborious. As a result, the number of conditions profiled in individual studies is still relatively small, usually varying from only two to a few hundred samples for the largest experiments. Increasingly, scientific journals require the deposition of these high-throughput experiments in public databases upon publication. Mining the information present in these databases offers molecular biologists the possibility of viewing their own small-scale analysis in the light of what is already available. However, so far, the richness of this public information remains largely unexploited. Several obstacles, such as the correct association of ESTs and microarray probes with the corresponding gene transcript, incompleteness and inconsistency in the annotation of experimental conditions, and the lack of standardized experimental protocols for generating gene expression data, all impede the successful mining of these data. Here, we review the potential and difficulties of combining publicly available expression data from EST analyses and microarray experiments, respectively. With examples from the literature, we show how meta-analysis of expression profiling experiments can be used to study expression behavior in a single organism or between organisms, across a wide range of experimental conditions. We also provide an overview of the methods and tools that can aid molecular biologists in exploiting these public data.
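
As a concrete illustration of the kind of cross-study integration the review surveys, the sketch below combines per-study evidence for a single gene using Stouffer's weighted z-score method, one standard meta-analysis technique. The function name and the toy p-values and sample sizes are illustrative only; the review itself covers a broader range of methods and tools.

```python
"""Minimal sketch: combining per-study evidence for one gene with
Stouffer's weighted z-score method (a generic meta-analysis technique,
not a method specific to this review)."""
import numpy as np
from scipy.stats import norm

def stouffer_combined_p(p_values, sample_sizes):
    """Combine one-sided per-study p-values for a single gene,
    weighting each study by the square root of its sample size."""
    p = np.asarray(p_values, dtype=float)
    w = np.sqrt(np.asarray(sample_sizes, dtype=float))
    z = norm.isf(p)                                  # per-study z-scores
    z_comb = np.sum(w * z) / np.sqrt(np.sum(w ** 2)) # weighted combination
    return norm.sf(z_comb)                           # combined one-sided p

# Three hypothetical studies of the same gene, with different sizes.
print(stouffer_combined_p([0.04, 0.20, 0.01], [12, 8, 40]))
```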

    Making open data work for plant scientists

    Despite the clear demand for open data sharing, its implementation within plant science is still limited. This is, at least in part, because open data sharing raises several unanswered questions and challenges to current research practices. In this commentary, some of the challenges encountered by plant researchers at the bench when generating, interpreting, and attempting to disseminate their data are highlighted. The difficulties involved in sharing sequencing, transcriptomics, proteomics, and metabolomics data are reviewed. The benefits and drawbacks of three data-sharing venues currently available to plant scientists are identified and assessed: (i) journal publication; (ii) university repositories; and (iii) community and project-specific databases. It is concluded that community and project-specific databases are the most useful to researchers interested in effective data sharing, since these databases are explicitly created to meet the researchers’ needs, support extensive curation, and embody a heightened awareness of what it takes to make data reusable by others. Such bottom-up, community-driven approaches need to be valued by the research community, supported by publishers, and provided with long-term sustainable support by funding bodies and government. At the same time, these databases need to be linked to generic databases where possible, in order to be discoverable by the majority of researchers and thus promote effective and efficient data sharing. As we look forward to a future that embraces open access to data and publications, it is essential that data policies, data curation, data integration, data infrastructure, and data funding are linked together so as to foster data access and research productivity.

    reComBat: batch-effect removal in large-scale multi-source gene-expression data integration

    With the steadily increasing abundance of omics data produced all over the world under vastly different experimental conditions and residing in public databases, a crucial step in many data-driven bioinformatics applications is data integration. The challenge of batch-effect removal for entire databases lies in the large number of batches and the biological variation, which can result in design-matrix singularity, a problem that no common batch-correction algorithm currently solves satisfactorily. We present reComBat, a regularized version of the empirical Bayes method that overcomes this limitation, and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study. The code is available at https://github.com/BorgwardtLab/reComBat; all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData. Supplementary data are available at Bioinformatics Advances online.
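
For readers who want the gist without reading the package code, here is a minimal sketch of the core idea named in the abstract: estimating additive per-batch effects with an L2 (ridge) regularizer, so that a design with many batches remains solvable even where the unregularized design matrix would be singular. This is a simplified stand-in, not the authors' implementation; `remove_batch_effects` and the toy data are hypothetical, and the real package (linked above) is an empirical Bayes method that also handles covariates.

```python
"""Sketch: ridge-regularized removal of additive batch effects.
A simplified illustration of the regularization idea, not reComBat itself."""
import numpy as np

def remove_batch_effects(X, batch, lam=0.1):
    """X: samples x genes matrix; batch: integer batch label per sample.
    Fits per-gene batch means with an L2 penalty and subtracts them, so
    singular or near-singular batch designs remain solvable."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    B = np.unique(batch)
    D = (batch[:, None] == B[None, :]).astype(float)  # one-hot batch design
    A = D.T @ D + lam * np.eye(len(B))                # ridge-regularized D'D
    gamma = np.linalg.solve(A, D.T @ X)               # batches x genes effects
    return X - D @ gamma                              # residual expression

# Toy example: 3 batches with additive shifts on 5 genes.
rng = np.random.default_rng(1)
batch = np.repeat([0, 1, 2], 20)
shift = rng.normal(scale=2.0, size=(3, 5))
X = rng.normal(size=(60, 5)) + shift[batch]
X_corr = remove_batch_effects(X, batch)
print(np.round(X_corr.mean(axis=0), 3))               # per-gene means near 0
```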

    Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis

    A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene ontology (GO) annotations, is valuable for analyzing the biological signals in microarray expression data. A common approach to measuring enrichment is to cross-classify genes according to membership in a functional category and membership on a selected list of significantly altered genes. A small Fisher's exact test p-value in this 2×2 table, for example, is indicative of enrichment. Other category analysis methods retain the quantitative gene-level scores and measure significance by referring a category-level statistic to a permutation distribution associated with the original differential expression problem. We describe a class of random-set scoring methods that measure distinct components of the enrichment signal. The class includes Fisher's test based on selected genes and also tests that average gene-level evidence across the category. Averaging and selection methods are compared empirically using Affymetrix data on expression in nasopharyngeal cancer tissue, and theoretically using a location model of differential expression. We find that each method has a domain of superiority in the state space of enrichment problems, and that both methods have benefits in practice. Our analysis also addresses two problems related to multiple-category inference, namely that equally enriched categories are not detected with equal probability if they are of different sizes, and that there is dependence among category statistics owing to shared genes. Random-set enrichment calculations do not require Monte Carlo for implementation. They are made available in the R package allez. (Published at http://dx.doi.org/10.1214/07-AOAS104 in the Annals of Applied Statistics, http://www.imstat.org/aoas/, by the Institute of Mathematical Statistics.)
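
The two components contrasted in the abstract can be made concrete with a short sketch: a Fisher's exact test on the 2×2 selection table, and a random-set z-score that standardizes the category's average gene-level score using exact finite-population moments, which is why no Monte Carlo is needed. Function names and toy data here are illustrative; the allez R package is the authors' actual implementation.

```python
"""Sketch of the two enrichment measures contrasted in the abstract."""
import numpy as np
from scipy import stats

def fisher_enrichment(selected, in_set):
    """Fisher's exact test on the 2x2 table: selection vs. category membership."""
    a = np.sum(selected & in_set)            # selected, in category
    b = np.sum(selected & ~in_set)           # selected, not in category
    c = np.sum(~selected & in_set)           # not selected, in category
    d = np.sum(~selected & ~in_set)          # not selected, not in category
    _, p = stats.fisher_exact([[a, b], [c, d]], alternative="greater")
    return p

def random_set_z(gene_scores, in_set):
    """Standardize the category's average score against a random-set null.
    For m genes sampled from G without replacement, the set average has
    mean mu and variance (sigma^2/m)*(G-m)/(G-1), so no Monte Carlo is needed."""
    G, m = len(gene_scores), int(np.sum(in_set))
    mu, sigma2 = gene_scores.mean(), gene_scores.var(ddof=0)
    var = (sigma2 / m) * (G - m) / (G - 1)   # finite-population correction
    return (gene_scores[in_set].mean() - mu) / np.sqrt(var)

# Toy example: 1000 genes, one category of 50 with mildly shifted scores.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)
in_set = np.zeros(1000, dtype=bool)
in_set[:50] = True
scores[:50] += 0.5
selected = scores > 1.5                      # the "significant gene" list
print(fisher_enrichment(selected, in_set), random_set_z(scores, in_set))
```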

    Integrative disease classification based on cross-platform microarray data

    Background: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or on different platforms cannot be compared directly owing to systematic variations. This issue has severely limited the practical use of microarray-based disease classification.
    Results: In this study, we tested the feasibility of disease classification by integrating the large amount of heterogeneous microarray datasets from public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets; one may then compare vectors of log-rank ratios across datasets. In addition, we systematically map textual annotations of datasets to concepts in the Unified Medical Language System (UMLS), permitting quantitative analysis of the phenotype "distance" between datasets and automated construction of disease classes. We designed a new classification approach named ManiSVM, which integrates manifold data transformation with SVM learning to exploit the data properties. Using leave-one-dataset-out cross-validation, ManiSVM achieved an overall accuracy of 70.7% (68.6% precision and 76.9% recall), with many disease classes achieving accuracy higher than 80%.
    Conclusion: Our results not only demonstrate the feasibility of the integrated disease classification approach, but also show that classification accuracy increases with the number of homogeneous training datasets. Thus, the power of the integrative approach will grow with the continuing accumulation of microarray data in public repositories. Our study shows that automated disease diagnosis can be an important and promising application of the enormous amount of costly-to-generate, yet freely available, public microarray data.
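
The cross-platform trick in the abstract, deriving rank-based expression features within each dataset so that feature vectors become comparable across platforms, can be illustrated as follows. The paper's exact definition of the log-rank ratio is not spelled out here, so this sketch adopts one plausible reading (the log ratio of within-dataset gene ranks between case and control conditions); `log_rank_ratio` and the toy arrays are hypothetical.

```python
"""Sketch of a rank-based cross-platform feature in the spirit of the
"log-rank ratio" idea; the paper's exact definition may differ."""
import numpy as np
from scipy.stats import rankdata

def log_rank_ratio(case_expr, control_expr):
    """case_expr, control_expr: per-gene mean expression within one dataset.
    Ranking within the dataset removes platform-specific intensity scales."""
    r_case = rankdata(case_expr)        # ranks 1..n_genes within the dataset
    r_ctrl = rankdata(control_expr)
    return np.log2(r_case / r_ctrl)

# Two hypothetical datasets measuring the same 6 genes on different platforms
# (incomparable absolute scales) but with the same relative ordering.
case_a = np.array([10.0, 50.0, 5.0, 80.0, 20.0, 40.0])
ctrl_a = np.array([12.0, 30.0, 6.0, 85.0, 60.0, 35.0])
case_b, ctrl_b = 100 * case_a + 7, 100 * ctrl_a + 7    # rescaled "platform B"
print(np.allclose(log_rank_ratio(case_a, ctrl_a),
                  log_rank_ratio(case_b, ctrl_b)))     # True: scale-free
```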

    Short time-series microarray analysis: Methods and challenges

    The detection and analysis of steady-state gene expression has become routine. Time-series microarrays are of growing interest to systems biologists for deciphering the dynamic nature and complex regulation of biosystems. Most temporal microarray datasets contain only a limited number of time points, giving rise to short time-series data, which poses challenges for traditional methods of extracting meaningful information. Obtaining useful information from the wealth of short time-series data requires addressing the problems that arise from limited sampling. Current efforts have shown promise in improving the analysis of short time-series microarray data, although challenges remain. This commentary addresses recent advances in methods for short time-series analysis, including simplification-based approaches and the integration of multi-source information. Nevertheless, further studies and development of computational methods are needed to provide practical solutions that fully exploit the potential of these data.
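
One family of simplification-based approaches reduces each short profile to a qualitative pattern, in the spirit of tools such as STEM (Ernst and Bar-Joseph). The sketch below discretizes the step between consecutive time points into up/flat/down labels and groups genes by pattern; the threshold and the toy profiles are illustrative choices, not values from the commentary.

```python
"""Sketch of a simplification-based analysis of short time-series data:
discretize each profile into up/flat/down steps and group genes by pattern."""
from collections import defaultdict
import numpy as np

def discretize_profile(profile, threshold=0.5):
    """Map a short expression profile to a tuple of +1/0/-1 step labels."""
    steps = np.diff(np.asarray(profile, dtype=float))
    return tuple(int(np.sign(s)) if abs(s) >= threshold else 0 for s in steps)

def group_by_pattern(profiles):
    """Group genes that share the same discretized pattern."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[discretize_profile(profile)].append(gene)
    return dict(groups)

# Toy data: 4 genes measured at 4 time points.
profiles = {
    "geneA": [0.0, 1.2, 2.1, 2.2],   # up, up, flat
    "geneB": [0.1, 1.0, 2.0, 2.1],   # same pattern as geneA
    "geneC": [2.0, 0.8, 0.7, 0.1],   # down, flat, down
    "geneD": [0.0, 0.1, 0.0, 0.1],   # flat throughout
}
print(group_by_pattern(profiles))
```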