280 research outputs found

    Machine learning of hierarchical clustering to segment 2D and 3D images

    Get PDF
    We aim to improve segmentation through the use of machine learning tools during region agglomeration. We propose an active learning approach for performing hierarchical agglomerative segmentation from superpixels. Our method combines multiple features at all scales of the agglomerative process, works for data with an arbitrary number of dimensions, and scales to very large datasets. We advocate the use of variation of information to measure segmentation accuracy, particularly in 3D electron microscopy (EM) images of neural tissue, and using this metric demonstrate an improvement over competing algorithms in EM and natural images.Comment: 15 pages, 8 figure

    Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation

    Get PDF
    Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of underlying details of the chosen sparse matrix storage format. In addition, to achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by these issues, we present a user-friendly and open-source sparse matrix class for the C++ language, with a high-level application programming interface deliberately similar to the widely used MATLAB language. This facilitates prototyping directly in C++ and aids the conversion of research code into production environments. The class internally uses two main approaches to achieve efficient execution: (i) a hybrid storage framework, which automatically and seamlessly switches between three underlying storage formats (compressed sparse column, Red-Black tree, coordinate list) depending on which format is best suited and/or available for specific operations, and (ii) a template-based meta-programming framework to automatically detect and optimise execution of common expression patterns. Empirical evaluations on large sparse matrices with various densities of non-zero elements demonstrate the advantages of the hybrid storage framework and the expression optimisation mechanism.Comment: extended and revised version of an earlier conference paper arXiv:1805.0338

    Usefulness and limitations of dK random graph models to predict interactions and functional homogeneity in biological networks under a pseudo-likelihood parameter estimation approach

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Many aspects of biological functions can be modeled by biological networks, such as protein interaction networks, metabolic networks, and gene coexpression networks. Studying the statistical properties of these networks in turn allows us to infer biological function. Complex statistical network models can potentially more accurately describe the networks, but it is not clear whether such complex models are better suited to find biologically meaningful subnetworks.</p> <p>Results</p> <p>Recent studies have shown that the degree distribution of the nodes is not an adequate statistic in many molecular networks. We sought to extend this statistic with 2nd and 3rd order degree correlations and developed a pseudo-likelihood approach to estimate the parameters. The approach was used to analyze the MIPS and BIOGRID yeast protein interaction networks, and two yeast coexpression networks. We showed that 2nd order degree correlation information gave better predictions of gene interactions in both protein interaction and gene coexpression networks. However, in the biologically important task of predicting functionally homogeneous modules, degree correlation information performs marginally better in the case of the MIPS and BIOGRID protein interaction networks, but worse in the case of gene coexpression networks.</p> <p>Conclusion</p> <p>Our use of dK models showed that incorporation of degree correlations could increase predictive power in some contexts, albeit sometimes marginally, but, in all contexts, the use of third-order degree correlations decreased accuracy. However, it is possible that other parameter estimation methods, such as maximum likelihood, will show the usefulness of incorporating 2nd and 3rd degree correlations in predicting functionally homogeneous modules.</p

    An integrative modular approach to systematically predict gene-phenotype associations

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Complex human diseases are often caused by multiple mutations, each of which contributes only a minor effect to the disease phenotype. To study the basis for these complex phenotypes, we developed a network-based approach to identify coexpression modules specifically activated in particular phenotypes. We integrated these modules, protein-protein interaction data, Gene Ontology annotations, and our database of gene-phenotype associations derived from literature to predict novel human gene-phenotype associations. Our systematic predictions provide us with the opportunity to perform a global analysis of human gene pleiotropy and its underlying regulatory mechanisms.</p> <p>Results</p> <p>We applied this method to 338 microarray datasets, covering 178 phenotype classes, and identified 193,145 phenotype-specific coexpression modules. We trained random forest classifiers for each phenotype and predicted a total of 6,558 gene-phenotype associations. We showed that 40.9% genes are pleiotropic, highlighting that pleiotropy is more prevalent than previously expected. We collected 77 ChIP-chip datasets studying 69 transcription factors binding over 16,000 targets under various phenotypic conditions. Utilizing this unique data source, we confirmed that dynamic transcriptional regulation is an important force driving the formation of phenotype specific gene modules.</p> <p>Conclusion</p> <p>We created a genome-wide gene to phenotype mapping that has many potential implications, including providing potential new drug targets and uncovering the basis for human disease phenotypes. Our analysis of these phenotype-specific coexpression modules reveals a high prevalence of gene pleiotropy, and suggests that phenotype-specific transcription factor binding may contribute to phenotypic diversity. All resources from our study are made freely available on our online Phenotype Prediction Database <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p

    Joint Genome-Wide Profiling of miRNA and mRNA Expression in Alzheimer's Disease Cortex Reveals Altered miRNA Regulation

    Get PDF
    Although microRNAs are being extensively studied for their involvement in cancer and development, little is known about their roles in Alzheimer's disease (AD). In this study, we used microarrays for the first joint profiling and analysis of miRNAs and mRNAs expression in brain cortex from AD and age-matched control subjects. These data provided the unique opportunity to study the relationship between miRNA and mRNA expression in normal and AD brains. Using a non-parametric analysis, we showed that the levels of many miRNAs can be either positively or negatively correlated with those of their target mRNAs. Comparative analysis with independent cancer datasets showed that such miRNA-mRNA expression correlations are not static, but rather context-dependent. Subsequently, we identified a large set of miRNA-mRNA associations that are changed in AD versus control, highlighting AD-specific changes in the miRNA regulatory system. Our results demonstrate a robust relationship between the levels of miRNAs and those of their targets in the brain. This has implications in the study of the molecular pathology of AD, as well as miRNA biology in general

    An integrative approach to characterize disease-specific pathways and their coordination: a case study in cancer

    Get PDF
    BACKGROUND: The most common application of microarray technology in disease research is to identify genes differentially expressed in disease versus normal tissues. However, it is known that, in complex diseases, phenotypes are determined not only by genes, but also by the underlying structure of genetic networks. Often, it is the interaction of many genes that causes phenotypic variations. RESULTS: In this work, using cancer as an example, we develop graph-based methods to integrate multiple microarray datasets to discover disease-related co-expression network modules. We propose an unsupervised method that take into account both co-expression dynamics and network topological information to simultaneously infer network modules and phenotype conditions in which they are activated or de-activated. Using our method, we have discovered network modules specific to cancer or subtypes of cancers. Many of these modules are consistent with or supported by their functional annotations or their previously known involvement in cancer. In particular, we identified a module that is predominately activated in breast cancer and is involved in tumor suppression. While individual components of this module have been suggested to be associated with tumor suppression, their coordinated function has never been elucidated. Here by adopting a network perspective, we have identified their interrelationships and, particularly, a hub gene PDGFRL that may play an important role in this tumor suppressor network. CONCLUSION: Using a network-based approach, our method provides new insights into the complex cellular mechanisms that characterize cancer and cancer subtypes. By incorporating co-expression dynamics information, our approach can not only extract more functionally homogeneous modules than those based solely on network topology, but also reveal pathway coordination beyond co-expression

    Loss of active neurogenesis in the adult shark retina

    Get PDF
    Neurogenesis is the process by which progenitor cells generate new neurons. As development progresses neurogenesis becomes restricted to discrete neurogenic niches, where it persists during postnatal life. The retina of teleost fishes is thought to proliferate and produce new cells throughout life. Whether this capacity may be an ancestral characteristic of gnathostome vertebrates is completely unknown. Cartilaginous fishes occupy a key phylogenetic position to infer ancestral states fixed prior to the gnathostome radiation. Previous work from our group revealed that the juvenile retina of the catshark Scyliorhinus canicula, a cartilaginous fish, shows active proliferation and neurogenesis. Here, we compared the morphology and proliferative status of the retina in catshark juveniles and adults. Histological and immunohistochemical analyses revealed an important reduction in the size of the peripheral retina (where progenitor cells are mainly located), a decrease in the thickness of the inner nuclear layer (INL), an increase in the thickness of the inner plexiform layer and a decrease in the cell density in the INL and in the ganglion cell layer in adults. Contrary to what has been reported in teleost fish, mitotic activity in the catshark retina was virtually absent after sexual maturation. Based on these results, we carried out RNA-Sequencing (RNA-Seq) analyses comparing the retinal transcriptome of juveniles and adults, which revealed a statistically significant decrease in the expression of many genes involved in cell proliferation and neurogenesis in adult catsharks. Our RNA-Seq data provides an excellent resource to identify new signaling pathways controlling neurogenesis in the vertebrate retinaFunded by the Ministerio de Economía Industria y Competitividad (to EC; grant number BFU-2017-89861-P) and Xunta de Galicia Predoctoral Fellowship (to IH-N; grant number ED 481 A 2018 216). Both grants were partially financed by the European Social FundS

    Automated tracing of myelinated axons and detection of the nodes of Ranvier in serial images of peripheral nerves

    Get PDF
    The development of realistic neuroanatomical models of peripheral nerves for simulation purposes requires the reconstruction of the morphology of the myelinated fibres in the nerve, including their nodes of Ranvier. Currently, this information has to be extracted by semimanual procedures, which severely limit the scalability of the experiments. In this contribution, we propose a supervised machine learning approach for the detailed reconstruction of the geometry of fibres inside a peripheral nerve based on its high-resolution serial section images. Learning from sparse expert annotations, the algorithm traces myelinated axons, even across the nodes of Ranvier. The latter are detected automatically. The approach is based on classifying the myelinated membranes in a supervised fashion, closing the membrane gaps by solving an assignment problem, and classifying the closed gaps for the nodes of Ranvier detection. The algorithm has been validated on two very different datasets: (i) rat vagus nerve subvolume, SBFSEM microscope, 200 × 200 × 200 nm resolution, (ii) rat sensory branch subvolume, confocal microscope, 384 × 384 × 800 nm resolution. For the first dataset, the algorithm correctly reconstructed 88% of the axons (241 out of 273) and achieved 92% accuracy on the task of Ranvier node detection. For the second dataset, the gap closing algorithm correctly closed 96.2% of the gaps, and 55% of axons were reconstructed correctly through the whole volume. On both datasets, training the algorithm on a small data subset and applying it to the full dataset takes a fraction of the time required by the currently used semiautomated protocols. Our software, raw data and ground truth annotations are available at http://hci.iwr.uni-heidelberg.de/Benchmarks/. The development version of the code can be found at https://github.com/RWalecki/ATMA
    corecore