
    Application of Spatial Concepts to Genome Data

    This project will investigate the application of geographic information science concepts and methods to the modeling and analysis of genome data. The primary objective of the research is to develop a data model for genomes that supports graphical exploration of the higher-order spatial arrangement of genome features through spatial queries and spatial data analysis tools. The spatial genome model formalizes topological and order relationships among genome features (before, after, overlap), uses metric properties to refine these topologies, and includes representations of features whose metric properties are uncertain. The model enhances the integrative and comparative potential of genome data by providing a foundation for more powerful spatial reasoning and inference than data models that capture only a small subset of the possible temporal-spatial relationships among genome features (e.g., order and distance). The research represents a logical extension from the current feature-by-feature analytical approaches of genome studies to one that allows biologists to ask questions about the contextual and organizational significance of the spatial arrangement of genome features. These capabilities should, in turn, aid in automating repetitive analytical tasks associated with mapping genome features and drive the discovery of biologically significant aspects of genome organization and function.
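
    The order and topological relationships named above (before, after, overlap) can be illustrated with a minimal Python sketch. It assumes half-open [start, end) coordinates on a single named sequence, ignores strand, and does not model features with uncertain coordinates; the `Feature` type and the `relation` and `gap` helpers are hypothetical illustrations, not the project's data model. The `gap` function shows how a metric property (distance in bases) can refine the purely topological before/after relations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical genome feature; coordinates are assumed half-open [start, end)."""
    name: str
    seqid: str
    start: int
    end: int


def relation(a: Feature, b: Feature) -> str:
    """Classify the order/topological relationship of feature a relative to feature b."""
    if a.seqid != b.seqid:
        return "different-sequence"
    if a.end <= b.start:
        return "before"
    if b.end <= a.start:
        return "after"
    if a.start == b.start and a.end == b.end:
        return "equal"
    if b.start <= a.start and a.end <= b.end:
        return "within"
    if a.start <= b.start and b.end <= a.end:
        return "contains"
    return "overlap"


def gap(a: Feature, b: Feature) -> Optional[int]:
    """Metric refinement: bases separating two non-overlapping features, else None."""
    if a.seqid != b.seqid:
        return None
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    return None  # the features overlap, so there is no gap
```

    For example, with `gene = Feature("geneA", "chr1", 100, 500)` and `exon = Feature("exon1", "chr1", 100, 200)`, `relation(exon, gene)` returns `"within"` and `relation(gene, exon)` returns `"contains"`.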

    Systems Level Modeling of the Cell Cycle Using Budding Yeast

    Proteins involved in regulating the cell cycle are highly conserved across eukaryotes, so a relatively simple eukaryote such as yeast can provide insight into a variety of cell cycle perturbations, including those that occur in human cancer. To date, the budding yeast Saccharomyces cerevisiae has provided the largest amount of experimental and modeling data on cell cycle progression, making it a logical choice for in-depth studies of this process. Moreover, high-throughput methods for collecting genome, transcriptome, and proteome data now make it possible to measure cell cycle gene transcript and protein levels simultaneously and precisely, permitting systems-level modeling of the cell cycle. With an appropriate mathematical framework and sufficient, accurate data on cell cycle components, it should be possible to build a model that not only describes how the cell cycle operates but also predicts its responses to perturbations, such as variation in protein levels and external stimuli, including targeted inhibition by drugs. In this review, we summarize existing data on the yeast cell cycle, proteomics technologies for quantifying cell cycle proteins, and the mathematical frameworks that can integrate these data into representative and effective models. Systems-level modeling of the cell cycle will require integrating high-quality data with an appropriate mathematical framework, which is currently achievable by combining dynamic modeling based on proteomics data with the use of yeast as a model organism.

    Unconventional machine learning of genome-wide human cancer data

    Recent advances in high-throughput genomic technologies, coupled with exponential increases in computer processing power and memory, have allowed us to interrogate the complex aberrant molecular underpinnings of human disease from a genome-wide perspective. While the deluge of genomic information is expected to grow, a bottleneck in conventional high-performance computing is rapidly approaching. Inspired in part by recent advances in physical quantum processors, we evaluated several unconventional machine learning (ML) strategies on actual human tumor data. Here we show for the first time the efficacy of multiple annealing-based ML algorithms for classifying high-dimensional, multi-omics human cancer data from The Cancer Genome Atlas. To assess algorithm performance, we compared these classifiers to a variety of standard ML methods. Our results indicate that annealing-based ML can provide competitive classification of human cancer types and associated molecular subtypes, and superior performance with smaller training datasets, thus providing compelling empirical evidence for the potential future application of unconventional computing architectures in the biomedical sciences.

    Infectious Disease Ontology

    Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. At the same time, researchers are under ever greater pressure to share data and to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain.

    Understanding Communication Signals during Mycobacterial Latency through Predicted Genome-Wide Protein Interactions and Boolean Modeling

    About 90% of people infected with Mycobacterium tuberculosis carry latent bacteria that are believed to become activated upon immune suppression. One of the fundamental challenges in the control of tuberculosis is therefore to understand the molecular mechanisms involved in the onset of latency and/or reactivation. We have attempted to address this problem at the systems level by combining predicted functional protein-protein interactions, integration of these functional interactions with large-scale gene expression studies, a predicted transcription regulatory network, and, finally, simulations with a Boolean model of the network. First, genome-wide protein functional linkages were predicted from genome-context methods using a Support Vector Machine. This set of functional linkages, together with gene expression data from the available models of latency, was used to identify proteins involved in mediating switch signals during dormancy. We show that genes that are up- and downregulated during dormancy are coordinately regulated not only under dormancy-like conditions but also under a variety of other experimental conditions. Their synchronized regulation indicates that they form a tightly regulated gene cluster that might constitute a latency regulon. Conservation of these genes across bacterial species suggests a unique evolutionary history that might be associated with M. tuberculosis dormancy. Finally, simulations with a Boolean model based on the regulatory network, with logical relationships derived from gene expression data, reveal a bistable switch, suggesting alternating latent and actively growing states. Our analysis of the interaction network therefore reveals a potential model of M. tuberculosis latency.
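
    The bistable-switch idea can be sketched with a toy Boolean network in Python. The two nodes (`latency` and `growth`) and their mutual-repression rules are hypothetical and are not the regulatory network inferred in this study; the sketch only shows how the fixed points of a Boolean model are found by enumerating every state under synchronous updating.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[bool, ...]

# Hypothetical two-node network: each program represses the other.
NODES = ("latency", "growth")
RULES: Dict[str, Callable[[Dict[str, bool]], bool]] = {
    "latency": lambda s: not s["growth"],
    "growth": lambda s: not s["latency"],
}


def step(state: State) -> State:
    """Synchronous update: every node is recomputed from the previous state."""
    current = dict(zip(NODES, state))
    return tuple(RULES[node](current) for node in NODES)


def fixed_points() -> List[State]:
    """Enumerate all 2^n states and keep those that map onto themselves."""
    return [s for s in product((False, True), repeat=len(NODES)) if step(s) == s]
```

    Here `fixed_points()` returns `[(False, True), (True, False)]`: one stable state with only the growth program on and one with only the latency program on, which is what makes the toy network bistable. Under synchronous updating, the both-off and both-on states oscillate into each other; an asynchronous update scheme would instead send them to one of the two fixed points.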

    Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences

    We introduce and study a set of training-free methods, based on information theory and algorithmic complexity, that are applied to DNA sequences to assess their potential for identifying nucleosome binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reproduce the known discrepancies between in vivo and in vitro predictions and show potential to pinpoint high nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (the Kaplan model) and find similar and complementary results, the main difference being that our sequence complexity approach requires no training. For example, for high occupancy, complexity-based scores outperform the Kaplan model at predicting binding, representing a significant advancement in predicting the highest nucleosome occupancy with a training-free approach.

    Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure)
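
    The training-free scoring idea in the abstract above can be sketched in Python. This is not the authors' method: it substitutes two simple stand-ins, the Shannon entropy of dinucleotide frequencies and a zlib compression ratio (a crude, commonly used proxy for algorithmic complexity), computed over sliding windows. The 147 bp default window (roughly the DNA length wrapped around a nucleosome core) and the 10 bp step are assumptions, and the sketch does not say how the scores map to occupancy or what threshold to use.

```python
import math
import zlib
from collections import Counter
from typing import List, Tuple


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (bits) of the k-mer frequency distribution in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compression_ratio(seq: str) -> float:
    """Compressed size / raw size: a stand-in proxy for algorithmic complexity."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def window_scores(seq: str, window: int = 147, step: int = 10) -> List[Tuple[int, float, float]]:
    """Score each window with both measures; no model is fitted to any data."""
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        scores.append((start, shannon_entropy(w, k=2), compression_ratio(w)))
    return scores
```

    Low-complexity windows (for example, long homopolymer runs) receive low entropy and a low compression ratio; because neither measure is fitted to occupancy data, the same code can be applied unchanged to sequences from any source.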