609,308 research outputs found

    Maximal Extraction of Biological Information from Genetic Interaction Data

    Get PDF
    Targeted genetic perturbation is a powerful tool for inferring gene function in model organisms. Functional relationships between genes can be inferred by observing the effects of multiple genetic perturbations in a single strain. The study of these relationships, generally referred to as genetic interactions, is a classic technique for ordering genes in pathways, thereby revealing genetic organization and gene-to-gene information flow. Genetic interaction screens are now being carried out in high-throughput experiments involving tens or hundreds of genes. These data sets have the potential to reveal genetic organization on a large scale, and require computational techniques that best reveal this organization. In this paper, we use a complexity metric based in information theory to determine the maximally informative network given a set of genetic interaction data. We find that networks with high complexity scores yield the most biological information in terms of (i) specific associations between genes and biological functions, and (ii) mapping modules of co-functional genes. This information-based approach is an automated, unsupervised classification of the biological rules underlying observed genetic interactions. It might have particular potential in genetic studies in which interactions are complex and prior gene annotation data are sparse

    An agent-based simulation framework for complex systems

    Get PDF
    In this abstract we present a new approach to the simulation of complex systems as biological interaction networks, chemical reactions, ecosystems, etc. It aims at overcoming previously proposed analytical approaches that, because of several computational challenges, could not handle systems of realistic com- plexity. The proposed model is based on a set of agents interacting through a shared environment. Each agent functions independently from the others, and its be- havior is driven only by its current status and the "content" of the surrounding environment. The environment is the only "data repository" and does not store the value of variables, but only their presence and concentration. Each agent performs 3 main functions: 1. it samples the environment at random locations 2. based on the distribution of the sampled data and a proper Transfer Func- tion, it computes the rate at which the output values are generated 3. it writes the output "products" at random locations. The environment is modeled as a Really Random Access Memory (R2AM). Data is written and sampled at random memory locations. Each memory location represent an atomic sample (a molecule, a chemical compound, a protein, an ion, . . . ). Presence and concentration of these samples are what constitutes the environment data set. The environment can be sensitive to external stimuli (e.g., pH, Temperature, ...) and can include topological information to allow its partitioning (e.g. between nucleus and cytoplasm in a cell) and the modeling of sample "movements" within the environment. The proposed approach is easily scalable in both complexity and computa- tional costs. Each module could implement a very simple object as a single chemical reaction or a very complex process as a gene translation into a pro- tein. At the same time, from the hardware point of view, the complexity of the objects implementing a single agent can range from a single software process to a dedicated computer or hardware platfor

    Semantic Clustering of Genomic Documents using GO Terms as Feature Set

    Get PDF
    The biological databases generate huge volume of genomics and proteomics data. The sequence information is used by researches to find similarity of genes, proteins and to find other related information. The genomic sequence database consists of large number of attributes as annotations, represented for defining the sequences in Xml format. It is necessary to have proper mechanism to group the documents for information retrieval. Data mining techniques like clustering and classification methods can be used to group the documents. The objective of the paper is to analyze the set of keywords which can be represented as features for grouping the documents semantically. This paper focuses on clustering genomic documents based on both structural and content similarity .The structural similarity is found using structural path between the documents. The semantic similarity is found for the structurally similar documents. We have proposed a methodology to cluster the genomic documents using sequence attributes without using the sequence data. The sequence attributes for genomic documents are analyzed using Filter based feature selection methods to find the relevant feature set for grouping the similar documents. Based on the attribute ranking we have clustered the similar documents using All Keyword approach (KBA) and GO Terms based approach (GOTA). The experimental results of the clusters are validated for two approaches by inferring biological meaning using Gene Ontology. From the results it was inferred that all keywords based approach grouped documents based on the semantic meaning of Gene Ontology terms. The GO terms based approach grouped larger number of documents without considering any other keywords, which is semantically relevant which results in reducing the complexity of the attributes considered. We claim that using GO terms can alone be used as features set to group genomic documents with high similarity

    Exponential Random Graph Modeling for Complex Brain Networks

    Get PDF
    Exponential random graph models (ERGMs), also known as p* models, have been utilized extensively in the social science literature to study complex networks and how their global structure depends on underlying structural components. However, the literature on their use in biological networks (especially brain networks) has remained sparse. Descriptive models based on a specific feature of the graph (clustering coefficient, degree distribution, etc.) have dominated connectivity research in neuroscience. Corresponding generative models have been developed to reproduce one of these features. However, the complexity inherent in whole-brain network data necessitates the development and use of tools that allow the systematic exploration of several features simultaneously and how they interact to form the global network architecture. ERGMs provide a statistically principled approach to the assessment of how a set of interacting local brain network features gives rise to the global structure. We illustrate the utility of ERGMs for modeling, analyzing, and simulating complex whole-brain networks with network data from normal subjects. We also provide a foundation for the selection of important local features through the implementation and assessment of three selection approaches: a traditional p-value based backward selection approach, an information criterion approach (AIC), and a graphical goodness of fit (GOF) approach. The graphical GOF approach serves as the best method given the scientific interest in being able to capture and reproduce the structure of fitted brain networks

    graphite - a Bioconductor package to convert pathway topology to gene network

    Get PDF
    BACKGROUND: Gene set analysis is moving towards considering pathway topology as a crucial feature. Pathway elements are complex entities such as protein complexes, gene family members and chemical compounds. The conversion of pathway topology to a gene/protein networks (where nodes are a simple element like a gene/protein) is a critical and challenging task that enables topology-based gene set analyses. Unfortunately, currently available R/Bioconductor packages provide pathway networks only from single databases. They do not propagate signals through chemical compounds and do not differentiate between complexes and gene families. RESULTS: Here we present graphite, a Bioconductor package addressing these issues. Pathway information from four different databases is interpreted following specific biologically-driven rules that allow the reconstruction of gene-gene networks taking into account protein complexes, gene families and sensibly removing chemical compounds from the final graphs. The resulting networks represent a uniform resource for pathway analyses. Indeed, graphite provides easy access to three recently proposed topological methods. The graphite package is available as part of the Bioconductor software suite. CONCLUSIONS: graphite is an innovative package able to gather and make easily available the contents of the four major pathway databases. In the field of topological analysis graphite acts as a provider of biological information by reducing the pathway complexity considering the biological meaning of the pathway elements

    Single-molecule real-time sequencing combined with optical mapping yields completely finished fungal genome

    Get PDF
    Next-generation sequencing (NGS) technologies have increased the scalability, speed, and resolution of genomic sequencing and, thus, have revolutionized genomic studies. However, eukaryotic genome sequencing initiatives typically yield considerably fragmented genome assemblies. Here, we assessed various state-of-the-art sequencing and assembly strategies in order to produce a contiguous and complete eukaryotic genome assembly, focusing on the filamentous fungus Verticillium dahliae. Compared with Illumina-based assemblies of the V. dahliae genome, hybrid assemblies that also include PacBio- generated long reads establish superior contiguity. Intriguingly, provided that sufficient sequence depth is reached, assemblies solely based on PacBio reads outperform hybrid assemblies and even result in fully assembled chromosomes. Furthermore, the addition of optical map data allowed us to produce a gapless and complete V. dahliae genome assembly of the expected eight chromosomes from telomere to telomere. Consequently, we can now study genomic regions that were previously not assembled or poorly assembled, including regions that are populated by repetitive sequences, such as transposons, allowing us to fully appreciate an organism’s biological complexity. Our data show that a combination of PacBio-generated long reads and optical mapping can be used to generate complete and gapless assemblies of fungal genomes. IMPORTANCE Studying whole-genome sequences has become an important aspect of biological research. The advent of nextgeneration sequencing (NGS) technologies has nowadays brought genomic science within reach of most research laboratories, including those that study nonmodel organisms. However, most genome sequencing initiatives typically yield (highly) fragmented genome assemblies. Nevertheless, considerable relevant information related to genome structure and evolution is likely hidden in those nonassembled regions. Here, we investigated a diverse set of strategies to obtain gapless genome assemblies, using the genome of a typical ascomycete fungus as the template. Eventually, we were able to show that a combination of PacBiogenerated long reads and optical mapping yields a gapless telomere-to-telomere genome assembly, allowing in-depth genome sanalyses to facilitate functional studies into an organism’s biology
    • …
    corecore