The functional network in predictive biology : predicting phenotype from genotype and predicting human disease from fungal phenotype

Author: McGary Kriston Lyle
Publication date: 01/12/2008

The ability to predict is one of the hallmarks of successful theories. Historically, the predictive power of biology has lagged behind disciplines like physics because the biological world is complex, challenging to quantify, and full of exceptions. However, in recent years the amount of available data has expanded exponentially, and biological predictions based on this data have become a possibility. The functional gene network is a quantitative way to integrate this data and a useful framework for making biological predictions. This study demonstrates that functional networks capture real biological insight and uses the network to predict both subcellular protein localization and the phenotypic outcome of gene knockouts. Furthermore, I use the functional network to evaluate genetic modules shared between diverse organisms that lead to orthologous phenotypes, many of which are non-obvious. I show that the successful predictions of the functional network have broad applicability and implications that range from the design of large-scale biological experiments to the discovery of genes with potential roles in human disease.

Institute for Cellular and Molecular Biology

Texas ScholarWorks

Broad network-based predictability of Saccharomyces cerevisiae gene loss-of-function phenotypes

Author: Lee Insuk
Marcotte Edward M
McGary Kriston L
Publication venue: BioMed Central
Publication date: 01/01/2007

Loss-of-function phenotypes of yeast genes can be predicted from the loss-of-function phenotypes of their neighbours in functional gene networks. This could potentially be applied to the prediction of human disease genes.

Springer - Publisher Connector

Texas ScholarWorks

Prioritization of candidate disease genes by topological similarity between disease and protein diffusion profiles

Publication venue: BioMed Central

Springer - Publisher Connector

Mining phenotypes for gene function prediction

Author: A Kahraman
A Keller
AA Dobritsa
AJ Butte
B Hur
B Schwikowski
Bertram Weiss
BP Kelley
CR Scriver
D Kuttenkeuler
D Lin
D Sieburth
E SanJuana
EC Green
F Piano
G Pandey
G Roman
GJ Hannon
Hans-Dieter Pohlenz
JZ Wang
KA Kellerman
KC Gunsalus
KJ Gaulton
LB Vosshall
M Bate
M Steinbach
MA Huynen
MA van Driel
N Daraselia
N Freimer
P Bhandari
P Groth
Philip Groth
PW Lord
RM Cripps
S Jaeger
S Raychaudhuri
SC Rison
SD Brown
T Schupbach
U Nongthomba
Ulf Leser
US Eggert
V Mermall
V Spirin
X Guo
Y Lussier
Y Shi
Y Tao
Y Zhao
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component to a disease is discovered only after clearly defining its phenotype. In recent years, many technologies to systematically generate phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher functions for genes. However, there have been relatively few efforts to make use of phenotype data beyond the single genotype-phenotype relationships. Results: We present results of a study in which we use a large set of phenotype data, in textual form, to predict gene annotation. To this end, we use text clustering to group genes based on their phenotype descriptions. We show that these clusters correlate well with several indicators for biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. We exploit these clusters for predicting gene function by carrying over annotations from well-annotated genes to other, less-characterized genes in the same cluster. For a subset of groups selected by applying objective criteria, we can predict GO-term annotations from the biological process sub-ontology with up to 72.6% precision and 16.7% recall, as evaluated by cross-validation. We manually verified some of these clusters and found them to exhibit high biological coherence, e.g. a group containing all available antennal Drosophila odorant receptors despite inconsistent GO-annotations. Conclusion: The intrinsic nature of phenotypes to visibly reflect genetic activity underlines their usefulness in inferring new gene functions. Thus, systematically analyzing these data on a large scale offers many possibilities for inferring functional annotation of genes. We show that text clustering can play an important role in this process.

Springer - Publisher Connector

Directory of Open Access Journals

Elucidating the Altered Transcriptional Programs in Breast Cancer using Independent Component Analysis

Author: Andrew E Teschendorff
Carlos Caldas
Michel Journée
Pierre A Absil
Rodolphe Sepulchre
Satoru Miyano
Publication venue: Public Library of Science
Publication date: 01/01/2007

The quantity of mRNA transcripts in a cell is determined by a complex interplay of cooperative and counteracting biological processes. Independent Component Analysis (ICA) is one of a small number of unsupervised algorithms that have been applied to microarray gene expression data in an attempt to understand phenotype differences in terms of changes in the activation/inhibition patterns of biological pathways. While the ICA model has been shown to outperform other linear representations of the data such as Principal Components Analysis (PCA), a validation using explicit pathway and regulatory element information has not yet been performed. We apply a range of popular ICA algorithms to six of the largest microarray cancer datasets and use pathway-knowledge and regulatory-element databases for validation. We show that ICA outperforms PCA and clustering-based methods in that ICA components map closer to known cancer-related pathways, regulatory modules, and cancer phenotypes. Furthermore, we identify cancer signalling and oncogenic pathways and regulatory modules that play a prominent role in breast cancer and relate the differential activation patterns of these to breast cancer phenotypes. Importantly, we find novel associations linking immune response and epithelial–mesenchymal transition pathways with estrogen receptor status and histological grade, respectively. In addition, we find associations linking the activity levels of biological pathways and transcription factors (NF1 and NFAT) with clinical outcome in breast cancer. ICA provides a framework for a more biologically relevant interpretation of genome-wide transcriptomic data. Adopting ICA as the analysis tool of choice will help understand the phenotype–pathway relationship and thus help elucidate the molecular taxonomy of heterogeneous cancers and of other complex genetic diseases.

Directory of Open Access Journals

Computational Proteomics Using Network-Based Strategies

Author: Goh Wen
Publication venue: Computing, Imperial College London
Publication date: 01/03/2014

This thesis examines the productive application of networks to proteomics, with a specific biological focus on liver cancer. Contemporary (shotgun) proteomics is plagued by coverage and consistency issues. These can be resolved via network-based approaches. The application of three classes of network-based approaches is examined: a traditional cluster-based approach termed Proteomics Expansion Pipeline (PEP), a generalization of PEP termed Maxlink, and a feature-based approach termed Proteomics Signature Profiling (PSP). PEP is an improvement on prevailing cluster-based approaches. It uses a state-of-the-art cluster identification algorithm as well as network-cleaning approaches to identify the critical network regions indicated by the liver cancer data set. The top PARP1-associated cluster was identified and independently validated. Maxlink allows identification of undetected proteins based on the number of links to identified differential proteins. It is more sensitive than PEP due to more relaxed requirements. Here, the novel roles of ARRB1/2 and ACTB are identified and discussed in the context of liver cancer. Both PEP and Maxlink are unable to deal with consistency issues. PSP is the first method able to deal with both coverage and consistency issues, and is termed feature-based since the network-based clusters it uses are predicted independently of the data. It is also capable of using real complexes or predicted pathway subnets. By combining pathways and complexes, a novel basis of liver cancer progression implicating nucleotide pool imbalance aggravated by mutations of key DNA repair complexes was identified. Finally, comparative evaluations suggested that pure network-based methods are vastly outperformed by feature-based network methods utilizing real complexes. This indicates that the quality of current networks is insufficient to provide strong biological rigor for data analysis, and should be carefully evaluated before further validations.

Spiral - Imperial College Digital Repository

Knowledge-based Biomedical Data Science 2019

Author: Callahan Tiffany J.
Hunter Lawrence E.
Pielke-Lombardo Harrison
Tripodi Ignacio J.
Publication date: 08/10/2019

Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.

Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables

arXiv.org e-Print Archive

Practical Approaches to Biological Network Discovery

Author: Haynes Brian
Publication venue: Washington University Open Scholarship
Publication date: 24/05/2012

This dissertation addresses a current outstanding problem in the field of systems biology, which is to identify the structure of a transcriptional network from high-throughput experimental data. Understanding the connectivity of a transcriptional network is an important piece of the puzzle that relates the genotype of an organism to its phenotypes. An overwhelming number of computational approaches have been proposed to perform integrative analyses on large collections of high-throughput gene expression datasets to infer the structure of transcriptional networks. I put forth a methodology by which these tools can be evaluated and compared against one another to better understand their strengths and weaknesses. Next, I undertake the task of utilizing high-throughput datasets to learn new and interesting network biology in the pathogenic fungus Cryptococcus neoformans. Finally, I propose a novel computational method for mapping out transcriptional networks that unifies two orthogonal strategies for network inference. I apply this method to map out the transcriptional network of Saccharomyces cerevisiae and demonstrate how network inference results can complement chromatin immunoprecipitation (ChIP) experiments, which directly probe the binding events of transcriptional regulators. Collectively, my contributions improve both the accessibility and practicality of network inference methods.

Washington University St. Louis: Open Scholarship

Development of a text mining approach to disease network discovery

Author: Lamurias Andre
Publication date: 01/01/2019

Scientific literature is one of the major sources of knowledge for systems biology, in the form of papers, patents and other types of written reports. Text mining methods aim at automatically extracting relevant information from the literature. The hypothesis of this thesis was that biological systems could be elucidated by the development of text mining solutions that can automatically extract relevant information from documents. The first objective consisted in developing software components to recognize biomedical entities in text, which is the first step to generate a network about a biological system. To this end, a machine learning solution was developed, which can be trained for specific biological entities using an annotated dataset, obtaining high-quality results. Additionally, a rule-based solution was developed, which can be easily adapted to various types of entities. The second objective consisted in developing an automatic approach to link the recognized entities to a reference knowledge base. A solution based on the PageRank algorithm was developed in order to match the entities to the concepts that most contribute to the overall coherence. The third objective consisted in automatically extracting relations between entities, to generate knowledge graphs about biological systems. Due to the lack of annotated datasets available for this task, distant supervision was employed to train a relation classifier on a corpus of documents and a knowledge base. The applicability of this approach was demonstrated in two case studies: microRNA-gene relations for cystic fibrosis, obtaining a network of 27 relations using the abstracts of 51 recently published papers; and cell-cytokine relations for tolerogenic cell therapies, obtaining a network of 647 relations from 3264 abstracts. Through a manual evaluation, the information contained in these networks was determined to be relevant.
Additionally, a solution combining deep learning techniques with ontology information was developed, to take advantage of the domain knowledge provided by ontologies. This thesis contributed several solutions that demonstrate the usefulness of text mining methods to systems biology by extracting domain-specific information from the literature. These solutions make it easier to integrate various areas of research, leading to a better understanding of biological systems.

Universidade de Lisboa: Repositório.UL