58 research outputs found
The Cure: Making a game of gene selection for breast cancer survival prediction
Motivation: Molecular signatures for predicting breast cancer prognosis could
greatly improve care through personalization of treatment. Computational
analyses of genome-wide expression datasets have identified such signatures,
but these signatures leave much to be desired in terms of accuracy,
reproducibility and biological interpretability. Methods that take advantage of
structured prior knowledge (e.g. protein interaction networks) show promise in
helping to define better signatures but most knowledge remains unstructured.
Crowdsourcing via scientific discovery games is an emerging methodology that
has the potential to tap into human intelligence at scales and in modes
previously unheard of. Here, we developed and evaluated a game called The Cure
on the task of gene selection for breast cancer survival prediction. Our
central hypothesis was that knowledge linking expression patterns of specific
genes to breast cancer outcomes could be captured from game players. We
envisioned capturing knowledge both from the players' prior experience and from
their ability to interpret text related to candidate genes presented to them in
the context of the game.
Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted
more than 1,000 registered players who collectively played nearly 10,000 games.
Gene sets assembled through aggregation of the collected data clearly
demonstrated the accumulation of relevant expert knowledge. In terms of
predictive accuracy, these gene sets provided comparable performance to gene
sets generated using other methods including those used in commercial tests.
The Cure is available at http://genegames.org/cure
Linking genes to diseases with a SNPedia-Gene Wiki mashup
Background: A variety of topic-focused wikis are used in the biomedical sciences to enable the mass-collaborative synthesis and distribution of diverse bodies of knowledge. To address complex problems such as defining the relationships between genes and disease, it is important to bring the knowledge from many different domains together. Here we show how advances in wiki technology and natural language processing can be used to automatically assemble “meta-wikis” that present integrated views over the data collaboratively created in multiple source wikis.
Results: We produced a semantic meta-wiki called the Gene Wiki+ that automatically mirrors and integrates data from the Gene Wiki and SNPedia. The Gene Wiki+, available at http://genewikiplus.org/, captures 8,047 distinct gene-disease relationships. SNPedia accounts for 4,149 of the gene-disease pairs, the Gene Wiki provides 4,377 and only 479 appear independently in both sources. All of this content is available to query and browse and is provided as linked open data.
Conclusions: Wikis contain increasing amounts of diverse, biological information useful for elucidating the connections between genes and disease. The Gene Wiki+ shows how wiki technology can be used in concert with natural language processing to provide integrated views over diverse underlying data sources.
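The counts quoted in the Results can be cross-checked with inclusion-exclusion: pairs that appear in both sources should be counted once. The short sketch below only uses the three numbers given in the abstract; the variable names are illustrative and not taken from the Gene Wiki+ code base.

```python
# Counts quoted in the abstract above.
snpedia_pairs = 4149      # gene-disease pairs contributed by SNPedia
gene_wiki_pairs = 4377    # gene-disease pairs contributed by the Gene Wiki
shared_pairs = 479        # pairs that appear independently in both sources

# Inclusion-exclusion: pairs found in both sources are counted only once.
distinct_pairs = snpedia_pairs + gene_wiki_pairs - shared_pairs

# Share of the distinct pairs that both sources agree on.
shared_fraction = shared_pairs / distinct_pairs

print(distinct_pairs)             # 8047, matching the reported total
print(f"{shared_fraction:.1%}")   # roughly 6% of distinct pairs
```

The small overlap is what makes the mashup worthwhile: each wiki contributes mostly relationships that the other lacks.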
Quantitating the epigenetic transformation contributing to cholesterol homeostasis using Gaussian process.
To understand the impact of epigenetics on human misfolding disease, we apply Gaussian-process regression (GPR) based machine learning (ML) (GPR-ML) through variation spatial profiling (VSP). VSP generates population-based matrices describing the spatial covariance (SCV) relationships that link genetic diversity to fitness of the individual in response to histone deacetylase inhibitors (HDACi). Niemann-Pick C1 (NPC1) is a Mendelian disorder caused by >300 variants in the NPC1 gene that disrupt cholesterol homeostasis leading to the rapid onset and progression of neurodegenerative disease. We determine the sequence-to-function-to-structure relationships of the NPC1 polypeptide fold required for membrane trafficking and generation of a tunnel that mediates cholesterol flux in late endosomal/lysosomal (LE/Ly) compartments. HDACi treatment reveals unanticipated epigenomic plasticity in SCV relationships that restore NPC1 functionality. GPR-ML based matrices capture the epigenetic processes impacting information flow through the central dogma, providing a framework for quantifying the effect of the environment on the healthspan of the individual.
Epigenetic Enhancer Marks and Transcription Factor Binding Influence Vκ Gene Rearrangement in Pre-B Cells and Pro-B Cells
To date there has not been a study directly comparing relative Igκ rearrangement frequencies obtained from genomic DNA (gDNA) and cDNA and since each approach has potential biases, this is an important issue to clarify. Here we used deep sequencing to compare the unbiased gDNA and RNA Igκ repertoire from the same pre-B cell pool. We find that ~20% of Vκ genes have rearrangement frequencies ≥2-fold up or down in RNA vs. DNA libraries, including many members of the Vκ3, Vκ4, and Vκ6 families. Regression analysis indicates Ikaros and E2A binding are associated with strong promoters. Within the pre-B cell repertoire, we observed that individual Vκ genes rearranged at very different frequencies, and also displayed very different Jκ usage. Regression analysis revealed that the greatly unequal Vκ gene rearrangement frequencies are best predicted by epigenetic marks of enhancers. In particular, the levels of newly arising H3K4me1 peaks associated with many Vκ genes in pre-B cells are most predictive of rearrangement levels. Since H3K4me1 is associated with long range chromatin interactions which are created during locus contraction, our data provide mechanistic insight into unequal rearrangement levels. Comparison of Igκ rearrangements occurring in pro-B cells and pre-B cells from the same mice reveals a pro-B cell bias toward usage of Jκ-distal Vκ genes, particularly Vκ10-96 and Vκ1-135. Regression analysis indicates that PU.1 binding is the highest predictor of Vκ gene rearrangement frequency in pro-B cells. Lastly, the repertoires of iEκ−/− pre-B cells reveal that iEκ actively influences Vκ gene usage, particularly Vκ3 family genes, overlapping with a zone of iEκ-regulated germline transcription. These represent new roles for iEκ in addition to its critical function in promoting overall Igκ rearrangement. Together, this study provides insight into many aspects of Igκ repertoire formation.
Integrative Analysis of Low- and High-Resolution eQTL
The study of expression quantitative trait loci (eQTL) is a powerful way of detecting transcriptional regulators at a genomic scale and for elucidating how natural genetic variation impacts gene expression. Power and genetic resolution are heavily affected by the study population: whereas recombinant inbred (RI) strains yield greater statistical power with low genetic resolution, using diverse inbred or outbred strains improves genetic resolution at the cost of lower power. In order to overcome the limitations of both individual approaches, we combine data from RI strains with genetically more diverse strains and analyze hippocampus eQTL data obtained from mouse RI strains (BXD) and from a panel of diverse inbred strains (Mouse Diversity Panel, MDP). We perform a systematic analysis of the consistency of eQTL independently obtained from these two populations and demonstrate that a significant fraction of eQTL can be replicated. Based on existing knowledge from pathway databases we assess different approaches for using the high-resolution MDP data for fine mapping BXD eQTL. Finally, we apply this framework to an eQTL hotspot on chromosome 1 (Qrr1), which has been implicated in a range of neurological traits. Here we present the first systematic examination of the consistency between eQTL obtained independently from the BXD and MDP populations. Our analysis of fine-mapping approaches is based on “real life” data as opposed to simulated data and it allows us to propose a strategy for using MDP data to fine map BXD eQTL. Application of this framework to Qrr1 reveals that this eQTL hotspot is not caused by just one (or few) “master regulators”, but actually by a set of polymorphic genes specific to the central nervous system.
Reductionist and Integrative approaches to explore the H.pylori genome
The reductionist approach of decomposing biological systems into their constituent parts has dominated molecular biology for half a century. Since organisms are composed solely of atoms and molecules without the participation of extraneous forces, it has been assumed that it should be possible to explain biological systems on the basis of the physico-chemical properties of their individual components, down to the atomic level. However, despite the remarkable success of methodological reductionism in analyzing individual cellular components, it is now generally accepted that the behavior of complex biological systems cannot be understood by studying their individual parts in isolation. To tackle the complexity inherent in understanding large networks of interacting biomolecules, the integrative viewpoint emphasizes cybernetic and systems theoretical methods, using a combination of mathematics, computation and empirical observation. Such an approach is beginning to become feasible in prokaryotes, combining an almost complete view of the genome and transcriptome with a reasonably extensive picture of the proteome.
Pathogenic bacteria are undoubtedly the most investigated subjects among prokaryotes. A paradigmatic example is the human pathogen H.pylori, a causative agent of severe gastroduodenal disorders that infects almost half of the world population.
In this thesis, we investigated various aspects of Helicobacter pylori molecular physiology using both reductionist and integrative approaches.
In Section I, we have employed a reductionist, bottom-up perspective in studying the cysteine oxidised/reduced state and the disulphide bridge pattern of an unusual GroES homolog expressed by H.pylori, Heat Shock protein A (HspA). This protein possesses a high Cys content, is involved in nickel binding and exhibits an extended subcellular localization, ranging from cytoplasm to cell surface. We have produced and characterized a recombinant HspA and mutants Cys94Ala and C94A/C111A. The disulphide bridge pattern has been assigned by integrating biochemical methodologies with mass spectrometry. All Cys are engaged in disulphide bonds that force the C-terminal domain to assume a peculiar closed loop structure, prone to host nickel ions. This novel Ni binding structural arrangement can be related to the Ni uptake/delivery to the extracellular urease, essential for the survival of the bacterium.
In Section II, we combined different computational methods with two main goals:
1) Analyze the H.pylori biomolecular interaction network in an attempt to select new molecular targets against H.pylori infection (Chapters 4 & 5);
2) Model and simulate the signaling perturbations induced by invading H.pylori proteins in the host epithelial cells (Chapter 6).
Chapter 4 explores the 'robust yet fragile' feature of the H.pylori cell, viewed as a complex system in which robustness in response to certain perturbations is inevitably associated with fragility in response to other perturbations. With this in mind, we developed a general strategy aimed at identifying control points in bacterial metabolic networks, which could be targets for novel drugs. The methodology is implemented on Helicobacter pylori 26695.
The entire metabolic network of the pathogen is analyzed to find biochemically critical points, e.g. enzymes which uniquely consume and/or produce a certain metabolite. Once identified, the list of critical enzymes is filtered in order to find candidate targets which are non-homologous with the human enzymes. Finally, the essentiality of the identified targets is cross-validated by in silico deletion studies using flux-balance analysis (FBA) on a recent genome-scale metabolic model of H. pylori. Following this approach, we identified some enzymes which could be interesting targets for inhibition studies of H.pylori infection.
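The first two steps of this pipeline (finding enzymes that are the only consumer or only producer of a metabolite, then discarding those with human homologs) can be illustrated with a minimal sketch. The network, enzyme names and homolog list below are hypothetical toy data, not taken from the thesis, and the FBA cross-validation step is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy network: enzyme -> (consumed metabolites, produced metabolites).
reactions = {
    "E1": ({"A"}, {"B"}),
    "E2": ({"A"}, {"B"}),   # isoenzyme of E1, so neither is unique
    "E3": ({"B"}, {"C"}),
    "E4": ({"B"}, {"D"}),
    "E5": ({"C"}, {"E"}),
    "E6": ({"D"}, {"E"}),
}

def choke_points(reactions):
    """Return enzymes that are the sole consumer or sole producer of some metabolite."""
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for enzyme, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(enzyme)
        for metabolite in products:
            producers[metabolite].add(enzyme)
    critical = set()
    for mapping in (consumers, producers):
        for enzymes in mapping.values():
            if len(enzymes) == 1:
                critical |= enzymes
    return critical

# Hypothetical set of enzymes with a human homolog (to be excluded as targets).
human_homologs = {"E4"}

candidates = sorted(choke_points(reactions) - human_homologs)
print(candidates)  # ['E3', 'E5', 'E6']
```

In this toy case E1 and E2 back each other up and are not flagged, while E4 is critical but dropped by the homology filter.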
The study reported in Chapter 5 extends the previously described approach in light of recent theoretical studies on biological networks. These studies suggested that multiple weak attacks on selected targets are inevitably more efficient than the knockout of a single target, thus providing a conceptual framework for the recent success of multi-target drugs. We used this concept to exploit H.pylori metabolic robustness through multiple weak attacks on selected enzymes, thereby directing us toward the discovery of target sets for combinatorial therapies.
We used the known metabolic and protein interaction data to build an integrated biomolecular network of the pathogen. The network was subsequently screened to find central elements of network communication, e.g. hubs, bridges with high betweenness centrality and overlaps of network communities. The selected enzymes were then classified on the basis of available data about cellular function and essentiality in an attempt to predict successful target-combinations. In order to evaluate the network effect triggered by the partial inactivation of candidate targets, robustness analysis was performed on small groups of selected enzymes using flux balance analysis (FBA) on a recent genome-scale metabolic model of H.pylori. In particular, the FBA simulation framework allowed us to predict the growth phenotype associated with every partial inactivation set.
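To make the distinction between hubs and high-betweenness bridges concrete, the sketch below computes node degree and unnormalized betweenness centrality on a small undirected graph. It uses Brandes' algorithm for unweighted graphs as a stand-in; the abstract does not say which tool or algorithm the thesis used, and the graph itself is a hypothetical example.

```python
from collections import deque

# Hypothetical toy network: two triangles joined through the bridge node "d".
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"},
    "f": {"e", "g"},
    "g": {"e", "f"},
}

def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search counting shortest paths from the source.
        dist = {source: 0}
        paths = {v: 0 for v in graph}
        paths[source] = 1
        preds = {v: [] for v in graph}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}

degree = {v: len(neighbours) for v, neighbours in graph.items()}
centrality = betweenness(graph)
for node in sorted(graph):
    print(node, degree[node], centrality[node])
```

Here "d" has only two neighbours yet carries every shortest path between the two triangles, so it scores highest on betweenness, whereas "c" and "e" have the highest degree.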
The preliminary results obtained so far may help to restrict the initial target-pool in search of target-sets for novel combinatorial drugs against H.pylori persistence. However, our long-term goal is to better understand the indirect network effects that lie at the heart of multi-target drug action and, ultimately, how multiple weak hits can perturb complex biological systems.
H.pylori produces a cytotoxic protein, CagA, that interferes with a very important host signaling pathway, i.e. the epidermal growth factor receptor (EGFR) signaling network. EGFR signaling is one of the most extensively studied areas of signal transduction, since it regulates growth, survival, proliferation and differentiation in mammalian cells. In Chapter 6, we attempted to build an executable model of the EGFR-signaling core process using a process algebra approach. In the EGFR network, the core process is the heart of its underlying hour-glass architecture, as it plays a central role in downstream signaling cascades to gene expression through activation of multiple transcription factors. It consists of a dense array of molecules and interactions which are tightly coupled to each other.
In order to build the executable model, a small set of EGFR core molecules and their interactions is tentatively translated into a BetaWB model. BetaWB is a framework for modelling and simulating biological processes based on the Beta-binders language and its stochastic extension.
Once obtained, the computational model of the EGFR core process can be used to test and compare hypotheses regarding the principles of operation of the signaling network, i.e. how the EGFR network generates different responses for each set of combinatorial stimuli. In particular, probabilistic model checking can be used to explore the states and possible state changes of the computational model, whereas stochastic simulation (corresponding to the execution of the BetaWB model) may give quantitative insights into the dynamic behaviour of the system in response to different stimuli. Information from the above techniques allows model validation through comparison with the experimental data available in the literature.
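As a rough illustration of what stochastic simulation of a signaling step looks like, the sketch below runs Gillespie's direct method on a single reversible receptor-ligand binding reaction. This is not BetaWB code and not the thesis model: it is plain Python standing in for the general technique, and the function name, species, rate constants and copy numbers are all hypothetical.

```python
import random

def gillespie_binding(receptors, ligands, k_on, k_off, t_end, seed=0):
    """Simulate R + L <-> C with Gillespie's direct method.

    Returns a list of (time, R, L, C) states. All parameters are illustrative.
    """
    rng = random.Random(seed)
    t, r, l, c = 0.0, receptors, ligands, 0
    trajectory = [(t, r, l, c)]
    while True:
        a_bind = k_on * r * l        # propensity of R + L -> C
        a_unbind = k_off * c         # propensity of C -> R + L
        a_total = a_bind + a_unbind
        if a_total == 0:
            break                    # no reaction can fire
        t += rng.expovariate(a_total)  # exponential waiting time
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_bind:
            r, l, c = r - 1, l - 1, c + 1
        else:
            r, l, c = r + 1, l + 1, c - 1
        trajectory.append((t, r, l, c))
    return trajectory

# Compare the response to a weak and a strong stimulus (hypothetical values).
low = gillespie_binding(receptors=50, ligands=20, k_on=0.01, k_off=0.1, t_end=10.0)
high = gillespie_binding(receptors=50, ligands=100, k_on=0.01, k_off=0.1, t_end=10.0)
print("complexes at end, low stimulus:", low[-1][3])
print("complexes at end, high stimulus:", high[-1][3])
```

Repeating such runs with different seeds gives a distribution of outcomes per stimulus, which is the kind of quantitative readout that can be compared against published measurements.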
The inherent compositionality of the process algebra modeling approach enables further expansion of the EGFR core model, as well as the study of its behavior under specific perturbations, such as invading H.pylori proteins. This latter aspect might be of great value for H.pylori pathogenesis research, as signaling through the EGF receptors is intricately involved in gastric cancer and in many other gastroduodenal diseases.
Dizeez: an online game for human gene-disease annotation.
Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However, the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation need to be explored that increase the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org
A Proteomic Variant Approach (ProVarA) for Personalized Medicine of Inherited and Somatic Disease.
The advent of precision medicine for genetic diseases has been hampered by the large number of variants that cause familial and somatic disease, a complexity that is further confounded by the impact of genetic modifiers. To begin to understand differences in onset, progression and therapeutic response that exist among disease-causing variants, we present the proteomic variant approach (ProVarA), a proteomic method that integrates mass spectrometry with genomic tools to dissect the etiology of disease. To illustrate its value, we examined the impact of variation in cystic fibrosis (CF), where 2025 disease-associated mutations in the CF transmembrane conductance regulator (CFTR) gene have been annotated and where individual genotypes exhibit phenotypic heterogeneity and response to therapeutic intervention. A comparative analysis of variant-specific proteomics allows us to identify a number of protein interactions contributing to the basic defects associated with F508del- and G551D-CFTR, two of the most common disease-associated variants in the patient population. We demonstrate that a number of these causal interactions are significantly altered in response to treatment with Vx809 and Vx770, small-molecule therapeutics that respectively target the F508del and G551D variants. ProVarA represents the first comparative proteomic analysis among multiple disease-causing mutations, thereby offering a methodological approach that significantly advances existing proteomic efforts to understand the impact of variation in CF disease. We posit that the implementation of ProVarA for any familial or somatic mutation will provide a substantial increase in the knowledge base needed to implement a precision medicine-based approach for clinical management of disease.