58 research outputs found

    The Cure: Making a game of gene selection for breast cancer survival prediction

    Motivation: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility and biological interpretability. Methods that take advantage of structured prior knowledge (e.g. protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes previously unheard of. Here, we developed and evaluated a game called The Cure on the task of gene selection for breast cancer survival prediction. Our central hypothesis was that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from game players. We envisioned capturing knowledge both from the players' prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted more than 1,000 registered players who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data clearly demonstrated the accumulation of relevant expert knowledge. In terms of predictive accuracy, these gene sets provided performance comparable to that of gene sets generated using other methods, including those used in commercial tests. The Cure is available at http://genegames.org/cure
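The abstract does not say how the collected game data were aggregated into gene sets. As a minimal sketch, assuming each completed game yields the list of genes a player selected, one simple aggregation counts how many games chose each gene and keeps the most frequently chosen ones. The function `aggregate_gene_sets` and the example selections are hypothetical; this is not presented as the method used for The Cure.

```python
from collections import Counter

def aggregate_gene_sets(games, top_k):
    """Rank genes by how many games selected them (hypothetical aggregation).

    games: iterable of gene-symbol collections, one per completed game.
    Returns the top_k genes; ties are broken alphabetically for determinism.
    """
    counts = Counter()
    for selected in games:
        # Count each gene at most once per game, even if listed twice.
        counts.update(set(selected))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_k]]

# Illustrative input only: three games with overlapping selections.
games = [
    ["ESR1", "ERBB2", "MKI67"],
    ["ERBB2", "MKI67", "TP53"],
    ["ERBB2", "ESR1", "AURKA"],
]
print(aggregate_gene_sets(games, top_k=3))  # ['ERBB2', 'ESR1', 'MKI67']
```

Counting each gene once per game keeps a single enthusiastic player from inflating a gene's score; a real analysis might also weight games by player skill, which this sketch does not attempt.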

    Linking genes to diseases with a SNPedia-Gene Wiki mashup

    Background: A variety of topic-focused wikis are used in the biomedical sciences to enable the mass-collaborative synthesis and distribution of diverse bodies of knowledge. To address complex problems such as defining the relationships between genes and disease, it is important to bring the knowledge from many different domains together. Here we show how advances in wiki technology and natural language processing can be used to automatically assemble ‘meta-wikis’ that present integrated views over the data collaboratively created in multiple source wikis. Results: We produced a semantic meta-wiki called the Gene Wiki+ that automatically mirrors and integrates data from the Gene Wiki and SNPedia. The Gene Wiki+, available at http://genewikiplus.org/, captures 8,047 distinct gene-disease relationships. SNPedia accounts for 4,149 of the gene-disease pairs, the Gene Wiki provides 4,377 and only 479 appear independently in both sources. All of this content is available to query and browse and is provided as linked open data. Conclusions: Wikis contain increasing amounts of diverse, biological information useful for elucidating the connections between genes and disease. The Gene Wiki+ shows how wiki technology can be used in concert with natural language processing to provide integrated views over diverse underlying data sources.
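The three reported counts are mutually consistent. Writing S for the set of gene-disease pairs contributed by SNPedia and G for those contributed by the Gene Wiki, inclusion-exclusion gives the number of distinct pairs:

```latex
|S \cup G| = |S| + |G| - |S \cap G| = 4149 + 4377 - 479 = 8047
```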

    Epigenetic Enhancer Marks and Transcription Factor Binding Influence Vκ Gene Rearrangement in Pre-B Cells and Pro-B Cells

    To date, no study has directly compared relative Igκ rearrangement frequencies obtained from genomic DNA (gDNA) and cDNA; since each approach has potential biases, this is an important issue to clarify. Here we used deep sequencing to compare the unbiased gDNA and RNA Igκ repertoires from the same pre-B cell pool. We find that ~20% of Vκ genes have rearrangement frequencies ≥2-fold up or down in RNA vs. DNA libraries, including many members of the Vκ3, Vκ4, and Vκ6 families. Regression analysis indicates that Ikaros and E2A binding are associated with strong promoters. Within the pre-B cell repertoire, we observed that individual Vκ genes rearranged at very different frequencies and also displayed very different Jκ usage. Regression analysis revealed that the greatly unequal Vκ gene rearrangement frequencies are best predicted by epigenetic marks of enhancers. In particular, the levels of newly arising H3K4me1 peaks associated with many Vκ genes in pre-B cells are most predictive of rearrangement levels. Since H3K4me1 is associated with long-range chromatin interactions, which are created during locus contraction, our data provide mechanistic insight into unequal rearrangement levels. Comparison of Igκ rearrangements occurring in pro-B cells and pre-B cells from the same mice reveals a pro-B cell bias toward usage of Jκ-distal Vκ genes, particularly Vκ10-96 and Vκ1-135. Regression analysis indicates that PU.1 binding is the strongest predictor of Vκ gene rearrangement frequency in pro-B cells. Lastly, the repertoires of iEκ−/− pre-B cells reveal that iEκ actively influences Vκ gene usage, particularly of Vκ3 family genes, overlapping with a zone of iEκ-regulated germline transcription. These represent new roles for iEκ in addition to its critical function in promoting overall Igκ rearrangement. Together, this study provides insight into many aspects of Igκ repertoire formation.
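The "≥2-fold up or down in RNA vs. DNA" criterion compares each Vκ gene's share of reads in the two libraries. A minimal sketch, assuming raw read counts per gene are normalized to within-library frequencies (with a small pseudocount so genes missing from one library do not divide by zero) and a gene is flagged when |log2(RNA/DNA)| ≥ 1. The function name, pseudocount and gene labels are hypothetical and not taken from the study.

```python
import math

def flag_fold_changes(rna_counts, dna_counts, min_fold=2.0, pseudocount=0.5):
    """Classify genes by RNA vs. gDNA usage (hypothetical helper).

    rna_counts, dna_counts: dicts mapping gene name -> read count.
    Returns a dict gene -> 'up', 'down' or 'unchanged'.
    """
    genes = sorted(set(rna_counts) | set(dna_counts))
    # Library sizes after adding the pseudocount to every gene.
    rna_total = sum(rna_counts.get(g, 0) + pseudocount for g in genes)
    dna_total = sum(dna_counts.get(g, 0) + pseudocount for g in genes)
    threshold = math.log2(min_fold)  # 1.0 for a 2-fold cut-off
    calls = {}
    for g in genes:
        rna_freq = (rna_counts.get(g, 0) + pseudocount) / rna_total
        dna_freq = (dna_counts.get(g, 0) + pseudocount) / dna_total
        lfc = math.log2(rna_freq / dna_freq)
        if lfc >= threshold:
            calls[g] = "up"
        elif lfc <= -threshold:
            calls[g] = "down"
        else:
            calls[g] = "unchanged"
    return calls

# Hypothetical read counts for three Vκ genes.
rna = {"Vk_a": 400, "Vk_b": 100, "Vk_c": 500}
dna = {"Vk_a": 100, "Vk_b": 400, "Vk_c": 500}
print(flag_fold_changes(rna, dna))
# {'Vk_a': 'up', 'Vk_b': 'down', 'Vk_c': 'unchanged'}
```

Working on log2 ratios makes the up and down thresholds symmetric (+1 and −1 for a 2-fold change).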

    Integrative Analysis of Low- and High-Resolution eQTL

    The study of expression quantitative trait loci (eQTL) is a powerful way of detecting transcriptional regulators at a genomic scale and of elucidating how natural genetic variation affects gene expression. Power and genetic resolution are heavily affected by the study population: recombinant inbred (RI) strains yield greater statistical power but low genetic resolution, whereas diverse inbred or outbred strains improve genetic resolution at the cost of lower power. To overcome the limitations of both individual approaches, we combine data from RI strains with genetically more diverse strains and analyze hippocampus eQTL data obtained from mouse RI strains (BXD) and from a panel of diverse inbred strains (Mouse Diversity Panel, MDP). We perform a systematic analysis of the consistency of eQTL independently obtained from these two populations and demonstrate that a significant fraction of eQTL can be replicated. Based on existing knowledge from pathway databases, we assess different approaches for using the high-resolution MDP data to fine-map BXD eQTL. Finally, we apply this framework to an eQTL hotspot on chromosome 1 (Qrr1), which has been implicated in a range of neurological traits. Here we present the first systematic examination of the consistency between eQTL obtained independently from the BXD and MDP populations. Our analysis of fine-mapping approaches is based on ‘real life’ data as opposed to simulated data, and it allows us to propose a strategy for using MDP data to fine-map BXD eQTL. Application of this framework to Qrr1 reveals that this eQTL hotspot is not caused by just one (or a few) ‘master regulators’, but by a set of polymorphic genes specific to the central nervous system.
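The abstract reports that a significant fraction of eQTL replicate across the two populations but does not state the criterion. As a hedged sketch, assuming each population yields one eQTL p-value per gene and that an eQTL "replicates" when the gene is significant in both populations at the same threshold, the replication rate is the fraction of BXD-significant genes that are also significant in the MDP. The threshold, function name and p-values are hypothetical.

```python
def replication_rate(bxd_pvalues, mdp_pvalues, alpha=0.01):
    """Fraction of BXD-significant genes also significant in MDP (hypothetical criterion).

    bxd_pvalues, mdp_pvalues: dicts gene -> eQTL p-value.
    Genes not tested in the MDP count as not replicated.
    """
    discovered = [g for g, p in bxd_pvalues.items() if p < alpha]
    if not discovered:
        return 0.0
    replicated = [g for g in discovered if mdp_pvalues.get(g, 1.0) < alpha]
    return len(replicated) / len(discovered)

# Hypothetical p-values for four genes.
bxd = {"g1": 1e-6, "g2": 0.002, "g3": 0.2, "g4": 1e-4}
mdp = {"g1": 0.001, "g2": 0.3, "g4": 0.005}
print(replication_rate(bxd, mdp))  # 0.6666666666666666
```

A real comparison would usually also require the two eQTL to map to the same locus (for example, within a cis window around the gene); that positional check is omitted here.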

    Reductionist and Integrative approaches to explore the H. pylori genome

    The reductionist approach of decomposing biological systems into their constituent parts has dominated molecular biology for half a century. Since organisms are composed solely of atoms and molecules without the participation of extraneous forces, it has been assumed that it should be possible to explain biological systems on the basis of the physico-chemical properties of their individual components, down to the atomic level. However, despite the remarkable success of methodological reductionism in analyzing individual cellular components, it is now generally accepted that the behavior of complex biological systems cannot be understood by studying their individual parts in isolation. To tackle the complexity inherent in understanding large networks of interacting biomolecules, the integrative viewpoint emphasizes cybernetic and systems-theoretical methods, using a combination of mathematics, computation and empirical observation. Such an approach is becoming feasible in prokaryotes, combining an almost complete view of the genome and transcriptome with a reasonably extensive picture of the proteome. Pathogenic bacteria are undoubtedly the most investigated subjects among prokaryotes. A paradigmatic example is the human pathogen H. pylori, a causative agent of severe gastroduodenal disorders that infects almost half of the world's population. In this thesis, we investigated various aspects of Helicobacter pylori molecular physiology using both reductionist and integrative approaches. In Section I, we employed a reductionist, bottom-up perspective to study the cysteine oxidised/reduced state and the disulphide bridge pattern of an unusual GroES homolog expressed by H. pylori, heat shock protein A (HspA). This protein has a high cysteine content, is involved in nickel binding and exhibits an extended subcellular localization, ranging from the cytoplasm to the cell surface.
We produced and characterized a recombinant HspA and the mutants C94A and C94A/C111A. The disulphide bridge pattern was assigned by integrating biochemical methods with mass spectrometry. All cysteines are engaged in disulphide bonds that force the C-terminal domain to assume a peculiar closed-loop structure able to host nickel ions. This novel Ni-binding structural arrangement may be related to Ni uptake/delivery to the extracellular urease, which is essential for the bacterium's survival. In Section II, we combined different computational methods with two main goals: 1) to analyze the H. pylori biomolecular interaction network in an attempt to select new molecular targets against H. pylori infection (Chapters 4 & 5); 2) to model and simulate the signaling perturbations induced by invading H. pylori proteins in host epithelial cells (Chapter 6). Chapter 4 explores the 'robust yet fragile' nature of the H. pylori cell, viewed as a complex system in which robustness to certain perturbations is inevitably associated with fragility to others. With this in mind, we developed a general strategy aimed at identifying control points in bacterial metabolic networks that could be targets for novel drugs. The methodology was applied to Helicobacter pylori 26695. The entire metabolic network of the pathogen is analyzed to find biochemically critical points, e.g. enzymes that uniquely consume and/or produce a certain metabolite. The list of critical enzymes is then filtered to find candidate targets that are non-homologous to human enzymes. Finally, the essentiality of the identified targets is cross-validated by in silico deletion studies using flux balance analysis (FBA) on a recent genome-scale metabolic model of H. pylori. Following this approach, we identified several enzymes that could be interesting targets for inhibition studies of H. pylori infection.
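The first two steps of the Chapter 4 strategy (find enzymes that are the only consumer or the only producer of some metabolite, then discard those with human homologs) can be sketched as below. This is a minimal illustration, assuming the metabolic network is given as a mapping from each enzyme to its substrate and product sets and that human homology has already been determined elsewhere; the enzyme and metabolite names are hypothetical, and the FBA cross-validation step is not shown.

```python
from collections import defaultdict

def choke_points(reactions):
    """Enzymes that are the sole consumer or sole producer of some metabolite.

    reactions: dict enzyme -> (set of substrates, set of products).
    """
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for enzyme, (substrates, products) in reactions.items():
        for m in substrates:
            consumers[m].add(enzyme)
        for m in products:
            producers[m].add(enzyme)
    critical = set()
    for index in (consumers, producers):
        for enzymes in index.values():
            if len(enzymes) == 1:  # no alternative enzyme for this metabolite
                critical |= enzymes
    return critical

def candidate_targets(reactions, human_homologs):
    """Keep critical enzymes with no human homolog, sorted for stable output."""
    return sorted(choke_points(reactions) - set(human_homologs))

# Hypothetical toy pathway A -> B -> C -> D; E2 and E3 are redundant.
reactions = {
    "E1": ({"A"}, {"B"}),
    "E2": ({"B"}, {"C"}),
    "E3": ({"B"}, {"C"}),
    "E4": ({"C"}, {"D"}),
}
print(candidate_targets(reactions, human_homologs={"E4"}))  # ['E1']
```

The toy example shows the robustness idea from the text: E2 and E3 back each other up, so neither is critical, while E1 and E4 are; E4 is then dropped because it is marked as having a human homolog.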
The study reported in Chapter 5 extends the approach described above in light of recent theoretical studies on biological networks. These studies suggested that multiple weak attacks on selected targets are inevitably more efficient than the knockout of a single target, thus providing a conceptual framework for the recent success of multi-target drugs. We used this concept to exploit H. pylori metabolic robustness through multiple weak attacks on selected enzymes, which directed us toward the discovery of target sets for combinatorial therapies. We used the known metabolic and protein interaction data to build an integrated biomolecular network of the pathogen. The network was subsequently screened to find central elements of network communication, e.g. hubs, bridges with high betweenness centrality and overlaps of network communities. The selected enzymes were then classified on the basis of available data about cellular function and essentiality in an attempt to predict successful target combinations. To evaluate the network effect triggered by the partial inactivation of candidate targets, robustness analysis was performed on small groups of selected enzymes using flux balance analysis (FBA) on a recent genome-scale metabolic model of H. pylori. In particular, the FBA simulation framework allowed us to predict the growth phenotype associated with each partial-inactivation set. The preliminary results obtained so far may help to narrow the initial target pool in the search for target sets for novel combinatorial drugs against H. pylori persistence. Our long-term goal, however, is to better understand the indirect network effects that lie at the heart of multi-target drug action and, ultimately, how multiple weak hits can perturb complex biological systems. H. pylori produces a cytotoxic protein, CagA, that interferes with a key host signaling pathway, the epidermal growth factor receptor (EGFR) signaling network.
EGFR signaling is one of the most extensively studied areas of signal transduction, since it regulates growth, survival, proliferation and differentiation in mammalian cells. In Chapter 6, we attempted to build an executable model of the EGFR-signaling core process using a process algebra approach. In the EGFR network, the core process is the heart of its underlying hourglass architecture, as it plays a central role in downstream signaling cascades to gene expression through activation of multiple transcription factors. It consists of a dense array of molecules and interactions that are tightly coupled to each other. To build the executable model, a small set of EGFR core molecules and their interactions was tentatively translated into a BetaWB model. BetaWB is a framework for modelling and simulating biological processes based on the Beta-binders language and its stochastic extension. Once obtained, the computational model of the EGFR core process can be used to test and compare hypotheses regarding the principles of operation of the signaling network, i.e. how the EGFR network generates different responses for each set of combinatorial stimuli. In particular, probabilistic model checking can be used to explore the states and possible state changes of the computational model, whereas stochastic simulation (corresponding to the execution of the BetaWB model) may give quantitative insights into the dynamic behaviour of the system in response to different stimuli. Information from these techniques allows model validation through comparison with the experimental data available in the literature. The inherent compositionality of the process algebra modelling approach enables further expansion of the EGFR core model, as well as the study of its behaviour under specific perturbations, such as invading H. pylori proteins.
This latter aspect might be of great value for H. pylori pathogenesis research, as signaling through the EGF receptors is intricately involved in gastric cancer and in many other gastroduodenal diseases.
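Executing a stochastic model means drawing random reaction events over time. The sketch below is not BetaWB and does not use the Beta-binders language; it is a plain Gillespie stochastic simulation algorithm (direct method) for a single hypothetical reversible binding step, EGF + EGFR ⇌ EGF:EGFR, included only to illustrate the kind of trajectory a stochastic simulation produces. The rate constants, molecule counts and function name are illustrative assumptions.

```python
import random

def gillespie_binding(ligand, receptor, bound, k_on, k_off, t_end, seed=0):
    """Simulate L + R <-> C with Gillespie's direct method (illustrative only).

    Counts are molecule numbers; k_on and k_off are stochastic rate constants.
    Returns a list of (time, ligand, receptor, bound) states.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, ligand, receptor, bound)]
    while True:
        a_bind = k_on * ligand * receptor  # propensity of binding
        a_unbind = k_off * bound           # propensity of dissociation
        a_total = a_bind + a_unbind
        if a_total == 0:
            break  # no reaction can fire any more
        # Time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_bind:
            ligand, receptor, bound = ligand - 1, receptor - 1, bound + 1
        else:
            ligand, receptor, bound = ligand + 1, receptor + 1, bound - 1
        trajectory.append((t, ligand, receptor, bound))
    return trajectory

traj = gillespie_binding(ligand=100, receptor=50, bound=0,
                         k_on=0.001, k_off=0.05, t_end=50.0)
t_last, lig, rec, bnd = traj[-1]
print(f"{len(traj) - 1} reactions; free EGFR={rec}, bound={bnd} at t={t_last:.2f}")
```

Each run with a different seed gives a different trajectory; repeating runs and averaging is how such simulations yield the quantitative insights mentioned above, while total receptor (free plus bound) stays constant in every run.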

    Dizeez: an online game for human gene-disease annotation.

    Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However, the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation that accelerate the pace at which we can structure biomedical knowledge need to be explored. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org