Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation
BACKGROUND: High-throughput experiments, such as those using DNA microarrays, typically yield hundreds of genes potentially relevant to the process under study, which makes these experiments difficult to interpret. Here, we propose and evaluate an approach to find functional associations between large numbers of genes and other biomedical concepts from free-text literature. For each gene, a profile of related concepts is constructed that summarizes the context in which the gene is mentioned in the literature. We assign a weight to each concept in the profile based on a likelihood ratio measure. Gene concept profiles can then be clustered to find related genes and other concepts. RESULTS: The experimental validation was done in two steps. We first applied our method to a controlled test set. After this proved successful, the datasets from two DNA microarray experiments were analyzed in the same way and the results were evaluated by domain experts. The first dataset was a gene-expression profile that characterizes the cancer cells of a group of acute myeloid leukemia patients. For this group of patients, the biological background of the cancer cells is largely unknown. Using our methodology, we found an association of these cells with monocytes, which agreed with other experimental evidence. The second dataset consisted of genes differentially expressed following androgen receptor stimulation in a prostate cancer cell line. Based on the analysis, we put forward a hypothesis about the biological processes induced in the studied cells: secretory lysosomes are involved in the production of prostatic fluid, and their development and/or secretion are androgen-regulated processes. CONCLUSION: Our method can be used to analyze DNA microarray datasets based on information explicitly and implicitly available in the literature. We provide a publicly available tool, dubbed Anni, for this purpose.
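The weighting step in the abstract above can be illustrated with a minimal sketch. The abstract says only that concept weights are based on "a likelihood ratio measure"; the sketch assumes a Dunning-style log-likelihood ratio (G²) over document co-occurrence counts, which may differ from what the Anni tool actually computes. The function names, document identifiers, and toy corpus are hypothetical.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of document counts.

    k11: documents mentioning both the gene and the concept
    k12: documents mentioning the gene but not the concept
    k21: documents mentioning the concept but not the gene
    k22: documents mentioning neither
    """
    def xlogx(x):
        return x * math.log(x) if x > 0 else 0.0

    n = k11 + k12 + k21 + k22
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    return 2.0 * (cells + xlogx(n) - rows - cols)


def concept_profile(gene_docs, concept_docs, n_docs):
    """Weight every concept that co-occurs with the gene at least once."""
    profile = {}
    for concept, docs in concept_docs.items():
        both = len(gene_docs & docs)
        if both == 0:
            continue
        gene_only = len(gene_docs) - both
        concept_only = len(docs) - both
        neither = n_docs - both - gene_only - concept_only
        profile[concept] = log_likelihood_ratio(both, gene_only, concept_only, neither)
    return profile


# Toy corpus of 100 documents (hypothetical document ids 0..99).
gene_docs = set(range(0, 20))
concept_docs = {
    "monocyte": set(range(0, 18)),   # almost always co-mentioned with the gene
    "lysosome": set(range(15, 40)),  # co-mentioned about as often as chance predicts
}
profile = concept_profile(gene_docs, concept_docs, n_docs=100)
```

Note that G² measures departure from independence in either direction, so a real pipeline would also need to check that the observed co-occurrence exceeds the expected count before treating a high weight as a positive association.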
Role of Duplicate Genes in Robustness against Deleterious Human Mutations
It is now widely recognized that robustness is an inherent property of biological systems [1],[2],[3]. The contribution of close sequence homologs to genetic robustness against null mutations has been previously demonstrated in simple organisms [4],[5]. In this paper, we investigate in detail the contribution of gene duplicates to backup against deleterious human mutations. Our analysis demonstrates that functional compensation by close homologs may play an important role in human genetic disease. Genes with a 90% sequence identity homolog are about 3 times less likely to harbor known disease mutations than genes with only remote homologs. Moreover, close duplicates affect the phenotypic consequences of deleterious mutations by making a decrease in life expectancy significantly less likely. We also demonstrate that similarity of expression profiles across tissues significantly increases the likelihood of functional compensation by homologs.
Identifying Highly Conserved and Highly Differentiated Gene Ontology Categories in Human Populations
Detecting and interpreting system-level characteristics associated with human population genetic differences is a challenge for human geneticists. In this study, we used HapMap genotype data to identify Gene Ontology (GO) categories associated with high or low genetic difference among 11 HapMap populations. Initially, the genetic differences in each gene region among these populations were measured using allele frequency, linkage disequilibrium (LD) pattern, and transferability of tagSNPs. The associations between each GO term and these genetic differences were then identified. The results showed that cellular process, catalytic activity, binding, and some of their sub-terms were associated with high levels of genetic difference, and genes involved in these functional categories displayed, on average, high genetic diversity among different populations. By contrast, multicellular organismal processes, molecular transducer activity, and some of their sub-terms were associated with low levels of genetic difference. In particular, the neurological system process under the multicellular organismal process category had low levels of genetic difference; neurological function also showed high evolutionary conservation between species in some previous studies. These results may provide new insight into human evolutionary history at the system level.
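As an illustration of how allele frequencies can be turned into a per-locus measure of population difference, here is a minimal sketch. The abstract does not name the statistic used; the sketch assumes Wright's F_ST for a biallelic locus with equally weighted populations, and the function name and frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic locus.

    freqs: allele frequency of the same allele in each population.
    Assumes equally weighted populations; returns 0.0 for a monomorphic locus.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)                   # expected heterozygosity, pooled
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / n  # mean within-population value
    if h_total == 0.0:
        return 0.0
    return (h_total - h_within) / h_total


# Toy inputs: identical frequencies versus alternately fixed alleles.
similar = fst_biallelic([0.5, 0.5, 0.5])
divergent = fst_biallelic([0.0, 1.0])
```

Values near 0 indicate little differentiation at the locus and values near 1 indicate strong differentiation; averaging such values over the SNPs in a gene region would give one possible gene-level score.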
Harvesting Candidate Genes Responsible for Serious Adverse Drug Reactions from a Chemical-Protein Interactome
Identifying genetic factors responsible for serious adverse drug reactions (SADRs) is of critical importance to personalized medicine. However, genome-wide association studies are hampered by the lack of case-control samples, and the selection of candidate genes is limited by the lack of understanding of the underlying mechanisms of SADRs. We hypothesize that drugs causing the same type of SADR might share a common mechanism by unexpectedly targeting the same SADR-mediating protein. Hence, we propose an approach for identifying the common SADR targets by constructing and mining an in silico chemical-protein interactome (CPI), a matrix of binding strengths among 162 drug molecules known to cause at least one type of SADR and 845 proteins. Drugs sharing the same SADR outcome were also found to possess similarities in their CPI profiles towards this set of 845 proteins. This methodology identified the candidate gene of sulfonamide-induced toxic epidermal necrolysis (TEN): all nine sulfonamides that cause TEN were found to bind strongly to MHC I (Cw*4), whereas none of the 17 control drugs that do not cause TEN were found to bind to it. Through an insight into the CPI, we found that the Y116S substitution of MHC I (B*5703) enhances the unexpected binding of abacavir to its antigen presentation groove, which explains why B*5701, not B*5703, is the risk allele of abacavir-induced hypersensitivity. In conclusion, SADR targets and the patient-specific off-targets could be identified through a systematic investigation of the CPI, generating important hypotheses for prospective experimental validation of the candidate genes.
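The idea of comparing drugs by their CPI profiles can be sketched as a row-wise comparison of a drug-by-protein score matrix. The abstract does not state which similarity measure was used; the sketch assumes Pearson correlation, and the drug names and scores are invented toy values, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


# Toy CPI: each row is one drug's hypothetical binding scores against four proteins.
cpi = {
    "drug_a": [-9.1, -3.0, -7.5, -2.2],
    "drug_b": [-8.7, -3.4, -7.9, -2.0],  # profile resembling drug_a
    "drug_c": [-2.5, -8.8, -3.1, -7.6],  # roughly the opposite pattern
}
same_pattern = pearson(cpi["drug_a"], cpi["drug_b"])
other_pattern = pearson(cpi["drug_a"], cpi["drug_c"])
```

Under this assumption, drugs that cause the same SADR would be expected to show higher pairwise correlations than drugs that do not, which is one way to read the "similarities in their CPI profiles" reported above.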
Escherichia coli genome-wide promoter analysis: Identification of additional AtoC binding target elements
BACKGROUND: Studies on bacterial signal transduction systems have revealed complex networks of functional interactions, in which the response regulators play a pivotal role. The AtoSC system of E. coli activates the expression of atoDAEB operon genes, and the subsequent catabolism of short-chain fatty acids, upon acetoacetate induction. Transcriptome and phenotypic analyses suggested that atoSC is also involved in several other cellular activities, although we have recently reported a palindromic repeat within the atoDAEB promoter as the single cis-regulatory binding site of the AtoC response regulator. In this work, we used a computational approach to explore the presence of as yet unidentified AtoC binding sites within other parts of the E. coli genome. RESULTS: Through the implementation of a computational de novo motif detection workflow, a set of candidate motifs was generated, representing putative AtoC binding targets within the E. coli genome. To assess the biological relevance of the motifs and to select for experimental validation those sequences robustly related to distinct cellular functions, we implemented a novel approach that applies Gene Ontology Term Analysis to the motif hits and selected those that qualified through this procedure. The computational results were validated using Chromatin Immunoprecipitation assays to assess the in vivo binding of AtoC to the predicted sites. This process verified twenty-two additional AtoC binding sites, located not only within intergenic regions but also within gene-encoding sequences. CONCLUSIONS: This study, by tracing a number of putative AtoC binding sites, has indicated an AtoC-related cross-regulatory function. This highlights the significance of computational genome-wide approaches in elucidating complex patterns of bacterial cell regulation.
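The abstract does not describe how Gene Ontology Term Analysis was applied to the motif hits. A common way to ask whether genes near motif hits are over-represented in a GO term is a one-sided hypergeometric test, sketched below as an assumption and not as the authors' procedure. The function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw without replacement.

    population: total number of genes considered
    annotated:  genes in the population carrying the GO term
    sample:     genes associated with motif hits
    hits:       genes in the sample carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 3 annotated with the term, 3 motif-hit genes, all 3 annotated.
p_all_hits = hypergeom_enrichment_p(population=10, annotated=3, sample=3, hits=3)
p_no_requirement = hypergeom_enrichment_p(population=10, annotated=3, sample=3, hits=0)
```

In a genome-wide setting this test would be repeated for many GO terms, so the resulting p-values would need a multiple-testing correction before any motif is prioritized for experimental validation.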
Comparative Genomics of the Apicomplexan Parasites Toxoplasma gondii and Neospora caninum: Coccidia Differing in Host Range and Transmission Strategy
Toxoplasma gondii is a zoonotic protozoan parasite that infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma; rather, it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and the transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically, we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.
Rate and duration of hospitalisation for acute pulmonary embolism in the real-world clinical practice of different countries : Analysis from the RIETE registry
Publisher's version. Peer reviewed.
Evaluating Computational Gene Ontology Annotations
Two avenues to understanding gene function are complementary and often overlapping: experimental work and computational prediction. While experimental annotation generally produces high-quality annotations, it is low throughput. Conversely, computational annotations have broad coverage, but the quality of annotations may be variable, and therefore evaluating the quality of computational annotations is a critical concern.
In this chapter, we provide an overview of strategies to evaluate the quality of computational annotations. First, we discuss why evaluating quality in this setting is not trivial. We highlight the various issues that threaten to bias the evaluation of computational annotations, most of which stem from the incompleteness of biological databases. Second, we discuss solutions that address these issues, for example, targeted selection of new experimental annotations and leveraging the existing experimental annotations.
ISSN: 1064-3745. ISSN: 1940-602
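The bias that database incompleteness introduces can be illustrated with a small sketch. It assumes the simplest possible evaluation, precision and recall of predicted terms against the experimental annotations of one gene, which is only one of the strategies such a chapter might cover. The function name and term labels are hypothetical placeholders, not real GO identifiers.

```python
def precision_recall(predicted, experimental):
    """Precision and recall of predicted terms against experimental terms."""
    true_positives = len(predicted & experimental)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(experimental) if experimental else 0.0
    return precision, recall


# Predictions for one gene (placeholder term labels).
predicted = {"term_1", "term_2", "term_3", "term_4"}

# The experimental reference at two points in time: the later release
# confirms term_3, which the earlier release simply had not recorded yet.
reference_early = {"term_1", "term_2"}
reference_later = {"term_1", "term_2", "term_3"}

early = precision_recall(predicted, reference_early)
later = precision_recall(predicted, reference_later)
```

With the same predictions, precision rises from 0.5 to 0.75 once the missing experimental annotation is added, which shows how a prediction counted as a false positive may only reflect a gap in the reference.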