146 research outputs found

    Evolutionary algorithms for the selection of single nucleotide polymorphisms

    BACKGROUND: Large databases of single nucleotide polymorphisms (SNPs) are available for use in genomics studies. Typically, investigators must choose a subset of SNPs from these databases to employ in their studies. The choice of subset is influenced by many factors, including estimated or known reliability of the SNP, biochemical factors, intellectual property, cost, and effectiveness of the subset for mapping genes or identifying disease loci. We present an evolutionary algorithm for multiobjective SNP selection. RESULTS: We implemented a modified version of the Strength-Pareto Evolutionary Algorithm (SPEA2) in Java. Our implementation, Multiobjective Analyzer for Genetic Marker Acquisition (MAGMA), approximates the set of optimal trade-off solutions for large problems in minutes. This set is very useful for the design of large studies, including those oriented towards disease identification, genetic mapping, population studies, and haplotype-block elucidation. CONCLUSION: Evolutionary algorithms are particularly suited for optimization problems that involve multiple objectives and a complex search space to which exact methods such as exhaustive enumeration cannot be applied. They provide flexibility with respect to the problem formulation if a problem description evolves or changes. Results are produced as a trade-off front, allowing the user to make informed decisions when prioritizing factors. MAGMA is open source and available at . Evolutionary algorithms are well suited for many other applications in genomics.
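
The "trade-off front" mentioned in this abstract is the set of non-dominated solutions. The following is a minimal sketch of that dominance test only, not of SPEA2 or MAGMA; the two objectives (cost and mapping error, both minimized) and the candidate scores are hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical SNP subsets scored as (cost, mapping_error); lower is better.
candidates = [(10.0, 0.30), (12.0, 0.20), (15.0, 0.25), (20.0, 0.10), (11.0, 0.35)]
front = pareto_front(candidates)
print(front)
```

Each point left in `front` represents a different compromise between the two objectives, which is what lets a user prioritize factors after the search rather than before it.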

    A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.”

    High Functional Diversity in Mycobacterium tuberculosis Driven by Genetic Drift and Human Demography

    Mycobacterium tuberculosis infects one third of the human world population and kills someone every 15 seconds. For more than a century, scientists and clinicians have been distinguishing between the human- and animal-adapted members of the M. tuberculosis complex (MTBC). However, all human-adapted strains of MTBC have traditionally been considered to be essentially identical. We surveyed sequence diversity within a global collection of strains belonging to MTBC using seven megabase pairs of DNA sequence data. We show that the members of MTBC affecting humans are more genetically diverse than generally assumed, and that this diversity can be linked to human demographic and migratory events. We further demonstrate that these organisms are under extremely reduced purifying selection and that, as a result of increased genetic drift, much of this genetic diversity is likely to have functional consequences. Our findings suggest that the current increases in human population, urbanization, and global travel, combined with the population genetic characteristics of M. tuberculosis described here, could contribute to the emergence and spread of drug-resistant tuberculosis.

    The Innate Immune Database (IIDB)

    BACKGROUND: As part of a National Institute of Allergy and Infectious Diseases funded collaborative project, we have performed over 150 microarray experiments measuring the response of C57/BL6 mouse bone marrow macrophages to toll-like receptor stimuli. These microarray expression profiles are available freely from our project web site http://www.innateImmunity-systemsbiology.org. Here, we report the development of a database of computationally predicted transcription factor binding sites and related genomic features for a set of over 2000 murine immune genes of interest. Our database, which includes microarray co-expression clusters and a host of web-based query, analysis and visualization facilities, is available freely via the internet. It provides a broad resource to the research community, and a stepping stone towards the delineation of the network of transcriptional regulatory interactions underlying the integrated response of macrophages to pathogens. DESCRIPTION: We constructed a database indexed on genes and annotations of the immediate surrounding genomic regions. To facilitate both gene-specific and systems biology oriented research, our database provides the means to analyze individual genes or an entire genomic locus. Although our focus to date has been on mammalian toll-like receptor signaling pathways, our database structure is not limited to this subject, and is intended to be broadly applicable to immunology. By focusing on selected immune-active genes, we were able to perform computationally intensive expression and sequence analyses that would currently be prohibitive if applied to the entire genome. Using six complementary computational algorithms and methodologies, we identified transcription factor binding sites based on the Position Weight Matrices available in TRANSFAC. For one example transcription factor (ATF3) for which experimental data is available, over 50% of our predicted binding sites coincide with genome-wide chromatin immunoprecipitation (ChIP-chip) results. Our database can be interrogated via a web interface. Genomic annotations and binding site predictions can be automatically viewed with a customized version of the Argo genome browser. CONCLUSION: We present the Innate Immune Database (IIDB) as a community resource for immunologists interested in gene regulatory systems underlying innate responses to pathogens. The database website can be freely accessed at http://db.systemsbiology.net/IIDB.
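
Binding-site prediction from a Position Weight Matrix can be illustrated with a generic log-odds scan. This is a minimal sketch under stated assumptions, not one of the six IIDB methods: the count matrix below is invented for illustration (it is not a TRANSFAC entry), the background is assumed uniform, and a pseudocount of 1 is an arbitrary choice.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 4-position count matrix (rows: bases, columns: motif positions).
COUNTS: Dict[str, List[int]] = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds_matrix(counts: Dict[str, List[int]], pseudo: float = 1.0) -> List[Dict[str, float]]:
    """Convert counts to per-position log2(frequency / background) scores."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudo
        pwm.append({b: math.log2(((counts[b][i] + pseudo) / total) / BACKGROUND)
                    for b in "ACGT"})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(width)))
            for i in range(len(seq) - width + 1)]


def best_hit(seq: str, pwm: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return the highest-scoring window as (start, score)."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])


pwm = log_odds_matrix(COUNTS)
start, score = best_hit("TTAGCTAA", pwm)
print(start, round(score, 2))
```

In practice a score threshold decides which windows are reported as predicted sites, and predictions can then be compared against experimental results such as the ChIP-chip data mentioned in the abstract.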

    Population-specific genetic modification of Huntington's disease in Venezuela.

    Modifiers of Mendelian disorders can provide insights into disease mechanisms and guide therapeutic strategies. A recent genome-wide association (GWA) study discovered genetic modifiers of Huntington's disease (HD) onset in Europeans. Here, we performed whole genome sequencing and GWA analysis of a Venezuelan HD cluster whose families were crucial for the original mapping of the HD gene defect. The Venezuelan HD subjects develop motor symptoms earlier than their European counterparts, implying the potential for population-specific modifiers. The main Venezuelan HD family inherits HTT haplotype hap.03, which differs subtly at the sequence level from European HD hap.03, suggesting a different ancestral origin but not explaining the earlier age at onset in these Venezuelans. GWA analysis of the Venezuelan HD cluster suggests both population-specific and population-shared genetic modifiers. Genome-wide significant signals at 7p21.2-21.1 and suggestive association signals at 4p14 and 17q21.2 are evident only in Venezuelan HD, but genome-wide significant association signals at the established European chromosome 15 modifier locus are improved when Venezuelan HD data are included in the meta-analysis. Venezuelan-specific association signals on chromosome 7 center on SOSTDC1, which encodes a bone morphogenetic protein antagonist. The corresponding SNPs are associated with reduced expression of SOSTDC1 in non-Venezuelan tissue samples, suggesting that interaction of reduced SOSTDC1 expression with a population-specific genetic or environmental factor may be responsible for modification of HD onset in Venezuela. Detection of population-specific modification in Venezuelan HD supports the value of distinct disease populations in revealing novel aspects of a disease and population-relevant therapeutic strategies.

    Rare variants implicate NMDA receptor signaling and cerebellar gene networks in risk for bipolar disorder

    Bipolar disorder is an often-severe mental health condition characterized by alternation between extreme mood states of mania and depression. Despite strong heritability and the recent identification of 64 common variant risk loci of small effect, pathophysiological mechanisms remain unknown. Here, we analyzed genome sequences from 41 multiply-affected pedigrees and identified variants in 741 genes with nominally significant linkage or association with bipolar disorder. These 741 genes overlapped known risk genes for neurodevelopmental disorders and clustered within gene networks enriched for synaptic and nuclear functions. The top variant in this analysis - prioritized by statistical association, predicted deleteriousness, and network centrality - was a missense variant in the gene encoding D-amino acid oxidase (DAOG131V). Heterologous expression of DAOG131V in human cells resulted in decreased DAO protein abundance and enzymatic activity. In a knock-in mouse model of DAOG131, DaoG130V/+, we similarly found decreased DAO protein abundance in hindbrain regions, as well as enhanced stress susceptibility and blunted behavioral responses to pharmacological inhibition of N-methyl-D-aspartate receptors (NMDARs). RNA sequencing of cerebellar tissue revealed that DaoG130V resulted in decreased expression of two gene networks that are enriched for synaptic functions and for genes expressed, respectively, in Purkinje neurons or granule neurons. These gene networks were also down-regulated in the cerebellum of patients with bipolar disorder compared to healthy controls and were enriched for additional rare variants associated with bipolar disorder risk. These findings implicate dysregulation of NMDAR signaling and of gene expression in cerebellar neurons in bipolar disorder pathophysiology and provide insight into its genetic architecture

    Genomic and molecular characterization of preterm birth.

    Preterm birth (PTB) complications are the leading cause of long-term morbidity and mortality in children. By using whole blood samples, we integrated whole-genome sequencing (WGS), RNA sequencing (RNA-seq), and DNA methylation data for 270 PTB and 521 control families. We analyzed this combined dataset to identify genomic variants associated with PTB and performed secondary analyses to identify variants associated with very early PTB (VEPTB) as well as other subcategories of disease that may contribute to PTB. We identified differentially expressed genes (DEGs) and methylated genomic loci and performed expression and methylation quantitative trait loci analyses to link genomic variants to these expression and methylation changes. We performed enrichment tests to identify overlaps between new and known PTB candidate gene systems. We identified 160 significant genomic variants associated with PTB-related phenotypes. The most significant variants, DEGs, and differentially methylated loci were associated with VEPTB. Integration of all data types identified a set of 72 candidate biomarker genes for VEPTB, encompassing genes and those previously associated with PTB. Notably, PTB-associated genes RAB31 and RBPJ were identified by all three data types (WGS, RNA-seq, and methylation). Pathways associated with VEPTB include EGFR and prolactin signaling pathways, inflammation- and immunity-related pathways, chemokine signaling, IFN-γ signaling, and Notch1 signaling. Progress in identifying molecular components of a complex disease is aided by integrated analyses of multiple molecular data types and clinical data. With these data, and by stratifying PTB by subphenotype, we have identified associations between VEPTB and the underlying biology.

    Application of Affymetrix array and massively parallel signature sequencing for identification of genes involved in prostate cancer progression

    BACKGROUND: Affymetrix GeneChip Array and Massively Parallel Signature Sequencing (MPSS) are two high throughput methodologies used to profile transcriptomes. Each method has certain strengths and weaknesses; however, no comparison has been made between the data derived from Affymetrix arrays and MPSS. In this study, two lineage-related prostate cancer cell lines, LNCaP and C4-2, were used for transcriptome analysis with the aim of identifying genes associated with prostate cancer progression. METHODS: Affymetrix GeneChip array and MPSS analyses were performed. Data was analyzed with GeneSpring 6.2 and in-house Perl scripts. Expression array results were verified with RT-PCR. RESULTS: Comparison of the data revealed that both technologies detected genes the other did not. In LNCaP, 3,180 genes were only detected by Affymetrix and 1,169 genes were only detected by MPSS. Similarly, in C4-2, 4,121 genes were only detected by Affymetrix and 1,014 genes were only detected by MPSS. Analysis of the combined transcriptomes identified 66 genes unique to LNCaP cells and 33 genes unique to C4-2 cells. Expression analysis of these genes in prostate cancer specimens showed CA1 to be highly expressed in bone metastasis but not expressed in primary tumor and EPHA7 to be expressed in normal prostate and primary tumor but not bone metastasis. CONCLUSION: Our data indicates that transcriptome profiling with a single methodology will not fully assess the expression of all genes in a cell line. A combination of transcription profiling technologies such as DNA array and MPSS provides a more robust means to assess the expression profile of an RNA sample. Finally, genes that were differentially expressed in cell lines were also differentially expressed in primary prostate cancer and its metastases.
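
The platform comparison described in this abstract amounts to set operations on lists of detected genes. The sketch below shows that logic only; the gene identifiers are placeholders, not results from the study, and the function name `compare_platforms` is hypothetical.

```python
from typing import Dict, Set


def compare_platforms(array_genes: Set[str], mpss_genes: Set[str]) -> Dict[str, Set[str]]:
    """Split detected genes into platform-exclusive, shared, and combined sets."""
    return {
        "array_only": array_genes - mpss_genes,
        "mpss_only": mpss_genes - array_genes,
        "both": array_genes & mpss_genes,
        "combined": array_genes | mpss_genes,
    }


# Placeholder detection calls for two cell lines (not real data).
line_a = compare_platforms({"GENE_1", "GENE_2", "GENE_3"}, {"GENE_2", "GENE_4"})
line_b = compare_platforms({"GENE_2", "GENE_3"}, {"GENE_3", "GENE_5"})

# Genes seen in one cell line's combined transcriptome but not in the other's.
unique_to_a = line_a["combined"] - line_b["combined"]
unique_to_b = line_b["combined"] - line_a["combined"]
print(sorted(unique_to_a), sorted(unique_to_b))
```

Taking the union before comparing cell lines reflects the abstract's point that neither platform alone captures the full transcriptome.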