64 research outputs found

    Defining genes: a computational framework

    Get PDF
    The precise elucidation of the gene concept has become the subject of intense discussion in light of results from several, large high-throughput surveys of transcriptomes and proteomes. In previous work, we proposed an approach for constructing gene concepts that combines genomic heritability with elements of function. Here, we introduce a definition of the gene within a computational framework of cellular interactions. The definition seeks to satisfy the practical requirements imposed by annotation, capture logical aspects of regulation, and encompass the evolutionary property of homology

    Proteinortho: Detection of (Co-)orthologs in large-scale analysis

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases.</p> <p>Results</p> <p>The program <monospace>Proteinortho</monospace> described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply <monospace>Proteinortho</monospace> to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes.</p> <p>Conclusions</p> <p><monospace>Proteinortho</monospace> significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.</p

    SynBlast: Assisting the analysis of conserved synteny information

    Get PDF
    <p>Abstract</p> <p>Motivation</p> <p>In the last years more than 20 vertebrate genomes have been sequenced, and the rate at which genomic DNA information becomes available is rapidly accelerating. Gene duplication and gene loss events inherently limit the accuracy of orthology detection based on sequence similarity alone. Fully automated methods for orthology annotation do exist but often fail to identify individual members in cases of large gene families, or to distinguish missing data from traceable gene losses. This situation can be improved in many cases by including conserved synteny information.</p> <p>Results</p> <p>Here we present the <monospace>SynBlast</monospace> pipeline that is designed to construct and evaluate local synteny information. <monospace>SynBlast</monospace> uses the genomic region around a focal reference gene to retrieve candidates for homologous regions from a collection of target genomes and ranks them in accord with the available evidence for homology. The pipeline is intended as a tool to aid high quality manual annotation in particular in those cases where automatic procedures fail. We demonstrate how <monospace>SynBlast</monospace> is applied to retrieving orthologous and paralogous clusters using the vertebrate <it>Hox </it>and <it>ParaHox </it>clusters as examples.</p> <p>Software</p> <p>The <monospace>SynBlast</monospace> package written in <monospace>Perl</monospace> is available under the GNU General Public License at <url>http://www.bioinf.uni-leipzig.de/Software/SynBlast/</url>.</p

    Discovery of common and rare genetic risk variants for colorectal cancer.

    Get PDF
    To further dissect the genetic architecture of colorectal cancer (CRC), we performed whole-genome sequencing of 1,439 cases and 720 controls, imputed discovered sequence variants and Haplotype Reference Consortium panel variants into genome-wide association study data, and tested for association in 34,869 cases and 29,051 controls. Findings were followed up in an additional 23,262 cases and 38,296 controls. We discovered a strongly protective 0.3% frequency variant signal at CHD1. In a combined meta-analysis of 125,478 individuals, we identified 40 new independent signals at P < 5 × 10-8, bringing the number of known independent signals for CRC to ~100. New signals implicate lower-frequency variants, Krüppel-like factors, Hedgehog signaling, Hippo-YAP signaling, long noncoding RNAs and somatic drivers, and support a role for immune function. Heritability analyses suggest that CRC risk is highly polygenic, and larger, more comprehensive studies enabling rare variant analysis will improve understanding of biology underlying this risk and influence personalized screening strategies and drug development.Goncalo R Abecasis has received compensation from 23andMe and Helix. He is currently an employee of Regeneron Pharmaceuticals. Heather Hampel performs collaborative research with Ambry Genetics, InVitae Genetics, and Myriad Genetic Laboratories, Inc., is on the scientific advisory board for InVitae Genetics and Genome Medical, and has stock in Genome Medical. Rachel Pearlman has participated in collaborative funded research with Myriad Genetics Laboratories and Invitae Genetics but has no financial competitive interest

    Combining Asian and European genome-wide association studies of colorectal cancer improves risk prediction across racial and ethnic populations

    Full text link
    Polygenic risk scores (PRS) have great potential to guide precision colorectal cancer (CRC) prevention by identifying those at higher risk to undertake targeted screening. However, current PRS using European ancestry data have sub-optimal performance in non-European ancestry populations, limiting their utility among these populations. Towards addressing this deficiency, we expand PRS development for CRC by incorporating Asian ancestry data (21,731 cases; 47,444 controls) into European ancestry training datasets (78,473 cases; 107,143 controls). The AUC estimates (95% CI) of PRS are 0.63(0.62-0.64), 0.59(0.57-0.61), 0.62(0.60-0.63), and 0.65(0.63-0.66) in independent datasets including 1681-3651 cases and 8696-115,105 controls of Asian, Black/African American, Latinx/Hispanic, and non-Hispanic White, respectively. They are significantly better than the European-centric PRS in all four major US racial and ethnic groups (p-values < 0.05). Further inclusion of non-European ancestry populations, especially Black/African American and Latinx/Hispanic, is needed to improve the risk prediction and enhance equity in applying PRS in clinical practice

    Novel Common Genetic Susceptibility Loci for Colorectal Cancer

    Get PDF
    BACKGROUND: Previous genome-wide association studies (GWAS) have identified 42 loci (P < 5 × 10-8) associated with risk of colorectal cancer (CRC). Expanded consortium efforts facilitating the discovery of additional susceptibility loci may capture unexplained familial risk. METHODS: We conducted a GWAS in European descent CRC cases and control subjects using a discovery-replication design, followed by examination of novel findings in a multiethnic sample (cumulative n = 163 315). In the discovery stage (36 948 case subjects/30 864 control subjects), we identified genetic variants with a minor allele frequency of 1% or greater associated with risk of CRC using logistic regression followed by a fixed-effects inverse variance weighted meta-analysis. All novel independent variants reaching genome-wide statistical significance (two-sided P < 5 × 10-8) were tested for replication in separate European ancestry samples (12 952 case subjects/48 383 control subjects). Next, we examined the generalizability of discovered variants in East Asians, African Americans, and Hispanics (12 085 case subjects/22 083 control subjects). Finally, we examined the contributions of novel risk variants to familial relative risk and examined the prediction capabilities of a polygenic risk score. All statistical tests were two-sided. RESULTS: The discovery GWAS identified 11 variants associated with CRC at P < 5 × 10-8, of which nine (at 4q22.2/5p15.33/5p13.1/6p21.31/6p12.1/10q11.23/12q24.21/16q24.1/20q13.13) independently replicated at a P value of less than .05. Multiethnic follow-up supported the generalizability of discovery findings. These results demonstrated a 14.7% increase in familial relative risk explained by common risk alleles from 10.3% (95% confidence interval [CI] = 7.9% to 13.7%; known variants) to 11.9% (95% CI = 9.2% to 15.5%; known and novel variants). A polygenic risk score identified 4.3% of the population at an odds ratio for developing CRC of at least 2.0. CONCLUSIONS: This study provides insight into the architecture of common genetic variation contributing to CRC etiology and improves risk prediction for individualized screenin
    corecore