35 research outputs found

    Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort.

    BACKGROUND: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures to evaluate the performance of the different methods. These include comparison with previously published CNVs as well as the use of a replication sample of 239 individuals genotyped with Illumina 550K arrays. We also established a new evaluation procedure that exploits the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. CONCLUSION: Our new methodologies for CNV detection and their evaluation will help extract additional information from the large amount of SNP-genotyping data available for various cohorts, and this information can be used to explore structural variants and their impact on complex traits.
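
The abstract above names a Gaussian Mixture Model as the basis of its CNV caller but gives no details. As a rough illustration of the general idea only, and not the authors' method, the sketch below fits a one-dimensional, three-component mixture to per-sample intensity log-ratios with expectation-maximization and assigns each value to a "loss", "normal", or "gain" component. The function names, the starting means, the standard-deviation floor, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(values, initial_means, initial_sd=0.2, n_iter=50, min_sd=0.05):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, sds)."""
    k = len(initial_means)
    means = list(initial_means)
    sds = [initial_sd] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[j] * gaussian_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and sd of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), min_sd)  # floor (assumed) avoids collapse
    return weights, means, sds


def classify(x, weights, means, sds):
    """Index of the component with the highest weighted density at x."""
    scores = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic log-ratios (hypothetical): a few losses, many normals, a few gains.
rng = random.Random(0)
log_ratios = (
    [rng.gauss(-0.6, 0.1) for _ in range(30)]
    + [rng.gauss(0.0, 0.1) for _ in range(200)]
    + [rng.gauss(0.4, 0.1) for _ in range(30)]
)
labels = ["loss", "normal", "gain"]
weights, means, sds = fit_gmm_1d(log_ratios, initial_means=[-0.5, 0.0, 0.5])
```

A value can then be labelled with `labels[classify(x, weights, means, sds)]`. A real caller would work per probe or per region and would need array-specific normalization, which this sketch leaves out.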

    Searching the expressed sequence tag (EST) databases: panning for genes.

    The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

    The need for a human gene index.


    Similarities and differences of polyadenylation signals in human and fly.

    BACKGROUND: Cleavage of messenger RNA (mRNA) precursors is an essential step in mRNA maturation. The signal recognized by the cleavage enzyme complex has been characterized as an A-rich region upstream of the cleavage site containing a motif with consensus AAUAAA, followed by a U- or UG-rich region downstream of the cleavage site. RESULTS: We studied these signals using exhaustive databases of cleavage sites obtained by aligning raw expressed sequence tag (EST) sequences to genomic sequences in Homo sapiens and Drosophila melanogaster. These data show that the polyadenylation signal is highly conserved in human and fly. In addition, de novo motif searches generated a refined description of the U-rich downstream sequence element (DSE), which shows more divergence between the two species. These refined motifs are applied, within a Hidden Markov Model (HMM) framework, to predict mRNA cleavage sites. CONCLUSION: We demonstrate that the DSE is a specific motif in both human and Drosophila. These findings shed light on the sequence correlates of a highly conserved biological process, and improve in silico prediction of 3' mRNA cleavage and polyadenylation sites.
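
The signal layout described in this abstract (an AAUAAA motif upstream of the cleavage site, then a U-rich stretch downstream) can be pictured with a very small scan. The sketch below is only an illustration of that layout, not the motif search or HMM used in the paper. It works on DNA-alphabet sequences, so the signal is written AATAAA and U appears as T. The function names and both window sizes are assumptions chosen for the example.

```python
def find_polya_signal(seq, cleavage_pos, signal="AATAAA", upstream_window=40):
    """Return the start index of the signal closest to the cleavage site.

    Only the upstream_window bases before cleavage_pos are searched
    (window size is an assumption). Returns None if the signal is absent.
    """
    start = max(0, cleavage_pos - upstream_window)
    region = seq[start:cleavage_pos].upper()
    idx = region.rfind(signal)  # rightmost hit = nearest to the cleavage site
    return None if idx < 0 else start + idx


def downstream_t_fraction(seq, cleavage_pos, downstream_window=30):
    """Fraction of T (U in the mRNA) in the window after the cleavage site."""
    region = seq[cleavage_pos:cleavage_pos + downstream_window].upper()
    if not region:
        return 0.0
    return region.count("T") / len(region)


# Hypothetical toy sequence: filler, signal, spacer, then a T-rich stretch.
toy_seq = "GCGCGC" + "AATAAA" + "C" * 15 + "TGTGTTTTGTCT"
toy_cleavage_pos = 6 + 6 + 15  # first base of the downstream stretch

signal_start = find_polya_signal(toy_seq, toy_cleavage_pos)
t_fraction = downstream_t_fraction(toy_seq, toy_cleavage_pos)
```

An exact-match scan like this misses variant hexamers; a position weight matrix or an HMM, as the abstract describes, would score near-matches as well.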

    Analysis of active site residues of the antiviral protein from summer leaves from Phytolacca americana by site-directed mutagenesis.

    The summer leaf isoform of the pokeweed (Phytolacca americana) antiviral protein, PAP II, was produced in high yields from inclusion bodies in recombinant E. coli. On the basis of its sequence similarity with the spring leaf isoform (PAP I) and with the A chain of ricin, a three-dimensional model of the protein was constructed as an aid in the design of active site mutants. PAP II variants mutated in residues Asp 88 (D88N), Tyr 117 (Y117S), Glu 172 (E172Q), Arg 175 (R175H) and a combination of Asp 88 and Arg 175 (D88N/R175H) were produced in E. coli and assayed for their ability to inhibit protein synthesis in a rabbit reticulocyte lysate. All of these mutations had effects deleterious to the enzymatic activity of PAP II. The results were interpreted in the light of three reaction mechanisms proposed for ribosome-inactivating proteins (RIPs). We conclude that none of the proposed mechanisms is entirely consistent with the data presented here.

    Survival-independent function of NF-kappaB/Rel during late stages of thymocyte differentiation.

    Transcription factors of the NF-kappaB/Rel family are important mediators of extracellular signals. Their implication in positive selection of thymocytes is suggested by a defective thymic development in transgenic mice that over-express IkappaB in thymocytes. These mice exhibit an accumulation of an unusually prominent population of TCRhigh/CD4/CD8 double-positive cells in the thymus and a dramatic reduction of CD4+ and CD8+ cells in the periphery. The present study addresses the role of NF-kappaB in survival and differentiation processes of maturing thymocytes using IkappaB/bcl-2 and IkappaB/HY double-transgenic mice. Neither the introduction of the anti-apoptosis gene bcl-2 nor the positively selecting background in female HY transgenic mice resulted in a rescue of the maturational defects observed in the thymus of IkappaB transgenic mice. Thus, rather than promoting survival, the main role of NF-kappaB/Rel proteins during positive selection of thymocytes appears to be the mediation of differentiation signals.

    Impaired fetal thymocyte development after efficient adenovirus-mediated inhibition of NF-kappa B activation.

    We introduce a new experimental system combining adenovirus-mediated gene transfer and fetal thymic organ culture (FTOC). This system allowed us to efficiently express in developing thymocytes a mutant form of the NF-kappa B inhibitor I kappa B alpha (mut-I kappa B) and to study the maturation defects occurring when NF-kappa B activation is inhibited during fetal development. Fetal thymocytes infected with adenovirus containing mut-I kappa B were found to develop normally until the CD44-CD25+, CD4-CD8- double-negative stage, while production of more mature double-positive and single-positive populations was strongly decreased. Proliferation, as measured by the percentage of cells in cycle, appeared normal, as did rearrangement and expression of the TCR beta-chain. However, apoptosis was much higher in FTOC infected with adenovirus containing mut-I kappa B than in FTOC infected with a control virus. Taken together, these results suggest that NF-kappa B plays a crucial role in ensuring the differentiation and survival of thymocytes in the early stages of their development.

    ESTScan: a program for detecting, evaluating and reconstructing potential coding regions in EST sequences.

    One of the problems associated with the large-scale analysis of unannotated, low-quality EST sequences is the detection of coding regions and the correction of frameshift errors that they often contain. We introduce a new type of hidden Markov model that explicitly deals with the possibility of errors in the sequence being analyzed, and incorporates a method for correcting these errors. This model was implemented in an efficient and robust program, ESTScan. We show that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and is able to accurately correct frameshift errors. In the framework of genome sequencing projects, ESTScan could become a very useful tool for gene discovery, for quality control, and for the assembly of contigs representing the coding regions of genes.
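
For readers unfamiliar with hidden Markov models, the sketch below shows plain Viterbi decoding for a two-state model that labels each base as "coding" or "noncoding". It is far simpler than the error-tolerant model the abstract describes: it has no codon structure and does not detect or correct frameshifts. The state names, all probabilities, and the toy sequence are assumptions invented for this example, and the input is assumed to contain only A, C, G and T.

```python
import math

STATES = ("noncoding", "coding")


def viterbi(seq, start_p, trans_p, emit_p):
    """Most likely state path for seq under a simple HMM (log-space Viterbi)."""
    if not seq:
        return []
    # Log-probability of the best path ending in each state at position 0.
    prev = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in STATES}
    backpointers = []
    for ch in seq[1:]:
        cur, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this position.
            best = max(STATES, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][ch])
            ptr[s] = best
        backpointers.append(ptr)
        prev = cur
    # Trace back from the best final state.
    state = max(STATES, key=prev.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy parameters (hypothetical): "coding" favours G/C, "noncoding" favours A/T.
start_p = {"noncoding": 0.5, "coding": 0.5}
trans_p = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
emit_p = {
    "noncoding": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "coding": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}

toy_est = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"
toy_path = viterbi(toy_est, start_p, trans_p, emit_p)
```

Handling sequencing errors as the abstract describes would require extra states or transitions that model inserted and deleted bases; none of that is attempted here.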