Novel deletions causing pseudoxanthoma elasticum underscore the genomic instability of the ABCC6 region
Mutations in ABCC6 cause pseudoxanthoma elasticum (PXE), a heritable disease that affects elastic fibers. Thus far, >200 mutations have been characterized by various PCR-based techniques (primarily direct sequencing), identifying up to 90% of PXE-causing alleles. This study aimed to assess the importance of deletions and insertions in the ABCC6 genomic region, which is known to have a high recombinational potential. To detect ABCC6 deletions/insertions, which can be missed by direct sequencing, multiplex ligation-dependent probe amplification (MLPA) was applied in PXE patients with an incomplete genotype. MLPA was performed in 35 PXE patients with at least one unidentified mutant allele after exonic sequencing and exclusion of the recurrent exon 23-29 deletion. Six multi-exon deletions and four single-exon deletions were detected. Using MLPA in addition to sequencing, we expanded the ABCC6 mutation spectrum with 9 novel deletions and characterized 25% of unidentified disease alleles. Our results further illustrate the instability of the ABCC6 genomic region and stress the importance of screening for deletions in the molecular diagnosis of PXE. Journal of Human Genetics (2010) 55, 112-117; doi: 10.1038/jhg.2009.132; published online 15 January 201
Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum.
Background: Plasmodium falciparum, the deadliest malaria-causing parasite, has an extremely AT-rich (80.7%) genome. Because of high AT-content, sequence-based annotation of genes and functional elements remains challenging. In order to better understand the regulatory network controlling gene expression in the parasite, a more complete genome annotation as well as analysis tools adapted for AT-rich genomes are needed. Recent studies on genome-wide nucleosome positioning in eukaryotes have shown that nucleosome landscapes exhibit regular characteristic patterns at the 5'- and 3'-end of protein and non-protein coding genes. In addition, nucleosome depleted regions can be found near transcription start sites. These unique nucleosome landscape patterns may be exploited for the identification of novel genes. In this paper, we propose a computational approach to discover novel putative genes based exclusively on nucleosome positioning data in the AT-rich genome of P. falciparum. Results: Using binary classifiers trained on nucleosome landscapes at the gene boundaries from two independent nucleosome positioning data sets, we were able to detect a total of 231 regions containing putative genes in the genome of Plasmodium falciparum, of which 67 highly confident genes were found in both data sets. Eighty-eight of these 231 newly predicted genes exhibited transcription signal in RNA-Seq data, indicative of active transcription. In addition, 20 out of 21 selected gene candidates were further validated by RT-PCR, and 28 out of the 231 genes showed significant matches using BLASTN against an expressed sequence tag (EST) database. Furthermore, 108 (47%) out of the 231 putative novel genes overlapped with previously identified but unannotated long non-coding RNAs. Collectively, these results provide experimental validation for 163 predicted genes (70.6%).
Finally, 73 out of 231 genes were found to be potentially translated based on their signal in polysome-associated RNA-Seq representing transcripts that are actively being translated. Conclusion: Our results clearly indicate that nucleosome positioning data contains sufficient information for novel gene discovery. As distinct nucleosome landscapes around genes are found in many other eukaryotic organisms, this methodology could be used to characterize the transcriptome of any organism, especially when coupled with other DNA-based gene finding and experimental methods (e.g., RNA-Seq).
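The proportions quoted in this abstract can be re-derived from its own counts. The following is a minimal sketch in Python that uses only the numbers stated above; the dictionary keys and the helper name percent_of_total are illustrative labels, not terms from the paper.

```python
# Counts quoted in the abstract; all are out of 231 predicted regions.
TOTAL_PREDICTED = 231
counts = {
    "found_in_both_data_sets": 67,
    "rna_seq_signal": 88,
    "est_blastn_match": 28,
    "lncrna_overlap": 108,
    "experimentally_supported": 163,
    "polysome_associated": 73,
}


def percent_of_total(count, total=TOTAL_PREDICTED):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Percentage of the 231 predictions in each evidence category.
percentages = {name: percent_of_total(n) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]}/{TOTAL_PREDICTED} = {pct}%")
```

The evidence categories overlap, so the 163 supported genes cannot be obtained by summing the individual counts; the sketch only checks each count against the total.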
Embryonic stem cell-specific signatures in cancer: insights into genomic regulatory networks and implications for medicine
Embryonic stem (ES) cells are of great interest as a model system for studying early developmental processes and because of their potential therapeutic applications in regenerative medicine. Obtaining a systematic understanding of the mechanisms that control the 'stemness' - self-renewal and pluripotency - of ES cells relies on high-throughput tools to define gene expression and regulatory networks at the genome level. Such recently developed systems biology approaches have revealed highly interconnected networks in which multiple regulatory factors act in combination. Interestingly, stem cells and cancer cells share some properties, notably self-renewal and a block in differentiation. Recently, several groups reported that expression signatures that are specific to ES cells are also found in many human cancers and in mouse cancer models, suggesting that these shared features might inform new approaches for cancer therapy. Here, we briefly summarize the key transcriptional regulators that contribute to the pluripotency of ES cells, the factors that account for the common gene expression patterns of ES and cancer cells, and the implications of these observations for future clinical applications.
Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence
BACKGROUND: Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. RESULTS: Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. CONCLUSIONS: Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes
N-terminal proteomics assisted profiling of the unexplored translation initiation landscape in Arabidopsis thaliana
Proteogenomics is an emerging research field that still lacks a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After a stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, an in silico analysis of protein N-terminal peptides demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly annotated genomes.
Spatiotemporal Expression of Pregnancy-Specific Glycoprotein Gene rnCGM1 in Rat Placenta
As a basis for a better understanding of the role of the pregnancy-specific glycoprotein (PSG) family in the maintenance of pregnancy, detailed investigations are described on the expression of a recently identified rat PSG gene (rnCGM1) at the mRNA and protein levels. Using specific oligonucleotide primers, rnCGM1 transcripts were identified after reverse transcription, polymerase chain reaction, and hybridization with a radiolabelled, internal oligonucleotide. Transcripts were only found in significant amounts in placenta. In situ hybridization visualized rnCGM1 transcripts at day 14 post coitum (p.c.), in secondary trophoblast giant cells and in the spongiotrophoblast. Only those secondary giant cells lining the maternal decidua were positive. In contrast, primary giant cells did not contain rnCGM1 mRNA. At day 18 p.c., rnCGM1 transcripts were almost exclusively detectable in the spongiotrophoblast. No rnCGM1 transcripts were found in rat embryos of these two developmental stages. Rabbit antisera were generated against the amino-terminal immunoglobulin variable-like domain and against a synthetic peptide containing the last 13 carboxy-terminal amino acids of rnCGM1. Both antisera recognized a 124 kDa protein in day 18 rat placental extracts as identified by Western blot analysis. The anti-peptide antiserum recognized a 116 kDa protein in the serum of a 14 day p.c. pregnant rat that is absent from the sera of non-pregnant females. Taken together, these results confirm exclusive expression of rnCGM1 in the rat trophoblast, but unlike human PSG, negligible or no expression is found in other organs, such as fetal liver or salivary glands, indicating a more specialized function of rnCGM1. Its spatiotemporal expression pattern is consistent with a potential role of PSG in protecting the fetus against the maternal immune system and/or in regulating the invasive growth of trophoblast cells.
Exploring the loblolly pine (Pinus taeda L.) genome by BAC sequencing and Cot analysis.
Loblolly pine (LP; Pinus taeda L.) is an economically and ecologically important tree in the southeastern U.S. To advance understanding of the LP genome, we sequenced and analyzed 100 BAC clones and performed a Cot analysis. The Cot analysis indicates that the genome is composed of 57, 24, and 10% highly-repetitive, moderately-repetitive, and single/low-copy sequences, respectively (the remaining 9% of the genome is a combination of foldback and damaged DNA). Although single/low-copy DNA only accounts for 10% of the LP genome, the amount of single/low-copy DNA in LP is still 14 times the size of the Arabidopsis genome. Since gene numbers in LP are similar to those in Arabidopsis, much of the single/low-copy DNA of LP would appear to be composed of DNA that is both gene- and repeat-poor. Macroarrays prepared from a LP bacterial artificial chromosome (BAC) library were hybridized with probes designed from cell wall synthesis/wood development cDNAs, and 50 of the "targeted" clones were selected for further analysis. An additional 25 clones were selected because they contained few repeats, while 25 more clones were selected at random. The 100 BAC clones were Sanger sequenced and assembled. Of the targeted BACs, 80% contained all or part of the cDNA used to target them. One targeted BAC was found to contain fungal DNA and was eliminated from further analysis. Combinations of similarity-based and ab initio gene prediction approaches were utilized to identify and characterize potential coding regions in the 99 BACs containing LP DNA. From this analysis, we identified 154 gene models (GMs) representing both putative protein-coding genes and likely pseudogenes. Ten of the GMs (all of which were specifically targeted) had enough support to be classified as intact genes.
Interestingly, the 154 GMs had statistically indistinguishable (α = 0.05) distributions in the targeted and random BAC clones (15.18 and 12.61 GM/Mb, respectively), whereas the low-repeat BACs contained significantly fewer GMs (7.08 GM/Mb). However, when GM length was considered, the targeted BACs had a significantly greater percentage of their length in GMs (3.26%) when compared to random (1.63%) and low-repeat (0.62%) BACs. The results of our study provide insight into LP evolution and inform ongoing efforts to produce a reference genome sequence for LP, while characterization of genes involved in cell wall production highlights carbon metabolism pathways that can be leveraged for increasing wood production
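The Cot figures in this abstract imply a rough size ratio between the two genomes. The following is a minimal sketch in Python that uses only the percentages and the factor of 14 stated above; the variable names are illustrative, and the result is expressed in units of Arabidopsis genomes because the abstract gives no absolute sizes.

```python
# Cot fractions quoted in the abstract, as percentages of the LP genome.
cot_fractions = {
    "highly_repetitive": 57,
    "moderately_repetitive": 24,
    "single_low_copy": 10,
    "foldback_or_damaged": 9,
}

# The four components should account for the whole genome.
total_percent = sum(cot_fractions.values())

# The abstract states that the single/low-copy fraction alone is 14 times
# the size of the Arabidopsis genome.
SINGLE_COPY_IN_ARABIDOPSIS_GENOMES = 14

# Implied size of the whole LP genome, in units of Arabidopsis genomes:
# if 10% of the genome equals 14 units, 100% equals 14 * 100 / 10 units.
implied_lp_genome_in_arabidopsis_genomes = (
    SINGLE_COPY_IN_ARABIDOPSIS_GENOMES * 100 / cot_fractions["single_low_copy"]
)

print(total_percent, implied_lp_genome_in_arabidopsis_genomes)
```

Under these figures the LP genome works out to roughly 140 Arabidopsis genomes. Because the quoted percentages are rounded, this is an order-of-magnitude check and not a measurement.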
Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana).
Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third-generation long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second- and third-generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers.
Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence
The spirochete Treponema pallidum ssp. pertenue (TPE) is the causative agent of yaws, while strains of Treponema pallidum ssp. pallidum (TPA) cause syphilis. Yaws and syphilis are distinguished on the basis of epidemiological characteristics and clinical symptoms. Neither treponeme can reproduce outside the host organism, which precludes the use of standard molecular biology techniques used to study cultivable pathogens. In this study, we determined high-quality whole genome sequences of three TPE strains and compared them to known genetic information for T. pallidum ssp. pallidum strains. The genome structure was identical in all three TPE strains and also between TPA and TPE strains. The TPE genome length ranged between 1,139,330 bp and 1,139,744 bp. The overall sequence identity between TPA and TPE genomes was 99.8%, indicating that the two pathogens are extremely closely related. A set of 34 TPE genes (3.5%) encoded proteins containing six or more amino acid replacements or other major sequence changes. These genes more often belonged to the group of genes with predicted virulence and unknown functions, suggesting their involvement in the differences in infection between yaws and syphilis.
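To give a sense of scale for the 99.8% identity figure, it can be converted into an approximate count of differing positions. The following is a minimal sketch in Python that uses only the genome lengths and identity stated above. It assumes the identity applies uniformly across the full genome length, which ignores how indels and alignment gaps were counted in the study, and the function name max_differing_positions is illustrative.

```python
# Genome lengths (bp) quoted in the abstract for the TPE strains.
TPE_MIN_BP = 1_139_330
TPE_MAX_BP = 1_139_744
IDENTITY = 0.998  # overall TPA/TPE sequence identity (99.8%)


def max_differing_positions(genome_bp, identity=IDENTITY):
    """Approximate number of positions that differ at the given identity.

    Assumption: identity is treated as a per-position fraction over the
    whole genome length.
    """
    return round(genome_bp * (1 - identity))


low = max_differing_positions(TPE_MIN_BP)
high = max_differing_positions(TPE_MAX_BP)
length_spread_bp = TPE_MAX_BP - TPE_MIN_BP

print(low, high, length_spread_bp)
```

Under this assumption, a difference of 0.2% corresponds to roughly 2,300 positions in a genome of about 1.14 Mbp, and the three TPE genomes differ in length by 414 bp.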