MEXPRESS: visualizing expression, DNA methylation and clinical TCGA data
Background: In recent years, increasing amounts of genomic and clinical cancer data have become publicly available through large-scale collaborative projects such as The Cancer Genome Atlas (TCGA). However, as long as these datasets are difficult to access and interpret, they are essentially useless for a major part of the research community, and their scientific potential will not be fully realized. To address these issues, we developed MEXPRESS, a straightforward and easy-to-use web tool for the integration and visualization of the expression, DNA methylation and clinical TCGA data on a single-gene level (http://mexpress.be).
Results: In comparison to existing tools, MEXPRESS allows researchers to quickly visualize and interpret the different TCGA datasets and their relationships for a single gene, as demonstrated for GSTP1 in prostate adenocarcinoma. We also used MEXPRESS to reveal the differences in the DNA methylation status of the PAM50 marker gene MLPH between the breast cancer subtypes and how these differences were linked to the expression of MLPH.
Conclusions: We have created a user-friendly tool for the visualization and interpretation of TCGA data, offering clinical researchers a simple way to evaluate the TCGA data for their genes or candidate biomarkers of interest.
My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing
Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for the analysis of forensic DNA profiles. MPS offers several advantages over CE, such as virtually unlimited multiplexing of loci, combining both short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without the constraints of size separation, greater discrimination power, deep mixture resolution and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for the analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with regions of interest (ROIs) determined automatically by a generic maximal flanking algorithm, which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to make allele calls automatically, starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, whose sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated by multilocus STR polymerase chain reaction (PCR) on both single-contributor samples and multiple-person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the results are excellent for most loci: a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS holds great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as mitochondrial DNA research.
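The allele-calling idea described above (a reference database of flanks and known alleles, with regions of interest extracted from raw reads) can be sketched in a few lines of Python. This is a minimal sketch, not MyFLq itself: the dictionary stands in for the MySQL database, the flanks are given rather than computed by the maximal flanking algorithm, and the abundance threshold that separates candidate new alleles from likely sequencing errors is a hypothetical heuristic.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the MySQL reference allele database:
# per locus, a left flank, a right flank and known ROI sequences mapped to allele names.
REFERENCE = {
    "LOCUS_A": {
        "left": "GATC",
        "right": "TTGA",
        "alleles": {"AGATAGATAGAT": "3", "AGATAGATAGATAGAT": "4"},
    },
}


def extract_roi(read, left, right):
    """Return the sequence between the two flanks, or None if either flank is missing."""
    start = read.find(left)
    if start == -1:
        return None
    start += len(left)
    end = read.find(right, start)
    if end == -1:
        return None
    return read[start:end]


def call_alleles(reads, locus, min_fraction=0.2):
    """Count ROI sequences for one locus and label them.

    Unknown ROIs reaching min_fraction of the most abundant ROI are reported as
    candidate new alleles, the rest as likely sequencing errors (a simple abundance
    heuristic chosen for illustration only).
    """
    info = REFERENCE[locus]
    counts = Counter()
    for read in reads:
        roi = extract_roi(read, info["left"], info["right"])
        if roi is not None:
            counts[roi] += 1
    if not counts:
        return []
    top = max(counts.values())
    calls = []
    for roi, n in counts.most_common():
        if roi in info["alleles"]:
            label = info["alleles"][roi]
        elif n >= min_fraction * top:
            label = "candidate_new_allele"
        else:
            label = "likely_error"
        calls.append((label, roi, n))
    return calls


# Hypothetical reads: two known alleles, one rare unknown ROI and one unmappable read.
reads = (["CCGATCAGATAGATAGATTTGACC"] * 10 + ["GATCAGATAGATAGATAGATTTGA"] * 8
         + ["GATCAGATAGATAGTTTGA", "NNNN"])
for call in call_alleles(reads, "LOCUS_A"):
    print(call)
```

Working on the extracted ROI sequence rather than on fragment length is what lets the same code handle STR and SNP loci alike.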
DNA methylation profiling of primary neuroblastoma tumors using methyl-CpG-binding domain sequencing
Comprehensive genome-wide DNA methylation studies in neuroblastoma (NB), a childhood tumor that originates from precursor cells of the sympathetic nervous system, are scarce. Recently, we profiled the DNA methylome of 102 well-annotated primary NB tumors by methyl-CpG-binding domain (MBD) sequencing to identify prognostic biomarker candidates. In this data descriptor, we give details on how this data set was generated and which bioinformatics analyses were applied during data processing. Through a series of technical validations, we illustrate that the data are of high quality and that the sequenced fragments represent methylated genomic regions. Furthermore, genes previously described to be methylated in NB are confirmed. As such, these MBD sequencing data are a valuable resource to further study the association of NB risk factors with the NB methylome, and offer the opportunity to integrate methylome data with other -omic data sets on the same tumor samples, such as gene copy number and gene expression, which are also publicly available.
An update on sORFs.org: a repository of small ORFs identified by ribosome profiling
sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update describes the major improvements implemented since its initial release. sORFs.org now supports three additional species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a vast increase compared to the three processed in the initial release. To accommodate these datasets, a novel pipeline was constructed that also enables sORF detection in RIBO-seq datasets comprising only elongating RIBO-seq data, whereas previously, matching initiating RIBO-seq data were necessary to delineate the sORFs. Furthermore, a novel noise filtering algorithm was designed that distinguishes sORFs with true ribosomal activity from simulated noise, thereby reducing the false positive identification rate. The inclusion of other species also led to the development of an internal BLAST pipeline that assesses sequence similarity between sORFs in the repository. Building on the proof-of-concept model in the initial release of sORFs.org, a full PRIDE-ReSpin pipeline has now been released, which reprocesses publicly available MS-based proteomics PRIDE datasets and reports on true translation events. In addition to reporting the identified peptides, sORFs.org allows visual inspection of the annotated spectra within the Lorikeet MS/MS viewer, enabling detailed manual inspection and interpretation.
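The noise filter is described only as separating sORFs with true ribosomal activity from simulated noise. One generic way to set up such a comparison, sketched here under our own assumptions, is to compute a statistic on the observed reads and compare it with the same statistic on reads placed uniformly at random. The statistic chosen below (the fraction of P-site positions in the ORF's codon frame) is an illustrative assumption, not necessarily what sORFs.org uses, and the read positions are hypothetical.

```python
import random


def in_frame_fraction(psites, orf_start):
    """Fraction of P-site positions on the first nucleotide of a codon of the ORF."""
    if not psites:
        return 0.0
    return sum((p - orf_start) % 3 == 0 for p in psites) / len(psites)


def noise_p_value(psites, orf_start, orf_end, n_sim=1000, seed=0):
    """Empirical p-value of the observed in-frame fraction against simulated noise.

    Noise is simulated by placing the same number of P-sites uniformly at random
    within [orf_start, orf_end); the p-value is the (add-one corrected) fraction of
    simulations that look at least as in-frame as the observed reads.
    """
    rng = random.Random(seed)
    observed = in_frame_fraction(psites, orf_start)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(orf_start, orf_end) for _ in psites]
        if in_frame_fraction(simulated, orf_start) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical 90-nt sORF: reads that respect the codon frame vs. reads on every nucleotide.
periodic = list(range(0, 90, 3))
flat = list(range(0, 90))
print(noise_p_value(periodic, 0, 90), noise_p_value(flat, 0, 90))
```

A low p-value means the observed reads are unlikely under the noise model; a cut-off on it would then control which sORFs are retained.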
Accurate long read mapping using enhanced suffix arrays
With the rise of high-throughput sequencing, new programs have been developed for aligning huge amounts of short-read data to reference genomes. Recent developments in sequencing technology allow longer reads, but short-read mappers are not suited to reads of several hundred base pairs. We propose an algorithm for mapping longer reads that is based on chaining maximal exact matches and uses heuristics and the Needleman-Wunsch algorithm to bridge the gaps. To compute maximal exact matches, we use a specialized index structure called an enhanced suffix array. The proposed algorithm is very accurate and can handle long reads with mutations and long insertions and deletions.
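The mapping strategy (find maximal exact matches, chain them, bridge the gaps with Needleman-Wunsch) can be illustrated on toy sequences. This sketch makes two substitutions that should be stated plainly: maximal exact matches are found by a naive quadratic search instead of an enhanced suffix array, and the heuristics mentioned above are omitted, so only the gaps between consecutive anchors are aligned. The sequences and scoring values are hypothetical.

```python
def maximal_exact_matches(read, ref, min_len):
    """Naive MEM search returning (read_pos, ref_pos, length) triples.

    A stand-in for the enhanced suffix array: it tries every start pair, skips pairs
    that could be extended to the left, and extends to the right as far as possible.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue  # not left-maximal
            k = 0
            while i + k < len(read) and j + k < len(ref) and read[i + k] == ref[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


def chain(mems):
    """Highest-scoring colinear, non-overlapping chain of MEMs (score = matched bases)."""
    if not mems:
        return []
    mems = sorted(mems)
    best = [k for _, _, k in mems]
    prev = [-1] * len(mems)
    for a, (ia, ja, ka) in enumerate(mems):
        for b in range(a):
            ib, jb, kb = mems[b]
            if ib + kb <= ia and jb + kb <= ja and best[b] + ka > best[a]:
                best[a] = best[b] + ka
                prev[a] = b
    a = max(range(len(mems)), key=best.__getitem__)
    result = []
    while a != -1:
        result.append(mems[a])
        a = prev[a]
    return result[::-1]


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b with a linear gap penalty (two-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def map_read(read, ref, min_len=8):
    """Chain MEMs, then bridge the gaps between consecutive anchors with Needleman-Wunsch.

    Returns (reference start of the first anchor, alignment score), or None if no MEM
    is found. Unanchored read ends are ignored in this sketch.
    """
    anchors = chain(maximal_exact_matches(read, ref, min_len))
    if not anchors:
        return None
    score = sum(k for _, _, k in anchors)
    for (i1, j1, k1), (i2, j2, _) in zip(anchors, anchors[1:]):
        score += needleman_wunsch(read[i1 + k1:i2], ref[j1 + k1:j2])
    return anchors[0][1], score


# Hypothetical example: the read lacks the 8-nt "TTTTTTTT" segment of the reference.
ref = "GATTACAGCC" + "TTTTTTTT" + "AGGCTCCATG"
read = "GATTACAGCC" + "AGGCTCCATG"
print(map_read(read, ref))
```

Because the exact-match anchors fix most of the alignment, the quadratic Needleman-Wunsch step only runs on the short stretches between them, which is what keeps this kind of approach tractable for reads of several hundred base pairs.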
Systemic suppression of the shoot metabolism upon rice root nematode infection
Hirschmanniella oryzae is the most common plant-parasitic nematode in flooded rice cultivation systems. These migratory animals penetrate the plant roots and feed on the root cells, causing large cavities, extensive root necrosis and rotting. The objective of this study was to investigate the systemic response of the rice plant to root infection by this nematode. RNA sequencing was applied to the above-ground parts of the rice plants at 3 and 7 days post inoculation. The data revealed significant modifications in the primary metabolism of the plant shoot, with a general suppression of, for instance, chlorophyll biosynthesis, the brassinosteroid pathway and amino acid production. In the secondary metabolism, we detected a repression of the isoprenoid and shikimate pathways. These molecular changes can have dramatic consequences for the growth and yield of the rice plants, and could potentially change their susceptibility to above-ground pathogens and pests.
Mass spectrometry and ribosome profiling, a perfect combination towards a more comprehensive identification strategy of true in vivo protein forms
An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS). Recently, ribosome profiling (RIBO-seq), a strategy based on deep sequencing of ribosome-protected mRNA fragments that indirectly monitors protein synthesis, has been described. In contrast to the protein databases routinely employed in proteomics searches, RIBO-seq derived data give a more representative picture of the expression state and account for sequence variation and alternative translation initiation.
To verify the potential of ribosome profiling to provide a true snapshot of the translational landscape, we devised a proteogenomic approach that generates a database of translation products based on ribosome profiling experiments. The raw, untreated RIBO-seq data are analyzed for both splice isoforms and single nucleotide polymorphisms, thereby taking transcriptional variation into account. In addition, RIBO-seq data for translation start site discovery (treated with harringtonine, lactimidomycin or puromycin) are used to obtain a genome-wide blueprint of all possible translation initiation sites, thereby taking translational variation into account. By adding protein database annotation to the genomic RIBO-seq derived data and performing in silico translation, a protein database is constructed that reflects the full complexity of the proteome.
Using a first version of our proteogenomic approach on an undifferentiated mouse embryonic stem cell line (E14), we demonstrated an increase of 2.5% in the overall protein identification rate compared with searching only UniProtKB-SwissProt. Furthermore, searching N-terminal COFRADIC data led to the detection of 16 alternative start sites giving rise to N-terminally extended protein variants, as well as the identification of four translated uORFs.
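The database construction step, in silico translation of transcript sequences from RIBO-seq-supported start sites, can be sketched as follows. This is a minimal illustration assuming that transcripts (with any splice or SNP variation already applied) and start positions are available as plain Python dictionaries; the identifiers, the minimum product length and the FASTA header format are hypothetical, and only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, generated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from(seq, start):
    """Translate seq from a start position up to the first in-frame stop codon."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def build_database(transcripts, start_sites, min_length=6):
    """Return FASTA text with one entry per (transcript, start site) translation product.

    transcripts: transcript ID -> sequence (e.g. splice isoforms with SNPs applied).
    start_sites: transcript ID -> start positions (e.g. from initiating RIBO-seq data).
    """
    entries = []
    for tx_id, seq in transcripts.items():
        for start in sorted(start_sites.get(tx_id, [])):
            protein = translate_from(seq, start)
            if len(protein) >= min_length:
                entries.append(f">{tx_id}_start{start}\n{protein}")
    return "\n".join(entries)


# Hypothetical transcript: start 2 yields a full product, start 20 one that is too short.
transcripts = {"tx1": "GGATGGCTGCAAAACGTTATTGGTAAGG"}
print(build_database(transcripts, {"tx1": [2, 20]}))
```

A start site at a near-cognate codon such as CTG is translated literally here; a real pipeline may want to report methionine at the initiating position instead.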
Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs
Background: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, short translation products (e.g. those encoded by small open reading frames, sORFs) are very difficult to detect, as their short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in organisms such as Arabidopsis thaliana, Saccharomyces cerevisiae and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome with unprecedented scrutiny, for example in search of this type of small ORF.
Results: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs, which were subsequently analyzed for their coding potential based on evolutionary conservation at the AA level and ranked using a Support Vector Machine (SVM) learning model. The ranked sORFs were finally overlapped with ribosome profiling data, hinting at sORF translation. All candidates were visually inspected using an in-house genome browser. In this way, dozens of highly conserved sORFs targeted by ribosomes were identified in the mouse genome, putatively encoding micropeptides.
Conclusion: Our combined genome-wide approach leads to the prediction of a comprehensive but manageable set of putatively coding sORFs, a very important first step towards the identification of a new class of bioactive peptides, called micropeptides.
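The first step of the approach, a genome-wide scan that enumerates candidate sORFs, can be sketched as a simple codon scan. The sketch assumes ATG-only starts and a hypothetical minimum length of 10 AAs (the upper bound of 99 AAs follows the < 100 AA definition above); conservation scoring, the SVM ranking and the overlap with ribosome profiling data are not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, min_aa=10, max_aa=99):
    """Scan the forward strand for ATG-initiated ORFs whose protein length lies in [min_aa, max_aa].

    Returns (start, end, frame) tuples with end exclusive and including the stop codon.
    Only the first ATG after each stop is used in every frame, i.e. the longest ORF.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_aa = (i - start) // 3  # protein length, stop codon excluded
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, frame))
                start = None
    return hits


def reverse_complement(seq):
    """Reverse complement, so the reverse strand can be scanned with find_sorfs as well."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical sequence holding one 12-codon sORF (ATG + 11 x GCT) followed by TAA.
seq = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
print(find_sorfs(seq))
print(find_sorfs(reverse_complement(seq)))
```

Keeping only the first ATG per frame reports each candidate once at its maximal length; nested in-frame ATGs would need a separate pass if alternative starts are of interest.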
The human homologue of Caenorhabditis elegans CED-6 specifically promotes phagocytosis of apoptotic cells
A key feature of the process of programmed cell death (apoptosis) is the efficiency with which the dying cells are recognized and engulfed by phagocytes [1]. Apoptotic cells are rapidly cleared, either by neighbouring cells acting as semi-professional phagocytes or by professional phagocytes of the macrophage lineage, so that an inflammatory response is avoided [2]. The Caenorhabditis elegans gene ced-6 is required for efficient engulfment of apoptotic cells [3] and is one of a group of genes that define two partially redundant parallel pathways for the engulfment process [4,5]. These pathways may be conserved across evolution, as two other engulfment genes have human homologues. A CED-5 homologue is part of a human CrkII–DOCK180–Rac signaling pathway proposed to mediate cytoskeletal reorganization [6–8], and a CED-7 homologue is similar to the ABC transporters [9,10]. Here, we report the cloning and characterization of human CED-6, a human homologue of C. elegans CED-6. The 34 kDa hCED-6 protein is expressed in most tissues, in some human cancer cells and in primary human macrophages. We developed an assay that quantifies the phagocytic activity of mammalian macrophages: the number of internalized apoptotic cells is measured by the uptake of lacZ-positive apoptotic cells by adherent transgenic macrophages. The results of this assay demonstrate that overexpression of hCED-6 promotes phagocytosis only of apoptotic cells, and suggest that hCED-6 is the mammalian orthologue of C. elegans CED-6 and part of a highly conserved pathway that specifically mediates the phagocytosis of apoptotic cells.
- …