Search CORE

511,867 research outputs found

PseudoFuN: Deriving functional potentials of pseudogenes from integrative relationships with genes and microRNAs across 32 cancers

Author: Campbell Moray J.
Dan Li Shuyu
Franz Eric
Huang Kun
Huang Zhi
Johnson Travis S.
Li Sihong
Zhang Yan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/05/2019
Field of study

BACKGROUND: Long thought "relics" of evolution, not until recently have pseudogenes been of medical interest regarding regulation in cancer. Often, these regulatory roles are a direct by-product of their close sequence homology to protein-coding genes. Novel pseudogene-gene (PGG) functional associations can be identified through the integration of biomedical data, such as sequence homology, functional pathways, gene expression, pseudogene expression, and microRNA expression. However, not all of the information has been integrated, and almost all previous pseudogene studies relied on 1:1 pseudogene-parent gene relationships without leveraging other homologous genes/pseudogenes. RESULTS: We produce PGG families that expand beyond the current 1:1 paradigm. First, we construct expansive PGG databases by (i) CUDAlign graphics processing unit (GPU) accelerated local alignment of all pseudogenes to gene families (totaling 1.6 billion individual local alignments and >40,000 GPU hours) and (ii) BLAST-based assignment of pseudogenes to gene families. Second, we create an open-source web application (PseudoFuN [Pseudogene Functional Networks]) to search for integrative functional relationships of sequence homology, microRNA expression, gene expression, pseudogene expression, and gene ontology. We produce four "flavors" of CUDAlign-based databases (>462,000,000 PGG pairwise alignments and 133,770 PGG families) that can be queried and downloaded using PseudoFuN. These databases are consistent with previous 1:1 PGG annotation and also are much more powerful including millions of de novo PGG associations. For example, we find multiple known (e.g., miR-20a-PTEN-PTENP1) and novel (e.g., miR-375-SOX15-PPP4R1L) microRNA-gene-pseudogene associations in prostate cancer. PseudoFuN provides a "one stop shop" for identifying and visualizing thousands of potential regulatory relationships related to pseudogenes in The Cancer Genome Atlas cancers. CONCLUSIONS: Thousands of new PGG associations can be explored in the context of microRNA-gene-pseudogene co-expression and differential expression with a simple-to-use online tool by bioinformaticians and oncologists alike

ScholarWorks IU Indianapolis

IUPUIScholarWorks

Pre-processing for noise detection in gene expression classification data

Author: CARVALHO André Carlos Ponce de Leon Ferreira de
LIBRALON Giampaolo Luiz
LORENA Ana Carolina
Publication venue: Sociedade Brasileira de Computação
Publication date: 01/01/2009
Field of study

Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.São Paulo State Research Foundation (FAPESP)CNP

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Repositório da Produção USP (Univ. de São Paulo)

Scalable Inference of Gene Regulatory Networks with the Spark Distributed Computing Platform Cristo

Author: Aldana-Montes Jose Francisco
Barba-González Cristóbal
Benítez-Hidalgo Antonio
García-Nieto José
Publication venue
Publication date: 05/11/2018
Field of study

Inference of Gene Regulatory Networks (GRNs) remains an important open challenge in computational biology. The goal of bio-model inference is to, based on time-series of gene expression data, obtain the sparse topological structure and the parameters that quantitatively understand and reproduce the dynamics of biological system. Nevertheless, the inference of a GRN is a complex optimization problem that involve processing S-System models, which include large amount of gene expression data from hundreds (even thousands) of genes in multiple time-series (essays). This complexity, along with the amount of data managed, make the inference of GRNs to be a computationally expensive task. Therefore, the genera- tion of parallel algorithmic proposals that operate efficiently on distributed processing platforms is a must in current reconstruction of GRNs. In this paper, a parallel multi-objective approach is proposed for the optimal inference of GRNs, since min- imizing the Mean Squared Error using S-System model and Topology Regularization value. A flexible and robust multi-objective cellular evolutionary algorithm is adapted to deploy parallel tasks, in form of Spark jobs. The proposed approach has been developed using the framework jMetal, so in order to perform parallel computation, we use Spark on a cluster of distributed nodes to evaluate candidate solutions modeling the interactions of genes in biological networks.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Repositorio Institucional Universidad de Málaga

Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor

Author: Bødker Julie Støve
Bøgsted Martin
Dybkær Karen
Falgreen Steffen
Johnsen Hans Erik
Kjeldsen Malene Krag
Rodrigo-Domingo Maria
Waagepetersen Rasmus
Publication venue
Publication date: 18/02/2013
Field of study

The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge due to the vast amounts of data and the large variety of pre-processing and filtering steps employed before the actual analysis is carried out. To guarantee a firm basis for methodological development where results with new methods are compared with previous results it is crucial to ensure that all analyses are completely reproducible for other researchers. We here give a detailed workflow on how to perform reproducible analysis of the GeneChip Human Exon 1.0 ST Array at probe and probeset level solely in R/Bioconductor, choosing packages based on their simplicity of use. To exemplify the use of the proposed workflow we analyse differential splicing and differential gene expression in a publicly available dataset using various statistical methods. We believe this study will provide other researchers with an easy way of accessing gene expression data at different annotation levels and with the sufficient details needed for developing their own tools for reproducible analysis of the GeneChip Human Exon 1.0 ST Array

arXiv.org e-Print Archive

PubMed Central

VBN (Videnbasen) Aalborg Universitets forskningsportal

DNA methylation profiling of primary neuroblastoma tumors using methyl-CpG-binding domain sequencing

Author: Decock Anneleen
Ongenaert Maté
Speleman Franki
Van Criekinge Wim
Vandesompele Jo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Comprehensive genome-wide DNA methylation studies in neuroblastoma (NB), a childhood tumor that originates from precursor cells of the sympathetic nervous system, are scarce. Recently, we profiled the DNA methylome of 102 well-annotated primary NB tumors by methyl-CpG-binding domain (MBD) sequencing, in order to identify prognostic biomarker candidates. In this data descriptor, we give details on how this data set was generated and which bioinformatics analyses were applied during data processing. Through a series of technical validations, we illustrate that the data are of high quality and that the sequenced fragments represent methylated genomic regions. Furthermore, genes previously described to be methylated in NB are confirmed. As such, these MBD sequencing data are a valuable resource to further study the association of NB risk factors with the NB methylome, and offer the opportunity to integrate methylome data with other -omic data sets on the same tumor samples such as gene copy number and gene expression, also publically available

Crossref

Ghent University Academic Bibliography

PubMed Central

A snoRNA modulates mRNA 3' end processing and regulates the expression of a subset of mRNAs.

Author: Ding Junjun
Guo Yibin
Huang Chunliu
Huang Shanshan
Huang Weijun
Huang Xi
Jia Jie
Ming Siqi
Shi Junjie
Shi Yongsheng
Wu Xingui
Xiang Andy Peng
Yao Chengguo
Zhang Rui
Zhao Wei
Publication venue: eScholarship, University of California
Publication date: 26/07/2017
Field of study

mRNA 3' end processing is an essential step in gene expression. It is well established that canonical eukaryotic pre-mRNA 3' processing is carried out within a macromolecular machinery consisting of dozens of trans-acting proteins. However, it is unknown whether RNAs play any role in this process. Unexpectedly, we found that a subset of small nucleolar RNAs (snoRNAs) are associated with the mammalian mRNA 3' processing complex. These snoRNAs primarily interact with Fip1, a component of cleavage and polyadenylation specificity factor (CPSF). We have functionally characterized one of these snoRNAs and our results demonstrated that the U/A-rich SNORD50A inhibits mRNA 3' processing by blocking the Fip1-poly(A) site (PAS) interaction. Consistently, SNORD50A depletion altered the Fip1-RNA interaction landscape and changed the alternative polyadenylation (APA) profiles and/or transcript levels of a subset of genes. Taken together, our data revealed a novel function for snoRNAs and provided the first evidence that non-coding RNAs may play an important role in regulating mRNA 3' processing

Crossref

eScholarship - University of California

Netter: re-ranking gene network inference predictions using structural network properties

Author: Demeester Piet
Dhaene Tom
Ruyssinck Joeri
Saeys Yvan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Background: Many algorithms have been developed to infer the topology of gene regulatory networks from gene expression data. These methods typically produce a ranking of links between genes with associated confidence scores, after which a certain threshold is chosen to produce the inferred topology. However, the structural properties of the predicted network do not resemble those typical for a gene regulatory network, as most algorithms only take into account connections found in the data and do not include known graph properties in their inference process. This lowers the prediction accuracy of these methods, limiting their usability in practice. Results: We propose a post-processing algorithm which is applicable to any confidence ranking of regulatory interactions obtained from a network inference method which can use, inter alia, graphlets and several graph-invariant properties to re-rank the links into a more accurate prediction. To demonstrate the potential of our approach, we re-rank predictions of six different state-of-the-art algorithms using three simple network properties as optimization criteria and show that Netter can improve the predictions made on both artificially generated data as well as the DREAM4 and DREAM5 benchmarks. Additionally, the DREAM5 E. coli. community prediction inferred from real expression data is further improved. Furthermore, Netter compares favorably to other post-processing algorithms and is not restricted to correlation-like predictions. Lastly, we demonstrate that the performance increase is robust for a wide range of parameter settings. Netter is available at http://bioinformatics. intec. ugent. be. Conclusions: Network inference from high-throughput data is a long-standing challenge. In this work, we present Netter, which can further refine network predictions based on a set of user-defined graph properties. Netter is a flexible system which can be applied in unison with any method producing a ranking from omics data. It can be tailored to specific prior knowledge by expert users but can also be applied in general uses cases. Concluding, we believe that Netter is an interesting second step in the network inference process to further increase the quality of prediction

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Recommended from our members

Common CHD8 Genomic Targets Contrast With Model-Specific Transcriptional Impacts of CHD8 Haploinsufficiency.

Author: Catta-Preta Rinaldo
Lim Kenneth
Nord Alex S
Wade A Ayanna
Publication venue: eScholarship, University of California
Publication date: 01/01/2018
Field of study

The packaging of DNA into chromatin determines the transcriptional potential of cells and is central to eukaryotic gene regulation. Case sequencing studies have revealed mutations to proteins that regulate chromatin state, known as chromatin remodeling factors, with causal roles in neurodevelopmental disorders. Chromodomain helicase DNA binding protein 8 (CHD8) encodes a chromatin remodeling factor with among the highest de novo loss-of-function mutation rates in patients with autism spectrum disorder (ASD). However, mechanisms associated with CHD8 pathology have yet to be elucidated. We analyzed published transcriptomic data across CHD8 in vitro and in vivo knockdown and knockout models and CHD8 binding across published ChIP-seq datasets to identify convergent mechanisms of gene regulation by CHD8. Differentially expressed genes (DEGs) across models varied, but overlap was observed between downregulated genes involved in neuronal development and function, cell cycle, chromatin dynamics, and RNA processing, and between upregulated genes involved in metabolism and immune response. Considering the variability in transcriptional changes and the cells and tissues represented across ChIP-seq analysis, we found a surprisingly consistent set of high-affinity CHD8 genomic interactions. CHD8 was enriched near promoters of genes involved in basic cell functions and gene regulation. Overlap between high-affinity CHD8 targets and DEGs shows that reduced dosage of CHD8 directly relates to decreased expression of cell cycle, chromatin organization, and RNA processing genes, but only in a subset of studies. This meta-analysis verifies CHD8 as a master regulator of gene expression and reveals a consistent set of high-affinity CHD8 targets across human, mouse, and rat in vivo and in vitro studies. These conserved regulatory targets include many genes that are also implicated in ASD. Our findings suggest a model where perturbation to dosage-sensitive CHD8 genomic interactions with a highly-conserved set of regulatory targets leads to model-specific downstream transcriptional impacts

eScholarship - University of California

The Francis Crick Institute

Virtual Environment for Next Generation Sequencing Analysis

Author: Abate Francesco
Acquaviva Andrea
Mossucca L.
Provenzano R.
Terzo Olivier
Publication venue: IARIA
Publication date: 01/01/2012
Field of study

Next Generation Sequencing technology, on the one hand, allows a more accurate analysis, and, on the other hand, increases the amount of data to process. A new protocol for sequencing the messenger RNA in a cell, known as RNA- Seq, generates millions of short sequence fragments in a single run. These fragments, or reads, can be used to measure levels of gene expression and to identify novel splice variants of genes. The proposed solution is a distributed architecture consisting of a Grid Environment and a Virtual Grid Environment, in order to reduce processing time by making the system scalable and flexibl

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino