Orchestrated transcription of biological processes in the marine picoeukaryote Ostreococcus exposed to light/dark cycles
Background: Picoeukaryotes represent an important yet poorly characterized component of marine phytoplankton. The recent availability of genomes for two species of Ostreococcus and Micromonas has led to the emergence of picophytoplankton comparative genomics. Sequencing has revealed many unexpected features of genome structure and has led to several hypotheses about Ostreococcus biology and physiology. Despite the accumulation of genomic data, little is known about gene expression in eukaryotic picophytoplankton.
Results: We have conducted a genome-wide analysis of gene expression in Ostreococcus tauri cells exposed to light/dark cycles (L/D). A Bayesian Fourier Clustering method was implemented to cluster rhythmic genes according to their expression waveform. In a single L/D condition nearly all expressed genes displayed rhythmic patterns of expression. Clusters of genes were associated with the main biological processes such as transcription in the nucleus and the organelles, photosynthesis, DNA replication and mitosis.
Conclusions: Light/dark time-dependent transcription of the genes involved in the main steps leading to protein synthesis (basic transcription machinery, ribosome biogenesis, translation and amino acid synthesis) was observed to an extent unprecedented in eukaryotes, suggesting a major contribution of transcriptional regulation in Ostreococcus. We propose that the diurnal co-regulation of genes involved in photoprotection, defence against oxidative stress and DNA repair might be an efficient mechanism that protects cells against photodamage, thereby contributing to the ability of O. tauri to grow under a wide range of light intensities.
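The Results above describe clustering rhythmic genes by their expression waveform. The sketch below is not the Bayesian Fourier Clustering method the authors implemented; it is a much simpler, swapped-in illustration of the underlying idea, assuming evenly spaced samples that cover a whole number of 24 h L/D cycles. Each profile is summarized by the amplitude and peak time of its daily Fourier component, and genes are then binned by peak time. The helper names `first_harmonic` and `group_by_peak`, the 4 h sampling grid and the amplitude threshold are all hypothetical.

```python
import numpy as np

def first_harmonic(profile, samples_per_cycle):
    """Amplitude and peak phase (fraction of a cycle) of the 24 h Fourier component.

    Hypothetical helper; assumes evenly spaced samples spanning whole L/D cycles.
    """
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()                          # drop the constant (DC) term
    n_cycles = len(x) // samples_per_cycle    # DFT bin for one cycle per 24 h
    c = np.fft.rfft(x)[n_cycles]
    amplitude = 2 * np.abs(c) / len(x)
    # For x[t] = A*cos(w*t - phi) the DFT coefficient has angle -phi,
    # and the waveform peaks phi / (2*pi) of the way through each cycle.
    peak_phase = (-np.angle(c) / (2 * np.pi)) % 1.0
    return amplitude, peak_phase

def group_by_peak(profiles, samples_per_cycle, n_bins=6, min_amplitude=0.1):
    """Crude waveform grouping by peak time (hypothetical; not Bayesian)."""
    groups = {}
    for name, prof in profiles.items():
        amp, phase = first_harmonic(prof, samples_per_cycle)
        if amp < min_amplitude:
            continue  # treated as non-rhythmic in this sketch
        # Round to the nearest bin centre so phases such as 1/3 are not
        # pushed into the wrong bin by floating-point error.
        b = int(np.floor(phase * n_bins + 0.5)) % n_bins
        groups.setdefault(b, []).append(name)
    return groups

hours = np.arange(0, 48, 4)  # hypothetical: 12 samples every 4 h over two cycles
profiles = {
    "geneA": np.cos(2 * np.pi * (hours - 8) / 24),   # peaks at 8 h
    "geneB": np.cos(2 * np.pi * (hours - 16) / 24),  # peaks at 16 h
    "geneC": np.ones(len(hours)),                    # flat, not rhythmic
}
groups = group_by_peak(profiles, samples_per_cycle=6)
print(groups)  # {2: ['geneA'], 4: ['geneB']}
```

With six bins, bin `b` collects genes peaking near `4 * b` hours. A real analysis would need to handle noise, non-sinusoidal waveforms and uncertainty, which this sketch ignores.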
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Visualizing dimensionality reduction of systems biology data
One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after such a method is applied, the data are projected and visualized in the new coordinate system using scatter plots or profile plots. These methods give good results if the data have certain properties that become visible in the new coordinate system but were hard to detect in the original one. Often, however, applying only one method does not suffice to capture all important signals, so several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data in the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression of cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21.
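The abstract above mentions projecting data onto a new coordinate system with principal component analysis before plotting. As a minimal sketch of that step only (not SpRay's implementation), the code below centers a matrix and projects its rows onto the top two principal axes obtained from NumPy's SVD. The helper name `pca_project` and the random 100 × 12 "expression" matrix are hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X (observations x features) onto the top principal axes.

    Hypothetical helper for illustration; not part of SpRay.
    """
    # Center each feature (column) so the components describe variance
    # around the mean rather than the mean itself.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; rows of Vt are the principal axes,
    # and S is returned in descending order.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Coordinates of each observation in the new coordinate system.
    scores = Xc @ Vt[:n_components].T
    # Fraction of total variance captured by each retained component.
    explained = (S ** 2)[:n_components] / np.sum(S ** 2)
    return scores, explained

rng = np.random.default_rng(0)
# Hypothetical expression matrix: 100 genes (rows) x 12 samples (columns).
X = rng.normal(size=(100, 12))
scores, explained = pca_project(X)
print(scores.shape)  # (100, 2)
```

The two columns of `scores` are what a scatter plot of the projected data would show; `explained` indicates how much of the signal those two axes retain, which is one reason a single projection may miss important structure.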
An Overview of the Use of Neural Networks for Data Mining Tasks
In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, and telecommunication. From a data mining perspective, this paper highlights the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.
Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs
The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments, where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect of an experimental treatment or mutation on RNA expression in the system under study. Until recently, the genome-wide annotation of 3' untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for human, chicken and A. thaliana samples by the novel single-molecule, strand-specific Direct RNA Sequencing technology from Helicos Biosciences, which locates 3' polyadenylation sites to within ±2 nt, were combined with archival EST and RNA-seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3' UTR re-annotation (including extension of one 3' UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression; and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and to those seeking to interpret existing publicly available annotations in the context of their own experimental data.
Comment: 44 pages, 9 figures
Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.
Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 kb resolution, which reveal a structured nucleolus and an absence of chromosome territories and confirm the previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites.
Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images
Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives, with insights into the tumor-immune microenvironment.