Genomic applications of statistical signal processing
Biological phenomena in cells can be explained in terms of the interactions among
biological macromolecules, e.g., DNAs, RNAs and proteins. These interactions can
be modeled by genetic regulatory networks (GRNs). This dissertation proposes to
reverse engineer GRNs from heterogeneous biological data sets, including
time-series and time-independent gene expression measurements, Chromatin ImmunoPrecipitation
(ChIP) data, gene sequences and motifs, and other possible sources of knowledge. The
objective of this research is to propose novel computational methods that keep pace
with the fast-evolving biological databases.
Signal processing techniques are exploited to develop computationally efficient,
accurate and robust algorithms, which deal individually or collectively with the various
data sets. Methods of power spectral density estimation are discussed to identify
genes participating in various biological processes. Information-theoretic methods are
applied for non-parametric inference, and Bayesian methods are adopted to incorporate
prior knowledge from several sources. This work aims to construct an inference system
that takes into account different sources of information, such that the absence of some
components does not interfere with the rest of the system.
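To make the power-spectral-density step concrete, here is a minimal sketch of how a periodogram can flag periodically expressed genes. The function name and the scoring rule (fraction of non-DC power at the dominant frequency) are illustrative assumptions, not the dissertation's actual estimators.

```python
import numpy as np

def periodogram_score(expr):
    """Fraction of (non-DC) power at the dominant frequency of a gene's
    expression time series. Hypothetical scoring rule: high values
    suggest periodic (e.g. cell-cycle-regulated) expression."""
    x = np.asarray(expr, dtype=float)
    x = x - x.mean()                          # drop the DC component
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    return float(psd[1:].max() / psd[1:].sum())

t = np.arange(12)
periodic = np.sin(2 * np.pi * t / 12)         # exactly one cycle over 12 samples
rng = np.random.default_rng(0)
noise = rng.normal(size=12)                   # aperiodic control "gene"
print(periodogram_score(periodic), periodogram_score(noise))
```

A pure sinusoid concentrates essentially all its power in one frequency bin and scores near 1, while noise spreads its power and scores much lower.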
It has been verified that the proposed algorithms achieve better inference accuracy
and higher computational efficiency than other state-of-the-art schemes,
e.g., REVEAL, ARACNE, Bayesian Networks and Relevance Networks, on
artificial time series and steady-state microarray measurements. The proposed algorithms
are especially appealing when the sample size is small. Moreover, they are
able to integrate multiple heterogeneous data sources, e.g., ChIP and sequence data,
so that a unified GRN can be inferred. Analysis of the biological literature and in
silico experiments on real data sets for fruit fly, yeast and human have corroborated
part of the inferred GRN. The research has also produced a set of potential control
targets for designing gene therapy strategies.
Detection of Rheumatic Arthritis Disease Based on Genomic Analysis by Applying Wavelet transform
In recent years there have been great advances and innovations in bioinformatics. Bioinformatics is concerned with applying statistical and computational methods, as well as genomic signal processing techniques, to the analysis of data obtained from sequenced DNA, RNA or protein. To use genomic signal processing principles for the analysis of a DNA sequence, the alphabetic string of the DNA sequence first has to be converted into a numeric sequence. This paper presents an application of the wavelet transform, based on the energy levels of the approximation and detail coefficients, for sequence analysis with Chargaff's rule and internucleotide distance, comparing the similarity of two sequences and determining an impact score so as to diagnose Rheumatic Arthritis (RA).
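The numeric-mapping and wavelet-energy steps can be sketched as follows. The nucleotide-to-number mapping, the hand-rolled Haar wavelet, and the distance between energy profiles are all illustrative assumptions; the paper's exact mapping and impact score are not reproduced here.

```python
import numpy as np

NUM = {'A': 1.0, 'C': 2.0, 'G': 3.0, 'T': 4.0}    # assumed numeric mapping

def haar_energies(seq, levels=3):
    """Energies of the detail coefficients at each Haar level, plus the
    final approximation energy (a simple hand-rolled Haar transform)."""
    x = np.array([NUM[b] for b in seq.upper()], dtype=float)
    energies = []
    for _ in range(levels):
        if len(x) % 2:                                 # pad to even length
            x = np.append(x, x[-1])
        approx = (x[0::2] + x[1::2]) / np.sqrt(2)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)
        energies.append(float(np.sum(detail ** 2)))    # detail energy
        x = approx
    energies.append(float(np.sum(x ** 2)))             # approximation energy
    return np.array(energies)

def similarity(seq1, seq2):
    """Assumed comparison score: 0 for identical energy profiles,
    larger for more dissimilar ones (not the paper's impact score)."""
    e1, e2 = haar_energies(seq1), haar_energies(seq2)
    return float(np.linalg.norm(e1 - e2) / np.linalg.norm(e1 + e2))

print(similarity("ACGTACGTACGTACGT", "ACGTACGTACGTACGT"))   # identical -> 0.0
```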
Recovering Sparse Signals Using Sparse Measurement Matrices in Compressed DNA Microarrays
Microarrays (DNA, protein, etc.) are massively parallel affinity-based biosensors capable of detecting and quantifying a large number of different genomic particles simultaneously. Among them, DNA microarrays comprising tens of thousands of probe spots are currently being employed to test a multitude of targets in a single experiment. In conventional microarrays, each spot contains a large number of copies of a single probe designed to capture a single target and, hence, collects only a single data point. This is a wasteful use of the sensing resources in comparative DNA microarray experiments, where a test sample is measured relative to a reference sample. Typically, only a fraction of the total number of genes represented by the two samples is differentially expressed, and thus a vast number of probe spots may not provide any useful information. To this end, we propose an alternative design, the so-called compressed microarrays, wherein each spot contains copies of several different probes and the total number of spots is potentially much smaller than the number of targets being tested. Fewer spots translate directly to significantly lower costs due to cheaper array manufacturing, simpler image acquisition and processing, and the smaller amount of genomic material needed for experiments. To recover signals from compressed microarray measurements, we leverage ideas from compressive sampling. For sparse measurement matrices, we propose an algorithm that has significantly lower computational complexity than the widely used linear-programming-based methods, and can also recover signals that are less sparse.
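The recovery setting can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it uses plain orthogonal matching pursuit as a stand-in sparse-recovery routine, with a randomly generated sparse 0/1 mixing matrix playing the role of the compressed-microarray design. Exact recovery is not guaranteed in general; it depends on the matrix and the sparsity level.

```python
import numpy as np

rng = np.random.default_rng(1)

# m compressed "spots" measure n candidate targets, k of which are
# differentially expressed; each target is assigned to d random spots.
n, m, k, d = 60, 25, 4, 6
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0
A /= np.sqrt(d)                                  # unit-norm columns
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.normal(5.0, 1.0, size=k)
y = A @ x                                        # noiseless spot measurements

def omp(A, y, k):
    """Plain orthogonal matching pursuit: greedily pick the column most
    correlated with the residual, then refit by least squares."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(A, y, k)
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```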
Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome
The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein-coding information, and that proteins are responsible for most of the important biological functions in all cells. Meanwhile, the importance of RNAs remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare.
However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17].
The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few. It is even claimed these days that ncRNAs dominate the genomic output of higher organisms such as mammals, and it has been suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is being paid to ncRNAs, which were neglected for a long time. Researchers have begun to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key to some of the best-kept secrets of life, such as the mechanism of alternative splicing and the control of epigenetic variations [27]. The complete range and extent of the roles of ncRNAs are not yet obvious, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47].
A decision-theoretic approach for segmental classification
This paper is concerned with statistical methods for the segmental
classification of linear sequence data where the task is to segment and
classify the data according to an underlying hidden discrete state sequence.
Such analysis is commonplace in the empirical sciences including genomics,
finance and speech processing. In particular, we are interested in answering
the following question: given data and a statistical model of
the hidden states, what should we report as the prediction under
the posterior distribution? That is, how should we make a
prediction of the underlying states? We demonstrate that traditional approaches
such as reporting the most probable state sequence or most probable set of
marginal predictions can give undesirable classification artefacts and offer
limited control over the properties of the prediction. We propose a decision
theoretic approach using a novel class of Markov loss functions and report
via the principle of minimum expected loss (maximum expected
utility). We demonstrate that the sequence of minimum expected loss under the
Markov loss function can be enumerated exactly using dynamic programming
methods and that it offers flexibility and performance improvements over
existing techniques. The result is generic and applicable to any probabilistic
model on a sequence, such as Hidden Markov models, change point or product
partition models.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), at http://dx.doi.org/10.1214/13-AOAS657.
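The two classical decoders this abstract contrasts can be made concrete on a toy hidden Markov model (made-up parameters). The sketch computes the most probable path by Viterbi dynamic programming, and the sequence of per-position posterior modes by brute-force enumeration over all paths, which is feasible only for tiny sequences:

```python
import numpy as np
from itertools import product

pi = np.array([0.5, 0.5])                    # initial distribution
A = np.array([[0.9, 0.1],                    # transition matrix
              [0.1, 0.9]])
B = np.array([[0.7, 0.3],                    # emission probabilities
              [0.3, 0.7]])
obs = [0, 0, 1, 0, 1, 1]

def joint(states):
    """Joint probability of a state sequence and the observations."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

def viterbi():
    """Most probable state sequence via dynamic programming."""
    T = len(obs)
    delta = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A * B[:, obs[t]][None, :]
        delta[t] = scores.max(axis=0)
        back[t] = scores.argmax(axis=0)
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return tuple(reversed(path))

# Brute-force marginals over all 2^T paths (tiny T only)
paths = list(product([0, 1], repeat=len(obs)))
probs = np.array([joint(s) for s in paths])
marginals = np.zeros((len(obs), 2))
for s, p in zip(paths, probs):
    for t, st in enumerate(s):
        marginals[t, st] += p
posterior_path = tuple(int(i) for i in np.argmax(marginals, axis=1))

print(viterbi(), posterior_path)
```

On longer sequences the two decoders can disagree, which is exactly the kind of artefact the paper's Markov loss functions are designed to control.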
Local Binary Patterns as a Feature Descriptor in Alignment-free Visualisation of Metagenomic Data
Shotgun sequencing has facilitated the analysis of complex microbial communities. However, clustering and visualising these communities without prior taxonomic information is a major challenge. Feature descriptor methods can be utilised to extract these taxonomic relations from the data. Here, we present a novel approach consisting of local binary patterns (LBP) coupled with randomised singular value decomposition (RSVD) and Barnes-Hut t-distributed stochastic neighbor embedding (BH-tSNE) to highlight the underlying taxonomic structure of metagenomic data. The effectiveness of our approach is demonstrated using several simulated datasets and a real metagenomic dataset.
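A one-dimensional analogue of the LBP descriptor can be sketched as below. The numeric nucleotide mapping and neighbourhood size are assumptions made for illustration, and the paper's full pipeline additionally applies RSVD and BH-tSNE to such feature vectors.

```python
import numpy as np

def lbp1d_histogram(x, p=4):
    """1-D local binary patterns: each position is encoded by which of
    its p neighbours (p/2 on each side) are >= the centre value; the
    normalised histogram of the 2**p codes is the feature vector."""
    x = np.asarray(x, dtype=float)
    half = p // 2
    codes = []
    for i in range(half, len(x) - half):
        neigh = np.concatenate([x[i - half:i], x[i + 1:i + 1 + half]])
        bits = (neigh >= x[i]).astype(int)
        codes.append(int(np.dot(bits, 2 ** np.arange(p))))
    hist = np.bincount(codes, minlength=2 ** p).astype(float)
    return hist / hist.sum()

NUM = {'A': 0, 'C': 1, 'G': 2, 'T': 3}        # assumed numeric mapping
seq = "ACGGTTACGGTTACGGTT"
feat = lbp1d_histogram([NUM[b] for b in seq])
print(feat.shape)                             # 16-bin feature vector
```

Such fixed-length histograms can be compared across metagenomic fragments regardless of fragment length, which is what makes the visualisation alignment-free.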
Information profiles for DNA pattern discovery
Finite-context modeling is a powerful tool for compressing and hence for
representing DNA sequences. We describe an algorithm to detect genomic
regularities, within a blind discovery strategy. The algorithm uses information
profiles built using suitable combinations of finite-context models. We used
the genome of the fission yeast Schizosaccharomyces pombe strain 972 h- for
illustration, unveiling locations of low information content, which are
usually associated with DNA regions of potential biological interest.

Comment: Full version of the DCC 2014 paper "Information profiles for DNA pattern discovery".
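The idea of an information profile can be illustrated with a single adaptively updated finite-context model (the paper combines several); the model order and additive smoothing used here are arbitrary choices.

```python
import math
from collections import defaultdict

def information_profile(seq, k=2, alpha=1.0):
    """Per-symbol information content -log2 P(s|context) under an
    adaptively updated order-k finite-context model with additive
    smoothing (a simplified stand-in for the paper's model mixture)."""
    alphabet = sorted(set(seq))
    counts = defaultdict(lambda: defaultdict(int))
    profile = []
    for i in range(k, len(seq)):
        ctx, s = seq[i - k:i], seq[i]
        total = sum(counts[ctx].values())
        p = (counts[ctx][s] + alpha) / (total + alpha * len(alphabet))
        profile.append(-math.log2(p))
        counts[ctx][s] += 1          # update the model after coding the symbol
    return profile

repetitive = "ACGT" * 50
prof = information_profile(repetitive)
# A strongly repetitive region becomes cheap to encode: later symbols
# carry far less information than the early ones.
print(prof[0], prof[-1])
```

Dips in such a profile mark regions the model predicts well; conversely, the low-information regions the abstract refers to are those that cost few bits under suitable model combinations.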
Cellular decision-making bias: the missing ingredient in cell functional diversity
Cell functional diversity is a significant determinant on how biological
processes unfold. Most accounts of diversity involve a search for sequence or
expression differences. Perhaps there are more subtle mechanisms at work. Using
the metaphor of information processing and decision-making might provide a
clearer view of these subtleties. Understanding adaptive and transformative
processes (such as cellular reprogramming) as a series of simple decisions
allows us to use a technique called cellular signal detection theory (cellular
SDT) to detect potential bias in mechanisms that favor one outcome over
another. We can apply this method of detecting bias to cellular
reprogramming and other complex molecular processes. To demonstrate the
scope of this method, we will critically examine differences between cell
phenotypes reprogrammed to muscle fiber and neuron phenotypes. In cases where
the signature of phenotypic bias is cryptic, signatures of genomic bias
(pre-existing and induced) may provide an alternative. These alternatives
will be explored using data from a series of fibroblast cell lines
before cellular reprogramming (pre-existing) and from differences between fractions
of cellular RNA for individual genes after drug treatment (induced). In
conclusion, the usefulness and limitations of this method and associated
analogies will be discussed.

Comment: 18 pages; 6 figures, 2 tables, 4 supplemental figures.
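For readers unfamiliar with signal detection theory, the classical sensitivity and bias measures that cellular SDT borrows can be computed as follows. The counts are hypothetical, and the paper applies the framework by analogy rather than to trial counts like these.

```python
from statistics import NormalDist

def sdt_measures(hits, misses, fas, crs):
    """Classical signal-detection measures: sensitivity d' and the
    bias (criterion) c, computed from hit and false-alarm rates."""
    z = NormalDist().inv_cdf              # inverse standard-normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = fas / (fas + crs)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Hypothetical counts: a protocol that "detects" the intended outcome
# on 80/100 signal trials and false-alarms on 30/100 noise trials.
d, c = sdt_measures(80, 20, 30, 70)
print(round(d, 3), round(c, 3))
```

A nonzero criterion c is the quantitative signature of a decision-making bias favouring one outcome over the other, independent of sensitivity d'.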
A statistical approach for array CGH data analysis
BACKGROUND: Microarray-CGH experiments are used to detect and map chromosomal imbalances by hybridizing targets of genomic DNA from a test and a reference sample to sequences immobilized on a slide. These probes are genomic DNA sequences (BACs) that are mapped on the genome. The signal has a spatial coherence that can be handled by specific statistical tools, and segmentation methods seem to be a natural framework for this purpose. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose BACs share the same relative copy number on average. We model a CGH profile by a random Gaussian process whose distribution parameters are affected by abrupt changes at unknown coordinates. Two major problems arise: to determine which parameters are affected by the abrupt changes (the mean and the variance, or the mean only), and to select the number of segments in the profile.

RESULTS: We demonstrate that existing methods for estimating the number of segments are not well adapted to array CGH data, and we propose an adaptive criterion that detects previously mapped chromosomal aberrations. The performance of this method is discussed based on simulations and publicly available data sets. We then discuss the choice of model for array CGH data and show that the model with a homogeneous variance is adapted to this context.

CONCLUSIONS: Array CGH data analysis is an emerging field that needs appropriate statistical tools. Process segmentation and model selection provide a theoretical framework that allows precise biological interpretations. Adaptive methods for model selection give promising results concerning the estimation of the number of altered regions on the genome.
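The segmentation model described here, a Gaussian profile with abrupt mean changes, can be fitted exactly by dynamic programming. The sketch below uses a fixed per-segment penalty as a simplified stand-in for the adaptive model-selection criterion the paper proposes.

```python
import numpy as np

def segment(y, penalty):
    """Exact least-squares segmentation of a profile into mean-constant
    segments by dynamic programming; each segment pays a fixed penalty."""
    n = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])
    cs2 = np.concatenate([[0.0], np.cumsum(np.square(y))])
    def cost(i, j):                       # SSE of y[i:j] around its mean
        s, s2, m = cs[j] - cs[i], cs2[j] - cs2[i], j - i
        return s2 - s * s / m
    F = np.full(n + 1, np.inf)
    F[0] = 0.0
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = F[i] + cost(i, j) + penalty
            if c < F[j]:
                F[j], last[j] = c, i
    bps, j = [], n                        # backtrack segment end indices
    while j > 0:
        bps.append(j)
        j = last[j]
    return sorted(bps)

# Synthetic profile: a gained region of 30 probes inside a flat background
rng = np.random.default_rng(2)
profile = np.concatenate([rng.normal(0.0, 0.1, 40),
                          rng.normal(1.0, 0.1, 30),
                          rng.normal(0.0, 0.1, 40)])
bps = segment(profile, penalty=1.0)
print(bps)
```

Choosing the penalty is exactly the model-selection problem the abstract discusses: too small a penalty over-segments the noise, too large a penalty misses true aberrations.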