Gene specific co-regulation discovery: an improved approach
Discovering gene co-regulatory relationships is a new but important research problem in DNA microarray data analysis. The problem of gene specific co-regulation discovery is, for a particular gene of interest, called the target gene, to identify its strongly co-regulated genes and the condition subsets where such strong gene co-regulations are observed. Studying this problem can contribute to a better understanding and characterization of the target gene. The existing method, which uses a genetic algorithm (GA), is slow due to its expensive fitness evaluation and long individual representation. In this paper, we propose an improved method for finding gene specific co-regulations. Compared with the current method, our method features notably improved efficiency. We employ a kNN Search Table to substantially speed up fitness evaluation in the GA. We also propose a more compact representation scheme for encoding individuals in the GA, which contributes to faster crossover and mutation operations. Experimental results with a real-life gene microarray data set demonstrate the improved efficiency of our technique compared with the current method.
DNA meets the SVD
This paper introduces an important area of computational cell biology where complex, publicly available genomic data is being examined by linear algebra methods, with the aim of revealing biological and medical insights
Nonlinear Model-Based Method for Clustering Periodically Expressed Genes
Clustering periodically expressed genes from their time-course expression data could help in understanding the molecular mechanisms of the underlying biological processes. In this paper, we propose a nonlinear model-based clustering method for periodically expressed gene profiles. As periodically expressed genes are associated with periodic biological processes, the proposed method naturally assumes that a periodically expressed gene dataset is generated by a number of periodic processes. Each periodic process is modelled by a linear combination of trigonometric sine and cosine functions in time plus a Gaussian noise term. A two-stage method is proposed to estimate the model parameters, and a relocation-iteration algorithm is employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. One synthetic dataset and two biological datasets were employed to evaluate the performance of the proposed method. The results show that our method yields better-quality clustering than other clustering methods (e.g., k-means) for periodically expressed gene data, and thus it is an effective cluster analysis method for such data.
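As a rough illustration of the model described in this abstract, the sketch below fits a single sine and cosine pair to one expression profile by ordinary least squares. It is not the paper's two-stage estimator: the function name `fit_periodic`, the use of one harmonic, and the assumption that the period is already known are all hypothetical choices made for this example.

```python
import math


def fit_periodic(times, values, period):
    """Least-squares fit of values ~ a*cos(w*t) + b*sin(w*t), with w = 2*pi/period.

    Assumption for this sketch: a single harmonic with a known period.
    Returns the coefficients (a, b).
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    w = 2.0 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]

    # Entries of the 2x2 normal equations [[cc, cs], [cs, ss]] [a, b]^T = [cy, sy]^T
    cc = sum(x * x for x in c)
    ss = sum(x * x for x in s)
    cs = sum(x * y for x, y in zip(c, s))
    cy = sum(x * y for x, y in zip(c, values))
    sy = sum(x * y for x, y in zip(s, values))

    det = cc * ss - cs * cs
    if abs(det) < 1e-12:
        raise ValueError("time points do not determine both coefficients")

    # Cramer's rule for the 2x2 system
    a = (cy * ss - cs * sy) / det
    b = (cc * sy - cs * cy) / det
    return a, b
```

Under the Gaussian noise assumption stated in the abstract, least squares coincides with maximum likelihood for the coefficients, which is why it is a natural starting point here.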
Outlier Filtering for Identification of Gene Regulations in Microarray Time-Series Data
Microarray technology provides an opportunity for scientists to analyze thousands of gene expression profiles simultaneously. Time-series microarray data are gene expression values generated from microarray experiments within certain time intervals. Scientists can infer gene regulations in a biological system by judging whether two genes present similar gene expression values in microarray time-series data. Recently, many methods have been applied to microarray time-series data to measure the similarity and the degree of correlation among genes. Existing approaches, including the traditional Pearson correlation coefficient, Bayesian networks, clustering analysis, classification methods, and correlation analysis, each have disadvantages, such as high computational complexity or unsuitability for some microarray data. The traditional Pearson correlation coefficient is a numeric measure that is effective on two sets of numeric data. However, it is not well suited to microarray time-series data because of the outliers among gene expression values. This paper presents a novel method that applies the Pearson correlation coefficient together with an outlier filtering procedure to widely used microarray time-series datasets. Results show that the proposed method produces a better outcome than the traditional Pearson correlation coefficient on the same dataset: it not only finds more known regulatory gene pairs but also keeps computation time reasonable. [Conference type: international; conference dates: 2009-03-16 to 2009-03-19; call for papers: Y; conference location: Fukuoka, Japa]
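The abstract does not spell out the filtering procedure, so the sketch below only illustrates the general idea of computing a Pearson correlation after discarding outlying time points. The median absolute deviation (MAD) rule, the threshold `k`, and the function names are assumptions made for this example, not the paper's method.

```python
import math
import statistics


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def outlier_mask(values, k=3.0):
    """Flag values further than k MADs from the median (assumed rule)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(v - med) > k * mad for v in values]


def filtered_pearson(x, y, k=3.0):
    """Pearson correlation over time points where neither gene is flagged."""
    keep = [
        i
        for i, (fx, fy) in enumerate(zip(outlier_mask(x, k), outlier_mask(y, k)))
        if not (fx or fy)
    ]
    if len(keep) < 3:
        raise ValueError("too few time points remain after filtering")
    return pearson([x[i] for i in keep], [y[i] for i in keep])
```

A single extreme value can dominate the covariance term, which is why removing it before computing the coefficient can change the result substantially.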
Determination of the minimum number of microarray experiments for discovery of gene expression patterns
BACKGROUND: One type of DNA microarray experiment is discovery of gene expression patterns for a cell line undergoing a biological process over a series of time points. Two important issues with such an experiment are the number of time points, and the interval between them. In the absence of biological knowledge regarding appropriate values, it is natural to question whether the behaviour of progressively generated data may by itself determine a threshold beyond which further microarray experiments do not contribute to pattern discovery. Additionally, such a threshold implies a minimum number of microarray experiments, which is important given the cost of these experiments. RESULTS: We have developed a method for determining the minimum number of microarray experiments (i.e. time points) for temporal gene expression, assuming that the span between time points is given and the hierarchical clustering technique is used for gene expression pattern discovery. The key idea is a similarity measure for two clusterings which is expressed as a function of the data for progressive time points. While the experiments are underway, this function is evaluated. When the function reaches its maximum, it indicates that the set of experiments has reached a saturated state, so further experiments do not contribute to the discrimination of patterns. CONCLUSION: The method has been verified with two previously published gene expression datasets. For both experiments, the number of time points determined with our method is less than in the published experiments. It is noted that the overall approach is applicable to other clustering techniques.
MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach
Gene regulation is a series of processes that control gene expression and its
extent. The connections among genes and their regulatory molecules, usually
transcription factors, and a descriptive model of such connections, are known
as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand
the inner workings of the cell and the complexity of gene interactions. To
date, numerous algorithms have been developed to infer gene regulatory
networks. However, as the number of identified genes increases and the
complexity of their interactions is uncovered, networks and their regulatory
mechanisms become cumbersome to test. Furthermore, prodding through
experimental results requires an enormous amount of computation, resulting in
slow data processing. Therefore, new approaches are needed to expeditiously
analyze copious amounts of experimental data resulting from cellular GRNs. To
meet this need, cloud computing is promising as reported in the literature.
Here we propose new MapReduce algorithms for inferring gene regulatory networks
on a Hadoop cluster in a cloud environment. These algorithms employ an
information-theoretic approach to infer GRNs using time-series microarray data.
Experimental results show that our MapReduce program is much faster than an
existing tool while achieving slightly better prediction accuracy.
Comment: 19 pages, 5 figure
Data analytics with mapreduce in apache spark and hadoop systems
MapReduce derives from a traditional problem-solving method: splitting a big problem into small parts and solving each part. To process larger datasets more efficiently and cheaply, this idea is implemented as a programming model for handling massive quantities of data. Users supply a map function that abstracts the dataset into logical key/value pairs and a reduce function that groups all values sharing the same key. With this model, a job can be distributed automatically across clusters built from many ordinary computers. MapReduce programs are easy to implement and can be much more efficient than traditional programs. This paper studies the model through several sample programs and one GRN detection program.
Detecting gene regulatory networks (GRNs), the connections among genes and their regulatory molecules, is one of the main subjects in understanding gene biology. Although algorithms have been developed for this purpose, the growing number of genes and the complexity of their interactions make processing increasingly difficult and slow. The MapReduce model, with its parallel computation, is one way to overcome these problems. In this paper, a well-defined framework for parallelizing a mutual information algorithm is presented. The experimental results show the performance improvement gained from the parallel MapReduce model.
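To make the map and reduce roles concrete, the sketch below runs a toy, single-process version of the pattern to compute pairwise mutual information between discretized gene profiles. It does not use Hadoop or Spark and is not the framework from the paper: the key/value layout, the function names, and the assumption that expression values are already discretized are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def map_sample(sample):
    """Map step: turn one sample (gene -> discretized level) into key/value pairs.

    The key is a gene pair; the value is the pair of levels seen in this sample.
    """
    for g1, g2 in combinations(sorted(sample), 2):
        yield (g1, g2), (sample[g1], sample[g2])


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_mutual_information(observations):
    """Reduce step: mutual information (in bits) from joint level observations."""
    n = len(observations)
    joint = Counter(observations)
    px = Counter(a for a, _ in observations)
    py = Counter(b for _, b in observations)
    # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )


def run_job(samples):
    """Run map, shuffle, and reduce in one process for illustration."""
    pairs = (kv for sample in samples for kv in map_sample(sample))
    return {
        key: reduce_mutual_information(values)
        for key, values in shuffle(pairs).items()
    }
```

Keying on the gene pair means each reducer only needs the observations for one pair, so the pairs can be scored independently and in parallel on a real cluster.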
Extracting binary signals from microarray time-course data
This article presents a new method for analyzing microarray time courses by identifying genes that undergo abrupt transitions in expression level, and the time at which the transitions occur. The algorithm matches the sequence of expression levels for each gene against temporal patterns having one or two transitions between two expression levels. The algorithm reports a P-value for the matching pattern of each gene, and a global false discovery rate can also be computed. After matching, genes can be sorted by the direction and time of transitions. Genes can be partitioned into sets based on the direction and time of change for further analysis, such as comparison with Gene Ontology annotations or binding site motifs. The method is evaluated on simulated and actual time-course data. On microarray data for budding yeast, it is shown that the groups of genes that change in similar ways and at similar times have significant and relevant Gene Ontology annotations
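The matching idea can be sketched for the simplest case, a single transition between two levels. The function below tries every split point and keeps the one with the smallest squared error. It is only an illustration: the article's method also handles two-transition patterns and reports P-values, neither of which is attempted here, and the name `best_single_step` is hypothetical.

```python
def best_single_step(values):
    """Find the best one-transition step pattern for an expression time course.

    Tries every split of values into values[:k] and values[k:], models each
    side by its mean, and returns (k, left_mean, right_mean, sse) for the
    split with the smallest sum of squared errors. The transition occurs
    just before index k; it is upward if right_mean > left_mean.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time points")
    best = None
    for k in range(1, n):
        left, right = values[:k], values[k:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((v - left_mean) ** 2 for v in left) + sum(
            (v - right_mean) ** 2 for v in right
        )
        if best is None or sse < best[3]:
            best = (k, left_mean, right_mean, sse)
    return best
```

Sorting genes by the returned split index and by the sign of the change would then give groups of genes that switch in the same direction at the same time.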