Information visualization for DNA microarray data analysis: A critical review
Graphical representation may provide effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use "abstract" graphical representation to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to particular records they are interested in, and therefore, gain deeper insights in understanding the microarray experiment results. This paper starts by providing some background knowledge of microarray experiments, and then, explains how graphical representation can be applied in general to this problem domain, followed by exploring the role of visualization in gene expression data analysis. Having set the problem scene, the paper then examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas as well as possible solutions to them are discussed as being a source for future work.
Relatedness Measures to Aid the Transfer of Building Blocks among Multiple Tasks
Multitask Learning is a learning paradigm that deals with multiple different
tasks in parallel and transfers knowledge among them. XOF, a Learning
Classifier System using tree-based programs to encode building blocks
(meta-features), constructs and collects features with rich discriminative
information for classification tasks in an observed list. This paper seeks to
facilitate the automation of feature transfer between tasks by utilising
the observed list. We hypothesise that the best discriminative features of a
classification task carry its characteristics. Therefore, the relatedness
between any two tasks can be estimated by comparing their most appropriate
patterns. We propose a multiple-XOF system, called mXOF, that can dynamically
adapt feature transfer among XOFs. This system utilises the observed list to
estimate the task relatedness. This method enables the automation of
transferring features. In terms of knowledge discovery, the resemblance
estimation provides insightful relations among multiple datasets. We tested
mXOF on various scenarios, e.g. representative Hierarchical Boolean problems,
classification of distinct classes in the UCI Zoo dataset, and unrelated tasks,
to validate its abilities of automatic knowledge transfer and task-relatedness
estimation. Results show that mXOF can reasonably estimate the relatedness
between multiple tasks and use dynamic feature transfer to aid learning
performance.
Comment: accepted by The Genetic and Evolutionary Computation Conference
(GECCO 2020)
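The idea of estimating task relatedness by comparing each task's best discriminative features can be illustrated with a small sketch. This is not the mXOF method: the abstract does not say how features are compared, so the Jaccard overlap used here, the function name `relatedness`, and the string encoding of features are all assumptions chosen for illustration.

```python
def relatedness(features_a, features_b):
    """Jaccard overlap between two tasks' top features (an assumed measure).

    Each argument is an iterable of hashable feature descriptions, e.g.
    strings standing in for tree-based building blocks.
    """
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # no features on either side: treat the tasks as unrelated
    return len(a & b) / len(a | b)


# Hypothetical top features collected for three tasks.
task_1 = ["x1 AND x2", "x3 OR x4", "NOT x5"]
task_2 = ["x1 AND x2", "x3 OR x4", "x6"]
task_3 = ["x7", "x8 AND x9"]

score_12 = relatedness(task_1, task_2)  # two shared features out of four
score_13 = relatedness(task_1, task_3)  # no shared features
```

A score like this could gate whether features are transferred between two learners, with transfer allowed only above some threshold; the threshold itself would be another assumption.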
Learning spatio-temporal representations for action recognition: A genetic programming approach
Extracting discriminative and robust features from video sequences is the first and most critical step in human action recognition. In this paper, instead of using handcrafted features, we automatically learn spatio-temporal motion features for action recognition. This is achieved via an evolutionary method, i.e., genetic programming (GP), which evolves the motion feature descriptor on a population of primitive 3D operators (e.g., 3D-Gabor and wavelet). In this way, the scale and shift invariant features can be effectively extracted from both color and optical flow sequences. We intend to learn data adaptive descriptors for different datasets with multiple layers, which makes full use of the knowledge to mimic the physical structure of the human visual cortex for action recognition and simultaneously reduces the GP search space to effectively accelerate the convergence of optimal solutions. In our evolutionary architecture, the average cross-validation classification error, which is calculated by a support-vector-machine classifier on the training set, is adopted as the evaluation criterion for the GP fitness function. After the entire evolution procedure finishes, the best-so-far solution selected by GP is regarded as the (near-)optimal action descriptor obtained. The GP-evolving feature extraction method is evaluated on four popular action datasets, namely KTH, HMDB51, UCF YouTube, and Hollywood2. Experimental results show that our method significantly outperforms other types of features, either hand-designed or machine-learned.
Gene expression data analysis using novel methods: Predicting time delayed correlations and evolutionarily conserved functional modules
Microarray technology enables the study of gene expression on a large scale. One of the main challenges has been to devise methods to cluster genes that share similar expression profiles. In gene expression time courses, a particular gene may encode a transcription factor and thus control several genes downstream; in this case, the gene expression profiles may be staggered, indicating a time-delayed response in transcription of the later genes. The standard clustering algorithms consider gene expression profiles in a global way, thus often ignoring such local time-delayed correlations. We have developed novel methods to capture time-delayed correlations between expression profiles: (1) a method using dynamic programming and (2) CLARITY, an algorithm that uses a local shape-based similarity measure to predict time-delayed correlations and local correlations. We used CLARITY on a dataset describing the change in gene expression during the mitotic cell cycle in Saccharomyces cerevisiae. The obtained clusters were significantly enriched with genes that share similar functions, reflecting the fact that genes with a similar function are often co-regulated and thus co-expressed. Time-shifted as well as local correlations could also be predicted using CLARITY.
In datasets where the expression profiles of independent experiments are compared, the standard clustering algorithms often cluster according to all conditions, considering all genes. This increases the background noise and can cause genes that change their expression only under particular conditions to be missed. We have employed a genetic-algorithm-based module predictor that is capable of identifying groups of genes that change their expression only in a subset of conditions. With the aim of supplementing the Ustilago maydis genome annotation, we have used the module prediction algorithm on various independent datasets from Ustilago maydis. The predicted modules were cross-referenced in various Saccharomyces cerevisiae datasets to check their evolutionary conservation between these two organisms. The key contributions of this thesis are novel methods that explore biological information from DNA microarray data.
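To make the notion of a time-delayed correlation between two expression profiles concrete, here is a minimal sketch. It is neither the dynamic programming method nor CLARITY, whose details the abstract does not give; it swaps in a plain Pearson correlation evaluated at successive lags. The function names and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(regulator, target, max_lag):
    """Return (lag, correlation) for the delay at which target best follows regulator."""
    if len(regulator) != len(target):
        raise ValueError("profiles must have the same number of time points")
    if max_lag > len(regulator) - 2:
        raise ValueError("max_lag leaves fewer than two overlapping time points")
    best = (0, pearson(regulator, target))
    for lag in range(1, max_lag + 1):
        # Compare the regulator at time t with the target at time t + lag.
        r = pearson(regulator[:-lag], target[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Toy profiles: the target repeats the regulator's pattern two time points later.
regulator = [0, 1, 4, 2, 0, 1, 3, 0]
target = [0, 0, 0, 1, 4, 2, 0, 1]
lag, r = best_lag(regulator, target, max_lag=3)
```

A global (lag 0) comparison of these two profiles gives a weak correlation, while the shifted comparison recovers the staggered relationship, which is the effect the abstract says standard clustering tends to miss.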
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.
Comment: 28 pages, 9 figures
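A feature-based representation replaces each time series with a short vector of interpretable summary statistics. The sketch below shows the idea with three simple features; the choice of features, the function names, and the example series are assumptions for illustration and are not taken from the paper.

```python
from statistics import fmean, pstdev


def lag1_autocorr(series):
    """Lag-1 autocorrelation: how strongly each value resembles its predecessor."""
    m = fmean(series)
    denom = sum((v - m) ** 2 for v in series)
    if denom == 0:
        return 0.0  # constant series: define the autocorrelation as zero
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    return num / denom


def extract_features(series):
    """Map a time series to a small dictionary of named features."""
    return {
        "mean": fmean(series),
        "std": pstdev(series),
        "lag1_autocorr": lag1_autocorr(series),
    }


# Two hypothetical series with the same length but different dynamics.
alternating = [1, -1, 1, -1, 1, -1]
trending = [1, 2, 3, 4, 5, 6]

features_alternating = extract_features(alternating)
features_trending = extract_features(trending)
```

Here the lag-1 autocorrelation is negative for the alternating series and positive for the trending one, so a single interpretable feature already separates the two kinds of dynamics.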
The era of big data: Genome-scale modelling meets machine learning
With omics data being generated at an unprecedented rate, genome-scale modelling has become pivotal in their organisation and analysis. However, machine learning methods have been gaining ground in cases where knowledge is insufficient to represent the mechanisms underlying such data or as a means for data curation prior to attempting mechanistic modelling. We discuss the latest advances in genome-scale modelling and the development of optimisation algorithms for network and error reduction, intracellular constraining and applications to strain design. We further review applications of supervised and unsupervised machine learning methods to omics datasets from microbial and mammalian cell systems and present efforts to harness the potential of both modelling approaches through hybrid modelling.