Information visualization for DNA microarray data analysis: A critical review
Graphical representation may provide effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use "abstract" graphical representation to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to particular records they are interested in, and therefore, gain deeper insights in understanding the microarray experiment results. This paper starts by providing some background knowledge of microarray experiments, and then, explains how graphical representation can be applied in general to this problem domain, followed by exploring the role of visualization in gene expression data analysis. Having set the problem scene, the paper then examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas as well as possible solutions to them are discussed as being a source for future work.
Exploring the spectroscopic diversity of type Ia supernovae with DRACULA: a machine learning approach
The existence of multiple subclasses of type Ia supernovae (SNeIa) has been
the subject of great debate in the last decade. One major challenge inevitably
met when trying to infer the existence of one or more subclasses is the time
consuming, and subjective, process of subclass definition. In this work, we
show how machine learning tools facilitate identification of subtypes of SNeIa
through the establishment of a hierarchical group structure in the continuous
space of spectral diversity formed by these objects. Using Deep Learning, we
were capable of performing such identification in a 4 dimensional feature space
(+1 for time evolution), while the standard Principal Component Analysis barely
achieves similar results using 15 principal components. This is evidence that
the progenitor system and the explosion mechanism can be described by a small
number of initial physical parameters. As a proof of concept, we show that our
results are in close agreement with a previously suggested classification
scheme and that our proposed method can grasp the main spectral features behind
the definition of such subtypes. This allows the confirmation of the velocity
of lines as a first order effect in the determination of SNIa subtypes,
followed by 91bg-like events. Given the expected data deluge in the forthcoming
years, our proposed approach is essential to allow a quick and statistically
coherent identification of SNeIa subtypes (and outliers). All tools used in
this work were made publicly available in the Python package Dimensionality
Reduction And Clustering for Unsupervised Learning in Astronomy (DRACULA) and
can be found within COINtoolbox (https://github.com/COINtoolbox/DRACULA).
Comment: 16 pages, 12 figures, accepted for publication in MNRA
Discrete Fourier Transform Improves the Prediction of the Electronic Properties of Molecules in Quantum Machine Learning
High-throughput approximations of quantum mechanics calculations and
combinatorial experiments have been traditionally used to reduce the search
space of possible molecules, drugs and materials. However, the interplay of
structural and chemical degrees of freedom introduces enormous complexity,
which the current state-of-the-art tools are not yet designed to handle. The
availability of large molecular databases generated by quantum mechanics (QM)
computations using first principles opens new avenues for data science to
accelerate the discovery of new compounds. In recent years, models that combine
QM with machine learning (ML) known as QM/ML models have been successful at
delivering the accuracy of QM at the speed of ML. The goals are to develop a
framework that will accelerate the extraction of knowledge and to get insights
from quantitative process-structure-property-performance relationships hidden
in materials data via a better search of the chemical compound space, and to
infer new materials with targeted properties. In this study, we show that by
integrating well-known signal processing techniques such as discrete Fourier
transform in the QM/ML pipeline, the outcomes can be significantly improved in
some cases. We also show that the spectrogram of a molecule may represent an
interesting molecular visualization tool.
Comment: 4 pages, 3 figures, 2 tables. Accepted to present at 32nd IEEE
Canadian Conference in Electrical Engineering and Computer Scienc
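The abstract above does not say how the discrete Fourier transform enters the QM/ML pipeline, so the following is only a minimal sketch of the general idea: turning a fixed-length numeric descriptor into frequency-domain features. The function name `dft_magnitudes` and the toy `descriptor` vector are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return the magnitudes |X_k| of the discrete Fourier transform of a sequence."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        # X_k = sum_t x_t * exp(-2*pi*i*k*t/n)
        x_k = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(x_k))
    return magnitudes


# Hypothetical stand-in for a molecular descriptor vector (assumption, not real data).
descriptor = [1.0, 0.0, -1.0, 0.0]

# Frequency-domain features that could, in principle, be passed to an ML model.
features = dft_magnitudes(descriptor)
print(features)
```

In practice a library FFT would replace the quadratic-time loop; the direct sum is used here only to keep the definition visible.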