Geometric deep learning: going beyond Euclidean data
Many scientific fields study data with an underlying structure that is a
non-Euclidean space. Some examples include social networks in computational
social sciences, sensor networks in communications, functional networks in
brain imaging, regulatory networks in genetics, and meshed surfaces in computer
graphics. In many applications, such geometric data are large and complex (in
the case of social networks, on the scale of billions), and are natural targets
for machine learning techniques. In particular, we would like to use deep
neural networks, which have recently proven to be powerful tools for a broad
range of problems from computer vision, natural language processing, and audio
analysis. However, these tools have been most successful on data with an
underlying Euclidean or grid-like structure, and in cases where the invariances
of these structures are built into networks used to model them. Geometric deep
learning is an umbrella term for emerging techniques attempting to generalize
(structured) deep neural models to non-Euclidean domains such as graphs and
manifolds. The purpose of this paper is to overview different examples of
geometric deep learning problems and present available solutions, key
difficulties, applications, and future research directions in this nascent
field.
An Emergent Space for Distributed Data with Hidden Internal Order through Manifold Learning
Manifold-learning techniques are routinely used in mining complex
spatiotemporal data to extract useful, parsimonious data
representations/parametrizations; these are, in turn, useful in nonlinear model
identification tasks. We focus here on the case of time series data that can
ultimately be modelled as a spatially distributed system (e.g. a partial
differential equation, PDE), but where we do not know the space in which this
PDE should be formulated. Hence, even the spatial coordinates for the
distributed system themselves need to be identified through (to emerge from) the data
mining process. We will first validate this emergent space reconstruction for
time series sampled without space labels in known PDEs; this brings up the
issue of observability of physical space from temporal observation data, and
the transition from spatially resolved to lumped (order-parameter-based)
representations by tuning the scale of the data mining kernels. We will then
present actual emergent space discovery illustrations. Our illustrative
examples include chimera states (states of coexisting coherent and incoherent
dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics,
arising in partial differential equations and/or in heterogeneous networks. We
also discuss how data-driven spatial coordinates can be extracted in ways
invariant to the nature of the measuring instrument. Such gauge-invariant data
mining can go beyond the fusion of heterogeneous observations of the same
system, to the possible matching of apparently different systems.
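As a rough illustration of the idea in the abstract above (recovering a spatial coordinate from time series that carry no space labels), here is a minimal diffusion-map sketch. It is not the authors' method: the toy heat-equation data, the function name `diffusion_coordinate`, and the median-based kernel scale are all assumptions made for this example.

```python
import numpy as np

def diffusion_coordinate(series, epsilon=None):
    """Leading nontrivial diffusion-map coordinate for the rows of `series`.

    `series` has shape (n_sensors, n_times); each row is one unlabeled sensor.
    """
    # Pairwise squared distances between the sensors' time series.
    sq = np.sum((series[:, None, :] - series[None, :, :]) ** 2, axis=2)
    if epsilon is None:
        epsilon = np.median(sq)  # heuristic kernel scale (an assumption)
    kernel = np.exp(-sq / epsilon)
    degree = kernel.sum(axis=1)
    # Symmetric matrix similar to the row-stochastic Markov matrix D^-1 K.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    _, eigvecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    # The last eigenvector is the trivial one; take the next and rescale it
    # into an eigenvector of the Markov matrix.
    return eigvecs[:, -2] / np.sqrt(degree)

# Toy data: two decaying modes of the heat equation on [0, 1] with
# no-flux boundaries, observed at shuffled (hidden) sensor positions.
rng = np.random.default_rng(0)
x = rng.permutation(np.linspace(0.0, 1.0, 100))  # hidden positions
t = np.linspace(0.0, 0.1, 50)
u = (np.exp(-np.pi**2 * t)[None, :] * np.cos(np.pi * x)[:, None]
     + 0.5 * np.exp(-4 * np.pi**2 * t)[None, :] * np.cos(2 * np.pi * x)[:, None])

phi = diffusion_coordinate(u)

# Rank correlation between the recovered coordinate and the hidden positions.
rank_x = np.argsort(np.argsort(x))
rank_phi = np.argsort(np.argsort(phi))
rank_corr = np.corrcoef(rank_x, rank_phi)[0, 1]
```

In this toy setting the recovered coordinate `phi` is expected to order the sensors consistently with their hidden positions, up to an overall sign. The kernel scale `epsilon` plays the role of the tunable data-mining kernel scale mentioned in the abstract.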
Hyperspectral colon tissue cell classification
A novel algorithm to discriminate between normal and malignant tissue cells of the human colon is presented. Microscopic-level images of human colon tissue cells were acquired using hyperspectral imaging technology at contiguous wavelength intervals of visible light. While hyperspectral imagery provides a wealth of information, its large size normally means high computational complexity. Several methods exist to avoid the so-called curse of dimensionality and hence reduce the computational complexity. In this study, we experimented with Principal Component Analysis (PCA) and two modifications of Independent Component Analysis (ICA). In the first stage of the algorithm, the extracted components are used to separate four constituent parts of the colon tissue: nuclei, cytoplasm, lamina propria, and lumen. The segmentation is performed in an unsupervised fashion using the nearest centroid clustering algorithm. The segmented image is then used, in the second stage of the classification algorithm, to exploit the spatial relationship between the labeled constituent parts. Experimental results using supervised Support Vector Machine (SVM) classification based on multiscale morphological features show that normal and malignant tissue cells can be discriminated with a reasonable degree of accuracy.
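The first stage described above (dimensionality reduction followed by unsupervised nearest-centroid segmentation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synthetic 16-band spectra, the function names, the k-means-style update, and the farthest-point initialization are all assumptions, and the second stage (morphological features plus SVM) is not shown.

```python
import numpy as np

def pca_reduce(pixels, n_components):
    """Project spectra of shape (n_pixels, n_bands) onto the top principal components."""
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def nearest_centroid_cluster(features, k, n_iter=50):
    """Unsupervised nearest-centroid clustering (k-means-style updates)."""
    # Farthest-point initialization: deterministic, spreads out the seeds.
    centroids = [features[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(
            features[:, None, :] - np.array(centroids)[None, :, :], axis=2
        ).min(axis=1)
        centroids.append(features[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each pixel to its nearest centroid
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in for hyperspectral pixels: four made-up spectral
# signatures (one per tissue constituent) over 16 bands, plus small noise.
rng = np.random.default_rng(0)
signatures = np.zeros((4, 16))
for i in range(4):
    signatures[i, 4 * i:4 * i + 4] = 1.0
true_labels = np.repeat(np.arange(4), 50)
pixels = signatures[true_labels] + 0.01 * rng.standard_normal((200, 16))

reduced = pca_reduce(pixels, n_components=3)
labels, centroids = nearest_centroid_cluster(reduced, k=4)
```

The cluster indices in `labels` are arbitrary, so mapping them to nuclei, cytoplasm, lamina propria, and lumen would need a separate step. ICA could replace `pca_reduce` without changing the clustering code.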
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning, and the resulting methods have been adopted by
other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 figures, 1 table. Accepted for publication in "Advances in
Astronomy", special issue "Robotic Astronomy"
Improved decision making with similarity based machine learning
Despite their fundamental importance for science and society at large,
experimental design decisions are often plagued by extreme data scarcity which
severely hampers the use of modern ready-made machine learning models as they
rely heavily on the paradigm 'the bigger the data, the better'. Presenting
similarity-based machine learning, we show how to reduce these data needs such
that decision making can be objectively improved in certain problem classes.
After introducing similarity machine learning for the harmonic oscillator and
the Rosenbrock function, we describe real-world applications to very scarce
data scenarios which include (i) quantum mechanics based molecular design, (ii)
organic synthesis planning, and (iii) real estate investment decisions in the
city of Berlin, Germany.