Data Deluge in Astrophysics: Photometric Redshifts as a Template Use Case
Astronomy has entered the big data era and Machine Learning based methods
have found widespread use in a large variety of astronomical applications. This
is demonstrated by the recent huge increase in the number of publications
making use of this new approach. The use of machine learning methods, however,
is still far from trivial and many problems still need to be solved. Using the
evaluation of photometric redshifts as a case study, we outline the main
problems and some ongoing efforts to solve them.
Comment: 13 pages, 3 figures, Springer's Communications in Computer and
Information Science (CCIS), Vol. 82
Disturbance Grassmann Kernels for Subspace-Based Learning
In this paper, we focus on subspace-based learning problems, where data
elements are linear subspaces instead of vectors. To handle this kind of data,
Grassmann kernels were proposed to measure the space structure and used with
classifiers, e.g., Support Vector Machines (SVMs). However, the existing
discriminative algorithms mostly ignore the instability of subspaces, which
can cause classifiers to be misled by disturbed instances. We therefore propose
accounting for all potential disturbances of subspaces during learning to
obtain more robust classifiers. First, we derive the dual optimization of
linear classifiers with disturbance subject to a known distribution, resulting
in a new kernel, the Disturbance Grassmann (DG) kernel. Second, we investigate
two kinds of disturbance, relevant to the subspace matrix and singular values
of bases, with which we extend the Projection kernel on Grassmann manifolds to
two new kernels. Experiments on action data indicate that the proposed kernels
perform better compared to state-of-the-art subspace-based methods, even in a
worse environment.
Comment: This paper includes 3 figures, 10 pages, and has been accepted to
SIGKDD'1
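As a rough illustration of the Projection kernel that the abstract above takes as its starting point, the following is a minimal sketch assuming the standard definition k(X, Y) = ||X^T Y||_F^2 for orthonormal bases X and Y. The helper names `orthonormal_basis` and `projection_kernel` are hypothetical, and the paper's Disturbance Grassmann kernels are not reproduced here.

```python
import numpy as np

def orthonormal_basis(A):
    """Return an orthonormal basis (n x p) for the column span of A via reduced QR."""
    Q, _ = np.linalg.qr(A)
    return Q

def projection_kernel(X, Y):
    """Projection kernel between the subspaces spanned by orthonormal X and Y.

    Assumes the standard form k(X, Y) = ||X^T Y||_F^2.
    """
    return np.linalg.norm(X.T @ Y, ord="fro") ** 2

rng = np.random.default_rng(0)
X = orthonormal_basis(rng.standard_normal((10, 3)))
Y = orthonormal_basis(rng.standard_normal((10, 3)))

k_xx = projection_kernel(X, X)  # similarity of a subspace with itself
k_xy = projection_kernel(X, Y)  # similarity of two random 3-dim subspaces

# The kernel depends only on the subspace, not on the chosen basis:
# rotating the basis of X by an orthogonal matrix R leaves it unchanged.
R = orthonormal_basis(rng.standard_normal((3, 3)))
k_rot = projection_kernel(X @ R, Y)
```

For p-dimensional subspaces the value lies between 0 and p, and a subspace compared with itself gives p, which is why the kernel is a natural similarity measure for subspace-valued data.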
Network Density of States
Spectral analysis connects graph structure to the eigenvalues and
eigenvectors of associated matrices. Much of spectral graph theory descends
directly from spectral geometry, the study of differentiable manifolds through
the spectra of associated differential operators. But the translation from
spectral geometry to spectral graph theory has largely focused on results
involving only a few extreme eigenvalues and their associated eigenvectors.
Unlike in geometry, the study of graphs through the overall distribution of
eigenvalues - the spectral density - is largely limited to simple random graph
models. The interior of the spectrum of real-world graphs remains largely
unexplored, difficult to compute and to interpret.
In this paper, we delve into the heart of spectral densities of real-world
graphs. We borrow tools developed in condensed matter physics, and add novel
adaptations to handle the spectral signatures of common graph motifs. The
resulting methods are highly efficient, as we illustrate by computing spectral
densities for graphs with over a billion edges on a single compute node. Beyond
providing visually compelling fingerprints of graphs, we show how the
estimation of spectral densities facilitates the computation of many common
centrality measures, and use spectral densities to estimate meaningful
information about graph structure that cannot be inferred from the extremal
eigenpairs alone.
Comment: 10 pages, 7 figures
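To make the notion of a spectral density concrete, here is a small sketch that histograms all eigenvalues of the symmetrically normalized adjacency matrix of a toy graph. It uses a full dense eigendecomposition, which is only feasible for small graphs and is not the scalable method the abstract describes; the function names and the 8-node cycle example are assumptions made for illustration.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d, dtype=float)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def spectral_density(A, bins=20):
    """Histogram of all eigenvalues of the normalized adjacency matrix.

    Eigenvalues lie in [-1, 1]; they are clipped to guard against
    floating-point values falling marginally outside that range.
    """
    eigvals = np.linalg.eigvalsh(normalized_adjacency(A))
    eigvals = np.clip(eigvals, -1.0, 1.0)
    hist, edges = np.histogram(eigvals, bins=bins, range=(-1.0, 1.0))
    return hist / hist.sum(), edges, eigvals

# Toy example: an undirected cycle on 8 nodes.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0
    A[(i + 1) % n, i] = 1.0

density, edges, eigvals = spectral_density(A)
```

The resulting `density` array describes the whole spectrum, including the interior eigenvalues that analyses based only on the extremal eigenpairs never look at.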
Data mining in soft computing framework: a survey
The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.
Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare
In recent years, time-series mining has become a challenging issue for
researchers. An important application lies in monitoring, which
requires analyzing large sets of time-series to learn usual patterns. Any
deviation from this learned profile is then considered an unexpected
situation. Moreover, complex applications may involve the temporal study of
several heterogeneous parameters. In this paper, we propose a method for mining
heterogeneous multivariate time-series for learning meaningful patterns. The
proposed approach allows for mixed time-series -- containing both pattern and
non-pattern data -- as well as for imprecise matches, outliers, stretching and
global translation of pattern instances in time. We present the early results
of our approach in the context of monitoring the health status of a person at
home. The purpose is to build a behavioral profile of a person by analyzing the
time variations of several quantitative or qualitative parameters recorded
through a set of sensors installed in the home.
Data mining in manufacturing: a review based on the kind of knowledge
In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, fault detection, etc. Data mining has emerged as an important tool for knowledge acquisition from manufacturing databases. This paper reviews the literature dealing with knowledge discovery and data mining applications in the broad domain of manufacturing, with a special emphasis on the type of functions to be performed on the data. The major data mining functions to be performed include characterization and description, association, classification, prediction, clustering and evolution analysis. The papers reviewed have therefore been categorized in these five categories. It has been shown that there is a rapid growth in the application of data mining in the context of manufacturing processes and enterprises in the last 3 years. This review reveals the progressive applications and existing gaps identified in the context of data mining in manufacturing. A novel text mining approach has also been used on the abstracts and keywords of 150 papers to identify the research gaps and find the linkages between knowledge area, knowledge type and the applied data mining tools and techniques.