On mining complex sequential data by means of FCA and pattern structures
Nowadays, data sets are available in increasingly complex and heterogeneous forms.
Mining of such data collections is essential to support many real-world
applications ranging from healthcare to marketing. In this work, we focus on
the analysis of "complex" sequential data by means of interesting sequential
patterns. We approach the problem using the elegant mathematical framework of
Formal Concept Analysis (FCA) and its extension based on "pattern structures".
Pattern structures are used for mining complex data (such as sequences or
graphs) and are based on a subsumption operation, which in our case is defined
with respect to the partial order on sequences. We show how pattern structures,
along with projections (i.e., a data reduction of sequential structures), are
able to enumerate more meaningful patterns and increase the computational
efficiency of the approach. Finally, we show the applicability of the presented
method for discovering and analyzing interesting patient patterns from a French
healthcare data set on cancer. The quantitative and qualitative results (with
annotations and analysis from a physician) are reported for this use case, which
is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures;
projections; sequences; sequential data.
Comment: An accepted publication in International Journal of General Systems.
The paper was written following the conference on Concept Lattices and
Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables
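To make the subsumption idea concrete, here is a minimal sketch of a pattern-structure similarity (meet) operation on sequences. It assumes descriptions are sets of sequences of single items (the paper's sequences may be richer, e.g. sequences of itemsets), ordered by the subsequence relation, with the meet of two descriptions taken as their set of maximal common subsequences. The function names and the minimum-length projection `min_length_projection` are hypothetical illustrations, not necessarily the exact operations used in the paper; the brute-force enumeration is exponential and only suits short sequences.

```python
from itertools import combinations


def is_subsequence(s, t):
    """True if s can be obtained from t by deleting elements (s <= t)."""
    it = iter(t)
    # "x in it" advances the shared iterator, so element order is respected
    return all(x in it for x in s)


def subsequences(s):
    """All non-empty subsequences of s, as tuples (exponential in len(s))."""
    return {
        tuple(s[i] for i in idx)
        for k in range(1, len(s) + 1)
        for idx in combinations(range(len(s)), k)
    }


def maximal(patterns):
    """Keep only patterns not strictly contained in another pattern."""
    return {
        p for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    }


def meet(d1, d2):
    """Similarity of two descriptions: their maximal common subsequences."""
    common = set()
    for s in d1:
        for t in d2:
            common |= {c for c in subsequences(s) if is_subsequence(c, t)}
    return maximal(common)


def min_length_projection(d, k):
    """Hypothetical projection: discard patterns shorter than k items."""
    return {p for p in d if len(p) >= k}


# Toy descriptions (made up, not data from the paper)
d1 = {("a", "b", "c", "d")}
d2 = {("b", "d", "a")}
print(sorted(meet(d1, d2)))                            # [('a',), ('b', 'd')]
print(sorted(min_length_projection(meet(d1, d2), 2)))  # [('b', 'd')]
```

In this sketch the projection simply drops short patterns from each description, which shrinks what the meet has to carry around at the cost of losing those patterns.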
Pattern Recognition In Non-Kolmogorovian Structures
We present a generalization of the problem of pattern recognition to
arbitrary probabilistic models. This version deals with the problem of
recognizing an individual pattern among a family of different species or
classes of objects that obey probabilistic laws which do not comply with
Kolmogorov's axioms. We show that such a scenario accommodates many important
examples, and in particular, we provide a rigorous definition of the classical
and the quantum pattern recognition problems, respectively. Our framework
allows for the introduction of non-trivial correlations (such as entanglement or
discord) between the different species involved, opening the door to a new way
of harnessing these physical resources for solving pattern recognition
problems. Finally, we present some examples and discuss the computational
complexity of the quantum pattern recognition problem, showing that the most
important quantum computation algorithms can be described as non-Kolmogorovian
pattern recognition problems.
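As a hedged illustration of how the classical and quantum problems differ, the sketch below treats only the simplest two-class case: recognition reduces to telling apart two known probability distributions (the Kolmogorovian case, solved by the Bayes rule) or two known qubit states (solved by the Helstrom measurement). This two-class reduction and the function names are assumptions chosen for illustration; the paper's framework is more general.

```python
import math


def classical_success(prior0, dist0, dist1):
    """Bayes-optimal probability of naming the correct class of one sample,
    drawn from dist0 with probability prior0 and from dist1 otherwise."""
    prior1 = 1 - prior0
    return sum(max(prior0 * p, prior1 * q) for p, q in zip(dist0, dist1))


def density(psi):
    """Density matrix |psi><psi| of a normalized qubit state [a, b]."""
    psi = [complex(x) for x in psi]
    return [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]


def helstrom_success(prior0, psi0, psi1):
    """Optimal probability of identifying which of two qubit states was
    prepared: (1 + ||prior0*rho0 - prior1*rho1||_1) / 2 (Helstrom bound)."""
    r0, r1 = density(psi0), density(psi1)
    prior1 = 1 - prior0
    m = [[prior0 * r0[i][j] - prior1 * r1[i][j] for j in range(2)]
         for i in range(2)]
    # m is 2x2 Hermitian, so its eigenvalues are centre +/- radius
    a, d, b = m[0][0].real, m[1][1].real, m[0][1]
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    trace_norm = abs(centre + radius) + abs(centre - radius)
    return (1 + trace_norm) / 2


plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(classical_success(0.5, [0.9, 0.1], [0.2, 0.8]))  # about 0.85
print(helstrom_success(0.5, [1, 0], plus))              # about 0.854
```

The classical value can also be written as (1 + ||prior0*P0 - prior1*P1||_1) / 2, so the quantum formula has the same shape with density matrices and the trace norm in place of distributions and the L1 norm.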
Informative Data Projections: A Framework and Two Examples
Methods for Projection Pursuit aim to facilitate the visual exploration of
high-dimensional data by identifying interesting low-dimensional projections. A
major challenge is the design of a suitable quality metric of projections,
commonly referred to as the projection index, to be maximized by the Projection
Pursuit algorithm. In this paper, we introduce a new information-theoretic
strategy for tackling this problem, based on quantifying the amount of
information the projection conveys to a user given their prior beliefs about
the data. The resulting projection index is a subjective quantity, explicitly
dependent on the intended user. As a useful illustration, we develop this
idea for two particular kinds of prior beliefs. The first kind leads to PCA
(Principal Component Analysis), shedding new light on when PCA is (not)
appropriate. The second kind leads to a novel projection index, the
maximization of which can be regarded as a robust variant of PCA. We show how
this projection index, though non-convex, can be effectively maximized using a
modified power method as well as using a semidefinite programming relaxation.
The usefulness of this new projection index is demonstrated in comparative
empirical experiments against PCA and a popular Projection Pursuit method.
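Since one kind of prior belief recovers PCA, and the novel index is maximized with a modified power method, the sketch below shows, for orientation only, the standard (unmodified) power method for the first PCA direction and the resulting 1-D projection. It is not the paper's robust projection index or its modified solver; the function names and the toy data are assumptions.

```python
import math
import random


def covariance(data):
    """Covariance matrix (dividing by n) of a list of equal-length rows."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centred = [[row[j] - mean[j] for j in range(dim)] for row in data]
    return [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)]
            for i in range(dim)]


def top_direction(cov, iters=200, seed=0):
    """Plain power iteration for the leading eigenvector of cov, i.e. the
    first principal direction (assumes a strictly dominant top eigenvalue)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in cov]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector orthogonal to all leading directions")
        v = [x / norm for x in w]
    return v


def project(data, v):
    """1-D coordinates of each row along the unit direction v."""
    return [sum(a * b for a, b in zip(row, v)) for row in data]


data = [[t, 2 * t] for t in (-2, -1, 0, 1, 2)]  # toy points on the line y = 2x
v = top_direction(covariance(data))
print([round(x, 3) for x in v])  # proportional to (1, 2), up to sign
print(project(data, v))
```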
A conceptual design tool: Sketch and fuzzy logic based system
A real-time sketch and fuzzy logic based prototype system for conceptual design has been developed. This system comprises four phases. In the first, the system accepts on-line free-hand sketches as input and segments them into meaningful parts, using fuzzy knowledge to detect corners and inflection points on the sketched curves. The fuzzy knowledge is applied to capture the user's drawing intention in terms of sketching position, direction, speed and acceleration. During the second phase, each segmented sub-part (curve) can be classified and identified as one of the following 2D primitives: straight lines, circles, circular arcs, ellipses, elliptical arcs or B-spline curves. Then, 2D topology information (connectivity, unitary constraints and pairwise constraints) is extracted dynamically from the identified 2D primitives. From the extracted information, a more accurate 2D geometry can be built up by a 2D geometric constraint solver. The 2D topology and geometry information is then employed for further interpretation as a 3D geometry. The system can accept not only sketched input but also users' interactive input of 2D and 3D primitives.
This makes it friendlier and easier to use than "sketched input only" or "interactive input only" systems.
Finally, examples are given to illustrate the system.
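The first phase combines drawing speed and direction change to find corners. A minimal sketch of that kind of fuzzy rule follows, assuming linear ramp membership functions for "slow" and "sharp turn", the minimum operator as fuzzy AND, and a 0.5 cut-off. These shapes, thresholds and function names are hypothetical; the described system also uses position and acceleration, which are omitted here.

```python
import math


def ramp_down(x, lo, hi):
    """Membership 1 at or below lo, 0 at or above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    """Complement of ramp_down: 0 at or below lo, 1 at or above hi."""
    return 1.0 - ramp_down(x, lo, hi)


def corner_candidates(points, speed_lo, speed_hi, angle_lo, angle_hi,
                      threshold=0.5):
    """points: list of (x, y, t) pen samples. Returns indices of interior
    points whose fuzzy corner degree min(slow, sharp) reaches threshold."""
    result = []
    for i in range(1, len(points) - 1):
        (x0, y0, t0), (x1, y1, t1), (x2, y2, t2) = points[i - 1:i + 2]
        # average pen speed over the two segments around point i
        speed = (math.hypot(x1 - x0, y1 - y0)
                 + math.hypot(x2 - x1, y2 - y1)) / (t2 - t0)
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        # turning angle in degrees, wrapped into [0, 180]
        turn = abs(math.degrees(math.atan2(math.sin(a_out - a_in),
                                           math.cos(a_out - a_in))))
        degree = min(ramp_down(speed, speed_lo, speed_hi),
                     ramp_up(turn, angle_lo, angle_hi))
        if degree >= threshold:
            result.append(i)
    return result


# Toy L-shaped stroke: the pen slows down while turning at (2, 0)
stroke = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (2, 1, 4), (2, 2, 5)]
print(corner_candidates(stroke, 0.5, 1.5, 20, 60))  # [2]
```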
Nonlinear tube-fitting for the analysis of anatomical and functional structures
We are concerned with the estimation of the exterior surface and interior
summaries of tube-shaped anatomical structures. This interest is motivated by
two distinct scientific goals, one dealing with the distribution of HIV
microbicide in the colon and the other with measuring degradation in
white-matter tracts in the brain. Our problem is posed as the estimation of the
support of a distribution in three dimensions from a sample from that
distribution, possibly measured with error. We propose a novel tube-fitting
algorithm to construct such estimators. Further, we conduct a simulation study
to aid in the choice of a key parameter of the algorithm, and we test our
algorithm with a validation study tailored to the motivating data sets. Finally,
we apply the tube-fitting algorithm to a colon image produced by single photon
emission computed tomography (SPECT) and to a white-matter tract image produced
using diffusion tensor imaging (DTI).
Comment: Published at http://dx.doi.org/10.1214/10-AOAS384 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
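The abstract does not spell out the nonlinear tube-fitting algorithm. For intuition only, the sketch below estimates a tube with a much cruder, swapped-in technique: cutting a 3D point cloud into slices along z, assumed here to be roughly the tube's axis, and recording each slice's centroid and radius. It ignores curvature of the axis and measurement error, both of which the paper addresses; the function name and the toy cylinder are hypothetical.

```python
import math


def fit_tube(points, n_slices):
    """Slice-and-centroid tube estimate for a roughly z-aligned tube.
    points: list of (x, y, z). Returns (z_mid, cx, cy, radius) for each
    non-empty slice, radius being the largest distance from the centroid."""
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    width = (z_max - z_min) / n_slices or 1.0  # avoid zero width if all z equal
    bins = [[] for _ in range(n_slices)]
    for x, y, z in points:
        k = min(int((z - z_min) / width), n_slices - 1)
        bins[k].append((x, y))
    tube = []
    for k, pts in enumerate(bins):
        if not pts:
            continue
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        radius = max(math.hypot(x - cx, y - cy) for x, y in pts)
        tube.append((z_min + (k + 0.5) * width, cx, cy, radius))
    return tube


# Toy cylinder: radius 2 around the axis x = 1, y = -1, for z = 0..9
cyl = [(1 + 2 * math.cos(a * math.pi / 4), -1 + 2 * math.sin(a * math.pi / 4), z)
       for z in range(10) for a in range(8)]
for z_mid, cx, cy, r in fit_tube(cyl, 3):
    print(round(z_mid, 1), round(cx, 3), round(cy, 3), round(r, 3))
```

Replacing the maximum distance by a high quantile would make the radius less sensitive to outlying, error-affected points.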