Sparse Sequential Dirichlet Coding
This short paper describes a simple coding technique, Sparse Sequential
Dirichlet Coding, for multi-alphabet memoryless sources. It is appropriate in
situations where only a small, unknown subset of the possible alphabet symbols
can be expected to occur in any particular data sequence. We provide a
competitive analysis which shows that the performance of Sparse Sequential
Dirichlet Coding will be close to that of a Sequential Dirichlet Coder that
knows in advance the exact subset of occurring alphabet symbols. Empirically we
show that our technique can perform similarly to the more computationally
demanding Sequential Sub-Alphabet Estimator, while using fewer computational
resources.
Comment: 7 pages
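For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of a plain sequential Dirichlet (add-alpha) estimator over a known alphabet, not the sparse variant proposed in the paper. The function name, the `alpha=0.5` default, and the use of ideal code lengths (negative log probabilities rather than an actual arithmetic coder) are assumptions made for the example.

```python
import math
from collections import Counter


def sequential_dirichlet_code_length(sequence, alphabet_size, alpha=0.5):
    """Ideal code length in bits under an add-alpha sequential estimator.

    Hypothetical helper: the symbol at position t (0-based) is assigned
    probability (count_so_far + alpha) / (t + alpha * alphabet_size).
    """
    counts = Counter()
    total_bits = 0.0
    for t, symbol in enumerate(sequence):
        p = (counts[symbol] + alpha) / (t + alpha * alphabet_size)
        total_bits += -math.log2(p)
        counts[symbol] += 1  # update only after coding the symbol
    return total_bits


# The same two-symbol data costs more bits when the coder must spread
# probability mass over a large nominal alphabet.
data = "abab" * 10
bits_known_subset = sequential_dirichlet_code_length(data, alphabet_size=2)
bits_full_alphabet = sequential_dirichlet_code_length(data, alphabet_size=256)
```

The gap between the two code lengths is the kind of overhead a sparse coder aims to reduce when only a small subset of the alphabet actually occurs.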
Duration and Interval Hidden Markov Model for Sequential Data Analysis
Analysis of sequential event data has been recognized as one of the essential
tools in the field of data modeling and analysis. In this paper, after examining
the technical requirements and issues involved in modeling complex but practical
situations, we propose a new sequential data model, dubbed the Duration and Interval
Hidden Markov Model (DI-HMM), that efficiently represents the "state duration" and
"state interval" of data events. This capability plays an important role in
representing practical time-series sequential data and, in turn, enables
efficient and flexible sequential data retrieval.
Numerical experiments on synthetic and real data demonstrate the efficiency and
accuracy of the proposed DI-HMM.
Fringe analysis for parallel MacroSplit insertion algorithms in 2--3 trees
We extend fringe analysis (used to study the expected behavior of balanced search trees under sequential insertions) to deal with synchronous parallel insertions on 2--3 trees. Given an insertion of k keys in a tree with n nodes, the fringe evolves following a transition matrix whose coefficients capture the precise form of the algorithm but do not depend on k or n. The derivation of this matrix uses the binomial transform recently developed by P. Poblete, J. Munro and Th. Papadakis. Due to the complexity of this exact analysis, we also develop two approximations: the first is based on a simplified parallel model, and the second on the sequential model.
These two approximate analyses show that the parallel insertion case does not differ significantly from the sequential case, namely only in terms of order O(1/n^2).
Postprint (published version)
On mining complex sequential data by means of FCA and pattern structures
Nowadays, data sets are available in very complex and heterogeneous forms.
Mining of such data collections is essential to support many real-world
applications ranging from healthcare to marketing. In this work, we focus on
the analysis of "complex" sequential data by means of interesting sequential
patterns. We approach the problem using the elegant mathematical framework of
Formal Concept Analysis (FCA) and its extension based on "pattern structures".
Pattern structures are used for mining complex data (such as sequences or
graphs) and are based on a subsumption operation, which in our case is defined
with respect to the partial order on sequences. We show how pattern structures,
along with projections (i.e., a data reduction of sequential structures), are
able to enumerate more meaningful patterns and increase the computing
efficiency of the approach. Finally, we show the applicability of the presented
method for discovering and analyzing interesting patient patterns from a French
healthcare data set on cancer. The quantitative and qualitative results (with
annotations and analysis from a physician) are reported in this use case, which
is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures;
projections; sequences; sequential data.
Comment: An accepted publication in International Journal of General Systems.
The paper was created in the wake of the conference on Concept Lattices and
Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables
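To make the idea of a subsumption order on sequences concrete, here is a minimal sketch using the plain subsequence relation on sequences of atomic items. This is an assumption for illustration only: the abstract does not specify the exact order used in the paper, which may be defined on richer elements or with additional constraints. The function name `is_subsequence` is hypothetical.

```python
def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed.

    Illustrative order only: one pattern is treated as more general than
    a sequence when it is a subsequence of it.
    """
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


# A short, hypothetical sequence of care events for one patient.
trajectory = ["consultation", "surgery", "chemotherapy", "follow-up"]
general_pattern = ["surgery", "follow-up"]
reversed_pattern = ["follow-up", "surgery"]

matches = is_subsequence(general_pattern, trajectory)
does_not_match = is_subsequence(reversed_pattern, trajectory)
```

Under this order, a pattern that is a subsequence of every sequence in a group is a common generalization of that group, which is the role a similarity operation plays in a pattern structure.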
