Building Efficient and Compact Data Structures for Simplicial Complexes
The Simplex Tree (ST) is a recently introduced data structure that can
represent abstract simplicial complexes of any dimension and allows efficient
implementation of a large range of basic operations on simplicial complexes. In
this paper, we show how to optimally compress the Simplex Tree while retaining
its functionalities. In addition, we propose two new data structures called the
Maximal Simplex Tree (MxST) and the Simplex Array List (SAL). We analyze the
compressed Simplex Tree, the Maximal Simplex Tree, and the Simplex Array List
under various settings. Comment: An extended abstract appeared in the proceedings of SoCG 201
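As a rough illustration of the Simplex Tree idea described above, the following minimal sketch stores each simplex as the path of its sorted vertices in a trie. The class name `SimplexTree` and its methods are hypothetical names chosen for this example. The sketch does not implement the compression, the Maximal Simplex Tree, or the Simplex Array List proposed in the paper.

```python
from itertools import combinations


class SimplexTree:
    """Toy trie (assumed layout): a simplex is the path of its sorted vertices."""

    def __init__(self):
        # Each node is a dict mapping a vertex label to a child node.
        self.root = {}

    def insert(self, simplex):
        """Insert a simplex together with all of its nonempty faces."""
        vertices = sorted(set(simplex))
        for k in range(1, len(vertices) + 1):
            for face in combinations(vertices, k):
                node = self.root
                for v in face:
                    node = node.setdefault(v, {})

    def contains(self, simplex):
        """Return True if the simplex is stored in the complex."""
        node = self.root
        for v in sorted(set(simplex)):
            if v not in node:
                return False
            node = node[v]
        return True

    def num_simplices(self):
        """Count stored simplices; every non-root node is one simplex."""
        def count(node):
            return sum(1 + count(child) for child in node.values())
        return count(self.root)


st = SimplexTree()
st.insert([1, 2, 3])  # a triangle and all of its edges and vertices
```

Because a simplicial complex is closed under taking faces, every node of the trie corresponds to exactly one simplex, which is why counting nodes counts simplices.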
The Grow-Shrink strategy for learning Markov network structures constrained by context-specific independences
Markov networks are models for compactly representing complex probability
distributions. They are composed of a structure and a set of numerical weights.
The structure qualitatively describes independences in the distribution, which
can be exploited to factorize the distribution into a set of compact functions.
A key application for learning structures from data is to automatically
discover knowledge. In practice, structure learning algorithms focused on
"knowledge discovery" present a limitation: they use a coarse-grained
representation of the structure. As a result, this representation cannot
describe context-specific independences. Very recently, an algorithm called
CSPC was designed to overcome this limitation, but it has a high computational
complexity. This work tries to mitigate this downside by presenting CSGS, an
algorithm that uses the Grow-Shrink strategy to reduce unnecessary
computations. In an empirical evaluation, the structures learned by CSGS
achieve competitive accuracies and lower computational complexity with respect
to those obtained by CSPC. Comment: 12 pages and 8 figures. This work was presented at IBERAMIA 201
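For readers unfamiliar with the Grow-Shrink strategy mentioned above, here is a minimal sketch of the generic two-phase procedure for estimating the Markov blanket of one variable. It is not the CSGS algorithm itself and does not handle context-specific independences. The function `grow_shrink`, the `is_independent` callback, and the toy oracle for the chain A - B - C are all assumptions made for this example. In practice the callback would be a statistical independence test on data.

```python
def grow_shrink(target, variables, is_independent):
    """Estimate the Markov blanket of `target` (generic Grow-Shrink sketch).

    is_independent(x, y, cond) is an assumed callback that returns True
    when x and y are judged independent given the variables in `cond`.
    """
    blanket = []

    # Grow phase: add any variable that is still dependent on the target
    # given the current blanket, until nothing more can be added.
    changed = True
    while changed:
        changed = False
        for v in variables:
            if v == target or v in blanket:
                continue
            if not is_independent(target, v, blanket):
                blanket.append(v)
                changed = True

    # Shrink phase: drop variables that became independent of the target
    # once the rest of the blanket is known (false positives of growing).
    for v in list(blanket):
        rest = [u for u in blanket if u != v]
        if is_independent(target, v, rest):
            blanket.remove(v)

    return blanket


def chain_oracle(x, y, cond):
    """Hypothetical oracle for the chain A - B - C: A and C are
    independent only when B is in the conditioning set."""
    return {x, y} == {"A", "C"} and "B" in cond


blanket_of_a = grow_shrink("A", ["C", "B", "A"], chain_oracle)
```

In this toy run the grow phase first admits C (no conditioning set yet), then B, and the shrink phase removes C once B is in the blanket.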
Identifying Mislabeled Training Data
This paper presents a new approach to identifying and eliminating mislabeled
training instances for supervised learning. The goal of this approach is to
improve classification accuracies produced by learning algorithms by improving
the quality of the training data. Our approach uses a set of learning
algorithms to create classifiers that serve as noise filters for the training
data. We evaluate single algorithm, majority vote and consensus filters on five
datasets that are prone to labeling errors. Our experiments illustrate that
filtering significantly improves classification accuracy for noise levels up to
30 percent. An analytical and empirical evaluation of the precision of our
approach shows that consensus filters are conservative at throwing away good
data at the expense of retaining bad data and that majority filters are better
at detecting bad data at the expense of throwing away good data. This suggests
that for situations in which there is a paucity of data, consensus filters are
preferable, whereas majority vote filters are preferable for situations with an
abundance of data.
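The difference between the majority vote and consensus filters can be sketched as a simple decision rule over the predictions of several classifiers. The function `filter_mislabeled` and its argument layout are hypothetical. The sketch also assumes that each prediction comes from a classifier that was not trained on the instance being judged, for example through cross-validation, which the abstract does not specify.

```python
def filter_mislabeled(labels, predictions, mode="majority"):
    """Return indices of training instances flagged as mislabeled.

    labels[i] is the given label of instance i.
    predictions[j][i] is classifier j's prediction for instance i
    (assumed to come from a model that did not train on instance i).
    """
    n_classifiers = len(predictions)
    flagged = []
    for i, given in enumerate(labels):
        # Count how many classifiers disagree with the given label.
        errors = sum(1 for preds in predictions if preds[i] != given)
        if mode == "consensus":
            # Flag only when every classifier disagrees.
            is_noise = errors == n_classifiers
        elif mode == "majority":
            # Flag when more than half of the classifiers disagree.
            is_noise = errors > n_classifiers / 2
        else:
            raise ValueError(f"unknown mode: {mode}")
        if is_noise:
            flagged.append(i)
    return flagged


# Illustrative data only: four instances, three classifiers.
labels = ["a", "b", "a", "b"]
predictions = [
    ["a", "a", "b", "b"],
    ["a", "a", "a", "b"],
    ["a", "a", "b", "b"],
]
majority_flags = filter_mislabeled(labels, predictions, mode="majority")
consensus_flags = filter_mislabeled(labels, predictions, mode="consensus")
```

Here the consensus rule flags only the instance that all three classifiers reject, while the majority rule also flags the instance rejected by two of three. This mirrors the trade-off stated in the abstract: consensus discards less good data but retains more bad data.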
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality being analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved. The role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems is discussed.