35,476 research outputs found
Revisiting Pattern Structures for Structured Attribute Sets
International audienceIn this paper, we revisit an original proposition on pattern structures for structured sets of attributes. There are several reasons for carrying out this kind of research work. The original proposition does not give many details on the whole framework, and especially on the possible ways of implementing the similarity operation. There exists an alternative definition without any reference to pattern structures, and we would like to make a parallel between two points of view. Moreover we discuss an efficient implementation of the intersection operation in the corresponding pattern structure. Finally, we discovered that pattern structures for structured attribute sets are very well adapted to the classification and the analysis of RDF data. We terminate the paper by an experimental section where it is shown that the provided implementation of pattern structures for structured attribute sets is quite efficient
On the performance impact of using JSON, beyond impedance mismatch
NOSQL database management systems adopt semi-structured data models, such as JSON, to easily accommodate schema evolution and overcome the overhead generated from transforming internal structures to tabular data (i.e., impedance mismatch). There exist multiple, and equivalent, ways to physically represent semi-structured data, but there is a lack of evidence about the potential impact on space and query performance. In this paper, we embark on the task of quantifying that, precisely for document stores. We empirically compare multiple ways of representing semi-structured data, which allows us to derive a set of guidelines for efficient physical database design considering both JSON and relational options in the same palette.Partly funded by the European Commission through the programme “EM IT4BI-DC”.Peer ReviewedPostprint (author's final draft
Evolving Ensemble Fuzzy Classifier
The concept of ensemble learning offers a promising avenue in learning from
data streams under complex environments because it addresses the bias and
variance dilemma better than its single model counterpart and features a
reconfigurable structure, which is well suited to the given context. While
various extensions of ensemble learning for mining non-stationary data streams
can be found in the literature, most of them are crafted under a static base
classifier and revisits preceding samples in the sliding window for a
retraining step. This feature causes computationally prohibitive complexity and
is not flexible enough to cope with rapidly changing environments. Their
complexities are often demanding because it involves a large collection of
offline classifiers due to the absence of structural complexities reduction
mechanisms and lack of an online feature selection mechanism. A novel evolving
ensemble classifier, namely Parsimonious Ensemble pENsemble, is proposed in
this paper. pENsemble differs from existing architectures in the fact that it
is built upon an evolving classifier from data streams, termed Parsimonious
Classifier pClass. pENsemble is equipped by an ensemble pruning mechanism,
which estimates a localized generalization error of a base classifier. A
dynamic online feature selection scenario is integrated into the pENsemble.
This method allows for dynamic selection and deselection of input features on
the fly. pENsemble adopts a dynamic ensemble structure to output a final
classification decision where it features a novel drift detection scenario to
grow the ensemble structure. The efficacy of the pENsemble has been numerically
demonstrated through rigorous numerical studies with dynamic and evolving data
streams where it delivers the most encouraging performance in attaining a
tradeoff between accuracy and complexity.Comment: this paper has been published by IEEE Transactions on Fuzzy System
Form and function in hillslope hydrology : in situ imaging and characterization of flow-relevant structures
Thanks to Elly Karle and the Engler-BunteInstitute, KIT, for the IC measurements of bromide. We are grateful to Selina Baldauf, Marcel Delock, Razije Fiden, Barbara Herbstritt, Lisei Köhn, Jonas Lanz, Francois Nyobeu, Marvin Reich and Begona Lorente Sistiaga for their support in the lab and during fieldwork, as well as Markus Morgner and Jean Francois Iffly for technical support and Britta Kattenstroth for hydrometeorological data acquisition. Laurent Pfister and Jean-Francois Iffly from the Luxembourg Institute of Science and Technology (LIST) are acknowledged for organizing the permissions for the experiments. Moreover, we thank Markus Weiler (University of Freiburg) for his strong support during the planning of the hillslope experiment and the preparation of the manuscript. This study is part of the DFG-funded CAOS project “From Catchments as Organised Systems to Models based on Dynamic Functional Units” (FOR 1598). The manuscript was substantially improved based on the critical and constructive comments of the anonymous reviewers, Christian Stamm and Alexander Zimmermann, and the editor Ross Woods during the open review process, which is highly appreciated.Peer reviewedPublisher PD
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE
- …