220 research outputs found

    An approach to graph-based analysis of textual documents

    In this paper, a new graph-based model is proposed for the representation of textual documents. Graph structures are obtained from textual documents by making use of the well-known Part-Of-Speech (POS) tagging technique. More specifically, a simple rule-based (re)classifier is used to map each tag onto graph vertices and edges. As a result, a decomposition of textual documents is obtained where tokens are automatically parsed and attached to either a vertex or an edge. It is shown how textual documents can be aggregated through their graph structures and, finally, how vertex-ranking methods can be used to find relevant tokens.
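
    A minimal sketch of the general idea (not the paper's exact rule set): a pre-tagged token sequence is mapped onto a graph, with noun-like tags becoming vertices and verb- or preposition-like tags becoming edge labels. The tag classes and the toy sentence are illustrative assumptions.

    ```python
    from collections import defaultdict

    # Noun-like tags -> vertices, verb-/preposition-like tags -> edge labels.
    # These tag classes are assumptions for illustration.
    VERTEX_TAGS = {"NN", "NNS", "NNP", "NNPS"}
    EDGE_TAGS = {"VB", "VBD", "VBZ", "VBG", "VBN", "IN"}

    def build_graph(tagged_tokens):
        """tagged_tokens: list of (token, POS tag) pairs from any POS tagger."""
        graph = defaultdict(list)          # vertex -> list of (edge label, vertex)
        last_vertex, pending_edge = None, None
        for token, tag in tagged_tokens:
            if tag in VERTEX_TAGS:
                if last_vertex is not None and pending_edge is not None:
                    graph[last_vertex].append((pending_edge, token))
                last_vertex, pending_edge = token, None
            elif tag in EDGE_TAGS:
                pending_edge = token
        return dict(graph)

    tagged = [("cat", "NN"), ("sits", "VBZ"), ("on", "IN"), ("mat", "NN")]
    print(build_graph(tagged))             # {'cat': [('on', 'mat')]}
    ```

    Vertex-ranking methods (e.g. degree counts or PageRank) can then be run on the resulting graph to surface relevant tokens.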

    Mining data quality rules based on T-dependence

    Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Basically, edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to automatically find edit rules by use of the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if there is a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently by use of frequent pattern trees. Experimental results show that there is a weak to medium correlation in the rank order of edit rules obtained under T_M and T_P, indicating that the semantics of these kinds of dependencies are different.
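
    The sketch below illustrates the underlying notion on a toy table: the joint frequency of a value combination is divided by a t-norm of the marginal frequencies (a "T-lift"), and combinations with a very low ratio are flagged as candidate edit rules. The threshold, the data and the brute-force enumeration are assumptions for illustration; the frequent-pattern-tree optimisation from the paper is not reproduced.

    ```python
    from itertools import combinations

    def t_min(x, y):  return min(x, y)     # minimum t-norm T_M
    def t_prod(x, y): return x * y         # product t-norm T_P (ordinary lift)

    def candidate_edit_rules(rows, t_norm, threshold=0.2):
        """Flag attribute-value combinations whose T-lift falls below the threshold."""
        n = len(rows)
        marg = {}                          # (attribute, value) -> absolute frequency
        for row in rows:
            for item in row.items():
                marg[item] = marg.get(item, 0) + 1
        rules = []
        for a, b in combinations(marg, 2):
            if a[0] == b[0]:
                continue                   # two values of the same attribute: skip
            joint = sum(1 for r in rows if r.get(a[0]) == a[1] and r.get(b[0]) == b[1])
            t_lift = (joint / n) / t_norm(marg[a] / n, marg[b] / n)
            if t_lift < threshold:         # strong negative correlation under T
                rules.append((a, b, round(t_lift, 3)))
        return rules

    rows = ([{"age": "child", "status": "married"}]
            + [{"age": "adult", "status": "married"}] * 9
            + [{"age": "child", "status": "single"}] * 10)
    print(candidate_edit_rules(rows, t_min))
    # [(('age', 'child'), ('status', 'married'), 0.1),
    #  (('age', 'adult'), ('status', 'single'), 0.0)]
    ```

    Swapping t_min for t_prod recovers ordinary lift, which makes it easy to compare the rank orders of rules obtained under T_M and T_P.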

    Comparing fbeta-optimal with distance based merge functions

    Merge functions informally combine information from a certain universe into a solution over that same universe. This typically results in a summarization, preferably an optimal one. In previous research, merge functions over sets have been studied extensively. A specific case concerns sets that allow elements to appear more than once: multisets. In this paper we compare two types of merge functions over multisets against each other. We examine both general properties and practical usability in a real-world application.
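
    A minimal sketch of one distance-based merge function over multisets (not the specific functions compared in the paper): multisets are represented as Counters and merged by taking the per-element median multiplicity, which minimises the summed symmetric-difference distance to the inputs.

    ```python
    from collections import Counter
    from statistics import median_low

    def merge_multisets(multisets):
        """multisets: list of Counters; returns the per-element median-multiplicity merge."""
        merged = Counter()
        for e in set().union(*multisets):
            m = median_low([ms[e] for ms in multisets])   # missing elements count as 0
            if m > 0:
                merged[e] = m
        return merged

    sources = [Counter("aab"), Counter("ab"), Counter("aabc")]
    print(merge_multisets(sources))        # Counter({'a': 2, 'b': 1})
    ```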

    Bipolarity in ear biometrics

    Identifying people by their biometric data is a problem that is receiving increasing attention. This paper investigates a method that allows the matching of people in the context of victim identification by using their ear biometric data. A high-quality picture (taken professionally) is matched against a set of low-quality pictures (family albums). In this paper, soft computing methods are used to model different kinds of uncertainty that arise when manually annotating the pictures. More specifically, we study the use of bipolar satisfaction degrees to explicitly handle the bipolar information about the available ear biometrics.
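
    A minimal sketch of the bipolar idea, under the assumption that each annotated feature yields a pair of satisfaction and dissatisfaction degrees and that pairs are combined with a min/max conjunction; the scores and the aggregation are illustrative, not the paper's model.

    ```python
    from dataclasses import dataclass

    @dataclass
    class BSD:
        """Bipolar satisfaction degree: (s, d) in [0,1] x [0,1]; s + d need not be 1,
        so hesitation (s + d < 1) and conflict (s + d > 1) can be expressed."""
        s: float   # degree to which the feature supports a match
        d: float   # degree to which the feature speaks against a match

    def conjunction(a: BSD, b: BSD) -> BSD:
        # One common conjunction of BSDs; treated here as an assumption.
        return BSD(s=min(a.s, b.s), d=max(a.d, b.d))

    # Hypothetical per-feature scores when comparing a high-quality photo
    # against a low-quality family-album photo.
    features = [BSD(0.8, 0.1), BSD(0.6, 0.0), BSD(0.9, 0.3)]
    overall = features[0]
    for f in features[1:]:
        overall = conjunction(overall, f)
    print(overall)                         # BSD(s=0.6, d=0.3)
    ```

    Keeping the evidence for and against a match in separate components is what makes the representation bipolar, rather than collapsing both into a single score.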

    Coreference detection of low quality objects

    The problem of record linkage is a widely studied problem that aims to identify coreferent (i.e. duplicate) data in a structured data source. As indicated by Winkler, a solution to the record linkage problem is only possible if the error rate is sufficiently low. In other words, in order to successfully deduplicate a database, the objects in the database must be of sufficient quality. However, this assumption does not always hold. In this paper, it is investigated how merging low-quality objects into one high-quality object can improve the process of record linkage. This general idea is illustrated in the context of string comparison, where strings of low quality (i.e. with a high typographical error rate) are merged into a string of high quality by using an n-dimensional Levenshtein distance matrix to compute the optimal alignment between the dirty strings. Results are presented and possible refinements are proposed.
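
    The sketch below conveys the merging idea with a deliberate simplification: instead of the n-dimensional Levenshtein matrix described above, each dirty string is aligned pairwise against a reference copy and the merged string is obtained by a per-position majority vote. The reference choice and the example strings are assumptions.

    ```python
    from collections import Counter

    def align(ref, s):
        """Levenshtein DP with traceback; returns, for each position of ref, the
        piece of s aligned to it ('' marks a deletion with respect to ref)."""
        n, m = len(ref), len(s)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i
        for j in range(m + 1):
            dp[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if ref[i - 1] == s[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # delete from ref
                               dp[i][j - 1] + 1,        # insert from s
                               dp[i - 1][j - 1] + cost) # match / substitute
        out, i, j = [''] * n, n, m
        while i > 0 and j > 0:
            cost = 0 if ref[i - 1] == s[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                out[i - 1] = s[j - 1] + out[i - 1]
                i, j = i - 1, j - 1
            elif dp[i][j] == dp[i - 1][j] + 1:
                i -= 1                                  # char of ref missing in s
            else:
                out[i - 1] = s[j - 1] + out[i - 1]      # extra char of s
                j -= 1
        while j > 0:                                    # leftover prefix of s
            out[0] = s[j - 1] + out[0]
            j -= 1
        return out

    def merge_strings(dirty):
        ref = max(dirty, key=len)                       # longest copy as reference
        columns = [align(ref, s) for s in dirty]
        return ''.join(Counter(col[p] for col in columns).most_common(1)[0][0]
                       for p in range(len(ref)))

    dirty = ["lavenshtein", "levenshtain", "levenshteim"]
    print(merge_strings(dirty))                         # levenshtein
    ```

    With independent errors across the copies the majority vote recovers the clean string; correlated errors are better handled by the joint n-dimensional alignment the paper describes.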

    A measure-theoretic foundation for data quality


    Coreference of atomic and complex objects
