220 research outputs found

    An approach to graph-based analysis of textual documents

    In this paper, a new graph-based model is proposed for the representation of textual documents. Graph structures are obtained from textual documents by making use of the well-known Part-Of-Speech (POS) tagging technique. More specifically, a simple rule-based (re)classifier is used to map each tag onto graph vertices and edges. As a result, a decomposition of textual documents is obtained where tokens are automatically parsed and attached to either a vertex or an edge. It is shown how textual documents can be aggregated through their graph structures and, finally, how vertex-ranking methods can be used to find relevant tokens.
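
    A minimal sketch of the general idea (not the paper's exact rule set): a pre-tagged token sequence is mapped onto a graph, with noun-like tags becoming vertices and verb- or preposition-like tags becoming edge labels. The tag classes and the toy sentence are illustrative assumptions.

    ```python
    from collections import defaultdict

    # Noun-like tags -> vertices, verb-/preposition-like tags -> edge labels.
    # These tag classes are assumptions for illustration.
    VERTEX_TAGS = {"NN", "NNS", "NNP", "NNPS"}
    EDGE_TAGS = {"VB", "VBD", "VBZ", "VBG", "VBN", "IN"}

    def build_graph(tagged_tokens):
        """tagged_tokens: list of (token, POS tag) pairs from any POS tagger."""
        graph = defaultdict(list)          # vertex -> list of (edge label, vertex)
        last_vertex, pending_edge = None, None
        for token, tag in tagged_tokens:
            if tag in VERTEX_TAGS:
                if last_vertex is not None and pending_edge is not None:
                    graph[last_vertex].append((pending_edge, token))
                last_vertex, pending_edge = token, None
            elif tag in EDGE_TAGS:
                pending_edge = token
        return dict(graph)

    tagged = [("cat", "NN"), ("sits", "VBZ"), ("on", "IN"), ("mat", "NN")]
    print(build_graph(tagged))             # {'cat': [('on', 'mat')]}
    ```

    Vertex-ranking methods (e.g. degree counts or PageRank) can then be run on the resulting graph to surface relevant tokens.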

    Mining data quality rules based on T-dependence

    Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Basically, edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to automatically find edit rules by use of the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if there is a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently by use of frequent pattern trees. Experimental results show that there is a weak to medium correlation in the rank order of edit rules obtained under T_M and T_P, indicating that the semantics of these kinds of dependencies are different.
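
    The sketch below illustrates the underlying notion on a toy table: the joint frequency of a value combination is divided by a t-norm of the marginal frequencies (a "T-lift"), and combinations with a very low ratio are flagged as candidate edit rules. The threshold, the data and the brute-force enumeration are assumptions for illustration; the frequent-pattern-tree optimisation from the paper is not reproduced.

    ```python
    from itertools import combinations

    def t_min(x, y):  return min(x, y)     # minimum t-norm T_M
    def t_prod(x, y): return x * y         # product t-norm T_P (ordinary lift)

    def candidate_edit_rules(rows, t_norm, threshold=0.2):
        """Flag attribute-value combinations whose T-lift falls below the threshold."""
        n = len(rows)
        marg = {}                          # (attribute, value) -> absolute frequency
        for row in rows:
            for item in row.items():
                marg[item] = marg.get(item, 0) + 1
        rules = []
        for a, b in combinations(marg, 2):
            if a[0] == b[0]:
                continue                   # two values of the same attribute: skip
            joint = sum(1 for r in rows if r.get(a[0]) == a[1] and r.get(b[0]) == b[1])
            t_lift = (joint / n) / t_norm(marg[a] / n, marg[b] / n)
            if t_lift < threshold:         # strong negative correlation under T
                rules.append((a, b, round(t_lift, 3)))
        return rules

    rows = ([{"age": "child", "status": "married"}]
            + [{"age": "adult", "status": "married"}] * 9
            + [{"age": "child", "status": "single"}] * 10)
    print(candidate_edit_rules(rows, t_min))
    # [(('age', 'child'), ('status', 'married'), 0.1),
    #  (('age', 'adult'), ('status', 'single'), 0.0)]
    ```

    Swapping t_min for t_prod recovers ordinary lift, which makes it easy to compare the rank orders of rules obtained under T_M and T_P.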

    Comparing fbeta-optimal with distance based merge functions

    Merge functions informally combine information from a certain universe into a solution over that same universe. This typically results in a summarization, preferably an optimal one. In previous research, merge functions over sets have been studied extensively. A specific case concerns sets that allow elements to appear more than once: multisets. In this paper we compare two types of merge functions over multisets against each other. We examine both general properties and practical usability in a real-world application.
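
    A minimal sketch of one distance-based merge function over multisets (not the specific functions compared in the paper): multisets are represented as Counters and merged by taking the per-element median multiplicity, which minimises the summed symmetric-difference distance to the inputs.

    ```python
    from collections import Counter
    from statistics import median_low

    def merge_multisets(multisets):
        """multisets: list of Counters; returns the per-element median-multiplicity merge."""
        merged = Counter()
        for e in set().union(*multisets):
            m = median_low([ms[e] for ms in multisets])   # missing elements count as 0
            if m > 0:
                merged[e] = m
        return merged

    sources = [Counter("aab"), Counter("ab"), Counter("aabc")]
    print(merge_multisets(sources))        # Counter({'a': 2, 'b': 1})
    ```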

    Bipolarity in ear biometrics

    Identifying people by their biometric data is a problem that is receiving increasing attention. This paper investigates a method that allows the matching of people in the context of victim identification by using their ear biometric data. A high-quality picture (taken professionally) is matched against a set of low-quality pictures (family albums). In this paper, soft computing methods are used to model different kinds of uncertainty that arise when manually annotating the pictures. More specifically, we study the use of bipolar satisfaction degrees to explicitly handle the bipolar information about the available ear biometrics.
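
    A minimal sketch of the bipolar idea, under the assumption that each annotated feature yields a pair of satisfaction and dissatisfaction degrees and that pairs are combined with a min/max conjunction; the scores and the aggregation are illustrative, not the paper's model.

    ```python
    from dataclasses import dataclass

    @dataclass
    class BSD:
        """Bipolar satisfaction degree: (s, d) in [0,1] x [0,1]; s + d need not be 1,
        so hesitation (s + d < 1) and conflict (s + d > 1) can be expressed."""
        s: float   # degree to which the feature supports a match
        d: float   # degree to which the feature speaks against a match

    def conjunction(a: BSD, b: BSD) -> BSD:
        # One common conjunction of BSDs; treated here as an assumption.
        return BSD(s=min(a.s, b.s), d=max(a.d, b.d))

    # Hypothetical per-feature scores when comparing a high-quality photo
    # against a low-quality family-album photo.
    features = [BSD(0.8, 0.1), BSD(0.6, 0.0), BSD(0.9, 0.3)]
    overall = features[0]
    for f in features[1:]:
        overall = conjunction(overall, f)
    print(overall)                         # BSD(s=0.6, d=0.3)
    ```

    Keeping the evidence for and against a match in separate components is what makes the representation bipolar, rather than collapsing both into a single score.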

    Coreference detection of low quality objects

    The problem of record linkage is a widely studied problem that aims to identify coreferent (i.e. duplicate) data in a structured data source. As indicated by Winkler, a solution to the record linkage problem is only possible if the error rate is sufficiently low. In other words, in order to successfully deduplicate a database, the objects in the database must be of sufficient quality. However, this assumption does not always hold. In this paper, it is investigated how merging low-quality objects into one high-quality object can improve the process of record linkage. This general idea is illustrated in the context of string comparison, where strings of low quality (i.e. with a high typographical error rate) are merged into a string of high quality by using an n-dimensional Levenshtein distance matrix to compute the optimal alignment between the dirty strings. Results are presented and possible refinements are proposed.
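
    The sketch below conveys the merging idea with a deliberate simplification: instead of the n-dimensional Levenshtein matrix described above, each dirty string is aligned pairwise against a reference copy and the merged string is obtained by a per-position majority vote. The reference choice and the example strings are assumptions.

    ```python
    from collections import Counter

    def align(ref, s):
        """Levenshtein DP with traceback; returns, for each position of ref, the
        piece of s aligned to it ('' marks a deletion with respect to ref)."""
        n, m = len(ref), len(s)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i
        for j in range(m + 1):
            dp[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if ref[i - 1] == s[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # delete from ref
                               dp[i][j - 1] + 1,        # insert from s
                               dp[i - 1][j - 1] + cost) # match / substitute
        out, i, j = [''] * n, n, m
        while i > 0 and j > 0:
            cost = 0 if ref[i - 1] == s[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                out[i - 1] = s[j - 1] + out[i - 1]
                i, j = i - 1, j - 1
            elif dp[i][j] == dp[i - 1][j] + 1:
                i -= 1                                  # char of ref missing in s
            else:
                out[i - 1] = s[j - 1] + out[i - 1]      # extra char of s
                j -= 1
        while j > 0:                                    # leftover prefix of s
            out[0] = s[j - 1] + out[0]
            j -= 1
        return out

    def merge_strings(dirty):
        ref = max(dirty, key=len)                       # longest copy as reference
        columns = [align(ref, s) for s in dirty]
        return ''.join(Counter(col[p] for col in columns).most_common(1)[0][0]
                       for p in range(len(ref)))

    dirty = ["lavenshtein", "levenshtain", "levenshteim"]
    print(merge_strings(dirty))                         # levenshtein
    ```

    With independent errors across the copies the majority vote recovers the clean string; correlated errors are better handled by the joint n-dimensional alignment the paper describes.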

    A measure-theoretic foundation for data quality


    Coreference of atomic and complex objects
