ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is
about detecting duplicate representations of the same external entities,
and merging them into single representations. Relatively recently, declarative
rules called "matching dependencies" (MDs) have been proposed for specifying
similarity conditions under which attribute values in database records are
merged. In this work we show the process and the benefits of integrating four
components of ER: (a) building a classifier for duplicate/non-duplicate record
pairs using machine learning (ML) techniques; (b) using MDs to support the
blocking phase of ML; (c) merging records on the basis of the classifier
results; and (d) using the declarative language "LogiQL" (an extended form of
Datalog supported by the "LogicBlox" platform) for all activities related to
data processing and to the specification and enforcement of MDs.
Comment: Final journal version, with some minor technical corrections.
Extended version of arXiv:1508.0601
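
The pipeline described in the ERBlox abstract (blocking, pairwise classification, merging) can be illustrated with a minimal Python sketch. This is not the authors' LogiQL implementation: the records, the `block_key` rule, the string-similarity threshold standing in for a trained classifier, and the merge policy are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; the field names are illustrative only.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Ottawa"},
    {"id": 2, "name": "John Smith", "city": "Ottawa"},
    {"id": 3, "name": "Mary Jones", "city": "Toronto"},
]


def block_key(rec):
    # Assumed blocking rule: only records sharing a city are compared.
    return rec["city"].lower()


def is_duplicate(a, b, threshold=0.85):
    # Stand-in for a trained ML classifier: a plain string-similarity cutoff.
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def resolve(records):
    # 1. Blocking: group records so that only same-block pairs are classified.
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    # 2. Classification: link duplicate pairs with a small union-find.
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for block in blocks.values():
        for a, b in combinations(block, 2):
            if is_duplicate(a, b):
                parent[find(b["id"])] = find(a["id"])

    # 3. Merging: one output record per cluster (assumed policy: longest name).
    clusters = defaultdict(list)
    for rec in records:
        clusters[find(rec["id"])].append(rec)
    return [
        {
            "ids": sorted(r["id"] for r in group),
            "name": max((r["name"] for r in group), key=len),
            "city": group[0]["city"],
        }
        for group in clusters.values()
    ]


merged = resolve(records)
```

Blocking keeps the number of classified pairs small; here the Toronto record is never compared with the Ottawa records at all.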
Relatedness Measures to Aid the Transfer of Building Blocks among Multiple Tasks
Multitask Learning is a learning paradigm that deals with multiple different
tasks in parallel and transfers knowledge among them. XOF, a Learning
Classifier System using tree-based programs to encode building blocks
(meta-features), constructs and collects features with rich discriminative
information for classification tasks in an observed list. This paper seeks to
facilitate the automation of feature transfer between tasks by utilising
the observed list. We hypothesise that the best discriminative features of a
classification task carry its characteristics. Therefore, the relatedness
between any two tasks can be estimated by comparing their most appropriate
patterns. We propose a multiple-XOF system, called mXOF, that can dynamically
adapt feature transfer among XOFs. This system utilises the observed list to
estimate task relatedness, which enables feature transfer to be automated. In
terms of knowledge discovery, the relatedness estimates provide insight into
relations among multiple datasets. We evaluated mXOF on various scenarios,
e.g. representative Hierarchical Boolean problems, classification of distinct
classes in the UCI Zoo dataset, and unrelated tasks, to validate its ability
to transfer knowledge automatically and to estimate task relatedness. Results
show that mXOF can estimate the relatedness between multiple tasks reasonably
well and use dynamic feature transfer to aid learning performance.
Comment: accepted by The Genetic and Evolutionary Computation Conference
(GECCO 2020
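
The idea of estimating task relatedness by comparing each task's best discriminative features can be sketched as follows. The abstract does not state which comparison mXOF uses, so the Jaccard overlap, the `observed` lists, and the `min_relatedness` cutoff below are assumptions chosen only to make the idea concrete.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical observed lists: each task maps to its best features,
# written here as plain strings rather than tree-based programs.
observed = {
    "task_A": ["x1 AND x2", "NOT x3", "x4 OR x5"],
    "task_B": ["x1 AND x2", "NOT x3", "x6"],
    "task_C": ["x7", "x8 OR x9"],
}


def relatedness(observed):
    """Pairwise relatedness estimates for every unordered pair of tasks."""
    names = sorted(observed)
    return {
        (s, t): jaccard(observed[s], observed[t])
        for i, s in enumerate(names)
        for t in names[i + 1:]
    }


def transfer_sources(target, observed, min_relatedness=0.3):
    """Tasks judged related enough to donate features to `target`."""
    scores = {
        other: jaccard(observed[target], feats)
        for other, feats in observed.items()
        if other != target
    }
    return sorted(t for t, s in scores.items() if s >= min_relatedness)
```

With these toy lists, `task_A` and `task_B` share two of four distinct features, so only they would exchange features; `task_C` stays isolated.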
Structural Data Recognition with Graph Model Boosting
This paper presents a novel method for structural data recognition using a
large number of graph models. In general, prevalent methods for structural data
recognition have two shortcomings: 1) Only a single model is used to capture
structural variation. 2) Naive recognition methods are used, such as the
nearest neighbor method. In this paper, we propose strengthening the
recognition performance of these models as well as their ability to capture
structural variation. The proposed method constructs a large number of graph
models and trains decision trees using the models. This paper makes two main
contributions. The first is a novel graph model that can quickly perform
calculations, which allows us to construct several models in a feasible amount
of time. The second contribution is a novel approach to structural data
recognition: graph model boosting. Comprehensive structural variations can be
captured with a large number of graph models constructed in a boosting
framework, and a sophisticated classifier can be formed by aggregating the
decision trees. Consequently, we can carry out structural data recognition with
powerful recognition capability in the face of comprehensive structural
variation. The experiments show that the proposed method achieves impressive
results and outperforms existing methods on datasets from the IAM graph
database repository.
Comment: 8 page
Efficient schemes on solving fractional integro-differential equations
Fractional integro-differential equations (FIDEs) arise in the modelling of
various physical phenomena. In most cases, finding the exact analytical solution of
an FIDE is difficult or impossible. Hence, methods that efficiently produce highly
accurate numerical solutions are often sought after. This research has designed
several methods for finding approximate solutions of FIDEs. Analytical expressions
for the Genocchi polynomial operational matrices of the left-sided and right-sided
Caputo’s derivatives and for the kernel matrix have been derived. Linear
independence of the Genocchi polynomials has been proved by deriving an expression
for the Genocchi polynomial Gram determinant. A Genocchi polynomial method with
collocation has been introduced and applied to solve both linear FIDEs and systems
of linear FIDEs. The numerical results obtained by solving linear FIDEs with
Genocchi polynomials are compared with those of certain existing methods.
Analytical expressions for the Bernoulli polynomial operational matrix of the
right-sided Caputo’s fractional derivative and for the Bernoulli expansion
coefficients of a two-variable function are derived. Linear FIDEs with mixed
left-sided and right-sided Caputo’s derivatives are first considered and solved by
applying Bernoulli polynomials with the spectral-tau method. The numerical results
show that the proposed method achieves very high accuracy. The upper bounds for th
- …
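
For reference, the objects named in the abstract above have the following standard definitions. The notation (interval $[a,b]$, order $\alpha$ with $n-1<\alpha<n$, and the symbols $D$ and $G_n$) is assumed here and may differ from the notation used in the thesis itself.

```latex
% Left-sided and right-sided Caputo fractional derivatives of order \alpha,
% with n - 1 < \alpha < n and f sufficiently smooth on [a, b]:
\[
  {}^{C}D_{a+}^{\alpha} f(x)
    = \frac{1}{\Gamma(n-\alpha)} \int_{a}^{x} (x-t)^{\,n-\alpha-1} f^{(n)}(t)\,dt ,
\]
\[
  {}^{C}D_{b-}^{\alpha} f(x)
    = \frac{(-1)^{n}}{\Gamma(n-\alpha)} \int_{x}^{b} (t-x)^{\,n-\alpha-1} f^{(n)}(t)\,dt .
\]

% Genocchi polynomials G_n(x), defined through their generating function:
\[
  \frac{2t\,e^{xt}}{e^{t}+1} = \sum_{n=0}^{\infty} G_n(x)\,\frac{t^{n}}{n!},
  \qquad |t| < \pi .
\]
```

The first few Genocchi polynomials from this generating function are $G_1(x)=1$, $G_2(x)=2x-1$ and $G_3(x)=3x^2-3x$.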