ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is
about detecting duplicate data representations for the same external entities,
and merging them into single representations. Relatively recently, declarative
rules called "matching dependencies" (MDs) have been proposed for specifying
similarity conditions under which attribute values in database records are
merged. In this work we show the process and the benefits of integrating four
components of ER: (a) Building a classifier for duplicate/non-duplicate record
pairs using machine learning (ML) techniques; (b) Use of MDs for
supporting the blocking phase of ML; (c) Record merging on the basis of the
classifier results; and (d) The use of the declarative language "LogiQL" (an
extended form of Datalog supported by the "LogicBlox" platform) for all
activities related to data processing, and the specification and enforcement of
MDs.
Comment: Final journal version, with some minor technical corrections.
Extended version of arXiv:1508.0601
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is
about detecting duplicate data representations for the same external entities,
and merging them into single representations. Relatively recently, declarative
rules called matching dependencies (MDs) have been proposed for specifying
similarity conditions under which attribute values in database records are
merged. In this work we show the process and the benefits of integrating three
components of ER: (a) Classifiers for duplicate/non-duplicate record pairs
built using machine learning (ML) techniques; (b) MDs for supporting both the
blocking phase of ML and the merge itself; and (c) The use of the declarative
language LogiQL (an extended form of Datalog supported by the LogicBlox
platform) for data processing, and the specification and enforcement of MDs.
Comment: To appear in Proc. SUM, 201
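The blocking, classification, and merging stages described in the abstract can be illustrated with a minimal Python sketch. It is not the ERBlox implementation, which is written in LogiQL: here a fixed string-similarity threshold stands in for the trained ML classifier, a name-prefix key stands in for MD-based blocking, and "keep the longest value" stands in for the MD matching functions. The record fields, the `block_key`, `is_duplicate`, and `resolve` helpers, and the 0.85 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking rule: records are only compared if their
    # names share the same first three lowercase characters.
    return record["name"].lower()[:3]


def is_duplicate(r1, r2, threshold=0.85):
    # Stand-in for the ML classifier: a plain similarity threshold on names.
    ratio = SequenceMatcher(None, r1["name"].lower(), r2["name"].lower()).ratio()
    return ratio >= threshold


def resolve(records):
    # 1. Blocking: group records so that only pairs within a block are compared.
    blocks = defaultdict(list)
    for r in records:
        blocks[block_key(r)].append(r)

    # 2. Classification: link duplicate pairs with a small union-find structure.
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if is_duplicate(r1, r2):
                parent[find(r2["id"])] = find(r1["id"])

    # 3. Merging: collapse each cluster into one record, keeping the longest
    #    value of each attribute (an assumed merge rule, not the paper's).
    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r)

    return [
        {
            "ids": sorted(r["id"] for r in group),
            "name": max((r["name"] for r in group), key=len),
            "affiliation": max((r["affiliation"] for r in group), key=len),
        }
        for group in clusters.values()
    ]


# Made-up sample data for illustration only.
RECORDS = [
    {"id": 1, "name": "Jane Smith", "affiliation": "Example Univ."},
    {"id": 2, "name": "Jane Smth", "affiliation": "Example University"},
    {"id": 3, "name": "John Doe", "affiliation": "Sample Institute"},
]

for merged in resolve(RECORDS):
    print(merged)
```

Blocking keeps the number of pairwise comparisons small, at the cost of missing duplicates whose blocking keys differ; that trade-off is one reason a richer, rule-based blocking step can be attractive.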
A web-based teaching/learning environment to support collaborative knowledge construction in design
A web-based application has been developed as part of a recently completed research project that proposed a conceptual framework to collect, analyze and compare different design experiences and to construct structured representations of the emerging knowledge in digital architectural design. The paper introduces the theoretical and practical development of this application as a teaching/learning environment, which has significantly contributed to developing and testing the ideas pursued throughout the research. Later in the paper, the use of BLIP in two experimental (design) workshops is reported and evaluated according to the extent to which the application facilitates generation, modification and utilization of design knowledge.
An approach to modeling database activity
Results in the field of data modeling currently suffer from many of the same ills that plagued data management systems in the late 1960s. Advanced semantic modeling systems such as the Semantic Data Model and the Relational Model/Tasmania are extremely complex to understand as well as somewhat ad hoc in design. Such systems capture only static snapshots of activity in the world being modeled. On the other hand, behavioral models that do attempt to model system dynamics typically provide less overall modeling power than comprehensive semantic models. Further, the specifications of behavior that can be expressed with such models are themselves static snapshots that are not integrated with other database objects.
This work describes one approach for capturing dynamic relationships by distilling the concepts found in semantic and behavioral data models into a small number of flexible constructs. The resulting Prototype Activity Modeling System (PAMS) captures the containment, feedback, operational, and state dependency roles of entities in the world being modeled. Further, these definitions of database activity are captured as database objects (rather than as a schema) so as to allow dynamic manipulation of entity roles.
The key concept of the approach is the bundle, a purposefully designed extension of time-proven relational database modeling concepts that includes support for presentation ordering and complex Cartesian aggregations. By applying the basic nested bundle principle, it is possible to obtain complex hierarchies of static structural information. The static templates so constructed, when used with a non-procedural query language and the value nomination principle, which reduces relations to scalar values when necessary, provide a conventional database modeling system for applications.
By extending these templates with the non-procedural thunk principle, which embeds query specifications within object definitions, variations caused by dependencies within the application can cause the apparent contents of the database description to change. When further extended by the activity monitoring principle, which records the interaction between the application and its environment, these dynamic templates can account for changes outside the scope of the application.
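As a loose illustration of the thunk idea described above (a query specification embedded in an object definition, so that the object's apparent contents follow the data), consider the following Python sketch. It does not reproduce PAMS: the `Template` class, its `view` method, and the sample `db` dictionary are hypothetical names chosen for this example, and a Python callable stands in for a non-procedural query.

```python
class Template:
    """A static template whose fields may be plain values or thunks.

    A thunk here is any callable taking the current database state; it is
    re-evaluated on every view, so the template's apparent contents change
    whenever the underlying data changes.
    """

    def __init__(self, **fields):
        self._fields = fields

    def view(self, db):
        # Evaluate thunks against the current state; pass plain values through.
        return {
            name: (value(db) if callable(value) else value)
            for name, value in self._fields.items()
        }


# Made-up database state for illustration only.
db = {"orders": [{"status": "open"}, {"status": "closed"}]}

summary = Template(
    name="order_summary",
    # Embedded query: count the orders that are currently open.
    open_count=lambda state: sum(1 for o in state["orders"] if o["status"] == "open"),
)

before = summary.view(db)
db["orders"].append({"status": "open"})
after = summary.view(db)
print(before, after)
```

The template definition itself never changes between the two views; only the data it is evaluated against does, which is the sense in which the description becomes dynamic.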