12 research outputs found

Meaningful Labeling of Integrated Query Interfaces

Author: Clement Yu
Eduard C. Dragut
Weiyi Meng
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2006

The contents of Web databases are accessed through queries formulated on complex user interfaces. In many domains of interest (e.g., Auto), users are interested in obtaining information from alternative sources. Thus, they have to access many individual Web databases via query interfaces. We aim to automatically construct a well-designed query interface that integrates a set of interfaces in the same domain. This will permit users to access information uniformly from multiple sources. Earlier research in this area includes matching attributes across multiple query interfaces in the same domain and grouping related attributes. In this paper, we investigate the naming of the attributes in the integrated query interface. We provide a set of properties that are required in order to have consistent labels for the attributes within an integrated interface, so that users have no difficulty in understanding it. Based on these properties, we design algorithms to systematically label the attributes. Experimental results on seven domains validate our theoretical study. In the process of naming attributes, a set of logical inference rules among the textual labels is discovered. These inferences are also likely to be applicable to other integration problems sensitive to naming: HTML forms, HTML tables, or concept hierarchies in the semantic Web.

CiteSeerX

Purdue E-Pubs

Meaningful Labeling of Integrated Query Interfaces

Author: Eduard C. Dragut

The contents of Web databases are accessed through queries formulated on complex user interfaces. In many domains of interest (e.g., Auto), users are interested in obtaining information from alternative sources. Thus, they have to access many individual Web databases via query interfaces. We aim to automatically construct a well-designed query interface that integrates a set of interfaces in the same domain. This will permit users to access information uniformly from multiple sources. Earlier research in this area includes matching attributes across multiple query interfaces in the same domain and grouping related attributes. In this paper, we investigate the naming of the attributes in the integrated query interface. We provide a set of properties that are required in order to have consistent labels for the attributes within an integrated interface, so that users have no difficulty in understanding it. Based on these properties, we design algorithms to systematically label the attributes. Experimental results on seven domains validate our theoretical study. In the process of naming attributes, a set of logical inference rules among the textual labels is discovered. These inferences are also likely to be applicable to other integration problems sensitive to naming, e.g., HTML forms, HTML tables, or concept hierarchies in the semantic Web.

CiteSeerX

Normalization of Duplicate Records from Multiple Sources

Author: Eduard C. Dragut
Weiyi Meng
Yongquan Dong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Query-Time Record Linkage and Fusion over Web Databases

Author: Ahmed K Elmagarmid
Eduard C Dragut
El Kindi Rezig
Mourad Ouzzani
Publication date: 02/04/2020

Data-intensive Web applications usually require integrating data from Web sources at query time. The sources may refer to the same real-world entity in different ways, and some may even provide outdated or erroneous data. An important task is to recognize and merge the records that refer to the same real-world entity at query time. Most existing duplicate detection and fusion techniques work in the offline setting and do not meet the online constraint. At least two aspects differentiate online duplicate detection and fusion from their offline counterparts. (i) The latter assume that the entire data is available, while the former cannot make such an assumption. (ii) Several query submissions may be required to compute the "ideal" representation of an entity in the online setting. This paper presents a general framework for the online setting based on an iterative record-based caching technique. A set of frequently requested records is deduplicated offline and cached for future reference. Newly arriving records in response to a query are deduplicated jointly with the records in the cache, presented to the user, and appended to the cache. Experiments with real and synthetic data show the benefit of our solution over traditional record linkage techniques applied to an online setting.
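
The caching loop described in the abstract above can be illustrated with a minimal sketch. It is not the authors' framework: the class `RecordCache`, the helpers `similar` and `fuse`, the `name` field, the string-similarity threshold, and the "longest name wins" fusion rule are all hypothetical choices made only to show the flow (seed the cache offline, deduplicate new records jointly with it, return fused results, and keep the new records in the cache).

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Assumed matcher: case-insensitive string similarity on record names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def fuse(cluster):
    """Assumed fusion rule: use the record with the longest name as representative."""
    return max(cluster, key=lambda record: len(record["name"]))


class RecordCache:
    """Hypothetical cache of record clusters, one cluster per real-world entity."""

    def __init__(self, seed_records):
        # Offline step: deduplicate the frequently requested records once.
        self.clusters = []
        for record in seed_records:
            self._add(record)

    def _add(self, record):
        # Attach the record to the first cluster holding a similar record,
        # otherwise start a new cluster for it.
        for cluster in self.clusters:
            if any(similar(record["name"], other["name"]) for other in cluster):
                cluster.append(record)
                return cluster
        cluster = [record]
        self.clusters.append(cluster)
        return cluster

    def answer(self, new_records):
        # Query-time step: deduplicate arriving records jointly with the cache,
        # remember them, and return one fused record per affected entity.
        touched = []
        for record in new_records:
            cluster = self._add(record)
            if not any(cluster is seen for seen in touched):
                touched.append(cluster)
        return [fuse(cluster) for cluster in touched]
```

Because `answer` appends every incoming record to the cache, later queries are matched against a growing set of records, which mirrors the iterative aspect mentioned in the abstract. A real system would need a stronger matcher and a bounded cache.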

CiteSeerX

Polarity Consistency Checking for Domain Independent Sentiment Dictionaries

Author: Clement Yu
Eduard C. Dragut
Hong Wang
Prasad Sistla
Weiyi Meng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref