Handling Failures in Data Quality Measures

Author: Nurul A. Emran
Publication venue: 'Global Science and Technology Forum'
Publication date: 01/01/2013

A successful data quality (DQ) measure is important for many data consumers (or data guardians) in deciding on the acceptability of the data concerned. Nevertheless, little is known about how “failures” of DQ measures can be handled by data guardians in the presence of factor(s) that contribute to the failures. This paper presents a review of failure handling mechanisms for DQ measures. The failure factors faced by existing DQ measures will be presented, together with the research gaps with respect to failure handling mechanisms in DQ frameworks. We propose ways to maximise the situations in which data quality scores can be produced when factors that would cause the failure of currently proposed scoring mechanisms are present. By understanding how failures can be handled, a systematic failure handling mechanism for robust DQ measures can be designed.

Handling Failures in Data Quality Measures

Author: Azwa Abdul Aziz
Noraswaliza Abdullah
Nurul A. Emran
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 27/08/2014

A successful data quality (DQ) measure is important for many data consumers (or data guardians) in deciding on the acceptability of the data concerned. Nevertheless, little is known about how “failures” of DQ measures can be handled by data guardians in the presence of factor(s) that contribute to the failures. This paper presents a review of failure handling mechanisms for DQ measures. The failure factors faced by existing DQ measures will be presented, together with the research gaps with respect to failure handling mechanisms in DQ frameworks. In particular, by comparing existing DQ frameworks in terms of the inputs used to measure DQ, the way DQ scores are computed, and the way DQ scores are stored, we identified failure factors inherent within the frameworks. Understanding how failures can be handled will lead to the design of a systematic failure handling mechanism for robust DQ measures.

Signed Distance-based Deep Memory Recommender

Author: Kong Xiangnan
Lee Kyumin
Liu Xinyue
Tran Thanh
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Personalized recommendation algorithms learn a user's preference for an item by measuring a distance/similarity between them. However, some of the existing recommendation models (e.g., matrix factorization) assume a linear relationship between the user and item. This approach limits the capacity of recommender systems, since the interactions between users and items in real-world applications are much more complex than a linear relationship. To overcome this limitation, in this paper, we design and propose a deep learning framework called Signed Distance-based Deep Memory Recommender, which captures non-linear relationships between users and items explicitly and implicitly, and works well in both the general recommendation task and the shopping basket-based recommendation task. Through an extensive empirical study on six real-world datasets in the two recommendation tasks, our proposed approach achieved significant improvement over ten state-of-the-art recommendation models.

arXiv.org e-Print Archive

Evaluation Measures for Relevance and Credibility in Ranked Lists

Author: Larsen Birger
Lioma Christina
Simonsen Jakob Grue
Publication date: 01/01/2017

Recent discussions on alternative facts, fake news, and post truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and more generally effectiveness) of both the relevance and the credibility of retrieved results. One obvious way of doing so is to measure relevance and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias to the evaluation); (II) many more and richer measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (hence risking further bias). Motivated by the above, we present two novel types of evaluation measures that are designed to measure the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (that we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.

arXiv.org e-Print Archive

Copenhagen University Research Information System

VBN

Completeness of Information Sources

Author: Freytag Johann-Christoph
Leser Ulf
Naumann Felix
Publication venue: Humboldt-Universität zu Berlin
Publication date: 01/01/2003

Information quality plays a crucial role in every application that integrates data from autonomous sources. However, information quality is hard to measure and complex to consider for the tasks of information integration, even if the integrating sources cooperate. We present a systematic and formal approach to the measurement of information quality and the combination of such measurements for information integration. Our approach is based on a value model that incorporates both extensional value (coverage) and intensional value (density) of information. For both aspects we provide merge functions for adequately scoring integrated results. Also, we combine the two criteria into an overall completeness criterion that formalizes the intuitive notion of completeness of query results. This completeness measure is a valuable tool to assess source size and to predict result sizes of queries in integrated information systems. We propose this measure as an important step towards the usage of information quality for source selection, query planning, query optimization, and quality feedback to users. Peer reviewed.
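
The coverage, density, and completeness notions named in the abstract above can be illustrated with a minimal sketch. The abstract does not state the formulas, so the definitions here are assumptions made for illustration: coverage is taken as the share of real-world entities present in a source, density as the share of attribute cells holding a non-null value, and completeness as their product. The function names and the sample data are hypothetical.

```python
from typing import Optional

Record = dict[str, Optional[str]]


def coverage(records: list[Record], world_size: int) -> float:
    """Extensional value (assumed): share of real-world entities the source lists."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    return len(records) / world_size


def density(records: list[Record], attributes: list[str]) -> float:
    """Intensional value (assumed): share of attribute cells with a non-null value."""
    cells = len(records) * len(attributes)
    if cells == 0:
        return 0.0
    filled = sum(1 for r in records for a in attributes if r.get(a) is not None)
    return filled / cells


def completeness(records: list[Record], attributes: list[str], world_size: int) -> float:
    """Combined score; multiplying the two criteria is an assumption of this sketch."""
    return coverage(records, world_size) * density(records, attributes)


# Hypothetical source: 2 of 4 real-world entities, with one missing value.
source = [
    {"title": "A", "year": "2001"},
    {"title": "B", "year": None},
]
attrs = ["title", "year"]
print(coverage(source, 4), density(source, attrs), completeness(source, attrs, 4))
```

A source that lists many entities but leaves most attributes empty scores low under this sketch, as does one that is fully populated but covers few entities.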

An In-depth Investigation of User Response Simulation for Conversational Search

Author: Ai Qingyao
Srikumar Vivek
Wang Zhenduo
Xu Zhichao
Publication date: 23/08/2023

Conversational search has seen increased recent attention in both the IR and NLP communities. It seeks to clarify and solve a user's search need through multi-turn natural language interactions. However, most existing systems are trained and demonstrated with recorded or artificial conversation logs. Eventually, conversational search systems should be trained, evaluated, and deployed in an open-ended setting with unseen conversation trajectories. A key challenge is that training and evaluating such systems both require a human-in-the-loop, which is expensive and does not scale. One strategy for this is to simulate users, thereby reducing the scaling costs. However, current user simulators are either limited to responding only to yes-no questions from the conversational search system, or unable to produce high quality responses in general. In this paper, we show that the current state-of-the-art user simulation system could be significantly improved by replacing it with a smaller but advanced natural language generation model. But rather than merely reporting this new state-of-the-art, we present an in-depth investigation of the task of simulating user response for conversational search. Our goal is to supplement existing works with an insightful hand-analysis of what challenges are still unsolved by the advanced model, as well as to propose our solutions for them. The challenges we identified include (1) dataset noise, (2) a blind spot that is difficult for existing models to learn, and (3) a specific type of misevaluation in the standard empirical setup. Except for the dataset noise issue, we propose solutions to cover the training blind spot and to avoid the misevaluation. Our proposed solutions lead to further improvements. Our best system improves the previous state-of-the-art significantly. Comment: 9 pages

arXiv.org e-Print Archive

From Databases to Information Systems

Author: Naumann Felix
Publication venue: Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II
Publication date: 01/11/2001

Research and business are currently moving from centralized databases towards information systems integrating distributed and autonomous data sources. Simultaneously, it is a well acknowledged fact that consideration of information quality (IQ-reasoning) is an important issue for large-scale integrated information systems. We show that IQ-reasoning can be the driving force of the current shift from databases to integrated information systems. In this paper, we explore the implications and consequences of this shift. All areas of answering user queries are affected – from user input, to query planning and query optimization, and finally to building the query result. The application of IQ-reasoning brings both challenges, such as new cost models for optimization, and opportunities, such as improved query planning. We highlight several emerging aspects and suggest solutions toward a pervasion of information quality in information systems. Peer reviewed.

From Frequency to Meaning: Vector Space Models of Semantics

Author: Pantel Patrick
Turney Peter D.
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2010

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.

arXiv.org e-Print Archive

CiteSeerX
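
As a rough illustration of the term-document class of VSMs mentioned in the abstract above, the following sketch builds a raw-count term-document matrix and compares two document columns by cosine similarity. It is not taken from the survey: the whitespace tokenisation, the use of unweighted counts, the function names, and the sample documents are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def term_document_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix) with one row per term and one column per document."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted({term for c in counts for term in c})
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix: list[list[int]], j: int) -> list[int]:
    """Extract the vector for document j."""
    return [row[j] for row in matrix]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical mini-corpus.
docs = [
    "data quality measures",
    "quality of data sources",
    "deep memory recommender",
]
vocab, matrix = term_document_matrix(docs)
sim_01 = cosine(column(matrix, 0), column(matrix, 1))  # two shared terms
sim_02 = cosine(column(matrix, 0), column(matrix, 2))  # no shared terms
print(sim_01, sim_02)
```

Documents that share terms get a positive similarity, while documents with disjoint vocabularies score zero; practical systems usually apply a weighting such as tf-idf before comparing columns.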