
    Message from the ICDE 2015 Program Committee and general chairs

    Since its inception in 1984, the IEEE International Conference on Data Engineering (ICDE) has become a premier forum for the exchange and dissemination of data management research results among researchers, users, practitioners, and developers. Continuing this long-standing tradition, the 31st ICDE will be hosted this year in Seoul, South Korea, from April 13 to April 17, 2015. It is our great pleasure to welcome you to ICDE 2015 and to present its proceedings to you.

    Scalable and interpretable product recommendations via overlapping co-clustering

    We consider the problem of generating interpretable recommendations by identifying overlapping co-clusters of clients and products, based only on positive or implicit feedback. Our approach is applicable to very large datasets because it exhibits almost linear complexity in the input examples and the number of co-clusters. We show, both on real industrial data and on publicly available datasets, that the recommendation accuracy of our algorithm is competitive with that of state-of-the-art matrix factorization techniques. In addition, our technique has the advantage of offering recommendations that are textually and visually interpretable. Finally, we examine how to implement our technique efficiently on Graphical Processing Units (GPUs).
    Comment: In IEEE International Conference on Data Engineering (ICDE) 201
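
    A minimal sketch (not the paper's algorithm) of how recommendations can be read off overlapping co-clusters, assuming the co-clusters are already given as pairs of user and product sets: each user is recommended the products of the co-clusters they belong to, minus what they have already consumed, and each recommendation is explained by the co-clusters that support it. All names below are illustrative.

        # Sketch: turning overlapping co-clusters into interpretable recommendations.
        # Assumes co-clusters were computed elsewhere; data shapes are assumptions.

        def recommend(co_clusters, consumed, user):
            """co_clusters: list of (set_of_users, set_of_products);
            consumed: dict user -> set of products with positive/implicit feedback."""
            recs = {}  # product -> list of co-cluster indices that explain it
            for idx, (users, products) in enumerate(co_clusters):
                if user not in users:
                    continue
                for p in products - consumed.get(user, set()):
                    recs.setdefault(p, []).append(idx)
            # Rank by how many co-clusters support each product.
            return sorted(recs.items(), key=lambda kv: len(kv[1]), reverse=True)

        co_clusters = [({"u1", "u2"}, {"p1", "p2"}), ({"u1", "u3"}, {"p2", "p3"})]
        consumed = {"u1": {"p1"}}
        print(recommend(co_clusters, consumed, "u1"))  # p2 (2 clusters), p3 (1 cluster)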

    Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework

    Even though many machine-based algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperation (HUMO) framework for entity resolution (ER), which divides an ER workload between the machine and the human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, minimizing human cost given a quality requirement, and then present three optimization approaches: a conservative baseline purely based on the monotonicity assumption of precision, a more aggressive one based on sampling, and a hybrid one that takes advantage of the strengths of both. Finally, we demonstrate through extensive experiments on real and synthetic datasets that HUMO achieves high-quality results with a reasonable return on investment (ROI) in terms of human cost, and that it performs considerably better than state-of-the-art alternatives in quality control.
    Comment: 12 pages, 11 figures. Camera-ready version of the paper submitted to ICDE 2018, in Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE 2018).
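
    The abstract does not spell out the partitioning rule, but a minimal sketch of the general human/machine division it describes might look as follows: pairs whose machine-estimated match probability is above an upper threshold are resolved by the machine as matches, pairs below a lower threshold as non-matches, and only the uncertain middle band is routed to human review. The thresholds and field names here are illustrative placeholders, not HUMO's actual optimization.

        # Sketch of a human/machine split for entity resolution, assuming each
        # candidate pair already carries a machine-estimated match probability.
        # HUMO instead chooses the split to meet precision/recall targets while
        # minimizing human cost; the fixed thresholds below are assumptions.

        def split_workload(pairs, accept_at=0.9, reject_at=0.1):
            """pairs: iterable of (pair_id, match_probability)."""
            machine_match, machine_non_match, human_review = [], [], []
            for pair_id, prob in pairs:
                if prob >= accept_at:
                    machine_match.append(pair_id)       # machine decides: match
                elif prob <= reject_at:
                    machine_non_match.append(pair_id)   # machine decides: non-match
                else:
                    human_review.append(pair_id)        # ambiguous -> human
            return machine_match, machine_non_match, human_review

        pairs = [("a-b", 0.97), ("a-c", 0.42), ("b-d", 0.03)]
        print(split_workload(pairs))  # (['a-b'], ['b-d'], ['a-c'])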

    Evaluating probabilistic queries over uncertain matching

    A matching between two database schemas, generated by machine learning techniques (e.g., COMA++), is often uncertain. Handling the uncertainty of schema matching has recently attracted a lot of research interest, because the quality of applications relies on the matching result. We study query evaluation over an inexact schema matching, which is represented as a set of 'possible mappings' together with the probabilities that they are correct. Since the number of possible mappings can be large, evaluating queries through these mappings can be expensive. Observing that the possible mappings between two schemas often exhibit a high degree of overlap, we develop two efficient solutions. We also present a fast algorithm to compute the answers with the k highest probabilities. An extensive evaluation on real schemas shows that our approaches improve query performance by almost an order of magnitude. © 2012 IEEE.
    The IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, D.C., 1-5 April 2012. In International Conference on Data Engineering Proceedings, 2012, p. 1096-110
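
    A minimal, brute-force sketch of the query semantics described here (not the paper's optimized algorithms): evaluate the query under every possible mapping, weight each answer by the total probability of the mappings that produce it, and return the k answers with the highest probability. The paper's contribution is precisely avoiding this mapping-by-mapping enumeration by exploiting the overlap between mappings; `evaluate` below is a placeholder.

        import heapq

        # Brute-force reference semantics: answer probability = sum of the
        # probabilities of the possible mappings under which the query returns it.
        # `evaluate(query, mapping)` stands in for running the query rewritten
        # through one concrete schema mapping.

        def top_k_answers(query, possible_mappings, evaluate, k):
            """possible_mappings: list of (mapping, probability) summing to 1."""
            answer_prob = {}
            for mapping, prob in possible_mappings:
                for answer in evaluate(query, mapping):
                    answer_prob[answer] = answer_prob.get(answer, 0.0) + prob
            return heapq.nlargest(k, answer_prob.items(), key=lambda kv: kv[1])

        # Toy example with hard-coded per-mapping query results.
        mappings = [("m1", 0.6), ("m2", 0.3), ("m3", 0.1)]
        results = {"m1": ["alice", "bob"], "m2": ["alice"], "m3": ["carol"]}
        print(top_k_answers(None, mappings, lambda q, m: results[m], k=2))
        # [('alice', 0.9), ('bob', 0.6)]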

    Physical Representation-based Predicate Optimization for a Visual Analytics Database

    Querying the content of images, video, and other non-textual data sources requires expensive content extraction methods. Modern extraction techniques are based on deep convolutional neural networks (CNNs) and can classify objects within images with astounding accuracy. Unfortunately, these methods are slow: processing a single image can take about 10 milliseconds on modern GPU-based hardware. As massive video libraries become ubiquitous, running a content-based query over millions of video frames is prohibitive. One promising approach to reducing the runtime cost of queries over visual content is to use a hierarchical model, such as a cascade, where simple cases are handled by an inexpensive classifier. Prior work has sought to design cascades that optimize the computational cost of inference by, for example, using smaller CNNs. However, we observe that there are critical factors besides inference time that dramatically impact the overall query time. Notably, by treating the physical representation of the input image as part of our query optimization (that is, by including image transforms, such as resolution scaling or color-depth reduction, within the cascade), we can optimize data handling costs and enable drastically more efficient classifier cascades. In this paper, we propose Tahoma, which generates and evaluates many potential classifier cascades that jointly optimize the CNN architecture and input data representation. Our experiments on a subset of ImageNet show that Tahoma's input transformations speed up cascades by up to 35 times. We also find up to a 98x speedup over the ResNet50 classifier with no loss in accuracy, and a 280x speedup if some accuracy is sacrificed.
    Comment: Camera-ready version of the paper submitted to ICDE 2019, in Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE 2019).
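
    A minimal sketch of the cascade idea described above, not Tahoma itself: transform the input to a cheaper physical representation first, run an inexpensive classifier on it, and fall through to the expensive full-resolution model only when the cheap stage is not confident. The transform, models, and confidence threshold below are placeholders.

        import numpy as np

        # Sketch of a two-stage classifier cascade over transformed inputs.
        # `cheap_model` and `full_model` stand in for small and large CNNs; the
        # downscaling transform is part of the cascade, echoing the idea of
        # optimizing the physical representation of the input.

        def downscale(image, factor=4):
            """Crude resolution reduction by striding; a real system would resize properly."""
            return image[::factor, ::factor]

        def cascade_predict(image, cheap_model, full_model, confidence=0.9):
            label, score = cheap_model(downscale(image))
            if score >= confidence:
                return label             # cheap stage is confident: stop early
            return full_model(image)[0]  # fall back to the expensive model

        # Toy stand-ins: a "model" returns (label, confidence).
        cheap_model = lambda img: ("dog", 0.95 if img.mean() > 0.5 else 0.4)
        full_model = lambda img: ("cat", 0.99)
        print(cascade_predict(np.ones((224, 224)), cheap_model, full_model))   # 'dog'
        print(cascade_predict(np.zeros((224, 224)), cheap_model, full_model))  # 'cat'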

    Time-decaying Sketches for Robust Aggregation of Sensor Data

    We present a new sketch for summarizing network data. The sketch has the following properties, which make it useful for communication-efficient aggregation in distributed streaming scenarios such as sensor networks: the sketch is duplicate insensitive, i.e., reinsertions of the same data will not affect the sketch and hence the estimates of aggregates. Unlike previous duplicate-insensitive sketches for sensor data aggregation [S. Nath et al., Synopsis diffusion for robust aggregation in sensor networks, in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, 2004, pp. 250–262], [J. Considine et al., Approximate aggregation techniques for sensor databases, in Proceedings of the 20th International Conference on Data Engineering (ICDE), 2004, pp. 449–460], it is also time decaying, so that the weight of a data item in the sketch can decrease with time according to a user-specified decay function. The sketch can give provably approximate guarantees for various aggregates of data, including the sum, median, quantiles, and frequent elements. The size of the sketch and the time taken to update it are both polylogarithmic in the size of the relevant data. Further, multiple sketches computed over distributed data can be combined without loss of accuracy. To our knowledge, this is the first sketch that combines all of the above properties.
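
    The paper's sketch is more involved than an abstract can show, but two of the properties it claims, duplicate insensitivity and lossless merging, can be illustrated with a much simpler Flajolet-Martin style distinct-count bitmap: insertion is idempotent, so reinsertions change nothing, and two sketches combine by bitwise OR. Time decay and the other aggregates (sum, median, quantiles, frequent elements) are what the paper adds on top and are not captured by this toy.

        import hashlib

        # Toy Flajolet-Martin bitmap: duplicate-insensitive and mergeable.
        # It only estimates the number of distinct items, as an illustration of
        # those two properties; it is not the paper's time-decaying sketch.

        class FMSketch:
            def __init__(self):
                self.bitmap = 0

            def insert(self, item):
                h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
                r = (h & -h).bit_length() - 1      # index of lowest set bit of the hash
                self.bitmap |= 1 << r              # idempotent: duplicates are no-ops

            def merge(self, other):
                self.bitmap |= other.bitmap        # lossless combination of sketches

            def estimate(self):
                r = 0
                while self.bitmap >> r & 1:        # first zero bit in the bitmap
                    r += 1
                return 2 ** r / 0.77351            # standard FM correction factor

        s1, s2 = FMSketch(), FMSketch()
        for x in ["a", "b", "a", "a"]:
            s1.insert(x)                           # reinsertions of "a" change nothing
        s2.insert("c")
        s1.merge(s2)
        print(round(s1.estimate(), 1))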