    The EDAM Project: Mining Atmospheric Aerosol Datasets

    Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, is not adequate for studying particle dynamics and real-time correlations. This has led to the development of a new generation of real-time instruments that provide continuous or semi-continuous streams of data about certain aerosol properties. However, these instruments have added a significant level of complexity to atmospheric aerosol data, and dramatically increased the amounts of data to be collected, managed, and analyzed. Our ability to integrate the data from all of these new and complex instruments now lags far behind our data-collection capabilities, and severely limits our ability to understand the data and act upon it in a timely manner. In this paper, we present an overview of the EDAM project. The goal of the project, which is in its early stages, is to develop novel data mining algorithms and approaches to managing and monitoring multiple complex data streams. An important objective is data quality assurance, for which real-time data mining offers great potential. The approach that we take should also provide good techniques to deal with gas-phase and semi-volatile data. While atmospheric aerosol analysis is an important and challenging domain that motivates us with real problems and serves as a concrete test of our results, our objective is to develop techniques that have broader applicability, and to explore some fundamental challenges in data mining that are not specific to any given application domain.
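    The abstract does not specify the EDAM algorithms, but the kind of real-time quality assurance it mentions can be illustrated with a toy sketch: a sliding-window z-score check over an instrument stream. The window size, threshold, and synthetic data below are all hypothetical choices, not the project's method.

```python
from collections import deque
from statistics import mean, stdev

def flag_outliers(stream, window=50, z_threshold=4.0):
    """Flag readings that deviate sharply from a sliding window of recent values.

    A toy stand-in for real-time quality checks on an instrument stream;
    the window size, threshold, and z-score rule are illustrative choices.
    """
    recent = deque(maxlen=window)
    for t, value in stream:
        if len(recent) >= 10:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > z_threshold * sigma:
                yield t, value, mu, sigma  # suspicious reading
        recent.append(value)

# Synthetic "concentration" stream with one injected spike at t = 120.
readings = [(t, 10.0 + 0.1 * (t % 7)) for t in range(200)]
readings[120] = (120, 55.0)
for t, v, mu, sigma in flag_outliers(readings):
    print(f"t={t}: {v} deviates from recent mean {mu:.2f} (sd {sigma:.2f})")
```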

    m-tables: Representing Missing Data

    Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these could vary from missing attribute values, missing a known number of tuples, or even missing an unknown number of tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete, and strong representation system under both set and bag semantics and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks a type of generalized tuples as certain or possible.
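    The notions of certain and possible answers that the paper studies can be illustrated on a toy incomplete relation. The sketch below enumerates possible worlds for a single missing attribute value; it only conveys those two notions, not the m-tables representation or its labeling scheme, and the relation, query, and domain are hypothetical.

```python
from itertools import product

# Toy incomplete relation: None marks a missing attribute value.
incomplete = [("alice", "paris"), ("bob", None)]
domain = ["paris", "rome"]  # assumed finite domain for the missing value

def possible_worlds(rel, dom):
    """Enumerate completions of the relation, one per way of filling the gaps."""
    slots = [i for i, (_, city) in enumerate(rel) if city is None]
    for fill in product(dom, repeat=len(slots)):
        world = list(rel)
        for slot, value in zip(slots, fill):
            world[slot] = (world[slot][0], value)
        yield world

def query(world):
    # Q: which cities appear in the relation?
    return {city for _, city in world}

answers = [query(w) for w in possible_worlds(incomplete, domain)]
print("certain: ", set.intersection(*answers))  # holds in every world: {'paris'}
print("possible:", set.union(*answers))         # holds in some world: {'paris', 'rome'}
```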

    A fractional number based labeling scheme for dynamic XML updating

    Recently, XML query processing based on labeling schemes has been proposed. Based on labeling schemes, the structural relationship between XML nodes can be determined quickly without accessing the XML document. However, labeling schemes have to relabel the pre-existing nodes or recalculate the label values when a new node is inserted into the XML document during the update process. In this paper, we propose a novel labeling scheme based on fractional numbers. The key feature of fractional numbers is that an infinite number of fractional numbers can be inserted between any two unequal fractional numbers. Therefore, the problem of relabeling the pre-existing nodes during XML updates can be solved if the XML nodes are labeled with fractional numbers.
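    The core idea, labeling with fractions so that any gap between two sibling labels admits a new label without touching existing ones, can be sketched as follows. The midpoint rule, the helper name, and the example labels are illustrative assumptions, not the scheme as defined in the paper.

```python
from fractions import Fraction

def label_between(left, right):
    """Return a fresh label strictly between two existing sibling labels.

    Because infinitely many fractions lie between any two unequal fractions,
    a new node can always be labeled without relabeling existing nodes.
    Taking the midpoint is one simple choice.
    """
    assert left < right
    return (left + right) / 2

# Illustrative sibling labels in document order.
a, b = Fraction(1), Fraction(2)

c = label_between(a, b)      # insert a node between a and b -> 3/2
d = label_between(a, c)      # insert again between a and c  -> 5/4
print(sorted([a, d, c, b]))  # document order preserved: 1, 5/4, 3/2, 2
```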

    Knowledge Refinement via Rule Selection

    In several different applications, including data transformation and entity resolution, rules are used to capture aspects of knowledge about the application at hand. Often, a large set of such rules is generated automatically or semi-automatically, and the challenge is to refine the encapsulated knowledge by selecting a subset of rules based on the expected operational behavior of the rules on available data. In this paper, we carry out a systematic complexity-theoretic investigation of the following rule selection problem: given a set of rules specified by Horn formulas, and a pair of an input database and an output database, find a subset of the rules that minimizes the total error, that is, the number of false positive and false negative errors arising from the selected rules. We first establish computational hardness results for the decision problems underlying this minimization problem, as well as upper and lower bounds for its approximability. We then investigate a bi-objective optimization version of the rule selection problem in which both the total error and the size of the selected rules are taken into account. We show that testing for membership in the Pareto front of this bi-objective optimization problem is DP-complete. Finally, we show that a similar DP-completeness result holds for a bi-level optimization version of the rule selection problem, where one first minimizes the total error and then the size.
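    The objective being minimized can be made concrete with a brute-force toy: given candidate rules and an input/output pair, pick the subset of rules whose derived facts have the fewest false positives plus false negatives. The rules and data below are hypothetical, and the exhaustive search only illustrates the problem statement; the paper's contribution concerns the complexity of this problem, not this naive solver.

```python
from itertools import combinations

# Hypothetical candidate rules: each maps input facts to derived output facts.
rules = {
    "lowercase": lambda facts: {f.lower() for f in facts},
    "uppercase": lambda facts: {f.upper() for f in facts},
    "identity":  lambda facts: set(facts),
}

input_facts = {"ADA", "alan", "Grace"}
expected_output = {"ada", "alan", "grace"}

def total_error(selected):
    """False positives plus false negatives of the facts derived by `selected`."""
    derived = set()
    for name in selected:
        derived |= rules[name](input_facts)
    return len(derived - expected_output) + len(expected_output - derived)

# Exhaustive search over all subsets (exponential, hence only a toy).
best = min(
    (subset for r in range(len(rules) + 1) for subset in combinations(rules, r)),
    key=total_error,
)
print(best, total_error(best))  # ('lowercase',) 0
```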

    Optimizing Spatial Databases

    This paper describes the best way to improve the optimization of spatial databases: through spatial indexes. The most common and widely used spatial indexes are the R-tree and the Quadtree; both are presented, analyzed, and compared in this paper. A few examples are also given of queries that run in Oracle Spatial and are supported by an R-tree spatial index. Spatial databases offer special features that can be very helpful for representing such data, but in terms of storage and time costs, spatial data can require a lot of resources. This is why optimizing the database is one of the most important aspects when working with large volumes of data.
    Keywords: Spatial Database, Spatial Index, R-tree, Quadtree, Optimization
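    The benefit of a spatial index such as an R-tree or Quadtree is that queries prune whole regions using bounding boxes instead of scanning every geometry. The point quadtree below is a much-simplified, hypothetical sketch of that pruning idea; it is not Oracle Spatial's R-tree, and the capacity, bounds, and query window are illustrative.

```python
import random

class QuadTree:
    """A minimal point quadtree over the unit square (illustrative only)."""

    CAPACITY = 4  # max points stored in a node before it splits

    def __init__(self, x0=0.0, y0=0.0, x1=1.0, y1=1.0):
        self.bounds = (x0, y0, x1, y1)
        self.points = []
        self.children = None

    def insert(self, p):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] <= x1 and y0 <= p[1] <= y1):
            return False
        if self.children is None:
            if len(self.points) < self.CAPACITY:
                self.points.append(p)
                return True
            self._split()
        return any(child.insert(p) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my), QuadTree(mx, y0, x1, my),
                         QuadTree(x0, my, mx, y1), QuadTree(mx, my, x1, y1)]
        for q in self.points:                     # push stored points down
            any(child.insert(q) for child in self.children)
        self.points = []

    def query(self, rect):
        """Return points inside rect; subtrees whose bounds miss rect are pruned."""
        rx0, ry0, rx1, ry1 = rect
        x0, y0, x1, y1 = self.bounds
        if rx1 < x0 or x1 < rx0 or ry1 < y0 or y1 < ry0:
            return []                             # no overlap: skip this subtree
        hits = [p for p in self.points
                if rx0 <= p[0] <= rx1 and ry0 <= p[1] <= ry1]
        for child in self.children or []:
            hits.extend(child.query(rect))
        return hits

random.seed(0)
tree = QuadTree()
for _ in range(200):
    tree.insert((random.random(), random.random()))
print(len(tree.query((0.25, 0.25, 0.75, 0.75))))  # points in the centre window
```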