Search CORE

10,307 research outputs found

Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management

Author: Bendre Mangesh
Chang Kevin
Parameswaran Aditya
Venkataraman Vipul
Zhou Xinyan
Publication venue
Publication date: 05/10/2017
Field of study

Spreadsheet software is the tool of choice for interactive ad-hoc data management, with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. On the other hand, database systems, while highly scalable, do not support interactivity as a first-class primitive. We are developing DataSpread, to holistically integrate spreadsheets as a front-end interface with databases as a back-end datastore, providing scalability to spreadsheets, and interactivity to databases, an integration we term presentational data management (PDM). In this paper, we make a first step towards this vision: developing a storage engine for PDM, studying how to flexibly represent spreadsheet data within a database and how to support and maintain access by position. We first conduct an extensive survey of spreadsheet use to motivate our functional requirements for a storage engine for PDM. We develop a natural set of mechanisms for flexibly representing spreadsheet data and demonstrate that identifying the optimal representation is NP-Hard; however, we develop an efficient approach to identify the optimal representation from an important and intuitive subclass of representations. We extend our mechanisms with positional access mechanisms that don't suffer from cascading update issues, leading to constant time access and modification performance. We evaluate these representations on a workload of typical spreadsheets and spreadsheet operations, providing up to 20% reduction in storage, and up to 50% reduction in formula evaluation time

arXiv.org e-Print Archive

Crossref

Preprocessing and Feature Selection on Group Structure Analysis using Entropy and Thresholding

Author: Ms. Prajakta Deshmukh, Prof. Jayant Adhikari
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/06/2017
Field of study

Many real data increase dynamically in size. We have been observing in many fields that data grow with time in size. This has led to the development of several new analytic techniques. This phenomenon occurs in several fields including economics, population studies, and medical research. As an effective and efficient mechanism to deal with such data, incremental technique has been proposed in the literature and attracted much attention, which stimulates the result in this paper. When a group of objects are added to a decision table, we first introduce incremental mechanisms for three representative information entropies and then develop a group incremental rough feature selection algorithm based on information entropy.When multiple objects are added to a decision table, the algorithm aims to find the new feature subset in a much shorter time. Experiments have been carried out on eight UCI data sets and the experimental results show that the algorithm is effective and efficient

International Journal on Recent and Innovation Trends in Computing and Communication

Incremental Predictive Process Monitoring: How to Deal with the Variability of Real Environments

Author: Di Francescomarino Chiara
Ghidini Chiara
Maggi Fabrizio Maria
Persia Cosimo Damiano
Rizzi Williams
Publication venue
Publication date: 11/04/2018
Field of study

A characteristic of existing predictive process monitoring techniques is to first construct a predictive model based on past process executions, and then use it to predict the future of new ongoing cases, without the possibility of updating it with new cases when they complete their execution. This can make predictive process monitoring too rigid to deal with the variability of processes working in real environments that continuously evolve and/or exhibit new variant behaviors over time. As a solution to this problem, we propose the use of algorithms that allow the incremental construction of the predictive model. These incremental learning algorithms update the model whenever new cases become available so that the predictive model evolves over time to fit the current circumstances. The algorithms have been implemented using different case encoding strategies and evaluated on a number of real and synthetic datasets. The results provide a first evidence of the potential of incremental learning strategies for predicting process monitoring in real environments, and of the impact of different case encoding strategies in this setting

arXiv.org e-Print Archive

Metric based attribute reduction in dynamic desicion tables

Author: Demetrovics János
Giang Nguyen Long
Thi Vu Duc
Publication venue: 'Gazi Egitim Faukeltesi Dergisi'
Publication date: 01/01/2014
Field of study

SZTAKI Publication Repository

Active Sample Selection Based Incremental Algorithm for Attribute Reduction with Rough Sets

Author: Chen Degang
Wang Hui
Yang Yanyan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2017
Field of study

Attribute reduction with rough sets is an effective technique for obtaining a compact and informative attribute set from a given dataset. However, traditional algorithms have no explicit provision for handling dynamic datasets where data present themselves in successive samples. Incremental algorithms for attribute reduction with rough sets have been recently introduced to handle dynamic datasets with large samples, though they have high complexity in time and space. To address the time/space complexity issue of the algorithms, this paper presents a novel incremental algorithm for attribute reduction with rough sets based on the adoption of an active sample selection process and an insight into the attribute reduction process. This algorithm first decides whether each incoming sample is useful with respect to the current dataset by the active sample selection process. A useless sample is discarded while a useful sample is selected to update a reduct. At the arrival of a useful sample, the attribute reduction process is then employed to guide how to add and/or delete attributes in the current reduct. The two processes thus constitute the theoretical framework of our algorithm. The proposed algorithm is finally experimentally shown to be efficient in time and space

Digital Repository @ Iowa State University (ISU)

Queen's University Belfast Research Portal

Ulster University's Research Portal

Improving Data Quality: Consistency and Accuracy

Author: Cong Gao
Fan Wenfei
Geerts Floris
Jia Xibei
Ma Shuai
Publication venue
Publication date: 01/01/2007
Field of study

Edinburgh Research Explorer

Institutional Repository Universiteit Antwerpen