The Digital Puglia Project: An Active Digital Library of Remote Sensing Data

Author: Aloisio Giovanni
Cafaro Massimo
Williams Roy
Publication venue: California Institute of Technology Library
Publication date: 01/01/1999

The growing need for software infrastructure able to create, maintain and ease the evolution of scientific data promotes the development of digital libraries that provide users with fast and reliable access to data. In a rapidly changing world, the standard view of a digital library as a data repository specialized to a community of users and provided with some search tools is no longer tenable. To be effective, a digital library should be an active digital library, meaning that users can process available data not just to retrieve a particular piece of information, but to infer new knowledge about the data at hand. Digital Puglia is a new project, conceived to emphasize not only retrieval of data to the client's workstation, but also customized processing of the data. Such processing tasks may include data mining, filtering and knowledge discovery in huge databases, compute-intensive image processing (such as principal component analysis, supervised classification, or pattern matching) and on-demand computing sessions. We describe the issues, the requirements and the underlying technologies of the Digital Puglia Project, whose final goal is to build a high-performance, distributed and active digital library of remote sensing data.

CiteSeerX

Caltech Authors

Archivio Istituzionale della Ricerca- Università del Salento

Efficient Incremental Breadth-Depth XML Event Mining

Author: Boussaïd Omar
Darmont Jérôme
Salem Rashed
Publication date: 01/01/2011

Many applications continuously log a large number of events. Extracting interesting knowledge from logged events is an emerging active research area in data mining. In this context, we propose an approach for mining frequent events and association rules from logged events in XML format. This approach is composed of two main phases: (I) constructing a novel tree structure called Frequency XML-based Tree (FXT), which contains the frequency of events to be mined; (II) querying the constructed FXT using XQuery to discover frequent itemsets and association rules. The FXT is constructed in a single pass over the logged data. We implement the proposed algorithm and study various performance issues. The performance study shows that the algorithm is efficient for both constructing the FXT and discovering association rules.
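The two phases described in this abstract (count frequencies in a single pass, then query the counts for frequent itemsets and rules) can be illustrated with a minimal Python sketch. It is not the authors' FXT or their XQuery queries: a flat counter stands in for the tree, and the log schema (`log`, `session`, `event`) and the threshold values are assumptions made only for this example.

```python
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical XML log; the element names are assumptions, not the paper's schema.
LOG = """<log>
  <session><event>login</event><event>search</event><event>buy</event></session>
  <session><event>login</event><event>search</event></session>
  <session><event>login</event><event>buy</event></session>
  <session><event>search</event></session>
</log>"""

def count_itemsets(xml_text, max_size=2):
    """Phase 1: one pass over the sessions, counting event sets up to max_size."""
    counts = Counter()
    sessions = 0
    for session in ET.fromstring(xml_text).iter("session"):
        sessions += 1
        # Deduplicate and sort so each itemset has one canonical tuple form.
        events = sorted({e.text for e in session.iter("event")})
        for size in range(1, max_size + 1):
            for itemset in combinations(events, size):
                counts[itemset] += 1
    return counts, sessions

def association_rules(counts, sessions, min_support=0.5, min_confidence=0.6):
    """Phase 2: derive rules (antecedent, consequent, support, confidence) from pairs."""
    rules = []
    for itemset, n in counts.items():
        if len(itemset) != 2 or n / sessions < min_support:
            continue
        # Try the pair in both directions: a -> b and b -> a.
        for a, b in (itemset, itemset[::-1]):
            confidence = n / counts[(a,)]
            if confidence >= min_confidence:
                rules.append((a, b, n / sessions, confidence))
    return sorted(rules)

if __name__ == "__main__":
    counts, sessions = count_itemsets(LOG)
    for a, b, support, confidence in association_rules(counts, sessions):
        print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

Counting all itemsets up to a fixed size keeps the pass over the log single, at the cost of memory that grows with the number of distinct events per session; a tree structure such as the FXT is one way to organize those counts more compactly.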

arXiv.org e-Print Archive

Crossref

HAL

Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data

Author: Jiang Xiaorui
Zeng Qiang
Zhuge Hai
Publication date: 16/04/2012

As data are increasingly modeled as graphs for expressing complex relationships, the tree pattern query on graph-structured data becomes an important type of query in real-world applications. Most practical query languages, such as XQuery and SPARQL, support logical expressions using logical-AND/OR/NOT operators to define structural constraints of tree patterns. In this paper, (1) we propose generalized tree pattern queries (GTPQs) over graph-structured data, which fully support propositional logic of structural constraints. (2) We make a thorough study of fundamental problems including satisfiability, containment and minimization, and analyze the computational complexity and the decision procedures of these problems. (3) We propose a compact graph representation of intermediate results and a pruning approach to reduce the size of intermediate results and the number of join operations, two factors that often impair the efficiency of traditional algorithms for evaluating tree pattern queries. (4) We present an efficient algorithm for evaluating GTPQs using 3-hop as the underlying reachability index. (5) Experiments on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our algorithm, which is from several times to orders of magnitude faster than state-of-the-art algorithms in terms of evaluation time, even for traditional tree pattern queries with only conjunctive operations.
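To make the idea of AND/OR/NOT structural constraints concrete, here is a naive Python sketch of evaluating such a pattern over a labeled directed graph. It is not the GTPQ algorithm from the abstract: reachability is computed by plain depth-first search instead of a 3-hop index, no pruning is done, and the tuple encoding of patterns and formulas is an assumption made for this example.

```python
def reachable(graph, start):
    """All nodes reachable from start via one or more edges (plain DFS, no index)."""
    seen, stack = set(), list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

# Hypothetical encoding, chosen for this sketch only:
#   pattern = (label, formula or None)
#   formula = ("has", pattern) | ("not", f) | ("and", f, ...) | ("or", f, ...)
# ("has", p) means "some descendant of the current node matches p".

def matches(graph, labels, node, pattern):
    """True if node carries the pattern's label and satisfies its formula."""
    label, formula = pattern
    if labels[node] != label:
        return False
    return formula is None or holds(graph, labels, node, formula)

def holds(graph, labels, node, formula):
    """Evaluate a propositional formula over descendant-existence tests."""
    op = formula[0]
    if op == "has":
        return any(matches(graph, labels, d, formula[1]) for d in reachable(graph, node))
    if op == "not":
        return not holds(graph, labels, node, formula[1])
    if op == "and":
        return all(holds(graph, labels, node, f) for f in formula[1:])
    if op == "or":
        return any(holds(graph, labels, node, f) for f in formula[1:])
    raise ValueError(f"unknown operator: {op}")

def evaluate(graph, labels, pattern):
    """Return the sorted list of graph nodes that match the pattern root."""
    return [n for n in sorted(labels) if matches(graph, labels, n, pattern)]

if __name__ == "__main__":
    graph = {"p1": ["a1", "r1"], "p2": ["a2"], "p3": ["r3"]}
    labels = {"p1": "paper", "p2": "paper", "p3": "paper",
              "a1": "author", "a2": "author", "r1": "review", "r3": "review"}
    # Papers that reach an author but no review.
    query = ("paper", ("and", ("has", ("author", None)),
                       ("not", ("has", ("review", None)))))
    print(evaluate(graph, labels, query))
```

Each `has` test recomputes reachability from scratch, which is exactly the kind of repeated work that a reachability index and a compact representation of intermediate results are meant to avoid.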

arXiv.org e-Print Archive

CiteSeerX

Coventry University Pure Portal

Modeling, Simulation and Emulation of Intelligent Domotic Environments

Author: Alonso
Barnett
Bonino
Booch
Carroll
Chung
Conte
Cook
Dario Bonino
Drusinsky
Fulvio Corno
Harel
HGI
Jimenez
Manesis
O. M. G. OMG
Saito
Sycara
Winer
Publication venue: Elsevier
Publication date: 01/01/2011

Intelligent Domotic Environments are a promising approach, based on semantic models and commercial off-the-shelf domotic technologies, to realize new intelligent buildings, but their complexity requires innovative design methodologies and tools for ensuring correctness. Suitable simulation and emulation approaches and tools must be adopted to allow designers to experiment with their ideas and to incrementally verify designed policies in a scenario where the environment is partly emulated and partly composed of real devices. This paper describes a framework that exploits UML 2.0 state diagrams for automatic generation of device simulators from ontology-based descriptions of domotic environments. The DogSim simulator may simulate a complete building automation system in software, or may be integrated in the Dog Gateway, allowing partial simulation of virtual devices alongside real devices. Experiments on a real home show that the approach is feasible and can easily address both simulation and emulation requirements.
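The general idea of deriving a device simulator from a declarative state description can be sketched in a few lines of Python. This is not DogSim or the Dog Gateway API: the `DeviceSimulator` class, the dictionary format of the description and the `LAMP` example are all hypothetical stand-ins for the ontology-based descriptions and UML state diagrams mentioned in the abstract.

```python
class DeviceSimulator:
    """Runs a device as a finite state machine built from a declarative description."""

    def __init__(self, description):
        self.name = description["name"]
        self.state = description["initial"]
        # Lookup table: (current_state, command) -> next_state
        self.transitions = {
            (t["from"], t["command"]): t["to"] for t in description["transitions"]
        }
        self.history = [self.state]

    def send(self, command):
        """Apply a command; commands with no transition from the current state are ignored."""
        next_state = self.transitions.get((self.state, command))
        if next_state is not None:
            self.state = next_state
            self.history.append(next_state)
        return self.state

# Hypothetical device description (format assumed for this sketch).
LAMP = {
    "name": "lamp",
    "initial": "off",
    "transitions": [
        {"from": "off", "command": "switch_on", "to": "on"},
        {"from": "on", "command": "switch_off", "to": "off"},
    ],
}

if __name__ == "__main__":
    lamp = DeviceSimulator(LAMP)
    for command in ["switch_on", "switch_on", "switch_off"]:
        print(command, "->", lamp.send(command))
```

Because the simulator is driven entirely by the description, a virtual device and a real one can in principle sit behind the same command interface, which is what makes mixing simulated and real devices plausible.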

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

Author: C Luo
DM Blei
J Leskovec
J-M Fourneau
LA Barroso
T Rabl
Z Jia
Publication date: 26/02/2014

Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different types (Variety) under controllable generation rates (Velocity) while keeping the important characteristics of raw data (Veracity). This gives rise to various new challenges in designing generators efficiently and successfully. To date, most existing techniques can only generate limited types of data and support specific big data systems such as Hadoop. Hence we develop a tool, called Big Data Generator Suite (BDGS), to efficiently generate scalable big data while employing data models derived from real data to preserve data veracity. The effectiveness of BDGS is demonstrated by developing six data generators covering three representative data types (structured, semi-structured and unstructured) and three data sources (text, graph, and table data).
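The principle of fitting a model to real data and then sampling from it at any volume can be illustrated with a toy Python text generator. This is not BDGS and does not use its data models: a unigram word-frequency model is substituted purely for brevity, and the function names and the tiny corpus are assumptions made for this sketch.

```python
import random
from collections import Counter

def fit_unigram_model(corpus):
    """Derive a word-frequency model from real documents (the 'veracity' step)."""
    counts = Counter(word for doc in corpus for word in doc.split())
    words = sorted(counts)
    return words, [counts[w] for w in words]

def generate_documents(model, n_docs, doc_length, seed=0):
    """Sample any number of synthetic documents from the model (the 'volume' step)."""
    words, weights = model
    rng = random.Random(seed)  # seeded so that a benchmark run is repeatable
    return [
        " ".join(rng.choices(words, weights=weights, k=doc_length))
        for _ in range(n_docs)
    ]

if __name__ == "__main__":
    # Hypothetical seed corpus standing in for a real data set.
    real_corpus = [
        "big data benchmarks need big data",
        "data generators scale data volume",
    ]
    model = fit_unigram_model(real_corpus)
    for doc in generate_documents(model, n_docs=3, doc_length=6):
        print(doc)
```

The output size is controlled only by `n_docs` and `doc_length`, independent of the size of the seed corpus, while the relative word frequencies follow the real data. A unigram model ignores word order and topics, so it preserves far fewer characteristics than a richer model would.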

arXiv.org e-Print Archive

Crossref

Synthetic Data Generation for the Internet of Things

Author: Anderson Jason
Apon Amy
Kennedy K. E.
Luckow Andre
Ngo Linh B.
Publication venue: Clemson University Libraries
Publication date: 01/10/2014

The concept of the Internet of Things (IoT) is rapidly moving from a vision to being pervasive in our everyday lives. This can be observed in the integration of connected sensors from a multitude of devices such as mobile phones, healthcare equipment, and vehicles. There is a need for the development of infrastructure support and analytical tools to handle IoT data, which are naturally big and complex. However, research on IoT data can be constrained by concerns about the release of privately owned data. In this paper, we present the design and implementation results of a synthetic IoT data generation framework. The framework enables research on synthetic data that exhibit the complex characteristics of original data without compromising proprietary information and personal privacy.

Crossref

Clemson University: TigerPrints