Search CORE

4,644 research outputs found

Designing a resource-efficient data structure for mobile data systems

Author: Gourlay Richard Scott
Publication venue
Publication date: 01/07/2006
Field of study

Designing data structures for use in mobile devices requires attention on optimising data volumes with associated benefits for data transmission, storage space and battery use. For semi-structured data, tree summarisation techniques can be used to reduce the volume of structured elements while dictionary compression can efficiently deal with value-based predicates. This project seeks to investigate and evaluate an integration of the two approaches. The key strength of this technique is that both structural and value predicates could be resolved within one graph while further allowing for compression of the resulting data structure. As the current trend is towards the requirement for working with larger semi-structured data sets this work would allow for the utilisation of much larger data sets whilst reducing requirements on bandwidth and minimising the memory necessary both for the storage and querying of the data

Automated Fixing of Programs with Contracts

Author: Furia Carlo A.
Meyer Bertrand
Nordio Martin
Pei Yu
Wei Yi
Zeller Andreas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

This paper describes AutoFix, an automatic debugging technique that can fix faults in general-purpose software. To provide high-quality fix suggestions and to enable automation of the whole debugging process, AutoFix relies on the presence of simple specification elements in the form of contracts (such as pre- and postconditions). Using contracts enhances the precision of dynamic analysis techniques for fault detection and localization, and for validating fixes. The only required user input to the AutoFix supporting tool is then a faulty program annotated with contracts; the tool produces a collection of validated fixes for the fault ranked according to an estimate of their suitability. In an extensive experimental evaluation, we applied AutoFix to over 200 faults in four code bases of different maturity and quality (of implementation and of contracts). AutoFix successfully fixed 42% of the faults, producing, in the majority of cases, corrections of quality comparable to those competent programmers would write; the used computational resources were modest, with an average time per fix below 20 minutes on commodity hardware. These figures compare favorably to the state of the art in automated program fixing, and demonstrate that the AutoFix approach is successfully applicable to reduce the debugging burden in real-world scenarios.Comment: Minor changes after proofreadin

arXiv.org e-Print Archive

Principles and Implementation of Deductive Parsing

Author: Pereira Fernando C. N.
Schabes Yves
Shieber Stuart M.
Publication venue
Publication date: 01/01/1994
Field of study

We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars.Comment: 69 pages, includes full Prolog cod

arXiv.org e-Print Archive

CiteSeerX

Management of Big Annotations in Relational Database Management Systems

Author: Ibrahim Karim
Publication venue: Digital WPI
Publication date: 24/04/2014
Field of study

Annotations play a key role in understanding and describing the data, and annotation management has become an integral component in most emerging applications such as scientific databases. Scientists need to exchange not only data but also their thoughts, comments and annotations on the data as well. Annotations represent comments, Lineage of data, description and much more. Therefore, several annotation management techniques have been proposed to efficiently and abstractly handle the annotations. However, with the increasing scale of collaboration and the extensive use of annotations among users and scientists, the number and size of the annotations may far exceed the size of the original data itself. However, current annotation management techniques donâ€™t address large scale annotation management. In this work, we propose three chapters to that tackle the Big annotations from three different perspectives (1) User-Centric Annotation Propagation, (2) Proactive Annotation Management and (3) InsightNotes Summary-Based Querying. We capture users\u27 preferences in profiles and personalizes the annotation propagation at query time by reporting the most relevant annotations (per tuple) for each user based on time plan. We provide three Time-Based plans, support static and dynamic profiles for each user. We support a proactive annotation management which suggests data tuples to be annotated in case new annotation has a reference to a data value and user doesnâ€™t annotate the data precisely. Moreover, we provide an extension on the InsightNotes: Summary-Based Annotation Management in Relational Databases by adding query language that enable the user to query the annotation summaries and add predicates on the annotation summaries themselves. Our system is implemented inside PostgreSQL

Obvious: a meta-toolkit to encapsulate information visualization toolkits. One toolkit to bind them all

Author: Baudel T.
Fekete J.
Hemery P.
Wood J.
Publication venue
Publication date: 01/01/2011
Field of study

This article describes “Obvious”: a meta-toolkit that abstracts and encapsulates information visualization toolkits implemented in the Java language. It intends to unify their use and postpone the choice of which concrete toolkit(s) to use later-on in the development of visual analytics applications. We also report on the lessons we have learned when wrapping popular toolkits with Obvious, namely Prefuse, the InfoVis Toolkit, partly Improvise, JUNG and other data management libraries. We show several examples on the uses of Obvious, how the different toolkits can be combined, for instance sharing their data models. We also show how Weka and RapidMiner, two popular machine-learning toolkits, have been wrapped with Obvious and can be used directly with all the other wrapped toolkits. We expect Obvious to start a co-evolution process: Obvious is meant to evolve when more components of Information Visualization systems will become consensual. It is also designed to help information visualization systems adhere to the best practices to provide a higher level of interoperability and leverage the domain of visual analytics

HAL-CentraleSupelec

CiteSeerX

City Research Online

INRIA a CCSD electronic archive server

HAL-Rennes 1

Efficient similarity-based operations for data integration

Author: Schallehn Eike
Publication venue: Universitätsbibliothek
Publication date: 01/01/2004
Field of study

Similarity-based operations, similarity join, similarity grouping, data integrationMagdeburg, Univ., Fak. für Informatik, Diss., 2004von Eike Schalleh

Digital University Library Saxony-Anhalt

Clustering-Based Pre-Processing Approaches To Improve Similarity Join Techniques

Author: Tan Yufen
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2010
Field of study

Research on similarity join techniques is becoming one of the growing practical areas for study, especially with the increasing E-availability of vast amounts of digital data from more and more source systems. This research is focused on pre-processing clustering-based techniques to improve existing similarity join approaches. Identifying and extracting the same real-world entities from different data sources is still a big challenge and a significant task in the digital information era. Dissimilar extracts may indeed represent the same real-world entity because of inconsistent values and naming conventions, incorrect or missing data values, or incomplete information. Therefore discovering efficient and accurate approaches to determine the similarity of data objects or values is of theoretical as well as practical significance. Semantic problems are raised even on the concept of similarity regarding its usage and foundation. Existing similarity join approaches often have a very specific view of similarity measures and pre-defined predicates that represent a narrow focus on the context of similarity for a given scenario. The predicates have been assumed to be a group of clustering [MSW 72] related attributes on the join. To identify those entities for data integration purposes requires a broader view of similarity; for instance a number of generic similarity measures are useful in a given data integration systems. This study focused on string similarity join, namely based on the Levenshtein or edit distance and Q-gram. Proposed effective and efficient pre-processing clustering-based techniques were the focus of this study to identify clustering related predicates based on either attribute value or data value that improve existing similarity join techniques in enterprise data integration scenarios

Digital Commons@Wayne State University