
    Heap Abstractions for Static Analysis

    Heap data is potentially unbounded and seemingly arbitrary. As a consequence, unlike stack and static memory, heap memory cannot be abstracted directly in terms of a fixed set of source variable names appearing in the program being analysed. This makes it an interesting topic of study, and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, their formulations and formalisms often seem dissimilar and sometimes even unrelated. Thus, the insights gained in one description of heap abstraction may not directly carry over to some other description. This survey is a result of our quest for a unifying theme in the existing descriptions of heap abstractions. In particular, our interest lies in the abstractions and not in the algorithms that construct them. In our search for a unifying theme, we view a heap abstraction as consisting of two features: a heap model to represent the heap memory and a summarization technique for bounding the heap representation. We classify the models as storeless, store-based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves the way for creating new abstractions by mix-and-match of models and summarization techniques. Comment: 49 pages, 20 figures.
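
    To make the allocation-site summarization concrete, the following sketch shows a store-based abstract heap in which every object is represented by the program point that allocated it, so the abstraction stays bounded regardless of how many objects a loop creates. The class and method names are our own illustration, not taken from the survey.

```python
# A minimal sketch of allocation-site summarization in a store-based heap
# model: every concrete object is represented by the (finite) program point
# at which it was allocated, so the abstract heap stays bounded even though
# the concrete heap is not. All names here are illustrative.

from collections import defaultdict

class AbstractHeap:
    def __init__(self):
        # abstract node = allocation-site label, e.g. "new@L3"
        self.points_to = defaultdict(set)   # variable -> {alloc sites}
        self.fields = defaultdict(set)      # (site, field) -> {alloc sites}

    def alloc(self, var, site):
        """x = new T() at `site`: x points to the summary node for `site`."""
        self.points_to[var] = {site}

    def store(self, var, field, rhs):
        """x.f = y: every node x may point to gains y's targets via f."""
        for site in self.points_to[var]:
            self.fields[(site, field)] |= self.points_to[rhs]

    def load(self, lhs, var, field):
        """x = y.f: x points to everything reachable via f from y's nodes."""
        self.points_to[lhs] = set().union(
            *(self.fields[(s, field)] for s in self.points_to[var]))

heap = AbstractHeap()
heap.alloc("a", "new@L3")     # a = new Node()
heap.alloc("b", "new@L4")     # b = new Node()
heap.store("a", "next", "b")  # a.next = b
heap.load("c", "a", "next")   # c = a.next
assert heap.points_to["c"] == {"new@L4"}
```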

    Growth of relational model: Interdependence and complementary to big data

    A database management system is a long-standing application of computer science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements from its conventional structured database to the recent buzzword, big data. This paper aims to provide a complete model of the relational database, which is still widely used because of its well-known ACID properties: atomicity, consistency, isolation, and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by big data techniques. To explain the reasons for this incorporation, the paper qualitatively studies the advancements made to the relational data model over time. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, data retrieval techniques such as indexing, query processing, and concurrency control methods are reviewed. The paper provides vital insights to appraise the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of hybrid transactional and analytical database management systems.

    Scalable Automated Incrementalization for Real-Time Static Analyses

    This thesis proposes a framework for the easy development of static analyses whose results are incrementalized to provide instantaneous feedback in an integrated development environment (IDE). Today, IDEs feature many tools that have static analyses as their foundation to assess software quality and catch correctness problems. Yet, these tools often fail to provide instantaneous feedback and are thus restricted to nightly build processes. This precludes developers from fixing issues at their inception time, i.e., when the problem and the developed solution are both still fresh in mind. To provide instantaneous feedback, incrementalization is a well-known technique that exploits the fact that developers make only small changes to the code, so analysis results can be re-computed quickly from these changes. Yet, incrementalization requires carefully crafted static analyses, and a manual approach to incrementalization is therefore unattractive. Automated incrementalization can alleviate these problems and allows analysis writers to formulate their analyses as queries with the full data set in mind, without worrying about the semantics of incremental changes. Existing approaches to automated incrementalization utilize standard technologies, such as deductive databases, that provide declarative query languages yet also require materializing the full dataset in main memory, i.e., the memory is permanently occupied by the data required for the analyses. Other standard technologies, such as relational databases, offer better scalability due to persistence, yet require long transaction times for loading the data. Neither technology is a perfect match for integrating static analyses into an IDE, since the underlying data, i.e., the code base, is already persisted and managed by the IDE; transitioning the data into a database is redundant work. In this thesis a novel approach is proposed that provides a declarative query language and automated incrementalization, yet retains in memory only the necessary minimum of data, i.e., only the data required for the incrementalization. The approach allows static analyses to be declared as incrementally maintained views, where the underlying formalism for incrementalization is the relational algebra with extensions for object orientation and recursion. The algebra makes it possible to deduce which data is the necessary minimum for incremental maintenance, and indeed shows that many views are self-maintainable, i.e., do not require materializing any data at all. In addition, an optimization for the algebra is proposed that widens the range of self-maintainable views based on domain knowledge of the underlying data. The optimization works similarly to declaring primary keys for databases: it is declared on the schema of the data and defines which data is incrementally maintained in the same scope. The scope makes all analyses (views) that correlate only data within its boundaries self-maintainable. The approach is implemented as an embedded domain-specific language in a general-purpose programming language. The implementation can be understood as a database-like engine with an SQL-style query language and the execution semantics of the relational algebra. As such, the system is a general-purpose database-like query engine and can be used to incrementalize domains other than static analyses. To evaluate the approach, a large variety of static analyses was sampled from real-world tools and formulated as incrementally maintained views in the implemented engine.
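
    The following sketch illustrates the self-maintainability idea in miniature: a selection view over, say, a relation of methods can be kept up to date purely from added and removed rows, without materializing the base relation. The event protocol and names are assumptions for illustration; they are not the thesis's actual API.

```python
# A minimal sketch of a self-maintainable incrementally maintained view:
# a selection (sigma) over a base relation can be updated from the change
# events alone, without retaining the base relation in memory. Names and
# the event protocol are illustrative, not the thesis's actual API.

class SelectionView:
    def __init__(self, predicate):
        self.predicate = predicate
        self.result = set()          # only the view extent is materialized

    # delta events pushed by the data source (e.g., the IDE's code model)
    def on_add(self, row):
        if self.predicate(row):
            self.result.add(row)

    def on_remove(self, row):
        self.result.discard(row)     # no base-relation lookup needed

# "methods with more than 3 parameters" as an incrementally maintained view
long_params = SelectionView(lambda m: m[1] > 3)   # (method, param_count)
long_params.on_add(("Foo.bar", 5))
long_params.on_add(("Foo.baz", 1))
long_params.on_remove(("Foo.bar", 5))
print(long_params.result)            # set() after the removal
```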

    A User-driven Annotation Framework for Scientific Data

    Annotations play an increasingly crucial role in scientific exploration and discovery as the amount of data and the level of collaboration among scientists increase. There are many systems today focusing on annotation management, querying, and propagation. Although all such systems are implemented to take user input (i.e., the annotations themselves), very few systems are user-driven, taking into account user preferences on how annotations should be propagated and applied over data. In this thesis, we propose to treat annotations as first-class citizens for scientific data by introducing a user-driven, view-based annotation framework. Under this framework, we try to resolve two critical questions. First, how do we support annotations that are scalable both from a system point of view and from a user point of view? Second, how do we support annotation queries, both from an annotator point of view and from a user point of view, in an efficient and accurate way? To address these challenges, we propose the VIew-based annotation Propagation (ViP) framework to empower users to express their preferences over the time semantics and network semantics of annotations, and we define three query types for annotations. To efficiently support this novel functionality, ViP utilizes database views and introduces new annotation caching techniques. The use of views also brings a more compact representation of annotations, making our system easier to scale. Through an extensive experimental study on a real system (with both synthetic and real data), we show that the ViP framework can seamlessly introduce user-driven annotation propagation semantics while at the same time significantly improving performance (in terms of query execution time) over the current state of the art.
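
    The following toy sketch illustrates the flavor of user-driven propagation: annotations attached to base rows flow into a derived view, and a per-annotation preference decides whether an annotation survives an update of the row it describes. All names and the preference flag are invented for illustration and are not ViP's actual interface.

```python
# A toy sketch of view-based annotation propagation: annotations attached
# to base rows are carried along to the rows of a view derived from them.
# The time-semantics flag mimics, in spirit, a user preference for whether
# an annotation applies to future versions of a row; all names are invented.

rows = {
    ("gene1", 42): [{"note": "suspect value", "applies_to_future": False}],
    ("gene2", 17): [{"note": "verified",      "applies_to_future": True}],
}

def view(pred):
    """Select base rows and propagate their annotations to the view."""
    return {row: notes for row, notes in rows.items() if pred(row)}

def update(old_row, new_row):
    """On update, keep only annotations the user marked as persistent."""
    kept = [n for n in rows.pop(old_row, []) if n["applies_to_future"]]
    rows[new_row] = kept

update(("gene1", 42), ("gene1", 43))
print(view(lambda r: r[0] == "gene1"))   # {('gene1', 43): []}
```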

    Erweiterung von Informationssystemen um Event-Handling - Ein Nicht-Invasiver Ansatz (Extending Information Systems with Event Handling - A Non-Invasive Approach)

    Due to the immense advance of widely accessible information systems in industrial applications, science, education, and everyday use, it becomes more and more difficult for users of those information systems to keep track of new and updated information. One approach to cope with this problem is to go beyond traditional search facilities and instead use the users' profiles to monitor data changes and actively inform them about these updates - an aspect that has to be explicitly developed and integrated into a variety of information systems, traditionally in an individual way depending on the application and its platform. In this dissertation, we present a novel approach to model the semantic interrelations that specify which users to inform about which updates, based on the underlying model of the respective information system. For the first time, a meta-model is presented that allows information system designers to tag an arbitrary data model and thus specify the event-handling semantics. A formal specification of how to interpret the meta-model to determine the receivers of the events completes the presented concept. For the practical realization of this new concept, model-driven architecture (MDA) proves to be an ideal technical means. Using our newly developed UML profile, based on data-modelling standards, an implementation of the event-handling specification can automatically be generated for a variety of target platforms, e.g., relational databases using triggers. This meta-approach makes the proposed solution ideal with respect to maintainability and genericity. Our solution significantly reduces the overall development effort for an event-handling facility. In addition, the enhanced model of the information system can be used to generate an implementation that also fulfils non-functional requirements such as high performance and extensibility. The overall framework, consisting of the domain-specific language (i.e., the meta-model), formal and technical transformations specifying how to interpret the enhanced information system model, and a cost-based optimization strategy, constitutes an integrated approach that offers several advantages over traditional implementation techniques: it can be applied to new information systems as well as to legacy applications without modifying existing systems; it offers an extensible, easy-to-use, generic, and thus re-usable solution; and it can be tailored to and optimized for many use cases, as the practical evaluation presented in this dissertation verifies.
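
    The generative idea can be illustrated with a small sketch: a schema element tagged in a (hypothetical) meta-model is transformed into platform-specific code, here a relational trigger that records which users should be notified. The tag names and SQL shape are assumptions, not the dissertation's actual UML profile.

```python
# An illustrative sketch of the generative idea: a data model tagged with
# event-handling semantics is transformed into platform-specific code, here
# a relational trigger that records which users must be notified. The tag
# names and SQL shape are assumptions, not the dissertation's UML profile.

def generate_trigger(table, recipients_query):
    """Emit a trigger that inserts a pending event per interested user."""
    return f"""
CREATE TRIGGER notify_on_{table}_update
AFTER UPDATE ON {table}
FOR EACH ROW
  INSERT INTO pending_events(user_id, table_name, row_id)
  SELECT user_id, '{table}', NEW.id
  FROM ({recipients_query}) AS interested;
""".strip()

# schema element tagged in the (hypothetical) meta-model
print(generate_trigger(
    table="publications",
    recipients_query="SELECT user_id FROM subscriptions "
                     "WHERE topic = NEW.topic"))
```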

    Data freshness and data accuracy: a state of the art

    In the context of Data Integration Systems (DIS), which provide access to large amounts of data extracted and integrated from autonomous data sources, users are highly concerned about data quality. Traditionally, data quality is characterized via multiple quality factors. Among the quality dimensions that have been proposed in the literature, this report analyzes two main ones: data freshness and data accuracy. Concretely, we analyze the various definitions of both quality dimensions, their underlying metrics, and the features of DIS that impact their evaluation. We present a taxonomy of existing works proposed for dealing with both quality dimensions in several kinds of DIS, and we discuss open research problems.
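
    As a small illustration of one widely used freshness metric, currency, the sketch below measures how long ago delivered data left its source and compares that age against a user tolerance. Definitions vary across DIS architectures, so this shows only the general shape, not the report's exact formalization.

```python
# A small sketch of one common freshness metric, currency: the time elapsed
# since the data was extracted from its source, optionally compared against
# a user-supplied tolerance. Exact definitions vary per DIS architecture;
# this illustrates only the general shape.

from datetime import datetime, timedelta

def currency(extraction_time, query_time):
    """Age of delivered data: how long ago it left the source."""
    return query_time - extraction_time

def fresh_enough(extraction_time, query_time, tolerance):
    return currency(extraction_time, query_time) <= tolerance

now = datetime(2024, 1, 1, 12, 0)
extracted = datetime(2024, 1, 1, 9, 0)
print(currency(extracted, now))                          # 3:00:00
print(fresh_enough(extracted, now, timedelta(hours=2)))  # False
```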

    Partial Computation in Real-Time Database Systems: A Research Plan

    State-of-the-art database management systems are inappropriate for real-time applications due to their lack of speed and predictability of response. To combat these problems, the scheduler needs to be able to take advantage of the vast quantity of semantic and timing information that is typically available in such systems. Furthermore, to improve predictability of response, the system should be capable of providing a partial, but correct, response in a timely manner. We therefore propose to develop a semantics for real-time database systems that incorporates temporal knowledge of data objects, their validity, and computation using their values. This temporal knowledge should include not just historical information but also future knowledge of when to expect values to appear. This semantics will be used to develop a notion of approximate or partial computation, and to develop schedulers appropriate for real-time transactions.
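
    A minimal sketch of the partial-computation idea follows: a computation refines an aggregate incrementally and, when its deadline expires, returns the partial but correct-so-far result instead of aborting. The timing model and function names are illustrative assumptions, not the proposed semantics.

```python
# A sketch of partial computation under a deadline: sum as many values as
# time allows and report the fraction covered, so a scheduler can trade
# completeness for timeliness. Timing model and names are illustrative.

import time

def anytime_sum(values, deadline):
    """Sum as many values as the deadline allows; report coverage."""
    total, seen = 0.0, 0
    for v in values:
        if time.monotonic() >= deadline:
            break                      # deadline hit: stop refining
        total += v
        seen += 1
    return total, seen / len(values)   # partial result + fraction covered

readings = [0.1] * 1_000_000
result, coverage = anytime_sum(readings, time.monotonic() + 0.01)
print(f"sum={result:.1f} from {coverage:.0%} of the data")
```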

    Provenance in Collaborative Data Sharing

    This dissertation focuses on recording, maintaining, and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support update exchange --- which publishes a participant's updates and then translates others' updates to the participant's local schema and imports them --- while tolerating disagreement between participants and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in its propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange to answer a variety of user queries about the quality, uncertainty, or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases. To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance, and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing, as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS along with indexing techniques to speed up provenance querying, and we experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.
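
    The following toy sketch conveys the spirit of provenance annotations on exchanged data: each derived tuple records the source tuple and the mapping that produced it, and a trust policy is evaluated over that record. The representation is invented here for illustration; it is not the dissertation's provenance-graph model or ProQL.

```python
# A toy sketch of provenance annotations on derived tuples: each derived
# tuple records which source tuple and mapping produced it, so trust
# policies can be evaluated over that record. All names are invented.

# source tuples, each tagged with its origin
source = [("alice", "db1:t1"), ("bob", "db2:t7")]
mapping_id = "M3"   # the schema mapping that propagates them

# update exchange: derive tuples and attach their provenance
derived = [(name, {"from": tag, "via": mapping_id}) for name, tag in source]

def trusted(prov, distrusted_peers):
    """A provenance-based trust policy: reject data routed via db2."""
    return not any(prov["from"].startswith(p) for p in distrusted_peers)

print([t for t, prov in derived if trusted(prov, distrusted_peers={"db2"})])
# ['alice']
```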

    Multi-Schema-Version Data Management


    Incremental materialization of object-oriented views

    We present an approach to handle incremental materialization of object-oriented views. Queries that define views are implemented as methods that are invoked to compute the corresponding views. To avoid computation from scratch each time a view is accessed, we introduce deferred update algorithms that reflect in a view only the relevant modifications introduced into the database while that view was inactive. A view is updated by considering modifications performed within all classes along the inheritance and class-composition subhierarchies rooted at every class used in deriving that view. To each class we add a modification list that keeps one modification tuple per view dependent on that class. Such a tuple acts as a reference point marking the start of the next update to the corresponding view.
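
    A condensed sketch of the deferred-update bookkeeping described above: each class keeps a log of modifications and, per dependent view, a reference point into that log; when a view is next accessed, only the modifications past its reference point are replayed. Data structures are simplified for illustration and are not the paper's exact scheme.

```python
# A condensed sketch of deferred view updates: each class logs its
# modifications and keeps one reference point per dependent view; a view,
# when next accessed, replays only what was logged after its mark.

class TrackedClass:
    def __init__(self):
        self.log = []                     # modifications, in order
        self.marks = {}                   # view name -> next log index

    def modify(self, change):
        self.log.append(change)

    def pending_for(self, view):
        """Changes logged since `view` was last brought up to date."""
        start = self.marks.get(view, 0)
        self.marks[view] = len(self.log)  # advance the reference point
        return self.log[start:]

persons = TrackedClass()
persons.modify(("add", "p1"))
persons.modify(("update", "p1"))
print(persons.pending_for("adults_view"))   # both changes replayed
persons.modify(("delete", "p1"))
print(persons.pending_for("adults_view"))   # only the delete
```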