Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval
Data analysts today have at their disposal a seemingly endless supply of data repositories and, hence, datasets from which to draw. New datasets become available daily, making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication within indexed collections, forcing analysts to choose one value for each of the available attributes of an item in the collection. Analysts often discover two or more datasets with information about the same entity. When combining this data and transforming it into a form that is usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute that contains differing values. In the absence of professional intuition, this deconfliction is the source of considerable guesswork and speculation on the part of the analyst. One must consider what is lost by discarding the alternative values. Are there meaningful relationships between the conflicting datasets? Does each dataset present a different and valid view of the entity, or are the alternative values erroneous? If so, which values are erroneous? Do the variances have historical significance? The analysis of modern datasets requires specialized algorithms, together with storage and retrieval mechanisms, to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationships to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve attribute versions without unnecessary complexity or additional alterations to the original or derived dataset schemas.
This paper presents technologies and innovations that help data analysts discover meaning within their data while preserving all of the original data for every entity in the RDBMS.
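As a rough illustration of keeping every attribute version instead of deconflicting, the sketch below stores each (entity, attribute, source) value as its own row in a side table and flags attributes whose sources disagree. The table name `attribute_version`, the helper functions, and the sample data are hypothetical; this is a generic entity-attribute-value layout in SQLite, not necessarily the mechanism the paper proposes.

```python
import sqlite3

# Hypothetical side table: one row per (entity, attribute, source) version,
# so conflicting values from different datasets are all kept.
SCHEMA = """
CREATE TABLE attribute_version (
    entity_id  INTEGER NOT NULL,
    attribute  TEXT    NOT NULL,
    value      TEXT,
    source     TEXT    NOT NULL,
    PRIMARY KEY (entity_id, attribute, source)
);
"""

def load(conn, rows):
    """Insert (entity_id, attribute, value, source) tuples without deconfliction."""
    conn.executemany("INSERT INTO attribute_version VALUES (?, ?, ?, ?)", rows)

def versions(conn, entity_id, attribute):
    """Return every stored (value, source) pair for one attribute of one entity."""
    cur = conn.execute(
        "SELECT value, source FROM attribute_version "
        "WHERE entity_id = ? AND attribute = ? ORDER BY source",
        (entity_id, attribute),
    )
    return cur.fetchall()

def conflicts(conn):
    """List (entity_id, attribute) pairs whose sources disagree on the value."""
    cur = conn.execute(
        "SELECT entity_id, attribute FROM attribute_version "
        "GROUP BY entity_id, attribute HAVING COUNT(DISTINCT value) > 1 "
        "ORDER BY entity_id, attribute"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, [
    (1, "name", "ACME Corp", "dataset_a"),
    (1, "name", "Acme Corporation", "dataset_b"),
    (1, "country", "US", "dataset_a"),
    (1, "country", "US", "dataset_b"),
])
print(versions(conn, 1, "name"))
print(conflicts(conn))
```

Both spellings of the name survive, and only `name` is reported as conflicting because both sources agree on `country`.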
A theory and model for the evolution of software services
Software services are subject to constant change and variation. To control service development, a service developer needs to know why a change was made, what its implications are, and whether the change is complete. Typically, service clients do not perceive the upgraded service immediately. As a consequence, service-based applications may fail on the client side due to changes made during a provider-side service upgrade. To manage changes in a meaningful and effective manner, service clients must therefore be considered when changes are introduced on the provider side. Otherwise, such changes will most certainly result in severe application disruption. Eliminating spurious results and inconsistencies that may arise from uncontrolled changes is therefore a necessary condition for services to evolve gracefully, remain stable, and handle variability in their behavior. Towards this goal, this work presents a model and a theoretical framework for the compatible evolution of services, based on well-founded theories and techniques from a number of disparate fields.
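One simple way to make the compatibility concern concrete is a structural check run before a provider upgrade. The sketch assumes a service interface is a mapping from operation names to required inputs and returned fields, and that each client records which fields it actually reads; the names `Operation`, `ClientUsage` and `breaking_changes` are hypothetical, and the rule encoded here is a common shallow-compatibility heuristic rather than the theoretical framework the work develops.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Operation:
    required_inputs: frozenset   # parameters the caller must supply
    outputs: frozenset           # fields the operation returns

@dataclass
class ClientUsage:
    # operation name -> set of output fields this client actually reads
    reads: dict = field(default_factory=dict)

def breaking_changes(old, new, client):
    """Return reasons why upgrading from `old` to `new` would break `client`.

    Hypothetical rule: an upgrade is compatible for a client if every
    operation it calls still exists, demands no new required inputs,
    and still returns every field the client reads.
    """
    problems = []
    for op_name, fields_read in client.reads.items():
        if op_name not in new:
            problems.append(f"{op_name}: operation removed")
            continue
        before = old[op_name].required_inputs if op_name in old else frozenset()
        added = new[op_name].required_inputs - before
        if added:
            problems.append(f"{op_name}: new required inputs {sorted(added)}")
        missing = set(fields_read) - new[op_name].outputs
        if missing:
            problems.append(f"{op_name}: missing outputs {sorted(missing)}")
    return problems

v1 = {"getQuote": Operation(frozenset({"symbol"}), frozenset({"price", "currency"}))}
v2 = {"getQuote": Operation(frozenset({"symbol", "market"}), frozenset({"price"}))}
client = ClientUsage(reads={"getQuote": {"price", "currency"}})
print(breaking_changes(v1, v2, client))
```

Checking per client, rather than per interface, reflects the abstract's point that the impact of a provider change depends on what each client actually uses.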
Design and Implementation of a Multi-Purpose Object-Orientated Spatio-Temporal (MPooST) Data Model for Cadastral and Land Information Systems (C/LIS)
The application of the object-oriented methodology in geospatial information management has increased significantly during the last 10 years and tends to gradually replace the established relational technology. In general, object orientation offers a flexible and adaptable modelling framework that can satisfy even the most demanding and complex data structuring requirements. The objective of this thesis is to determine how a modern Land Information System used for cadastral purposes can benefit from an object-oriented methodology. To this end, a Multi-Purpose Object-Oriented Spatio-Temporal (abbreviated as MPOOST) data model has been developed. In brief, the MPOOST data model embodies spatial data and their temporal reference in the form of objects that contain their attributes as well as their behaviour. The MPOOST data model has been designed so that other data models can exploit its functionality, which provides its multi-purpose aspect. First, the requirements of Land Information Systems are examined. Next, the functionality offered by the object-oriented methodology is analysed in detail. Although the literature contains a wealth of relevant research, there seems to be no established starting point for applying OO in LIS; hence, a whole chapter of this thesis is dedicated to an extended literature review. Finally, the OO methodology is applied to the design and implementation of the MPOOST data model. The outcome is the first version of the MPOOST data model, written in the Java object-oriented programming language. In this way, it is shown that relational technology has significant drawbacks that prevent it from being applied in conceptually demanding information systems, and that object orientation can fully satisfy the most complex data structuring requirements posed in modern geographic information systems.
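To illustrate objects that carry their attributes, their temporal reference and their behaviour together, here is a small sketch of a cadastral parcel that keeps a time-stamped history of boundary and owner states and can answer queries as of a given date. The thesis implements MPOOST in Java; this sketch only illustrates the idea, and the class names, fields and the planar shoelace area computation are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class ParcelState:
    boundary: List[Point]             # polygon vertices in order
    owner: str
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"

@dataclass
class Parcel:
    parcel_id: str
    history: List[ParcelState] = field(default_factory=list)

    def record(self, boundary, owner, when):
        """Close the current state and append a new one valid from `when`."""
        if self.history and self.history[-1].valid_to is None:
            self.history[-1].valid_to = when
        self.history.append(ParcelState(list(boundary), owner, when))

    def state_at(self, when):
        """Return the state valid on `when`, or None if the parcel did not exist."""
        for s in self.history:
            if s.valid_from <= when and (s.valid_to is None or when < s.valid_to):
                return s
        return None

    def area_at(self, when):
        """Polygon area via the shoelace formula (planar coordinates assumed)."""
        s = self.state_at(when)
        if s is None:
            return None
        pts = s.boundary
        twice = sum(x1 * y2 - x2 * y1
                    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
        return abs(twice) / 2

p = Parcel("P-17")
p.record([(0, 0), (10, 0), (10, 10), (0, 10)], "Alice", date(2000, 1, 1))
p.record([(0, 0), (10, 0), (10, 5), (0, 5)], "Bob", date(2010, 6, 1))
print(p.area_at(date(2005, 1, 1)), p.area_at(date(2015, 1, 1)))
```

Keeping closed states instead of overwriting them is what gives the object its temporal reference: the 2005 query still sees Alice's 100-unit parcel after the 2010 subdivision.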
Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges
If the last decade viewed computational services as a utility, then surely this decade has transformed computation into a commodity. Computation is now progressively integrated into physical networks in a seamless way that enables cyber-physical systems (CPS) and the Internet of Things (IoT) to meet their latency requirements. Similar to the concepts of "platform as a service" and "software as a service", both cloudlets and fog computing have found their own use cases. Edge devices (which we call end or user devices, for disambiguation) play the role of personal computers, dedicated to a user and to a set of correlated applications. In this new scenario, the boundaries between the network node, the sensor, and the actuator are blurring, driven primarily by the computational power of IoT nodes such as single-board computers and smartphones. The larger volumes of data generated in this type of network need clever, scalable, and possibly decentralized computing solutions that can scale independently as required. Any node can be seen as part of a graph, with the capacity to serve as a computing node, a network router node, or both. Complex applications can be distributed over this graph, or network of nodes, to improve overall performance, for example the amount of data processed over time. In this paper, we identify this new computing paradigm, which we call Social Dispersed Computing, and analyze its key themes, including a new outlook on its relation to agent-based applications. We architect this new paradigm by providing supporting application examples that include next-generation electrical energy distribution networks, next-generation mobility services for transportation, and applications for distributed analysis and identification of non-recurring traffic congestion in cities. The paper analyzes the existing computing paradigms (e.g., cloud, fog, edge, mobile edge, social), resolving the ambiguity of their definitions, and analyzes and discusses the relevant foundational software technologies, the remaining challenges, and research opportunities.
Garcia Valls, MS.; Dubey, A.; Botti, V. (2018). Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges. Journal of Systems Architecture. 91:83-102. https://doi.org/10.1016/j.sysarc.2018.05.007
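The graph view described in this abstract, where any node may compute, route, or both, can be illustrated with a toy placement routine: breadth-first search gives hop counts from the device where work originates, and each unit task goes to the closest compute-capable node with spare capacity. The topology, node names, capacities and the greedy policy are all hypothetical and are not taken from the paper.

```python
from collections import deque

# Hypothetical dispersed network: node -> (can_compute, free task slots).
nodes = {
    "phone":   (True, 1),
    "gateway": (False, 0),   # routing only
    "sbc":     (True, 2),    # single-board computer
    "isp":     (False, 0),   # routing only
    "cloud":   (True, 100),
}
links = {
    "phone":   ["gateway"],
    "gateway": ["phone", "sbc", "isp"],
    "sbc":     ["gateway"],
    "isp":     ["gateway", "cloud"],
    "cloud":   ["isp"],
}

def hops_from(src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in links[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def place(tasks, origin):
    """Greedily assign each unit task to the nearest compute node with spare capacity."""
    free = {n: cap for n, (ok, cap) in nodes.items() if ok}
    dist = hops_from(origin)
    placement = {}
    for t in tasks:
        candidates = [n for n in free if free[n] > 0 and n in dist]
        if not candidates:
            raise RuntimeError(f"no capacity left for {t}")
        best = min(candidates, key=lambda n: (dist[n], n))  # ties broken by name
        free[best] -= 1
        placement[t] = best
    return placement

print(place(["t1", "t2", "t3", "t4"], "phone"))
```

Work stays on the user device and nearby single-board computer while they have capacity and spills over to the distant cloud only afterwards, which is one simple reading of computation moving toward the edge.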
Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019
One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a "Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this further by asking if we can create a knowledge graph of 'everything' ranging from common sense concepts to location based entities. This knowledge graph should be 'open to the public' in a FAIR manner democratizing this mass amount of knowledge." Although linked open data (LOD) is only one knowledge graph, it is the closest realisation (and probably the only one) of a public FAIR Knowledge Graph (KG) of everything. Certainly, LOD provides a unique testbed for experimenting with and evaluating research hypotheses on open and FAIR KGs. One of the most neglected FAIR issues about KGs is their ongoing evolution and long-term preservation. We want to investigate this problem, that is, to understand what preserving and supporting the evolution of KGs means and how these problems can be addressed. Clearly, the problem can be approached from different perspectives and may require the development of different approaches, including new theories, ontologies, metrics, strategies, procedures, etc. This document reports a collaborative effort performed by 9 teams of students, each guided by a senior researcher as their mentor, attending the International Semantic Web Research School (ISWS 2019). Each team provides a different perspective on the problem of knowledge graph evolution, substantiated by a set of research questions as the main subject of their investigation. In addition, each team provides its working definition of KG preservation and evolution.
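One basic building block for preserving an evolving KG is to treat each version as a set of triples and store only the differences between consecutive versions, so that any past version can be rebuilt. The sketch below assumes plain string triples and a linear version history; the `Triple`, `diff`, and `VersionedKG` names are illustrative and do not come from the report.

```python
from typing import NamedTuple, Set

class Triple(NamedTuple):
    s: str   # subject
    p: str   # predicate
    o: str   # object

def diff(old: Set[Triple], new: Set[Triple]):
    """Return (added, removed) triples between two KG versions."""
    return new - old, old - new

def changed_subjects(old, new):
    """Subjects touched by any added or removed triple, sorted."""
    added, removed = diff(old, new)
    return sorted({t.s for t in added | removed})

class VersionedKG:
    """Stores the first snapshot plus one (added, removed) delta per later version."""

    def __init__(self, initial):
        self.base = frozenset(initial)
        self.deltas = []
        self._latest = set(initial)

    def commit(self, new_version):
        added, removed = diff(self._latest, set(new_version))
        self.deltas.append((frozenset(added), frozenset(removed)))
        self._latest = set(new_version)

    def materialize(self, version):
        """Rebuild version `version` (0 = base) by replaying deltas in order."""
        graph = set(self.base)
        for added, removed in self.deltas[:version]:
            graph -= removed
            graph |= added
        return graph

v0 = {Triple("ex:Rome", "ex:capitalOf", "ex:Italy"),
      Triple("ex:Italy", "ex:population", "59M")}
v1 = {Triple("ex:Rome", "ex:capitalOf", "ex:Italy"),
      Triple("ex:Italy", "ex:population", "60M")}
kg = VersionedKG(v0)
kg.commit(v1)
print(changed_subjects(v0, v1))
```

Delta storage trades rebuild time for space; a real archive would typically also keep periodic full snapshots so that old versions do not require replaying the whole history.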
Snapshot : friend or foe of data management - on optimizing transaction processing in database and blockchain systems
Data management is a complicated task. Due to the wide range of data management tasks, businesses often need a sophisticated data management infrastructure with a plethora of distinct systems to fulfill their requirements. Moreover, since snapshots are an essential ingredient in solving many data management tasks, such as checkpointing and recovery, they have been widely exploited in almost all major data management systems that have appeared in recent years. However, snapshots do not always guarantee exceptional performance. In this dissertation, we examine two different faces of the snapshot: one where it has a tremendous positive impact on the performance and usability of the system, and another where incorrect usage of the snapshot can significantly degrade system performance. This dissertation consists of three loosely coupled parts that represent three distinct projects that emerged during this doctoral research. In the first part, we analyze the importance of utilizing snapshots in relational database systems. We identify the bottlenecks in state-of-the-art snapshotting algorithms, propose two snapshotting techniques, and optimize multi-version concurrency control (MVCC) for handling hybrid workloads effectively. Our snapshotting algorithm is up to 100x faster and reduces the latency of analytical queries by up to 4x compared to state-of-the-art techniques. In the second part, we identify the strict snapshotting used by Fabric as a critical bottleneck, replace it with MVCC, and propose additional optimizations that improve the throughput of the permissioned blockchain system by up to 12x under highly contended workloads. In the last part, we propose ChainifyDB, a platform that transforms an existing database infrastructure into a blockchain infrastructure. ChainifyDB achieves up to 6x higher throughput than another state-of-the-art permissioned blockchain system.
Furthermore, its external concurrency control protocol outperforms the internal concurrency control protocol of PostgreSQL and MySQL, achieving up to 2.6x higher throughput in a blockchain setup in comparison to a standalone isolated setup. We also utilize snapshots in ChainifyDB to support recovery, which has been missing so far from the permissioned-blockchain world.
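The core idea that makes snapshots cheap under multi-version concurrency control (MVCC) is that writers append new versions instead of overwriting, so a reader holding a snapshot timestamp simply picks the newest version committed at or before it. The single-threaded toy store below shows only that visibility rule; it has no conflict detection or garbage collection and is not the snapshotting algorithm proposed in the dissertation. The `MVStore` class and its methods are hypothetical.

```python
import itertools

class MVStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self._versions = {}
        self._clock = itertools.count(1)   # monotonically increasing commit timestamps
        self.now = 0                       # timestamp of the last committed write

    def write(self, updates):
        """Commit a dict of updates under one new timestamp (appends, never overwrites)."""
        ts = next(self._clock)
        for key, value in updates.items():
            self._versions.setdefault(key, []).append((ts, value))
        self.now = ts
        return ts

    def snapshot(self):
        """A snapshot is just the timestamp of the last committed write."""
        return self.now

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, else None."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

s = MVStore()
s.write({"a": 1, "b": 1})
snap = s.snapshot()          # an analytical query would hold this timestamp
s.write({"a": 2})            # a later transactional update
print(s.read("a", snap), s.read("a", s.snapshot()))
```

Taking the snapshot costs nothing beyond reading a counter; the price is paid instead in version chains that readers must scan and that eventually need pruning.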
Análise colaborativa de grandes conjuntos de séries temporais (Collaborative analysis of large time series datasets)
The recent expansion of everyday metrification has led to the production of massive quantities of data, and in many cases these collected metrics are only useful for knowledge building when seen as a full sequence of data ordered by time, which constitutes a time series. To find and interpret meaningful behavioral patterns in time series, a multitude of analysis software tools have been developed. Many existing solutions use annotations to enable the curation of a knowledge base that is shared among a group of researchers over a network. However, these tools lack appropriate mechanisms to handle a high number of concurrent requests and to properly store massive data sets and ontologies, as well as suitable representations of annotated data that are visually interpretable by humans and explorable by automated systems. The goal of the work presented in this dissertation is to iterate on existing time series analysis software and build a platform for the collaborative analysis of massive time series data sets, leveraging state-of-the-art technologies for querying, storing and displaying time series and annotations. A theoretical, domain-agnostic model was proposed to enable the implementation of a distributed, extensible, secure and high-performance architecture that handles multiple simultaneous annotation proposals and avoids any data loss from overlapping contributions or unsanctioned changes. Analysts can share annotation projects with peers, restricting a set of collaborators to a smaller scope of analysis and to a limited catalog of annotation semantics. Annotations can express meaning not only over a segment of time, but also over a subset of the series that coexist in the same segment. A novel visual encoding for annotations is proposed, in which annotations are rendered as arcs traced only over the affected series' curves in order to reduce visual clutter. Moreover, the dissertation describes the implementation of a full-stack prototype with a reactive web interface that directly follows the proposed architectural and visualization model, applied to the HVAC domain. The performance of the prototype under different architectural approaches was benchmarked, and the usability of the interface was tested. Overall, the work described in this dissertation contributes a more versatile, intuitive and scalable time series annotation platform that streamlines the knowledge-discovery workflow.
Mestrado em Engenharia Informática (Master's in Informatics Engineering)
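An annotation that covers a time segment and only a subset of the series in it can be modelled as a half-open interval plus a set of series identifiers; rendering then needs, for each series curve, the annotations that overlap the visible window. The sketch assumes numeric timestamps and uses made-up HVAC series names; the `Annotation` type and `visible` helper are hypothetical, not the prototype's actual data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Annotation:
    start: float                 # inclusive, e.g. epoch seconds
    end: float                   # exclusive
    series: FrozenSet[str]       # subset of series the annotation applies to
    label: str

def visible(annotations: List[Annotation], series_id: str, t0: float, t1: float):
    """Annotations that apply to `series_id` and overlap the half-open window [t0, t1)."""
    return [a for a in annotations
            if series_id in a.series and a.start < t1 and t0 < a.end]

anns = [
    Annotation(0, 60, frozenset({"supply_temp", "return_temp"}), "defrost cycle"),
    Annotation(30, 90, frozenset({"fan_speed"}), "fan fault"),
]
print([a.label for a in visible(anns, "supply_temp", 50, 100)])
```

Because the series set is part of the annotation, a renderer can draw each arc only over the curves listed in `series`, leaving the other curves in the same time segment unmarked.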
Message passing in an object-oriented database management system
Ankara: The Department of Computer Engineering and Information Sciences and the Institute of Engineering and Sciences of Bilkent Univ., 1988. Thesis (Master's), Bilkent University, 1988. Includes bibliographical references (leaves 145-149). In this thesis, a focused survey of object-oriented database management systems and of object orientation in general was carried out, and a single-user object-oriented database management system prototype was designed and implemented. A command language was defined, and a message passing scheme was proposed and implemented. A compiler for the language was developed.
The developed language is computationally complete and aims to solve the impedance mismatch problem. It contains both data definition and data manipulation statements, which can be used interactively or in the form of methods. After compilation, the statements are translated into integer codes, and these codes are used to perform the necessary operations. Since the developed prototype is a single-user system, the message passing scheme does not provide any concurrency control mechanisms; stacks are used to implement message passing and argument handling.
Özelçi, Sibel M. (M.S.)
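The combination of integer codes and stacks for message passing and argument handling can be sketched as a tiny stack interpreter: constants are pushed, and a send instruction pops the arguments and the receiver, invokes the method, and pushes the result. The opcodes, the instruction format and the `Account` class are hypothetical; the thesis's actual encoding is not described in the abstract.

```python
# Hypothetical integer opcodes (not the thesis's actual encoding).
PUSH, SEND, POP = 0, 1, 2

class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

def run(code, constants, selectors):
    """Execute (opcode, operand) pairs with a single value stack.

    PUSH pushes constants[operand]; SEND looks up (method name, argc) in
    selectors[operand], pops argc arguments and then the receiver, calls
    the method, and pushes its result; POP discards the top of the stack.
    """
    stack = []
    for op, operand in code:
        if op == PUSH:
            stack.append(constants[operand])
        elif op == SEND:
            selector, argc = selectors[operand]
            args = [stack.pop() for _ in range(argc)][::-1]  # restore call order
            receiver = stack.pop()
            stack.append(getattr(receiver, selector)(*args))
        elif op == POP:
            stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack

acct = Account(100)
constants = [acct, 50]
selectors = [("deposit", 1)]
# Roughly "acct deposit: 50" compiled to integer codes.
print(run([(PUSH, 0), (PUSH, 1), (SEND, 0)], constants, selectors))
```

With a single user there is only one interpreter and one stack, so, as the abstract notes, no concurrency control is needed around message sends.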
- …