
    Provenance in Collaborative Data Sharing

    This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support update exchange, which publishes a participant's updates and then translates others' updates to the participant's local schema and imports them, while tolerating disagreement between participants and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in its propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange to answer a variety of user queries about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases. To address these challenges, we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and that captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that CDSS users can use to combine data querying with provenance testing, and to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS together with indexing techniques to speed up provenance querying, and experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.
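    As a rough illustration of the idea of annotated relations (not the dissertation's actual formalism or code), the Python sketch below carries provenance annotations through a join: annotations of joined tuples are multiplied and alternative derivations are summed, which is the intuition behind provenance polynomials. All relation contents and annotation names are hypothetical.

```python
# Minimal sketch of provenance-annotated tuples: joins multiply annotations,
# alternative derivations add them, yielding provenance expressions that record
# which source tuples contributed to each result.

# Each relation maps a tuple to its provenance expression (a string for brevity).
R = {("alice", "p1"): "r1", ("bob", "p2"): "r2"}
S = {("p1", "eu"): "s1", ("p2", "us"): "s2", ("p1", "us"): "s3"}

def join(r, s):
    """Natural join on the shared middle attribute; provenance is the product."""
    out = {}
    for (a, b), pr in r.items():
        for (b2, c), ps in s.items():
            if b == b2:
                key = (a, b, c)
                term = f"{pr}*{ps}"
                # A union of alternative derivations is a sum of terms.
                out[key] = f"{out[key]} + {term}" if key in out else term
    return out

if __name__ == "__main__":
    for tup, prov in join(R, S).items():
        print(tup, "<-", prov)
    # ('alice', 'p1', 'eu') <- r1*s1
    # ('alice', 'p1', 'us') <- r1*s3
    # ('bob', 'p2', 'us') <- r2*s2
```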

    Self-adaptation via concurrent multi-action evaluation for unknown context

    Context-aware computing has been attracting growing attention in recent years. Generally, there are several ways for a context-aware system to select a course of action for a particular change of context. One way is for the system developers to encompass all possible context changes in the domain knowledge. Other methods include system inference and adaptive learning, whereby the system executes one action, evaluates the outcome, and self-adapts or self-learns based on that. However, in situations where a system encounters unknown contexts, this iterative approach becomes infeasible as the size of the action space increases. Providing efficient solutions to this problem has been the main goal of this research project. Based on the developed abstract model, the designed methodology replaces the implementation and evaluation of a single action with multiple actions implemented and evaluated concurrently. Compared to the iterative approach, this parallel evaluation significantly reduces the time taken to select the action best suited to an unknown context. The designed and implemented framework efficiently carries out concurrent multi-action evaluation when an unknown context is encountered and finds the best course of action. Two concrete implementations of the framework were carried out, demonstrating its usability and adaptability across multiple domains. The first implementation was in the domain of database performance tuning and demonstrated the ability of the concurrent multi-action evaluation technique to tune a database when performance regresses for an unknown reason. The second implementation demonstrated the ability of the framework to correctly determine the threshold price to be used in a name-your-own-price channel when an unknown context is encountered. In conclusion, the research introduces a new self-adaptation paradigm for context-aware applications. Within the existing body of work, concurrent multi-action evaluation is classified under the abstract concept of experiment-based self-adaptation techniques.
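    As a hypothetical illustration of the core idea only (not the framework described above), the sketch below evaluates several candidate actions concurrently in stubbed sandboxes and adopts the one with the best measured outcome; the action names and scores are invented.

```python
# Sketch of concurrent multi-action evaluation: candidate actions are tried in
# parallel (e.g., on sandboxed copies of the system) and the one with the best
# measured outcome is adopted, rather than iterating one action at a time.

from concurrent.futures import ThreadPoolExecutor

def evaluate(action):
    """Apply an action in a sandbox and return a fitness score (stubbed here)."""
    # A real system would, e.g., apply a tuning change and measure throughput;
    # this stub returns a synthetic score per hypothetical action.
    return {"add_index": 0.8, "raise_cache": 0.6, "rewrite_query": 0.9}.get(action, 0.0)

def best_action(candidates):
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        scores = dict(zip(candidates, pool.map(evaluate, candidates)))
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    action, scores = best_action(["add_index", "raise_cache", "rewrite_query"])
    print(action, scores)  # the highest-scoring action is selected
```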

    Collaborative analysis of large sets of time series

    The recent expansion of day-to-day metrification has led to the production of massive quantities of data, and in many cases these collected metrics are only useful for knowledge building when seen as a full sequence of data ordered by time, which constitutes a time series. To find and interpret meaningful behavioral patterns in time series, a multitude of analysis software tools have been developed. Many of the existing solutions use annotations to enable the curation of a knowledge base that is shared among a group of researchers over a network. However, these tools lack appropriate mechanisms to handle a high number of concurrent requests and to properly store massive datasets and ontologies, as well as suitable representations for annotated data that are visually interpretable by humans and explorable by automated systems. The goal of the work presented in this dissertation is to iterate on existing time series analysis software and build a platform for the collaborative analysis of massive time series datasets, leveraging state-of-the-art technologies for querying, storing and displaying time series and annotations. A theoretical, domain-agnostic model is proposed to enable the implementation of a distributed, extensible, secure and high-performance architecture that handles multiple annotation proposals simultaneously and avoids any data loss from overlapping contributions or unsanctioned changes. Analysts can share annotation projects with peers, restricting a set of collaborators to a smaller scope of analysis and to a limited catalog of annotation semantics. Annotations can express meaning not only over a segment of time, but also over a subset of the series that coexist in the same segment. A novel visual encoding for annotations is proposed, where annotations are rendered as arcs traced only over the affected series’ curves in order to reduce visual clutter. Moreover, the implementation of a full-stack prototype with a reactive web interface is described, directly following the proposed architectural and visualization model, applied to the HVAC domain. The performance of the prototype under different architectural approaches was benchmarked, and the interface was tested for usability. Overall, the work described in this dissertation contributes a more versatile, intuitive and scalable time series annotation platform that streamlines the knowledge-discovery workflow.
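    As a hedged illustration of the annotation concept described above (an annotation scoped to a time segment and to a subset of the series that coexist in it), the sketch below defines a minimal record type; the field names and HVAC series identifiers are hypothetical, not the platform's actual schema.

```python
# Hypothetical sketch of an annotation that covers a time segment and only a
# subset of the series visible in that segment.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Annotation:
    start: datetime                 # beginning of the annotated segment
    end: datetime                   # end of the annotated segment
    series_ids: frozenset[str]      # only these series are covered by the annotation
    label: str                      # semantic label drawn from the project's catalog
    author: str
    comment: str = ""

ann = Annotation(
    start=datetime(2023, 1, 10, 8, 0),
    end=datetime(2023, 1, 10, 9, 30),
    series_ids=frozenset({"ahu1_supply_temp", "ahu1_fan_speed"}),
    label="defrost_cycle",
    author="analyst_a",
)
print(ann.label, sorted(ann.series_ids))
```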

    Efficient Maximum A-Posteriori Inference in Markov Logic and Application in Description Logics

    The maximum a-posteriori (MAP) query in statistical relational models computes the most probable world given evidence and further knowledge about the domain. It is arguably one of the most important types of computational problems, since it is also used as a subroutine in weight learning algorithms. In this thesis, we discuss an improved inference algorithm and an application for MAP queries. We focus on Markov logic (ML) as the statistical relational formalism. Markov logic combines Markov networks with first-order logic by attaching weights to first-order formulas. For inference, we improve on existing work that translates MAP queries to integer linear programs (ILP). The motivation is that existing ILP solvers are very stable and fast and are able to precisely estimate the quality of an intermediate solution. In our work, we focus on improving the translation process so that the resulting ILPs have fewer variables and fewer constraints. Our main contribution is the Cutting Plane Aggregation (CPA) approach, which leverages symmetries in ML networks and parallelizes MAP inference. Additionally, we integrate the cutting plane inference algorithm (Riedel 2008), which significantly reduces the number of groundings by solving multiple smaller ILPs instead of one large ILP. We present the new Markov logic engine RockIt, which outperforms state-of-the-art engines on standard Markov logic benchmarks. Afterwards, we apply the MAP query to description logics. Description logics (DL) are knowledge representation formalisms whose expressivity is higher than propositional logic but lower than first-order logic. The most popular DLs have been standardized in the ontology language OWL and are an elementary component of the Semantic Web. We combine Markov logic, which essentially follows the semantics of a log-linear model, with description logics to obtain log-linear description logics, in which weights can be attached to any description logic axiom. Furthermore, we introduce a new query type which computes the most probable 'coherent' world. Possible applications of log-linear description logics lie mainly in the areas of ontology learning and data integration. With our novel log-linear description logic reasoner ELog, we experimentally show that more expressivity increases quality and that the solutions of optimal solving strategies have higher quality than the solutions of approximate solving strategies.
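    To make the MAP objective concrete, the sketch below brute-forces the truth assignment that maximizes the total weight of satisfied ground clauses for a toy network; this only illustrates the semantics of the query, whereas RockIt instead translates the problem to an ILP and applies cutting plane inference and aggregation. The predicates and weights are invented.

```python
# Brute-force illustration of the MAP semantics over ground weighted clauses:
# find the assignment that maximizes the total weight of satisfied formulas.

from itertools import product

atoms = ["Smokes(A)", "Smokes(B)", "Cancer(A)"]
# (weight, clause) where a clause is a list of (atom, is_positive) literals.
formulas = [
    (1.5, [("Smokes(A)", False), ("Cancer(A)", True)]),  # Smokes(A) => Cancer(A)
    (1.1, [("Smokes(A)", False), ("Smokes(B)", True)]),   # Smokes(A) => Smokes(B)
    (2.0, [("Smokes(A)", True)]),                         # soft unit clause
]

def score(world):
    """Total weight of ground clauses satisfied by this truth assignment."""
    total = 0.0
    for w, clause in formulas:
        if any(world[a] == pos for a, pos in clause):
            total += w
    return total

best = max(
    (dict(zip(atoms, values)) for values in product([True, False], repeat=len(atoms))),
    key=score,
)
print(best, score(best))  # the MAP world and its objective value
```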

    Designing a labeling application for image object detection

    We seek to build a large collection of images with ground truth labels to be used for training object detection and recognition algorithms. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a user interface tool that allows easy image annotation. The tool provides functionalities such as drawing boxes, querying images, and browsing the database. Using this annotation tool, we can collect a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of an existing dataset and compare it against other state-of-the-art datasets used for object recognition and detection. We also show how to extend our dataset to automatically enhance object labels with WordNet, discover object parts, and increase the number of labels using minimal user supervision.
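    As a hypothetical illustration of the kind of record such a labeling tool might store for each drawn box (not the tool's actual format), the sketch below defines a bounding-box label and a simple category query.

```python
# Hypothetical bounding-box label record: image, object category, box geometry.

from dataclasses import dataclass

@dataclass
class BoxLabel:
    image: str      # image file name in the collection
    category: str   # object category, e.g. a WordNet-aligned noun
    x: int          # left edge of the box, in pixels
    y: int          # top edge of the box, in pixels
    width: int
    height: int
    annotator: str = "anonymous"

labels = [
    BoxLabel("street_01.jpg", "car", 34, 120, 180, 90),
    BoxLabel("street_01.jpg", "person", 260, 95, 40, 110),
]
# Simple query: all boxes of a given category in the collection.
cars = [b for b in labels if b.category == "car"]
print(len(cars), "car boxes")
```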

    Social Learning Systems: The Design of Evolutionary, Highly Scalable, Socially Curated Knowledge Systems

    In recent times, great strides have been made towards the advancement of automated reasoning and knowledge management applications, along with their associated methodologies. The introduction of the World Wide Web piqued academics' interest in harnessing the power of linked, online documents for the purpose of developing machine learning corpora, providing dynamic knowledge bases for question answering systems, fueling automated entity extraction applications, and performing graph analytic evaluations, such as uncovering the inherent structural semantics of linked pages. Even more recently, substantial attention in the wider computer science and information systems disciplines has been focused on the evolving study of social computing phenomena, primarily those associated with the use, development, and analysis of online social networks (OSNs). This work followed an independent effort to develop an evolutionary knowledge management system, and outlines a model for integrating the wisdom of the crowd into the process of collecting, analyzing, and curating data for dynamic knowledge systems. Throughout, we examine how relational data modeling, automated reasoning, crowdsourcing, and social curation techniques have been exploited to extend the utility of web-based, transactional knowledge management systems, creating a new breed of knowledge-based system in the process: the Social Learning System (SLS). The key questions this work explores by way of elucidating the SLS model include 1) how Web and OSN mining techniques can be unified to conform to a versatile, structured, and computationally efficient ontological framework, and 2) how large-scale knowledge projects may incorporate tiered collaborative editing systems in an effort to elicit knowledge contributions and curation activities from a diverse, participatory audience.

    Handling imperfect information in criterion evaluation, aggregation and indexing


    Enhanced Living Environments

    This open access book was prepared as the Final Publication of the COST Action IC1303 “Algorithms, Architectures and Platforms for Enhanced Living Environments (AAPELE)”. The concept of Enhanced Living Environments (ELE) refers to the area of Ambient Assisted Living (AAL) that is most closely related to Information and Communication Technologies (ICT). Effective ELE solutions require appropriate ICT algorithms, architectures, platforms, and systems, in view of the advances in science and technology in this area and the development of new, innovative solutions that can improve the quality of life of people in their homes and reduce the financial burden on healthcare providers' budgets. The aim of this book is to serve as a state-of-the-art reference, discussing progress made as well as prompting future directions on theories, practices, standards, and strategies related to the ELE area. The book contains 12 chapters and can serve as a valuable reference for undergraduate students, post-graduate students, educators, faculty members, researchers, engineers, medical doctors, healthcare organizations, insurance companies, and research strategists working in this area.

    Design and development of a comprehensive data management platform for cytomics: cytomicsDB

    In the cytomics environment, scientists continuously have to deal with large volumes of structured and unstructured data, which in particular makes interoperability a challenge for any platform developed for cytomics. The CytomicsDB approach is an effort to develop a framework that takes care of standardizing the unstructured data, providing a common data model layer for HTS experiments. This model is also suitable for integration with other systems in cytomics, in particular other repositories, which allow the validation of key metadata used in the experiments and thus ensure the reliability of the stored data. Other possible solutions for cytomics data management should take special care in the use of data model standards to enhance collaboration and data sharing in the scientific community.

    Dynamic and secure remote access to databases with integration of soft access policies

    The amount of data being created and shared has grown greatly in recent years, thanks in part to social media and the growth of smart devices. Managing the storage and processing of this data can give a competitive edge when it is used to create new services, to enhance targeted advertising, etc. To achieve this, the data must be accessed and processed. When applications that access this data are developed, tools such as Java Database Connectivity, ADO.NET and Hibernate are typically used. However, while these tools aim to bridge the gap between databases and the object-oriented programming paradigm, they focus only on the connectivity issue. This leads to increased development time, as developers need to master the access policies to write correct queries. Moreover, when database applications are used within non-controlled environments, other issues emerge, such as database credential theft; application authentication; authorization and auditing of large groups of new users seeking access to data, potentially with vague requirements; network eavesdropping for data and credential disclosure; impersonation of database servers for data modification; application tampering for unrestricted database access and data disclosure; etc. Therefore, an architecture capable of addressing these issues is necessary to build a reliable set of access control solutions that expand and simplify the application scenarios of access control systems. The objective, then, is to secure remote access to databases, since database applications may be used in hard-to-control environments and physical access to the host machines and network may not always be protected. Furthermore, the authorization process should dynamically grant appropriate permissions to users who have not been explicitly authorized, in order to handle large groups seeking access to data. This includes scenarios where the definition of the access requirements is difficult due to their vagueness, usually requiring a security expert to authorize each user individually. This is achieved by integrating and auditing soft access policies, based on fuzzy set theory, in the access control decision-making process. A proof-of-concept of this architecture is provided alongside a functional and performance assessment.
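    As a hedged illustration of a soft access policy based on fuzzy set theory (not the dissertation's actual policy language), the sketch below maps user attributes to membership degrees, aggregates them with a fuzzy AND (minimum), and grants access when the degree clears a threshold, logging the decision for audit; all attribute names, membership functions and thresholds are invented.

```python
# Hypothetical soft access policy: user attributes map to fuzzy membership
# degrees, which are aggregated (min = common fuzzy AND) and compared against a
# threshold; each decision is printed as a simple audit record.

def seniority_degree(years: float) -> float:
    """Membership in the fuzzy set 'experienced' (0 at 0 years, 1 at 5+ years)."""
    return max(0.0, min(1.0, years / 5.0))

def affiliation_degree(department: str) -> float:
    """Membership in 'relevant department' (hypothetical lookup table)."""
    return {"research": 1.0, "engineering": 0.7, "sales": 0.2}.get(department, 0.0)

def access_decision(years: float, department: str, threshold: float = 0.5) -> bool:
    degree = min(seniority_degree(years), affiliation_degree(department))  # fuzzy AND
    granted = degree >= threshold
    print(f"audit: degree={degree:.2f} threshold={threshold} granted={granted}")
    return granted

access_decision(4.0, "engineering")   # degree 0.70 -> granted
access_decision(1.0, "sales")         # degree 0.20 -> denied
```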