Provenance in Collaborative Data Sharing
This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support update exchange --- publishing a participant's updates and then translating other participants' updates to the local schema and importing them --- while tolerating disagreement between participants and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in its propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange to answer a variety of user queries about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases.
To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the Orchestra prototype system. We define ProQL, a query language for provenance graphs that CDSS users can use to combine data querying with provenance testing, as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS together with indexing techniques to speed up provenance querying, and experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.
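The idea of annotating each tuple with the sources and mappings that produced it can be illustrated with a small, self-contained sketch of semiring-style provenance tracking under union and join. This is a simplified stand-in for the dissertation's provenance graphs, not its actual model; all relation names and tuple ids are invented.

```python
from itertools import product

# Annotated relation: dict mapping tuple -> provenance, where a provenance is a
# frozenset of monomials and each monomial is a frozenset of base-tuple ids.
# Union of relations adds provenance (union of monomial sets); join multiplies
# it (pairwise union of monomials).

def p_add(p1, p2):
    return p1 | p2

def p_mul(p1, p2):
    return frozenset(m1 | m2 for m1 in p1 for m2 in p2)

def union(r, s):
    out = dict(r)
    for t, p in s.items():
        out[t] = p_add(out[t], p) if t in out else p
    return out

def join(r, s):
    """Join on the last attribute of r and the first attribute of s."""
    out = {}
    for (t1, p1), (t2, p2) in product(r.items(), s.items()):
        if t1[-1] == t2[0]:
            t = t1 + t2[1:]
            p = p_mul(p1, p2)
            out[t] = p_add(out[t], p) if t in out else p
    return out

# Two source relations whose tuples carry their own ids as base provenance.
R = {("a", "b"): frozenset({frozenset({"r1"})}),
     ("a", "c"): frozenset({frozenset({"r2"})})}
S = {("b", "d"): frozenset({frozenset({"s1"})}),
     ("c", "d"): frozenset({frozenset({"s2"})})}

Q = join(R, S)  # each result tuple records which source tuples derived it
```

A query answer such as `("a", "b", "d")` thus carries the monomial `{r1, s1}`, which is exactly the information a trust policy or ranking application would inspect.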
Self-adaptation via concurrent multi-action evaluation for unknown context
Context-aware computing has been attracting growing attention in recent years. Generally, there are several ways for a context-aware system to select a course of action for a particular change of context. One way is for the system developers to encompass all possible context changes in the domain knowledge. Other methods include system inference and adaptive learning, whereby the system executes one action, evaluates the outcome and self-adapts/self-learns based on that. However, in situations where a system encounters unknown contexts, this iterative approach becomes infeasible as the size of the action space increases. Providing efficient solutions to this problem has been the main goal of this research project.
Based on the developed abstract model, the designed methodology replaces single-action implementation and evaluation with multiple actions implemented and evaluated concurrently. Compared to the iterative approach, this parallel evaluation of actions significantly reduces the time taken to select the action best suited to an unknown context.
The designed and implemented framework efficiently carries out concurrent multi-action evaluation when an unknown context is encountered and finds the best course of action. Two concrete implementations of the framework were carried out demonstrating the usability and adaptability of the framework across multiple domains.
The first implementation was in the domain of database performance tuning. The concrete implementation of the framework demonstrated the ability of concurrent multi-action evaluation technique to performance tune a database when performance is regressed for an unknown reason.
The second implementation demonstrated the ability of the framework to correctly determine the threshold price to be used in a name-your-own-price channel when an unknown context is encountered.
In conclusion, the research introduced a new self-adaptation paradigm for context-aware applications. Within the existing body of work, concurrent multi-action evaluation falls under the abstract concept of experiment-based self-adaptation techniques.
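The core loop described above, evaluating all candidate actions concurrently and keeping the best scorer rather than trying one action per iteration, can be sketched as follows. The actions, context fields and scores are invented for illustration and do not reflect the thesis's actual framework.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_concurrently(actions, context):
    """Evaluate every candidate action against the context in parallel;
    return the (name, score) pair of the best-scoring action."""
    with ThreadPoolExecutor(max_workers=len(actions)) as pool:
        scores = dict(zip(actions, pool.map(lambda a: actions[a](context), actions)))
    return max(scores.items(), key=lambda kv: kv[1])

# Toy unknown context and candidate actions (illustrative only): each action
# returns a fitness score for how well it handles the context.
context = {"load": 0.9}
actions = {
    "add_index": lambda c: 0.4 if c["load"] > 0.5 else 0.1,
    "scale_up":  lambda c: 0.8 if c["load"] > 0.5 else 0.2,
    "throttle":  lambda c: 0.3,
}
best, score = evaluate_concurrently(actions, context)
```

With a sequential iterative approach the selection time grows with the number of candidates; here all evaluations run in one parallel round, which is the speed-up the thesis exploits.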
Collaborative analysis of large time series data sets
The recent expansion of metrification on a daily basis has led to the production
of massive quantities of data, and in many cases, these collected metrics
are only useful for knowledge building when seen as a full sequence of
data ordered by time, which constitutes a time series. To find and interpret
meaningful behavioral patterns in time series, a multitude of analysis software
tools have been developed. Many of the existing solutions use annotations
to enable the curation of a knowledge base that is shared between a group
of researchers over a network. However, these tools also lack appropriate
mechanisms to handle a high number of concurrent requests and to properly
store massive data sets and ontologies, as well as suitable representations
for annotated data that are visually interpretable by humans and explorable by
automated systems. The goal of the work presented in this dissertation is to
iterate on existing time series analysis software and build a platform for the
collaborative analysis of massive time series data sets, leveraging state-of-the-art technologies for querying, storing and displaying time series and annotations.
A theoretical and domain-agnostic model was proposed to enable the implementation of a distributed, extensible, secure and high-performance architecture that handles multiple annotation proposals simultaneously and avoids data loss from overlapping contributions or unsanctioned changes.
Analysts can share annotation projects with peers, restricting a set of collaborators
to a smaller scope of analysis and to a limited catalog of annotation
semantics. Annotations can express meaning not only over a segment of time,
but also over a subset of the series that coexist in the same segment. A novel
visual encoding for annotations is proposed, where annotations are rendered
as arcs traced only over the affected series’ curves in order to reduce visual
clutter. Moreover, a full-stack prototype with a reactive web interface is described, directly following the proposed architectural and visualization model and applied to the HVAC domain. The performance of the prototype under different architectural approaches was benchmarked, and the interface was tested for usability. Overall, the work described in this dissertation contributes a more versatile, intuitive and scalable time series
annotation platform that streamlines the knowledge-discovery workflow.
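The annotation model described above, where an annotation covers a time segment but only a subset of the series that coexist in that segment, can be captured by a minimal record type. The field names and the HVAC series ids are illustrative assumptions, not the dissertation's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int          # segment start (e.g. epoch seconds)
    end: int            # segment end
    series: frozenset   # ids of the affected series, not all series in view
    label: str          # semantic type from the project's annotation catalog
    author: str

def annotations_for(series_id, t, annotations):
    """Annotations that affect a given series at time t."""
    return [a for a in annotations
            if series_id in a.series and a.start <= t <= a.end]

anns = [
    Annotation(0, 100, frozenset({"hvac-1", "hvac-2"}), "defrost-cycle", "alice"),
    Annotation(50, 200, frozenset({"hvac-3"}), "sensor-fault", "bob"),
]
hits = annotations_for("hvac-1", 60, anns)  # only the first annotation applies
```

Because each annotation names its affected series explicitly, a renderer can trace its arc only over those series' curves, which is what keeps the proposed visual encoding uncluttered.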
Efficient Maximum A-Posteriori Inference in Markov Logic and Application in Description Logics
Maximum a-posteriori (MAP) query in statistical relational models computes the most probable world given evidence and further knowledge about the domain. It is arguably one of the most important types of computational problems, since it is also used as a subroutine in weight learning algorithms. In this thesis, we discuss an improved inference algorithm and an application for MAP queries. We focus on Markov logic (ML) as statistical relational formalism. Markov logic combines Markov networks with first-order logic by attaching weights to first-order formulas.
For inference, we improve existing work that translates MAP queries to integer linear programs (ILPs). The motivation is that existing ILP solvers are very stable and fast and can precisely estimate the quality of an intermediate solution. In our work, we focus on improving the translation process so that the resulting ILPs have fewer variables and fewer constraints. Our main contribution is the Cutting Plane Aggregation (CPA) approach, which leverages symmetries in ML networks and parallelizes MAP inference. Additionally, we integrate the cutting plane inference algorithm (Riedel 2008), which significantly reduces the number of groundings by solving multiple smaller ILPs instead of one large ILP. We present the new Markov logic engine RockIt, which outperforms state-of-the-art engines on standard Markov logic benchmarks.
Afterwards, we apply the MAP query to description logics. Description logics (DLs) are knowledge representation formalisms whose expressivity is higher than propositional logic but lower than first-order logic. The most popular DLs have been standardized in the ontology language OWL and are an elementary component of the Semantic Web. We combine Markov logic, which essentially follows the semantics of a log-linear model, with description logics into log-linear description logics, in which weights can be attached to any description logic axiom. Furthermore, we introduce a new query type which computes the most probable 'coherent' world. Possible applications of log-linear description logics lie mainly in the areas of ontology learning and data integration. With our novel log-linear description logic reasoner ELog, we experimentally show that more expressivity increases quality and that the solutions of optimal solving strategies have higher quality than the solutions of approximate solving strategies.
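The MAP problem being translated can be made concrete on a tiny ground Markov logic network: each weighted ground clause contributes its weight when satisfied, and MAP is the truth assignment maximizing the total weight. The sketch below solves this by exhaustive enumeration, which is only the problem statement that RockIt attacks at scale via ILP translation, not RockIt's algorithm; the atoms, clauses and weights are invented.

```python
from itertools import product

atoms = ["Smokes(A)", "Smokes(B)", "Cancer(A)"]

# (weight, clause): a clause is a list of (atom, positive?) literals and is
# satisfied when at least one literal holds in the world.
clauses = [
    (1.5, [("Smokes(A)", False), ("Cancer(A)", True)]),  # Smokes(A) => Cancer(A)
    (1.1, [("Smokes(A)", False), ("Smokes(B)", True)]),  # Smokes(A) => Smokes(B)
    (2.0, [("Smokes(A)", True)]),                        # soft evidence
]

def weight(world):
    """Sum of weights of the ground clauses satisfied in this world."""
    return sum(w for w, clause in clauses
               if any(world[a] == pos for a, pos in clause))

def map_state():
    """Enumerate all 2^n truth assignments and return the heaviest one."""
    return max((dict(zip(atoms, vals))
                for vals in product([False, True], repeat=len(atoms))),
               key=weight)

state = map_state()  # here: every atom true, total weight 4.6
```

An ILP translation replaces this exponential enumeration with one binary variable per ground atom and linear constraints per clause, which is why fewer groundings (the point of CPA and cutting plane inference) directly shrink the solver's input.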
Designing a labeling application for image object detection
We seek to build a large collection of images with ground truth labels to be used for training object detection and recognition algorithms. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a user interface tool that allows easy image annotation.
The tool provides functionalities such as drawing boxes, querying images, and browsing the database.
Using this annotation tool, we can collect a large dataset that spans many object categories, often containing multiple instances over a wide variety of images.
We quantify the contents of an existing dataset and compare it against other state-of-the-art datasets used for object recognition and detection.
Also, we show how to extend our dataset to automatically enhance object labels with WordNet, discover object parts, and increase the number of labels using minimal user supervision.
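The kind of ground-truth record such a tool collects, a labeled bounding box tied to an image, together with the querying and browsing functionality mentioned above, can be sketched as follows. The record fields and sample data are illustrative, not the tool's actual format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoxLabel:
    image: str      # image file the box belongs to
    category: str   # object label, e.g. "car"
    x: int          # top-left corner and box size, in pixels
    y: int
    w: int
    h: int

def query(labels, category):
    """Browse the collection: all images containing an instance of category."""
    return sorted({lbl.image for lbl in labels if lbl.category == category})

labels = [
    BoxLabel("img001.jpg", "car", 10, 20, 100, 60),
    BoxLabel("img001.jpg", "person", 120, 30, 40, 90),
    BoxLabel("img002.jpg", "car", 5, 5, 80, 50),
]
cars = query(labels, "car")  # both images contain at least one car
```

Storing one record per instance, rather than one label per image, is what lets the dataset capture multiple instances per image across many categories.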
Social Learning Systems: The Design of Evolutionary, Highly Scalable, Socially Curated Knowledge Systems
In recent times, great strides have been made towards the advancement of automated reasoning and knowledge management applications, along with their associated methodologies. The introduction of the World Wide Web piqued academicians' interest in harnessing the power of linked, online documents for the purpose of developing machine learning corpora, providing dynamical knowledge bases for question answering systems, fueling automated entity extraction applications, and performing graph analytic evaluations, such as uncovering the inherent structural semantics of linked pages. Even more recently, substantial attention in the wider computer science and information systems disciplines has been focused on the evolving study of social computing phenomena, primarily those associated with the use, development, and analysis of online social networks (OSNs).
This work followed an independent effort to develop an evolutionary knowledge management system, and outlines a model for integrating the wisdom of the crowd into the process of collecting, analyzing, and curating data for dynamical knowledge systems. Throughout, we examine how relational data modeling, automated reasoning, crowdsourcing, and social curation techniques have been exploited to extend the utility of web-based, transactional knowledge management systems, creating a new breed of knowledge-based system in the process: the Social Learning System (SLS).
The key questions this work has explored by way of elucidating the SLS model include 1) how Web and OSN mining techniques can be unified to conform to a versatile, structured, and computationally efficient ontological framework, and 2) how large-scale knowledge projects may incorporate tiered collaborative editing systems in an effort to elicit knowledge contributions and curation activities from a diverse, participatory audience.
Enhanced Living Environments
This open access book was prepared as a final publication of the COST Action IC1303 “Algorithms, Architectures and Platforms for Enhanced Living Environments (AAPELE)”. The concept of Enhanced Living Environments (ELE) refers to the area of Ambient Assisted Living (AAL) that is most related to Information and Communication Technologies (ICT). Effective ELE solutions require appropriate ICT algorithms, architectures, platforms, and systems, with a view to advancing science and technology in this area and developing new and innovative solutions that can improve the quality of life for people in their homes and reduce the financial burden on the budgets of healthcare providers. The aim of this book is to become a state-of-the-art reference, discussing progress made, as well as prompting future directions on theories, practices, standards, and strategies related to the ELE area. The book contains 12 chapters and can serve as a valuable reference for undergraduate students, post-graduate students, educators, faculty members, researchers, engineers, medical doctors, healthcare organizations, insurance companies, and research strategists working in this area.
Design and development of a comprehensive data management platform for cytomics: cytomicsDB
In a cytomics environment, scientists continuously deal with large volumes of structured and unstructured data, a condition that makes interoperability a particular challenge for any platform developed for cytomics. The CytomicsDB approach is an effort to develop a framework that takes care of standardizing the unstructured data, providing a common data model layer for HTS experiments. This model is also suitable for integration with other systems in cytomics, especially other repositories, which allows the validation of key metadata used in the experiments and thus ensures the reliability of the stored data. Other solutions for cytomics data management should take special care in the use of data model standards to enhance collaboration and data sharing in the scientific community.
Dynamic and secure remote access to databases with integration of soft access policies
The amount of data being created and shared has grown greatly in recent
years, thanks in part to social media and the growth of smart devices.
Managing the storage and processing of this data can give a competitive edge
when used to create new services, to enhance targeted advertising, etc. To
achieve this, the data must be accessed and processed. When applications
that access this data are developed, tools such as Java Database Connectivity,
ADO.NET and Hibernate are typically used. However, while these tools aim to
bridge the gap between databases and the object-oriented programming
paradigm, they focus only on the connectivity issue. This leads to increased
development time as developers need to master the access policies to write
correct queries. Moreover, when used in database applications within non-controlled environments, other issues emerge, such as theft of database credentials; application authentication; authorization and auditing of large groups of new users seeking access to data, potentially with vague requirements; network eavesdropping for data and credential disclosure; impersonation of database servers for data modification; application tampering for unrestricted database access and data disclosure; etc.
Therefore, an architecture capable of addressing these issues is necessary to
build a reliable set of access control solutions to expand and simplify the
application scenarios of access control systems. The objective, then, is to
secure the remote access to databases, since database applications may be
used in hard-to-control environments and physical access to the host
machines/network may not always be protected. Furthermore, the authorization process should dynamically grant appropriate permissions to users who have not been explicitly authorized, in order to handle large groups seeking access to data. This includes scenarios where the definition of the access requirements is
difficult due to their vagueness, usually requiring a security expert to authorize
each user individually. This is achieved by integrating and auditing soft access
policies based on fuzzy set theory in the access control decision-making
process. A proof-of-concept of this architecture is provided alongside a
functional and performance assessment.
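The soft access policies based on fuzzy set theory mentioned above replace a crisp allow/deny rule with membership degrees that are combined and compared against a threshold. The sketch below illustrates the general technique; the membership functions, attributes and threshold are invented assumptions, not the thesis's actual policies.

```python
def trapezoid(x, a, b, c, d):
    """Standard trapezoidal membership function: 0 outside [a, d],
    1 on the plateau [b, c], linear on the slopes."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def access_degree(seniority_years, request_hour):
    # Membership in the fuzzy set of "experienced users".
    experienced = trapezoid(seniority_years, 0, 3, 40, 45)
    # Membership in the fuzzy set of "requests during working hours".
    office_hours = trapezoid(request_hour, 7, 9, 17, 19)
    # Fuzzy AND via the minimum t-norm.
    return min(experienced, office_hours)

def decide(seniority_years, request_hour, threshold=0.5):
    """Grant access when the combined membership degree meets the threshold."""
    return access_degree(seniority_years, request_hour) >= threshold

granted = decide(5, 12)   # experienced user at mid-day
denied = decide(1, 23)    # junior user late at night
```

Because the decision is a degree rather than a hard rule, users who were never explicitly authorized can still be granted access when their attributes place them sufficiently inside the fuzzy sets, which is how such policies avoid a security expert authorizing each user individually; auditing then records the computed degrees alongside the decision.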