208 research outputs found
Semantic Data Management in Data Lakes
In recent years, data lakes have emerged as a way to manage large amounts of
heterogeneous data for modern data analytics. One way to prevent data lakes
from turning into inoperable data swamps is semantic data management. Some
approaches propose the linkage of metadata to knowledge graphs based on the
Linked Data principles to provide more meaning and semantics to the data in the
lake. Such a semantic layer may be utilized not only for data management but
also to tackle the problem of data integration from heterogeneous sources, in
order to make data access more expressive and interoperable. In this survey, we
review recent approaches with a specific focus on the application within data
lake systems and scalability to Big Data. We classify the approaches into (i)
basic semantic data management, (ii) semantic modeling approaches for enriching
metadata in data lakes, and (iii) methods for ontology-based data access. In
each category, we cover the main techniques and their background, and compare
the latest research. Finally, we point out challenges for future work in this
research area, which needs a closer integration of Big Data and Semantic Web
technologies.
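The idea of linking lake metadata to a knowledge graph can be illustrated with a minimal sketch. It assumes metadata is held as plain subject-predicate-object triples in memory; every identifier below (the `lake:` and `ex:` names, `match`, `datasets_about`) is hypothetical and not taken from any of the surveyed systems.

```python
from typing import Optional

# A triple is (subject, predicate, object). All names are illustrative only.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("lake:sales_2021.csv", "rdf:type", "dcat:Dataset"),
    ("lake:sales_2021.csv#col_cust", "ex:columnOf", "lake:sales_2021.csv"),
    ("lake:sales_2021.csv#col_cust", "ex:representsConcept", "ex:Customer"),
    ("lake:crm_dump.json", "rdf:type", "dcat:Dataset"),
    ("lake:crm_dump.json#field_client", "ex:columnOf", "lake:crm_dump.json"),
    ("lake:crm_dump.json#field_client", "ex:representsConcept", "ex:Customer"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def datasets_about(concept: str) -> list[str]:
    """Find datasets that have at least one column annotated with `concept`."""
    columns = [s for s, _, _ in match(p="ex:representsConcept", o=concept)]
    datasets = {o for col in columns for _, _, o in match(s=col, p="ex:columnOf")}
    return sorted(datasets)


if __name__ == "__main__":
    print(datasets_about("ex:Customer"))
```

Because both files annotate a column with the same concept, a single query finds them even though their formats and column names differ, which is the integration benefit the abstract describes.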
Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference
No abstract available
An Interoperability Platform Enabling Reuse of Electronic Health Records for Signal Verification Studies
Depending mostly on voluntarily sent spontaneous reports, pharmacovigilance studies are hampered by the low quantity and quality of patient data. Our objective is to improve postmarket safety studies by enabling safety analysts to seamlessly access a wide range of EHR sources for collecting de-identified medical data sets of selected patient populations and tracing the reported incidents back to the original EHRs. We have developed an ontological framework in which EHR sources and target clinical research systems can continue using their own local data models, interfaces, and terminology systems, while structural and semantic interoperability are handled through rule-based reasoning on formal representations of the different models and terminology systems maintained in the SALUS Semantic Resource Set. The SALUS Common Information Model, at the core of this set, acts as the common mediator. We demonstrate the capabilities of our framework through one of the SALUS safety analysis tools, namely the Case Series Characterization Tool, which has been deployed on top of the regional EHR Data Warehouse of the Lombardy Region, containing about 1 billion records from 16 million patients, and validated by several pharmacovigilance researchers with real-life cases. The results confirm significant improvements in signal detection and evaluation compared to traditional methods, which lack this background information.
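The mediator pattern described here, where each source keeps its local model and a common model sits in the middle, can be sketched roughly as follows. This is not the SALUS implementation: it swaps rule-based reasoning over formal representations for a plain lookup table, and all source names, field names, codes and concept identifiers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommonCondition:
    """A record in the (hypothetical) common information model."""
    patient: str
    concept: str


# Hypothetical terminology mapping: (source, local code) -> common concept.
CODE_MAP: dict[tuple[str, str], str] = {
    ("siteA", "A-410"): "common:MyocardialInfarction",
    ("siteB", "mi_acute"): "common:MyocardialInfarction",
    ("siteB", "rash"): "common:SkinRash",
}


def to_common(source: str, record: dict[str, str]) -> Optional[CommonCondition]:
    """Translate one local record into the common model, or None if unmapped."""
    # Structural mapping: each source uses its own field names.
    if source == "siteA":
        patient, code = record["pid"], record["dx_code"]
    elif source == "siteB":
        patient, code = record["patient_ref"], record["condition"]
    else:
        raise ValueError(f"unknown source: {source}")
    # Terminology mapping: local code -> common concept.
    concept = CODE_MAP.get((source, code))
    return CommonCondition(patient, concept) if concept else None


if __name__ == "__main__":
    print(to_common("siteA", {"pid": "p1", "dx_code": "A-410"}))
    print(to_common("siteB", {"patient_ref": "p2", "condition": "mi_acute"}))
```

Once both records are expressed in the common model, an analysis tool can select a patient population by concept without knowing which source or local code each record came from.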
Sharing Semantic Resources
The Semantic Web is an extension of the current Web in which information, so far created for human consumption, becomes machine readable, “enabling computers and people to work in cooperation”. Several challenges must still be overcome to make this vision a reality; the most important is sharing meaning that is formally represented with ontologies or, more generally, with semantic resources. This long-term goal of the Semantic Web converges in many ways with activities in the field of Human Language Technology, in particular the development of Natural Language Processing applications, where there is a great need for multilingual lexical resources. For instance, one of the most important lexical resources, WordNet, is also commonly regarded and used as an ontology. Another important current phenomenon is the explosion of social collaboration: Wikipedia, the largest encyclopedia in the world, is an object of research as an up-to-date, all-encompassing semantic resource. The main topic of this thesis is the collaborative management and exploitation of semantic resources, making use of already available resources such as Wikipedia and WordNet. This work presents a general environment able to realize the vision of shared and distributed semantic resources, and describes a distributed three-layer architecture that enables rapid prototyping of cooperative applications for developing semantic resources.
SmartORC: smart orchestration of resources in the compute continuum
The promise of the compute continuum is to present applications with a flexible and transparent view of the resources in the Internet of Things–Edge–Cloud ecosystem. However, such a promise requires tackling complex challenges to maximize the benefits of both the cloud and the edge. Challenges include managing a highly distributed platform, matching services and resources, harnessing resource heterogeneity, and adapting the deployment of services to changes in resources and applications. In this study, we present SmartORC, a comprehensive set of components designed to provide a complete framework for managing resources and applications in the compute continuum. Along with a description of all the SmartORC subcomponents, we also provide the results of an evaluation aimed at showcasing the framework's capability.
A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks
The human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will support decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas.
Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework that manages the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extracts value from it and supports the decision-making processes of RITMOs.
This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework.
This work is intended to be a supporting methodological guide, based on widely used
Reference Architectures and guidelines for Big Data, but enriched with inherent characteristics
and concerns brought about by Big Spatiotemporal Data, such as in the case of GRTS and ST
Events. The proposed methodology was evaluated and demonstrated in various real-world
use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and
Visualisation methods, tools and technologies, under the umbrella of several research projects
funded by the European Commission and the Portuguese Government.
Composição de serviços para aplicações biomédicas (Service composition for biomedical applications)
Doctorate in Informatics Engineering. The demand for innovation in the biomedical software domain has driven the
evolution of information technologies over the last decades. The challenges
associated with the effective management, integration, analysis and
interpretation of the wealth of life sciences information stemming from modern
hardware and software technologies require concerted efforts. From gene
sequencing hardware and pharmacology research to electronic health
records, the ability to accurately explore data from these environments is vital
to further improve our understanding of human health. This thesis discusses
how to build better informatics strategies to address these
challenges, primarily in the context of service composition, including
warehousing and federation strategies for resource integration, as well as web
services or LinkedData for software interoperability.
Service composition is introduced as a general principle, geared towards data
integration and software interoperability. Concerning the latter, this research
covers the service composition requirements within the pharmacovigilance
field, namely in the European EU-ADR project. The contributions to this area,
the definition of a new interoperability standard and the creation of a new
workflow-wrapping engine, are behind the successful construction of the EU-ADR
Web Platform, a workspace for delivering advanced pharmacovigilance
studies. In the context of the European GEN2PHEN project, this research
tackles the challenges associated with the integration of heterogeneous and
distributed data in the human variome field. To this end, a new lightweight
solution was created: WAVe (Web Analysis of the Variome), which provides a rich
collection of genetic variation data through an innovative portal and an
advanced API. The development of the strategies underlying these products
highlighted clear opportunities in the biomedical software field: enhancing the
software implementation process with rapid application development
approaches and improving the quality and availability of data with the adoption
of the Semantic Web paradigm.
COEUS crosses the boundaries of integration and interoperability as it provides
a framework for the flexible acquisition and translation of data into a semantic
knowledge base, as well as a comprehensive set of interoperability services,
from REST to LinkedData, to fully exploit gathered data semantically. By
combining the lightness of rapid application development strategies with the
richness of its "Semantic Web in a box" approach, COEUS is a pioneering
framework to enhance the development of the next generation of biomedical
applications.
- …