563 research outputs found
Data integration services for biomedical applications
PhD in Informatics (MAP-i)
In recent decades, the field of biomedical science has fostered
unprecedented scientific advances. Research is stimulated by the
constant evolution of information technology, delivering novel and
diverse bioinformatics tools. Nevertheless, the proliferation of new and
disconnected solutions has resulted in massive amounts of resources
spread over heterogeneous and distributed platforms. Distinct
data types and formats are generated and stored in miscellaneous
repositories, posing data interoperability challenges and delaying
discoveries. Data sharing and integrated access to these resources
are key features for successful knowledge extraction.
In this context, this thesis makes contributions towards accelerating
the semantic integration, linkage and reuse of biomedical resources.
The first contribution addresses the connection of distributed and
heterogeneous registries. The proposed methodology creates a
holistic view over the different registries, supporting semantic
data representation, integrated access and querying. The second
contribution addresses the integration of heterogeneous information
across scientific research, aiming to enable adequate data-sharing
services. The third contribution presents a modular architecture to
support the extraction and integration of textual information, enabling
the full exploitation of curated data. The last contribution lies
in providing a platform to accelerate the deployment of enhanced
semantic information systems. All the proposed solutions were
deployed and validated in the scope of rare diseases.
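The registry-connection approach described above can be sketched in miniature: a mapping layer aligns each registry's local field names to a common schema, yielding one integrated, queryable view. The registries, field names and records below are purely illustrative, not the thesis's actual data model.

```python
# Hypothetical sketch: harmonising two heterogeneous rare-disease
# registries into one integrated view, then querying it.
# Registry names, field names and records are illustrative only.

REGISTRY_A = [  # registry A uses its own local field names
    {"pid": "A-001", "disorder": "Marfan syndrome", "dob_year": 1990},
]
REGISTRY_B = [  # registry B uses a different schema
    {"patient_id": "B-17", "disease_label": "Marfan syndrome", "birth": "1985"},
]

# A mapping layer plays the role of the semantic model: each source
# field is aligned to a common property of the holistic view.
MAPPINGS = {
    "A": {"pid": "id", "disorder": "condition", "dob_year": "birth_year"},
    "B": {"patient_id": "id", "disease_label": "condition", "birth": "birth_year"},
}

def harmonise(source, records):
    """Translate source-specific records into the shared schema."""
    mapping = MAPPINGS[source]
    out = []
    for record in records:
        unified = {mapping[k]: v for k, v in record.items()}
        unified["birth_year"] = int(unified["birth_year"])  # normalise types
        unified["source"] = source
        out.append(unified)
    return out

# The holistic view supports integrated queries across registries.
view = harmonise("A", REGISTRY_A) + harmonise("B", REGISTRY_B)
marfan = [r["id"] for r in view if r["condition"] == "Marfan syndrome"]
print(marfan)  # → ['A-001', 'B-17']
```

The mapping dictionary stands in for what the thesis realises with semantic technologies: once every source is expressed against the common model, a single query reaches all registries.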
Service composition for biomedical applications
PhD in Informatics Engineering
The demand for innovation in the biomedical software domain has been an
information technologies evolution driver over the last decades. The challenges
associated with the effective management, integration, analyses and
interpretation of the wealth of life sciences information stemming from modern
hardware and software technologies require concerted efforts. From gene
sequencing hardware to pharmacology research up to patient electronic health
records, the ability to accurately explore data from these environments is vital
to further improve our understanding of human health. This thesis encompasses the
discussion on building better informatics strategies to address these
challenges, primarily in the context of service composition, including
warehousing and federation strategies for resource integration, as well as web
services or LinkedData for software interoperability.
Service composition is introduced as a general principle, geared towards data
integration and software interoperability. Concerning the latter, this research
covers the service composition requirements within the pharmacovigilance
field, namely in the European EU-ADR project. The contributions to this area,
the definition of a new interoperability standard and the creation of a new
workflow-wrapping engine, are behind the successful construction of the EU-ADR
Web Platform, a workspace for delivering advanced pharmacovigilance
studies. In the context of the European GEN2PHEN project, this research
tackles the challenges associated with the integration of heterogeneous and
distributed data in the human variome field. For this matter, a new lightweight
solution was created: WAVe, Web Analysis of the Variome, provides a rich
collection of genetic variation data through an innovative portal and an
advanced API. The development of the strategies underlying these products
highlighted clear opportunities in the biomedical software field: enhancing the
software implementation process with rapid application development
approaches and improving the quality and availability of data with the adoption
of the Semantic Web paradigm.
COEUS crosses the boundaries of integration and interoperability as it provides
a framework for the flexible acquisition and translation of data into a semantic
knowledge base, as well as a comprehensive set of interoperability services,
from REST to LinkedData, to fully exploit gathered data semantically. By
combining the lightness of rapid application development strategies with the
richness of its "Semantic Web in a box" approach, COEUS is a pioneering
framework to enhance the development of the next generation of biomedical
applications.
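The semantic knowledge base at the heart of this approach can be illustrated with a toy triple store. COEUS itself builds on RDF and SPARQL; the triples and the pattern-matching helper below are simplified stand-ins for illustration, not the COEUS API.

```python
# Minimal sketch of the idea behind a semantic knowledge base: facts are
# stored as subject-predicate-object triples and explored by pattern
# matching. Entities and relations here are invented examples.

triples = {
    ("BRCA1", "type", "Gene"),
    ("BRCA1", "associatedWith", "Breast cancer"),
    ("TP53", "type", "Gene"),
    ("TP53", "associatedWith", "Li-Fraumeni syndrome"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Which diseases is BRCA1 associated with?
diseases = [o for _, _, o in match(s="BRCA1", p="associatedWith")]
print(diseases)  # → ['Breast cancer']
```

In a real deployment the same pattern-matching role is played by SPARQL queries over an RDF store, exposed through REST and Linked Data interfaces as the abstract describes.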
Research data management challenges in citizen science projects and recommendations for library support services. A scoping review and case study.
Citizen science (CS) projects are part of a new era of data aggregation and harmonisation that facilitates interconnections between different datasets. Increasing the value and reuse of CS data has received growing attention with the appearance of the FAIR principles and systematic research data management (RDM) practices, which are often promoted by university libraries. However, RDM initiatives in CS appear diversified, and whether CS projects have special needs in terms of RDM is unclear. Therefore, the aim of this article is firstly to identify RDM challenges for CS projects and secondly to discuss how university libraries may support any such challenges. A scoping review and a case study of Danish CS projects were performed to identify RDM challenges. 48 articles were selected for data extraction, and four academic project leaders were interviewed about RDM practices in their CS projects. The challenges and recommendations identified in the review and case study are often not specific to CS. However, finding CS data, engaging specific populations, crediting volunteers and handling sensitive data, including health data, are some of the challenges requiring special attention from CS project managers. Scientific requirements and national practices do not always encompass the nature of CS projects. Based on the identified challenges, it is recommended that university libraries focus their services on 1) identifying legal and ethical issues that project managers should be aware of in their projects, 2) elaborating these issues in a Terms of Participation that also specifies data handling and sharing for the citizen scientists, and 3) motivating project managers to adopt good data handling practices. Adhering to the FAIR principles and good RDM practices in CS projects will continuously secure contextualisation and data quality. High data quality increases the value and reuse of the data and, therefore, the empowerment of the citizen scientists.
Linked Open Data: State-of-the-Art Mechanisms and Conceptual Framework
Today, one of the state-of-the-art technologies that has shown its importance for data integration and analysis is the linked open data (LOD) system or application. LOD consists of machine-readable resources and mechanisms for describing data properties. However, one of the issues with existing systems and data models is the need not only to represent the derived information (data) in formats that can be easily understood by humans, but also to create systems that are able to process the information they contain or support. Technically, the main mechanism for developing such data or information processing systems is the aggregation or computation of metadata descriptions for the various process elements. This is because there is an ever-increasing need for a more generalized and standard definition of data (or information) to create systems capable of providing understandable formats for the different data types and sources. To this effect, this chapter proposes a semantic-based linked open data framework (SBLODF) that integrates the different elements (entities) within information systems or models with semantics (metadata descriptions) to produce explicit and implicit information based on users' searches or queries. In essence, this work introduces a machine-readable and machine-understandable system that proves useful for encoding knowledge about different process domains, as well as providing the discovered information (knowledge) at a more conceptual level.
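The role of metadata descriptions in making data machine-readable can be sketched with JSON-LD, one concrete linked-data serialization: an @context maps local field names onto shared vocabulary URIs, so any consumer can interpret the record without prior agreement. The record and URIs below are illustrative and not part of the chapter's framework.

```python
import json

# Hedged sketch: a plain record becomes machine-understandable once its
# fields are bound to shared vocabulary terms via a JSON-LD @context.
# The dataset, identifier and URIs are invented for illustration.

record = {
    "@context": {
        "name": "http://schema.org/name",           # maps "name" to a shared term
        "identifier": "http://schema.org/identifier",
    },
    "@id": "http://example.org/dataset/42",          # globally unique resource ID
    "name": "Example clinical dataset",
    "identifier": "DS-42",
}

doc = json.dumps(record, indent=2)
print(doc)
```

Because every field resolves to a URI, two systems that have never exchanged schemas can still agree on what "identifier" means, which is the interoperability gain the chapter argues for.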
Developing a European grid infrastructure for cancer research: vision, architecture and services
Life sciences are currently at the centre of an information revolution. The nature and amount of information now available open up areas of research that were once in the realm of science fiction. During this information revolution, data-gathering capabilities have greatly surpassed data-analysis techniques. Data integration across heterogeneous data sources and data aggregation across different aspects of the biomedical spectrum are therefore at the centre of current biomedical and pharmaceutical R&D.
Cloud-based solutions supporting data and knowledge integration in bioinformatics
In recent years, computer advances have changed the way science progresses and have
boosted in silico studies; as a result, the concept of "scientific research" in bioinformatics
has quickly shifted from the idea of a local laboratory activity towards Web
applications and databases provided over the network as services. Thus, biologists have
become among the largest beneficiaries of information technologies, reaching and
surpassing the traditional ICT users who operate in the so-called "hard sciences"
(i.e., physics, chemistry, and mathematics). Nevertheless, this evolution has to deal with
several aspects (including data deluge, data integration, and scientific collaboration, to
name a few) and presents new challenges related to the proposal of innovative approaches in
the wide scenario of emergent ICT solutions.
This thesis aims at facing these challenges in the context of three case studies, each
devoted to a specific open issue and proposing solutions in line with recent advances
in computer science.
The first case study focuses on the task of unearthing and integrating information from
different web resources, each having its own organization, terminology and data formats, in
order to provide users with a flexible environment for accessing the above resources and
smartly exploring their content. The study explores the potential of the cloud paradigm as an
enabling technology to severely curtail issues associated with the scalability and performance
of applications devoted to supporting the above task. Specifically, it presents Biocloud Search
EnGene (BSE), a cloud-based application which allows for searching and integrating
biological information made available by public large-scale genomic repositories. BSE is
publicly available at: http://biocloud-unica.appspot.com/.
The second case study addresses scientific collaboration on the Web with special focus
on building a semantic network, where team members, adequately supported by easy
access to biomedical ontologies, define and enrich network nodes with annotations derived
from available ontologies. The study presents a cloud-based application called
Collaborative Workspaces in Biomedicine (COWB), which supports users in
constructing the semantic network by organizing, retrieving and creating
connections between contents of different types. Public and private workspaces provide an
accessible representation of the collective knowledge that is incrementally expanded.
COWB is publicly available at: http://cowb-unica.appspot.com/.
Finally, the third case study concerns knowledge extraction from very large datasets.
The study investigates the performance of random forests in classifying microarray data. In
particular, it faces the problem of reducing the contribution of trees whose nodes
are populated by non-informative features. Experiments are presented and their results
analyzed in order to draw guidelines on how to reduce this contribution.
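The idea of reducing the contribution of uninformative trees can be sketched as weighted voting: each tree's vote is scaled by a quality score, so trees built on non-informative features carry less weight. This is a simplified stand-in for the thesis's random-forest experiments; the stub predictions and scores below are invented for illustration.

```python
# Simplified stand-in for the third case study's idea: in an ensemble,
# down-weight trees that rely on non-informative features. Each "tree"
# here is just a class vote paired with a quality score (e.g. an
# out-of-bag accuracy estimate); voting is weighted by that score.

def weighted_vote(predictions, weights):
    """Combine class votes, weighting each tree by its quality score."""
    tally = {}
    for label, w in zip(predictions, weights):
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

# Three trees vote on one sample; the two high-quality trees agree on
# "tumour" and outvote the tree built on uninformative genes.
votes = ["tumour", "tumour", "normal"]
quality = [0.9, 0.8, 0.2]  # low weight ≈ tree dominated by noisy features
print(weighted_vote(votes, quality))  # → tumour
```

How the quality score is estimated (and how aggressively low-quality trees are discounted) is exactly the kind of guideline the thesis's experiments aim to establish.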
With respect to the previously mentioned challenges, this thesis sets out to make two
contributions, summarized as follows. First, the potential of cloud technologies has been
evaluated for developing applications that support access to bioinformatics resources
and collaboration, by improving awareness of users' contributions and fostering user
interaction. Second, the positive impact of the decision support offered by random forests
has been demonstrated in order to tackle effectively the curse of dimensionality.
Systems Biology in ELIXIR: modelling in the spotlight
Fine-Grained Workflow Interoperability in Life Sciences
Recent decades have witnessed an exponential increase in available biological data due to advances in key technologies for the life sciences. Specialized computing resources and scripting skills are now required to deliver results in a timely fashion: desktop computers and monolithic approaches can keep pace with neither the growth of available biological data nor the complexity of analysis techniques.
Workflows offer an accessible way to counter this trend by facilitating the parallelization and distribution of computations. Given their structured and repeatable nature, workflows also provide a transparent process that satisfies the strict reproducibility standards required by the scientific method.
One of the goals of our work is to assist researchers in accessing computing resources without the need for programming or scripting skills. To this effect, we created a toolset able to integrate any command line tool into workflow systems. Out of the box, our toolset supports two widely–used workflow systems, but our modular design allows for seamless additions in order to support further workflow engines.
Recognizing the importance of early and robust workflow design, we also extended a well–established, desktop–based analytics platform that contains more than two thousand tasks (each being a building block for a workflow), allows easy development of new tasks and is able to integrate external command line tools. We developed a converter plug–in that offers a user–friendly mechanism to execute workflows on distributed high–performance computing resources—an exercise that would otherwise require technical skills typically not associated with the average life scientist's profile.
Our converter extension generates virtually identical versions of the same workflows, which can then be executed on more capable computing resources. That is, not only did we leverage the capacity of distributed high–performance resources and the conveniences of a workflow engine designed for personal computers but we also circumvented computing limitations of personal computers and the steep learning curve associated with creating workflows for distributed environments. Our converter extension has immediate applications for researchers and we showcase our results by means of three use cases relevant for life scientists: structural bioinformatics, immunoinformatics and metabolomics.
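The core idea of integrating an arbitrary command line tool into a workflow system can be sketched as a uniform task wrapper: inputs go in as arguments, outputs come back from the tool's standard output. This is only a minimal illustration, not the thesis toolset; the wrapped "tool" below is the Python interpreter itself, chosen so the example runs anywhere.

```python
import subprocess
import sys

# Hedged sketch: wrap a command line tool behind a uniform task
# interface so a workflow engine can schedule it like any other step.
# The class name and interface are invented for this illustration.

class CommandLineTask:
    """A workflow step that runs an external command and captures output."""
    def __init__(self, argv):
        self.argv = argv  # base command line for the wrapped tool

    def run(self, *inputs):
        # Append the step's inputs as arguments and capture stdout,
        # raising if the tool exits with a non-zero status.
        result = subprocess.run(self.argv + list(inputs),
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

# Example: wrap the Python interpreter as a stand-in "tool" that
# uppercases its argument, then run it as a single workflow step.
upper = CommandLineTask([sys.executable, "-c",
                         "import sys; print(sys.argv[1].upper())"])
print(upper.run("brca1"))  # → BRCA1
```

Chaining such tasks, feeding one step's output into the next, is what a workflow engine automates, together with scheduling the steps onto distributed computing resources.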