Aggregation of biological knowledge for immunological and virological applications
Ph.D. (Doctor of Philosophy)
A standards-based ICT framework to enable a service-oriented approach to clinical decision support
This research provides evidence that standards-based Clinical Decision Support (CDS)
at the point of care is an essential ingredient of electronic healthcare service delivery. A
Service-Oriented Architecture (SOA) based solution is explored that serves as a task
management system, coordinating complex, distributed and disparate IT systems,
processes and resources (human and computer) to provide standards-based CDS.
This research offers a solution to the challenges of implementing computerised CDS,
such as integration with heterogeneous legacy systems and the reuse of components and
services to reduce costs and save time. The benefits of a sharable CDS service that can be
reused by different healthcare practitioners to provide collaborative patient care are
demonstrated. The solution orchestrates different services, extracting data from
sources such as patient databases, clinical knowledge bases and evidence-based clinical
guidelines (CGs) in order to serve multiple CDS requests coming from different
healthcare settings. The architecture aims to help users at different levels of Healthcare
Delivery Organizations (HCOs) maintain a CDS repository and monitor and
manage services, thus enabling transparency.
The research employs the Design Science Research Methodology (DSRM) combined with
The Open Group Architecture Framework (TOGAF), an Enterprise Architecture
Framework (EAF) maintained by The Open Group. DSRM's iterative capability addresses
the rapidly evolving nature of workflows in healthcare. The SOA-based solution uses
standards-based open-source technologies and platforms and the latest healthcare standards
by HL7 and OMG: the Decision Support Service (DSS) and the Retrieve, Locate, and Update
Service (RLUS) standards. Combining business process management (BPM) technologies
and business rules with SOA ensures the HCO's capability to manage its processes. This
architectural solution is evaluated by successfully implementing evidence-based CGs at
the point of care in areas such as: a) diagnostics (Chronic Obstructive Pulmonary Disease),
b) urgent referral (lung cancer), and c) genome testing and integration with CDS in screening
(Lynch syndrome). In addition to medical care, the CDS solution can benefit
organizational processes for collaborative care delivery by connecting patients,
physicians and other associated members. The framework facilitates integration of the
different types of CDS suited to different healthcare processes, enabling sharable CDS
capabilities within and across organizations.
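As a minimal sketch of the orchestration idea described in this abstract, the following Python fragment shows a CDS request gathering patient data from a back-end source and applying a guideline rule. The function names, the data-source stub, and the COPD-style rule are illustrative assumptions, not the thesis' actual DSS/RLUS interfaces.

```python
# Hypothetical sketch of CDS orchestration: a service retrieves data
# from a source, applies an evidence-based guideline rule, and returns
# a recommendation. All names and thresholds are illustrative.
def fetch_patient(patient_id):
    # Stand-in for a patient-database lookup (e.g. via an RLUS-style service).
    return {"id": patient_id, "age": 62, "smoker": True, "fev1_fvc": 0.65}

def guideline_copd(patient):
    """Toy guideline rule: flag a possible COPD case for review."""
    return patient["fev1_fvc"] < 0.70 and patient["age"] > 40

def cds_request(patient_id):
    """Orchestrate: retrieve data, evaluate the guideline, return advice."""
    patient = fetch_patient(patient_id)
    if guideline_copd(patient):
        return {"patient": patient_id, "recommendation": "spirometry review"}
    return {"patient": patient_id, "recommendation": "no action"}

print(cds_request("P-001"))
```

In the architecture the abstract describes, the data retrieval and rule evaluation would be separate services coordinated by the task-management layer rather than local function calls.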
Enabling Complex Semantic Queries to Bioinformatics Databases through Intuitive Search Over Data
Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data already publicly available. However, the heterogeneity of the existing data sources still poses significant challenges for achieving interoperability among biological databases. Furthermore, merely solving the technical challenges of data integration, for example through the use of common data representation formats, leaves open a larger problem: the steep learning curve required for understanding the data model of each public source, as well as the technical language through which the sources can be queried and joined. As a consequence, most of the available biological data remain practically unexplored today.
In this thesis, we address these problems jointly, by first introducing an ontology-based data integration solution in order to mitigate the data source heterogeneity problem. We illustrate through the concrete example of Bgee, a gene expression data source, how relational databases can be exposed as virtual Resource Description Framework (RDF) graphs, through relational-to-RDF mappings. This has the important advantage that the original data source can remain unmodified, while still becoming interoperable with external RDF sources.
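The relational-to-RDF idea can be sketched minimally in Python: rows of a relational table are exposed as RDF triples without modifying the source. The schema, namespace, and values below are illustrative assumptions, not Bgee's actual model or mappings (which in practice are expressed declaratively, e.g. in R2RML).

```python
import sqlite3

# Toy relational source standing in for a gene-expression table
# (schema and values are illustrative, not Bgee's actual model).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (gene_id TEXT, anat_entity TEXT, score REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [("ENSG01", "brain", 98.5), ("ENSG02", "liver", 42.0)],
)

def rows_to_triples(connection):
    """Map each relational row to RDF triples (N-Triples strings),
    the way a relational-to-RDF mapping exposes rows as a virtual graph."""
    prefix = "http://example.org/"  # hypothetical namespace
    triples = []
    for gene_id, anat, score in connection.execute(
        "SELECT gene_id, anat_entity, score FROM expression"
    ):
        subject = f"<{prefix}expression/{gene_id}-{anat}>"
        triples.append(f"{subject} <{prefix}gene> <{prefix}gene/{gene_id}> .")
        triples.append(f"{subject} <{prefix}anatEntity> <{prefix}anat/{anat}> .")
        triples.append(f'{subject} <{prefix}score> "{score}" .')
    return triples

for t in rows_to_triples(conn):
    print(t)
```

In a virtual-graph setup, such triples are not materialized: SPARQL queries are rewritten into SQL against the unchanged relational source at query time.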
We complement our methods with applied case studies designed to guide domain experts in formulating expressive federated queries targeting the integrated data across the domains of evolutionary relationships and gene expression. More precisely, we introduce two comparative analyses, first within the same domain (using orthology data from multiple, interoperable data sources) and second across domains, in order to study the relation between expression change and evolution rate following a duplication event.
Finally, in order to bridge the semantic gap between users and data, we design and implement Bio-SODA, a question answering system over domain knowledge graphs that does not require training data for translating user questions to SPARQL. Bio-SODA uses a novel ranking approach that combines syntactic and semantic similarity, while also incorporating node centrality metrics to rank candidate matches for a given user question. Our results in testing Bio-SODA across several real-world databases that span multiple domains (both within and outside bioinformatics) show that it can answer complex, multi-fact queries, beyond the current state of the art in the better-studied field of open-domain question answering.
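A toy version of such a combined ranking can be sketched as follows. The weights and the stand-in similarity measures are assumptions for illustration, not Bio-SODA's actual formula; in the real system the semantic similarity and centrality values come from the knowledge graph itself.

```python
from difflib import SequenceMatcher

# Illustrative sketch of a combined candidate-ranking score in the
# spirit of the approach described above: syntactic similarity plus a
# semantic-similarity term plus node centrality, with assumed weights.
def syntactic_sim(question_term, label):
    """String-level similarity (here: difflib ratio as a stand-in)."""
    return SequenceMatcher(None, question_term.lower(), label.lower()).ratio()

def rank_candidates(term, candidates, alpha=0.5, beta=0.3, gamma=0.2):
    """Score = alpha*syntactic + beta*semantic + gamma*centrality.
    Each candidate carries a precomputed semantic similarity and a
    normalised node-centrality value."""
    scored = [
        (alpha * syntactic_sim(term, c["label"])
         + beta * c["semantic_sim"]
         + gamma * c["centrality"], c["label"])
        for c in candidates
    ]
    return sorted(scored, reverse=True)

candidates = [
    {"label": "gene expression", "semantic_sim": 0.9, "centrality": 0.8},
    {"label": "gene", "semantic_sim": 0.6, "centrality": 0.9},
]
print(rank_candidates("expression of genes", candidates))
```

The centrality term breaks ties in favour of well-connected graph nodes, which tend to be the entities a user most plausibly means.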
ICSEA 2021: the sixteenth international conference on software engineering advances
The Sixteenth International Conference on Software Engineering Advances (ICSEA 2021), held on October 3 - 7, 2021 in Barcelona, Spain, continued a series of events covering a broad spectrum of software-related topics.
The conference covered the fundamentals of designing, implementing, testing, validating and maintaining various kinds of software. The tracks treated the topics from theory to practice, in terms of methodologies, design, implementation, testing, use cases, tools, and lessons learnt. The conference topics covered classical and advanced methodologies, open source, agile software, as well as software deployment, software economics and education.
The conference had the following tracks:
Advances in fundamentals for software development
Advanced mechanisms for software development
Advanced design tools for developing software
Software engineering for service computing (SOA and Cloud)
Advanced facilities for accessing software
Software performance
Software security, privacy, safeness
Advances in software testing
Specialized software advanced applications
Web Accessibility
Open source software
Agile and Lean approaches in software engineering
Software deployment and maintenance
Software engineering techniques, metrics, and formalisms
Software economics, adoption, and education
Business technology
Improving productivity in research on software engineering
Trends and achievements
Similar to previous editions, this event continued to be very competitive in its selection process and very well perceived by the international software engineering community. As such, it attracted excellent contributions and active participation from all over the world. We were very pleased to receive a large number of top-quality contributions.
We take this opportunity to warmly thank all the members of the ICSEA 2021 technical program committee as well as the numerous reviewers. The creation of such a broad and high-quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and effort to contribute to ICSEA 2021. We truly believe that, thanks to all these efforts, the final conference program consisted of top-quality contributions.
This event could also not have been a reality without the support of many individuals, organizations and sponsors. We also gratefully thank the members of the ICSEA 2021 organizing committee for their help in handling the logistics and for their work in making this professional meeting a success.
We hope ICSEA 2021 was a successful international forum for the exchange of ideas and results between academia and industry, and that it promoted further progress in software engineering research.
Data management and Data Pipelines: An empirical investigation in the embedded systems domain
Context: Companies are increasingly collecting data from all possible sources to extract insights that help in data-driven decision-making. Increased data volume, variety, and velocity, together with the impact of poor-quality data on the development of data products, are leading companies to look for an improved data management approach that can accelerate the development of high-quality data products. Further, AI is being applied in a growing number of fields and is thus evolving into a horizontal technology. Consequently, AI components are increasingly being integrated into embedded systems along with electronics and software. We refer to these systems as AI-enhanced embedded systems. Given the strong dependence of AI on data, this expansion also creates a new space for applying data management techniques.
Objective: The overall goal of this thesis is to empirically identify the data management challenges encountered during the development and maintenance of AI-enhanced embedded systems, propose an improved data management approach, and empirically validate the proposed approach.
Method: To achieve this goal, we conducted the research in close collaboration with Software Center companies using a combination of different empirical research methods: case studies, literature reviews, and action research.
Results and conclusions: This research provides five main results. First, it identifies key data management challenges specific to Deep Learning models developed at embedded system companies. Second, it examines practices such as DataOps and data pipelines that help to address data management challenges. We observed that DataOps is the data management practice that best improves data quality and reduces the time to develop data products. The data pipeline is the critical component of DataOps that manages the data life cycle activities. The study also identifies the potential faults at each step of the data pipeline and the corresponding mitigation strategies.
Finally, the data pipeline model is realized as a small prototype pipeline, and the percentage of saved data dumps achieved through the implementation is calculated.
Future work: As future work, we plan to realize the conceptual data pipeline model so that companies can build customized, robust data pipelines. We also plan to analyze the impact and value of data pipelines in cross-domain AI systems and data applications, and to develop an AI-based fault detection and mitigation system suitable for data pipelines.
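The idea of per-step fault detection and mitigation in a data pipeline can be sketched as follows. The step names and the mitigation strategy (dropping invalid records and counting them) are illustrative assumptions, not the thesis' concrete pipeline model.

```python
# Minimal sketch of a data pipeline with a fault-detection step and a
# simple mitigation strategy. All step names and data are illustrative.
def ingest():
    # Stand-in for collection from a data source (e.g. embedded sensors).
    return [{"sensor": "s1", "value": 3.24}, {"sensor": "s2", "value": None},
            {"sensor": "s1", "value": 4.18}]

def validate(records):
    """Fault detection: flag records with missing values."""
    valid = [r for r in records if r["value"] is not None]
    dropped = len(records) - len(valid)
    return valid, dropped

def transform(records):
    """Prepare records for the data product (here: round values)."""
    return [{**r, "value": round(r["value"], 1)} for r in records]

def run_pipeline():
    raw = ingest()
    valid, dropped = validate(raw)  # mitigation: discard and count faults
    clean = transform(valid)
    return clean, dropped

clean, dropped = run_pipeline()
print(f"{len(clean)} records kept, {dropped} dropped")
```

In a production pipeline each step would also report its fault counts to monitoring, so that failures are localized to a step rather than discovered downstream in the data product.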
Planning for the Lifecycle Management and Long-Term Preservation of Research Data: A Federated Approach
Outcomes of the grant are archived here. The “data deluge” is a recent but increasingly well-understood phenomenon of scientific and social inquiry. Large-scale research instruments extend our observational power by many orders of magnitude but at the same time generate massive amounts of data. Researchers work feverishly to document and preserve changing or disappearing habitats, cultures, languages, and artifacts, resulting in volumes of media in various formats. New software tools mine a growing universe of historical and modern texts and connect the dots in our semantic environment. Libraries, archives, and museums undertake digitization programs, creating broad access to unique cultural heritage resources for research. Global-scale research collaborations with hundreds or thousands of participants drive the creation of massive amounts of data, most of which cannot be recreated if lost. The University of Kansas (KU) Libraries, in collaboration with two partners, the Greater Western Library Alliance (GWLA) and the Great Plains Network (GPN), received an IMLS National Leadership Grant designed to leverage collective strengths and create a proposal for a scalable and federated approach to the lifecycle management of research data based on the needs of GPN and GWLA member institutions. Institute for Museum and Library Services LG-51-12-0695-1.
Bioinformatics
This book is divided into different research areas relevant in Bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.