24 research outputs found
Short papers of the 10th Conference on Cloud Computing, Big Data & Emerging Topics
Compilation of the short papers presented at the 10th Conference on Cloud Computing, Big Data & Emerging Topics (JCC-BD&ET2022), held in hybrid mode in June 2022 and organised by the Instituto de Investigación en Informática LIDI (III-LIDI) and the Postgraduate Secretariat of the Facultad de Informática, UNLP, in collaboration with universities in Argentina and abroad.
Systems Biology in ELIXIR: modelling in the spotlight
In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will prove key components of ELIXIR's future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities. These are: overcoming barriers to the wider uptake of systems biology; linking new and existing data to systems biology models; interoperability of systems biology resources; further development and embedding of systems medicine; provisioning of modelling as a service; building and coordinating capacity building and training resources; and supporting industrial embedding of systems biology. A set of objectives for the Community has been identified under four main headline areas: Standardisation and Interoperability, Technology, Capacity Building and Training, and Industrial Embedding. These are grouped into short-term (3-year), mid-term (6-year) and long-term (10-year) objectives.
Perspectives on Digital Humanism
This open access book aims to set an agenda for research and action in the field of Digital Humanism through short essays written by selected thinkers from a variety of disciplines, including computer science, philosophy, education, law, economics, history, anthropology, political science, and sociology. This initiative emerged from the Vienna Manifesto on Digital Humanism and the associated lecture series. Digital Humanism deals with the complex relationships between people and machines in digital times. It acknowledges the potential of information technology. At the same time, it points to societal threats such as privacy violations and ethical concerns around artificial intelligence, automation and the loss of jobs, ongoing monopolization on the Web, and threats to sovereignty. Digital Humanism aims to address these topics with a sense of urgency but with a constructive mindset. The book argues for a Digital Humanism that analyses and, most importantly, influences the complex interplay of technology and humankind toward a better society and life while fully respecting universal human rights. It is a call to shape technologies in accordance with human values and needs.
Scaling the development of large ontologies: identitas and hypernormalization
During the last decade, ontologies have become a fundamental part of the life sciences, used to build organised computational knowledge. Currently, more than 800 biomedical ontologies are hosted by the NCBO BioPortal repository. However, the proliferation of ontologies in the biomedical and biological domains has highlighted a number of problems. As ontologies become large, their development and maintenance become more challenging and time-consuming, so the scalability of ontology development has become problematic. In this thesis, we examine two new approaches that can help address this challenge.
First, we consider a new approach to identifiers that could significantly facilitate the scalability of ontologies and overcome some of the issues associated with monotonic, numeric identifiers, while remaining semantics-free. Our solutions are described, along with the Identitas library, which supports concurrent development, pronounceability and error checking. The library has been integrated into two ontology development environments, Protégé and Tawny-OWL. This thesis also discusses the ways in which current ontological practices could be migrated towards the use of this scheme.
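To make the idea concrete, here is a minimal sketch of a pronounceable, semantics-free identifier scheme in the proquint style that Identitas builds on. It only illustrates the general idea; the library's actual encoding and error-checking behaviour may differ.

```python
# Proquint-style encoding: 16 bits become one pronounceable 5-letter "quint".
CONSONANTS = "bdfghjklmnprstvz"  # each consonant encodes 4 bits
VOWELS = "aiou"                  # each vowel encodes 2 bits

def quint(n: int) -> str:
    """Encode a 16-bit integer as a 5-letter pronounceable quint."""
    assert 0 <= n < 2 ** 16
    return (CONSONANTS[(n >> 12) & 0xF] + VOWELS[(n >> 10) & 0x3]
            + CONSONANTS[(n >> 6) & 0xF] + VOWELS[(n >> 4) & 0x3]
            + CONSONANTS[n & 0xF])

def identifier(n: int) -> str:
    """Encode a 32-bit integer as two quints, usable as a class identifier."""
    return quint((n >> 16) & 0xFFFF) + "-" + quint(n & 0xFFFF)

print(identifier(0x7F000001))  # -> "lusab-babad"
```

Unlike monotonic numeric identifiers, developers allocating from disjoint numeric ranges can mint identifiers concurrently without clashing, and transcription errors tend to produce unpronounceable strings that are easy to spot.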
Second, we investigate the use of hypernormalisation, patternisation and programmatic approaches, asking how they could be used to rebuild the Gene Ontology (GO). The aim of the hypernormalisation and patternisation techniques is to allow the ontology developer to manage the ontology's maintainability and evolution. To apply this approach we had to analyse the ontology structure, starting with the Molecular Function Ontology (MFO). The MFO is formed from several large and tangled hierarchies of classes, each of which describes a broad molecular activity. The exploitation of the hypernormalisation approach resulted in the creation of a hypernormalised form of the Transporter Activity (TA) and Catalytic Activity (CA) hierarchies, which together constitute 78% of all classes in MFO. The hypernormalised structures of the TA and CA are generated from higher-level patterns and novel content-specific patterns, and exploit logical reasoners. The generated ontologies are robust, easy to maintain and can be developed and extended freely. Although there is a variety of ontology development tools, Tawny-OWL is a programmatic, interactive tool for ontology creation and management that provides a set of patterns explicitly supporting the creation of a hypernormalised ontology.
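To illustrate this programmatic, pattern-driven style: the thesis works in Tawny-OWL (a Clojure-based tool), but the same idea can be sketched with Python's owlready2. All class and property names below are invented for illustration, not taken from GO.

```python
# Sketch of hypernormalisation/patternisation: keep the asserted hierarchies
# as simple trees, and generate cross-cutting classes from a pattern.
import types
from owlready2 import get_ontology, Thing, ObjectProperty

onto = get_ontology("http://example.org/hypernorm.owl")

with onto:
    # Simple, tree-shaped asserted hierarchy of chemicals (invented).
    class ChemicalEntity(Thing): pass
    class Ion(ChemicalEntity): pass
    class Sugar(ChemicalEntity): pass

    class TransporterActivity(Thing): pass

    class transports(ObjectProperty): pass  # invented linking property

    # Content-specific pattern: one defined class per chemical. Each gets an
    # equivalence axiom (necessary-and-sufficient conditions), so a reasoner,
    # not a human, computes the final polyhierarchy.
    for chem in (Ion, Sugar):
        cls = types.new_class(chem.name + "TransporterActivity", (Thing,))
        cls.equivalent_to.append(TransporterActivity & transports.some(chem))
```

Invoking a reasoner (for example owlready2's sync_reasoner()) then classifies IonTransporterActivity and SugarTransporterActivity under TransporterActivity automatically; applying such patterns across whole hierarchies is what yields the hypernormalised TA and CA forms described above.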
Finally, the investigation of hypernormalisation highlighted inconsistent classifications and identified a significant semantic mismatch between GO and the Chemical Entities of Biological Interest (ChEBI) ontology. Although both ontologies describe the same real entities, GO often refers to the form most common in biology, while ChEBI is more specific and precise. The use of hypernormalisation forces us to deal with this mismatch; to do so, we used the equivalence axioms created by the GO-Plus ontology.
To sum up, to address the scalability and ease of development of ontologies, we propose a new identifier scheme and investigate the use of the hypernormalisation methodology. Together, Identitas and the hypernormalisation technique should enable the construction of large-scale ontologies in the future.
Semantic systems biology of prokaryotes: heterogeneous data integration to understand bacterial metabolism
The goal of this thesis is to improve the prediction of genotype to phenotype associations, with a focus on metabolic phenotypes of prokaryotes. This goal is achieved through data integration, which in turn required the development of supporting solutions based on semantic web technologies. Chapter 1 provides an introduction to the challenges associated with data integration. Semantic web technologies provide solutions to some of these challenges, and the basics of these technologies are explained in the Introduction. Furthermore, the basics of constraint-based metabolic modelling and the construction of genome scale models (GEMs) are also provided. The chapters in the thesis are separated into three related topics: chapters 2, 3 and 4 focus on data integration based on heterogeneous networks and their application to the human pathogen M. tuberculosis; chapters 5, 6, 7, 8 and 9 focus on semantic web based solutions to genome annotation and applications thereof; and chapter 10 focuses on the final goal of associating genotypes to phenotypes using GEMs.

Chapter 2 provides the prototype of a workflow to efficiently analyze information generated by different inference and prediction methods. This method relies on providing the user the means to simultaneously visualize and analyze the coexisting networks generated by different algorithms, heterogeneous data sets, and a suite of analysis tools. As a showcase, we have analyzed the gene co-expression networks of M. tuberculosis generated using over 600 expression experiments. Through this we gained new knowledge about the regulation of the DNA repair, dormancy, iron uptake and zinc uptake systems. Furthermore, it enabled us to develop a pipeline to integrate ChIP-seq data and a tool to uncover multiple regulatory layers.

In chapter 3 the prototype presented in chapter 2 is further developed into the Synchronous Network Data Integration (SyNDI) framework, which is based on Cytoscape and Galaxy. The functionality and usability of the framework is highlighted with three biological examples. We analyzed the distinct connectivity of plasma metabolites in networks associated with high or low latent cardiovascular disease risk. We obtained deeper insights from a few similar inflammatory response pathways in Staphylococcus aureus infection common to human and mouse. We identified not yet reported regulatory motifs associated with transcriptional adaptations of M. tuberculosis.

In chapter 4 we present a review providing a systems-level overview of the molecular and cellular components involved in divalent metal homeostasis and their role in regulating the three main virulence strategies of M. tuberculosis: immune modulation, dormancy and phagosome escape. With the use of the tools presented in chapters 2 and 3, we identified a single regulatory cascade for these three virulence strategies that responds to limited availability of divalent metals in the phagosome.

The tools presented in chapters 2 and 3 achieve data integration through the use of multiple similarity, coexistence, coexpression and interaction gene and protein networks. However, the presented tools cannot store additional (genome) annotations. Therefore, we applied semantic web technologies to store and integrate heterogeneous annotation data sets. An increasing number of widely used biological resources are already available in the RDF data model. There are, however, no tools available that provide structural overviews of these resources. Such structural overviews are essential to efficiently query these resources and to assess their structural integrity and design. Therefore, in chapter 5, I present RDF2Graph, a tool that automatically recovers the structure of an RDF resource. The generated overview enables users to create complex queries on these resources and to structurally validate newly created resources.
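The kind of structure recovery RDF2Graph performs can be sketched with rdflib and a generic SPARQL query. This only illustrates the idea, not the tool's actual implementation, and the file name is a placeholder.

```python
# Sketch of recovering the class-level structure of an RDF resource: for each
# predicate, find which classes its subjects and objects belong to. Triples
# with literal objects (which have no rdf:type) are skipped for brevity.
from rdflib import Graph

g = Graph()
g.parse("annotations.ttl")  # placeholder file name

STRUCTURE_QUERY = """
SELECT DISTINCT ?subjClass ?pred ?objClass
WHERE {
    ?s ?pred ?o .
    ?s a ?subjClass .
    ?o a ?objClass .
}
"""

for subj_class, pred, obj_class in g.query(STRUCTURE_QUERY):
    # Each row is one edge of the recovered schema graph.
    print(f"{subj_class} --{pred}--> {obj_class}")
```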
Direct functional comparison supports genotype to phenotype predictions. A prerequisite for a direct functional comparison is consistent annotation of the genetic elements with evidence statements. However, the standard structured formats used by the public sequence databases to present genome annotations provide limited support for data mining, hampering comparative analyses at large scale. To enable interoperability of genome annotations for data mining applications, we have developed the Genome Biology Ontology Language (GBOL) and associated infrastructure (the GBOL stack), which is presented in chapter 6. GBOL is provenance aware and thus provides a consistent representation of functional genome annotations linked to their provenance. The provenance of a genome annotation describes the contextual details and derivation history of the process that resulted in the annotation. GBOL is modular in design, extensible and linked to existing ontologies. The GBOL stack of supporting tools enforces consistency within and between the GBOL definitions in the ontology.

Based on GBOL, we developed the genome annotation pipeline SAPP (Semantic Annotation Platform with Provenance), presented in chapter 7. SAPP automatically predicts, tracks and stores structural and functional annotations and associated dataset- and element-wise provenance in a Linked Data format, thereby enabling information mining and retrieval with Semantic Web technologies. This greatly reduces the administrative burden of handling multiple analysis tools and versions thereof, and facilitates multi-level, large-scale comparative analysis. In turn, this can be used to make genotype to phenotype predictions.

The development of GBOL and SAPP was done simultaneously. During the development we realized that we had to constantly validate the data exported to RDF to ensure coherence with the ontology. This was an extremely time-consuming and error-prone process, so we developed the Empusa code generator, which is presented in chapter 8.

SAPP has been successfully used to annotate 432 sequenced Pseudomonas strains and to integrate the resulting annotations in a large-scale functional comparison using protein domains. This comparison is presented in chapter 9. Additionally, data from six metabolic models, nearly a thousand transcriptome measurements and four large-scale transposon mutagenesis experiments were integrated with the genome annotations. In this way, we linked gene essentiality, persistence and expression variability. This gave us insight into the diversity, versatility and evolutionary history of the Pseudomonas genus, which contains some important pathogens as well as some useful species for bioengineering and bioremediation purposes.

Genome annotation can be used to create GEMs, which can be used to better link genotypes to phenotypes. Bio-Growmatch, presented in chapter 10, is a tool that can automatically suggest modifications to improve a GEM based on phenotype data, thereby integrating growth data into the complete process of modelling the metabolism of an organism.
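For orientation, the core computation in constraint-based GEM modelling, flux balance analysis, reduces to a linear programme: maximise an objective flux subject to steady-state mass balance S v = 0 and per-reaction flux bounds. Below is a toy sketch with an invented three-reaction network, not one taken from the thesis.

```python
# Toy flux balance analysis: maximize the growth flux subject to steady-state
# mass balance S @ v = 0 and flux bounds. Stoichiometry invented for
# illustration.
import numpy as np
from scipy.optimize import linprog

# Reactions (columns): uptake (-> A), conversion (A -> B), growth (B ->)
# Metabolites (rows): A, B
S = np.array([
    [1.0, -1.0,  0.0],   # A: produced by uptake, consumed by conversion
    [0.0,  1.0, -1.0],   # B: produced by conversion, consumed by growth
])

bounds = [(0, 10.0), (0, None), (0, None)]  # uptake capped at 10 units

c = np.array([0.0, 0.0, -1.0])  # linprog minimizes, so negate growth flux

res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
print("optimal growth flux:", -res.fun)   # -> 10.0, limited by uptake
print("flux distribution:", res.x)        # -> [10, 10, 10]
```

Tools in the spirit of Bio-Growmatch compare such predicted growth against measured phenotype data and suggest model modifications where the two disagree.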
Chapter 11 presents a general discussion of how the chapters contributed to the central goal, after which I discuss provenance requirements for data reuse and integration. I further discuss how this can be used to improve knowledge generation. The acquired knowledge could, in turn, be used to design new experiments. The principles of the dry-lab cycle, and how semantic technologies can contribute to establishing these cycles, are also discussed in chapter 11. Finally, a discussion is presented on how to apply these principles to improve the creation and usability of GEMs.