    Grand Challenges of Traceability: The Next Ten Years

    In 2007, the software and systems traceability community met at the first Natural Bridge symposium on the Grand Challenges of Traceability to establish and address research goals for achieving effective, trustworthy, and ubiquitous traceability. Ten years later, in 2017, the community came together to evaluate a decade of progress towards achieving these goals. These proceedings document some of that progress. They include a series of short position papers, representing current work in the community, organized across four process axes of traceability practice. The sessions covered Trace Strategizing, Trace Link Creation and Evolution, Trace Link Usage, real-world applications of traceability, and traceability datasets and benchmarks. Two breakout groups focused on the importance of creating and sharing traceability datasets within the research community and discussed challenges related to the adoption of tracing techniques in industrial practice. Members of the research community are engaged in many active, ongoing, and impactful research projects. Our hope is that ten years from now we will be able to look back at a productive decade of research and claim that we have achieved the overarching Grand Challenge of Traceability, which envisions traceability that is always present, built into the engineering process, and that has "effectively disappeared without a trace". We hope that others will see the potential that traceability has for empowering software and systems engineers to develop higher-quality products at increasing levels of complexity and scale, and that they will join the active community of software and systems traceability researchers as we move forward into the next decade of research.

    Integration of Biological Sources: Exploring the Case of Protein Homology

    Data integration is a key issue in the domain of bioinformatics, which deals with huge amounts of heterogeneous biological data that grow and change rapidly. This paper serves as an introduction to the field of bioinformatics and the biological concepts it deals with, and as an exploration of the integration problems a bioinformatics scientist faces. We examine ProGMap, an integrated protein homology system used by bioinformatics scientists at Wageningen University, and several use cases related to protein homology. A key issue we identify is the huge manual effort required to unify source databases into a single resource. Uncertain databases can represent several possible worlds, and it has been proposed that they can significantly reduce initial integration efforts. We propose several directions for future work in which uncertain databases can be applied to bioinformatics, with the goal of furthering the cause of bioinformatics integration.
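
    The "possible worlds" idea is easy to sketch: each protein is mapped to one of several candidate homology clusters with a probability, and a world is one consistent choice per protein. A minimal sketch follows, in which the proteins, cluster names, and probabilities are invented for illustration; ProGMap itself is far larger and works differently.

        # Possible-worlds sketch over uncertain homology mappings.
        # All identifiers and probabilities are hypothetical.
        from itertools import product

        # Each protein maps to mutually exclusive candidate clusters.
        uncertain_db = {
            "P1": [("clusterA", 0.8), ("clusterB", 0.2)],
            "P2": [("clusterA", 1.0)],
            "P3": [("clusterB", 0.6), ("clusterC", 0.4)],
        }

        def possible_worlds(db):
            """Enumerate every possible world with its probability."""
            proteins = sorted(db)
            for choices in product(*(db[p] for p in proteins)):
                world = {p: cluster for p, (cluster, _) in zip(proteins, choices)}
                prob = 1.0
                for _, p_alt in choices:
                    prob *= p_alt
                yield world, prob

        for world, prob in possible_worlds(uncertain_db):
            print(f"{prob:.2f}  {world}")

    Storing the alternatives instead of forcing a single unified mapping is what allows the initial integration effort to be deferred: curators can resolve conflicts later, world by world.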

    GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution

    Background: Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in the life sciences. Their increasing size and the high frequency of updates, resulting in a large set of ontology versions, necessitate efficient management and analysis of these data. Results: We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching and for determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. Conclusions: GOMMA provides a comprehensive and scalable infrastructure for managing large life science ontologies and analyzing their evolution. Key functions include generic storage of ontology versions and mappings, support for ontology matching, and determination of ontology changes. The supported features for analyzing ontology changes are helpful for assessing their impact on ontology-dependent applications such as term enrichment. GOMMA complements OnEX by providing functionality to manage various versions of mappings between two ontologies and allows different match approaches to be combined.
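
    As a rough illustration of what "determining evolutionary ontology changes" involves, the sketch below diffs two toy ontology versions into added, deleted, and relabeled concepts. The concept IDs and labels are made up, and GOMMA's actual repository and diff components are far richer than this.

        # Hedged sketch of a version diff between two ontology releases;
        # not GOMMA's real API. Concept IDs/labels are invented.

        def diff_versions(old: dict, new: dict):
            """Compare two {concept_id: label} maps and report
            added, deleted, and relabeled concepts."""
            added   = {c: new[c] for c in new.keys() - old.keys()}
            deleted = {c: old[c] for c in old.keys() - new.keys()}
            changed = {c: (old[c], new[c])
                       for c in old.keys() & new.keys() if old[c] != new[c]}
            return added, deleted, changed

        v1 = {"C:0001": "cell growth", "C:0002": "cell death"}
        v2 = {"C:0001": "cell growth", "C:0002": "programmed cell death",
              "C:0003": "cell migration"}

        added, deleted, changed = diff_versions(v1, v2)
        print("added:", added)      # {'C:0003': 'cell migration'}
        print("deleted:", deleted)  # {}
        print("changed:", changed)  # {'C:0002': ('cell death', 'programmed cell death')}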

    Using conceptual modeling to improve genome data management

    With advances in genomic sequencing technology, a large amount of data is publicly available for the research community to extract meaningful and reliable associations among risk genes and the mechanisms of disease. However, this exponentially growing body of data is spread across over a thousand heterogeneous repositories, represented in multiple formats and with different levels of quality, which hinders the differentiation of clinically valid relationships from those that are less well sustained and could lead to wrong diagnoses. This paper presents how conceptual models can play a key role in efficiently managing genomic data. These data must be accessible, informative, and reliable enough to extract valuable knowledge in the context of identifying evidence that supports the relationship between DNA variants and disease. The approach presented in this paper helps researchers organize, store, and process information, focusing only on the data that are relevant and minimizing the impact that information overload has in clinical and research contexts. A case study (epilepsy) is also presented to demonstrate its application in a real context.
    Funding: Spanish State Research Agency and the Generalitat Valenciana under projects TIN2016-80811-P and PROMETEO/2018/176; ERDF. Pastor López, O.; León-Palacio, A.; Reyes Román, J. F.; García-Simón, A.; Casamayor Rodenas, J. C. (2020). Using conceptual modeling to improve genome data management. Briefings in Bioinformatics, 22(1), 45-54. https://doi.org/10.1093/bib/bbaa100
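
    To give a flavor of what such a conceptual model looks like in practice, the sketch below encodes a deliberately simplified variant-disease schema as Python dataclasses. The class names, fields, rule, and the example variant are our own illustration, not the authors' published schema.

        # Toy conceptual schema: variants link to diseases only through
        # explicit, sourced evidence. All names are illustrative.
        from dataclasses import dataclass, field

        @dataclass
        class Gene:
            symbol: str            # e.g. "SCN1A", a gene studied in epilepsy
            chromosome: str

        @dataclass
        class Variant:
            gene: Gene
            hgvs: str              # variant description; value below is made up

        @dataclass
        class Evidence:
            source: str            # publication or database reporting the link
            significance: str      # e.g. an ACMG-style category

        @dataclass
        class VariantDiseaseAssociation:
            variant: Variant
            disease: str
            evidence: list[Evidence] = field(default_factory=list)

            def is_clinically_valid(self) -> bool:
                # Deliberately simple rule: keep only associations backed
                # by at least one piece of curated evidence.
                return len(self.evidence) > 0

        scn1a = Gene("SCN1A", "2")
        assoc = VariantDiseaseAssociation(
            Variant(scn1a, "c.1A>G"), "Dravet syndrome",
            [Evidence("ClinVar", "pathogenic")])
        print(assoc.is_clinically_valid())  # True

    Making the evidence an explicit entity, rather than an attribute buried in free text, is what lets well-sustained associations be separated mechanically from weakly supported ones.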

    Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

    Biomedical taxonomies, thesauri, and ontologies, such as the International Classification of Diseases (ICD) as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing, and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also grown significantly in size. For example, the 11th revision of the ICD, currently under active development by the WHO, contains nearly 50,000 classes representing a vast variety of diseases and causes of death. This evolution in size has been accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners, and other stakeholders. Understanding how these stakeholders collaborate will enable us to improve the editing environments that support such collaborations. We uncover how large ontology-engineering projects, such as the ICD in its 11th revision, unfold by analyzing the usage logs of five biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users subsequently change) suggesting that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences between projects that have implications for project managers, ontology editors, developers, and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain. Comment: published in the Journal of Biomedical Informatics.
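
    The core technique is simple to sketch: treat each editing session as a sequence of actions and estimate a first-order Markov chain from the observed transitions. In the sketch below the action names are hypothetical stand-ins for the logged property changes, not the papers' actual log vocabulary.

        # Estimate P(next action | current action) from toy session logs.
        from collections import Counter, defaultdict

        sessions = [
            ["create_class", "edit_title", "edit_definition", "edit_title"],
            ["edit_title", "edit_definition", "add_synonym"],
            ["create_class", "edit_title", "add_synonym"],
        ]

        # Count observed transitions between consecutive actions.
        transitions = defaultdict(Counter)
        for session in sessions:
            for current, nxt in zip(session, session[1:]):
                transitions[current][nxt] += 1

        # Normalize counts into transition probabilities.
        chain = {
            state: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
            for state, counts in transitions.items()
        }

        print(chain["edit_title"])
        # {'edit_definition': 0.666..., 'add_synonym': 0.333...}

    Rows of this transition matrix are exactly the "which properties users subsequently change" patterns the abstract refers to.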

    Vocabulary Evolution on the Semantic Web: From Changes to Evolution of Vocabularies and its Impact on the Data

    The main objective of the Semantic Web is to provide data on the web with well-defined meaning. Vocabularies are used for modeling data on the web; they provide a shared understanding of a domain and consist of a collection of types and properties, so-called terms. A vocabulary can import terms from other vocabularies, and data publishers use vocabulary terms for modeling data. Importing terms across vocabularies results in a Network of Linked vOcabularies (NeLO). Vocabularies are subject to change during their lifetime, and when they change, the published data become a problem if they are not updated to reflect those changes. So far, there has been no study that analyzes vocabulary changes over time, and it is unknown how data publishers react to such changes. Ontology engineers and data publishers may not be aware of changes in vocabulary terms, since such changes occur rather rarely. This work addresses the problem of vocabulary changes and their impact on other vocabularies and on published data. We analyzed the changes of vocabularies and their reuse, selecting the most dominant vocabularies based on their use by data publishers, and additionally analyzed the changes of 994 vocabularies from the Linked Open Vocabularies directory. Furthermore, we analyzed various vocabularies to better understand by whom and how they are used in the modeled data, and how these changes are adopted in the Linked Open Data cloud. We computed the state of the NeLO from the available versions of vocabularies over 17 years. We analyzed static parameters of the NeLO such as its size, density, average degree, and the most important vocabularies at certain points in time. We further investigated how the NeLO changes over time, specifically measuring the impact of a change in one vocabulary on others, how the reuse of terms changes, and the importance of vocabulary changes. Our results show that the vocabularies are highly static and that many of the changes occurred in annotation properties. Additionally, 16% of the existing terms are reused by other vocabularies, and some deprecated and deleted terms are still reused. Furthermore, most newly coined terms are adopted immediately. Our results show that even if the change frequency of terms is rather low, changes can have a high impact on the data due to the large amount of data on the web. Moreover, due to the large number of vocabularies in the NeLO, and therefore the increase in available terms, the percentage of imported terms compared with available ones has decreased over time. Additionally, based on the average number of exports per vocabulary in the NeLO, some vocabularies have become more popular over time. Overall, understanding the evolution of vocabulary terms is important for ontology engineers and data publishers to avoid wrong assumptions about the data published on the web. It may also foster a better understanding of the impact of changes in vocabularies and how they are adopted, making it possible to learn from previous experience. Our results provide, for the first time, in-depth insights into the structure and evolution of the NeLO. Supported by proper tools exploiting this analysis, they may help ontology engineers identify data-modeling shortcomings and assess the dependencies implied by reusing a specific vocabulary.
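
    The static parameters reported for the NeLO (size, density, average degree) are standard directed-graph measures. The sketch below computes them over a toy import network; the vocabulary names are placeholders, not the thesis's data.

        # Edge (a, b) means vocabulary a imports a term from vocabulary b.
        imports = [("voc1", "foaf"), ("voc1", "dcterms"),
                   ("voc2", "foaf"), ("voc3", "voc1")]

        nodes = {v for edge in imports for v in edge}
        n, m = len(nodes), len(imports)

        # Density of a directed graph: fraction of the n*(n-1) possible edges.
        density = m / (n * (n - 1))

        # Average degree: each edge adds one in-degree and one out-degree.
        avg_degree = 2 * m / n

        print(f"size={n} edges={m} density={density:.3f} avg_degree={avg_degree:.2f}")
        # size=5 edges=4 density=0.200 avg_degree=1.60

    Tracking these quantities per vocabulary version, rather than once, is what turns the static picture into the evolution analysis the thesis describes.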

    Ontology evolution: a process-centric survey

    Ontology evolution aims at keeping an ontology up to date with respect to changes in the domain that it models or novel requirements of the information systems it enables. The recent industrial adoption of Semantic Web techniques, which rely on ontologies, has increased the importance of ontology evolution research. Typical approaches to ontology evolution are designed as multiple-stage processes combining techniques from a variety of fields (e.g., natural language processing and reasoning). However, the few existing surveys on this topic lack an in-depth analysis of the various stages of the ontology evolution process. This survey extends the literature by adopting a process-centric view of ontology evolution. Accordingly, we first provide an overall process model synthesized from an overview of the existing models in the literature. We then survey the major approaches to each step in this process and conclude with future challenges for the techniques addressing that particular stage.
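
    As a rough sketch of the process-centric view, the snippet below chains hypothetical stages (need detection, change suggestion, validation, application) into a pipeline. The stage names and stub logic are our own simplification, not the survey's exact process model.

        # Hypothetical multi-stage ontology-evolution pipeline.
        Ontology = dict  # toy stand-in: {concept: definition}

        def detect_need(onto: Ontology) -> list[str]:
            """Stage 1: detect terms the ontology is missing (stubbed)."""
            return ["synapse"]

        def suggest_changes(needs: list[str]) -> list[tuple]:
            """Stage 2: propose one change operation per detected need."""
            return [("add", term, "definition pending curation") for term in needs]

        def validate(changes: list[tuple]) -> list[tuple]:
            """Stage 3: keep only well-formed change operations."""
            return [c for c in changes if c[0] in {"add", "remove", "rename"}]

        def apply_changes(onto: Ontology, changes: list[tuple]) -> Ontology:
            """Stage 4: produce the next ontology version."""
            evolved = dict(onto)
            for op, term, definition in changes:
                if op == "add":
                    evolved[term] = definition
            return evolved

        onto = {"neuron": "electrically excitable cell"}
        print(apply_changes(onto, validate(suggest_changes(detect_need(onto)))))

    Each stage is where the survey slots in the field-specific techniques: natural language processing for need detection and suggestion, reasoning for validation, and version management for application.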