    Towards Semantic APIs for Research Data Services

    The rapid development of Internet and Web technology is changing the state of the art in the communication of knowledge and research results. In particular, semantic technologies and linked and open data are becoming key enablers of successful and efficient progress in research. First, I define the research data service (RDS) and discuss typical current and possible future usage scenarios involving RDS. I then discuss the state of the art in semantic service and data annotation and in API construction, as well as infrastructural solutions applicable to RDS realisation. Finally, innovative methods for the online dissemination, promotion and efficient communication of research are discussed.

    Enhanced publications: aspects of research data integration (Publicações ampliadas: aspectos da integração de dados de pesquisa)

    The main objective of this research was to propose an integration landscape between research data repositories and scientific publications, based on the enhanced publication model. By nature, the research is basic. Its approach is qualitative, and it is characterized as exploratory and bibliographic research as well as a case study. The analysis model of this study comprised three phases. In the first phase, literature specific to the field was collected through a systematic review, and the requirements indicated and suggested for an enhanced publication model were extracted from it. In the second phase, research data repositories were selected through the re3data.org directory, and the information made available through its filters was examined for evidence of links based on the enhanced publication model. In the third phase, the repository interfaces were inspected and the elements each repository includes were identified, so that the elements used in the repositories could then be analysed, observed and compared with the requirements established through the systematic review. The literature shows an attempt to add a structure using requirements related to metadata and semantic relationships, among others. Part of the structure indicated in the literature is already present in the interfaces of some repositories identified in the re3data.org directory. However, the repositories that include part of this structure are a minority.
There is thus a growing need for an infrastructure compatible with enhanced publications, which makes the application of requirements to the interfaces of research data repositories extremely important. The cross-checking of information carried out in this study gave a better understanding of how important each of the stipulated requirements is, and of the consequences when these requirements are not met in repository interfaces. Research data repositories should provide, in their interfaces, links and access to the scientific publications associated with the recorded data, so that the studies carried out with these data can be understood. The study showed that this does not happen in most repositories, which makes it difficult to understand the origin of the data and the context in which they were used. The repositories that did not meet the main established requirements showed characteristics of conventional publications, such as PDF documents and HTML texts, and did not conform to semantic web standards. This proposal for a research data integration landscape based on enhanced publications aims to suggest an interface that fulfils all the requirements specified in this research, in order to obtain an infrastructure compatible with enhanced publications. The investigation of the topic made it possible to broaden knowledge and identify the appropriate elements for proposing an integration landscape between a research data repository and scientific publications based on an enhanced publication model, as well as to highlight the advantages of using this model.
It is concluded that a data set available in a digital repository that meets the requirements improves information retrieval and, consequently, increases the visibility of the publication and its authors.

    Content Enrichment of Digital Libraries: Methods, Technologies and Implementations

    Parallel to the establishment of the concept of a "digital library", there have been rapid developments in the fields of semantic technologies, information retrieval and artificial intelligence. The idea is to make use of these three fields to crosslink bibliographic data, i.e., library content, and to enrich it "intelligently" with additional, especially non-library, information. By linking the contents of a library, it is possible to offer users access to semantically similar contents of different digital libraries. For instance, starting from a given publication, a list of semantically similar publications from completely different subject areas and from different digital libraries can be made accessible. In addition, users can view a broader author profile, enriched with information such as biographical details, name alternatives, images, job titles, institute affiliations, etc. This information comes from a wide variety of sources, most of which are not library sources. In order to make such scenarios a reality, this dissertation follows two approaches. The first approach is about crosslinking digital library content in order to offer semantically similar publications based on additional information for a publication. Hence, this approach uses publication-related metadata as a basis. Aligned terms between linked open data repositories and thesauri serve as an important starting point: narrower, broader and related concepts are taken into account through semantic data models such as SKOS. Information retrieval methods are applied to identify publications with high semantic similarity. For this purpose, vector space models and "word embedding" approaches are applied and analyzed comparatively. The analyses are performed in digital libraries with different thematic focuses (e.g. economics and agriculture). Using machine learning techniques, metadata is enriched, e.g. with synonyms for content keywords, in order to further improve similarity calculations.
To ensure quality, the proposed approaches are analyzed comparatively with different metadata sets, and the results are assessed by experts. Combining different information retrieval methods can further improve the quality of the results. This is especially true when user interaction offers possibilities for adjusting the search properties. In the second approach pursued in this dissertation, author-related data are harvested in order to generate a comprehensive author profile for a digital library. For this purpose, non-library sources, such as linked data repositories (e.g. Wikidata), and library sources, such as authority data, are used. When such different sources are used, author names must be disambiguated using already existing persistent identifiers. To this end, we offer an algorithmic approach to disambiguate authors, which makes use of authority data such as the Virtual International Authority File (VIAF). With regard to computer science, the methodological value of this dissertation lies in the combination of semantic technologies with methods of information retrieval and artificial intelligence to increase the interoperability between digital libraries and between libraries and non-library sources. By positioning this dissertation as an application-oriented contribution to improving interoperability, two major contributions are made in the context of digital libraries: (1) Information from different digital libraries can be retrieved via a single access point. (2) Existing information about authors is collected from different sources and aggregated into one author profile.
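
As a rough illustration of the first approach in the abstract above (keyword metadata expanded with related thesaurus concepts, then compared in a vector space), the following minimal sketch may help. It is not the dissertation's implementation: the concept table `RELATED_CONCEPTS`, the weight `EXPANSION_WEIGHT`, and the function names are all assumptions made up for this example, and a real system would read broader, narrower and related concepts from a SKOS thesaurus rather than a hard-coded dictionary.

```python
import math
from collections import Counter

# Hypothetical SKOS-style relations (broader / narrower / related concepts),
# hard-coded for illustration only; not taken from any real thesaurus.
RELATED_CONCEPTS = {
    "wheat": ["cereals", "crop production"],
    "cereals": ["wheat", "crop production"],
    "inflation": ["monetary policy"],
}

# Assumed down-weighting for concepts that were added by expansion.
EXPANSION_WEIGHT = 0.5


def expand(keywords):
    """Return a weighted term vector: original keywords plus related concepts."""
    vector = Counter()
    for keyword in keywords:
        vector[keyword] += 1.0
        for concept in RELATED_CONCEPTS.get(keyword, []):
            vector[concept] += EXPANSION_WEIGHT
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_keywords, catalogue):
    """Rank catalogue entries (title -> keywords) by similarity to the query."""
    query_vector = expand(query_keywords)
    scored = [
        (cosine(query_vector, expand(keywords)), title)
        for title, keywords in catalogue.items()
    ]
    return sorted(scored, reverse=True)
```

In this sketch, a query tagged only with "wheat" and a publication tagged only with "cereals" share no literal keyword, yet they receive a non-zero similarity because the expansion step adds their related concepts to both vectors. Without expansion, their cosine similarity would be zero.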

    A Vision for Open Cyber-Scholarly Infrastructures

    The characteristics of modern science, i.e., data-intensive, multidisciplinary, open, and heavily dependent on Internet technologies, entail the creation of a linked scholarly record that is online and open. Instrumental in making this vision happen is the development of the next generation of Open Cyber-Scholarly Infrastructures (OCIs), i.e., enablers of an open, evolvable, and extensible scholarly ecosystem. The paper delineates the evolving scenario of the modern scholarly record and describes the functionality of future OCIs as well as the radical changes in scholarly practices, including new reading, learning, and information-seeking practices, enabled by OCIs.
