
    Distributed human computation framework for linked data co-reference resolution

    Distributed Human Computation (DHC) is a technique for solving computational problems by incorporating the collaborative effort of a large number of humans. It is also a solution to AI-complete problems such as natural language processing. The Semantic Web, with its roots in AI, is envisioned as a decentralised world-wide information space for sharing machine-readable data with minimal integration costs. Many research problems in the Semantic Web are considered AI-complete. An example is co-reference resolution, which involves determining whether different URIs refer to the same entity; this is a significant hurdle to the realisation of large-scale Semantic Web applications. In this paper, we propose a framework for building a DHC system on top of the Linked Data Cloud to solve various computational problems. To demonstrate the concept, we focus on co-reference resolution in the Semantic Web when integrating distributed datasets. The traditional approach is to design machine-learning algorithms, but these are often computationally expensive, error-prone, and do not scale. We designed a DHC system named iamResearcher, which solves the author-identity co-reference problem that arises when integrating distributed bibliographic datasets. In our system, we aggregated 6 million bibliographic records from various publication repositories. Users can sign up to the system to audit and align their own publications, thus solving the co-reference problem in a distributed manner. The aggregated results are published to the Linked Data Cloud.
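    The co-reference step described above is conventionally recorded in Linked Data as an owl:sameAs link between the two URIs. The sketch below assumes rdflib and uses made-up repository URIs (not iamResearcher's actual vocabulary); it shows the kind of triple such a system would emit once a user has audited and confirmed a match:

```python
# Minimal sketch: recording a human-confirmed co-reference as owl:sameAs.
# Assumes rdflib; the author URIs below are hypothetical examples.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

def record_coreference(graph: Graph, uri_a: str, uri_b: str) -> None:
    """Assert that two URIs refer to the same entity.

    In a DHC setting, this triple is emitted only after a human
    has audited the pair and confirmed the match.
    """
    graph.add((URIRef(uri_a), OWL.sameAs, URIRef(uri_b)))

g = Graph()
record_coreference(
    g,
    "http://repo-a.example.org/author/j-smith",     # hypothetical URI
    "http://repo-b.example.org/person/smith-john",  # hypothetical URI
)
print(g.serialize(format="turtle"))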

    Mejorando la Ciencia Abierta Usando Datos Abiertos Enlazados: Caso de Uso CONICET Digital

    Scientific publication services are changing drastically: researchers demand intelligent search services to discover and relate scientific publications, and publishers need to incorporate semantic information to better organize their digital assets and make publications more discoverable. In this paper, we present ongoing work to publish a subset of the scientific publications of CONICET Digital as Linked Open Data. The objective of this work is to improve the retrieval and reuse of data through Semantic Web and Linked Data technologies in the domain of scientific publications. To achieve these goals, Semantic Web standards and reference RDF schemas (Dublin Core, FOAF, VoID, etc.) have been taken into account. The conversion and publication process is guided by the methodological guidelines for publishing government linked data. We also outline how these data can be linked to other datasets on the Web of Data, such as DBLP, Wikidata, and DBpedia. Finally, we show some examples of queries that answer questions that CONICET Digital initially does not support.
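    As an illustration of the modelling described above, the following sketch (assuming rdflib; all URIs and literal values are invented, not actual CONICET Digital data) expresses one publication record with the Dublin Core and FOAF vocabularies and adds a cross-dataset link of the kind the paper outlines:

```python
# Illustrative sketch only: one invented publication record expressed
# with Dublin Core and FOAF, plus an owl:sameAs link to Wikidata.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, FOAF, OWL, RDF

g = Graph()
pub = URIRef("http://conicet.example.org/publication/1234")  # hypothetical
author = URIRef("http://conicet.example.org/author/42")      # hypothetical

g.add((pub, RDF.type, DCTERMS.BibliographicResource))
g.add((pub, DCTERMS.title, Literal("Ejemplo de publicación", lang="es")))
g.add((pub, DCTERMS.creator, author))
g.add((author, RDF.type, FOAF.Person))
g.add((author, FOAF.name, Literal("M. Zárate")))
# Cross-dataset link as described above (placeholder Wikidata entity ID).
g.add((author, OWL.sameAs, URIRef("http://www.wikidata.org/entity/Q1")))

print(g.serialize(format="turtle"))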

    Community next steps for making globally unique identifiers work for biocollections data

    Biodiversity data are being digitized and made available online at a rapidly increasing rate, but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and the assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has been neither coalescence towards a single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. To make further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps for overcoming current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge these domains. The main outcome was consensus on key issues, including recognition of the differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with an identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
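    As a rough illustration of the identifier metadata profiles discussed above (a hypothetical sketch, not a scheme from the workshop report), such a profile would tie a globally unique identifier to the type of object it denotes and to a statement of its persistence mission:

```python
# Hypothetical sketch of an identifier metadata profile; the field names
# are invented for illustration and are not from the workshop report.
import uuid
from dataclasses import dataclass

@dataclass
class IdentifierProfile:
    identifier: str           # the globally unique identifier itself
    object_type: str          # e.g. "PhysicalSpecimen" vs. "DigitalRecord"
    persistence_mission: str  # stated commitment to keep the id resolvable
    resolver: str             # service expected to resolve the identifier

def mint_field_identifier() -> str:
    """Mint a UUID-based identifier at the point of collection in the field."""
    return f"urn:uuid:{uuid.uuid4()}"

profile = IdentifierProfile(
    identifier=mint_field_identifier(),
    object_type="PhysicalSpecimen",
    persistence_mission="Maintained for the lifetime of the holding collection",
    resolver="https://resolver.example.org",  # hypothetical service
)
print(profile)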

    Towards OpenMath Content Dictionaries as Linked Data

    "The term 'Linked Data' refers to a set of best practices for publishing and connecting structured data on the web". Linked Data make the Semantic Web work practically, which means that information can be retrieved without complicated lookup mechanisms, that a lightweight semantics enables scalable reasoning, and that the decentral nature of the Web is respected. OpenMath Content Dictionaries (CDs) have the same characteristics - in principle, but not yet in practice. The Linking Open Data movement has made a considerable practical impact: Governments, broadcasting stations, scientific publishers, and many more actors are already contributing to the "Web of Data". Queries can be answered in a distributed way, and services aggregating data from different sources are replacing hard-coded mashups. However, these services are currently entirely lacking mathematical functionality. I will discuss real-world scenarios, where today's RDF-based Linked Data do not quite get their job done, but where an integration of OpenMath would help - were it not for certain conceptual and practical restrictions. I will point out conceptual shortcomings in the OpenMath 2 specification and common bad practices in publishing CDs and then propose concrete steps to overcome them and to contribute OpenMath CDs to the Web of Data.Comment: Presented at the OpenMath Workshop 2010, http://cicm2010.cnam.fr/om

    The INCF Digital Atlasing Program: Report on Digital Atlasing Standards in the Rodent Brain

    The goal of the INCF Digital Atlasing Program is to provide the vision and direction necessary to make the rapidly growing collection of multidimensional data of the rodent brain (images, gene expression, etc.) widely accessible and usable to the international research community. The Digital Brain Atlasing Standards Task Force was formed in May 2008 to investigate the state of rodent brain digital atlasing and to formulate standards, guidelines, and policy recommendations.

Our first objective has been the preparation of a detailed document that includes the vision and a specific description of an infrastructure, systems, and methods capable of serving the scientific goals of the community, as well as practical issues in achieving those goals. This report builds on the 1st INCF Workshop on Mouse and Rat Brain Digital Atlasing Systems (Boline et al., 2007, Nature Precedings, doi:10.1038/npre.2007.1046.1) and includes a more detailed analysis of both the current state and the desired state of digital atlasing, along with specific recommendations for achieving these goals.

    A Molecular Biology Database Digest

    Computational Biology, or Bioinformatics, has been defined as the application of mathematical and Computer Science methods to solving problems in Molecular Biology that require large-scale data, computation, and analysis [18]. As expected, Molecular Biology databases play an essential role in Computational Biology research and development. This paper gives an introduction to current Molecular Biology databases, with emphasis on data modeling, data acquisition, data retrieval, and the integration of Molecular Biology data from different sources. It is primarily intended for an audience of computer scientists with a limited background in Biology.

    A web-based teaching/learning environment to support collaborative knowledge construction in design

    A web-based application has been developed as part of a recently completed research project that proposed a conceptual framework to collect, analyze, and compare different design experiences and to construct structured representations of the emerging knowledge in digital architectural design. The paper introduces the theoretical and practical development of this application as a teaching/learning environment, which has significantly contributed to the development and testing of the ideas developed throughout the research. Later in the paper, the application of BLIP in two experimental design workshops is reported and evaluated according to the extent to which the application facilitates the generation, modification, and utilization of design knowledge.

    The Scholarly Infrastructure Technical Summit @ eResearch Australasia 2011

    Scholarly Infrastructure Technical Summit (SITS) meetings are designed to share technical operational experience between CTOs, lead developers, and head system administrators, so that internationally we all stay DRY (Don't Repeat Yourself), i.e. saving technical operational money by learning from each other's previous experiences. The fourth iteration of the international Scholarly Infrastructure Technical Summit, following events in London, California, and Geneva, took place in Australia alongside the 2011 eResearch conference (eXtreme eResearch). The eResearch conference brings together CTOs and CIOs from various universities whose job it is to provide, work with, and support IT services for scientific research projects. This was a first for SITS, which had previously seen a mix of researchers and those working on the fringes of institutional IT infrastructure. The Australian focus on supporting research directly thus provided the fourth new platform for the SITS meetings, the one closest to the infrastructure itself.

    mSpace meets EPrints: a Case Study in Creating Dynamic Digital Collections

    In this case study we look at the issues involved in (a) generating dynamic digital libraries that are on a particular topic but span heterogeneous collections at distinct sites, (b) supplementing the artefacts in such a collection with additional information available either from databases at the artefact's home or from the Web at large, and (c) providing an interaction paradigm that supports effective exploration of this new resource. We describe how we used two available frameworks, mSpace and EPrints, to support this kind of collection building. The result of the study is a set of recommendations to improve the connectivity of remote resources, both to one another and to related Web resources, and to reduce problems such as co-referencing, in order to enable the creation of new collections on demand.
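    One plausible way to build such cross-site collections, sketched below under the assumption that both repositories expose the standard EPrints OAI-PMH endpoint (the URLs themselves are hypothetical), is to harvest Dublin Core records from each site and merge them:

```python
# Sketch under stated assumptions: harvest Dublin Core titles from two
# EPrints repositories over OAI-PMH and merge them into one collection.
# The endpoint URLs are hypothetical.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(endpoint: str) -> list[str]:
    """Fetch one page of oai_dc records and return their titles."""
    url = f"{endpoint}?verb=ListRecords&metadataPrefix=oai_dc"
    with urlopen(url) as response:
        tree = ET.parse(response)
    return [t.text for t in tree.iter(f"{DC_NS}title") if t.text]

# Merge records from two hypothetical EPrints sites into one collection.
collection = []
for site in ("https://eprints-a.example.org/cgi/oai2",
             "https://eprints-b.example.org/cgi/oai2"):
    collection.extend(harvest_titles(site))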