22,507 research outputs found

    Discovering Links for Metadata Enrichment on Computer Science Papers

    At the very beginning of compiling a bibliography, usually only basic information about an item, such as its title, authors, and publication date, is known. In order to gather additional information about a specific item, one typically has to search the library catalog or use a web search engine. This look-up procedure implies a manual effort for every single item of a bibliography. In this technical report we present a proof of concept which utilizes Linked Data technology for the simple enrichment of sparse metadata sets. This is done by discovering owl:sameAs links between an initial set of computer science papers and resources from external data sources such as DBLP, ACM and the Semantic Web Conference Corpus. We demonstrate how the link discovery tool Silk is used to detect additional information and to enrich an initial set of records in the computer science domain. The pros and cons of Silk as a link discovery tool are summarized at the end. Comment: 22 pages, 4 figures, 7 listings, presented at SWIB1
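
    As a concrete illustration of the kind of owl:sameAs link generation that a tool such as Silk automates through declarative link specifications, the following Python sketch matches local records against an external source by title similarity and serializes the resulting links. The URIs, titles, and the 0.9 similarity threshold are illustrative assumptions, not the report's actual configuration.

        # A minimal sketch assuming a small in-memory record set; Silk itself is driven
        # by declarative link specifications rather than code like this.
        from difflib import SequenceMatcher

        from rdflib import Graph, URIRef
        from rdflib.namespace import OWL

        local_records = {  # hypothetical local bibliography entries
            "http://example.org/papers/42": "Discovering Links for Metadata Enrichment",
        }
        external_records = {  # hypothetical DBLP-style resources
            "https://dblp.org/rec/conf/example/Paper42": "Discovering links for metadata enrichment",
        }

        links = Graph()
        for local_uri, local_title in local_records.items():
            for ext_uri, ext_title in external_records.items():
                similarity = SequenceMatcher(None, local_title.lower(), ext_title.lower()).ratio()
                if similarity >= 0.9:  # assumed acceptance threshold
                    links.add((URIRef(local_uri), OWL.sameAs, URIRef(ext_uri)))

        # Emit the discovered links as N-Triples for later enrichment steps.
        print(links.serialize(format="nt"))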

    Hypermedia-based discovery for source selection using low-cost linked data interfaces

    Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often only marginally discussed, even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seeds, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness.
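
    The discovery step can be pictured as a breadth-first traversal that starts from a seed interface, parses the RDF metadata each interface returns, and follows the links it advertises. The sketch below assumes a hypothetical seed URL and treats void:subset and rdfs:seeAlso as the link relations to follow; the paper's actual approach relies on the hypermedia controls exposed by Triple Pattern Fragments interfaces.

        # A conceptual sketch, not the paper's exact algorithm: breadth-first discovery of
        # Linked Data interfaces by following links advertised in their RDF responses.
        from collections import deque

        import requests
        from rdflib import Graph, URIRef

        SEED_URL = "http://example.org/fragments/dataset1"  # hypothetical seed interface
        LINK_PREDICATES = {  # assumed link relations to follow
            URIRef("http://rdfs.org/ns/void#subset"),
            URIRef("http://www.w3.org/2000/01/rdf-schema#seeAlso"),
        }

        def discover(seed, max_interfaces=50):
            discovered, queue = set(), deque([seed])
            while queue and len(discovered) < max_interfaces:
                url = queue.popleft()
                if url in discovered:
                    continue
                discovered.add(url)
                response = requests.get(url, headers={"Accept": "text/turtle"}, timeout=10)
                graph = Graph()
                graph.parse(data=response.text, format="turtle")
                for _, predicate, obj in graph:
                    if predicate in LINK_PREDICATES and isinstance(obj, URIRef):
                        queue.append(str(obj))
            return discovered

        # print(discover(SEED_URL))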

    Liberating links between datasets using lightweight data publishing: an example using plant names and the taxonomic literature

    Constructing a biodiversity knowledge graph will require making millions of cross-links between biodiversity entities in different datasets. Researchers trying to bootstrap the growth of the biodiversity knowledge graph by constructing databases of links between these entities lack obvious ways to publish these sets of links. One appealing and lightweight approach is to create a "datasette", a database that is wrapped together with a simple web server that enables users to query the data. Datasettes can be packaged into Docker containers and hosted online with minimal effort. This approach is illustrated using a dataset of links between globally unique identifiers for plant taxonomic names and identifiers for the taxonomic articles that published those names.
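
    A minimal sketch of this lightweight-publishing workflow: load identifier pairs into a SQLite database and then serve or package it with the Datasette tool. The table layout and the example identifier pair are illustrative assumptions, not the published dataset itself.

        # Build a small SQLite database of cross-links that Datasette can serve.
        import sqlite3

        links = [
            ("urn:lsid:ipni.org:names:123456-1", "10.1234/example.article"),  # hypothetical pair
        ]

        conn = sqlite3.connect("plant_name_links.db")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS links (plant_name_id TEXT, publication_id TEXT)"
        )
        conn.executemany("INSERT INTO links VALUES (?, ?)", links)
        conn.commit()
        conn.close()

        # Then, with the Datasette tool installed (see its documentation for details):
        #   datasette plant_name_links.db                    # serve a queryable web interface
        #   datasette package plant_name_links.db -t links   # wrap database + server into a Docker image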

    From the User to the Medium: Neural Profiling Across Web Communities

    Online communities provide a unique way for individuals to access information from those in similar circumstances, which can be critical for health conditions that require daily and personalized management. As these groups and topics often arise organically, identifying the types of topics discussed is necessary to understand their needs. Moreover, these communities and the people in them can be quite diverse, and existing community detection methods have not been extended towards evaluating these heterogeneities, in part because they have not focused on detecting communities based on semantic relations between textual features of the user-generated content. We therefore develop an approach, NeuroCom, that optimally finds dense groups of users as communities in a latent space inferred by a neural representation of the content users publish. By embedding words and messages, we show that NeuroCom demonstrates improved clustering and identifies more nuanced discussion topics in contrast to other common unsupervised learning approaches.
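
    The general pipeline the abstract describes, embedding users' messages into a latent space and finding dense groups there, can be sketched as follows. NeuroCom's neural representation is not reproduced here; a TF-IDF plus truncated-SVD embedding and DBSCAN clustering stand in purely as placeholders, and the example messages are invented.

        # A stand-in sketch of "embed content, then find dense groups in the latent space".
        from sklearn.cluster import DBSCAN
        from sklearn.decomposition import TruncatedSVD
        from sklearn.feature_extraction.text import TfidfVectorizer

        messages = [
            "tips for managing blood sugar after meals",
            "how do you adjust insulin before exercise",
            "best running shoes for flat feet",
            "marathon training plan for beginners",
        ]

        vectors = TfidfVectorizer(stop_words="english").fit_transform(messages)
        latent = TruncatedSVD(n_components=2, random_state=0).fit_transform(vectors)
        labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(latent)

        for message, label in zip(messages, labels):
            print(label, message)  # messages sharing a label form one dense group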

    Km4City Ontology Building vs Data Harvesting and Cleaning for Smart-city Services

    Presently, a very large number of public and private data sets are available from local governments. In most cases, they are not semantically interoperable, and a huge human effort would be needed to create integrated ontologies and a knowledge base for a smart city. The smart-city ontology is not yet standardized, and a lot of research work is needed to identify models that can easily support data reconciliation and the management of complexity, and that allow reasoning over the data. In this paper, a system for the ingestion and reconciliation of data on smart-city aspects such as the road graph, services available on the roads, and traffic sensors is proposed. The system manages a large volume of data coming from a variety of sources, covering both static and dynamic data. These data are mapped to a smart-city ontology, called KM4City (Knowledge Model for City), and stored in an RDF store where they are available to applications via SPARQL queries, providing new services to users through specific applications of public administrations and enterprises. The paper presents the process adopted to produce the ontology and the big data architecture for feeding the knowledge base from open and private data, together with the mechanisms adopted for data verification, reconciliation and validation. Some examples of possible uses of the coherent big data knowledge base produced, accessible from the RDF store and related services, are also offered. The article also presents the work performed on reconciliation algorithms and their comparative assessment and selection.
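
    Applications consume the resulting knowledge base through SPARQL. The sketch below shows how such a query might be issued from Python; the endpoint URL, prefix, and class and property names are illustrative assumptions standing in for the actual Km4City vocabulary.

        # A hedged sketch of querying the RDF store from an application via SPARQL.
        from SPARQLWrapper import SPARQLWrapper, JSON

        ENDPOINT = "http://example.org/km4city/sparql"  # hypothetical SPARQL endpoint

        query = """
        PREFIX km4c: <http://example.org/km4city/ontology#>
        SELECT ?service ?name WHERE {
          ?service a km4c:Service ;
                   km4c:name ?name ;
                   km4c:onRoad ?road .
        } LIMIT 10
        """

        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(query)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()

        for binding in results["results"]["bindings"]:
            print(binding["service"]["value"], binding["name"]["value"])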

    Report of the Stanford Linked Data Workshop

    The Stanford University Libraries and Academic Information Resources (SULAIR), with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, that was published originally for workshop participants as background to the workshop and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced: 1. a value statement addressing the question of why a Linked Data approach is worth prototyping; 2. a manifesto for Linked Libraries (and Museums and Archives and …); 3. an outline of the phases in a life cycle of Linked Data approaches; 4. a prioritized list of known issues in generating, harvesting & using Linked Data; 5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs; 6. examples of potential “killer apps” using Linked Data; and 7. a list of next steps and potential projects. This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.

    Efficient Discovery of Ontology Functional Dependencies

    Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors often require user input to resolve due to domain expertise defining specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms. Comment: 12 pages
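
    The core idea can be illustrated with a toy check: a dependency X -> Y holds as an OFD if tuples that agree on X have Y values that are equivalent under the ontology (for instance, synonyms), rather than syntactically equal. The tiny synonym table and relation below are invented examples, not the paper's discovery algorithm.

        # Illustrative OFD satisfaction check under synonym equivalence.
        from collections import defaultdict

        SYNONYMS = {  # assumed synonym classes from a pharmaceutical ontology
            "advil": "ibuprofen",
            "ibuprofen": "ibuprofen",
            "tylenol": "acetaminophen",
            "acetaminophen": "acetaminophen",
        }

        def canonical(value):
            return SYNONYMS.get(value.lower(), value.lower())

        def satisfies_ofd(rows, lhs, rhs):
            """Check whether lhs -> rhs holds when rhs values are compared up to synonyms."""
            seen = defaultdict(set)
            for row in rows:
                seen[row[lhs]].add(canonical(row[rhs]))
            return all(len(values) == 1 for values in seen.values())

        relation = [
            {"prescription": "P1", "drug": "Advil"},
            {"prescription": "P1", "drug": "Ibuprofen"},  # equivalent under the ontology
            {"prescription": "P2", "drug": "Tylenol"},
        ]

        print(satisfies_ofd(relation, "prescription", "drug"))  # True under synonym equivalence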

    trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R

    Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.
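
    As a language-neutral illustration of the underlying idea (trackr itself is an R framework, and its actual API is not shown here), the following Python sketch annotates an existing artifact file with automatically extractable metadata and appends the record to a searchable store. All field names are assumptions.

        # Conceptual sketch: auto-annotate an artifact and store a searchable record.
        import getpass
        import hashlib
        import json
        import platform
        from datetime import datetime, timezone
        from pathlib import Path

        def record_artifact(path, tags=(), store="artifact_records.jsonl"):
            """Append a metadata record for an existing artifact file."""
            data = Path(path).read_bytes()
            record = {
                "file": str(path),
                "sha256": hashlib.sha256(data).hexdigest(),
                "size_bytes": len(data),
                "recorded": datetime.now(timezone.utc).isoformat(),
                "user": getpass.getuser(),
                "platform": platform.platform(),
                "tags": list(tags),
            }
            with open(store, "a") as out:
                out.write(json.dumps(record) + "\n")
            return record

        # record_artifact("figure1.png", tags=["scatterplot", "experiment-3"])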

    The linked data strategy for global identity

    The Web's promise for planet-scale data integration depends on solving the thorny problem of identity: given one or more possible identifiers, how can we determine whether they refer to the same or different things? Here, the authors discuss various ways to deal with the identity problem in the context of Linked Data.
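
    One common way to operationalize identity in Linked Data is to treat owl:sameAs statements as equivalences and merge identifiers into identity clusters, for example with a union-find structure, as in the sketch below. The example URIs are invented, and the article surveys several strategies rather than prescribing this one.

        # Cluster identifiers connected by owl:sameAs links using union-find.
        class UnionFind:
            def __init__(self):
                self.parent = {}

            def find(self, x):
                self.parent.setdefault(x, x)
                if self.parent[x] != x:
                    self.parent[x] = self.find(self.parent[x])  # path compression
                return self.parent[x]

            def union(self, a, b):
                self.parent[self.find(a)] = self.find(b)

        same_as_links = [  # hypothetical owl:sameAs pairs
            ("http://example.org/dataset-a/alan-turing", "http://example.org/dataset-b/person/turing"),
            ("http://example.org/dataset-b/person/turing", "http://example.org/dataset-c/id/0042"),
        ]

        uf = UnionFind()
        for a, b in same_as_links:
            uf.union(a, b)

        clusters = {}
        for identifier in {x for pair in same_as_links for x in pair}:
            clusters.setdefault(uf.find(identifier), []).append(identifier)
        print(clusters)  # all three identifiers fall into one identity cluster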