111,920 research outputs found

    Distant Supervision for Entity Linking

    Full text link
    Entity linking is an indispensable operation of populating knowledge repositories for information extraction. It studies on aligning a textual entity mention to its corresponding disambiguated entry in a knowledge repository. In this paper, we propose a new paradigm named distantly supervised entity linking (DSEL), in the sense that the disambiguated entities that belong to a huge knowledge repository (Freebase) are automatically aligned to the corresponding descriptive webpages (Wiki pages). In this way, a large scale of weakly labeled data can be generated without manual annotation and fed to a classifier for linking more newly discovered entities. Compared with traditional paradigms based on solo knowledge base, DSEL benefits more via jointly leveraging the respective advantages of Freebase and Wikipedia. Specifically, the proposed paradigm facilitates bridging the disambiguated labels (Freebase) of entities and their textual descriptions (Wikipedia) for Web-scale entities. Experiments conducted on a dataset of 140,000 items and 60,000 features achieve a baseline F1-measure of 0.517. Furthermore, we analyze the feature performance and improve the F1-measure to 0.545

    Towards OpenMath Content Dictionaries as Linked Data

    Full text link
    "The term 'Linked Data' refers to a set of best practices for publishing and connecting structured data on the web". Linked Data make the Semantic Web work practically, which means that information can be retrieved without complicated lookup mechanisms, that a lightweight semantics enables scalable reasoning, and that the decentral nature of the Web is respected. OpenMath Content Dictionaries (CDs) have the same characteristics - in principle, but not yet in practice. The Linking Open Data movement has made a considerable practical impact: Governments, broadcasting stations, scientific publishers, and many more actors are already contributing to the "Web of Data". Queries can be answered in a distributed way, and services aggregating data from different sources are replacing hard-coded mashups. However, these services are currently entirely lacking mathematical functionality. I will discuss real-world scenarios, where today's RDF-based Linked Data do not quite get their job done, but where an integration of OpenMath would help - were it not for certain conceptual and practical restrictions. I will point out conceptual shortcomings in the OpenMath 2 specification and common bad practices in publishing CDs and then propose concrete steps to overcome them and to contribute OpenMath CDs to the Web of Data.Comment: Presented at the OpenMath Workshop 2010, http://cicm2010.cnam.fr/om

    Interactive visual exploration of a large spatio-temporal dataset: Reflections on a geovisualization mashup

    Get PDF
    Exploratory visual analysis is useful for the preliminary investigation of large structured, multifaceted spatio-temporal datasets. This process requires the selection and aggregation of records by time, space and attribute, the ability to transform data and the flexibility to apply appropriate visual encodings and interactions. We propose an approach inspired by geographical 'mashups' in which freely-available functionality and data are loosely but flexibly combined using de facto exchange standards. Our case study combines MySQL, PHP and the LandSerf GIS to allow Google Earth to be used for visual synthesis and interaction with encodings described in KML. This approach is applied to the exploration of a log of 1.42 million requests made of a mobile directory service. Novel combinations of interaction and visual encoding are developed including spatial 'tag clouds', 'tag maps', 'data dials' and multi-scale density surfaces. Four aspects of the approach are informally evaluated: the visual encodings employed, their success in the visual exploration of the clataset, the specific tools used and the 'rnashup' approach. Preliminary findings will be beneficial to others considering using mashups for visualization. The specific techniques developed may be more widely applied to offer insights into the structure of multifarious spatio-temporal data of the type explored here

    Investigating the use of semantic technologies in spatial mapping applications

    Get PDF
    Semantic Web Technologies are ideally suited to build context-aware information retrieval applications. However, the geospatial aspect of context awareness presents unique challenges such as the semantic modelling of geographical references for efficient handling of spatial queries, the reconciliation of the heterogeneity at the semantic and geo-representation levels, maintaining the quality of service and scalability of communicating, and the efficient rendering of the spatial queries' results. In this paper, we describe the modelling decisions taken to solve these challenges by analysing our implementation of an intelligent planning and recommendation tool that provides location-aware advice for a specific application domain. This paper contributes to the methodology of integrating heterogeneous geo-referenced data into semantic knowledgebases, and also proposes mechanisms for efficient spatial interrogation of the semantic knowledgebase and optimising the rendering of the dynamically retrieved context-relevant information on a web frontend

    ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

    Get PDF
    To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions come from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by existing search engine technology. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.Comment: 11 pages, NAACL 201

    Structuring visual exploratory analysis of skill demand

    No full text
    The analysis of increasingly large and diverse data for meaningful interpretation and question answering is handicapped by human cognitive limitations. Consequently, semi-automatic abstraction of complex data within structured information spaces becomes increasingly important, if its knowledge content is to support intuitive, exploratory discovery. Exploration of skill demand is an area where regularly updated, multi-dimensional data may be exploited to assess capability within the workforce to manage the demands of the modern, technology- and data-driven economy. The knowledge derived may be employed by skilled practitioners in defining career pathways, to identify where, when and how to update their skillsets in line with advancing technology and changing work demands. This same knowledge may also be used to identify the combination of skills essential in recruiting for new roles. To address the challenges inherent in exploring the complex, heterogeneous, dynamic data that feeds into such applications, we investigate the use of an ontology to guide structuring of the information space, to allow individuals and institutions to interactively explore and interpret the dynamic skill demand landscape for their specific needs. As a test case we consider the relatively new and highly dynamic field of Data Science, where insightful, exploratory data analysis and knowledge discovery are critical. We employ context-driven and task-centred scenarios to explore our research questions and guide iterative design, development and formative evaluation of our ontology-driven, visual exploratory discovery and analysis approach, to measure where it adds value to users’ analytical activity. Our findings reinforce the potential in our approach, and point us to future paths to build on

    Compositional Semantic Parsing on Semi-Structured Tables

    Full text link
    Two important aspects of semantic parsing for question answering are the breadth of the knowledge source and the depth of logical compositionality. While existing work trades off one aspect for another, this paper simultaneously makes progress on both fronts through a new task: answering complex questions on semi-structured tables using question-answer pairs as supervision. The central challenge arises from two compounding factors: the broader domain results in an open-ended set of relations, and the deeper compositionality results in a combinatorial explosion in the space of logical forms. We propose a logical-form driven parsing algorithm guided by strong typing constraints and show that it obtains significant improvements over natural baselines. For evaluation, we created a new dataset of 22,033 complex questions on Wikipedia tables, which is made publicly available

    Evaluating Semantic Parsing against a Simple Web-based Question Answering Model

    Full text link
    Semantic parsing shines at analyzing complex natural language that involves composition and computation over multiple pieces of evidence. However, datasets for semantic parsing contain many factoid questions that can be answered from a single web document. In this paper, we propose to evaluate semantic parsing-based question answering models by comparing them to a question answering baseline that queries the web and extracts the answer only from web snippets, without access to the target knowledge-base. We investigate this approach on COMPLEXQUESTIONS, a dataset designed to focus on compositional language, and find that our model obtains reasonable performance (35 F1 compared to 41 F1 of state-of-the-art). We find in our analysis that our model performs well on complex questions involving conjunctions, but struggles on questions that involve relation composition and superlatives.Comment: *sem 201

    Which World Bank reports are widely read?

    Get PDF
    Knowledge is central to development. The World Bank invests about one-quarter of its budget for country services in knowledge products. Still, there is little research about the demand for these knowledge products and how internal knowledge flows affect their demand. About 49 percent of the World Bank’s policy reports, which are published Economic and Sector Work or Technical Assistance reports, have the stated objective of informing the public debate or influencing the development community. This study uses information on downloads and citations to assesses whether policy reports meet this objective. About 13 percent of policy reports were downloaded at least 250 times while more than 31 percent of policy reports are never downloaded. Almost 87 percent of policy reports were never cited. More expensive, complex, multi-sector, core diagnostics reports on middle-income countries with larger populations tend to be downloaded more frequently. Multi-sector reports also tend to be cited more frequently. Internal knowledge sharing matters as cross support provided by the World Bank’s Research Department consistently increases downloads and citations
    corecore