7,147 research outputs found

    Entity set expansion from the Web via ASP

    Get PDF
    Knowledge on the Web is in large part stored in various semantic resources that formalize, represent, and organize it differently. Combining information from several sources can improve the results of tasks such as recognizing similarities among objects. In this paper, we propose a logic-based method for the problem of entity set expansion (ESE), i.e. extending a list of named entities given a set of seeds. This problem has relevant applications in the Information Extraction domain, specifically in automatic lexicon generation for dictionary-based annotation tools. Contrary to typical approaches in natural language processing, which are based on co-occurrence statistics of words, we determine the common category of the seeds by analyzing the semantic relations of the objects the words represent. To do this, we integrate information from selected Web resources. We introduce the notion of an entity network that uniformly represents the combined knowledge and allows reasoning over it. We show how to use the network to disambiguate word senses by relying on a concept of optimal common ancestor, and how to discover similarities between two entities. Finally, we show how to expand a set of entities using answer set programming with external predicates.
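
    The notions of an entity network and an optimal common ancestor lend themselves to a small illustration. The sketch below, written in Python rather than the answer set programming the paper actually uses, expands a seed set by locating the common ancestor closest to the seeds in a toy is-a network; the graph, entity names, and the `expand` helper are invented for the example.

    ```python
    from collections import deque

    # Toy is-a network (child -> parents). Entities and edges are invented;
    # the paper builds its entity network from selected Web resources.
    IS_A = {
        "salmon": ["fish"], "trout": ["fish"], "fish": ["animal", "food"],
        "oyster": ["shellfish"], "shellfish": ["animal", "food"],
        "beef": ["food"], "animal": ["entity"], "food": ["entity"],
    }

    def ancestors(node):
        """Map every ancestor of `node` (itself included) to its BFS depth."""
        depths, queue = {node: 0}, deque([node])
        while queue:
            cur = queue.popleft()
            for parent in IS_A.get(cur, []):
                if parent not in depths:
                    depths[parent] = depths[cur] + 1
                    queue.append(parent)
        return depths

    def expand(seeds):
        """Pick the common ancestor closest to the seeds, then return the
        other entities falling under it."""
        depth_maps = {s: ancestors(s) for s in seeds}
        common = set.intersection(*(set(m) for m in depth_maps.values()))
        best = min(common, key=lambda a: sum(m[a] for m in depth_maps.values()))
        members = {e for e in IS_A if best in ancestors(e) and e != best}
        return best, members - set(seeds)

    # "salmon" and "oyster" share "food" and "animal" at equal distance; ties
    # need extra criteria, which the paper's notion of optimality supplies.
    print(expand({"salmon", "oyster"}))
    ```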

    Answering Topical Information Needs Using Neural Entity-Oriented Information Retrieval and Extraction

    Get PDF
    In the modern world, search engines are an integral part of human lives. The field of Information Retrieval (IR) is concerned with finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need (query) from within large collections (usually stored on computers). The search engine then displays a ranked list of results relevant to our query. Traditional document retrieval algorithms match a query to a document using the overlap of words in both. However, the last decade has seen the focus shifting to leveraging the rich semantic information available in the form of entities. Entities are uniquely identifiable objects or things such as places, events, diseases, etc. that exist in the real or fictional world. Entity-oriented search systems leverage the semantic information associated with entities (e.g., names, types, etc.) to better match documents to queries. Web search engines would provide better search results if they understood the meaning of a query. This dissertation advances the state-of-the-art in IR by developing novel algorithms that understand text (query, document, question, sentence, etc.) at the semantic level. To this end, this dissertation aims to understand the fine-grained meaning of entities from the context in which the entities have been mentioned, for example, “oysters” in the context of food versus ecosystems. Further, we aim to automatically learn (vector) representations of entities that incorporate this fine-grained knowledge and knowledge about the query. This work refines the automatic understanding of text passages using deep learning, a modern artificial intelligence paradigm. This dissertation utilizes the semantic information extracted from entities to retrieve materials (text and entities) relevant to a query. The interplay between text and entities is studied by addressing three related prediction problems: (1) identify entities that are relevant for the query, (2) understand an entity’s meaning in the context of the query, and (3) identify text passages that elaborate the connection between the query and an entity. The research presented in this dissertation may be integrated into a larger system designed for answering complex topical queries such as “dark chocolate health benefits”, which require the search engine to automatically understand the connections between the query and the relevant material, thus transforming the search engine into an answering engine.
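
    As a rough illustration of the fine-grained entity understanding described above, the sketch below blends an entity's static vector with the vector of its mention context, so the same surface form scores differently against a query depending on context. The random `EMBED` table, the blending weight, and the cosine scoring are placeholder assumptions, not the dissertation's neural models.

    ```python
    import numpy as np

    # Placeholder embeddings; a real system would use pretrained or learned
    # vectors rather than random ones.
    rng = np.random.default_rng(0)
    VOCAB = ["oysters", "served", "chilled", "reef", "habitat",
             "dark", "chocolate", "health", "benefits", "food"]
    EMBED = {w: rng.normal(size=50) for w in VOCAB}

    def embed(text):
        """Mean of the word vectors in `text`; unknown words are skipped."""
        vecs = [EMBED[w] for w in text.lower().split() if w in EMBED]
        return np.mean(vecs, axis=0) if vecs else np.zeros(50)

    def contextual_entity_vector(entity, context, alpha=0.5):
        """Blend the entity's own vector with its mention context, so that
        'oysters' near 'served chilled' differs from 'oysters' near 'reef'."""
        return alpha * embed(entity) + (1 - alpha) * embed(context)

    def score(query, entity, context):
        """Cosine similarity between the query and the contextual entity."""
        q, e = embed(query), contextual_entity_vector(entity, context)
        return float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e) + 1e-9))

    # The same entity receives different scores in different contexts.
    print(score("food", "oysters", "served chilled"))
    print(score("food", "oysters", "reef habitat"))
    ```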

    The SADC Groundwater Data and Information Archive, Knowledge Sharing and Co-operation Project. Final report

    Get PDF
    The Southern African Development Community (SADC) Groundwater Data and Information Archive, Knowledge Sharing and Co-operation Project, funded by the German Development Cooperation (GIZ) and the Department for International Development, UK (DFID), was initiated in September 2009 to identify, catalogue and subsequently promote access to the large collection of reports held in the UK by the British Geological Survey (BGS). The work has focused on a wealth of unpublished, so-called “grey” data and information which describes groundwater occurrence and development in Southern Africa and was gathered by the BGS over its many decades of involvement in the region. The project has four main aims: to catalogue and describe the “grey data” documents on SADC groundwater held by the BGS within a digital metadatabase; to identify a sub-set of scanned documents to be made freely available to groundwater practitioners and managers in the SADC region by electronic distribution; to link the metadatabase and digital sub-set of documents via a web portal hosted by the BGS, to enable download of documents by SADC groundwater workers; and to strengthen links between BGS hydrogeologists and counterparts in SADC, and provide an example of groundwater data sharing which could be emulated by other European Geological Surveys with substantial data holdings on SADC groundwater. The project has successfully met these aims. The assessment of BGS archived material produced an electronic metadatabase describing 1735 items held in hard copy. Of these, 1041 have been scanned to searchable Portable Document Format (PDF). A subset of 655 PDFs, including partial documents related to groundwater development from the colonial and post-independence periods as well as BGS internal project reports and reports approved for web dissemination by host countries, are now available to download free of charge at http://www.SADCgroundwaterarchive.com. Initial results indicate a good deal of interest both from within SADC and elsewhere, with the archive accessed both directly and via search engines such as Google. The information presented has already been used by in-region projects such as the SADC Hydrogeological Mapping project and the Malawi Water Assessment Project. This is essentially a pilot project demonstrating that Web delivery of the archive is an important step forward for the well-being of the SADC region. It permits access to documents few even knew existed and will, it is hoped, provide a valuable dataset that should inhibit the temptation to waste scarce resources by ‘re-inventing the wheel’.

    The NASA Astrophysics Data System: Architecture

    Full text link
    The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.edu. Comment: 25 pages, 8 figures, 3 tables.
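
    The record-plus-properties design described above can be sketched as a small data model: fielded bibliographic data, with links to internal and external resources represented as properties with attributes. The field and property names below are guesses for illustration only, not the actual ADS schema.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class LinkProperty:
        """A link to an internal or external resource, stored as a
        document property with attributes."""
        kind: str                # e.g. "fulltext_scan", "data", "citations"
        attributes: dict = field(default_factory=dict)

    @dataclass
    class BibRecord:
        """A structured bibliographic document: fielded data plus properties."""
        bibcode: str             # unique record identifier
        fields: dict             # title, authors, journal, keywords, ...
        properties: list = field(default_factory=list)

    rec = BibRecord(
        bibcode="1999ExampleBibcode",  # invented identifier
        fields={"title": "An Example Paper", "authors": ["A. Author"]},
    )
    rec.properties.append(LinkProperty("fulltext_scan", {"url": "http://example.org/scan"}))

    # A mirror site, per the abstract, would clone the record store and
    # resolve each property's URL against its own host.
    for prop in rec.properties:
        print(prop.kind, "->", prop.attributes["url"])
    ```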

    Social Search with Missing Data: Which Ranking Algorithm?

    Get PDF
    Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services that perform naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who best match a user's search requirements specified in a term-based query, even in the absence of stored user profiles. We deploy and compare five statistical measures, namely our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, and two TF/IDF-based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods.
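
    To make one of the compared measures concrete, the sketch below ranks users against a query term using pointwise mutual information over page co-occurrence counts. The counts are invented, and CORDER itself (like the improved-MI and Z-score variants) involves more than this simple form.

    ```python
    import math

    n_total = 1000   # pages scavenged as "inferred profiles"
    n_term = 40      # pages containing the query term
    # Per user: (pages where user and term co-occur, pages mentioning the user)
    counts = {"alice": (12, 30), "bob": (2, 55)}

    def pmi(n_xy, n_user):
        """Pointwise MI: log P(term, user) / (P(term) * P(user))."""
        p_xy = max(n_xy, 1e-12) / n_total
        return math.log(p_xy / ((n_term / n_total) * (n_user / n_total)))

    ranking = sorted(counts, key=lambda u: pmi(*counts[u]), reverse=True)
    print(ranking)   # ['alice', 'bob']: alice is more strongly associated
    ```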

    The NASA Astrophysics Data System: Data Holdings

    Get PDF
    Since its inception in 1993, the ADS Abstract Service has become an indispensable research tool for astronomers and astrophysicists worldwide. In those seven years, much effort has been directed toward improving both the quantity and the quality of references in the database. From the original database of approximately 160,000 astronomy abstracts, our dataset has grown almost tenfold to approximately 1.5 million references covering astronomy, astrophysics, planetary sciences, physics, optics, and engineering. We collect and standardize data from approximately 200 journals and present the resulting information in a uniform, coherent manner. With the cooperation of journal publishers worldwide, we have been able to place scans of full journal articles on-line back to the first volumes of many astronomical journals, and we are able to link to the current versions of articles, abstracts, and datasets for essentially all of the current astronomy literature. The trend toward electronic publishing in the field, the use of electronic submission of abstracts for journal articles and conference proceedings, and the increasingly prominent use of the World Wide Web to disseminate information have enabled the ADS to build a database unparalleled in other disciplines. The ADS can be accessed at http://adswww.harvard.edu. Comment: 24 pages, 1 figure, 6 tables, 3 appendices.

    Approaches for enriching and improving textual knowledge bases

    Get PDF
    [no abstract]

    Entity Set Expansion using Interactive Topic Information

    Get PDF

    Integrating multiple document features in language models for expert finding

    Get PDF
    We argue that expert finding is sensitive to multiple document features in an organizational intranet. These document features include multiple levels of association between experts and a query topic, from the sentence and paragraph up to the document level; document authority information such as the PageRank, indegree, and URL length of documents; and internal document structures that indicate the experts' relationship with the content of documents. Our assumption is that expert finding can benefit substantially from the incorporation of these document features. However, existing language modeling approaches for expert finding have not sufficiently taken these document features into account. We propose a novel language modeling approach, which integrates multiple document features, for expert finding. Our experiments on two large-scale TREC Enterprise Track datasets, i.e., the W3C and CSIRO datasets, demonstrate that the nature of the two organizational intranets and the two types of expert finding tasks, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding tasks, and helps design expert finding strategies that are effective for different scenarios. Our main contribution is to develop an effective formal method for modeling multiple document features in expert finding, and to conduct a systematic investigation of their effects. It is worth noting that our novel approach achieves better results in terms of MAP than previous language-model-based approaches and the best automatic runs in both the TREC 2006 and TREC 2007 expert search tasks.
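
    A minimal sketch of the idea of combining multiple document features: association evidence at the sentence, paragraph, and document levels is mixed, then weighted by a document-authority prior built from PageRank, indegree, and URL length. The weights, the form of the prior, and the inputs are illustrative assumptions, not the paper's actual model.

    ```python
    import math

    def doc_prior(pagerank, indegree, url_depth):
        """A toy authority prior: more inlinks and shorter URLs weigh more."""
        return pagerank * math.log(1 + indegree) / (1 + url_depth)

    def expert_score(assoc, docs, w=(0.5, 0.3, 0.2)):
        """assoc[d]: (sentence, paragraph, document)-level association of the
        query and the candidate in document d; docs[d]: authority features."""
        total = 0.0
        for d, levels in assoc.items():
            mix = sum(wi * level for wi, level in zip(w, levels))
            total += doc_prior(*docs[d]) * mix
        return total

    docs = {"d1": (0.8, 120, 2), "d2": (0.3, 10, 5)}   # pagerank, indegree, URL depth
    assoc = {"d1": (0.02, 0.05, 0.10), "d2": (0.00, 0.01, 0.08)}
    print(expert_score(assoc, docs))  # higher means a stronger expert candidate
    ```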

    Electronic security - risk mitigation in financial transactions : public policy issues

    Get PDF
    This paper builds on a previous series of papers (see Claessens, Glaessner, and Klingebiel, 2001, 2002) that identified electronic security as a key component to the delivery of electronic finance benefits. This paper and its technical annexes (available separately at http://www1.worldbank.org/finance/) identify and discuss seven key pillars necessary to fostering a secure electronic environment. Hence, it is intended for those formulating broad policies in the area of electronic security and those working with financial services providers (for example, executives and management). The detailed annexes of this paper are especially relevant for chief information and security officers responsible for establishing layered security. First, this paper provides definitions of electronic finance and electronic security and explains why these issues deserve attention. Next, it presents a picture of the burgeoning global electronic security industry. Then it develops a risk-management framework for understanding the risks and tradeoffs inherent in the electronic security infrastructure. It also provides examples of tradeoffs that may arise with respect to technological innovation, privacy, quality of service, and security in designing an electronic security policy framework. Finally, it outlines issues in seven interrelated areas that often need attention in building an adequate electronic security infrastructure. These are: 1) The legal framework and enforcement. 2) Electronic security of payment systems. 3) Supervision and prevention challenges. 4) The role of private insurance as an essential monitoring mechanism. 5) Certification, standards, and the role of the public and private sectors. 6) Improving the accuracy of information on electronic security incidents and creating better arrangements for sharing this information. 7) Improving overall education on these issues as a key to enhancing prevention.