39,602 research outputs found

    Named Entity Extraction and Disambiguation: The Reinforcement Effect.

    Get PDF
    Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and semantic web. Although these topics are highly dependent, almost no existing works examine this dependency. It is the aim of this paper to examine the dependency and show how one affects the other, and vice versa. We conducted experiments with a set of descriptions of holiday homes with the aim to extract and disambiguate toponyms as a representative example of named entities. We experimented with three approaches for disambiguation with the purpose to infer the country of the holiday home. We examined how the effectiveness of extraction influences the effectiveness of disambiguation, and reciprocally, how filtering out ambiguous names (an activity that depends on the disambiguation process) improves the effectiveness of extraction. Since this, in turn, may improve the effectiveness of disambiguation again, it shows that extraction and disambiguation may reinforce each other.\u

    Spatio-textual indexing for geographical search on the web

    Get PDF
    Many web documents refer to specific geographic localities and many people include geographic context in queries to web search engines. Standard web search engines treat the geographical terms in the same way as other terms. This can result in failure to find relevant documents that refer to the place of interest using alternative related names, such as those of included or nearby places. This can be overcome by associating text indexing with spatial indexing methods that exploit geo-tagging procedures to categorise documents with respect to geographic space. We describe three methods for spatio-textual indexing based on multiple spatially indexed text indexes, attaching spatial indexes to the document occurrences of a text index, and merging text index access results with results of access to a spatial index of documents. These schemes are compared experimentally with a conventional text index search engine, using a collection of geo-tagged web documents, and are shown to be able to compete in speed and storage performance with pure text indexing

    Experiments in terabyte searching, genomic retrieval and novelty detection for TREC 2004

    Get PDF
    In TREC2004, Dublin City University took part in three tracks, Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we will discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments into large scale, distributed information retrieval, which underlies all of the track experiments described in this document

    Quantitative Perspectives on Fifty Years of the Journal of the History of Biology

    Get PDF
    Journal of the History of Biology provides a fifty-year long record for examining the evolution of the history of biology as a scholarly discipline. In this paper, we present a new dataset and preliminary quantitative analysis of the thematic content of JHB from the perspectives of geography, organisms, and thematic fields. The geographic diversity of authors whose work appears in JHB has increased steadily since 1968, but the geographic coverage of the content of JHB articles remains strongly lopsided toward the United States, United Kingdom, and western Europe and has diversified much less dramatically over time. The taxonomic diversity of organisms discussed in JHB increased steadily between 1968 and the late 1990s but declined in later years, mirroring broader patterns of diversification previously reported in the biomedical research literature. Finally, we used a combination of topic modeling and nonlinear dimensionality reduction techniques to develop a model of multi-article fields within JHB. We found evidence for directional changes in the representation of fields on multiple scales. The diversity of JHB with regard to the representation of thematic fields has increased overall, with most of that diversification occurring in recent years. Drawing on the dataset generated in the course of this analysis, as well as web services in the emerging digital history and philosophy of science ecosystem, we have developed an interactive web platform for exploring the content of JHB, and we provide a brief overview of the platform in this article. As a whole, the data and analyses presented here provide a starting-place for further critical reflection on the evolution of the history of biology over the past half-century.Comment: 45 pages, 14 figures, 4 table

    Knowledge Discovery in Online Repositories: A Text Mining Approach

    Get PDF
    Before the advent of the Internet, the newspapers were the prominent instrument of mobilization for independence and political struggles. Since independence in Nigeria, the political class has adopted newspapers as a medium of Political Competition and Communication. Consequently, most political information exists in unstructured form and hence the need to tap into it using text mining algorithm. This paper implements a text mining algorithm on some unstructured data format in some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. As a follow-up to the natural language techniques, association rule mining technique of data mining is used to extract knowledge using the Modified Generating Association Rules based on Weighting scheme (GARW). The main contributions of the technique are that it integrates information retrieval scheme (Term Frequency Inverse Document Frequency) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) with Data Mining technique for association rules discovery. The program is applied to Pre-Election information gotten from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the documents collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy

    Explicit diversification of event aspects for temporal summarization

    Get PDF
    During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness

    The Extraction of Community Structures from Publication Networks to Support Ethnographic Observations of Field Differences in Scientific Communication

    Full text link
    The scientific community of researchers in a research specialty is an important unit of analysis for understanding the field specific shaping of scientific communication practices. These scientific communities are, however, a challenging unit of analysis to capture and compare because they overlap, have fuzzy boundaries, and evolve over time. We describe a network analytic approach that reveals the complexities of these communities through examination of their publication networks in combination with insights from ethnographic field studies. We suggest that the structures revealed indicate overlapping sub- communities within a research specialty and we provide evidence that they differ in disciplinary orientation and research practices. By mapping the community structures of scientific fields we aim to increase confidence about the domain of validity of ethnographic observations as well as of collaborative patterns extracted from publication networks thereby enabling the systematic study of field differences. The network analytic methods presented include methods to optimize the delineation of a bibliographic data set in order to adequately represent a research specialty, and methods to extract community structures from this data. We demonstrate the application of these methods in a case study of two research specialties in the physical and chemical sciences.Comment: Accepted for publication in JASIS

    What guidance are researchers given on how to present network meta-analyses to end-users such as policymakers and clinicians? A systematic review

    Get PDF
    © 2014 Sullivan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.Introduction: Network meta-analyses (NMAs) are complex methodological approaches that may be challenging for non-technical end-users, such as policymakers and clinicians, to understand. Consideration should be given to identifying optimal approaches to presenting NMAs that help clarify analyses. It is unclear what guidance researchers currently have on how to present and tailor NMAs to different end-users. Methods: A systematic review of NMA guidelines was conducted to identify guidance on how to present NMAs. Electronic databases and supplementary sources were searched for NMA guidelines. Presentation format details related to sample formats, target audiences, data sources, analysis methods and results were extracted and frequencies tabulated. Guideline quality was assessed following criteria developed for clinical practice guidelines. Results: Seven guidelines were included. Current guidelines focus on how to conduct NMAs but provide limited guidance to researchers on how to best present analyses to different end-users. None of the guidelines provided reporting templates. Few guidelines provided advice on tailoring presentations to different end-users, such as policymakers. Available guidance on presentation formats focused on evidence networks, characteristics of individual trials, comparisons between direct and indirect estimates and assumptions of heterogeneity and/or inconsistency. Some guidelines also provided examples of figures and tables that could be used to present information. Conclusions: Limited guidance exists for researchers on how best to present NMAs in an accessible format, especially for non-technical end-users such as policymakers and clinicians. NMA guidelines may require further integration with end-users' needs, when NMAs are used to support healthcare policy and practice decisions. Developing presentation formats that enhance understanding and accessibility of NMAs could also enhance the transparency and legitimacy of decisions informed by NMAs.The Canadian Institute of Health Research (CIHR) Drug Safety and Effectiveness Network (Funding reference number – 116573)
    corecore