Methodologies for the Automatic Location of Academic and Educational Texts on the Internet

Author: Oxnard L.
Evans A.
Publication venue: School of Geography
Publication date: 01/01/2003

Traditionally, online databases of web resources have been compiled by a human editor or through the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in databases is of an ephemeral nature. These pressures dictate that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources. However, this process necessitates the automatic classification of resources as ‘appropriate’ to a given database, a problem only solved by complex text content analysis. This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular, this paper looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined.

Toward higher effectiveness for recall-oriented information retrieval: A patent retrieval case study

Author: Magdy Walid
Publication venue: Dublin City University. School of Computing
Publication date: 01/03/2012

Research in information retrieval (IR) has largely been directed towards tasks requiring high precision. Recently, other IR applications which can be described as recall-oriented IR tasks have received increased attention in the IR research domain. Prominent among these IR applications are patent search and legal search, where users are typically ready to check hundreds or possibly thousands of documents in order to find any possible relevant document. The main concerns in this kind of application are very different from those in standard precision-oriented IR tasks, where users tend to be focused on finding an answer to their information need that can typically be addressed by one or two relevant documents. For precision-oriented tasks, mean average precision continues to be used as the primary evaluation metric for almost all IR applications. For recall-oriented IR applications, the nature of the search task, including objectives, users, queries, and document collections, is different from that of standard precision-oriented search tasks. In this research study, two dimensions in IR are explored for the recall-oriented patent search task: IR system evaluation and multilingual IR for patent search. In each of these dimensions, current IR techniques are studied, and novel techniques developed especially for this kind of recall-oriented IR application are proposed and investigated experimentally in the context of patent retrieval. The techniques developed in this thesis provide a significant contribution toward evaluating the effectiveness of recall-oriented IR in general and particularly patent search, and improving the efficiency of multilingual search for this kind of task.
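
To make the contrast between precision-oriented and recall-oriented evaluation concrete, the following is a minimal sketch of the standard definitions of average precision, mean average precision, and recall at a cutoff. It is not the evaluation method proposed in the thesis; the function names and the sample document identifiers are assumptions chosen for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant document found.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


# Hypothetical query: three relevant documents, one of which is never retrieved.
ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d4", "d9"}
ap = average_precision(ranking, relevant)
recall_2 = recall_at_k(ranking, relevant, 2)
recall_4 = recall_at_k(ranking, relevant, 4)
```

In this made-up example the early hit at rank 1 dominates average precision, while recall at a cutoff shows directly how many of the relevant documents a user willing to read the whole list would actually find.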

Evaluating Information Retrieval and Access Tasks

Publication venue: Springer Science and Business Media LLC

This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. The chapters show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students: anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one.

OAPEN Library

HYPERLINK NETWORK SYSTEM AND IMAGE OF GLOBAL CITIES: WEBPAGES AND THEIR CONTENTS

Author: Son Jae Soen
Publication venue: NC DOCKS at The University of North Carolina at Charlotte
Publication date: 01/01/2014

A distinctive trend of globalization research is a conceptual expansion that mirrors the penetration of globalization into various aspects of life. The World Wide Web has become the ultimate platform to create and disseminate information in this era of globalization. Although the importance of web-based information is widely acknowledged, its use in global city research is still limited. Therefore, the purpose of this research is to extend the concept of globalization to the efficiency of information networks and the thematic dimensionality of the images conveyed by webpages. To this end, 264 global and globalizing cities are selected. The city hyperlink networks are constructed from the web crawling results of each city, and hyperlink network analysis measures the effectiveness of these hyperlink networks. The textual contents are also extracted from the crawled webpages, and the thematic dimensionality of the textual contents is measured by quantified content analysis and multidimensional scaling. The efficiency of the hyperlink network in information flow is confirmed to be a new consideration that shapes the globality of cities. Cities with highly efficient connections have faster and easier access, which means a better structure for city image formation. Specifically, social networking websites are the center of this information flow. This means that social interactions on the Web play a crucial role in forming the images of cities. Apart from the positivity and the negativity of the city image, the dimensionality of cities in the thematic space denotes how they are expressed, discussed, and shared on the Web. The image status based on dimensions of globalization is an important starting point for city branding.
It is concluded that a research framework handling information networks and images simultaneously deepens the understanding of how the structure and the contents of the Web affect the formation and maintenance of global city networks. Overall, this research demonstrates the usefulness of information networks and images of cities on the Web to overcome data inconsistency and scarcity in global city research.
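
The abstract does not say which measure of network efficiency was used. As one possible reading, the sketch below computes the widely used global efficiency of a directed graph (the average of inverse shortest-path lengths over all ordered node pairs) on a toy hyperlink network. The graph representation, function names, and sample sites are assumptions for illustration, not the author's method.

```python
from collections import deque
from typing import Dict, Hashable, Iterable

# Adjacency mapping: page or site -> pages or sites it links to (hypothetical format).
Graph = Dict[Hashable, Iterable[Hashable]]


def shortest_path_lengths(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search: number of link hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def global_efficiency(graph: Graph) -> float:
    """Average of 1/d(i, j) over ordered pairs i != j; unreachable pairs count as 0."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        dist = shortest_path_lengths(graph, source)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


# Toy networks: a one-way ring of three sites versus a fully interlinked trio.
ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
full = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
ring_efficiency = global_efficiency(ring)
full_efficiency = global_efficiency(full)
```

Under this definition a network scores closer to 1 the fewer hops separate its pages, which matches the abstract's informal description of "faster and easier access".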

Text mining for central banks: handbook

Author: Bholat David
Hansen Stephen
Santos Pedro
Schonhardt-Bailey Cheryl
Publication venue: Bank of England
Publication date: 01/01/2015