
    Text Extraction and Web Searching in a Non-Latin Language

    Recent studies of queries submitted to Internet search engines have shown that non-English queries and unclassifiable queries have nearly tripled during the last decade. Most search engines were originally engineered for English. They do not take full account of inflectional semantics, nor of features such as diacritics or the use of capitals, which are common in languages other than English. The literature concludes that searching with non-English and non-Latin based queries results in lower success and requires additional user effort to achieve acceptable precision. The primary aim of this research study is to develop an evaluation methodology for identifying the shortcomings and measuring the effectiveness of search engines with non-English queries. It also proposes a number of solutions for the existing situation. A Greek query log is analyzed considering the morphological features of the Greek language. A text extraction experiment also revealed some problems related to the encoding and to the morphological and grammatical differences among semantically equivalent Greek terms. A first stopword list for Greek based on a domain-independent collection has been produced, and its application in Web searching has been studied. The effect of lemmatization of query terms and the factors influencing text-based image retrieval in Greek are also studied. Finally, an instructional strategy is presented for teaching non-English students how to effectively utilize search engines. The evaluation of the capabilities of the search engines showed that international and nationwide search engines ignore most of the linguistic idiosyncrasies of Greek and other complex European languages. There is a lack of freely available non-English resources to work with (test corpus, linguistic resources, etc.). The research showed that the application of standard IR techniques, such as stopword removal, stemming, lemmatization and query expansion, in Greek Web searching increases precision.

    BlogForever D5.2: Implementation of Case Studies

    This document presents the internal and external testing results for the BlogForever case studies. The evaluation of the BlogForever implementation process is tabulated under the most relevant themes and aspects identified during testing. The case studies provide relevant feedback on the sustainability of the platform in terms of potential users’ needs, as well as relevant information on the possible long-term impact.

    Proceedings of the 6th Dutch-Belgian Information Retrieval Workshop


    Network research by data graph management for capacity development and knowledge building in sustainable sanitation

    The Millennium Development Goals (MDG) set clear targets for 2015, and sanitation turns out to be by far the largest of all the MDG targets, affecting about 40% of the global population. The objective of the Sustainable Sanitation Alliance (SuSanA) is to show how Sustainable Sanitation projects should be planned with the participation of stakeholders through capacity development activities. Developing the capacity of societies to collaboratively learn through change and uncertainty is fundamental for sustainability science. The aim of this contribution is to analyze the role of graph database management (GDM) in improving capacity development and knowledge building in the Sustainable Sanitation framework. We provide a theoretical model with four features of network research: link analysis, social network, pattern recognition and keyword search, which we illustrate with some examples. Network research allows us to observe how the information in Sustainable Sanitation is scattered through the structure and also to detect the emergencies, objections and other characteristics of the network. Peer reviewed.

    Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations

    Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv (2.5B mathematical objects) and the mathematical reviewing service for pure and applied mathematics zbMATH (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use-cases, for example, to assist semantic extraction systems, to improve scientific search engines, and to facilitate specialized math recommendation systems. The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}(x)$ with 'Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made available our source code and data. Comment: Proceedings of The Web Conference 2020 (WWW'20), April 20-24, 2020, Taipei, Taiwan

    Search Engine Optimization

    This Special Issue book focuses on the theory and practice of search engine optimization (SEO). It is intended for anyone who publishes content online and includes five peer-reviewed papers from various researchers. More specifically, the book includes theoretical and case study contributions which review and synthesize important aspects, including, but not limited to, the following themes: theory of SEO, different types of SEO, SEO criteria evaluation, search engine algorithms, social media and SEO, and SEO applications in various industries, as well as SEO on media websites. The book aims to give a better understanding of the importance of SEO in the current state of the Internet and online information search. Even though SEO is widely used by marketing practitioners, there is relatively little academic research that systematically attempts to capture this phenomenon and its impact across different industries. Thus, this collection of studies offers useful insights, as well as a valuable resource intended to open the door for future SEO-related research.