    SparkIR: a Scalable Distributed Information Retrieval Engine over Spark

    Search engines have to deal with a huge amount of data (e.g., billions of documents in the case of the Web) and must find scalable and efficient ways to produce effective search results. In this thesis, we propose to use the Spark framework, an in-memory distributed big data processing framework, and to leverage its capabilities for handling large amounts of data to build an efficient and scalable experimental search engine over textual documents. The proposed system, SparkIR, can serve as a research framework for conducting information retrieval (IR) experiments. SparkIR supports two indexing schemes, document-based partitioning and term-based partitioning, so as to enable document-at-a-time (DAAT) and term-at-a-time (TAAT) query evaluation. Moreover, it offers static and dynamic pruning to improve retrieval efficiency: for static pruning, it employs champion lists and tiering, while for dynamic pruning, it uses MaxScore top-k retrieval. We evaluated the performance of SparkIR using the ClueWeb12-B13 collection, which contains about 50M English Web pages. Experiments over different subsets of the collection, compared against an Elasticsearch baseline, show that SparkIR exhibits reasonable efficiency and scalability overall for both indexing and retrieval. Because SparkIR is implemented as an open-source library over Spark, its users can also benefit from other Spark libraries (e.g., MLlib and GraphX), which therefore eliminates the need for usin…
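
    The DAAT/TAAT distinction and champion-list pruning mentioned above can be illustrated on a single machine. The following is a minimal Python sketch under stated assumptions, not SparkIR's code and not a Spark API: it assumes a toy in-memory inverted index whose per-posting weights are hypothetical precomputed term scores, and the function names `taat`, `daat`, and `champion_lists` are illustrative only. It shows that both evaluation orders return the same top-k, while champion lists shorten posting lists at some cost in accuracy.

```python
import heapq
from collections import defaultdict

# Toy inverted index: term -> postings (doc_id, weight), sorted by doc_id.
# The weights are hypothetical precomputed term scores (e.g. tf-idf).
Index = dict[str, list[tuple[int, float]]]


def taat(index: Index, query: list[str], k: int) -> list[tuple[float, int]]:
    """Term-at-a-time: consume one posting list at a time, accumulating partial scores."""
    acc: dict[int, float] = defaultdict(float)
    for term in query:
        for doc_id, weight in index.get(term, []):
            acc[doc_id] += weight
    return heapq.nlargest(k, ((score, doc) for doc, score in acc.items()))


def daat(index: Index, query: list[str], k: int) -> list[tuple[float, int]]:
    """Document-at-a-time: traverse all posting lists in parallel, fully scoring one doc at a time."""
    lists = [index.get(term, []) for term in query]
    pos = [0] * len(lists)
    heap: list[tuple[float, int]] = []  # min-heap holding the current top-k
    while True:
        heads = [lst[p][0] for lst, p in zip(lists, pos) if p < len(lst)]
        if not heads:
            break
        doc = min(heads)  # next document in doc-id order
        score = 0.0
        for i, lst in enumerate(lists):
            if pos[i] < len(lst) and lst[pos[i]][0] == doc:
                score += lst[pos[i]][1]
                pos[i] += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True)


def champion_lists(index: Index, r: int) -> Index:
    """Static pruning: keep only the r highest-weight postings per term (re-sorted by doc_id)."""
    return {term: sorted(heapq.nlargest(r, postings, key=lambda p: p[1]))
            for term, postings in index.items()}


index: Index = {
    "spark": [(1, 0.5), (2, 1.2), (4, 0.3)],
    "search": [(2, 0.4), (3, 0.9), (4, 1.1)],
    "engine": [(1, 0.2), (3, 0.6)],
}
query = ["spark", "search", "engine"]

print([doc for _, doc in taat(index, query, 2)])  # [2, 3]
print([doc for _, doc in daat(index, query, 2)])  # [2, 3]
# With champion lists of size 2, doc 2 loses its "search" posting and drops in rank.
print([doc for _, doc in taat(champion_lists(index, 2), query, 2)])  # [3, 2]
```

    Tiering and MaxScore dynamic pruning are not shown here, and how SparkIR distributes these steps across Spark partitions is not described in the abstract.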

    Profiling and understanding student information behaviour: Methodologies and meaning

    This paper draws on work conducted under the Joint Information Systems Committee (JISC) User Behaviour Monitoring and Evaluation Framework to identify a range of issues associated with research design that can form a platform for enquiry about knowledge creation in the arena of user behaviour. The Framework has developed a multidimensional set of tools for profiling, monitoring and evaluating user behaviour. It has two main approaches: a broad‐based survey that generates both a qualitative and a quantitative profile of user behaviour, and a longitudinal qualitative study of user behaviour that, in addition to providing in‐depth insights, is the basis for the development of the EIS (Electronic Information Services) Diagnostic Toolkit. The strengths and weaknesses of the Framework approach are evaluated. In the context of profiling user behaviour, key methodological concerns relate to representativeness, sampling and access, the selection of appropriate measures, and the interpretation of those measures. Qualitative approaches, including detailed narratives, case study analysis and gap analysis, are used to generate detailed insights. The messages from this qualitative analysis do not lend themselves to simple summarization. One approach that has been employed to capture and interpret these messages is the development of the EIS Diagnostic Toolkit, which can be used to assess and monitor an institution's progress in embedding EIS into learning processes. Finally, consideration must be given to the integration of insights generated through the different strands within the Framework.

    Which one is better: presentation-based or content-based math search?

    Mathematical content is a valuable information source, and retrieving this content has become an important issue. This paper compares two search strategies for math expressions: presentation-based and content-based approaches. Presentation-based search uses a state-of-the-art math search system, while content-based search uses semantic enrichment to convert math expressions into their content forms and then searches over these content-based expressions. By taking the meaning of math expressions into account, content-based search improves search quality over presentation-based systems.
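
    To make the contrast concrete, here is a deliberately simplified Python sketch; it is not the paper's system. It assumes, hypothetically, that a presentation form is just the token sequence of an expression and that a content form is an operator tree such as `Apply("plus", ("a", "b"))`, loosely in the spirit of Content MathML. The class `Apply`, the set `COMMUTATIVE`, and the functions below are illustrative names. Normalizing the argument order of commutative operators is one illustrative way a content-based matcher can recognize that `a + b` and `b + a` mean the same thing, which token-level presentation matching misses.

```python
from dataclasses import dataclass


# Hypothetical, simplified content form: an operator applied to arguments
# (leaves are plain strings). Not the representation used in the paper.
@dataclass(frozen=True)
class Apply:
    op: str
    args: tuple


COMMUTATIVE = {"plus", "times"}  # assumed set of commutative operators


def canonical(expr):
    """Recursively sort the arguments of commutative operators into a fixed order."""
    if not isinstance(expr, Apply):
        return expr
    args = tuple(canonical(a) for a in expr.args)
    if expr.op in COMMUTATIVE:
        args = tuple(sorted(args, key=repr))  # repr gives a deterministic order
    return Apply(expr.op, args)


def presentation_match(query_tokens: list[str], doc_tokens: list[str]) -> bool:
    """Presentation-based: compare the visible symbol sequences as written."""
    return query_tokens == doc_tokens


def content_match(query: Apply, doc: Apply) -> bool:
    """Content-based: compare the normalized meaning trees."""
    return canonical(query) == canonical(doc)


print(presentation_match(["a", "+", "b"], ["b", "+", "a"]))  # False
print(content_match(Apply("plus", ("a", "b")), Apply("plus", ("b", "a"))))  # True
print(content_match(Apply("minus", ("a", "b")), Apply("minus", ("b", "a"))))  # False
```

    A real content-based system would rely on much richer semantic enrichment than this single normalization rule.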

    The contribution of data mining to information science

    The information explosion is a serious challenge for current information institutions. Data mining, the search for valuable information in large volumes of data, is one solution to this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For each of these three types of application, we discuss how data mining can enhance its functions. Readers can expect an overview of state-of-the-art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.

    Developing Critical Thinking in online search

    Digital skills, especially those related to Information Literacy, are today considered fundamental to the education of students, both at school and at university. Searching for and evaluating information found on the Internet is an important competency, and an effective way to develop it is to educate students in critical thinking. The article presents a qualitative-quantitative survey conducted during a course in Educational Technologies within a five-year degree program. The outcomes of the survey reveal some interesting behaviors and perceptions of students when they face the Web search process, as well as characteristics of their critical thinking processes: some aspects of critical thinking are generally well supported, but others are acquired only after specific training. Experience shows that, if properly motivated by metacognitive reflection and a clear method, students can critically evaluate the information presented online, its sources, and the sustainability of the arguments found. Positive results also occurred when the evaluation process was carried out collaboratively.

    New perspectives on Web search engine research

    Purpose – The purpose of this chapter is to give an overview of the context of Web search and search engine-related research, as well as to introduce the reader to the sections and chapters of the book. Methodology/approach – We review literature dealing with various aspects of search engines, with special emphasis on emerging areas of Web searching, search engine evaluation going beyond traditional methods, and new perspectives on Web searching. Findings – The approaches to studying Web search engines are manifold. Given the importance of Web search engines for knowledge acquisition, research from different perspectives needs to be integrated into a more cohesive whole. Research limitations/implications – The chapter suggests a basis for research in the field and also introduces further research directions. Originality/value of paper – The chapter gives a concise overview of the topics dealt with in the book and also shows directions for researchers interested in Web search engines.

    Youth and Digital Media: From Credibility to Information Quality

    Building upon a process- and context-oriented information quality framework, this paper seeks to map and explore what we know about the ways in which young users aged 18 and under search for information online, how they evaluate information, and how their related practices of content creation, levels of new literacies, general digital media usage, and social patterns affect these activities. A review of selected literature at the intersection of digital media, youth, and information quality (primarily works from library and information science, sociology, education, and selected ethnographic studies) reveals patterns in youth's information-seeking behavior, but also highlights the importance of contextual and demographic factors for both search and evaluation. From an information-learning and educational perspective, the literature shows that youth develop competencies for personal goals that sometimes do not transfer to school and are sometimes not appropriate for school. Thus far, the success or failure of initiatives to educate youth about search, evaluation, or creation has depended greatly on local circumstances.