SparkIR: a Scalable Distributed Information Retrieval Engine over Spark
Search engines have to deal with a huge amount of data (e.g., billions of
documents in the case of the Web) and find scalable and efficient ways to produce
effective search results. In this thesis, we propose to use the Spark framework, an
in-memory distributed big data processing framework, and leverage its powerful
capabilities for handling large amounts of data to build an efficient and scalable
experimental search engine over textual documents. The proposed system, SparkIR,
can serve as a research framework for conducting information retrieval (IR)
experiments. SparkIR supports two indexing schemes, document-based partitioning
and term-based partitioning, to adopt document-at-a-time (DAAT) and term-at-a-time
(TAAT) query evaluation methods. Moreover, it offers static and dynamic pruning to
improve retrieval efficiency. For static pruning, it employs champion lists and
tiering, while for dynamic pruning, it uses MaxScore top-k retrieval. We evaluated the
performance of SparkIR using the ClueWeb12-B13 collection, which contains about 50M
English Web pages. Experiments over different subsets of the collection, compared
against an Elasticsearch baseline, show that SparkIR exhibits reasonable efficiency
and scalability overall for both indexing and retrieval. Implemented as
an open-source library over Spark, SparkIR also lets its users benefit from other Spark
libraries (e.g., MLlib and GraphX), which, therefore, eliminates the need of usin
Teaching and learning in information retrieval
A literature review of pedagogical methods for teaching and learning information retrieval is presented. From the analysis of the literature a taxonomy was built and it is used to structure the paper. Information Retrieval (IR) is presented from different points of view: technical levels, educational goals, teaching and learning methods, assessment and curricula. The review is organized around two levels of abstraction which form a taxonomy that deals with the different aspects of pedagogy as applied to information retrieval. The first level looks at the technical level of delivering information retrieval concepts, and at the educational goals as articulated by the two main subject domains where IR is delivered: computer science (CS) and library and information science (LIS). The second level focuses on pedagogical issues, such as teaching and learning methods, delivery modes (classroom, online or e-learning), use of IR systems for teaching, assessment and feedback, and curricula design. The survey, and its bibliography, provides an overview of the pedagogical research carried out in the field of IR. It also provides a guide for educators on approaches that can be applied to improving the student learning experiences
Profiling and understanding student information behaviour: Methodologies and meaning
This paper draws on work conducted under the Joint Information Systems Committee (JISC) User Behaviour Monitoring and Evaluation Framework to identify a range of issues associated with research design that can form a platform for enquiry about knowledge creation in the arena of user behaviour. The Framework has developed a multidimensional set of tools for profiling, monitoring and evaluating user behaviour. The Framework has two main approaches: one, a broad-based survey which generates both a qualitative and a quantitative profile of user behaviour, and the other a longitudinal qualitative study of user behaviour that (in addition to providing in-depth insights) is the basis for the development of the EIS (Electronic Information Services) Diagnostic Toolkit. The strengths and weaknesses of the Framework approach are evaluated. In the context of profiling user behaviour, key methodological concerns relate to: representativeness, sampling and access, the selection of appropriate measures and the interpretation of those measures. Qualitative approaches are used to generate detailed insights. These include detailed narratives, case study analysis and gap analysis. The messages from this qualitative analysis do not lend themselves to simple summarization. One approach that has been employed to capture and interpret these messages is the development of the EIS Diagnostic Toolkit. This toolkit can be used to assess and monitor an institution's progress with embedding EIS into learning processes. Finally, consideration must be given to integration of insights generated through different strands within the Framework
Which one is better: presentation-based or content-based math search?
Mathematical content is a valuable information source and retrieving this
content has become an important issue. This paper compares two searching
strategies for math expressions: presentation-based and content-based
approaches. Presentation-based search uses a state-of-the-art math search system,
while content-based search uses semantic enrichment of math expressions to
convert them into their content forms, and searching is done using
these content-based expressions. By considering the meaning of math
expressions, the quality of the search system is improved over presentation-based
systems
The contribution of data mining to information science
The information explosion is a serious challenge for current information institutions. On the other hand, data mining, which is the search for valuable information in large volumes of data, is one of the solutions to face this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, how data mining can enhance their functions is discussed. The reader of this paper is expected to get an overview of the state-of-the-art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research
Developing Critical Thinking in online search
Digital skills, especially those related to Information Literacy, are today
considered fundamental to the education of students, both at school
and at university. Searching and evaluating information found on the
Internet is surely an important competency. An effective way to develop
this competency is to educate students about the development of critical
thinking. The article presents a qualitative-quantitative survey conducted
during a course in Educational Technologies within a five-year degree
program. The outcomes of the survey reveal some interesting behaviors and
perceptions of students when they are faced with the Web search process
and the characteristics of their critical thinking processes: some aspects
of critical thinking are generally well supported, but others are acquired
only after specific training. Experience shows that if properly motivated by
metacognitive reflections and a clear method, students can actually critically
evaluate the information presented online, the sources, and the sustainability of the arguments found. Positive results also occurred when the evaluation process was done in a collaborative modality
New perspectives on Web search engine research
Purpose: The purpose of this chapter is to give an overview of the context of Web search and search engine-related research, as well as to introduce the reader to the sections and chapters of the book. Methodology/approach: We review literature dealing with various aspects of search engines, with special emphasis on emerging areas of Web searching, search engine evaluation going beyond traditional methods, and new perspectives on Web searching. Findings: The approaches to studying Web search engines are manifold. Given the importance of Web search engines for knowledge acquisition, research from different perspectives needs to be integrated into a more cohesive perspective. Research limitations/implications: The chapter suggests a basis for research in the field and also introduces further research directions. Originality/value of paper: The chapter gives a concise overview of the topics dealt with in the book and also shows directions for researchers interested in Web search engines
Youth and Digital Media: From Credibility to Information Quality
Building upon a process- and context-oriented information quality framework, this paper seeks to map and explore what we know about the ways in which young users of age 18 and under search for information online, how they evaluate information, and how their related practices of content creation, levels of new literacies, general digital media usage, and social patterns affect these activities. A review of selected literature at the intersection of digital media, youth, and information quality -- primarily works from library and information science, sociology, education, and selected ethnographic studies -- reveals patterns in youth's information-seeking behavior, but also highlights the importance of contextual and demographic factors both for search and evaluation. Looking at the phenomenon from an information-learning and educational perspective, the literature shows that youth develop competencies for personal goals that sometimes do not transfer to school, and are sometimes not appropriate for school. Thus far, educational initiatives to educate youth about search, evaluation, or creation have depended greatly on the local circumstances for their success or failure