Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems
Software-implemented attack tolerance for critical information retrieval
The fast-growing reliance of our daily life upon online information services often demands an appropriate level of privacy protection as well as highly available service provision. However, most existing solutions have attempted to address these problems separately. This thesis investigates and presents a solution that provides both privacy protection and fault tolerance for online information retrieval. A new approach to Attack-Tolerant Information Retrieval (ATIR) is developed based on an extension of existing theoretical results for Private Information Retrieval (PIR). ATIR uses replicated services to protect a user's privacy and to ensure service availability. In particular, ATIR can tolerate any collusion of up to t servers for privacy violation and up to ƒ faulty (either crashed or malicious) servers in a system with k replicated servers, provided that k ≥ t + ƒ + 1 where t ≥ 1 and ƒ ≤ t. In contrast to other related approaches, ATIR relies on neither enforced trust assumptions, such as the use of tamper-resistant hardware and trusted third parties, nor an increased number of replicated servers. While the best solution known so far requires k (≥ 3t + 1) replicated servers to cope with t malicious servers and any collusion of up to t servers with an O(n^*^) communication complexity, ATIR uses fewer servers with a much improved communication cost, O(n^(1/2)) (where n is the size of a database managed by a server). The majority of current PIR research resides on a theoretical level. This thesis provides both theoretical schemes and their practical implementations with good performance results. In a LAN environment, it takes well under half a second to use an ATIR service for calculations over data sets with a size of up to 1MB. The performance of the ATIR systems remains at the same level even in the presence of server crashes and malicious attacks.
Both analytical results and experimental evaluation show that ATIR offers an attractive and practical solution for ever-increasing online information applications
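The replication bound quoted in the abstract above can be made concrete with a small calculation. The following is a minimal sketch, not code from the thesis: the function names are hypothetical, and it only evaluates the two inequalities stated in the abstract (k ≥ t + ƒ + 1 with t ≥ 1 and ƒ ≤ t, against the earlier bound k ≥ 3t + 1).

```python
def atir_min_servers(t: int, f: int) -> int:
    """Smallest k satisfying k >= t + f + 1, given t >= 1 and 0 <= f <= t.

    t: largest number of colluding servers tolerated (privacy).
    f: largest number of faulty servers tolerated (crashed or malicious).
    The function name is hypothetical; only the inequality comes from the abstract.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    if not 0 <= f <= t:
        raise ValueError("f must satisfy 0 <= f <= t")
    return t + f + 1


def baseline_min_servers(t: int) -> int:
    """Smallest k for the earlier bound quoted in the abstract, k >= 3t + 1."""
    if t < 1:
        raise ValueError("t must be at least 1")
    return 3 * t + 1


if __name__ == "__main__":
    # Compare the two bounds in the worst case the abstract allows, f = t.
    for t in (1, 2, 3):
        print(f"t={t}: ATIR k>={atir_min_servers(t, t)}, "
              f"baseline k>={baseline_min_servers(t)}")
```

Under these assumptions, even the worst case ƒ = t needs 2t + 1 servers, compared with 3t + 1 for the earlier bound.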
Teaching and learning in information retrieval
A literature review of pedagogical methods for teaching and learning information retrieval is presented. From the analysis of the literature a taxonomy was built, and it is used to structure the paper. Information Retrieval (IR) is presented from different points of view: technical levels, educational goals, teaching and learning methods, assessment and curricula. The review is organized around two levels of abstraction which form a taxonomy that deals with the different aspects of pedagogy as applied to information retrieval. The first level looks at the technical level of delivering information retrieval concepts, and at the educational goals as articulated by the two main subject domains where IR is delivered: computer science (CS) and library and information science (LIS). The second level focuses on pedagogical issues, such as teaching and learning methods, delivery modes (classroom, online or e-learning), use of IR systems for teaching, assessment and feedback, and curriculum design. The survey, and its bibliography, provides an overview of the pedagogical research carried out in the field of IR. It also provides a guide for educators on approaches that can be applied to improve students' learning experiences
Thesauri : practical guidance for construction
Purpose - With the growing recognition that thesauri aid information retrieval, organisations are beginning to adopt, and in many cases, create thesauri. This paper offers some guidance on the construction process. Design/methodology/approach - An opinion piece with a practical focus, based on recent experiences gleaned from consultancy work. Findings - A number of steps can be taken to ensure any thesaurus under construction is fit for purpose. Due consideration is therefore given to aspects such as term selection, structure and notation, thesauri standards, software and Web display issues, thesauri evaluation and maintenance. This paper also notes that creating new subject schemes from scratch, however attractive, contributes to the plethora of terminologies currently in existence and can limit user searching within particular contexts. The decision to create a "new" thesaurus should therefore be taken carefully and observance of standards is paramount. Practical implications - This paper offers advice to assist practitioners in the development of thesauri. Originality/value - Useful guidance for those practitioners new to the area of thesaurus construction is provided, together with an overview of selected key processes involved in the construction of a thesaurus
Evaluating the retrieval effectiveness of Web search engines using a representative query sample
Search engine retrieval effectiveness studies are usually small-scale, using
only limited query samples. Furthermore, queries are selected by the
researchers. We address these issues by taking a random representative sample
of 1,000 informational and 1,000 navigational queries from a major German
search engine and comparing Google's and Bing's results based on this sample.
Jurors were found through crowdsourcing, and data was collected using specialised
software, the Relevance Assessment Tool (RAT). We found that while Google
outperforms Bing in both query types, the difference in the performance for
informational queries was rather low. However, for navigational queries, Google
found the correct answer in 95.3 per cent of cases whereas Bing only found the
correct answer 76.6 per cent of the time. We conclude that search engine
performance on navigational queries is of great importance, as users in this
case can clearly identify queries that have returned correct results. So,
performance on this query type may contribute to explaining user satisfaction
with search engines
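The navigational-query figures above are success rates: the share of queries for which an engine returned the correct answer. A minimal sketch of that calculation follows. The judgment lists are hypothetical and were built only to reproduce the reported percentages under the assumption that all 1,000 navigational queries were judged; they are not the study's data, and the function name is illustrative.

```python
def success_rate(judgments):
    """Percentage of queries whose results contained the correct answer.

    judgments: iterable of booleans, one per query (True = correct answer found).
    """
    judgments = list(judgments)
    if not judgments:
        raise ValueError("need at least one judged query")
    return 100.0 * sum(1 for found in judgments if found) / len(judgments)


# Hypothetical judgments sized to match the percentages in the abstract,
# assuming every one of the 1,000 navigational queries received a judgment.
google = [True] * 953 + [False] * 47
bing = [True] * 766 + [False] * 234

if __name__ == "__main__":
    print(f"Google: {success_rate(google):.1f} per cent")
    print(f"Bing:   {success_rate(bing):.1f} per cent")
    print(f"Gap:    {success_rate(google) - success_rate(bing):.1f} points")
```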
An inquiry-based learning approach to teaching information retrieval
The study of information retrieval (IR) has increased in interest and importance with the explosive growth of online information in recent years. Learning about IR within formal courses of study enables users of search engines to use them more knowledgeably and effectively, while providing the starting point for the explorations of new researchers into novel search technologies. Although IR can be taught in a traditional manner of formal classroom instruction with students being led through the details of the subject and expected to reproduce this in assessment, the nature of IR as a topic makes it an ideal subject for inquiry-based learning approaches to teaching. In an inquiry-based learning approach students are introduced to the principles of a subject and then encouraged to develop their understanding by solving structured or open problems. Working through solutions in subsequent class discussions enables students to appreciate the availability of alternative solutions as proposed by their classmates. Following this approach students not only learn the details of IR techniques but, significantly, also learn naturally to apply them in solving problems. In doing this they not only gain an appreciation of alternative solutions to a problem, but also learn how to assess their relative strengths and weaknesses. Developing confidence and skills in problem solving enables student assessment to be structured around the solution of problems. Thus students can be assessed on the basis of their understanding and ability to apply techniques, rather than simply their skill at reciting facts. This has the additional benefit of encouraging general problem solving skills which can be of benefit in other subjects. This approach to teaching IR was successfully implemented in an undergraduate module where students were assessed in a written examination exploring their knowledge and understanding of the principles of IR and their ability to apply them to solving problems, and a written assignment based on developing an individual research proposal
Group Invariant Deep Representations for Image Instance Retrieval
Most image instance retrieval pipelines are based on comparison of vectors
known as global image descriptors between a query image and the database
images. Due to their success in large scale image classification,
representations extracted from Convolutional Neural Networks (CNN) are quickly
gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors
for image instance retrieval. While CNN-based descriptors are generally
remarked for good retrieval performance at lower bitrates, they nevertheless
present a number of drawbacks including the lack of robustness to common object
transformations such as rotations compared with their interest point based FV
counterparts.
In this paper, we propose a method for computing invariant global descriptors
from CNNs. Our method implements a recently proposed mathematical theory for
invariance in a sensory cortex modeled as a feedforward neural network. The
resulting global descriptors can be made invariant to multiple arbitrary
transformation groups while retaining good discriminativeness.
Based on a thorough empirical evaluation using several publicly available
datasets, we show that our method is able to significantly and consistently
improve retrieval results every time a new type of invariance is incorporated.
We also show that our method which has few parameters is not prone to
overfitting: improvements generalize well across datasets with different
properties with regard to invariances. Finally, we show that our descriptors
are able to compare favourably to other state-of-the-art compact descriptors in
similar bitranges, exceeding the highest retrieval results reported in the
literature on some datasets. A dedicated dimensionality reduction step
--quantization or hashing-- may be able to further improve the competitiveness
of the descriptors
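The general idea of making a descriptor invariant to a transformation group can be illustrated by pooling over the group's orbit. The sketch below is a toy illustration under stated assumptions, not the authors' method: it uses a hand-written, rotation-sensitive feature on a tiny grid in place of CNN activations, the group of 90-degree rotations, and average pooling. All function names are hypothetical.

```python
def rotate90(img):
    """Rotate a square 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def raw_descriptor(img):
    """Toy rotation-sensitive feature: pixel sums weighted by row and by column."""
    n = len(img)
    row_weighted = sum((r + 1) * img[r][c] for r in range(n) for c in range(n))
    col_weighted = sum((c + 1) * img[r][c] for r in range(n) for c in range(n))
    return (row_weighted, col_weighted)


def orbit(img):
    """All four images obtained by rotating img by 0, 90, 180 and 270 degrees."""
    images = [img]
    for _ in range(3):
        images.append(rotate90(images[-1]))
    return images


def invariant_descriptor(img):
    """Average the raw descriptor over the rotation orbit.

    Rotating the input only reorders the orbit, so the average is unchanged.
    """
    descs = [raw_descriptor(g) for g in orbit(img)]
    return tuple(sum(d[i] for d in descs) / len(descs) for i in range(2))


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    rotated = rotate90(image)
    print(raw_descriptor(image), raw_descriptor(rotated))              # differ
    print(invariant_descriptor(image), invariant_descriptor(rotated))  # match
```

Averaging is only one possible pooling choice; other statistics computed over the same orbit (such as the maximum) would be invariant for the same reason.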
An introduction to crowdsourcing for language and multimedia technology research
Language and multimedia technology research often relies on large manually constructed datasets for training or evaluation of algorithms and systems. Constructing these datasets is often expensive with significant challenges in terms of recruitment of personnel to carry out the work. Crowdsourcing methods using scalable pools of workers available on-demand offer a flexible means of rapid low-cost construction of many of these datasets to support existing research requirements and potentially promote new research initiatives that would otherwise not be possible