133,531 research outputs found

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Software-implemented attack tolerance for critical information retrieval

    Get PDF
    The fast-growing reliance of our daily life upon online information services often demands an appropriate level of privacy protection as well as highly available service provision. However, most existing solutions have attempted to address these problems separately. This thesis investigates and presents a solution that provides both privacy protection and fault tolerance for online information retrieval. A new approach to Attack-Tolerant Information Retrieval (ATIR) is developed based on an extension of existing theoretical results for Private Information Retrieval (PIR). ATIR uses replicated services to protect a user's privacy and to ensure service availability. In particular, ATIR can tolerate any collusion of up to t servers for privacy violation and up to ƒ faulty (either crashed or malicious) servers in a system with k replicated servers, provided that k ≥ t + ƒ + 1 where t ≥ 1 and ƒ ≤ t. In contrast to other related approaches, ATIR relies on neither enforced trust assumptions, such as the use of tanker-resistant hardware and trusted third parties, nor an increased number of replicated servers. While the best solution known so far requires k (≥ 3t + 1) replicated servers to cope with t malicious servers and any collusion of up to t servers with an O(n^*^) communication complexity, ATIR uses fewer servers with a much improved communication cost, O(n1/2)(where n is the size of a database managed by a server).The majority of current PIR research resides on a theoretical level. This thesis provides both theoretical schemes and their practical implementations with good performance results. In a LAN environment, it takes well under half a second to use an ATIR service for calculations over data sets with a size of up to 1MB. The performance of the ATIR systems remains at the same level even in the presence of server crashes and malicious attacks. Both analytical results and experimental evaluation show that ATIR offers an attractive and practical solution for ever-increasing online information applications

    Thesauri : practical guidance for construction

    Get PDF
    Purpose - With the growing recognition that thesauri aid information retrieval, organisations are beginning to adopt, and in many cases, create thesauri. This paper offers some guidance on the construction process. Design/methodology/approach - An opinion piece with a practical focus, based on recent experiences gleaned from consultancy work. Findings - A number of steps can be taken to ensure any thesaurus under construction is fit for purpose. Due consideration is therefore given to aspects such as term selection, structure and notation, thesauri standards, software and Web display issues, thesauri evaluation and maintenance. This paper also notes that creating new subject schemes from scratch, however attractive, contributes to the plethora of terminologies currently in existence and can limit user searching within particular contexts. The decision to create a "new" thesaurus should therefore be taken carefully and observance of standards is paramount. Practical implications - This paper offers advice to assist practitioners in the development of thesauri. Originality/value - Useful guidance for those practitioners new to the area of thesaurus construction is provided, together with an overview of selected key processes involved in the construction of a thesaurus

    Evaluating the retrieval effectiveness of Web search engines using a representative query sample

    Full text link
    Search engine retrieval effectiveness studies are usually small-scale, using only limited query samples. Furthermore, queries are selected by the researchers. We address these issues by taking a random representative sample of 1,000 informational and 1,000 navigational queries from a major German search engine and comparing Google's and Bing's results based on this sample. Jurors were found through crowdsourcing, data was collected using specialised software, the Relevance Assessment Tool (RAT). We found that while Google outperforms Bing in both query types, the difference in the performance for informational queries was rather low. However, for navigational queries, Google found the correct answer in 95.3 per cent of cases whereas Bing only found the correct answer 76.6 per cent of the time. We conclude that search engine performance on navigational queries is of great importance, as users in this case can clearly identify queries that have returned correct results. So, performance on this query type may contribute to explaining user satisfaction with search engines

    An inquiry-based learning approach to teaching information retrieval

    Get PDF
    The study of information retrieval (IR) has increased in interest and importance with the explosive growth of online information in recent years. Learning about IR within formal courses of study enables users of search engines to use them more knowledgeably and effectively, while providing the starting point for the explorations of new researchers into novel search technologies. Although IR can be taught in a traditional manner of formal classroom instruction with students being led through the details of the subject and expected to reproduce this in assessment, the nature of IR as a topic makes it an ideal subject for inquiry-based learning approaches to teaching. In an inquiry-based learning approach students are introduced to the principles of a subject and then encouraged to develop their understanding by solving structured or open problems. Working through solutions in subsequent class discussions enables students to appreciate the availability of alternative solutions as proposed by their classmates. Following this approach students not only learn the details of IR techniques, but significantly, naturally learn to apply them in solution of problems. In doing this they not only gain an appreciation of alternative solutions to a problem, but also how to assess their relative strengths and weaknesses. Developing confidence and skills in problem solving enables student assessment to be structured around solution of problems. Thus students can be assessed on the basis of their understanding and ability to apply techniques, rather simply their skill at reciting facts. This has the additional benefit of encouraging general problem solving skills which can be of benefit in other subjects. This approach to teaching IR was successfully implemented in an undergraduate module where students were assessed in a written examination exploring their knowledge and understanding of the principles of IR and their ability to apply them to solving problems, and a written assignment based on developing an individual research proposal

    Group Invariant Deep Representations for Image Instance Retrieval

    Get PDF
    Most image instance retrieval pipelines are based on comparison of vectors known as global image descriptors between a query image and the database images. Due to their success in large scale image classification, representations extracted from Convolutional Neural Networks (CNN) are quickly gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors for image instance retrieval. While CNN-based descriptors are generally remarked for good retrieval performance at lower bitrates, they nevertheless present a number of drawbacks including the lack of robustness to common object transformations such as rotations compared with their interest point based FV counterparts. In this paper, we propose a method for computing invariant global descriptors from CNNs. Our method implements a recently proposed mathematical theory for invariance in a sensory cortex modeled as a feedforward neural network. The resulting global descriptors can be made invariant to multiple arbitrary transformation groups while retaining good discriminativeness. Based on a thorough empirical evaluation using several publicly available datasets, we show that our method is able to significantly and consistently improve retrieval results every time a new type of invariance is incorporated. We also show that our method which has few parameters is not prone to overfitting: improvements generalize well across datasets with different properties with regard to invariances. Finally, we show that our descriptors are able to compare favourably to other state-of-the-art compact descriptors in similar bitranges, exceeding the highest retrieval results reported in the literature on some datasets. A dedicated dimensionality reduction step --quantization or hashing-- may be able to further improve the competitiveness of the descriptors

    An introduction to crowdsourcing for language and multimedia technology research

    Get PDF
    Language and multimedia technology research often relies on large manually constructed datasets for training or evaluation of algorithms and systems. Constructing these datasets is often expensive with significant challenges in terms of recruitment of personnel to carry out the work. Crowdsourcing methods using scalable pools of workers available on-demand offers a flexible means of rapid low-cost construction of many of these datasets to support existing research requirements and potentially promote new research initiatives that would otherwise not be possible
    corecore