    An Investigation on Text-Based Cross-Language Picture Retrieval Effectiveness through the Analysis of User Queries

    Purpose: This paper describes a study of the queries generated from a user experiment for cross-language information retrieval (CLIR) from a historic image archive. Italian-speaking users generated 618 queries for a set of known-item search tasks. The queries generated by users' interactions with the system have been analysed and the results used to suggest recommendations for the future development of cross-language retrieval systems for digital image libraries. Methodology: A controlled lab-based user study was carried out using a prototype Italian-English image retrieval system. Participants were asked to carry out searches for 16 images provided to them, a known-item search task. Users' interactions with the system were recorded and queries were analysed manually, both quantitatively and qualitatively. Findings: Results highlight the diversity in requests for similar visual content and the weaknesses of Machine Translation for query translation. Through the manual translation of queries we show the benefits of using high-quality translation resources. The results show the individual characteristics of users whilst performing known-item searches and the overlap obtained between query terms and structured image captions, highlighting the use of users' search terms for objects within the foreground of an image. Limitations and Implications: This research looks in depth into one case of interaction and one image repository. Despite this limitation, the discussed results are likely to be valid across other languages and image repositories. Value: The growing quantity of digital visual material in digital libraries offers the potential to apply techniques from CLIR to provide cross-language information access services. However, developing effective systems requires studying users' search behaviours, particularly in digital image libraries. 
The value of this paper is in the provision of empirical evidence to support recommendations for effective cross-language image retrieval system design.

    Deduction over Mixed-Level Logic Representations for Text Passage Retrieval

    A system is described that uses a mixed-level representation of (part of) the meaning of natural language documents (based on standard Horn Clause Logic) and a variable-depth search strategy that distinguishes between the different levels of abstraction in the knowledge representation to locate specific passages in the documents. Mixed-level representations as well as variable-depth search strategies are applicable in fields outside that of NLP. Comment: 8 pages, Proceedings of the Eighth International Conference on Tools with Artificial Intelligence (TAI'96), Los Alamitos C

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of WWW and digital libraries; and (iv) evaluation of NLP systems.

    Logical-Linguistic Model and Experiments in Document Retrieval

    Conventional document retrieval systems have relied on the extensive use of the keyword approach with statistical parameters in their implementations. Now, it seems that such an approach has reached its upper limit of retrieval effectiveness, and therefore, new approaches should be investigated for the development of future systems. With current advances in hardware, programming languages and techniques, natural language processing and understanding, and generally, in the field of artificial intelligence, there are now attempts being made to include linguistic processing into document retrieval systems. Few attempts have been made to include parsing or syntactic analysis into document retrieval systems, and the results reported show some improvements in the level of retrieval effectiveness. The first part of this thesis sets out to investigate further the use of linguistic processing by including translation, instead of only parsing, into a document retrieval system. The translation process implemented is based on unification categorial grammar and uses C-Prolog as the building tool. It is used as the main part of the indexing process of documents and queries into a knowledge base predicate representation. Instead of using the vector space model to represent documents and queries, we have used a kind of knowledge base model which we call logical-linguistic model. A development of a robust parser-translator to perform the translation is discussed in detail in the thesis. A method of dealing with ambiguity is also incorporated in the parser-translator implementation. The retrieval process of this model is based on a logical implication process implemented in C-Prolog. In order to handle uncertainty in evaluating similarity values between documents and queries, meta level constructs are built upon the C-Prolog system. A logical meta language, called UNIL (UNcertain Implication Language), is proposed for controlling the implication process. 
Using UNIL, one can write a set of implication rules and thesaurus to define the matching function of a particular retrieval strategy. Thus, we have demonstrated and implemented the matching operation between a document and a query as an inference using unification. An inference from a document to a query is done in the context of global information represented by the implication rules and the thesaurus. A set of well structured experiments is performed with various retrieval strategies on a test collection of documents and queries in order to evaluate the performance of the system. The results obtained are analysed and discussed. The second part of the thesis sets out to implement and evaluate the imaging retrieval strategy as originally defined by van Rijsbergen. The imaging retrieval is implemented as a relevance feedback retrieval with nearest neighbour information which is defined as follows. One of the best retrieval strategies from the earlier experiments is chosen to perform the initial ranking of the documents, and a few top ranked documents will be retrieved and identified as relevant or not by the user. From this set of retrieved and relevant documents, we can obtain all other unretrieved documents which have any of the retrieved and relevant documents as their nearest neighbour. These unretrieved documents have the potential of also being relevant since they are 'close' to the retrieved and relevant ones, and thus their initial similarity values to the query will be updated according to their distances from their nearest neighbours. From the updated similarity values, a new ranking of documents can be obtained and evaluated. 
A few sets of experiments using imaging retrieval strategy are performed for the following objectives: to search for an appropriate updating function in order to produce a new ranking of documents, to determine an appropriate nearest neighbour set, to find the relationship of the retrieval effectiveness to the size of the documents shown to the user for relevance judgement, and lastly, to find the effectiveness of a multi-stage imaging retrieval. The results obtained are analysed and discussed. Generally, the thesis sets out to define the logical-linguistic model in document retrieval and demonstrates it by building an experimental system which will be referred to as SILOL (a Simple Logical-linguistic document retrieval system). A set of retrieval strategies will be experimented with and the results obtained will be analysed and discussed

    Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

    We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data

    An investigation into teaching description and retrieval for constructed languages : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University

    The research presented in this thesis focuses on an investigation into teaching concepts for constructed languages, and the development of a teaching tool, called VISL, for teaching a specific constructed language. Constructed languages have been developed for integration with computer systems to overcome ambiguities and complexities existing in natural language in information description and retrieval. Understanding and properly using these languages is one of the keys to successful use of these computer systems. Unfortunately, current teaching approaches are not suitable for users to learn features of those languages easily. There are different types of constructed languages. Each has specific features adapted for specific uses, but they have in common an explicitly constructed grammar. In addition, a constructed language commonly embeds a powerful query engine that makes it easy for computer systems to search for correct information from descriptions following the conditions of the queries. This suggests new teaching principles that should be easily adaptable to teach any specific constructed language's structures and its specific query engine. In this research, teaching concepts were developed that offer a multi-modal approach to teaching constructed languages and their specific query engines. These concepts are developed based on the efficiencies of language structure diagrams over the cumbersome and non-transparent nature of textual explanations, and the advantages of active learning strategies in enhancing language understanding. These teaching concepts were then applied successfully to a constructed language, FSCL, as an example. The research also explains how the concepts developed can be adapted for other constructed languages. Based on the developed concepts, a Computer Aided Language Learning (CALL) application called VISL is built to teach FSCL. 
The application is integrated as an extension module in PAC, the computer system using FSCL for description and retrieval of information in qualitative analysis. In this application, users will learn FSCL through an interconnection of four modes: FSCL structures through the first two modes and its specific query engine through the second two modes. After going through the four modes, users will have developed a full understanding of the language. This will help users to construct a consistent vocabulary database, produce descriptive sentences conducive to retrieval, and create appropriate query sentences for obtaining relevant search results.