
    Query Formulation Assistance for Kids: What is Available, When to Help & What Kids Want

    Children use popular web search tools, which are generally designed for adult users. Because children have different developmental needs than adults, these tools may not always adequately support their search for information. Moreover, even though search tools offer support for query formulation, this support is also aimed at adults and may hinder children rather than help them. This calls for an examination of existing technologies in this area, to better understand what remains to be done to facilitate query-formulation tasks for young users. In this paper, we investigate interaction elements of query formulation, including query suggestion algorithms, for children. The primary goals of our research efforts are to: (i) examine existing plug-ins and interfaces that explicitly aid children’s query formulation; (ii) investigate children’s interactions with suggestions offered by a general-purpose query suggestion strategy vs. a counterpart designed with children in mind; and (iii) identify, via participatory design sessions, children’s preferences regarding tools and strategies that can help them find information and guide them through the query formulation process. Our analysis shows that existing tools do not meet children’s needs and expectations; the outcomes of our work can guide researchers and developers as they implement query formulation strategies for children.

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, a representative set of use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine-printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other aspects of the web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search (where “search” can be interpreted in the broadest sense of information access) from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents), a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.

    Inferring User Needs and Tasks from User Interactions

    The need for search often arises from a broad range of complex information needs or tasks (such as booking travel or buying a house), which lead to lengthy search processes characterised by distinct stages and goals. While existing search systems are adept at handling simple information needs, they offer limited support for tackling complex tasks. Accurate task representations could be useful in aptly placing users in the task-subtask space and could enable systems to contextually target users, provide them with better query suggestions, personalization and recommendations, and help in gauging satisfaction. The major focus of this thesis is to work towards task-based information retrieval systems: search systems that are adept at understanding, identifying and extracting tasks as well as supporting users’ complex search tasks. This thesis focuses on two major themes: (i) developing efficient algorithms for understanding and extracting search tasks from user logs and (ii) leveraging the extracted task information to better serve the user via different applications. Based on log analysis of terabyte-scale data from a real-world search engine, a detailed analysis of user interactions with search engines is provided. On the task extraction side, two Bayesian non-parametric methods are proposed to extract subtasks from a complex task and to recursively extract hierarchies of tasks and subtasks. A novel coupled matrix-tensor factorization model is proposed that represents users based on their topical interests and task behaviours. Beyond personalization, the thesis demonstrates that task information provides better context to learn from and proposes a novel neural task context embedding architecture to learn query representations. Finally, the thesis examines implicit signals of user interactions and considers the problem of predicting users’ satisfaction when engaged in complex search tasks. A unified multi-view deep sequential model is proposed to make query-level and task-level satisfaction predictions.

    Spatial and temporal-based query disambiguation for improving web search

    Queries submitted to search engines are ambiguous in nature due to users’ irrelevant input, which poses real challenges to web search engines in both understanding a query and returning results. A large amount of irrelevant and ambiguous information disappoints users. Thus, this research proposes an ambiguity evolvement process followed by an integrated use of spatial and temporal features to alleviate the imprecision of search results. To enhance the effectiveness of web information retrieval, the study develops an enhanced Adaptive Disambiguation Approach for web search queries to overcome the problems caused by ambiguous queries. A query classification method was used to filter search results to overcome the imprecision. An algorithm was utilized for finding the similarity of the search results based on spatial and temporal features. Users’ selections from the web results were recorded as implicit feedback, which was then utilized for web search improvement. Performance evaluation was conducted on the data sets GISQC_DS, AMBIENT and MORESQUE, which comprise ambiguous queries, to verify the effectiveness of the proposed approach in comparison to a well-known temporal evaluation method and two-box search methods. The implemented prototype focuses on classifying ambiguous queries by spatial or temporal features. Spatial queries target location information, whereas temporal queries target time in years. In conclusion, the study used search results in the context of Spatial Information Retrieval (S-IR) along with temporal information. Experimental results show that the use of spatial and temporal features in combination can significantly improve performance in terms of precision (92%), accuracy (93%), recall (95%), and F-measure (93%). Moreover, the use of implicit feedback has a significant impact on the search results, as demonstrated through experimental evaluation. SHAHID KAMA