14 research outputs found

    Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia

    Hyperlinks are an essential feature of the World Wide Web. They are especially important for online encyclopedias such as Wikipedia: an article can often only be understood in the context of related articles, and hyperlinks make it easy to explore this context. But important links are often missing, and several methods have been proposed to alleviate this problem by learning a linking model based on the structure of the existing links. Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia's navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality
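    The candidate-mining step described in this abstract can be illustrated with a minimal sketch. It assumes that each navigation path ends at the game's target article and that any earlier page on the path lacking a direct link to that target is a candidate source. The function name `candidate_missing_links`, the `links` adjacency mapping, and ranking by raw frequency are illustrative assumptions, not the authors' actual ranking method.

```python
from collections import Counter

def candidate_missing_links(paths, links):
    """Count (source, target) pairs seen in navigation paths that lack a direct link.

    paths: list of paths, each a list of article titles ending at the target.
    links: dict mapping an article title to the set of articles it links to.
    Returns candidate pairs sorted by how often they were observed (assumed ranking).
    """
    counts = Counter()
    for path in paths:
        if len(path) < 2:
            continue  # a path with a single page offers no candidate
        target = path[-1]
        for source in path[:-1]:
            # A page visited on the way to the target that does not already
            # link to it is a candidate for a new source -> target link.
            if source != target and target not in links.get(source, set()):
                counts[(source, target)] += 1
    return counts.most_common()

# Hypothetical toy data: two paths that both end at article "D".
paths = [["A", "B", "C", "D"], ["A", "E", "D"]]
links = {"A": {"B", "E"}, "B": {"C"}, "C": {"D"}, "E": {"D"}}
ranked = candidate_missing_links(paths, links)
print(ranked)
```

    Counting how often a pair recurs across paths is only the simplest possible ranking signal; the abstract does not specify which ranking criteria the authors use.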

    Task-based user profiling for query refinement (TOQUE)

    The information needs of search engine users vary in complexity. Simple needs can be satisfied with a single query, while complicated ones require a series of queries spanning a period of time. A search task, consisting of a sequence of search queries serving the same information need, can be treated as an atomic unit for modeling users' search preferences and has been applied to improve the accuracy of search results. However, existing studies on user search tasks mainly focus on applying users' interests to re-rank search results. Only a few studies have examined the effects of using search tasks to help users obtain effective queries. Fewer still have examined the dynamic characteristics of users' search interests within a search task, and even fewer have examined selective personalization, which applies personalization only to candidate refined queries that are expected to benefit from it. This study proposes a framework for modeling users' task-based dynamic search interests to address these issues and makes the following contributions. First, task identification: a cross-session method is proposed to discover tasks by modeling the best-link structure of queries, based on their commonly shared clicked results. A graph-based representation method is introduced to improve the effectiveness of link prediction in a query sequence. Second, dynamic task-level search interest representation: a four-tuple user profiling model is introduced to represent long- and short-term user interests extracted from search tasks and sessions. It models users' interests at the task level to re-rank candidate queries through modules for task identification and update. Third, selective personalization: a two-step personalization algorithm is proposed to improve the rankings of candidate queries for query refinement by assessing task dependency in a latent task space. Experimental results show that the proposed TOQUE framework increases the precision of candidate queries and thus shortens search sessions.
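    The task identification idea (linking each query to the earlier query with which it shares the most clicked results) can be sketched as follows. This is a simplified illustration under stated assumptions: Jaccard overlap of clicked URLs as the link score, a fixed `threshold`, and the function names `jaccard` and `identify_tasks` are all hypothetical and not taken from the TOQUE framework itself.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def identify_tasks(queries, threshold=0.2):
    """Assign a task id to each query by linking it to its best earlier match.

    queries: chronological list of (query_text, set_of_clicked_urls).
    threshold: minimum click overlap needed to join an existing task (assumed value).
    Returns a list of task ids, one per query.
    """
    task_ids = []
    next_task = 0
    for i, (_, clicks) in enumerate(queries):
        best_j, best_sim = None, 0.0
        for j in range(i):
            sim = jaccard(clicks, queries[j][1])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            # Join the task of the most similar earlier query (the "best link").
            task_ids.append(task_ids[best_j])
        else:
            # No sufficiently similar earlier query: start a new task.
            task_ids.append(next_task)
            next_task += 1
    return task_ids

# Hypothetical query log with interleaved travel and programming tasks.
log = [
    ("cheap flights paris", {"u1", "u2"}),
    ("python list sort", {"u9"}),
    ("paris hotels", {"u2", "u3"}),
    ("sorted vs sort python", {"u9", "u8"}),
]
tasks = identify_tasks(log)
print(tasks)
```

    Because links are formed on click overlap rather than adjacency in time, interleaved queries from different sessions can still be grouped into the same task.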

    Query Suggestion and Data Fusion in Contextual Disambiguation


    Analysis of web information-seeking behavior of users with different levels of health literacy

    Health literacy is defined as "the level by which individuals can obtain, process, understand and communicate health-related information necessary to make informed health decisions". Users with a low level of health literacy know less about their medical conditions and have more difficulty following instructions and understanding the information given by doctors. Increasingly, people turn to the web to search for health information. The difficulties that low-literacy users face in the real world probably persist in the virtual world. The main objective of this study is to analyze the search behavior of users with different levels of health literacy. It aims to identify differences between people with low and high health literacy that can then be used to improve retrieval systems and contribute, among other things, to easier access to information and education for people with low literacy. This study follows previous work that included annotating video recordings of an earlier user experiment. Based on the preliminary version of the analysis from that work, an event classification scheme was proposed that covers types of interaction with the browser, the search engine, and web pages. Each type of interaction is composed of events that are, in turn, associated with analysis variables. Within this scheme, modules were built to analyze the submitted search queries. Based on this scheme, the video annotation was revised and the data were analyzed descriptively and inferentially. The main results show that the low health literacy group mainly used the search engine box and the back function; they interacted longer with the search engine results page, clicking more with the left mouse button and scrolling. The high health literacy group, by contrast, made more use of the address bar and of selecting the URL text, and made more right-clicks on the search engine results page. Regarding query reformulations, which occur in the context of the same information need, users with low health literacy more often used "totally new" reformulations, that is, reformulations with no terms in common with the previous query. The high health literacy group, in turn, made more reformulations.

    Understanding the relationship between searchers' queries and information goals

    We describe results from Web search log studies aimed at elucidating user behaviors associated with queries and destination URLs that appear with different frequencies. We note the diversity of information goals that searchers have and the differing ways that goals are specified. We examine rare and common information goals that are specified using rare or common queries. We identify several significant differences in user behavior depending on the rarity of the query and the destination URL. We find that searchers are more likely to be successful when the frequencies of the query and destination URL are similar. We also establish that the behavioral differences observed for queries and goals of varying rarity persist even after accounting for potential confounding variables, including query length, search engine ranking, session duration, and task difficulty. Finally, using an information-theoretic measure of search difficulty, we show that the benefits obtained by search and navigation actions depend on the frequency of the information goal

    SEARCHING AS THINKING: THE ROLE OF CUES IN QUERY REFORMULATION

    Given the growing volume of information that surrounds us, search, and particularly web search, is now a fundamental part of how people perceive and experience the world. Understanding how searchers interact with search engines is thus an important topic both for designers of information retrieval systems and for educators working in the area of digital literacy. The more established, system-centric approaches in information retrieval (IR), however, can provide only a limited understanding. While the inherently iterative nature of the search process is generally acknowledged in the field of IR, research on query reformulation is typically limited to the what or the how of the query reformulation process. A picture of searchers' behavior remains incomplete without addressing the why of query reformulation, including what pieces of information, or cues, trigger the reformulation process. Unpacking that aspect of searchers' behavior requires a more user-centric approach. The overall goal of this study is to advance understanding of the reformulation process and the cues that influence it. It was driven by two broad questions: how cues (on the search engine result pages or the full web pages) are used in searchers' decisions regarding query reformulation, and how that use affects search effectiveness. The study draws on data collected in a lab setting from a sample of students who performed a series of search tasks and then went through a process of stimulated recall focused on their query reformulations. Both the query reformulations recorded during the search tasks and the cues elicited during the stimulated recall exercise were coded and then modeled using the mixed effects method. The final models capture the relationships between cues and query reformulation strategies as well as between cues and search effectiveness; in both cases some relationships are moderated by search expertise and domain knowledge. The results demonstrate that searchers systematically elicit and use cues with regard to query reformulation. Some of these relationships are independent of search expertise and domain knowledge, while others manifest themselves differently at different levels of search expertise and domain knowledge. Because the majority of the reformulations in this study indicated a failure of the preceding query, attempts to identify relationships between the use of cues and search effectiveness produced mixed results. As a whole, this work offers two contributions to the field of user-centered information retrieval. First, it reaffirms some of the earlier conceptual work about the role of cues in search behavior, and then expands on it by proposing specific relationships between cues and reformulations. Second, it highlights potential design considerations for search engine results pages and query term suggestions, as well as training suggestions for educators working on digital literacy.

    An Approach for Intention-Driven, Dialogue-Based Web Search

    Web search engines facilitate the achievement of Web-mediated tasks, including information retrieval, Web page navigation, and online transactions. These tasks often involve goals that pertain to multiple topics, or domains. Current search engines are not suitable for satisfying complex, multi-domain needs due to their lack of interactivity and knowledge. This thesis presents a novel intention-driven, dialogue-based Web search approach that uncovers and combines users' multi-domain goals to provide helpful virtual assistance. The intention discovery procedure uses a hierarchy of Partially Observable Markov Decision Process-based dialogue managers and a backing knowledge base to systematically explore the dialogue's information space, probabilistically refining the perception of user goals. The search approach has been implemented in IDS, a search engine for online gift shopping. A usability study comparing IDS-based searching with Google-based searching found that the IDS-based approach takes significantly less time and effort, and results in higher user confidence in the retrieved results.
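    The phrase "probabilistically refining the perception of user goals" can be illustrated with a single Bayesian belief update, the core operation a POMDP-based dialogue manager performs after each user utterance. The sketch below is a simplification that assumes the user's goal does not change during the dialogue (so no transition model is needed); the goal names, the observation likelihoods, and the function `update_belief` are hypothetical and are not drawn from IDS.

```python
def update_belief(belief, likelihood, observation):
    """Return the posterior over goals after one observation.

    belief: dict goal -> prior probability.
    likelihood: dict goal -> dict observation -> P(observation | goal).
    observation: the observed user utterance (as a symbolic label).
    """
    # Bayes' rule: posterior is proportional to prior times likelihood.
    unnormalized = {
        goal: prior * likelihood[goal].get(observation, 0.0)
        for goal, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        # Observation is impossible under every goal: keep the prior unchanged.
        return dict(belief)
    return {goal: value / total for goal, value in unnormalized.items()}

# Hypothetical gift-shopping goals and utterance likelihoods.
belief = {"toy": 0.5, "book": 0.5}
likelihood = {
    "toy": {"for a child": 0.8, "likes reading": 0.2},
    "book": {"for a child": 0.4, "likes reading": 0.6},
}
posterior = update_belief(belief, likelihood, "for a child")
print(posterior)
```

    Repeating this update after every dialogue turn concentrates probability on the goals that best explain what the user has said so far.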

    Système participatif de tags iconiques basé sur un langage visuel instinctif multi-points de vue

    Tag systems for Knowledge Organization Systems centralize and provide the tags that can be used to classify, share and search for knowledge on the web for personal or organizational use. However, a growing variety of vocabularies and languages makes the connections between textual tags and the documents they mark less and less distinctive, making the use and reuse of tag systems even harder. Although previous studies have attempted to improve visual tag systems by using icons, these suffered from problems of recognition and memorization, and users were disoriented when facing a large number of isolated symbols. Our research seeks a new approach to improving the representation of tags and, above all, of their structure in a tag system, where well-structured icons enhance tagging effectiveness in terms of both tagging quality and tagging speed. The proposed iconic tag system is organized around an LVD (Visual Distinctive Language), itself based on the Hypertopic model for representing multi-viewpoint topic maps, developed by the Tech-CICO team. It is intended mainly to improve the semiotic interpretation of the meaning of each icon and the graphical coding of the tag structure, and to strengthen the understanding and use of the tag structure in a computerized knowledge-sharing system. The arrangement of icons is another topic dealt with in our research, in order to offer a more complete definition of an iconic tag system. Apart from modeling and evaluating the LVD-based iconic tag system, we have considered how to build such an icon system in today's cooperative knowledge-sharing context and made it possible to manage and share iconic tags on a collaborative platform.
    TROYES-SCD-UTT (103872102) / Sudoc