247 research outputs found

    Applying Wikipedia to Interactive Information Retrieval

    There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. 
Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.

    Spartan Daily, September 21, 2000

    Volume 115, Issue 15

    Information Outlook, September 2007

    Volume 11, Issue 9

    Information visibility on the Web and conceptions of success and failure in Web searching.

    This thesis reports the procedure and findings of an empirical study of end users' interaction with web-based search tools. The first part addresses early research questions to discover web users' conceptions of the invisible web. The second part addresses primary research questions to explore web users' conceptualizations of the causes of their search success/failure and their awareness of and reaction to missed information while searching the web. The third part is devoted to a number of emergent research questions that reexamine the dataset in the light of several theoretical frameworks, including Locus of Control, Self-efficacy, Attribution Theory, and Bounded Rationality and Satisficing theory. The data collection was carried out in three phases based on in-depth, open-ended and semi-structured interviews with a sample of academic staff, research staff and research students from three biology-related departments at the University of Sheffield. A combination of inductive and deductive approaches was employed to address the three sets of research questions. The first part of the analysis, which was based on Grounded Theory, led to the discovery of a new concept called 'information visibility', which distinguishes between the technical, objective conceptions of the invisible web that commonly appear in the literature and a cognitive, subjective conception based on searchers' perceptions of search failure. Accordingly, the study introduced a 'model of information visibility on the web' which suggests a complementary definition for the invisible web. Inductive exploration of the data to address the primary research questions culminated in the identification of different kinds of success (i.e. anticipated, serendipitous, and unexpected success) and failure (i.e. unexpected, unexplained and inevitable failure).
The results also showed that the participants in the study were aware of the possibility of missing some relevant information in their searches, and that the risk of missing potentially important information is a matter of concern to them. However, depending on the context of each search, they have different perceptions of the importance and the volume of missed information, and accordingly they react to it differently. In view of that, two matrices, the "matrix of search impact" and the "matrix of search depth", were developed to address users' search behaviours regarding their awareness of and reaction to missed information. The matrix of search impact suggests that there are different perceptions of the risk of missing information, including "inconsequential", "tolerable", "damaging" and "disastrous". The matrix of search depth illustrates different search strategies, including "minimalist", "opportunistic", "nervous" and "extensive". The third part of the study indicated that Locus of Control and Attribution Theory are useful theoretical frameworks for better understanding web-based information seeking. Furthermore, interpretation of the data with regard to Bounded Rationality and Satisficing theory supported the inductive findings and showed that web users' estimations of the likely volume and importance of missed information affect their decision to persist in searching. At the final stage of the study, an integrative model of information seeking behaviour on the web was developed. This six-layer model incorporates the results of both the inductive and deductive stages of the study.

    The BG News August 22, 2007

    The BGSU campus student newspaper, August 22, 2007. Volume 98, Issue 4

    Questions And Answers: Exploring Mobile User Needs

    The users of mobile devices increasingly use networked services to address their information needs. Questions asked by mobile users are strongly influenced by context factors, such as location and user activity. However, in research that has empirically documented the link between mobile information needs and context factors, information about expected answers is scant. Therefore, the goal of this study is to explore the context factors that influence mobile information needs and the answers expected by mobile users. The results are obtained by analysing information from paper diaries and digital diaries. This project involved a user study comprising two diary studies, one using a paper diary and one using a digital diary. Both diaries were analysed using grounded theory and a taxonomy of information needs. Our results indicate a relationship between mobile information needs, context factors and expected answers. Our study explored this relationship and provides a better understanding of the expected answers related to mobile information needs.

    A multiple case study exploration of undergraduate subject searching

    Subject searching (seeking information with a subject or topic in mind) is often involved in carrying out undergraduate assignments such as term papers and research reports. It is also an important component of information literacy (the abilities and experiences of effectively finding and evaluating, and appropriately using, needed information), which universities hope to cultivate in undergraduates by the time they complete their degree programs. By exploring the subject searching of a small group of upper-level, academically successful undergraduates over a school year, I sought to acquire a deeper understanding of the contexts and characteristics of their subject searching, and of the extent to which it was similar in quality to that of search and domain experts. Primary data sources for this study comprised subject searching diaries maintained by participants, and three online subject searches they demonstrated at the beginning, middle, and end of the study, during which they talked aloud while I observed, followed by focused interviews. To explore the quality of study participants’ subject searching, I looked for indications of advanced thinking in thoughts they spoke aloud during demonstration sessions relating to using strategy, evaluating, and creating personal understanding, which represent three of the most challenging and complex aspects of information literacy. Applying a layered interpretive process, I identified themes within several hundred instances of participants’ advanced thinking relating to these three information literacy elements, with evaluative themes occurring most often. I also noted three factors influencing the extent of similarity between the quality of participants’ advanced thinking and that of search and domain experts, which reflected matters that tended to be i) pragmatic or principled, ii) technical or conceptual, and iii) externally or internally focused.
Filtered through these factors, participants’ instances of advanced thinking brought to mind three levels of subject searching abilities: the competent student, the search expert, and the domain expert. Although such instances were relatively few in number, I identified at least some advanced thinking evincing domain expert qualities in the voiced thoughts of all but one participant, suggesting the gap between the higher order thinking abilities of upper-level undergraduates and those of information literate individuals is not always dauntingly large.

    Learning Representations of Social Media Users

    User representations are routinely used in recommendation systems by platform developers, in targeted advertisements by marketers, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists consider the problem of inferring user representations more abstractly: how does one extract a stable user representation - effective for many downstream tasks - from a medium as noisy and complicated as social media? The quality of a user representation is ultimately task-dependent (e.g. does it improve classifier performance, or make more accurate recommendations in a recommendation system?), but there are proxies that are less sensitive to the specific task. Is the representation predictive of latent properties such as a person's demographic features, socioeconomic class, or mental health state? Is it predictive of the user's future behavior? In this thesis, we begin by showing how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis to learn these representations and evaluate them on three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can be employed as distant supervision to improve topic model fit. Finally, we show how user features can be integrated into and improve existing classifiers in the multitask learning framework. We treat user representations - ground truth gender and mental health features - as auxiliary tasks to improve mental health state prediction. We also use distributed user representations learned in the first chapter to improve tweet-level stance classifiers, showing that distant user information can inform classification tasks at the granularity of a single message. Comment: PhD thesis
