5 research outputs found

    A Component-Based Approach for Scientific Services for Education and Research (Scientific SEARCH)

    Today’s challenge for users such as students, educators, or researchers who retrieve digital information is coping, more than ever before, with the excess of data and information available. The problem is compounded by the way scientific knowledge is structured: expert interviews, articles, conference coverage, journal scans, etc. Great progress has been made in digital library research. Through its initiatives, the NSF/NSDL has assembled a set of tools and techniques that hold significant potential, and many projects are now applying them to meet the information needs of different user communities. The primary focus of the Scientific SEARCH project is enhancing access to high-quality learning materials and resources, modules, and other digital objects targeted at scientific consumers and scientific producers. The project will use a multi-phased approach to achieve this objective. The paper describes the first-phase work submitted to the NSF 04-542 solicitation.

    Revealing Hidden Community Structures and Identifying Bridges in Complex Networks: An Application to Analyzing Contents of Web Pages for Browsing

    The emergence of scale-free and small-world properties in real-world complex networks has stimulated a great deal of activity in the field of network analysis. An example of such a network comes from the field of Content Analysis (CA) and Text Mining, where the goal is to analyze the contents of a set of web pages. The network can be represented with the words appearing in the web pages as nodes and with an edge between two words if they appear together in a document. In this paper we present a CA system that helps users visually analyze these networks representing the textual contents of a set of web pages. Major contributions include a methodology to cluster complex networks based on duplication of nodes and identification of bridges, i.e., words that might be of user interest but have a low frequency in the document corpus. We have tested this system with a number of data sets, and users have found it very useful for the exploration of data. One of the case studies, based on browsing a collection of web pages on Wikipedia (http://en.wikipedia.org/wiki/Main_Page), is presented in detail.
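
    The network representation described in this abstract (words as nodes, an edge between two words that appear in the same document) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' system: the tokenizer, the function names, and the sample pages are hypothetical, and edge weights simply count shared documents.

```python
from collections import defaultdict
from itertools import combinations
import re


def tokenize(text):
    """Lowercase a document and return its set of distinct words (assumed tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_cooccurrence_network(documents):
    """Return {word: {neighbour: number of documents in which both appear}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        words = sorted(tokenize(doc))
        # Every pair of distinct words in the same document gets an edge.
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


# Hypothetical sample pages, for illustration only.
pages = [
    "network analysis of web pages",
    "text mining of web pages",
]
network = build_cooccurrence_network(pages)
```

    Clustering and bridge identification, as described in the abstract, would operate on a graph of this kind; they are not attempted here.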

    Text mining with exploitation of user's background knowledge: discovering novel association rules from text

    The goal of text mining is to find interesting and non-trivial patterns or knowledge from unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because such measures do not consider the knowledge and interests of the users. Subjective measures require explicit input of user expectations, which is difficult or even impossible to obtain in text mining environments. This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two major components: a background knowledge developer and a novel association rules miner. The background knowledge developer learns a user's background knowledge by extracting keywords from documents already known to the user (background documents) and developing a concept hierarchy to organize popular keywords. The novel association rule miner discovers association rules among noun phrases extracted from relevant documents (target documents) and compares the rules with the background knowledge to predict the rule novelty to the particular user (user-oriented novelty). The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It consists of two components: occurrence distance and connection distance. The former considers the co-occurrences of two keywords in the background documents: the more co-occurrences, the shorter the distance. The latter considers the common connections of two keywords with others in the concept hierarchy. It is defined as the length of the path connecting the two keywords in the concept hierarchy: the longer the path, the larger the distance. The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the user-oriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in terms of predicting novel rules and identifying useful rules.
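
    The connection distance mentioned in this abstract is described only as a path length between two keywords in the concept hierarchy; the exact formula is not given here. A minimal sketch, assuming the hierarchy is a parent-to-children mapping and the distance is the number of edges on the shortest path, might look like this. The function name and the sample hierarchy are hypothetical and are not taken from uMining.

```python
from collections import deque


def connection_distance(hierarchy, start, goal):
    """Number of edges on the shortest path between two keywords, or None.

    `hierarchy` maps a parent concept to a list of child concepts
    (an assumed representation, not the one used by uMining).
    """
    # Treat parent/child links as undirected so paths can go up and down.
    neighbours = {}
    for parent, children in hierarchy.items():
        for child in children:
            neighbours.setdefault(parent, set()).add(child)
            neighbours.setdefault(child, set()).add(parent)
    if start not in neighbours or goal not in neighbours:
        return None
    # Breadth-first search finds the shortest path in an unweighted graph.
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical concept hierarchy, for illustration only.
hierarchy = {
    "data mining": ["text mining", "association rules"],
    "text mining": ["keywords", "noun phrases"],
}
```

    Under this reading, a rule whose antecedent and consequent lie far apart in the user's hierarchy would be scored as more novel; how the study combines this with the occurrence distance is not specified in the abstract.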

    Design and Evaluation of User Interfaces for Mobile Web Search

    Mobile information search is a continuously growing and diversifying part of everyday information seeking. According to earlier research, however, better user interface solutions are needed to support Web search on mobile devices. In this dissertation research, two new search interfaces were designed and implemented, and they were evaluated in user studies. The first interface is based on grouping search results into categories according to the keywords that occur in them. The results of the user studies show that categorization can support mobile users' exploratory search. The second interface presents, alongside the search results, an overview of where the query phrase occurs in the result documents. Although using the method takes some learning, user evaluations show that it can help in skipping poor search results, especially when the other information describing a result is unclear. In addition, the dissertation studied the information needs of active mobile Internet users in order to understand how Web search is used. According to the results, searching and browsing the Web are among these users' most important ways of acquiring information. They are used to address information needs as soon as they arise, whether the user is at home, on the move, or in a social interaction. Mobile information seeking is strongly tied to the context of use, which should be taken into account in the design of search interfaces. Future search interfaces could, for example, support information seeking by making use of information about the user's location and activities. The growing role of informal and exploratory information needs also poses new challenges for interaction design.

    Mobile Web search is a rapidly growing information seeking activity employed across different locations, situations, and activities. Current mobile search interfaces are based on the ranked result list, dominant in desktop interfaces. 
Research suggests that new paradigms are needed for better support of mobile searchers. For this dissertation, two such novel search interface techniques were designed, implemented, and evaluated. The first method, a clustering search interface that presents a category-based overview of the results, was studied both in a task-based experiment in a laboratory setting and in a longitudinal field study wherein it was used to address real information needs. The results indicate that clustering can support exploratory search needs when the searcher has trouble defining the information need, requires an overview of the search topic, or is interested in multiple results related to the same topic. The findings informed design guidelines for category-based search interfaces. How and when categorization is presented in the search interface needs to be carefully considered. Categorization methods should be improved, for better response to diverse information needs. Hybrid approaches employing contextually informed clustering, classification, and faceted browsing may offer the best match for user needs. The second presentation method, a visualization of the occurrences of the user's query phrase in a result document, can be incorporated into the ranked result list as an additional, unobtrusive result descriptor. It allows the searcher to see how often the query phrase appears in the result document, enabling the use of various evaluation strategies to assess the relevance of the results. Several iterations of the visualization were studied with users to form an understanding of the potential of this approach. The results suggest that a novel visualization can be useful in ruling out non-relevant results and can assist when the other result descriptors do not provide for a conclusive relevance assessment. 
However, users' familiarity with well-established result descriptors means that users have to learn how to integrate the visualization into their search strategies and reconcile situations in which the visualization is in conflict with other metadata. In addition, the contextual triggers and information behaviors of mobile Internet users were studied, for understanding of the role of Web search as a mobile information seeking activity. The results from this study show that mobile Web search and browsing are important information seeking activities. They are engaged in to resolve emerging information needs as they appear, whether at home, on the go, or in social situations.
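
    The second technique in this abstract, an overview of where the query phrase occurs in a result document, can be illustrated with a very small text-only sketch. It assumes a fixed-width strip in which each cell covers an equal share of the document and is marked when the phrase starts inside it. The function name, the strip width, and the characters used are hypothetical; the dissertation's actual visualization is not described in enough detail here to reproduce.

```python
def occurrence_strip(document, phrase, width=20):
    """Return a strip of `width` cells; '#' marks cells where the phrase starts."""
    text = document.lower()
    needle = phrase.lower()
    cells = ["."] * width
    if not needle or not text:
        return "".join(cells)
    pos = text.find(needle)
    while pos != -1:
        # Map the character offset to a cell index in the strip.
        cell = min(pos * width // len(text), width - 1)
        cells[cell] = "#"
        pos = text.find(needle, pos + 1)
    return "".join(cells)
```

    A strip like this could sit next to each entry in a ranked result list as an extra descriptor; a result with no marked cells does not contain the exact query phrase.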

    Supporting Exploratory Search Tasks Through Alternative Representations of Information

    Information seeking is a fundamental component of many of the complex tasks presented to us, and is often conducted through interactions with automated search systems such as Web search engines. Indeed, the ubiquity of Web search engines makes information so readily available that people now often turn to the Web for all manner of information seeking needs. Furthermore, as the range of online information seeking tasks grows, more complex and open-ended search activities have been identified. One type of complex search activity that is of increasing interest to researchers is exploratory search, where the goal involves "learning" or "investigating", rather than simply "looking-up". Given the massive increase in information availability and the use of online search for tasks beyond simply looking-up, researchers have noted that it becomes increasingly challenging for users to effectively leverage the available online information for complex and open-ended search activities. One of the main limitations of the current document retrieval paradigm offered by modern search engines is that it provides a ranked list of documents as a response to the searcher’s query with no further support for locating and synthesizing relevant information. Therefore, the searcher is left to find and make sense of useful information in a massive information space that lacks any overview or conceptual organization. This thesis explores the impact of alternative representations of search results on user behaviors and outcomes during exploratory search tasks. Our inquiry is inspired by the premise that exploratory search tasks require sensemaking, and that sensemaking involves constructing and interacting with representations of knowledge. 
As such, in order to provide searchers with more support in performing exploratory activities, there is a need to move beyond the current document retrieval paradigm by extending the support for locating and externalizing semantic information from textual documents and by providing richer representations of the extracted information, coupled with mechanisms for accessing and interacting with the information in ways that support exploration and sensemaking. This dissertation presents a series of discrete research endeavours that explore different aspects of providing information and presenting it in ways that support both the extraction and the assimilation of relevant information. We first address the problem of extracting information that is more granular than documents as a response to a user's query by developing a novel information extraction system that represents documents as a series of entity-relationship tuples. Next, by designing and evaluating a series of alternative representations of search results, we examine how this extracted information can be represented such that it extends the document-based search framework's support for exploratory search tasks. Finally, we assess the ecological validity of this research by exploring error-prone representations of search results and how they impact a searcher's ability to leverage our representations to perform exploratory search tasks. Overall, this research contributes towards designing future search systems by providing insights into the efficacy of alternative representations of search results for supporting exploratory search activities, culminating in a novel hybrid representation called Hierarchical Knowledge Graphs (HKG). To this end, we propose and develop a framework that enables a reliable investigation of the impact of different representations and how they are perceived and utilized by information seekers.