4,213 research outputs found

    Updating collection representations for federated search

    To facilitate the search for relevant information across a set of online distributed collections, a federated information retrieval system typically represents each collection, centrally, by a set of vocabularies or sampled documents. Accurate retrieval therefore depends on how precisely each representation reflects the underlying content stored in that collection. As collections evolve over time, collection representations should also be updated to reflect any change; however, no solution has yet been proposed. In this study we examine the implications of out-of-date representation sets on retrieval accuracy and propose three different policies for managing necessary updates. Each policy is evaluated on a testbed of forty-four dynamic collections over an eight-week period. Our findings show that out-of-date representations significantly degrade performance over time; however, adopting a suitable update policy can minimise this problem.

    A Web Smart Space Framework for Intelligent Search Engines

    A web smart space is an intelligent environment with the additional capability of searching information smartly and efficiently. Advances such as dynamic web content generation have increased the size of web repositories. Among the many requirements of modern software analysis, one is to search for information in a given repository. However, extracting useful information is difficult because of the multilingual base of the web data collection. Semantic-based information searching has stalled because of inconsistencies and variations in the characteristics of the data. In this research, a web smart space framework is proposed which introduces front-end processing for a search engine to make the information retrieval process more intelligent and accurate. In conventional search architectures, searching is performed using only pattern matching, and consequently a large number of irrelevant results are generated. The proposed framework is designed to address this drawback and return efficient results. The framework takes text input from the user in the form of a complete question, interprets the input and derives its meaning. The search engine then searches on the basis of the information provided.

    Search Engines Giving You Garbage? Put A Corc In It, Implementing The Cooperative Online Resource Catalog

    This paper presents an implementation strategy for adding Internet resources to a library online catalog using OCLC's Cooperative Online Resource Catalog (CORC). Areas of consideration include deciding which electronic resources to include in the online catalog and how to select them. The value and importance of pathfinders in creating electronic bibliographies and the role of library staff in updating them are introduced. Using an electronic suggestion form as a means of Internet resource collection development is another innovative method of enriching library collections. Education and training for cataloging staff on Dublin Core elements is also needed. Attention should be paid to the needs of distance learners in providing access to Internet resources. The significance of evaluating the appropriateness of Internet resources for library collections is emphasized.

    Peer to Peer Information Retrieval: An Overview

    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.

    Lightweight Federation of Non-Cooperating Digital Libraries

    This dissertation studies the challenges and issues faced in federating heterogeneous digital libraries (DLs). The objective of this research is to demonstrate the feasibility of interoperability among non-cooperating DLs by presenting a lightweight, data-driven approach, or Data Centered Interoperability (DCI). We build a Lightweight Federated Digital Library (LFDL) system to provide federated search service for existing digital libraries with no prior coordination. We describe the motivation, architecture, design and implementation of the LFDL. We develop, deploy, and evaluate key services of the federation. The major difference from existing DL interoperability approaches is that we do not insist on cooperation among DLs; that is, they do not have to change anything in their systems or processes. The underlying approach is to have a dynamic federation where digital libraries can be added to (or removed from) the federation in real time. This is made possible by describing the behavior of participating DLs in an XML-based language that the federation engine understands. The major contributions of this work are: (1) This dissertation addresses the interoperability issues among non-cooperating DLs and presents a practical and efficient approach toward providing federated search service for those DLs. The DL itself remains autonomous and does not need to change its structure, data format, protocol and other internal features when it is added to the federation. (2) The implementation of the LFDL is based on a lightweight, dynamic, data-centered and rule-driven architecture. To add a DL to the federation, all that is needed is observing a DL's interaction with the user and storing the interaction specification in a human-readable and highly maintainable format. The federation engine provides the federated service based on the specification of a DL. A registration service allows dynamic DL registration, removal, or modification. No code needs to be rewritten or recompiled to add or change a DL. These notions are achieved by designing a new specification language in XML format and a powerful processing engine that enforces and implements the rules specified using the language. (3) In this thesis we explore an alternate approach where searches are distributed to participating DLs in real time. We have addressed the performance and reliability problems associated with other distributed search approaches. This is achieved by a locally maintained metadata repository extracted from DLs, as well as an efficient caching system based on the repository.

    Approaches to implement and evaluate aggregated search

    Aggregated search, or aggregated retrieval, can be seen as a third paradigm for information retrieval, following the Boolean retrieval paradigm and the ranked retrieval paradigm. The first two return, respectively, sets and ranked lists of search results.
It is up to the time-poor user to scroll through this set or list, scan the individual documents and assemble the information he or she needs. Alternatively, aggregated search aims not only at the identification of relevant information nuggets, but also at the assembly of these nuggets into a coherent answer. In this work, we first present an analysis of work related to aggregated search, organized within a general framework composed of three steps: query dispatching, nugget retrieval and result aggregation. Existing work is grouped by related domain, such as relational search, federated search, question answering and natural language generation. Among the possible research directions, we then focus on the two we believe to be the most promising: relational aggregated search and cross-vertical aggregated search. * Relational aggregated search targets not only relevant information but also the relations between relevant information nuggets, which are used to assemble the final answer sensibly. In particular, three types of queries would readily benefit from this paradigm: attribute queries (e.g. president of France, GDP of Italy, mayor of Glasgow, ...), instance queries (e.g. France, Italy, Glasgow, Nokia e72, ...) and class queries (e.g. countries, French cities, Nokia mobile phones, ...). We call these relational queries, and we tackle three important problems concerning information retrieval and aggregation for them. First, we propose an attribute retrieval approach, after arguing that attribute retrieval is one of the crucial problems to be solved. Our approach relies on HTML tables on the Web. It is capable of identifying useful and relevant tables, from which relevant attributes are extracted for arbitrary queries. The experimental results show that our approach is effective: it can answer many queries with high coverage and it outperforms state-of-the-art techniques. Second, we deal with result aggregation, where we are given relevant instances and attributes for a given query. The problem is particularly interesting for class queries, where the final answer will be a table with many instances and attributes. To guarantee the quality of the aggregated result, we propose the use of different weights on instances and attributes to promote the most representative and important ones. The third problem we deal with concerns instances of the same class (e.g. France, Germany, Italy, ... are all instances of the same class). Here, we propose an approach that can extract instances of the same class on a massive scale from HTML lists on the Web. All proposed approaches are applicable at Web scale and they can play an important role in relational aggregated search. Finally, we propose 4 different prototype applications for relational aggregated search. They can answer different types of queries with relevant and relational information. More precisely, we retrieve not only attributes and their values, but also passages and images, which are assembled into a final focused answer. An example is the query "Nokia e72", which will be answered with attributes (e.g. price, weight, battery life, ...), passages (e.g. description, reviews, ...) and images. Results are encouraging and they illustrate the utility of relational aggregated search. * The second research direction that we pursued concerns cross-vertical aggregated search, which consists of assembling results from different vertical search engines (e.g. image search, video search, traditional Web search, ...) into one single interface. Here, different approaches exist in both research and industry. Our contribution mostly concerns the evaluation and the advantages of this paradigm. We propose 4 different studies which simulate different search situations. Each study is tested with 100 different queries and 9 vertical sources. Here, we could clearly identify new advantages of this paradigm as well as different issues with evaluation setups. In particular, we observe that traditional information retrieval evaluation is not the fastest, but it remains the most realistic. To conclude, we propose different studies with respect to two promising research directions. On the one hand, we deal with three important problems of relational aggregated search, followed by real prototype applications with encouraging results. On the other hand, we have investigated the advantages and evaluation of cross-vertical aggregated search, and could clearly identify some of its advantages and evaluation issues. In the long term, we foresee a possible combination of these two kinds of approaches to provide relational and cross-vertical information retrieval incorporating more focus, structure and multimedia in search results.

    Distributed bookmark sharing primitives

    Ankara: The Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent Univ., 1999. Thesis (Master's) -- Bilkent University, 1999. Includes bibliographical references leaves 73-[74]. İnce, Kürşat. M.S.

    Congenial Web Search : A Conceptual Framework for Personalized, Collaborative, and Social Peer-to-Peer Retrieval

    Traditional information retrieval methods fail to address the fact that information consumption and production are social activities. Most Web search engines do not consider the social-cultural environment of users' information needs and the collaboration between users. This dissertation addresses a new search paradigm for Web information retrieval denoted as Congenial Web Search. It emphasizes personalization, collaboration, and socialization methods in order to improve effectiveness. The client-server architecture of Web search engines only allows the consumption of information. A peer-to-peer system architecture has been developed in this research to improve information seeking. Each user is involved in an interactive process to produce meta-information. Based on a personalization strategy on each peer, the user is supported to give explicit feedback for relevant documents. His information need is expressed by a query that is stored in a Peer Search Memory. On one hand, query-document associations are incorporated in a personalized ranking method for repeated information needs. The performance is shown in a known-item retrieval setting. On the other hand, explicit feedback of each user is useful to discover collaborative information needs. A new method for a controlled grouping of query terms, links, and users was developed to maintain Virtual Knowledge Communities. The quality of this grouping represents the effectiveness of grouped terms and links. Both strategies, personalization and collaboration, tackle the problem of a missing socialization among searchers. Finally, a concept for integrated information seeking was developed. This incorporates an integrated representation to improve effectiveness of information retrieval and information filtering. An integrated information retrieval process explores a virtual search network of Peer Search Memories in order to accomplish a reputation-based ranking. 
In addition, the community structure is considered by an integrated information filtering process. Both concepts have been evaluated and shown to perform better than traditional techniques. The methods presented in this dissertation offer the potential for greater transparency and control in Web search.