
    Congenial Web Search: A Conceptual Framework for Personalized, Collaborative, and Social Peer-to-Peer Retrieval

    Traditional information retrieval methods fail to address the fact that information consumption and production are social activities. Most Web search engines consider neither the social and cultural context of users' information needs nor the collaboration between users. This dissertation addresses a new search paradigm for Web information retrieval, termed Congenial Web Search, which emphasizes personalization, collaboration, and socialization methods in order to improve retrieval effectiveness. The client-server architecture of Web search engines only allows users to consume information, so this research develops a peer-to-peer architecture to improve information seeking, in which each user takes part in an interactive process that produces meta-information. Based on a personalization strategy on each peer, users are supported in giving explicit feedback on relevant documents. A user's information need is expressed as a query that is stored in a Peer Search Memory. On the one hand, query-document associations are incorporated into a personalized ranking method for repeated information needs; its performance is demonstrated in a known-item retrieval setting. On the other hand, each user's explicit feedback helps discover collaborative information needs. A new method for the controlled grouping of query terms, links, and users was developed to maintain Virtual Knowledge Communities; the quality of this grouping reflects the effectiveness of the grouped terms and links. Both strategies, personalization and collaboration, tackle the lack of socialization among searchers. Finally, a concept for integrated information seeking was developed. It uses an integrated representation to improve the effectiveness of both information retrieval and information filtering. An integrated information retrieval process explores a virtual search network of Peer Search Memories in order to produce a reputation-based ranking, and an integrated information filtering process takes the community structure into account. Both concepts have been evaluated and shown to perform better than traditional techniques. The methods presented in this dissertation offer the potential for more transparency in, and control over, Web search.
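A minimal sketch of how stored query-document associations in a Peer Search Memory could drive a personalized ranking for repeated information needs. The class name `PeerSearchMemory`, its methods, and the scoring rule (summed Jaccard overlap between a new query and the past queries that marked a document relevant) are illustrative assumptions, not the dissertation's actual ranking method.

```python
from collections import defaultdict


class PeerSearchMemory:
    """Hypothetical per-peer store of explicit relevance feedback.

    Assumption: each judgment is kept as (query terms, document id), and a
    new query is scored against the stored queries with Jaccard overlap.
    """

    def __init__(self):
        # document id -> list of term sets of queries that judged it relevant
        self._assoc = defaultdict(list)

    @staticmethod
    def _terms(query):
        return frozenset(query.lower().split())

    def record_feedback(self, query, doc):
        """Store one explicit 'relevant' judgment for (query, doc)."""
        self._assoc[doc].append(self._terms(query))

    def rank(self, query, k=5):
        """Return up to k documents ordered by summed query overlap."""
        q = self._terms(query)
        scores = {}
        for doc, past_queries in self._assoc.items():
            # Jaccard similarity with every past query for this document
            score = sum(len(q & p) / len(q | p) for p in past_queries if q | p)
            if score > 0:
                scores[doc] = score
        # Highest score first; ties broken by document id for stable output
        return sorted(scores, key=lambda d: (-scores[d], d))[:k]
```

For a repeated (known-item) need, documents that were marked relevant for similar earlier queries surface first; a document that was never marked relevant is not returned by this sketch at all.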

    Peer-to-peer information retrieval

    The Internet has become an integral part of our daily lives. However, the essential task of finding information is dominated by a handful of large centralised search engines. In this thesis we study an alternative to this approach. Instead of using large data centres, we propose using the machines we all use every day (our desktop, laptop and tablet computers) to build a peer-to-peer web search engine. We provide a definition of the associated research field, peer-to-peer information retrieval, examine what separates it from related fields, give an overview of the work done so far and provide an economic perspective on peer-to-peer search. Furthermore, we introduce our own architecture for peer-to-peer search systems, inspired by BitTorrent. Distributing the task of providing search results for queries introduces the problem of query routing: a query needs to be sent to a peer that can provide relevant search results. We investigate how the content of peers can be represented so that queries can be directed to the most relevant ones. While cooperative peers can provide their own representation, the content of uncooperative peers can be accessed only through a search interface, so they cannot actively describe themselves. We look into representing these uncooperative peers by probing their search interface to construct a representation. Finally, the capacity of machines in peer-to-peer networks varies considerably, which makes it challenging to provide search results quickly. To address this, we present an approach in which copies of search results for previous queries are retained at peers and used to serve future requests, and we show that participation can be incentivised using reputations. Problems remain to be solved before a real-world peer-to-peer web search engine can be built. This thesis provides a starting point for this ambitious goal and a solid basis for reasoning about peer-to-peer information retrieval systems in general.
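Query routing over peer representations can be sketched as follows. Each peer is summarised by document-frequency counts of its terms, built either from its own collection (a cooperative peer) or from documents returned by probing its search interface (an uncooperative peer), and a query is sent to the highest-scoring peers. The function names and the scoring rule (sum over query terms of the fraction of the peer's documents containing the term) are assumptions for illustration, not the thesis's own routing method.

```python
from collections import Counter


def build_representation(documents):
    """Summarise a peer as (term -> number of documents containing it, doc count).

    For an uncooperative peer, `documents` would be the sample obtained by
    probing its search interface, so the representation is approximate.
    """
    rep = Counter()
    for doc in documents:
        rep.update(set(doc.lower().split()))  # count each term once per document
    return rep, len(documents)


def route_query(query, peers, k=2):
    """Return the ids of the k peers whose representations best match the query.

    Assumed score: sum over query terms of the fraction of the peer's
    documents that contain the term. Peers scoring 0 are never selected.
    """
    terms = query.lower().split()
    scored = []
    for peer_id, (rep, n_docs) in peers.items():
        if n_docs == 0:
            continue  # nothing known about this peer yet
        score = sum(rep[t] / n_docs for t in terms)
        if score > 0:
            scored.append((score, peer_id))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best first, ties by id
    return [peer_id for _, peer_id in scored[:k]]
```

Keeping the representation to term statistics means a peer can compute it locally and share it cheaply; a real system would also weight peers by capacity or reputation, which this sketch leaves out.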

    Interest-Based Self-Organizing Peer-to-Peer Networks: A Club Economics Approach

    Improving the information retrieval (IR) performance of peer-to-peer networks is an important and challenging problem. Recently, the computer science literature has attempted to address this problem by improving IR search algorithms. However, in peer-to-peer networks IR performance is determined by both technology and user behavior, and very little attention has been paid in the literature to improving IR performance through incentives that change user behavior. We address this gap by combining the club goods economics literature with the IR literature to propose a next-generation file-sharing architecture. Using the popular Gnutella 0.6 architecture as context, we conceptualize a Gnutella ultrapeer and its local network of leaf nodes as a "club" (in the economic sense). We specify an IR-based utility model that a peer uses to decide which clubs to join, that a club uses to manage its membership, and that a club uses to decide which other clubs to connect to. We simulate the performance of our model using a unique real-world dataset collected from the Gnutella 0.6 network. These simulations show that our club model accomplishes both of its performance goals. First, peers self-organize into communities of interest: in our club model, peers are 85% more likely to obtain content from their local club than in the current Gnutella 0.6 architecture. Second, peers have greater incentives to share content: our model shows that peers who share can achieve nearly five times the recall offered to free-riders. We also show that, for the most valuable peers, the benefits of our club model outweigh the additional protocol overhead it imposes on the network.
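The following hypothetical sketch shows one way a peer could apply an IR-based utility when choosing which club (ultrapeer) to join: it estimates, from its own past queries, what fraction each club's shared files would have answered, and joins the club with the highest estimate. The keyword-matching rule, the function names, and the use of estimated recall alone as the utility are assumptions; the paper's model also covers membership management and inter-club connections, which are not shown here.

```python
def answerable(query, file_names):
    """True if some shared file name contains every query term.

    Assumption: Gnutella-style keyword matching on whitespace-separated
    file-name tokens, case-insensitive.
    """
    terms = set(query.lower().split())
    return any(terms <= set(name.lower().split()) for name in file_names)


def club_utility(query_history, file_names):
    """Estimated local recall: share of the peer's past queries the club answers."""
    if not query_history:
        return 0.0
    hits = sum(answerable(q, file_names) for q in query_history)
    return hits / len(query_history)


def choose_club(query_history, clubs):
    """Return the id of the club with the highest estimated utility.

    `clubs` maps a (hypothetical) ultrapeer id to the file names shared in
    that club; ties go to the alphabetically first id. Raises ValueError
    if `clubs` is empty.
    """
    return max(sorted(clubs), key=lambda cid: club_utility(query_history, clubs[cid]))
```

Because the utility is computed from the peer's own query history, peers with similar interests tend to pick the same club, which is the self-organizing effect the abstract describes.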

    Automatic classification of documents with an in-depth analysis of information extraction and automatic summarization

    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2004. Includes bibliographical references (leaves 78-80). Today, annual information production per capita exceeds two hundred and fifty megabytes. As the amount of data increases, classification and retrieval methods become increasingly necessary for finding relevant information. This thesis describes a .NET application (named I-Document) that establishes an automatic classification scheme in a peer-to-peer environment allowing free sharing of academic, business, and personal documents. A Web service architecture for metadata extraction, information extraction, information retrieval, and text summarization is described. Specific details regarding the coding process, competition, business model, and technology employed in the project are also discussed. By Joseph Brandon Hohm, M.Eng.

    Willage: A Two-Tiered Peer-to-Peer Resource Sharing Platform for Wireless Mesh Community Networks

    The success of initiatives such as Seattle Wireless and Houston Wireless has drawn attention to so-called wireless mesh community networks. These are wireless multihop networks spontaneously deployed by users willing to share communication resources. Given the community spirit that characterizes such networks, users are likely to be willing to share other resources besides communication resources, such as data, images, music, movies, and disk quota for distributed backup. In other words, peer-to-peer applications are expected to be deployed in such networks. In this paper we propose Willage, a platform for resource localization in wireless mesh community networks with mobile users. The platform is based on a two-tiered architecture: resources are made available at the lower tier, which is composed of mobile terminals, whereas information on their location is managed at the upper tier, which is composed of wireless mesh routers. We also introduce Georoy, an algorithm based on the Viceroy algorithm for efficiently retrieving resource-location information. Simulation results show that Willage achieves its goal of enabling efficient and scalable peer-to-peer resource sharing in wireless mesh community networks.
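The two-tier split described above (resources on mobile terminals, location information on mesh routers) can be illustrated with a small location service. This sketch assumes a plain consistent-hashing ring that assigns each resource key to a router; Georoy is based on Viceroy, whose butterfly-style routing is not reproduced here, and terminal mobility is ignored. The class and method names are hypothetical.

```python
import bisect
import hashlib


def _ring_point(value):
    """Map a string to a point on a 32-bit identifier ring (SHA-1 prefix)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:4], "big")


class LocationTier:
    """Upper tier: mesh routers store resource key -> holding terminals.

    Assumption: a resource key is managed by the first router whose ring
    point lies after the key's point, wrapping around the ring. This is a
    stand-in for Georoy, not an implementation of it.
    """

    def __init__(self, router_ids):
        if not router_ids:
            raise ValueError("at least one mesh router is required")
        self._ring = sorted((_ring_point(r), r) for r in router_ids)
        self._points = [p for p, _ in self._ring]
        self._tables = {r: {} for r in router_ids}  # per-router location table

    def responsible_router(self, key):
        i = bisect.bisect_right(self._points, _ring_point(key)) % len(self._ring)
        return self._ring[i][1]

    def publish(self, key, terminal):
        """A mobile terminal (lower tier) announces that it holds `key`."""
        table = self._tables[self.responsible_router(key)]
        table.setdefault(key, set()).add(terminal)

    def locate(self, key):
        """Return a copy of the set of terminals currently advertising `key`."""
        return set(self._tables[self.responsible_router(key)].get(key, ()))
```

Placing the index on the relatively stable mesh routers, rather than on the mobile terminals themselves, keeps lookups working while terminals come and go; a fuller sketch would also need to withdraw entries when a terminal moves or leaves.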

    Approximate information filtering in structured peer-to-peer networks

    Today's content providers are naturally distributed and produce large amounts of information every day, making peer-to-peer data management a promising approach that offers scalability, adaptivity to dynamics, and failure resilience. In such systems, subscribing with a continuous query is as important as one-time querying, since it allows users to cope with the high rate of information production and to avoid the cognitive overload of repeated searches. In the information filtering setting, users specify continuous queries and thereby subscribe to newly appearing documents that satisfy the query conditions. In contrast to existing approaches that provide exact information filtering, this doctoral thesis introduces the concept of approximate information filtering, in which users subscribe only to a few selected sources that are most likely to satisfy their information demand. In this way, efficiency and scalability are improved by trading a small reduction in recall for lower message traffic. The thesis makes the following contributions: (i) the first architecture to support approximate information filtering in structured peer-to-peer networks, (ii) novel strategies for selecting the most appropriate publishers that take correlations among keywords into account, (iii) a prototype implementation of approximate information retrieval and filtering, and (iv) a digital library use case demonstrating the integration of retrieval and filtering in a unified system. The abstract is also provided in German.
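Publisher selection for approximate filtering can be sketched as ranking publishers by an estimate of how likely they are to publish matching documents and subscribing the continuous query only to the top k. To reflect keyword correlations, this sketch uses a publisher's joint statistic for the full keyword set when one is available and otherwise multiplies single-keyword statistics (an independence assumption). The data layout, names, and scoring rule are assumptions, not the thesis's actual selection strategies.

```python
def publisher_score(stats, query_terms):
    """Estimate the fraction of a publisher's documents matching all query terms.

    `stats` (assumed layout) maps a frozenset of terms to the fraction of
    the publisher's recent documents containing all of them. A joint entry
    for the whole term set captures correlation; without one, fall back to
    the product of single-term fractions (independence assumption).
    """
    key = frozenset(query_terms)
    if key in stats:
        return stats[key]
    score = 1.0
    for term in key:
        score *= stats.get(frozenset([term]), 0.0)
    return score


def select_publishers(publishers, query_terms, k):
    """Return up to k publisher ids to subscribe to, best first.

    Publishers with an estimated score of 0 are skipped, so fewer than k
    ids may be returned. Ties are broken by publisher id.
    """
    scores = {p: publisher_score(s, query_terms) for p, s in publishers.items()}
    ranked = sorted((p for p in scores if scores[p] > 0), key=lambda p: (-scores[p], p))
    return ranked[:k]
```

For instance, a publisher reporting 0.5 for `p2p`, 0.4 for `retrieval`, and a joint 0.3 for both is scored 0.3 rather than the 0.2 that independence would predict, so positively correlated keywords can promote it above a publisher whose single-keyword statistics look stronger in isolation.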

    Content-based image retrieval: reading one's mind and helping people share.

    Sia Ka Cheung. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 85-91). Abstracts in English and Chinese.
Contents:
Abstract
Acknowledgement
Chapter 1: Introduction
  1.1 Problem Statement
  1.2 Contributions
  1.3 Thesis Organization
Chapter 2: Background
  2.1 Content-Based Image Retrieval
    2.1.1 Feature Extraction
    2.1.2 Indexing and Retrieval
  2.2 Relevance Feedback
    2.2.1 Weight Updating
    2.2.2 Bayesian Formulation
    2.2.3 Statistical Approaches
    2.2.4 Inter-query Feedback
  2.3 Peer-to-Peer Information Retrieval
    2.3.1 Distributed Hash Table Techniques
    2.3.2 Routing Indices and Shortcuts
    2.3.3 Content-Based Retrieval in P2P Systems
Chapter 3: Parameter Estimation-Based Relevance Feedback
  3.1 Parameter Estimation of Target Distribution
    3.1.1 Motivation
    3.1.2 Model
    3.1.3 Relevance Feedback
    3.1.4 Maximum Entropy Display
  3.2 Self-Organizing Map Based Inter-Query Feedback
    3.2.1 Motivation
    3.2.2 Initialization and Replication of SOM
    3.2.3 SOM Training for Inter-query Feedback
    3.2.4 Target Estimation and Display Set Selection for Intra-query Feedback
  3.3 Experiment
    3.3.1 Study of Parameter Estimation Method Using Synthetic Data
    3.3.2 Performance Study in Intra- and Inter-Query Feedback
  3.4 Conclusion
Chapter 4: Distributed COntent-based Visual Information Retrieval
  4.1 Introduction
  4.2 Peer Clustering
    4.2.1 Basic Version
    4.2.2 Single Cluster Version
    4.2.3 Multiple Clusters Version
  4.3 Firework Query Model
  4.4 Implementation and System Architecture
    4.4.1 Gnutella Message Modification
    4.4.2 Architecture of DISCOVIR
    4.4.3 Flow of Operations
  4.5 Experiments
    4.5.1 Simulation Model of the Peer-to-Peer Network
    4.5.2 Number of Peers
    4.5.3 TTL of Query Message
    4.5.4 Effects of Data Resolution on Query Efficiency
    4.5.5 Discussion
  4.6 Conclusion
Chapter 5: Future Works and Conclusion
Chapter A: Derivation of Update Equation
Chapter B: An Efficient Discovery of Signatures
Bibliography

    Peer to Peer Information Retrieval: An Overview

    Peer-to-peer technology is widely used for file sharing. In the past decade, a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these has seen widespread real-world adoption, and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.