Search CORE

33 research outputs found

The Study on Distributed Search Engine Based on MongoDB

Author: 吴阿妹
Publication date: 29/05/2013

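The amount of information on the Internet is growing explosively. When people face massive amounts of information and need to find the resources they want, the search engine has become the most effective means. A search engine collects information according to some strategy, organizes and sorts it, and provides a retrieval service to users. Search engine technology has long been one of the hot research topics in academia. Search engines themselves involve a wide range of knowledge. This thesis reviews and studies several key search engine technologies, introduces the background and development history of search engines, and further analyzes and discusses crawling technology, Chinese word segmentation algorithms, and web page indexing technology. The main work completed is as follows: a distributed crawling technique based on MongoDB was studied and implemented. Because search engines involve large volumes of data, distributed technology must be used to improve system performance. This thesis combines Mongo...

Degree: Master of Engineering. Department and major: Software School, Computer Software and Theory. Student ID: 2432010115225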

Xiamen University Institutional Repository

A self-adapting latency/power tradeoff model for replicated search engines

Author: Baeza-Yates R.
Diao Y.
Economou D.
Kharitonov E.
Ounis I.
Shurman E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

For many search settings, distributed/replicated search engines deploy a large number of machines to ensure efficient retrieval. This paper investigates how the power consumption of a replicated search engine can be automatically reduced when the system has low contention, without compromising its efficiency. We propose a novel self-adapting model to analyse the trade-off between latency and power consumption for distributed search engines. When query volumes are high and there is contention for the resources, the model automatically increases the necessary number of active machines in the system to maintain acceptable query response times. On the other hand, when the load of the system is low and the queries can be served easily, the model is able to reduce the number of active machines, leading to power savings. The model bases its decisions on examining the current and historical query loads of the search engine. Our proposal is formulated as a general dynamic decision problem, which can be quickly solved by dynamic programming in response to changing query loads. Thorough experiments are conducted to validate the usefulness of the proposed adaptive model using historical Web search traffic submitted to a commercial search engine. Our results show that our proposed self-adapting model can achieve an energy saving of 33% while only degrading mean query completion time by 10 ms compared to a baseline that provisions replicas based on a previous day's traffic.

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Enlighten

Multi-objective resource selection in distributed information retrieval

Author: Crestani F.
Wu S.
Publication date: 01/01/2002

In a Distributed Information Retrieval system, a user submits a query to a broker, which determines how to retrieve a given number of documents from all possible resource servers. In this paper, we propose a multi-objective model for this resource selection task. In this model, four aspects are considered simultaneously in the choice of the resource: the documents' relevance to the given query, time, monetary cost, and similarity between resources. An optimized solution is achieved by comparing the performances of all possible candidates. Some variations of the basic model are also given, which improve the basic model's efficiency.
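The abstract above names four criteria and an exhaustive comparison of candidates but gives no formula. The following is only a minimal sketch of that idea, assuming a weighted-sum utility; the function name `select_resources`, the weights, and the subset-based scoring are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations

def select_resources(resources, similarity, k,
                     w_rel=1.0, w_time=0.5, w_cost=0.5, w_sim=0.5):
    """Exhaustively score every k-subset of resources and return the best one.

    resources:  dict name -> (relevance, time, cost), each scaled to [0, 1]
    similarity: dict frozenset({a, b}) -> similarity in [0, 1]
    The weighted-sum utility below is an assumed form, not the paper's model.
    """
    def score(subset):
        total = 0.0
        for name in subset:
            rel, time, cost = resources[name]
            # Reward relevance, penalise response time and monetary cost.
            total += w_rel * rel - w_time * time - w_cost * cost
        for a, b in combinations(subset, 2):
            # Penalise picking resources that largely overlap.
            total -= w_sim * similarity.get(frozenset((a, b)), 0.0)
        return total

    return max(combinations(sorted(resources), k), key=score)


resources = {"A": (0.9, 0.2, 0.1), "B": (0.8, 0.1, 0.1), "C": (0.85, 0.2, 0.1)}
similarity = {frozenset(("A", "C")): 0.9}
best = select_resources(resources, similarity, k=2)
print(best)
```

Enumerating every subset is exponential in the number of resources, which is presumably why the abstract mentions more efficient variations of the basic model.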

CiteSeerX

University of Strathclyde Institutional Repository

WARP: A ICN architecture for social data

Author: Angius Fabio
Gerla Mario
Pau Giovanni
Westphal Cedric
Publication date: 08/08/2013

Social network companies maintain complete visibility and ownership of the data they store. However, users should be able to maintain full control over their content. For this purpose, we propose WARP, an architecture based upon Information-Centric Networking (ICN) designs, which expands the scope of the ICN architecture beyond media distribution, to provide data control in social networks. The benefit of our solution lies in the lightweight nature of the protocol and in its layered design. With WARP, data distribution and access policies are enforced on the user side. Data can still be replicated in an ICN fashion but we introduce control channels, named thread updates, which ensure that the access to the data is always updated to the latest control policy. WARP decentralizes the social network but still offers APIs so that social network providers can build products and business models on top of WARP. Social applications run directly on the user's device and store their data on the user's butler that takes care of encryption and distribution. Moreover, users can still rely on third parties to have high-availability without renouncing their privacy.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Stochastic Models for the 3x+1 and 5x+1 Problems

Author: Kontorovich Alex V.
Lagarias Jeffrey C.
Publication date: 10/10/2009

This paper discusses stochastic models for predicting the long-time behavior of the trajectories of orbits of the 3x+1 problem and, for comparison, the 5x+1 problem. The stochastic models are rigorously analyzable, and yield heuristic predictions (conjectures) for the behavior of 3x+1 orbits and 5x+1 orbits. Comment: 68 pages, 9 figures, 4 tables
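For readers unfamiliar with the orbits being modelled, the sketch below iterates the standard maps x -> x/2 for even x and x -> 3x+1 (or 5x+1) for odd x. It only generates deterministic trajectories and does not implement the paper's stochastic models; the function name `orbit` and the `max_steps` cap are illustrative assumptions.

```python
def orbit(x, multiplier=3, max_steps=1000):
    """Iterate x -> x/2 (x even) or multiplier*x + 1 (x odd).

    Stops when the orbit reaches 1 or after max_steps iterations,
    whichever comes first, and returns the visited values.
    """
    path = [x]
    while x != 1 and len(path) <= max_steps:
        x = x // 2 if x % 2 == 0 else multiplier * x + 1
        path.append(x)
    return path


print(orbit(6))                    # 3x+1 orbit of 6
print(len(orbit(27)) - 1)          # number of steps for 27 to reach 1
print(orbit(13, multiplier=5, max_steps=10))  # 5x+1 orbit of 13
```

Under the 3x+1 map, 6 reaches 1 via 3, 10, 5, 16, 8, 4, 2. Under the 5x+1 map, 13 returns to itself after ten steps, one example of why the 5x+1 problem serves as a contrasting case.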

arXiv.org e-Print Archive

CiteSeerX

A Hybrid Optimized Weighted Minimum Spanning Tree for the Shortest Intrapath Selection in Wireless Sensor Network

Author: Matheswaran Saravanan
Muthusamy Madheswaran
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014

A wireless sensor network (WSN) consists of sensor nodes that need energy-efficient routing techniques, as they have limited battery power, computing, and storage resources. WSN routing protocols should enable reliable multihop communication under energy constraints. Clustering is an effective way to reduce overheads, and when it is aided by effective resource allocation, it results in reduced energy consumption. In this work, a novel hybrid evolutionary algorithm called Bee Algorithm-Simulated Annealing Weighted Minimal Spanning Tree (BASA-WMST) routing is proposed, in which randomly deployed sensor nodes are split into the best possible number of independent clusters, each with a cluster head and an optimal route. The cluster head gathers data from the sensors belonging to its cluster and forwards them to the sink. The shortest intrapath for the cluster is selected using a Weighted Minimum Spanning Tree (WMST). The proposed algorithm computes the distance-based Minimum Spanning Tree (MST) of the weighted graph for the multihop network. The weights are dynamically changed based on the energy level of each sensor during route selection and optimized using the proposed bee algorithm-simulated annealing algorithm.
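The abstract does not state how the energy level enters the edge weights. As a rough sketch of an energy-aware minimum spanning tree, the code below runs Prim's algorithm with an assumed weight of Euclidean distance divided by the lower residual energy of the two endpoints. The weight formula and the name `energy_weighted_mst` are assumptions for illustration, and the bee algorithm and simulated annealing optimisation steps are not modelled.

```python
import heapq
import math

def energy_weighted_mst(positions, energy):
    """Prim's algorithm over the complete graph of sensor nodes.

    positions: dict node -> (x, y)
    energy:    dict node -> residual energy (> 0)
    Edge weight (assumed form): distance / min residual energy of endpoints,
    so links touching low-energy nodes look more expensive.
    Returns a list of (parent, child) tree edges.
    """
    def weight(u, v):
        return math.dist(positions[u], positions[v]) / min(energy[u], energy[v])

    nodes = sorted(positions)
    start = nodes[0]
    visited = {start}
    heap = [(weight(start, v), start, v) for v in nodes if v != start]
    heapq.heapify(heap)
    edges = []
    while heap and len(visited) < len(nodes):
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry: v was already attached via a cheaper edge
        visited.add(v)
        edges.append((u, v))
        for nxt in nodes:
            if nxt not in visited:
                heapq.heappush(heap, (weight(v, nxt), v, nxt))
    return edges


positions = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
full = energy_weighted_mst(positions, {"a": 1.0, "b": 1.0, "c": 1.0})
drained = energy_weighted_mst(positions, {"a": 1.0, "b": 0.1, "c": 1.0})
print(full, drained)
```

With equal energy the tree follows the line a, b, c. When node b is nearly drained, the direct a-c link becomes cheaper than relaying through b, so b ends up as a leaf instead of a relay.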

Crossref

Directory of Open Access Journals

A Bound-Independent Pruning Technique to Speeding up Tree-Based Complete Search Algorithms for Distributed Constraint Optimization Problems

Author: Chen Dingding
Chen Ziyu
Gao Junsong
Liu Xiangshuang
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th International Conference on Principles and Practice of Constraint Programming (CP 2021)
Publication date: 01/01/2021

Complete search algorithms are important methods for solving Distributed Constraint Optimization Problems (DCOPs), which generally utilize bounds to prune the search space. However, obtaining high-quality lower bounds is quite expensive since it requires each agent to collect more information aside from its local knowledge, which would cause tremendous traffic overheads. Instead of relying on bounds, we propose a Bound-Independent Pruning (BIP) technique for existing tree-based complete search algorithms, which can independently reduce the search space only by exploiting local knowledge. Specifically, BIP enables each agent to determine a subspace containing the optimal solution only from its local constraints along with running contexts, which can be further exploited by any search strategy. Furthermore, we present an acceptability testing mechanism to tailor existing tree-based complete search algorithms to search the remaining space returned by BIP when they hold inconsistent contexts. Finally, we prove the correctness of our technique, and the experimental results show that BIP can significantly speed up state-of-the-art tree-based complete search algorithms on various standard benchmarks.

Dagstuhl Research Online Publication Server

Distributed search trees: Fault tolerance in an asynchronous environment

Author: Schlude Konrad
Soisalon-Soininen Eljas
Widmayer Peter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2003

ISSN: 1432-4350; ISSN: 1433-049

Repository for Publications and Research Data