33 research outputs found

    The Study on Distributed Search Engine Based on MongoDB

    The amount of information on the Internet is growing explosively. When people face massive amounts of information and need to find the resources they want, a search engine is the most effective means of doing so. A search engine collects information according to some strategy, organizes it, and provides a retrieval service to its users. Search engine technology has long been one of the active research topics in academia. Search engines draw on a wide range of knowledge. This thesis surveys several key search engine technologies, introduces the background and history of search engines, and analyzes crawling techniques, Chinese word segmentation algorithms, and web page indexing in more depth. The main work is as follows: a distributed crawling technique based on MongoDB is studied and implemented. Because a search engine handles data at very large scale, distributed techniques are required to improve system performance. This thesis combines Mongo... Degree: Master of Engineering. Department and major: School of Software, Computer Software and Theory. Student ID: 2432010115225

    A self-adapting latency/power tradeoff model for replicated search engines

    For many search settings, distributed/replicated search engines deploy a large number of machines to ensure efficient retrieval. This paper investigates how the power consumption of a replicated search engine can be automatically reduced when the system has low contention, without compromising its efficiency. We propose a novel self-adapting model to analyse the trade-off between latency and power consumption for distributed search engines. When query volumes are high and there is contention for the resources, the model automatically increases the necessary number of active machines in the system to maintain acceptable query response times. On the other hand, when the load of the system is low and the queries can be served easily, the model is able to reduce the number of active machines, leading to power savings. The model bases its decisions on examining the current and historical query loads of the search engine. Our proposal is formulated as a general dynamic decision problem, which can be quickly solved by dynamic programming in response to changing query loads. Thorough experiments are conducted to validate the usefulness of the proposed adaptive model using historical Web search traffic submitted to a commercial search engine. Our results show that our proposed self-adapting model can achieve an energy saving of 33% while only degrading mean query completion time by 10 ms compared to a baseline that provisions replicas based on a previous day's traffic
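
The latency/power trade-off described in this abstract can be illustrated with a small dynamic program. This is only a minimal sketch under stated assumptions, not the authors' model: the function name `plan_active_machines`, the per-slot cost (power proportional to the number of active machines, latency proportional to load per machine), and the `switch_cost` penalty are all hypothetical, and the loads are assumed to be known in advance as a non-empty list.

```python
def plan_active_machines(loads, max_machines, power_cost, latency_cost, switch_cost):
    """Return per-slot machine counts that minimize total cost, via dynamic programming.

    Assumed (hypothetical) cost model for one time slot with m active machines:
    power_cost * m + latency_cost * load / m, plus switch_cost for every
    machine turned on or off between consecutive slots.
    """
    counts = range(1, max_machines + 1)

    def slot_cost(load, m):
        return power_cost * m + latency_cost * load / m

    # best[m] = (cheapest total cost of a plan ending with m machines, that plan)
    best = {m: (slot_cost(loads[0], m), [m]) for m in counts}
    for load in loads[1:]:
        new_best = {}
        for m in counts:
            # Cheapest way to arrive at m machines from any previous count p.
            prev_cost, prev_plan = min(
                ((c + switch_cost * abs(m - p), plan) for p, (c, plan) in best.items()),
                key=lambda item: item[0],
            )
            new_best[m] = (prev_cost + slot_cost(load, m), prev_plan + [m])
        best = new_best
    return min(best.values(), key=lambda item: item[0])[1]
```

With a low, high, low load sequence and no switching penalty, the plan scales the number of active machines up for the busy slot and back down afterwards; a large `switch_cost` pushes the plan toward a constant machine count.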

    Multi-objective resource selection in distributed information retrieval

    In a Distributed Information Retrieval system, a user submits a query to a broker, which determines how to yield a given number of documents from all possible resource servers. In this paper, we propose a multi-objective model for this resource selection task. In this model, four aspects are considered simultaneously in the choice of the resource: document's relevance to the given query, time, monetary cost, and similarity between resources. An optimized solution is achieved by comparing the performances of all possible candidates. Some variations of the basic model are also given, which improve the basic model's efficiency
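
A rough sketch of how the four aspects named in this abstract could be combined is given below. It assumes a weighted-sum objective and an exhaustive comparison of every candidate set of `k` resources; the scoring formula, the field names (`relevance`, `time`, `cost`), and the weights are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def selection_score(subset, resources, similarity, weights):
    """Weighted-sum score of a candidate resource set (hypothetical objective)."""
    relevance = sum(resources[r]["relevance"] for r in subset)
    time = sum(resources[r]["time"] for r in subset)
    cost = sum(resources[r]["cost"] for r in subset)
    # Penalize picking resources that are similar to each other.
    overlap = sum(similarity.get(frozenset(pair), 0.0) for pair in combinations(subset, 2))
    return (weights["relevance"] * relevance
            - weights["time"] * time
            - weights["cost"] * cost
            - weights["similarity"] * overlap)


def select_resources(resources, similarity, weights, k):
    """Compare every candidate set of k resources and return the best-scoring one."""
    candidates = combinations(sorted(resources), k)
    return max(candidates, key=lambda s: selection_score(s, resources, similarity, weights))
```

Exhaustive comparison grows combinatorially with the number of resources, which is presumably why the abstract mentions variations that improve efficiency; this sketch makes no attempt at that.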

    WARP: A ICN architecture for social data

    Social network companies maintain complete visibility and ownership of the data they store. However, users should be able to maintain full control over their content. For this purpose, we propose WARP, an architecture based upon Information-Centric Networking (ICN) designs, which expands the scope of the ICN architecture beyond media distribution to provide data control in social networks. The benefit of our solution lies in the lightweight nature of the protocol and in its layered design. With WARP, data distribution and access policies are enforced on the user side. Data can still be replicated in an ICN fashion, but we introduce control channels, named thread updates, which ensure that access to the data always follows the latest control policy. WARP decentralizes the social network but still offers APIs so that social network providers can build products and business models on top of WARP. Social applications run directly on the user's device and store their data on the user's butler, which takes care of encryption and distribution. Moreover, users can still rely on third parties for high availability without giving up their privacy.

    Stochastic Models for the 3x+1 and 5x+1 Problems

    This paper discusses stochastic models for predicting the long-time behavior of the trajectories of orbits of the 3x+1 problem and, for comparison, the 5x+1 problem. The stochastic models are rigorously analyzable, and yield heuristic predictions (conjectures) for the behavior of 3x+1 orbits and 5x+1 orbits. Comment: 68 pages, 9 figures, 4 tables
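
The simplest heuristic behind such models can be sketched in a few lines. This is the standard coin-flip argument rather than the paper's specific models: it assumes the accelerated map (n/2 for even n, (kn+1)/2 for odd n) and treats even and odd steps as equally likely. The function names are illustrative.

```python
import math


def accelerated_step(n: int, multiplier: int = 3) -> int:
    """One step of the accelerated map: n/2 if n is even, (multiplier*n + 1)/2 if odd."""
    return n // 2 if n % 2 == 0 else (multiplier * n + 1) // 2


def steps_to_one(n: int, multiplier: int = 3, max_steps: int = 10_000):
    """Count accelerated steps until the orbit of n reaches 1; None if the step cap is hit."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None
        n = accelerated_step(n, multiplier)
        steps += 1
    return steps


def heuristic_drift(multiplier: int = 3) -> float:
    """Expected change in log(n) per step if even and odd steps were equally likely."""
    return 0.5 * math.log(0.5) + 0.5 * math.log(multiplier / 2)
```

Under this assumption the drift is (1/2) log(3/4), which is negative, for the 3x+1 map and (1/2) log(5/4), which is positive, for the 5x+1 map. That sign difference is the usual heuristic reason to expect 3x+1 orbits to shrink on average and most 5x+1 orbits to grow.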

    A Hybrid Optimized Weighted Minimum Spanning Tree for the Shortest Intrapath Selection in Wireless Sensor Network

    A wireless sensor network (WSN) consists of sensor nodes that need energy-efficient routing techniques, as they have limited battery power, computing, and storage resources. WSN routing protocols should enable reliable multihop communication under energy constraints. Clustering is an effective way to reduce overheads, and when it is aided by effective resource allocation, it results in reduced energy consumption. In this work, a novel hybrid evolutionary algorithm called Bee Algorithm-Simulated Annealing Weighted Minimal Spanning Tree (BASA-WMST) routing is proposed, in which randomly deployed sensor nodes are split into the best possible number of independent clusters, each with a cluster head and an optimal route. The cluster head gathers data from the sensors belonging to its cluster and forwards them to the sink. The shortest intracluster path is selected using a Weighted Minimum Spanning Tree (WMST). The proposed algorithm computes the distance-based Minimum Spanning Tree (MST) of the weighted graph for the multihop network. The weights are dynamically changed based on the energy level of each sensor during route selection and optimized using the proposed bee algorithm-simulated annealing algorithm.
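
The energy-aware spanning tree idea can be sketched with Prim's algorithm. This is a minimal illustration under stated assumptions, not the BASA-WMST method: the edge weight used here (Euclidean distance divided by the lower residual energy of the two endpoints) is a hypothetical choice, the bee algorithm and simulated annealing optimization steps are omitted, and node identifiers are assumed to be mutually comparable (for example, strings).

```python
import heapq
import math


def energy_weight(pos_u, pos_v, energy_u, energy_v):
    """Hypothetical edge weight: distance grows more expensive when either endpoint is low on energy."""
    return math.dist(pos_u, pos_v) / min(energy_u, energy_v)


def minimum_spanning_tree(positions, energy, root):
    """Prim's algorithm on the complete graph over one cluster; returns {child: parent}."""
    parent = {}
    visited = {root}
    heap = [
        (energy_weight(positions[root], positions[v], energy[root], energy[v]), root, v)
        for v in positions if v != root
    ]
    heapq.heapify(heap)
    while heap and len(visited) < len(positions):
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry: v was already attached through a cheaper edge
        visited.add(v)
        parent[v] = u
        for x in positions:
            if x not in visited:
                w = energy_weight(positions[v], positions[x], energy[v], energy[x])
                heapq.heappush(heap, (w, v, x))
    return parent
```

Rooting the tree at the cluster head gives each sensor a next hop toward the head. Recomputing the tree as residual energy drops steers traffic away from depleted nodes, which is the effect the abstract attributes to dynamically changing weights.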

    A Bound-Independent Pruning Technique to Speeding up Tree-Based Complete Search Algorithms for Distributed Constraint Optimization Problems

    Complete search algorithms are important methods for solving Distributed Constraint Optimization Problems (DCOPs), and they generally use bounds to prune the search space. However, obtaining high-quality lower bounds is quite expensive, since it requires each agent to collect information beyond its local knowledge, which causes heavy traffic overheads. Rather than relying on bounds, we propose a Bound-Independent Pruning (BIP) technique for existing tree-based complete search algorithms, which can independently reduce the search space by exploiting only local knowledge. Specifically, BIP enables each agent to determine a subspace containing the optimal solution using only its local constraints and running contexts, and this subspace can then be exploited by any search strategy. Furthermore, we present an acceptability testing mechanism to tailor existing tree-based complete search algorithms to search the remaining space returned by BIP when they hold inconsistent contexts. Finally, we prove the correctness of our technique, and the experimental results show that BIP can significantly speed up state-of-the-art tree-based complete search algorithms on various standard benchmarks.

    Distributed search trees: Fault tolerance in an asynchronous environment

    ISSN: 1432-4350; ISSN: 1433-049