2,708 research outputs found

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity by high-end computing (HEC) applications requires a higher level of scalability and dependability than current file and storage systems provide. This proposal addresses file systems research on metadata management for scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit that extends the features of, and fully leverages the peak performance promised by, the state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale up the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying appropriate load-balancing, caching, prefetching and grouping mechanisms to metadata management accordingly, will lead to high scalability. It is anticipated that plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems could increase the performance of applications and file systems, and help translate the high peak performance of such systems into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability.
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation and validation studies.
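Component 2 refers to a Bloom filter array for mapping file names to metadata servers. The abstract does not describe the "duplicative" variant, so the following is only a minimal sketch of a plain Bloom filter array, assuming one filter per metadata server. The class names, filter size and hash count are illustrative assumptions, not part of the SAM2 toolkit.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BloomFilterArray:
    """Hypothetical lookup table: one Bloom filter per metadata server."""

    def __init__(self, num_servers):
        self.filters = [BloomFilter() for _ in range(num_servers)]

    def register(self, server_id, path):
        # Record that server_id holds the metadata for this file path.
        self.filters[server_id].add(path)

    def candidate_servers(self, path):
        # Probe every filter; false positives can add extra candidates.
        return [sid for sid, f in enumerate(self.filters)
                if f.might_contain(path)]
```

A client would call `candidate_servers(path)` and contact only the servers returned. A Bloom filter never yields a false negative, so the server that actually holds the metadata is always among the candidates, while occasional false positives cost an extra probe.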

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Exploiting Geographical and Temporal Locality to Boost Search Efficiency in Peer-to-Peer Systems

    Search in unstructured peer-to-peer (P2P) systems has been a hot research topic, and many search algorithms have been presented and studied during the past few years. Unfortunately, current approaches either cannot yield good lookup performance or incur high search cost and system maintenance overhead. The poor search efficiency of these approaches may seriously limit the scalability of current unstructured P2P systems. In this paper, we propose to exploit two-dimensional locality to improve P2P system search efficiency. We present a locality-aware P2P system architecture called Foreseer, which explicitly exploits geographical locality and temporal locality by constructing a neighbor overlay and a friend overlay, respectively. Each peer in Foreseer maintains a small number of neighbors and friends along with their content filters, which are used as distributed indices. By combining the advantages of distributed indices with the use of two-dimensional locality, our scheme significantly boosts P2P search efficiency while introducing only modest overhead. In addition, several alternative forwarding policies for the Foreseer search algorithm are studied in depth to determine how to fully exploit the two-dimensional locality.

    Data replication and update propagation in XML P2P data management systems

    XML P2P data management systems are P2P systems that use XML as the underlying data format shared between peers in the network. These systems aim to bring the benefits of XML and P2P systems to the distributed data management field. However, P2P systems are known for their lack of central control and high degree of autonomy. Peers may leave the network at any time, increasing the risk of data loss. Despite this, most research on XML P2P systems focuses on novel and efficient XML indexing and retrieval techniques. Mechanisms for ensuring data availability in XML P2P systems have received comparatively little attention. This project attempts to address this issue. We design an XML P2P data management framework to improve data availability. This framework includes mechanisms for widespread data replication, replica location and update propagation. It allows XML documents to be broken down into fragments. By doing so, we aim to reduce the cost of replicating data by distributing smaller XML fragments throughout the network rather than entire documents. To tackle the data replication problem, we propose a suite of selection and placement algorithms that may be interchanged to form a particular replication strategy. To support the placement of replicas anywhere in the network, we use a Fragment Location Catalogue, a global index that maintains the locations of replicas. We also propose a lazy update propagation algorithm to propagate updates to replicas. Experiments show that the data replication algorithms improve data availability in our experimental network environment. We also find that breaking XML documents into smaller pieces and replicating those instead of whole XML documents considerably reduces the replication cost, but at the price of some loss in data availability. For the update propagation tests, we find that the probability that queries return up-to-date results increases, but improvements to the algorithm are necessary to handle environments with high update rates.
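The abstract mentions lazy update propagation without giving the algorithm. As a rough illustration of the general idea only, the sketch below applies an update at a primary copy immediately and defers delivery to replicas until a later flush. All class and method names are hypothetical, and the versioning scheme is an assumption rather than the thesis's design.

```python
from collections import deque


class Replica:
    """Holds XML fragments as fragment_id -> (version, content)."""

    def __init__(self):
        self.fragments = {}

    def apply(self, fragment_id, version, content):
        # Accept an update only if it is newer than what is stored.
        current = self.fragments.get(fragment_id)
        if current is None or version > current[0]:
            self.fragments[fragment_id] = (version, content)

    def read(self, fragment_id):
        return self.fragments.get(fragment_id)


class LazyPropagator:
    """Primary copy that queues updates instead of pushing them at once."""

    def __init__(self, replicas):
        self.primary = Replica()
        self.replicas = replicas
        self.pending = deque()
        self.version = 0

    def update(self, fragment_id, content):
        # The primary sees the change immediately; replicas do not.
        self.version += 1
        self.primary.apply(fragment_id, self.version, content)
        self.pending.append((fragment_id, self.version, content))

    def flush(self):
        # Called later (for example periodically) to deliver queued updates.
        while self.pending:
            queued = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(*queued)
```

Between `update` and `flush`, a query answered by a replica can return stale data. This is the window that grows in environments with high update rates, which is consistent with the limitation the abstract reports.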

    Search strategies in unstructured overlays

    Master's project in Informatics Engineering, presented to the Universidade de Lisboa, through the Faculdade de Ciências, 2008. Unstructured peer-to-peer networks have a low maintenance cost, high resilience and tolerance to the continuous arrival and departure of nodes. In these networks search is usually performed by flooding, which generates a high number of duplicate messages. To improve scalability, unstructured overlays evolved to a two-tiered architecture where regular nodes rely on special nodes, called supernodes or superpeers, to locate resources, thus reducing the scope of flooding-based searches. While this approach takes advantage of node heterogeneity, it makes the overlay less resilient to accidental and malicious faults, and less attractive to users who are concerned with the consumption of their resources and may not wish to commit the additional resources required of nodes selected as superpeers. Another point of concern is churn, defined as the constant entry and departure of nodes. Churn affects both structured and unstructured overlay networks and must be taken into account in order to build resilient search protocols. This dissertation proposes a novel search algorithm, called FASE, which combines a replication policy and a search space division technique to achieve low hop counts using a small number of messages on unstructured overlays with non-hierarchical topologies. The problem of churn is mitigated by a distributed monitoring algorithm designed with FASE in mind. Simulation results validate the efficiency of FASE when compared to other search algorithms for peer-to-peer networks. The evaluation of the distributed monitoring algorithm shows that it maintains FASE performance when subjected to churn.
    Peer-to-peer systems, such as content sharing and distribution applications or voice-over-IP, are built on overlay networks. Overlay networks are virtual networks that exist on top of an underlying network, where the topology of the overlay does not have to correspond to the topology of the underlying network. Unlike their structured counterparts, unstructured overlay networks do not restrict the location of their participants, that is, they do not limit the choice of neighbors of a given node, which makes their maintenance simpler. The low maintenance cost of unstructured overlays makes them especially suitable for building peer-to-peer systems able to tolerate the dynamic behavior of their participants, since these networks are permanently affected by nodes entering and leaving the network, a phenomenon known as churn. The most common search algorithm in unstructured overlays consists of flooding the network, which produces a large number of duplicate messages per search. The scalability of these algorithms is limited because they consume too many network resources in systems with many participants. To reduce the number of messages, unstructured overlays can be organized into hierarchical topologies. In these topologies some nodes, called supernodes, take on a more important role and become responsible for locating objects. The use of supernodes creates new problems, such as their selection and the dependence of the network on a small percentage of its nodes. This dissertation presents a new search algorithm, called FASE, designed to operate on unstructured overlays with non-hierarchical topologies. The algorithm combines a replication policy with a search space division technique to resolve searches within a small number of hops at the lowest possible cost. Additionally, the algorithm seeks to level the contribution of participants, since all of them contribute in a similar way to search performance. The strategy followed by the algorithm consists of dividing both the network nodes and the keys of their contents among different "frequencies" and replicating keys within the respective frequencies, without restricting the location of a node, imposing a structure on the network, or even applying a rigid definition of key. To mitigate the problem of churn, a distributed monitoring algorithm is presented for the replicas created by FASE. The proposed algorithms are evaluated through simulations, which validate the efficiency of FASE compared with other search algorithms for unstructured overlays. It is also shown that FASE maintains its performance in networks under churn when combined with the monitoring algorithm.
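The abstract identifies duplicate messages as the main cost of flooding. The sketch below is not FASE; it is only a small simulation of TTL-limited flooding, under the assumption that a node forwards a query to every neighbor except the one it came from and drops queries it has already seen. The function name and the example graph are illustrative.

```python
from collections import deque


def flood(graph, source, ttl):
    """Simulate TTL-limited flooding on an adjacency-list graph.

    Returns (nodes_reached, messages_sent, duplicate_messages).
    """
    seen = {source}
    queue = deque([(source, None, ttl)])
    messages = 0
    duplicates = 0
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: the node does not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # never send the query back where it came from
            messages += 1
            if neighbor in seen:
                duplicates += 1  # receiver already handled this query
            else:
                seen.add(neighbor)
                queue.append((neighbor, node, hops_left - 1))
    return len(seen), messages, duplicates


# Fully connected overlay of four nodes, flooded from node 0 with TTL 2.
k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(flood(k4, 0, 2))  # (4, 9, 6)
```

In this small example the first hop already reaches every node with three messages, yet the second hop sends six more, all of them duplicates. This is the overhead that search algorithms for unstructured overlays try to avoid.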

    Distributed Search in Semantic Web Service Discovery

    This thesis presents a framework for semantic Web Service discovery using descriptive (non-functional) service characteristics in a large-scale, multi-domain setting. The framework uses the Web Ontology Language for Services (OWL-S) to design a template for describing non-functional service parameters in a way that facilitates service discovery, and presents a layered scheme for organizing the ontologies used in service description. This service description scheme serves as the core for designing the four main functions of a service directory: a template-based user interface, semantic query expansion algorithms, a two-level indexing scheme that combines Bloom filters with a Distributed Hash Table, and a distributed approach to storing service descriptions. The service directory is, in turn, implemented as an extension of the Open Service Discovery Architecture. The search algorithms presented in this thesis are designed to maximize the precision and completeness of service discovery, while the distributed design of the directory allows individual administrative domains to retain a high degree of independence and maintain access control over information about their services.

    MOOClm: Learner Modelling for MOOCs

    Massively Open Online Learning systems, or MOOCs, generate enormous quantities of learning data. Analysis of this data has considerable potential benefits for learners, educators, teaching administrators and educational researchers. How to realise this potential is still an open question. This thesis explores the use of such data to create a rich Open Learner Model (OLM). The OLM is designed to take account of the restrictions and goals of lifelong learner model usage. To this end, we structure the learner model around a standard curriculum-based ontology. Since such a learner model may be very large, we integrate a visualisation based on a highly scalable circular treemap representation. The visualisation allows the student either to drill down into increasingly detailed views of the learner model or to filter the model down to a smaller, selected subset. We introduce the notion of a set of Reference learner models, such as an ideal student, a typical student, or a selected set of learning objectives within the curriculum. Introducing these provides a foundation for a learner to make a meaningful evaluation of their own model by comparing it against a reference model. To validate the work, we created MOOClm to implement this framework, then used it in the context of a Small Private Online Course (SPOC) run at the University of Sydney. We also report a qualitative usability study to gain insights into the ways a learner can make use of the OLM. Our contribution is the design and validation of MOOClm, a framework that harnesses MOOC data to create a learner model with an OLM interface for student and educator usage.