189 research outputs found

    CMS computing operations during run 1

    During the first run, CMS collected and processed more than 10B data events and simulated more than 15B events. Up to 100k processor cores were used simultaneously, and 100 PB of storage was managed. Each month, petabytes of data were moved and hundreds of users accessed data samples. In this document we discuss the operational experience from this first run. We present the workflows and data flows that were executed, the tools and services developed, and the operations and shift models used to sustain the system. Many techniques followed the original computing plan, but some were reactions to difficulties and opportunities. We also address the lessons learned from an operational perspective and how they are shaping our thoughts for 2015.

    A Roadmap for HEP Software and Computing R&D for the 2020s

    Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that their efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade. Peer reviewed.

    Transaction-filtering data mining and a predictive model for intelligent data management

    This thesis first proposes a new data mining paradigm, transaction-filtering association rule mining, which addresses the time cost of the repeated scans of the original transaction database in conventional association rule mining algorithms. An in-memory transaction filter is designed to discard infrequent items in the pruning steps; this filter is a data structure updated at the end of each iteration. Results on an IBM benchmark show an execution-time reduction of 10%-19% compared with the base case. Next, a data-mining-based predictive model is established to support intelligent data management within the context of the Centre for Grid Computing. Their ability to discover unseen rules, patterns and correlations makes data mining techniques attractive in areas where massive amounts of data are generated. The past behaviour of two typical scenarios (network file systems and Data Grids) has been analysed to build the model. Using the given real system traces, this predictor forecasts the future popularity of files with an accuracy of 90%. A further step towards intelligent policy design is achieved by analysing the prediction results for files' future popularity. Simulations based on real system traces show a 2-4 times improvement in data response time in the network file system scenario and a 24% reduction in mean job time in Data Grids compared with conventional cases. EThOS - Electronic Theses Online Service (United Kingdom).
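    The transaction-filtering idea can be sketched in Python. This is a minimal illustration under our own assumptions, not the thesis's implementation: the "filter" here is simply an in-memory copy of the transactions that, at the end of each Apriori iteration, drops items not contained in any frequent k-itemset and drops transactions too short to hold a (k+1)-itemset, so later passes never rescan the original database. The function name `apriori_with_filter` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori_with_filter(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    Hypothetical sketch: the in-memory 'filter' is a shrinking copy of the
    transactions that is updated at the end of each iteration.
    """
    filtered = [frozenset(t) for t in transactions]  # read the database once
    frequent = {}

    # Level 1: count single items.
    counts = Counter(item for t in filtered for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    for s in current:
        frequent[s] = counts[next(iter(s))]

    k = 1
    while current:
        # Filter update: by downward closure, an item outside every frequent
        # k-itemset cannot occur in a frequent (k+1)-itemset, so strip it;
        # transactions with <= k items cannot contain a (k+1)-itemset.
        alive = set().union(*current)
        filtered = [t & alive for t in filtered]
        filtered = [t for t in filtered if len(t) > k]

        # Candidate generation: join frequent k-itemsets, then prune any
        # candidate that has an infrequent k-subset.
        candidates = set()
        for a in current:
            for b in current:
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(s) in current for s in combinations(u, k)
                ):
                    candidates.add(u)

        # Support counting scans only the (smaller) filtered transactions.
        cand_counts = Counter()
        for t in filtered:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1

        current = {c for c in candidates if cand_counts[c] >= min_support}
        for c in current:
            frequent[c] = cand_counts[c]
        k += 1
    return frequent
```

    For example, with the transactions `{a, b, c}`, `{a, b}`, `{a, c}`, `{b, c}`, `{a, b, c}` and `min_support=3`, the result holds the three single items (support 4) and the three pairs (support 3); `{a, b, c}` occurs in only two transactions and is excluded.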

    Distributed Late-binding Micro-scheduling and Data Caching for Data-Intensive Workflows

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 06-07-2015. Today's world is flooded with enormous amounts of digital information from very diverse sources, and everything suggests that this trend will intensify in the future. Neither industry, nor society in general, nor, very particularly, science remains indifferent to this fact. On the contrary, they strive to get the most out of this information, which means they must capture, transfer, store and process it promptly and efficiently, using a wide range of computational resources. But this task is not always simple. A representative example of the challenges of handling and processing large amounts of data is that of the particle physics experiments at the Large Hadron Collider (LHC) in Geneva, which must manage tens of petabytes of information every year. Drawing on the experience of one of these collaborations, we have studied the main problems of managing massive data volumes and of executing the vast workflows that consume them. In this context, we have developed a general-purpose architecture for scheduling and executing workflows with significant data requirements, which we call Task Queue. This new system builds on the agent-based late-binding model that has helped the LHC experiments overcome the problems associated with the heterogeneity and complexity of large grid computing infrastructures. Our proposal offers several improvements over existing systems. The execution agents of the Task Queue architecture share a distributed hash table (DHT) and assign tasks cooperatively.
In this way, the scalability problems of centralized assignment algorithms are avoided and execution times are improved. This scalability allows fine-grained micro-scheduling, which enables new features such as a distributed cache on the execution nodes and the use of data-location information in task-assignment decisions. This improves data-processing efficiency and helps relieve the usually congested grid storage services. In addition, our system is more robust against problems in the interaction with the central task queue, and it behaves better under demanding data-access patterns or when no local storage services are available. All of this has been demonstrated in an extensive series of benchmark tests. Since our distributed task-scheduling procedure requires broadcast messages, we have also carried out an in-depth study of possible ways to implement this operation over the Kademlia DHT, which is used for the shared data cache. Kademlia offers routing to individual nodes but includes no broadcast primitive. Our work exposes the peculiarities of this system, particularly its XOR-based metric, and analytically studies which broadcast techniques can be used with it. We have also developed a model that estimates node coverage as a function of the probability that each individual message reaches its destination correctly. As validation, the algorithms have been implemented and exhaustively evaluated. Furthermore, we propose several techniques to improve the protocols in adverse situations, for example when the system has high node churn or the delivery error rate is not negligible.
These techniques include redundancy, resending and flooding, as well as combinations of them. We present an analysis of the strengths and weaknesses of the different algorithms and of these complementary techniques. Depto. de Arquitectura de Computadores y Automática, Fac. de Informática.
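    Because Kademlia offers only point-to-point routing, a broadcast has to be layered on top of its XOR metric. The sketch below illustrates one tree-style broadcast that exploits the k-bucket structure; it is not necessarily the algorithm evaluated in the thesis. Each node forwards the message to a contact in every bucket below a "limit" and hands that bucket index down as the recipient's new limit, so with complete routing tables every node is reached exactly once. Per-message loss (`p_deliver`) and sending to several contacts per bucket (`redundancy`) are included to show how coverage can be estimated empirically and how redundancy counters lost messages. All function names, the full-routing-table assumption and the independent-loss model are ours.

```python
import random
from collections import deque


def bucket_index(a: int, b: int) -> int:
    """Kademlia bucket of node b as seen from node a: the highest differing bit."""
    return (a ^ b).bit_length() - 1


def build_buckets(ids, bits):
    """Complete routing tables (an assumption): table[n][i] lists every node
    whose XOR distance to n has its highest set bit at position i."""
    table = {n: [[] for _ in range(bits)] for n in ids}
    for n in ids:
        for m in ids:
            if m != n:
                table[n][bucket_index(n, m)].append(m)
    return table


def broadcast(table, origin, bits, p_deliver=1.0, redundancy=1, rng=None):
    """Simulate a bucket-splitting broadcast; return (reached nodes, messages sent).

    A node holding limit L covers buckets L-1 .. 0, i.e. the XOR subtree it
    shares with its sender. Each message is lost independently with
    probability 1 - p_deliver; duplicate deliveries are ignored.
    """
    rng = rng or random.Random(0)
    reached = {origin}
    queue = deque([(origin, bits)])  # (node, limit)
    sent = 0
    while queue:
        node, limit = queue.popleft()
        for i in range(limit - 1, -1, -1):
            for contact in table[node][i][:redundancy]:
                sent += 1
                if rng.random() >= p_deliver:
                    continue  # lost message: its whole subtree may be missed
                if contact not in reached:
                    reached.add(contact)
                    queue.append((contact, i))  # recipient covers buckets < i
    return reached, sent
```

    With 50 random 8-bit IDs and no loss, `broadcast(table, ids[0], 8)` reaches all 50 nodes with 49 messages. Lowering `p_deliver` shows how one lost message prunes an entire subtree, which is exactly what redundancy, resending and flooding try to compensate for.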

    Video Caching, Analytics and Delivery at the Wireless Edge: A Survey and Future Directions

    Future wireless networks will provide high-bandwidth, low-latency, and ultra-reliable Internet connectivity to meet the requirements of different applications, ranging from mobile broadband to the Internet of Things. To this end, mobile edge caching, computing, and communication (edge-C3) have emerged to bring network resources (i.e., bandwidth, storage, and computing) closer to end users. Edge-C3 improves network resource utilization as well as the quality of experience (QoE) of end users. Recently, several video-oriented mobile applications (e.g., live content sharing, gaming, and augmented reality) have leveraged edge-C3 in diverse scenarios involving video streaming in both the downlink and the uplink. Hence, a large number of recent works have studied the implications of video analysis and streaming through edge-C3. This article presents an in-depth survey of video edge-C3 challenges and state-of-the-art solutions in next-generation wireless and mobile networks. Specifically, it includes: a tutorial on video streaming in mobile networks (e.g., video encoding and adaptive bitrate streaming); an overview of mobile network architectures, enabling technologies, and applications for video edge-C3; video edge computing and analytics in uplink scenarios (e.g., architectures, analytics, and applications); and video edge caching, computing, and communication methods in downlink scenarios (e.g., collaborative, popularity-based, and context-aware). A new taxonomy for video edge-C3 is proposed, and the major contributions of recent studies are first highlighted and then systematically compared. Finally, several open problems and key challenges for future research are outlined.
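    Popularity-based caching, one of the downlink families the survey compares, can be illustrated with a small LFU-style edge cache. This is a hypothetical sketch, not a scheme taken from the surveyed works: the cache counts requests per video and admits a new video only when it has been requested more often than the least popular cached one. The class and function names, and the Zipf request model with exponent `s` in `zipf_hit_ratio`, are assumptions used only to produce a skewed workload.

```python
import random
from collections import Counter


class PopularityEdgeCache:
    """Hypothetical LFU-style edge cache: keeps the most-requested videos seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()  # request count per video id (popularity estimate)
        self.cached = set()

    def request(self, video):
        """Serve one request; return True on a cache hit."""
        self.counts[video] += 1
        if video in self.cached:
            return True
        if len(self.cached) < self.capacity:
            self.cached.add(video)  # free slot: admit immediately
        else:
            # Replace the least popular cached video only if the newcomer is
            # strictly more popular, so one-off requests do not pollute the cache.
            victim = min(self.cached, key=lambda v: self.counts[v])
            if self.counts[video] > self.counts[victim]:
                self.cached.remove(victim)
                self.cached.add(video)
        return False


def zipf_hit_ratio(n_videos, capacity, n_requests, s=0.8, seed=0):
    """Hit ratio of the cache under an assumed Zipf(s) request distribution."""
    rng = random.Random(seed)
    weights = [1 / (rank ** s) for rank in range(1, n_videos + 1)]
    requests = rng.choices(range(n_videos), weights=weights, k=n_requests)
    cache = PopularityEdgeCache(capacity)
    hits = sum(cache.request(v) for v in requests)
    return hits / n_requests
```

    The strict "more popular than the victim" admission rule is one simple design choice; context-aware or collaborative schemes would replace the global request counter with predictions or with counts shared among neighbouring edge nodes.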

    Clustering algorithm for D2D communication in next generation cellular networks : thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering, Massey University, Auckland, New Zealand

    Next-generation cellular networks will support many complex services for smartphones, vehicles, and other devices. To accommodate such services, cellular networks need to go beyond the capabilities of their previous generations. Device-to-Device communication (D2D) is a key technology that can help fulfil some of the requirements of future networks. The telecommunication industry expects a significant increase in the density of mobile devices, which puts more pressure on centralized schemes and risks outages, poor spectral efficiency, and low data rates. Recent studies have shown that a large part of cellular traffic relates to sharing popular content. This highlights the need for decentralized and distributed approaches to managing multimedia traffic. Content sharing via clustered D2D networks has emerged as a popular approach for alleviating the burden on the cellular network. Several studies have established that D2D communication in clusters can improve spectral and energy efficiency and achieve low latency while increasing network capacity. Effective content sharing among users requires appropriate clustering strategies. The aim of this thesis is therefore to design and compare clustering approaches for D2D communication targeting content-sharing applications. Currently, most researched and implemented clustering schemes are centralized or depend predominantly on the Evolved Node B (eNB). This thesis proposes a distributed architecture that supports clustering approaches for multimedia traffic. A content-sharing network is presented in which some D2D User Equipment (DUE) act as content distributors for nearby devices. Two promising techniques, Content-Centric Networking and Network Virtualization, are used to build a distributed architecture that supports efficient content delivery. We propose to use clustering at the user level for content distribution.
A weighted multi-factor clustering algorithm is proposed for grouping DUEs that share a common interest. Performance parameters such as energy consumption, area spectral efficiency, and throughput are used to evaluate the proposed algorithm. The effect of the number of clusters on these parameters is also discussed. The algorithm is further modified to allow a trade-off between fairness and the other performance parameters. A comprehensive simulation study shows that the proposed clustering algorithm is more flexible than, and outperforms, several well-known and state-of-the-art algorithms. The clustering process is then evaluated from an individual user's perspective to further improve performance. We believe that some users, even when they share common interests, are better served by the eNB than by joining a cluster. We use machine learning algorithms, namely Deep Neural Networks, Random Forests, and Support Vector Machines, to identify the users who are better served by the eNB and form clusters for the remaining users. This user segregation scheme can be combined with most clustering algorithms, including the proposed multi-factor scheme. A comprehensive simulation study demonstrates that such user segregation can significantly improve the throughput, energy consumption, and fairness of individual users as well as of the whole network.
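    A weighted multi-factor clustering step might look roughly like the sketch below. Everything specific in it is a hypothetical choice for illustration, since the abstract does not give the thesis's factors or weights: the three factors (residual energy, number of nearby DUEs with the same interest, link quality to the eNB), the weights, the 0-30 dB normalisation and the greedy order in which the best-scoring unassigned DUE becomes a cluster head and absorbs same-interest neighbours within a radius.

```python
import math
from dataclasses import dataclass


@dataclass
class DUE:
    uid: int
    x: float
    y: float
    energy: float   # residual battery, normalised to [0, 1]
    sinr_db: float  # link quality to the eNB (hypothetical factor)
    interest: str   # content category the user wants


def head_score(u, users, radius, w_energy=0.4, w_degree=0.4, w_link=0.2):
    """Hypothetical weighted multi-factor score; higher means a better cluster head."""
    peers = [
        v for v in users
        if v.uid != u.uid and v.interest == u.interest
        and math.dist((u.x, u.y), (v.x, v.y)) <= radius
    ]
    degree = len(peers) / max(1, len(users) - 1)    # share of same-interest neighbours
    link = min(max(u.sinr_db / 30.0, 0.0), 1.0)     # map 0..30 dB onto [0, 1]
    return w_energy * u.energy + w_degree * degree + w_link * link


def cluster(users, radius):
    """Greedy clustering: the best-scoring unassigned DUE becomes a head and
    absorbs unassigned same-interest DUEs within `radius`.
    Returns {head_uid: [member_uids]}."""
    unassigned = {u.uid: u for u in users}
    clusters = {}
    ranked = sorted(users, key=lambda u: head_score(u, users, radius), reverse=True)
    for u in ranked:
        if u.uid not in unassigned:
            continue  # already a member of an earlier cluster
        del unassigned[u.uid]
        members = [
            v.uid for v in list(unassigned.values())
            if v.interest == u.interest
            and math.dist((u.x, u.y), (v.x, v.y)) <= radius
        ]
        for m in members:
            del unassigned[m]
        clusters[u.uid] = members
    return clusters
```

    In a scenario where three video-interested DUEs a few metres apart have equal neighbour counts and link quality, the one with the most residual energy becomes the head of a single cluster, while a distant music-interested DUE ends up as a singleton cluster. A fairness-aware variant could, for example, penalise DUEs that have recently served as heads.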

    Novel applications and contexts for the cognitive packet network

    Autonomic communication, the development of self-configuring, self-adapting, self-optimising and self-healing communication systems, has gained much attention in the network research community. This can be explained by the increasing demand for more sophisticated networking technologies built from physical devices that have computational capabilities and can operate successfully with minimal human intervention. Such systems are driving innovative applications and services that improve citizens' quality of life both socially and economically. Furthermore, because of its decentralised approach, autonomic communication is also being explored by the research community as an alternative to centralised control infrastructures for the efficient management of large networks. This thesis studies one of the successful contributions to autonomic communication research, the Cognitive Packet Network (CPN). CPN is a highly scalable adaptive routing protocol that allows decentralised control in communication. CPN has achieved significant successes and, given the direction of current research, we expect it to remain relevant. To investigate this hypothesis, we explore new applications and contexts for CPN. The thesis first studies Information-Centric Networking (ICN), a proposed future Internet architecture. ICN adopts a data-centric approach in which content is directly addressable at the network level and in-network caching is easily supported. An optimal caching strategy for an information-centric network is first analysed, and approximate solutions are developed and evaluated. Furthermore, a CPN-inspired forwarding strategy is proposed that directs requests so as to exploit the in-network caching capability of ICN. The proposed strategy is evaluated via discrete-event simulations and shown to be more effective than conventional methods at finding local cache hits.
Finally, CPN is proposed as the routing system of an Emergency Cyber-Physical System that guides evacuees in confined spaces during emergencies. By exploiting CPN's QoS capabilities, different paths are assigned to evacuees based on their ongoing health conditions using well-defined path metrics. The proposed system is evaluated via discrete-event simulations and shown to improve survival chances compared with a static system that treats all evacuees in the same way. Open Access.
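    The reinforcement-learning flavour of CPN routing can be conveyed with a deliberately simplified sketch. Real CPN uses smart packets and a random neural network at each node; here, as an assumption, each neighbour simply has a positive weight, next hops are drawn in proportion to those weights so that exploration continues, and a reward of 1/delay is compared against an exponentially smoothed threshold of past rewards to decide whether to reinforce or penalise the chosen hop. The class name, the parameters `alpha` and `step`, and the multiplicative update are hypothetical.

```python
import random


class CpnStyleHopSelector:
    """Hypothetical, simplified next-hop learner for one node and one destination.

    Not the random-neural-network learner of actual CPN: each neighbour just
    carries a positive weight that is reinforced or penalised by comparing
    each reward with a smoothed threshold of previous rewards.
    """

    def __init__(self, neighbours, alpha=0.8, step=0.2, seed=0):
        self.weights = {n: 1.0 for n in neighbours}
        self.alpha = alpha      # smoothing factor of the reward threshold
        self.step = step        # relative size of each weight change
        self.threshold = None   # smoothed history of past rewards
        self.rng = random.Random(seed)

    def choose(self):
        """Draw a next hop with probability proportional to its weight."""
        hops = list(self.weights)
        return self.rng.choices(hops, weights=[self.weights[h] for h in hops])[0]

    def feedback(self, hop, delay):
        """Learn from an acknowledgement reporting the measured delay via `hop`."""
        reward = 1.0 / delay    # the goal here is to minimise delay
        if self.threshold is None:
            self.threshold = reward
        if reward >= self.threshold:
            self.weights[hop] *= 1 + self.step  # at least as good as recent history
        else:
            self.weights[hop] = max(1e-3, self.weights[hop] * (1 - self.step))
        # Update the threshold after the comparison, as a running average.
        self.threshold = self.alpha * self.threshold + (1 - self.alpha) * reward

    def best(self):
        return max(self.weights, key=self.weights.get)
```

    A usage loop would repeatedly call `hop = sel.choose()` and then `sel.feedback(hop, measured_delay)`; with two neighbours whose delays are, say, 10 and 2 time units, the weights should drift towards the faster neighbour while the proportional draw keeps probing the slower one occasionally.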