6 research outputs found

    Domain-Specialized Cache Management for Graph Analytics

    Graph analytics power a range of applications in areas as diverse as finance, networking and business logistics. A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections in the graph. These richly connected hot vertices inherently exhibit high reuse. However, this work finds that state-of-the-art hardware cache management schemes struggle to capitalize on their reuse due to the highly irregular access patterns of graph analytics. In response, we propose GRASP, domain-specialized cache management at the last-level cache for graph analytics. GRASP augments existing cache policies to maximize reuse of hot vertices by protecting them against cache thrashing, while maintaining sufficient flexibility to capture the reuse of other vertices as needed. GRASP keeps hardware cost negligible by leveraging lightweight software support to pinpoint hot vertices, thus eliding the need for storage-intensive prediction mechanisms employed by state-of-the-art cache management schemes. On a set of diverse graph-analytic applications with large high-skew graph datasets, GRASP outperforms prior domain-agnostic schemes on all datapoints, yielding an average speed-up of 4.2% (max 9.4%) over the best-performing prior scheme. GRASP remains robust on low-/no-skew datasets, whereas prior schemes consistently cause a slowdown.
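
    The degree skew described in this abstract can be illustrated with a small sketch. It is not taken from the GRASP paper: the function name `hot_vertex_coverage`, the `hot_fraction` parameter and the toy edge list are all assumptions made for illustration. It only measures what share of edge endpoints touch the highest-degree vertices, which is the property that makes a few vertices worth protecting in the cache.

```python
from collections import Counter


def hot_vertex_coverage(edges, hot_fraction):
    """Share of edge endpoints that touch the top `hot_fraction` of vertices by degree.

    Hypothetical helper for illustration; not part of GRASP.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree.values(), reverse=True)
    if not ranked:
        return 0.0
    # Always treat at least one vertex as hot.
    hot_count = max(1, int(len(ranked) * hot_fraction))
    return sum(ranked[:hot_count]) / sum(ranked)


# Made-up toy graph: vertex 0 is a hub, plus two extra edges.
edges = [(0, i) for i in range(1, 10)] + [(1, 2), (3, 4)]
coverage = hot_vertex_coverage(edges, 0.1)
print(f"Top 10% of vertices cover {coverage:.0%} of edge endpoints")
```

    In this toy graph a single vertex out of ten accounts for 9 of the 22 edge endpoints. Real high-skew datasets would have to be measured directly rather than inferred from this example.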

    State Management for Efficient Event Pattern Detection

    Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. The challenge is that query processing is stateful: it maintains partial matches whose number grows exponentially in the size of the processed events. State management is complicated by the dynamic nature of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is infeasible, and systems must resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the elements to shed so as to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation through a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation, which postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated these techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.
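
    The idea of shedding partial matches under a processing bound can be sketched as follows. This is a simple greedy heuristic under stated assumptions, not the dissertation's algorithm: the `utility` and `cost` fields, the `budget` parameter and the function name `shed_load` are hypothetical, and the abstract does not specify how the elements to shed are actually chosen.

```python
def shed_load(partial_matches, budget):
    """Keep the partial matches with the best utility per unit cost that fit the budget.

    Hypothetical greedy sketch: `utility` stands for an assumed estimate of how
    likely a partial match is to contribute to a result, and `cost` for its
    assumed processing cost. Everything that is not kept is shed.
    """
    ranked = sorted(
        partial_matches,
        key=lambda m: m["utility"] / m["cost"],
        reverse=True,
    )
    kept, used = [], 0.0
    for match in ranked:
        if used + match["cost"] <= budget:
            kept.append(match)
            used += match["cost"]
    return kept


# Made-up partial matches for illustration.
matches = [
    {"id": "a", "utility": 0.9, "cost": 2.0},
    {"id": "b", "utility": 0.2, "cost": 2.0},
    {"id": "c", "utility": 0.5, "cost": 1.0},
]
kept = shed_load(matches, budget=3.0)
print([m["id"] for m in kept])
```

    With a budget of 3.0 the sketch keeps matches "c" and "a" and sheds "b", the match with the lowest utility per unit cost.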

    Middleware de comunicações para a internet móvel futura

    Doctorate in Informatics (MAP-I). The constant evolution of new technologies that support the way our devices connect, as well as the way we use available on-line services and capabilities, has created a set of unprecedented new challenges that motivated the development of a recent research trend known as the Future Internet. In this research trend, new architectural aspects are being developed which, by restructuring the underlying core components of the Internet, reshape it so that it can not only face these new challenges but also tackle tomorrow’s new set of complex issues. Key aspects of this set of challenges are heterogeneous networking environments composed of different kinds of wireless access networks, the ever-growing change from peer-to-peer (P2P) to video as the most used kind of traffic in the Internet, the orchestration of Internet of Things (IoT) scenarios exploiting Machine-to-Machine (M2M) interactions, and the usage of Information-Centric Networking (ICN). This thesis presents a novel framework able to tackle these challenges simultaneously, empowering connectivity procedures and entities with a middleware that acts as an advanced control management mechanism. This control management mechanism brings together high-level entities (such as application services, mobility management entities, routing operations, etc.) and lower-layer components (e.g., link layers, sensor devices, actuators), allowing for a joint optimization of the underlying connectivity and operational procedures. The results highlight not only the flexibility of the mechanisms composing the framework, but also their ability to provide performance increases when compared with other specific-purpose solutions, while allowing a wider range of scenarios and deployment possibilities.

    Technologies and Applications for Big Data Value

    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part “Technologies and Methods” contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part “Processes and Applications” details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, machine learning, and AI; second, practitioners and industry experts engaged in data-driven systems and in software design and deployment projects who are interested in employing these advanced methods to address real-world problems.
