301 research outputs found

    On the scalability of LISP and advanced overlaid services

    Get PDF
    In just four decades the Internet has gone from a lab experiment to a worldwide, business critical infrastructure that caters to the communication needs of almost a half of the Earth's population. With these figures on its side, arguing against the Internet's scalability would seem rather unwise. However, the Internet's organic growth is far from finished and, as billions of new devices are expected to be joined in the not so distant future, scalability, or lack thereof, is commonly believed to be the Internet's biggest problem. While consensus on the exact form of the solution is yet to be found, the need for a semantic decoupling of a node's location and identity, often called a location/identity separation, is generally accepted as a promising way forward. Typically, this requires the introduction of new network elements that provide the binding of the two names-paces and caches that avoid hampering router packet forwarding speeds. But due to this increased complexity the solution's scalability is itself questioned. This dissertation evaluates the suitability of using the Locator/ID Separation Protocol (LISP), one of the most successful proposals to follow the location/identity separation guideline, as a solution to the Internet's scalability problem. However, because the deployment of any new architecture depends not only on solving the incumbent's technical problems but also on the added value that it brings, our approach follows two lines. In the first part of the thesis, we develop the analytical tools to evaluate LISP's control plane scalability while in the second we show that the required control/data plane separation provides important benefits that could drive LISP's adoption. As a first step to evaluating LISP's scalability, we propose a methodology for an analytical analysis of cache performance that relies on the working-set theory to estimate traffic locality of reference. One of our main contribution is that we identify the conditions network traffic must comply with for the theory to be applicable and then use the result to develop a model that predicts average cache miss rates. Furthermore, we study the model's suitability for long term cache provisioning and assess the cache's vulnerability in front of malicious users through an extension that accounts for cache polluting traffic. As a last step, we investigate the main sources of locality and their impact on the asymptotic scalability of the LISP cache. An important finding here is that destination popularity distribution can accurately describe cache performance, independent of the much harder to model short term correlations. Under a small set of assumptions, this result finally enables us to characterize asymptotic scalability with respect to the amount of prefixes (Internet growth) and users (growth of the LISP site). We validate the models and discuss the accuracy of our assumptions using several one-day-long packet traces collected at the egress points of a campus and an academic network. To show the added benefits that could drive LISP's adoption, in the second part of the thesis we investigate the possibilities of performing inter-domain multicast and improving intra-domain routing. Although the idea of using overlaid services to improve underlay performance is not new, this dissertation argues that LISP offers the right tools to reliably and easily implement such services due to its reliance on network instead of application layer support. In particular, we present and extensively evaluate Lcast, a network-layer single-source multicast framework designed to merge the robustness and efficiency of IP multicast with the configurability and low deployment cost of application-layer overlays. Additionally, we describe and evaluate LISP-MPS, an architecture capable of exploiting LISP to minimize intra-domain routing tables and ensure, among other, support for multi protocol switching and virtual networks.En menos de cuatro décadas Internet ha evolucionado desde un experimento de laboratorio hasta una infraestructura de alcance mundial, de importancia crítica para negocios y que atiende a las necesidades de casi un tercio de los habitantes del planeta. Con estos números, es difícil tratar de negar la necesidad de escalabilidad de Internet. Sin embargo, el crecimiento orgánico de Internet está aún lejos de finalizar ya que se espera que mil millones de dispositivos nuevos se conecten en el futuro cercano. Así pues, la falta de escalabilidad es el mayor problema al que se enfrenta Internet hoy en día. Aunque la solución definitiva al problema está aún por definir, la necesidad de desacoplar semánticamente la localización e identidad de un nodo, a menudo llamada locator/identifier separation, es generalmente aceptada como un camino prometedor a seguir. Sin embargo, esto requiere la introducción de nuevos dispositivos en la red que unan los dos espacios de nombres disjuntos resultantes y de cachés que almacenen los enlaces temporales entre ellos con el fin de aumentar la velocidad de transmisión de los enrutadores. A raíz de esta complejidad añadida, la escalabilidad de la solución en si misma es también cuestionada. Este trabajo evalúa la idoneidad de utilizar Locator/ID Separation Protocol (LISP), una de las propuestas más exitosas que siguen la pauta locator/identity separation, como una solución para la escalabilidad de la Internet. Con tal fin, desarrollamos las herramientas analíticas para evaluar la escalabilidad del plano de control de LISP pero también para mostrar que la separación de los planos de control y datos proporciona un importante valor añadido que podría impulsar la adopción de LISP. Como primer paso para evaluar la escalabilidad de LISP, proponemos una metodología para un estudio analítico del rendimiento de la caché que se basa en la teoría del working-set para estimar la localidad de referencias. Identificamos las condiciones que el tráfico de red debe cumplir para que la teoría sea aplicable y luego desarrollamos un modelo que predice las tasas medias de fallos de caché con respecto a parámetros de tráfico fácilmente medibles. Por otra parte, para demostrar su versatilidad y para evaluar la vulnerabilidad de la caché frente a usuarios malintencionados, extendemos el modelo para considerar el rendimiento frente a tráfico generado por usuarios maliciosos. Como último paso, investigamos como usar la popularidad de los destinos para estimar el rendimiento de la caché, independientemente de las correlaciones a corto plazo. Bajo un pequeño conjunto de hipótesis conseguimos caracterizar la escalabilidad con respecto a la cantidad de prefijos (el crecimiento de Internet) y los usuarios (crecimiento del sitio LISP). Validamos los modelos y discutimos la exactitud de nuestras suposiciones utilizando varias trazas de paquetes reales. Para mostrar los beneficios adicionales que podrían impulsar la adopción de LISP, también investigamos las posibilidades de realizar multidifusión inter-dominio y la mejora del enrutamiento dentro del dominio. Aunque la idea de utilizar servicios superpuestos para mejorar el rendimiento de la capa subyacente no es nueva, esta tesis sostiene que LISP ofrece las herramientas adecuadas para poner en práctica de forma fiable y fácilmente este tipo de servicios debido a que LISP actúa en la capa de red y no en la capa de aplicación. En particular, presentamos y evaluamos extensamente Lcast, un marco de multidifusión con una sola fuente diseñado para combinar la robustez y eficiencia de la multidifusión IP con la capacidad de configuración y bajo coste de implementación de una capa superpuesta a nivel de aplicación. Además, describimos y evaluamos LISP-MPS, una arquitectura capaz de explotar LISP para minimizar las tablas de enrutamiento intra-dominio y garantizar, entre otras, soporte para conmutación multi-protocolo y redes virtuales

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)

    Get PDF
    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016) Timisoara, Romania. February 8-11, 2016.The PhD Symposium was a very good opportunity for the young researchers to share information and knowledge, to present their current research, and to discuss topics with other students in order to look for synergies and common research topics. The idea was very successful and the assessment made by the PhD Student was very good. It also helped to achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable solutions for ultrascale computing aiming at cross fertilization among HPC, large scale distributed systems, and big data management, training, contributing to glue disparate researchers working across different areas and provide a meeting ground for researchers in these separate areas to exchange ideas, to identify synergies, and to pursue common activities in research topics such as sustainable software solutions (applications and system software stack), data management, energy efficiency, and resilience.European Cooperation in Science and Technology. COS

    A framework for the dynamic management of Peer-to-Peer overlays

    Get PDF
    Peer-to-Peer (P2P) applications have been associated with inefficient operation, interference with other network services and large operational costs for network providers. This thesis presents a framework which can help ISPs address these issues by means of intelligent management of peer behaviour. The proposed approach involves limited control of P2P overlays without interfering with the fundamental characteristics of peer autonomy and decentralised operation. At the core of the management framework lays the Active Virtual Peer (AVP). Essentially intelligent peers operated by the network providers, the AVPs interact with the overlay from within, minimising redundant or inefficient traffic, enhancing overlay stability and facilitating the efficient and balanced use of available peer and network resources. They offer an “insider‟s” view of the overlay and permit the management of P2P functions in a compatible and non-intrusive manner. AVPs can support multiple P2P protocols and coordinate to perform functions collectively. To account for the multi-faceted nature of P2P applications and allow the incorporation of modern techniques and protocols as they appear, the framework is based on a modular architecture. Core modules for overlay control and transit traffic minimisation are presented. Towards the latter, a number of suitable P2P content caching strategies are proposed. Using a purpose-built P2P network simulator and small-scale experiments, it is demonstrated that the introduction of AVPs inside the network can significantly reduce inter-AS traffic, minimise costly multi-hop flows, increase overlay stability and load-balancing and offer improved peer transfer performance

    Intelligent instrumentation techniques to improve the traces information-volume ratio

    Get PDF
    With ever more powerful machines being constantly deployed, it is crucial to manage the computational resources efficiently. This is important both from the point of view of the individual user, who expects fast results; and the supercomputing center hosting the whole infrastructure, that is interested in maximizing its overall productivity. Nevertheless, the real sustained performance achieved by the applications can be significantly lower than the theoretical peak performance of the machines. A key factor to bridge this performance gap is to understand how parallel computers behave. Performance analysis tools are essential not only to understand the behavior of parallel applications, but to identify why performance expectations might not have been met, serving as guidelines to improve the inefficiencies that caused poor performance, and driving both software and hardware optimizations. However, detailed analysis of the behavior of a parallel application requires to process a large amount of data that also grows extremely fast. Current large scale systems already comprise hundreds of thousands of cores, and upcoming exascale systems are expected to assemble more than a million processing elements. With such number of hardware components, the traditional analysis methodologies consisting in blindly collecting as much data as possible and then performing exhaustive lookups are no longer applicable, because the volume of performance data generated becomes absolutely unmanageable to store, process and analyze. The evolution of the tools suggests that more complex approaches are needed, incorporating intelligence to perform competently the challenging and important task of detailed analysis. In this thesis, we address the problem of scalability of performance analysis tools in large scale systems. In such scenarios, in-depth understanding of the interactions between all the system components is more compelling than ever for an effective use of the parallel resources. To this end, our work includes a thorough review of techniques that have been successfully applied to aid in the task of Big Data Analytics in fields like machine learning, data mining, signal processing and computer vision. We have leveraged these techniques to improve the analysis of large-scale parallel applications by automatically uncovering repetitive patterns, finding data correlations, detecting performance trends and further useful analysis information. Combinining their use, we have minimized the volume of performance data captured from an execution, while maximizing the benefit and insight gained from this data, and have proposed new and more effective methodologies for single and multi-experiment performance analysis.Con el incesante aumento de potencia y capacidad de los superordenadores, la habilidad de emplear de forma efectiva todos los recursos disponibles se ha convertido en un factor crucial. La necesidad de un uso eficiente radica tanto en la aspiración de los usuarios por obtener resultados en el menor tiempo posible, como en el interés del propio centro de cálculo que alberga la infraestructura computacional por maximizar la productividad de los recursos. Sin embargo, el rendimiento real que las aplicaciones son capaces de alcanzar suele ser significativamente menor que el rendimiento teórico de las máquinas. Y la clave para salvar esta distancia consiste en comprender el comportamiento de las máquinas paralelas. Las herramientas de análisis de rendimiento son instrumentos fundamentales no solo para entender como funcionan las aplicaciones paralelas, sino también para identificar los problemas por los que el rendimiento obtenido dista del esperado, sirviendo como guías para mejorar aquellas deficiencias software y/o hardware que son causas de degradación. No obstante, un análisis en detalle del comportamiento de una aplicación paralela requiere procesar una gran cantidad de datos que crece extremadamente rápido. Los sistemas actuales de gran escala ya comprenden cientos de miles de procesadores, y se espera que los inminentes sistemas exa-escala reunan millones de elementos de procesamiento. Con semejante número de componentes, las estrategias tradicionales de obtención indiscriminada de datos para mejorar la precisión de las herramientas de análisis caerán en desuso debido a las dificultades que entraña almacenarlos y procesarlos. En este aspecto, la evolución de las herramientas sugiere que son necesarios métodos más sofisticados, que incorporen inteligencia para desarrollar la tarea de análisis de manera más competente. Esta tesis aborda el problema de escalabilidad de las herramientas de análisis en sistemas de gran escala, donde es primordial el conocimiento detallado de las interacciones entre todos los componentes para emplear los recursos paralelos de la forma más óptima. Con este fin, esta investigación incluye una revisión exhaustiva de las técnicas que se han aplicado satisfactoriamente para extraer información de grandes volumenes de datos en otras áreas como aprendizaje automático, minería de datos y procesado de señal. Hemos adaptado estas técnicas para mejorar el análisis de aplicaciones paralelas de gran escala, detectando automáticamente patrones repetitivos, correlaciones de datos, tendencias de rendimiento, y demás información relevante. Combinando el uso de estas técnicas, se ha conseguido disminuir el volumen de datos generado durante una ejecución, a la vez que aumentar la cantidad de información útil que se puede extraer de los datos mediante la aplicación de nuevas y más efectivas metodologías de análisis para el estudio del rendimiento de experimentos individuales o en seri

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Efficient Learning Machines

    Get PDF
    Computer scienc

    A survey of the application of soft computing to investment and financial trading

    Get PDF

    Big Data and Artificial Intelligence in Digital Finance

    Get PDF
    This open access book presents how cutting-edge digital technologies like Big Data, Machine Learning, Artificial Intelligence (AI), and Blockchain are set to disrupt the financial sector. The book illustrates how recent advances in these technologies facilitate banks, FinTech, and financial institutions to collect, process, analyze, and fully leverage the very large amounts of data that are nowadays produced and exchanged in the sector. To this end, the book also describes some more the most popular Big Data, AI and Blockchain applications in the sector, including novel applications in the areas of Know Your Customer (KYC), Personalized Wealth Management and Asset Management, Portfolio Risk Assessment, as well as variety of novel Usage-based Insurance applications based on Internet-of-Things data. Most of the presented applications have been developed, deployed and validated in real-life digital finance settings in the context of the European Commission funded INFINITECH project, which is a flagship innovation initiative for Big Data and AI in digital finance. This book is ideal for researchers and practitioners in Big Data, AI, banking and digital finance

    Real-time operating system support for multicore applications

    Get PDF
    Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2014Plataformas multiprocessadas atuais possuem diversos níveis da memória cache entre o processador e a memória principal para esconder a latência da hierarquia de memória. O principal objetivo da hierarquia de memória é melhorar o tempo médio de execução, ao custo da previsibilidade. O uso não controlado da hierarquia da cache pelas tarefas de tempo real impacta a estimativa dos seus piores tempos de execução, especialmente quando as tarefas de tempo real acessam os níveis da cache compartilhados. Tal acesso causa uma disputa pelas linhas da cache compartilhadas e aumenta o tempo de execução das aplicações. Além disso, essa disputa na cache compartilhada pode causar a perda de prazos, o que é intolerável em sistemas de tempo real críticos. O particionamento da memória cache compartilhada é uma técnica bastante utilizada em sistemas de tempo real multiprocessados para isolar as tarefas e melhorar a previsibilidade do sistema. Atualmente, os estudos que avaliam o particionamento da memória cache em multiprocessadores carecem de dois pontos fundamentais. Primeiro, o mecanismo de particionamento da cache é tipicamente implementado em um ambiente simulado ou em um sistema operacional de propósito geral. Consequentemente, o impacto das atividades realizados pelo núcleo do sistema operacional, tais como o tratamento de interrupções e troca de contexto, no particionamento das tarefas tende a ser negligenciado. Segundo, a avaliação é restrita a um escalonador global ou particionado, e assim não comparando o desempenho do particionamento da cache em diferentes estratégias de escalonamento. Ademais, trabalhos recentes confirmaram que aspectos da implementação do SO, tal como a estrutura de dados usada no escalonamento e os mecanismos de tratamento de interrupções, impactam a escalonabilidade das tarefas de tempo real tanto quanto os aspectos teóricos. Entretanto, tais estudos também usaram sistemas operacionais de propósito geral com extensões de tempo real, que afetamos sobre custos de tempo de execução observados e a escalonabilidade das tarefas de tempo real. Adicionalmente, os algoritmos de escalonamento tempo real para multiprocessadores atuais não consideram cenários onde tarefas de tempo real acessam as mesmas linhas da cache, o que dificulta a estimativa do pior tempo de execução. Esta pesquisa aborda os problemas supracitados com as estratégias de particionamento da cache e com os algoritmos de escalonamento tempo real multiprocessados da seguinte forma. Primeiro, uma infraestrutura de tempo real para multiprocessadores é projetada e implementada em um sistema operacional embarcado. A infraestrutura consiste em diversos algoritmos de escalonamento tempo real, tais como o EDF global e particionado, e um mecanismo de particionamento da cache usando a técnica de coloração de páginas. Segundo, é apresentada uma comparação em termos da taxa de escalonabilidade considerando o sobre custo de tempo de execução da infraestrutura criada e de um sistema operacional de propósito geral com extensões de tempo real. Em alguns casos, o EDF global considerando o sobre custo do sistema operacional embarcado possui uma melhor taxa de escalonabilidade do que o EDF particionado com o sobre custo do sistema operacional de propósito geral, mostrando claramente como diferentes sistemas operacionais influenciam os escalonadores de tempo real críticos em multiprocessadores. Terceiro, é realizada uma avaliação do impacto do particionamento da memória cache em diversos escalonadores de tempo real multiprocessados. Os resultados desta avaliação indicam que um sistema operacional "leve" não compromete as garantias de tempo real e que o particionamento da cache tem diferentes comportamentos dependendo do escalonador e do tamanho do conjunto de trabalho das tarefas. Quarto, é proposto um algoritmo de particionamento de tarefas que atribui as tarefas que compartilham partições ao mesmo processador. Os resultados mostram que essa técnica de particionamento de tarefas reduz a disputa pelas linhas da cache compartilhadas e provê garantias de tempo real para sistemas críticos. Finalmente, é proposto um escalonador de tempo real de duas fases para multiprocessadores. O escalonador usa informações coletadas durante o tempo de execução das tarefas através dos contadores de desempenho em hardware. Com base nos valores dos contadores, o escalonador detecta quando tarefas de melhor esforço o interferem com tarefas de tempo real na cache. Assim é possível impedir que tarefas de melhor esforço acessem as mesmas linhas da cache que tarefas de tempo real. O resultado desta estratégia de escalonamento é o atendimento dos prazos críticos e não críticos das tarefas de tempo real.Abstracts: Modern multicore platforms feature multiple levels of cache memory placed between the processor and main memory to hide the latency of ordinary memory systems. The primary goal of this cache hierarchy is to improve average execution time (at the cost of predictability). The uncontrolled use of the cache hierarchy by realtime tasks may impact the estimation of their worst-case execution times (WCET), specially when real-time tasks access a shared cache level, causing a contention for shared cache lines and increasing the application execution time. This contention in the shared cache may leadto deadline losses, which is intolerable particularly for hard real-time (HRT) systems. Shared cache partitioning is a well-known technique used in multicore real-time systems to isolate task workloads and to improve system predictability. Presently, the state-of-the-art studies that evaluate shared cache partitioning on multicore processors lack two key issues. First, the cache partitioning mechanism is typically implemented either in a simulated environment or in a general-purpose OS (GPOS), and so the impact of kernel activities, such as interrupt handlers and context switching, on the task partitions tend to be overlooked. Second, the evaluation is typically restricted to either a global or partitioned scheduler, thereby by falling to compare the performance of cache partitioning when tasks are scheduled by different schedulers. Furthermore, recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, impact real-time schedulability as much as scheduling theoretic aspects. However, these studies also used real-time patches applied into GPOSes, which affects the run-time overhead observed in these works and consequently the schedulability of real-time tasks. Additionally, current multicore scheduling algorithms do not consider scenarios where real-time tasks access the same cache lines due to true or false sharing, which also impacts the WCET. This thesis addresses these aforementioned problems with cache partitioning techniques and multicore real-time scheduling algorithms as following. First, a real-time multicore support is designed and implemented on top of an embedded operating system designed from scratch. This support consists of several multicore real-time scheduling algorithms, such as global and partitioned EDF, and a cache partitioning mechanism based on page coloring. Second, it is presented a comparison in terms of schedulability ratio considering the run-time overhead of the implemented RTOS and a GPOS patched with real-time extensions. In some cases, Global-EDF considering the overhead of the RTOS is superior to Partitioned-EDF considering the overhead of the patched GPOS, which clearly shows how different OSs impact hard realtime schedulers. Third, an evaluation of the cache partitioning impacton partitioned, clustered, and global real-time schedulers is performed.The results indicate that a lightweight RTOS does not impact real-time tasks, and shared cache partitioning has different behavior depending on the scheduler and the task's working set size. Fourth, a task partitioning algorithm that assigns tasks to cores respecting their usage of cache partitions is proposed. The results show that by simply assigning tasks that shared cache partitions to the same processor, it is possible to reduce the contention for shared cache lines and to provideHRT guarantees. Finally, a two-phase multicore scheduler that provides HRT and soft real-time (SRT) guarantees is proposed. It is shown that by using information from hardware performance counters at run-time, the RTOS can detect when best-effort tasks interfere with real-time tasks in the shared cache. Then, the RTOS can prevent best effort tasks from interfering with real-time tasks. The results also show that the assignment of exclusive partitions to HRT tasks together with the two-phase multicore scheduler provides HRT and SRT guarantees, even when best-effort tasks share partitions with real-time tasks
    corecore