891 research outputs found

    Framework for real-time, autonomous anomaly detection over voluminous time-series geospatial data streams, A

    2014 Summer. Includes bibliographical references. In this research work we present an approach encompassing both algorithm and system design to detect anomalies in data streams. Individual observations within these streams are multidimensional, with each dimension corresponding to a feature of interest. We consider time-series geospatial datasets generated by remote and in situ observational devices. Three aspects make this problem particularly challenging: (1) the cumulative volume and rates of data arrivals, (2) anomalies evolve over time, and (3) there are spatio-temporal correlations associated with the data. Therefore, anomaly detections must be accurate and performed in real time. Given the data volumes involved, solutions must minimize user intervention and be amenable to distributed processing to ensure scalability. Our approach achieves accurate, high-throughput classifications in real time. We rely on Expectation Maximization (EM) to build Gaussian Mixture Models (GMMs) that model the densities of the training data. Rather than one all-encompassing model, our approach involves multiple model instances, each of which is responsible for a particular geographical extent and can also adapt as data evolves. We have incorporated these algorithms into our distributed storage platform, Galileo, and profiled their suitability through empirical analysis, which demonstrates high throughput (10,000 observations per second, per node) and low latency on real-world datasets.
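
The EM-plus-GMM idea in the abstract above can be illustrated with a minimal sketch. This is not the Galileo implementation: it assumes one-dimensional observations, a fixed number of components, and a density threshold taken from a low quantile of the training data, whereas the abstract describes multidimensional streams and multiple adaptive model instances. All function names (`fit_gmm_1d`, `density`, `pick_threshold`, `is_anomaly`) are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: component means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


def density(x, model):
    """Mixture density of observation x under the fitted model."""
    weights, means, variances = model
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def pick_threshold(train, model, quantile=0.01):
    """Assumed rule: use a low quantile of the training densities as the cut-off."""
    scores = sorted(density(x, model) for x in train)
    return scores[int(quantile * len(scores))]


def is_anomaly(x, model, threshold):
    """Flag observations that fall in low-density regions of the mixture."""
    return density(x, model) < threshold


if __name__ == "__main__":
    rng = random.Random(42)
    train = [rng.gauss(0.0, 1.0) for _ in range(200)]
    train += [rng.gauss(10.0, 1.0) for _ in range(200)]
    model = fit_gmm_1d(train, k=2)
    cutoff = pick_threshold(train, model)
    for value in (0.0, 5.0, 10.0):
        print(value, is_anomaly(value, model, cutoff))
```

In a setting like the one the abstract describes, one such model would presumably be kept per geographical extent and refitted as data evolves; that partitioning and update logic is left out here.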

    Edge Intelligence Simulator: a platform for simulating intelligent edge orchestration solutions

    To support the stringent requirements of future intelligent and interactive applications, intelligence needs to become an essential part of resource management in the edge environment. Developing intelligent orchestration solutions is a challenging and arduous task, in which evaluating and comparing the proposed solutions is a focal point. Simulation is commonly used for this purpose. However, there are currently no openly available simulators with a specific focus on supporting research on intelligent edge orchestration methods. This thesis presents a simulation platform called Edge Intelligence Simulator (EISim), the purpose of which is to facilitate research on intelligent edge orchestration solutions. In its current form, the platform supports simulating deep reinforcement learning based solutions and different orchestration control topologies in scenarios related to task offloading and resource pricing on the edge. The platform also includes additional tools for creating simulation environments, running simulations for agent training and evaluation, and plotting results. This thesis gives a comprehensive overview of the state of the art in edge and fog simulation, orchestration, offloading, and resource pricing, which provides a basis for the design of EISim. The methods and tools that form the foundation of the current EISim implementation are also presented, along with a detailed description of the EISim architecture, default implementations, use, and additional tools. Finally, EISim with its default implementations is validated and evaluated through a large-scale simulation study with 24 simulation scenarios. The results of the simulation study verify the end-to-end performance of EISim and show its capability to produce sensible results. 
The results also illustrate how EISim can help the researcher in controlling and monitoring the training of intelligent agents, as well as in evaluating solutions against different control topologies. Finnish title: Reunaälysimulaattori: alusta älykkäiden reunalaskennan orkestrointiratkaisujen simulointiin.

    Enabling Scalable and Sustainable Softwarized 5G Environments

    The fifth generation of telecommunication systems (5G) is foreseen to play a fundamental role in our socio-economic growth by supporting various and radically new vertical applications (such as Industry 4.0, eHealth, Smart Cities/Electrical Grids, to name a few), as a one-size-fits-all technology that is enabled by emerging softwarization solutions, specifically the Fog, Multi-access Edge Computing (MEC), Network Functions Virtualization (NFV) and Software-Defined Networking (SDN) paradigms. Notwithstanding the notable potential of the aforementioned technologies, a number of open issues still need to be addressed to ensure their complete rollout. This thesis addresses the scalability and sustainability issues in softwarized 5G environments through contributions in three research axes: a) Infrastructure Modeling and Analytics, b) Network Slicing and Mobility Management, and c) Network/Services Management and Control. The main contributions include a model-based analytics approach for real-time workload profiling and estimation of network key performance indicators (KPIs) in NFV infrastructures (NFVIs), as well as an SDN-based multi-clustering approach to scale geo-distributed virtual tenant networks (VTNs) and to support seamless user/service mobility; building on these, solutions to the problems of resource consolidation, service migration, and load balancing are also developed in the context of 5G. All in all, this generally entails the adoption of Stochastic Models, Mathematical Programming, Queueing Theory, Graph Theory and Team Theory principles, in the context of Green Networking, NFV and SDN.

    Descoberta de recursos para sistemas de escala arbitrarias

    Doctoral thesis in Informatics (Doutoramento em Informática). Large-scale distributed computing technologies such as Cloud, Grid, Cluster and HPC supercomputers are progressing along with the revolutionary emergence of many-core designs (e.g. GPU, CPUs on single die, supercomputers on chip, etc.) and significant advances in networking and interconnect solutions. In future, computing nodes with thousands of cores may be connected together to form a single transparent computing unit which hides from applications the complexity and distributed nature of these many-core systems. 
In order to benefit efficiently from all the potential resources in such large-scale many-core-enabled computing environments, resource discovery is the vital building block to maximally exploit the capabilities of all distributed heterogeneous resources through precisely recognizing and locating those resources in the system. Efficient and scalable resource discovery is challenging for such future systems, where the resources and the underlying computation and communication infrastructures are highly dynamic, highly hierarchical and highly heterogeneous. In this thesis, we investigate the problem of resource discovery with respect to the general requirements of arbitrary-scale future many-core-enabled computing environments. The main contribution of this thesis is Hybrid Adaptive Resource Discovery (HARD), a novel, efficient and highly scalable resource-discovery approach which is built upon a virtual hierarchical overlay based on self-organization and self-adaptation of processing resources in the system, where the computing resources are organized into distributed hierarchies according to a proposed hierarchical multi-layered resource description model. Operationally, at each layer, it consists of a peer-to-peer architecture of modules that, by interacting with each other, provide a global view of the resource availability in a large, dynamic and heterogeneous distributed environment. The proposed resource discovery model provides the adaptability and flexibility to perform complex querying by supporting a set of significant querying features (such as multi-dimensional, range and aggregate querying) while supporting exact and partial matching, both for static and dynamic object contents. The simulation shows that HARD can be applied to arbitrary scales of dynamicity, both in terms of complexity and of scale, positioning this proposal as a proper architecture for future many-core systems. 
We also propose a novel resource management scheme for future systems that can efficiently utilize distributed resources in a fully decentralized fashion. Moreover, leveraging discovery components (RR-RPs) enables our resource management platform to dynamically find and allocate available resources that guarantee the QoS parameters on demand.
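
To make the notion of multi-dimensional range querying over a resource hierarchy more concrete, here is a minimal sketch. It is not the HARD protocol: it assumes a static, in-memory tree in which every internal node keeps a per-attribute (min, max) summary of its descendants, so a range query can skip subtrees that cannot contain a match. The names `Node`, `summarize`, `may_match` and `discover`, and the attributes `cores` and `mem_gb`, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A leaf is a resource with attributes; an internal node groups children."""
    name: str
    attrs: dict = field(default_factory=dict)      # leaf attributes, e.g. {"cores": 8}
    children: list = field(default_factory=list)
    summary: dict = field(default_factory=dict)    # attribute -> (min, max) over the subtree


def summarize(node):
    """Compute (min, max) summaries bottom-up so parents describe their subtrees."""
    if not node.children:
        node.summary = {key: (value, value) for key, value in node.attrs.items()}
        return node.summary
    node.summary = {}
    for child in node.children:
        for key, (lo, hi) in summarize(child).items():
            current = node.summary.get(key)
            if current is None:
                node.summary[key] = (lo, hi)
            else:
                node.summary[key] = (min(current[0], lo), max(current[1], hi))
    return node.summary


def may_match(summary, query):
    """True if every queried range overlaps the summarized range of the subtree."""
    for key, (q_lo, q_hi) in query.items():
        if key not in summary:
            return False
        lo, hi = summary[key]
        if hi < q_lo or lo > q_hi:
            return False
    return True


def discover(node, query, visited=None):
    """Return names of leaves whose attributes fall inside all queried ranges."""
    if visited is not None:
        visited.append(node.name)  # record which nodes the query had to contact
    if not may_match(node.summary, query):
        return []  # prune: nothing below this node can satisfy the query
    if not node.children:
        return [node.name]
    matches = []
    for child in node.children:
        matches.extend(discover(child, query, visited))
    return matches


if __name__ == "__main__":
    site_a = Node("site-a", children=[
        Node("n1", {"cores": 8, "mem_gb": 32}),
        Node("n2", {"cores": 64, "mem_gb": 256}),
    ])
    site_b = Node("site-b", children=[
        Node("n3", {"cores": 4, "mem_gb": 16}),
        Node("n4", {"cores": 16, "mem_gb": 64}),
    ])
    root = Node("root", children=[site_a, site_b])
    summarize(root)
    contacted = []
    print(discover(root, {"cores": (32, 128), "mem_gb": (128, 512)}, contacted))
    print(contacted)
```

Because a leaf's summary is the degenerate range (value, value), the same overlap test serves both for pruning internal nodes and for the final exact check. A decentralized design such as the one the abstract describes would additionally have to keep these summaries up to date as resources change, which this sketch does not attempt.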

    Cascading Behaviour in Complex Socio-Technical Networks

    Most human interactions today take place with the mediation of information and communications technology. This is extending the boundaries of interdependence: the group of reference, ideas and behaviour to which people are exposed is larger and less restricted to old geographical and cultural boundaries; but it is also providing more and better data with which to build more informative models of the effects of social interactions, among them the way in which contagion and cascades diffuse in social networks. Online data are not only helping us gain deeper insights into the structural complexity of social systems, they are also illuminating the consequences of that complexity, especially around collective and temporal dynamics. This paper offers an overview of the models and applications that have been developed in what is still a nascent area of research, as well as an outline of immediate lines of work that promise to open new vistas in our understanding of cascading behaviour in social networks.

    Clustering in the Big Data Era: methods for efficient approximation, distribution, and parallelization

    Data clustering is an unsupervised machine learning task whose objective is to group together similar items. As a versatile data mining tool, data clustering has numerous applications, such as object detection and localization using data from 3D laser-based sensors, finding popular routes using geolocation data, and finding similar patterns of electricity consumption using smart meters. The datasets in modern IoT-based applications are getting more and more challenging for conventional clustering schemes. Big Data is a term used to loosely describe hard-to-manage datasets. Particularly, large numbers of data points, high rates of data production, large numbers of dimensions, high skewness, and distributed data sources are aspects that challenge the classical data processing schemes, including clustering methods. This thesis contributes to efficient big data clustering for distributed and parallel computing architectures, representative of the processing environments in edge-cloud computing continuum. The thesis also proposes approximation techniques to cope with certain challenging aspects of big data. Regarding distributed clustering, the thesis proposes MAD-C, abbreviating Multi-stage Approximate Distributed Cluster-Combining. MAD-C leverages an approximation-based data synopsis that drastically lowers the required communication bandwidth among the distributed nodes and achieves multiplicative savings in computation time, compared to a baseline that centrally gathers and clusters the data. The thesis shows MAD-C can be used to detect and localize objects using data from distributed 3D laser-based sensors with high accuracy. Furthermore, the work in the thesis shows how to utilize MAD-C to efficiently detect the objects within a restricted area for geofencing purposes. Regarding parallel clustering, the thesis proposes a family of algorithms called PARMA-CC, abbreviating Parallel Multistage Approximate Cluster Combining. 
Using approximation-based data synopsis, PARMA-CC algorithms achieve scalability on multi-core systems by facilitating parallel execution of threads with limited dependencies which get resolved using fine-grained synchronization techniques. To further enhance the efficiency, PARMA-CC algorithms can be configured with respect to different data properties. Analytical and empirical evaluations show PARMA-CC algorithms achieve significantly higher scalability than the state-of-the-art methods while preserving a high accuracy. On parallel high dimensional clustering, the thesis proposes IP.LSH.DBSCAN, abbreviating Integrated Parallel Density-Based Clustering through Locality-Sensitive Hashing (LSH). IP.LSH.DBSCAN fuses the process of creating an LSH index into the process of data clustering, and it takes advantage of data parallelization and fine-grained synchronization. Analytical and empirical evaluations show IP.LSH.DBSCAN facilitates parallel density-based clustering of massive datasets using desired distance measures resulting in several orders of magnitude lower latency than state-of-the-art for high dimensional data. In essence, the thesis proposes methods and algorithmic implementations targeting the problem of big data clustering and applications using distributed and parallel processing. The proposed methods (available as open source software) are extensible and can be used in combination with other methods.
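
The general idea of combining locality-sensitive hashing with density-based clustering can be sketched as follows. This is not IP.LSH.DBSCAN: it is sequential, builds the index before clustering rather than fusing the two steps, and assumes Euclidean distance with random-projection hash functions of the form floor((a·x + b) / w). Function names and default parameters are hypothetical. LSH only proposes candidate neighbours; each candidate is still checked against the exact distance, so the approximation can miss neighbours but never invents them.

```python
import math
import random
from collections import defaultdict


def build_lsh_tables(points, num_tables=8, hashes_per_table=2, width=4.0, seed=0):
    """Hash every point into one bucket per table using random projections."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(num_tables):
        # Each hash is floor((a . x + b) / width), a ~ N(0, I), b ~ U(0, width).
        funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                 for _ in range(hashes_per_table)]
        buckets = defaultdict(list)
        keys = []
        for idx, point in enumerate(points):
            key = tuple(
                math.floor((sum(a * x for a, x in zip(vec, point)) + offset) / width)
                for vec, offset in funcs
            )
            buckets[key].append(idx)
            keys.append(key)
        tables.append((buckets, keys))
    return tables


def eps_neighbors(i, points, tables, eps):
    """Candidates share a bucket with point i in some table; keep those within eps."""
    candidates = set()
    for buckets, keys in tables:
        candidates.update(buckets[keys[i]])
    return [j for j in candidates if math.dist(points[i], points[j]) <= eps]


def lsh_dbscan(points, eps, min_pts, num_tables=8, hashes_per_table=2, width=4.0, seed=0):
    """DBSCAN-style labelling with LSH-based neighbour search; -1 marks noise."""
    tables = build_lsh_tables(points, num_tables, hashes_per_table, width, seed)
    neigh = [eps_neighbors(i, points, tables, eps) for i in range(len(points))]
    labels = [-1] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or len(neigh[i]) < min_pts:
            continue  # already clustered, or not a core point
        labels[i] = cluster
        stack = [i]
        while stack:  # grow the cluster outward from core points only
            current = stack.pop()
            for j in neigh[current]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if len(neigh[j]) >= min_pts:
                        stack.append(j)
        cluster += 1
    return labels


if __name__ == "__main__":
    blob_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
    blob_b = [(10.0 + 0.1 * i, 10.0 + 0.1 * j) for i in range(5) for j in range(5)]
    data = blob_a + blob_b + [(5.0, -5.0)]  # last point is an isolated outlier
    print(lsh_dbscan(data, eps=1.0, min_pts=4))
```

More tables raise the chance that two nearby points share a bucket at least once, at the cost of more hashing work; the bucket width plays a similar role in trading recall against candidate-set size.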

    A bottom-up approach to real-time search in large networks and clouds


    The Effects of Inequality, Density, and Heterogeneous Residential Preferences on Urban Displacement and Metropolitan Structure: An Agent-Based Model

    Urban displacement, in which a household is forced to relocate due to conditions affecting its home or surroundings, often results from rising housing costs, particularly in wealthy, prosperous cities. However, its dynamics are complex and often difficult to understand. This paper presents an agent-based model of urban settlement, agglomeration, displacement, and sprawl. New settlements form around a spatial amenity that draws initial, poor settlers to subsist on the resource. As the settlement grows, subsequent settlers of varying income, skills, and interests are heterogeneously drawn to either the original amenity or to the emerging human agglomeration. As this agglomeration grows and densifies, land values increase, and the initial poor settlers are displaced from the spatial amenity on which they relied. Through path dependence, high-income residents remain clustered around this original amenity for which they have no direct use or interest. This toy model explores these dynamics, demonstrating a simplified mechanism of how urban displacement and gentrification can be sensitive to income inequality, density, and varied preferences for different types of amenities.

    Resource discovery for distributed computing systems: A comprehensive survey

    Large-scale distributed computing environments provide a vast amount of heterogeneous computing resources from different sources for resource sharing and distributed computing. Discovering appropriate resources in such environments is a challenge which involves several different subjects. In this paper, we survey the current state of resource discovery protocols, mechanisms, and platforms for large-scale distributed environments, focusing on the design aspects. We classify all related aspects, general steps, and requirements for constructing a novel resource discovery solution into three categories: structures, methods, and issues. Accordingly, we review the literature, analyzing various aspects for each category.

    Algorithms and Software for the Analysis of Large Complex Networks

    The work presented intersects three main areas, namely graph algorithmics, network science and applied software engineering. Each computational method discussed relates to one of the main tasks of data analysis: to extract structural features from network data, such as methods for community detection; or to transform network data, such as methods to sparsify a network and reduce its size while keeping essential properties; or to realistically model networks through generative models.