61 research outputs found

    On the evaluation of exact-match and range queries over multidimensional data in distributed hash tables

    2012 Fall. Includes bibliographical references. The quantity and precision of geospatial and time series observational data being collected have increased alongside the steady expansion of processing and storage capabilities in modern computing hardware. The storage requirements for this information are vastly greater than the capabilities of a single computer, and are primarily met in a distributed manner. However, distributed solutions often impose strict constraints on retrieval semantics. In this thesis, we investigate the factors that influence storage and retrieval operations on large datasets in a cloud setting, and propose a lightweight data partitioning and indexing scheme to facilitate these operations. Our solution provides expressive retrieval support through range-based and exact-match queries and can be applied over massive quantities of multidimensional data. We provide benchmarks to illustrate the relative advantage of using our solution over a general-purpose cloud storage engine in a distributed network of heterogeneous computing resources.

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    2017 Summer. Includes bibliographical references. Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks.

    A framework for real-time, autonomous anomaly detection over voluminous time-series geospatial data streams

    2014 Summer. Includes bibliographical references. In this research work we present an approach encompassing both algorithm and system design to detect anomalies in data streams. Individual observations within these streams are multidimensional, with each dimension corresponding to a feature of interest. We consider time-series geospatial datasets generated by remote and in situ observational devices. Three aspects make this problem particularly challenging: (1) the cumulative volume and rates of data arrivals, (2) anomalies evolve over time, and (3) there are spatio-temporal correlations associated with the data. Therefore, anomaly detections must be accurate and performed in real time. Given the data volumes involved, solutions must minimize user intervention and be amenable to distributed processing to ensure scalability. Our approach achieves accurate, high-throughput classifications in real time. We rely on Expectation Maximization (EM) to build Gaussian Mixture Models (GMMs) that model the densities of the training data. Rather than one all-encompassing model, our approach involves multiple model instances, each of which is responsible for a particular geographical extent and can also adapt as data evolves. We have incorporated these algorithms into our distributed storage platform, Galileo, and profiled their suitability through empirical analysis which demonstrates high throughput (10,000 observations per second, per node) and low latency on real-world datasets.
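
    The idea of one EM-fitted Gaussian mixture per geographical extent can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use Galileo's API: every function name, the one-degree grid cell, the single feature per observation, and the log-density threshold are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values, k=2, iterations=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation: spread the means across the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in ordered) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in ordered:
            dens = [w * gaussian_pdf(v, m, s)
                    for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component owns no data; keep its old parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, ordered)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, ordered)) / nj,
                min_var,
            )
    return weights, means, variances


def log_density(x, model):
    """Log of the mixture density at x, floored to avoid log(0)."""
    weights, means, variances = model
    density = sum(w * gaussian_pdf(x, m, s)
                  for w, m, s in zip(weights, means, variances))
    return math.log(max(density, 1e-300))


def region_key(lat, lon, cell_degrees=1.0):
    """Hypothetical spatial partitioning: a fixed latitude/longitude grid."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


def train_regional_models(observations, k=2):
    """Fit one mixture per grid cell from (lat, lon, value) observations."""
    by_region = defaultdict(list)
    for lat, lon, value in observations:
        by_region[region_key(lat, lon)].append(value)
    return {key: fit_gmm_1d(vals, k) for key, vals in by_region.items()}


def is_anomaly(models, lat, lon, value, threshold=-10.0):
    """Flag a value whose log-density under its region's model is too low."""
    model = models.get(region_key(lat, lon))
    if model is None:
        return True  # assumption: an unmodelled region is treated as anomalous
    return log_density(value, model) < threshold
```

    Keeping a separate model per cell means each one only has to describe local conditions, and a cell can be refitted on its own as its data evolves. The threshold would need tuning per dataset.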

    COMPARISON OF SPATIOTEMPORAL MAPPING TECHNIQUES FOR ENORMOUS ETL AND EXPLOITATION PATTERNS


    Urban Informatics

    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity.

    Hierarchical distributed fog-to-cloud data management in smart cities

    There is a vast amount of data being generated every day in the world, in different formats, at different quality levels, etc. This new data, together with archived historical data, constitutes the seed for future knowledge discovery and value generation in several fields of science and in big data environments. Discovering value from data is a complex computing process in which data is the key resource, not only during processing but throughout its entire life cycle. However, there is still considerable concern about how to organize and manage this data in all fields for efficient usage and exploitation during all data life cycles. Although several specific Data LifeCycle (DLC) models have recently been defined for particular scenarios, we argue that there is no global and comprehensive DLC framework that can be widely used in different fields. In one particular scenario, smart cities are the current technological solution for handling the challenges and complexity of growing urban density. Traditionally, Smart City resource management relies on cloud-based solutions in which sensor data are collected to provide a centralized and rich set of open data. The advantages of cloud-based frameworks are their ubiquity and their (almost) unlimited resource capacity. However, accessing data from the cloud implies heavy network traffic, high latencies that are usually not appropriate for real-time or critical solutions, and higher security risks. Alternatively, fog computing emerges as a promising technology to mitigate these drawbacks. It proposes the use of devices at the edge to provide closer computing facilities, thereby reducing network traffic and drastically reducing latencies while improving security. We have defined a new framework for data management in the context of a Smart City through a global fog-to-cloud resource management architecture. This model has the advantages of both fog and cloud technologies, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. In this thesis, we propose several novel ideas in the design of an F2C data management architecture for smart cities, as follows. First, we draw and describe a comprehensive, scenario-agnostic Data LifeCycle model that successfully addresses all the challenges included in the 6Vs; it is not tailored to any specific environment, but is easy to adapt to the requirements of any particular field. Then, we introduce the Smart City Comprehensive Data LifeCycle model, a data management architecture generated from the comprehensive scenario-agnostic model and tailored to the particular scenario of Smart Cities. We define the management of each data life phase and explain its implementation in a Smart City with Fog-to-Cloud (F2C) resource management. Next, we illustrate a novel architecture for data management in the context of a Smart City through a global fog-to-cloud resource management architecture, and show that this model has the advantages of both fog and cloud, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. As a first experiment for the F2C data management architecture, a real Smart City, the city of Barcelona, is analyzed, with special emphasis on the layers responsible for collecting the data generated by the deployed sensors. The amount of daily sensor data transmitted through the network has been estimated, and a rough projection has been made assuming an exhaustive deployment that fully covers the whole city. We also provide some solutions to both reduce data transmission and improve data management. We then use data filtering techniques (including data aggregation and data compression) to estimate the network traffic in this model during data collection and compare it with a traditional real system. Finally, we estimate the total data storage sizes in the F2C scenario for the Barcelona smart city. Postprint (published version).

    Study of the Business Model of three Earth Observation (EO) companies already present in the Very Low Earth Orbit market (VLEO)

    The emergence of a new private spaceflight industry has taken the Earth Observation (EO) sector by surprise. NewSpace companies are challenging the traditional satellite sector by addressing their services to mass market requirements of high-quality and low-cost EO. As part of the DISCOVERER project, this study aims to determine the Key Success Factors to be considered by a new EO company at Low Earth Orbit (LEO). Hence, three businesses fitting the description were analyzed with the Case Study Methodology to establish their Business Model Canvas (BMC), associated Patterns, and Key Success Factors. The investigation consolidated the newly proposed Democratizing Business Model Pattern and added new characteristics. Successful EO NewSpace firms are dividing into integrated operators, integrated manufacturers, and end-user specialists. A new EO company should consider the Democratizing Pattern success factors and the Vertically Integrated Strategies (VIS), depending on its disruptive idea and resource capabilities. Further research is needed to identify new factors and to strengthen the validity of the Pattern and of the VIS tendencies.