3,131 research outputs found

    Context Aware Computing for The Internet of Things: A Survey

    As we move towards the Internet of Things (IoT), the number of sensors deployed around the world is growing at a rapid pace. Market research has shown significant growth in sensor deployments over the past decade and predicts that the growth rate will increase further in the future. These sensors continuously generate enormous amounts of data. However, in order to add value to raw sensor data we need to understand it. The collection, modelling, reasoning, and distribution of context in relation to sensor data play a critical role in this challenge. Context-aware computing has proven successful in understanding sensor data. In this paper, we survey context awareness from an IoT perspective. We begin by presenting the necessary background: the IoT paradigm and the fundamentals of context awareness. We then provide an in-depth analysis of the context life cycle. We evaluate a subset of 50 projects, representing the majority of research and commercial solutions proposed in the field of context-aware computing over the last decade (2001-2011), against our own taxonomy. Finally, based on our evaluation, we highlight lessons to be learnt from the past and possible directions for future research. The survey addresses a broad range of techniques, methods, models, functionalities, systems, applications, and middleware solutions related to context awareness and the IoT. Our goal is not only to analyse, compare, and consolidate past research but also to appreciate its findings and discuss its applicability to the IoT.
    Comment: IEEE Communications Surveys & Tutorials Journal, 201
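The context life cycle the survey analyses (acquisition, modelling, reasoning, distribution) can be illustrated with a minimal sketch. The classes, field names and threshold below are invented for illustration and are not taken from the survey:

```python
from dataclasses import dataclass, field
import time

@dataclass
class ContextReading:
    # Modelling: a raw sensor value annotated with context metadata.
    sensor_id: str
    value: float
    timestamp: float = field(default_factory=time.time)
    location: str = "unknown"

def acquire(raw):
    # Acquisition: wrap raw (id, value, location) tuples in
    # context-annotated readings.
    return [ContextReading(sensor_id=s, value=v, location=loc)
            for s, v, loc in raw]

def reason(readings, threshold=30.0):
    # Reasoning: derive a higher-level situation from low-level context.
    hot = [r for r in readings if r.value > threshold]
    return "overheating" if hot else "normal"

# Distribution would hand the derived situation to consumers; here we print.
readings = acquire([("t1", 24.5, "lab"), ("t2", 31.2, "lab")])
print(reason(readings))  # -> "overheating"
```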

    A smart water metering deployment based on the fog computing paradigm

    In this paper, we look into smart water metering infrastructures that enable continuous, on-demand and bidirectional data exchange between metering devices, water flow equipment, utilities and end-users. We focus on the design, development and deployment of such infrastructures as part of larger smart city infrastructures. Until now, such critical smart city infrastructures have been developed following a cloud-centric paradigm, where all the data are collected and processed centrally using cloud services to create real business value. Cloud-centric approaches need to address several performance issues at all levels of the network, as massive metering datasets are transferred to distant machine clouds, while also respecting issues like security and data privacy. Our solution uses the fog computing paradigm to provide a system where the computational resources already available throughout the network infrastructure are utilized to greatly facilitate the analysis of the fine-grained water consumption data collected by the smart meters, thus significantly reducing the overall load on network and cloud resources. Details of the system's design are presented along with a pilot deployment in a real-world environment. The performance of the system is evaluated in terms of network utilization and computational performance. Our findings indicate that the fog computing paradigm can be applied to a smart grid deployment to effectively reduce the data volume exchanged between the different layers of the architecture and provide better overall computational, security and privacy capabilities to the system.
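As a toy illustration of this fog-layer data reduction (the window size, units and field names are assumptions, not the deployed system's design), the sketch below aggregates per-second flow readings into per-minute summaries at a fog node, so that only the summaries cross the network to the cloud:

```python
import statistics

def summarise_at_edge(readings, window=60):
    """Fog-layer aggregation: reduce fine-grained meter readings
    (one flow sample per second) to one summary per window, so only
    the summaries are transferred to the cloud layer."""
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summaries.append({
            "mean_flow": statistics.mean(chunk),
            "peak_flow": max(chunk),
            "samples": len(chunk),
        })
    return summaries

# One hour of per-second readings becomes 60 per-minute summaries,
# a 60x reduction in records sent upstream.
per_second = [1.0 + 0.1 * (i % 7) for i in range(3600)]
print(len(summarise_at_edge(per_second)))  # -> 60
```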

    Antares: a scalable, efficient platform for stream, historic, combined and geospatial querying

    PhD Thesis. Traditional methods for storing and analysing data are proving inadequate for processing "Big Data", due to its volume and the rate at which it is being generated. The limitations of current technologies are further exacerbated by the increased demand for applications which allow users to access and interact with data as soon as it is generated. Near real-time analysis such as this can be partially supported by stream processing systems; however, they currently lack the ability to store data for efficient historic processing, and many applications require a combination of near real-time and historic data analysis. This thesis investigates this problem, and describes and evaluates a novel approach for addressing it. Antares is a layered framework that has been designed to exploit and extend the scalability of NoSQL databases to support low-latency querying and high throughput rates for both stream and historic data analysis simultaneously. Antares began as a company-funded project sponsored by Red Hat; the motivation was to identify a new technology which could provide scalable analysis of both stream and historic data, and to explore new methods for supporting scale and efficiency, for example a layered approach that exploits the scale of historic stores and the speed of in-memory processing. New technologies were investigated to identify current mechanisms and suggest means of improvement. Antares supports a layered approach to analysis; the motivation for the platform was to provide scalable, low-latency querying of Twitter data for other researchers to help automate analysis. Antares needed to provide temporal and spatial analysis of Twitter data using the timestamp and geotag. The approach used Twitter as a use case and derived requirements from social scientists as part of a broader research project called Tweet My Street. Many data streaming applications have a location-based aspect, using geospatial data to enhance the functionality they provide. However, geospatial data is inherently difficult to process at scale due to its multidimensional nature. To address these difficulties, this thesis proposes Antares as a new solution providing scalable and efficient mechanisms for querying geospatial data. The thesis describes the design of Antares and evaluates its performance on a range of scenarios taken from a real social media analytics application. The results show significant performance gains when compared to existing approaches, for particular types of analysis. The approach is evaluated by executing experiments across Antares and similar systems to show the improved results. Antares demonstrates that a layered approach can be used to improve performance for inserts and searches as well as to increase the ingestion rate of the system.
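The layered stream-plus-historic idea can be sketched as follows. This is an illustrative toy, not Antares itself: a small in-memory layer absorbs recent writes and is periodically flushed to a sorted "historic" layer, and a single range query spans both layers:

```python
import bisect

class LayeredStore:
    """Toy two-layer store in the spirit of a layered framework:
    an in-memory layer for recent (stream) data and a sorted
    historic layer, queried together through one interface."""
    def __init__(self, memory_limit=1000):
        self.memory = []        # recent (timestamp, value) pairs
        self.historic = []      # older pairs, kept sorted by timestamp
        self.memory_limit = memory_limit

    def insert(self, ts, value):
        self.memory.append((ts, value))
        if len(self.memory) > self.memory_limit:
            # Flush the oldest half of the memory layer to the historic layer.
            self.memory.sort()
            half = self.memory_limit // 2
            flushed, self.memory = self.memory[:half], self.memory[half:]
            for item in flushed:
                bisect.insort(self.historic, item)

    def query_range(self, t0, t1):
        # One combined query spanning both historic and stream layers.
        lo = bisect.bisect_left(self.historic, (t0,))
        hi = bisect.bisect_right(self.historic, (t1, float("inf")))
        recent = [p for p in self.memory if t0 <= p[0] <= t1]
        return self.historic[lo:hi] + sorted(recent)

store = LayeredStore(memory_limit=4)
for ts in range(10):
    store.insert(ts, ts * ts)
print(store.query_range(2, 6))  # results from both layers, in order
```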

    MEdit4CEP-SP: A model-driven solution to improve decision-making through user-friendly management and real-time processing of heterogeneous data streams

    Organisations today are constantly consuming and processing huge amounts of data. Such datasets are often heterogeneous, making it difficult to work with them quickly and easily due to their format constraints or their disparate data structures. Therefore, being able to work with such data efficiently and intuitively, analysing them in real time to detect situations of interest as quickly as possible, is a great competitive advantage for companies. Existing approaches have tried to address this issue by providing users with analytics or modelling tools in isolation, rather than combining them into an all-in-one solution. In order to fill this gap, we present MEdit4CEP-SP, a model-driven system that integrates Stream Processing (SP) and Complex Event Processing (CEP) technologies for consuming, processing and analysing heterogeneous data in real time. It provides domain experts with a graphical editor that allows them to infer and define heterogeneous data domains, while also modelling, in a user-friendly way, the situations of interest to be detected in such domains. These graphical definitions are then automatically transformed into code, which is deployed in the processing system at runtime. The alerts detected by the system in real time allow users to react as quickly as possible, thus improving the decision-making process. Additionally, MEdit4CEP-SP provides persistence, storing these definitions in a NoSQL database to permit their reuse by other instances of the system. Further benefits of this system are evaluated and compared with other existing approaches in this paper.
    This work was partly supported by the Spanish Ministry of Science and Innovation and the European Regional Development Fund (ERDF) under project FAME (RTI2018-093608-B-C33), and also by the pre-doctoral program of the University of Cádiz, Spain (2017-020/PU/EPIF-FPI-CT/CP).
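As a hedged illustration of the kind of CEP "situation of interest" that such graphical definitions compile down to (the rule, threshold and event shape here are invented for illustration, not MEdit4CEP-SP's generated code), a pattern over a stream might look like this:

```python
from collections import deque

def high_temp_alert(events, threshold=40.0, window=3):
    """Toy complex-event rule: raise an alert when `window` consecutive
    temperature events all exceed `threshold` -- a simple example of a
    'situation of interest' detected over a live stream."""
    recent = deque(maxlen=window)
    for event in events:                      # events arrive as a stream
        recent.append(event["temp"])
        if len(recent) == window and all(t > threshold for t in recent):
            yield {"alert": "sustained_high_temperature", "at": event["ts"]}

stream = [{"ts": i, "temp": t} for i, t in enumerate([38, 41, 42, 43, 39])]
for alert in high_temp_alert(stream):
    print(alert)  # fires at ts=3 (41, 42, 43 all above threshold)
```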

    Some Contribution of Statistical Techniques in Big Data: A Review

    Big Data is a popular topic in research. Everyone is talking about big data, and it is believed that science, business, industry, government, society, etc. will undergo a thorough change through its impact. The term refers to very large datasets with complex, hidden patterns and both structured and unstructured data, which are difficult to collect, store, and analyse, so proper advanced techniques are needed to gain knowledge from them. Big data research faces major challenges in storage, processing, search, sharing, transfer, analysis, and visualisation. This paper discusses the fundamentals of big data, its issues, its management, and the techniques used to handle it. It also presents a review of various advanced statistical techniques for handling key big data applications involving large datasets; these techniques handle structured as well as unstructured big data in different areas.

    Real-time probabilistic reasoning system using Lambda architecture

    Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2019. The proliferation of data from sources like social media and sensor devices has become overwhelming for traditional data storage and analysis technologies to handle. This has prompted a radical improvement in data management techniques, tools and technologies to meet the increasing demand for effective collection, storage and curation of large datasets. Most of these technologies are open-source. Big data is usually described as very large datasets; however, a major feature of big data is its velocity. Data flows in as a continuous stream and must be acted on in real time to yield meaningful, relevant value. Although there is an explosion of technologies to handle big data, they usually target the processing of large (historic) datasets and real-time big data independently; hence the need for a unified framework that handles both high-volume datasets and real-time big data. This has resulted in the development of models such as the Lambda architecture. Effective decision-making requires processing of historic data as well as real-time data. Some decision-making involves complex processes that depend on the likelihood of events. To handle uncertainty, probabilistic systems were designed. Probabilistic systems use probabilistic models developed with probability theories, such as hidden Markov models, with inference algorithms to process data and produce probabilistic scores. However, developing these models requires extensive knowledge of statistics and machine learning, making it an uphill task to model real-life circumstances. A new research area called probabilistic programming has been introduced to alleviate this bottleneck. This research proposes the combination of modern open-source big data technologies with probabilistic programming and the Lambda architecture on easy-to-get hardware to develop a highly fault-tolerant and scalable processing tool that processes both historic and real-time big data in real time: a common solution. This system will empower decision makers with the capacity to make better-informed resolutions, especially in the face of uncertainty. The outcome of this research is a technology product, built and assessed using experimental evaluation methods. The research utilises the Design Science Research (DSR) methodology, as it describes guidelines for the effective and rigorous construction and evaluation of an artefact. Probabilistic programming in the big data domain is still in its infancy; however, the developed artefact demonstrates the important potential of probabilistic programming combined with the Lambda architecture in the processing of big data.
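To make the Lambda architecture's role here concrete: it maintains a batch layer (complete but slightly stale views over all historic data), a speed layer (incremental views over recent arrivals), and a serving layer that merges both at query time. A minimal sketch, assuming a simple event-count view (this illustrates the generic architecture only, not the thesis artefact, which layers probabilistic programming on top):

```python
def batch_view(historic_events):
    # Batch layer: recompute a complete (but slightly stale) count
    # over all historic events.
    counts = {}
    for user, _ in historic_events:
        counts[user] = counts.get(user, 0) + 1
    return counts

def speed_view(recent_events):
    # Speed layer: incremental counts over events that arrived
    # after the last batch run.
    counts = {}
    for user, _ in recent_events:
        counts[user] = counts.get(user, 0) + 1
    return counts

def serving_layer(batch, speed):
    # Serving layer: merge the two views to answer queries in real time.
    merged = dict(batch)
    for user, n in speed.items():
        merged[user] = merged.get(user, 0) + n
    return merged

historic = [("alice", "click"), ("bob", "click"), ("alice", "view")]
recent = [("alice", "click"), ("carol", "view")]
print(serving_layer(batch_view(historic), speed_view(recent)))
# -> {'alice': 3, 'bob': 1, 'carol': 1}
```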

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    Human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will support decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework to manage the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extract value from it and support the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, such as GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based data management, processing, analytics and visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.
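The two data shapes at the heart of this methodology, geo-referenced time series (GRTS) and spatiotemporal events (ST Events), can be sketched as plain records plus a naive spatiotemporal join. The field names and the degree-based proximity test below are illustrative assumptions, not the thesis's schema:

```python
from dataclasses import dataclass

@dataclass
class GRTSPoint:
    """One sample of a geo-referenced time series (GRTS):
    a measurement tied to both a timestamp and a location."""
    series_id: str      # e.g. a traffic sensor on a road segment
    timestamp: float    # Unix epoch seconds
    lat: float
    lon: float
    value: float        # e.g. vehicles per minute

@dataclass
class STEvent:
    """A spatiotemporal event (ST Event): something that happened
    at a place, over a time interval."""
    event_type: str     # e.g. "accident", "road_closure"
    t_start: float
    t_end: float
    lat: float
    lon: float

def events_near_series(points, events, radius_deg=0.01):
    # Naive spatiotemporal join: find events close (in degrees) to any
    # series sample taken while the event was active.
    hits = []
    for e in events:
        for p in points:
            if (e.t_start <= p.timestamp <= e.t_end
                    and abs(p.lat - e.lat) <= radius_deg
                    and abs(p.lon - e.lon) <= radius_deg):
                hits.append((p.series_id, e.event_type))
                break
    return hits

points = [GRTSPoint("seg_12", 100.0, 38.72, -9.14, 42.0)]
events = [STEvent("accident", 90.0, 200.0, 38.721, -9.139)]
print(events_near_series(points, events))  # -> [('seg_12', 'accident')]
```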

    Design and implementation of a telemetry platform for high-performance computing environments

    A new generation of high-performance and distributed computing applications and services rely on adaptive and dynamic architectures and execution strategies to run efficiently, resiliently, and at scale in today’s HPC environments. These architectures require insights into their execution behaviour and the state of their execution environment at various levels of detail in order to make context-aware decisions. HPC telemetry provides this information: it describes the continuous stream of time series and event data that is generated on HPC systems by the hardware, operating systems, services, runtime systems, and applications. Current HPC ecosystems do not provide the conceptual models, infrastructure, and interfaces to collect, store, analyse, and integrate telemetry in a structured and efficient way. Consequently, applications and services largely depend on one-off solutions and custom-built technologies to achieve these goals, introducing significant development overheads that inhibit portability and mobility. To facilitate a broader mix of applications, more efficient application development, and swift adoption of adaptive architectures in production, a comprehensive framework for telemetry management and analysis must be provided as part of future HPC ecosystem designs. This thesis provides the blueprint for such a framework: it proposes a new approach to telemetry management in HPC, the Telemetry Platform concept. Departing from the observation that telemetry data and the corresponding analysis and integration patterns on modern multi-tenant HPC systems have a lot in common with the patterns observed in large-scale data analytics or “Big Data” platforms, the telemetry platform concept takes the data platform paradigm and architectural approach and applies them to HPC telemetry. The result is the blueprint for a system that provides services for storing, searching, analysing, and integrating telemetry data in HPC applications and other HPC system services. It allows users to create and share telemetry-data-driven insights using everything from simple time-series analysis to complex statistical and machine learning models, while at the same time hiding many of the inherent complexities of data management, such as data transport, clean-up, storage, cataloguing, and access management, and providing appropriate and scalable analytics and integration capabilities. The main contributions of this research are (1) the application of the data platform concept to HPC telemetry data management and usage; (2) a graph-based, time-variant telemetry data model that captures the structures and properties of platforms and applications and in which telemetry data can be organised; (3) an architecture blueprint and prototype of a concrete implementation and integration architecture of the telemetry platform; and (4) a proposal for decoupled HPC application architectures, separating telemetry data management and feedback-control-loop logic from the core application code. First experimental results with the prototype implementation suggest that the telemetry platform paradigm can reduce overhead and redundancy in the development of telemetry-based application architectures, and lower the barrier for HPC systems research and the provisioning of new, innovative HPC system services.
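A minimal sketch of what a graph-based, time-variant telemetry model might look like (the entity names, edge labels and metrics are invented; the thesis's actual model is richer): nodes represent platform entities, labelled edges capture relationships, and each node carries its own telemetry series, so a query for a job can follow "runs_on" edges down to the hardware beneath it:

```python
from collections import defaultdict

class TelemetryGraph:
    """Toy graph-based telemetry model: nodes are platform entities
    (compute node, job, service), labelled edges capture relationships,
    and each node carries a time series of telemetry samples."""
    def __init__(self):
        self.edges = defaultdict(set)     # node -> {(label, related node)}
        self.series = defaultdict(list)   # node -> [(timestamp, metric, value)]

    def relate(self, src, label, dst):
        self.edges[src].add((label, dst))

    def record(self, node, ts, metric, value):
        self.series[node].append((ts, metric, value))

    def samples_for_job(self, job):
        # Follow 'runs_on' edges so a job query also returns telemetry
        # of the hardware it ran on.
        nodes = {job} | {dst for lbl, dst in self.edges[job] if lbl == "runs_on"}
        return {n: self.series[n] for n in nodes}

g = TelemetryGraph()
g.relate("job42", "runs_on", "node7")
g.record("node7", 0.0, "cpu_temp", 61.5)
g.record("job42", 0.1, "iterations_per_s", 830.0)
print(g.samples_for_job("job42"))  # job and host telemetry, together
```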

    Unwarranted variations modelling and analysis of healthcare services based on heterogeneous service data

    There is a growing demand worldwide to increase the quality and productivity of healthcare services, thereby increasing the value of the healthcare services delivered. To meet these demands, increasing importance is being placed on analysing and reducing unwarranted variations in healthcare services to achieve significant savings in healthcare expenditure. Unwarranted variations are defined as variations in the utilisation of healthcare services that cannot be explained by variation in patient illness or patient preferences. Current modelling and simulation approaches for improving healthcare service efficiency and effectiveness in hospitals do not utilise multiple types of heterogeneous service data, such as qualitative information about hospital services and quantitative data such as historic system data, electronic patient records (EPRs), and real-time tracking data, for analysing unwarranted variations in a hospital. Consequently, due to the presence of a large amount of unwarranted variation in service delivery systems, service improvement efforts are often inadequate or ineffective. Therefore, there is an urgent need to: (i) accurately and efficiently model the complex care delivery services provided in a hospital; (ii) develop an integrated simulation model to analyse unwarranted variations on a care pathway of a hospital; and (iii) develop analytical and simulation models to analyse unwarranted variations from a care pathway. Current process modelling methods for representing healthcare services rely on simplified flowcharts of patient flow obtained from on-site observations and clinician workshops. However, gathering and documenting qualitative data from workshops is challenging, and the resulting models are insufficient for modelling important service interactions and hence often inaccurate. Therefore, a detailed and accurate process modelling methodology is proposed, together with a systematic knowledge acquisition approach based on staff interviews. Traditional simulation models use simplified flow diagrams as an input, together with historic system data, for analysing unwarranted variations on a care pathway; the resulting simulation models are often incomplete, leading to oversimplified outputs from the conducted simulations. Therefore, an integrated simulation modelling approach is presented, with the capability to systematically use heterogeneous data to analyse unwarranted variations in the service delivery process of a hospital. Maintaining and using care pathways within hospitals to provide complex care to patients raises challenges related to unwarranted variations from a care pathway. These variations predominantly occur due to ineffective decision-making processes, unclear process steps and interactions, conflicting performance measures for speciality units, and limited availability of resources. They are largely unnecessary and lead to longer waiting times, delays, and lower productivity of care pathways. Therefore, methodologies for analysing unwarranted variations from a care pathway, covering (i) system variations (decision makers (roles) and the decision-making process) and (ii) patient variations (patient diversion from the care pathway), are discussed in this thesis. A system variations modelling methodology to model system variations in radiology based on real-time tracking data is proposed. The methodology employs generalised concepts from graph theory to identify and represent system variations. In particular, edge-coloured directed multi-graphs (ECDMs) are used to model system variations reflected in the paths adopted by staff, i.e., the sequence of rooms/areas traversed while delivering services. A pathway variations analysis (PVA) methodology is also proposed, which simulates patient diversions from the care pathway by modelling hospital operational parameters, assessing the accuracy of clinical decisions, and evaluating the performance measures of the speciality units involved in the care pathway, in order to suggest set-based solutions for reducing variations from the care pathway. PVA employs the detailed service model of the care pathway together with electronic patient records (EPRs) and historic data. The main steps of the methodology are: (i) generate a sample of patients for analysis; (ii) simulate patient diversions from the care pathway; and (iii) analyse the simulations to suggest set-based solutions. The aforementioned unwarranted variations analysis approaches have been applied, as a case study, to the Magnetic Resonance (MR) scanning process in radiology and the stroke care pathway of a large UK hospital. The proposed improvement options contributed to achieving the performance target of the stroke services.
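The ECDM idea can be sketched directly: one edge colour per staff role, with parallel edges between the same pair of rooms exposing variation in how different roles traverse the department. The rooms and roles below are invented for illustration, not taken from the thesis's radiology case study:

```python
def build_ecdm(staff_paths):
    """Build a toy edge-coloured directed multigraph (ECDM) from
    room-to-room paths: one colour per staff role, so parallel edges
    between the same rooms with different colours expose variations
    in how roles move while delivering services."""
    edges = []  # (from_room, to_room, colour) triples; duplicates allowed
    for role, rooms in staff_paths.items():
        for a, b in zip(rooms, rooms[1:]):
            edges.append((a, b, role))
    return edges

paths = {
    "radiographer": ["reception", "prep", "mr_scanner", "reporting"],
    "nurse": ["reception", "mr_scanner", "prep", "mr_scanner"],
}
# Parallel edges with different colours flag a system variation:
for a, b, colour in build_ecdm(paths):
    print(f"{a} -> {b} [{colour}]")
```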

    Generating demand responsive bus routes from social network data analysis

    Acknowledgment: The research reflected in this paper has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 770115.