299 research outputs found

    Managing big data experiments on smartphones

    The explosive growth in the number of smartphones with ever-growing sensing and computing capabilities has brought a paradigm shift to many traditional domains of the computing field. Re-programming smartphones and instrumenting them for application testing and data gathering at scale is currently a tedious and time-consuming process that poses significant logistical challenges. Next-generation smartphone applications are expected to be much larger in scale and more complex, demanding that they undergo evaluation and testing under different real-world datasets, devices and conditions. In this paper, we present an architecture for managing such large-scale data management experiments on real smartphones. In particular, we present the building blocks of our architecture, which encompass smartphone sensor data collected by the crowd and organized in our big data repository. The given datasets can then be replayed on our testbed, comprising real and simulated smartphones accessible to developers through a web-based interface. We demonstrate the applicability of our architecture through a case study that involves the evaluation of individual components of a complex indoor positioning system for smartphones, coined Anyplace, which we have developed over the years. The study shows how our architecture allows us to derive novel insights into the performance of our algorithms and applications by simplifying the management of large-scale data on smartphones.

    METADATA MANAGEMENT FOR CLINICAL DATA INTEGRATION

    Clinical data have been continuously collected and are growing with the wide adoption of electronic health records (EHR). Clinical data have provided the foundation for state-of-the-art research such as artificial intelligence in medicine. At the same time, it has become a challenge to integrate, access, and explore study-level patient data from large volumes of data in heterogeneous databases. Effective, fine-grained, cross-cohort data exploration and semantically enabled approaches and systems are needed. To build semantically enabled systems, we need to leverage existing terminology systems and ontologies. Numerous ontologies have been developed recently, and they play an important role in semantically enabled applications. Because they contain valuable codified knowledge, the management of these ontologies, as metadata, also requires systematic approaches. Moreover, in most clinical settings, patient data are collected with the help of a data dictionary. Knowledge of the relationships between an ontology and a related data dictionary is important for semantic interoperability. Such relationships are represented and maintained by mappings. Mappings store how data source elements and domain ontology concepts are linked, as well as how domain ontology concepts are linked between different ontologies. While mappings are crucial to the maintenance of relationships between an ontology and a related data dictionary, they are commonly captured in CSV files with limited capabilities for sharing, tracking, and visualization. The management of mappings requires an innovative, interactive, and collaborative approach. Metadata management serves to organize data that describes other data. In computer science and information science, an ontology is metadata consisting of the representation, naming, and definition of the hierarchies, properties, and relations between concepts.
A structured, scalable, and computer-understandable approach to metadata management is critical to developing systems with fine-grained data exploration capabilities. This dissertation presents a systematic approach called MetaSphere, which uses metadata and ontologies to support the management and integration of clinical research data through our ontology-based metadata management system for multiple domains. MetaSphere is a general framework that aims to manage domain-specific metadata, provide a fine-grained data exploration interface, and store patient data in data warehouses. Moreover, MetaSphere provides a dedicated mapping interface called the Interactive Mapping Interface (IMI) to map a data dictionary to well-recognized and standardized ontologies. MetaSphere has been applied successfully to three domains: sleep (X-search), pressure ulcer injuries and deep tissue pressure (SCIPUDSphere), and cancer. Specifically, MetaSphere stores domain ontologies structurally in databases. Patient data in the corresponding domains are also stored in databases as data warehouses. MetaSphere provides a powerful query interface to enable interaction between researchers and actual patient data. The query interface is a mechanism that allows researchers to compose complex queries to pinpoint specific cohorts in a large amount of patient data. The detailed results of the three instantiations are as follows. X-search is publicly available at https://www.x-search.net with nine sleep domain datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries from users in 16 countries. SCIPUDSphere has integrated a total of 268,562 records containing 282 ICD9 codes related to pressure ulcer injuries among 36,626 individuals with spinal cord injuries. IMI is publicly available at http://epi-tome.com/.
Using IMI, we have successfully mapped the North American Association of Central Cancer Registries (NAACCR) data dictionary to the National Cancer Institute Thesaurus (NCIt) concepts.

    Cloud-based data-intensive framework towards fault diagnosis in large-scale petrochemical plants

    Industrial Wireless Sensor Networks (IWSNs) are expected to offer promising solutions to meet the demands of monitoring applications for fault diagnosis in large-scale petrochemical plants. However, they involve heterogeneity and Big Data problems due to large amounts of sensor data with high volume and velocity. Cloud Computing is an outstanding approach that provides a flexible platform for addressing such heterogeneous and data-intensive problems with massive computing, storage, and data-based services. In this paper, we propose a Cloud-based Data-intensive Framework (CDF) for an on-line equipment fault diagnosis system that facilitates the integration and processing of mass sensor data generated from the Industrial Sensing Ecosystem (ISE). ISE enables the collection of data of interest with topic-specific industrial monitoring systems. Moreover, this approach contributes to the establishment of an on-line fault diagnosis monitoring system with sensor streaming computing and storage paradigms based on Hadoop as a key to these complex problems. Finally, we present a practical illustration of this framework serving equipment fault diagnosis systems with the ISE.

    Adaptive Mechanisms for Mobile Spatio-Temporal Applications

    Mobile spatio-temporal applications play a key role in many mission-critical fields, including Business Intelligence, Traffic Management and Disaster Management. They are characterized by high data volume and velocity and by a large and variable number of mobile users. The design and implementation of these applications should not only consider this variability, but also support other quality requirements such as performance and cost. In this thesis we propose an architecture for mobile spatio-temporal applications, which enables multiple angles of adaptivity. We also introduce a two-level adaptation mechanism that ensures system performance while facilitating scalability and context-aware adaptivity. We validate the architecture and adaptation mechanisms by implementing a road quality assessment mobile application as a use case and by performing a series of experiments in a cloud environment. We show that our proposed architecture can adapt at runtime and maintain service level objectives while offering cost-efficiency and robustness.

    The Analysis of Big Data on Cities and Regions - Some Computational and Statistical Challenges

    Big Data on cities and regions bring new opportunities and challenges to data analysts and city planners. On the one hand, they hold great promise to combine increasingly detailed data for each citizen with critical infrastructures to plan, govern and manage cities and regions, improve their sustainability, optimize processes and maximize the provision of public and private services. On the other hand, the massive sample size and high dimensionality of Big Data and their geo-temporal character introduce unique computational and statistical challenges. This chapter provides an overview of the salient characteristics of Big Data and how these features impact the paradigm change in data management and analysis, as well as the computing environment. Series: Working Papers in Regional Science

    Revolutionising the quality of life: the role of real-time sensing in smart cities

    To further evolve urban quality of life, this paper explores the potential of crowdsensing and crowdsourcing in the context of smart cities. To aid urban planners and residents in understanding the nuances of day-to-day urban dynamics, we actively pursue the improvement of data visualisation tools that can adapt to changing conditions. An architecture was created and implemented that ensures secure and easy connectivity between various sources, such as a network of Internet of Things (IoT) devices, to merge with crowdsensing data and use them efficiently. In addition, we expanded the scope of our study to include the development of mobile and online applications, emphasizing the integration of autonomous and geo-surveillance. The main findings highlight the importance of sensor data in urban knowledge. Their incorporation via Representational State Transfer (REST) Application Programming Interfaces (APIs) improves data access and informed decision-making, and dynamic data visualisation provides better insights. The geofencing of the application encourages community participation in urban planning and resource allocation, supporting sustainable urban innovation. This work was supported by FCT-Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020 and the project “Integrated and Innovative Solutions for the well-being of people in complex urban centers” within the Project Scope NORTE-01-0145-FEDER-000086. Rui Miranda was supported by grant no. UMINHO/BID/2021/137; Carlos Alves was supported by grant nos. 2022.12629.BD and UMINHO/BID/2021/134; Regina Sousa was supported by grant no. UMINHO/BID/2021/136; António Chaves was supported by grant no. UMINHO/BID/2021/135; Larissa Montenegro was supported by grant no. UMINHO/BID/2022/53.

    Wiki-health: from quantified self to self-understanding

    Today, healthcare providers are experiencing explosive growth in data, and medical imaging represents a significant portion of that data. Meanwhile, the pervasive use of mobile phones and the rising adoption of sensing devices, which enable people to collect data independently at any time or place, are leading to a torrent of sensor data. The scale and richness of the sensor data currently being collected and analysed are rapidly growing. The key challenge we face is how to effectively manage and make use of this abundance of easily generated and diverse health data. This thesis investigates the challenges posed by the explosive growth of available healthcare data and proposes a number of potential solutions to the problem. As a result, a big data service platform, named Wiki-Health, is presented to provide a unified solution for collecting, storing, tagging, retrieving, searching and analysing personal health sensor data. Additionally, it allows users to reuse and remix data, along with analysis results and analysis models, to make health-related knowledge discovery more available to individual users on a massive scale. To tackle the challenge of efficiently managing the high volume and diversity of big data, Wiki-Health introduces a hybrid data storage approach capable of storing structured, semi-structured and unstructured sensor data and sensor metadata separately. A multi-tier cloud storage system, CACSS, has been developed and serves as a component of the Wiki-Health platform, allowing it to manage the storage of unstructured and semi-structured data, such as medical imaging files. CACSS provides comprehensive features such as global data de-duplication, performance-awareness and data caching services. The design of such a hybrid approach allows Wiki-Health to potentially handle heterogeneous formats of sensor data.
To evaluate the proposed approach, we have developed an ECG-based health monitoring service and a virtual sensing service on top of the Wiki-Health platform. The two services demonstrate the feasibility and potential of using the Wiki-Health framework to enable better utilisation and comprehension of the vast amounts of sensor data available from different sources, and both show significant potential for real-world applications. Open Access

    Towards Mobility Data Science (Vision Paper)

    Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, survey the current state of the art, and describe open challenges for the research community in the coming years. Comment: Updated arXiv metadata to include two authors that were missing from the metadata. PDF has not been changed.