26 research outputs found

    Social network data analysis for event detection

    Cities concentrate enough Social Network (SN) activity to empower rich models. We present an approach to event discovery based on the information provided by three SNs, minimizing the data properties used in order to maximize the total amount of usable data. We build a model of normal city behavior, which we use to detect abnormal situations (events). After collecting half a year of data, we show examples of the events detected and introduce some applications. Peer Reviewed. Postprint (published version).
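
    The idea of modelling normal behavior and flagging deviations can be sketched as follows. This is a minimal illustration, not the authors' method: the per-hour-of-week baseline, the z-score test, the threshold of 3.0, and the names build_baseline and is_event are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """Summarise normal activity per time slot.

    history: iterable of (slot, count) pairs, where slot is a hypothetical
    hour-of-week index and count is the number of posts seen in that hour.
    Returns {slot: (mean, population standard deviation)}.
    """
    buckets = defaultdict(list)
    for slot, count in history:
        buckets[slot].append(count)
    return {slot: (mean(c), pstdev(c)) for slot, c in buckets.items()}


def is_event(baseline, slot, count, threshold=3.0):
    """Flag a count that deviates from the slot's baseline by more than
    `threshold` standard deviations (an assumed, tunable cut-off)."""
    mu, sigma = baseline[slot]
    if sigma == 0:
        # No variation observed: any different count is treated as abnormal.
        return count != mu
    return abs(count - mu) / sigma > threshold


# Usage: five past observations for slot 10, then two new readings.
history = [(10, 100), (10, 110), (10, 90), (10, 105), (10, 95)]
baseline = build_baseline(history)
normal_reading = is_event(baseline, 10, 102)
abnormal_reading = is_event(baseline, 10, 200)
```

    Keeping only a time slot and a count per observation mirrors the abstract's point about using few data properties, so that more of the collected data remains usable.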

    An online interactive dashboard to explore personal exposure to air pollution

    Studies increasingly examine individual exposure to air pollution while accounting for person-specific activity-travel patterns. Supporting policymakers and local communities using the resulting data requires transparent and ethical communication of exposure levels to affected individuals and other stakeholders. This paper asks how an interactive online dashboard might represent individual-level air pollution exposure profiles to different audiences while respecting individuals’ geoprivacy. Using data from 37 Oxford (UK) residents, it shows that heterogeneous individual-level exposure profiles can be shared ethically through different combinations of visualisation method, spatial and temporal resolution of data representation, and geomasking techniques for different dashboard user groups.

    Enabling autoscaling for in-memory storage in cluster computing framework

    2019 Spring. Includes bibliographical references. IoT-enabled devices and observational instruments continuously generate voluminous data. A large portion of these datasets is delivered with the associated geospatial locations. The increased volumes of geospatial data, alongside the emerging geospatial services, pose computational challenges for large-scale geospatial analytics. We have designed and implemented STRETCH, an in-memory distributed geospatial storage that preserves spatial proximity and enables proactive autoscaling for frequently accessed data. STRETCH stores data with a delayed data dispersion scheme that incrementally adds data nodes to the storage system. We have devised an autoscaling feature that proactively repartitions data to alleviate computational hotspots before they occur. We compared the performance of STRETCH with Apache Ignite, and the results show that STRETCH provides up to 3 times the throughput when the system encounters hotspots. STRETCH is built on Apache Spark and Ignite and interacts with them at runtime.

    Scalable Spatial Framework for NoSQL Databases - Haslam Scholars Program Undergraduate Thesis

    The spatial frameworks used for knowledge discovery in “Big Data” areas such as urban information systems (UIS) are well-developed in SQL databases but are not as extensive within certain NoSQL databases. The focus of this project is to develop this framework for emerging search systems (ESS) in UIS by utilizing NoSQL databases, notably the document-based MongoDB. Such a framework includes spatial functions for the most fundamental spatial queries. An ESS in UIS can take advantage of the scalability features of MongoDB to provide a robust approach to spatial search that differs from SQL relations and scalability. MongoDB, which is at a relatively early stage of spatial search compared with PostgreSQL, will require contributions to its spatial “toolbox”. Many of the operations present in SQL packages, such as PostGIS, are not in MongoDB. Thus, there is an opportunity to contribute to MongoDB’s ongoing geospatial evolution by developing, testing, and optimizing the spatial utilities used for large NoSQL datasets. Within UIS, these core operations can prove to be an important starting point for detailed geospatial analysis and high-impact data production. We hope that, by open sourcing this framework (as an extension), it can serve the research community as the foundation for scalable NoSQL platforms for big geospatial data analytics and be the next stage for open source contributions to MongoDB.

    Distributed spatial indexing for the Internet of Things data management

    The Internet of Things (IoT) has become a new enabler for collecting real-world observation and measurement data from the physical world. The IoT allows objects with sensing and network capabilities (i.e. Things and devices) to communicate with one another and with other resources (e.g. services) in the digital world. The heterogeneity, dynamicity and ad-hoc nature of the underlying data and of the services published by most IoT resources make accessing and processing the data and services a challenging task. The IoT demands distributed, scalable, and efficient indexing solutions for large-scale distributed IoT networks. We describe a novel distributed indexing approach for IoT resources and their published data. The index structure is constructed by encoding the locations of IoT resources into geohashes and then building a quadtree on the minimum bounding box of the geohash representations. This allows resources with similar geohashes to be aggregated, which reduces the size of the index. We have evaluated our proposed solution on a large-scale dataset and our results show that the proposed approach can efficiently index and enable discovery of the IoT resources with 65% better response time than a centralised approach and with a high success rate (around 90% in the first few attempts).
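
    The geohash step described above can be sketched as follows. The encoder follows the standard public geohash scheme (interleaved longitude and latitude bits, base-32 alphabet). The grouping by shared prefix is only a simplified stand-in for the paper's quadtree over bounding boxes, and the names geohash_encode and group_by_prefix, the resource tuples, and the prefix length are assumptions made for illustration.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def group_by_prefix(resources, prefix_len=4):
    """Aggregate (name, lat, lon) resources whose geohashes share a prefix."""
    groups = {}
    for name, lat, lon in resources:
        key = geohash_encode(lat, lon)[:prefix_len]
        groups.setdefault(key, []).append(name)
    return groups


# Usage: two nearby hypothetical sensors and one distant sensor.
resources = [
    ("sensor-a", 57.64911, 10.40744),
    ("sensor-b", 57.6492, 10.4075),
    ("sensor-c", 40.7128, -74.0060),
]
groups = group_by_prefix(resources)
```

    Nearby resources share a geohash prefix, so one index entry per prefix can stand in for many resources, which is the intuition behind the reduced index size reported in the abstract.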

    A locality-aware scientific workflow engine for fast-evolving spatiotemporal sensor data

    2017 Spring. Includes bibliographical references. Discerning knowledge from voluminous data involves a series of data manipulation steps. Scientists typically compose and execute workflows for these steps using scientific workflow management systems (SWfMSs). SWfMSs have been developed for several research communities including but not limited to bioinformatics, biology, astronomy, computational science, and physics. Parallel execution of workflows has been widely employed in SWfMSs by exploiting the storage and computing resources of grid and cloud services. However, none of these systems has been tailored for the needs of spatiotemporal analytics on real-time sensor data with high arrival rates. This thesis demonstrates the development and evaluation of a target-oriented workflow model that enables a user to specify dependencies among the workflow components, including data availability. The underlying spatiotemporal data dispersion and indexing scheme provides fast data search and retrieval to plan and execute computations comprising the workflow. This work includes a scheduling algorithm that targets minimizing data movement across machines while ensuring fair and efficient resource allocation among multiple users. The study includes empirical evaluations performed on the Google cloud.

    R*-Grove: Balanced Spatial Partitioning for Large-scale Datasets

    The rapid growth of big spatial data has urged the research community to develop several big spatial data systems. Regardless of their architecture, one of the fundamental requirements of all these systems is to spatially partition the data efficiently across machines. The core challenges of big spatial partitioning are building high spatial quality partitions while simultaneously taking advantage of distributed processing models by providing load-balanced partitions. Previous work on big spatial partitioning reuses existing index search trees as-is, e.g., the R-tree family, STR, Kd-tree, and Quad-tree, by building a temporary tree for a sample of the input and using its leaf nodes as partition boundaries. However, we show in this paper that none of those techniques has addressed the mentioned challenges completely. This paper proposes a novel partitioning method, termed R*-Grove, which can partition very large spatial datasets into high quality partitions with excellent load balance and block utilization. This appealing property allows R*-Grove to outperform existing techniques in spatial query processing. R*-Grove can be easily integrated into any big data platform such as Apache Spark or Apache Hadoop. Our experiments show that R*-Grove outperforms the existing partitioning techniques for big spatial data systems. With all the proposed work publicly available as open source, we envision that R*-Grove will be adopted by the community to better serve big spatial data research. Comment: 29 pages, to be published in Frontiers in Big Dat
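
    The sample-based approach that the abstract attributes to earlier work can be sketched as follows. This is not R*-Grove: it is a simplified, STR-style (sort-tile) split computed from a sample, where the split values play the role of the temporary tree's leaf boundaries. The function names, the uniform random sample, and the partition count are assumptions made for the example.

```python
import math
import random
from bisect import bisect_right
from collections import Counter


def str_boundaries(sample, num_partitions):
    """Derive partition boundaries from a sample of (x, y) points.

    The sample is cut into about sqrt(P) vertical slices of equal size,
    and each slice is cut into about sqrt(P) tiles by y. Returns the x
    split values and, for each slice, its list of y split values.
    """
    s = math.ceil(math.sqrt(num_partitions))
    by_x = sorted(sample)
    slice_size = math.ceil(len(by_x) / s)
    x_splits, y_splits = [], []
    for i in range(0, len(by_x), slice_size):
        if i > 0:
            x_splits.append(by_x[i][0])  # first x of each later slice
        chunk = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        tile_size = math.ceil(len(chunk) / s)
        y_splits.append(
            [chunk[j][1] for j in range(tile_size, len(chunk), tile_size)]
        )
    return x_splits, y_splits


def assign(point, x_splits, y_splits):
    """Map a point to its (slice, tile) partition id via binary search."""
    col = bisect_right(x_splits, point[0])
    row = bisect_right(y_splits[col], point[1])
    return col, row


# Usage: boundaries from a hypothetical sample, then assignment of points.
random.seed(0)
sample = [(random.random(), random.random()) for _ in range(1000)]
x_splits, y_splits = str_boundaries(sample, 16)
counts = Counter(assign(p, x_splits, y_splits) for p in sample)
```

    Because the boundaries come from a sample, the balance seen on the sample only approximates the balance on the full dataset, which is one reason the abstract treats load balance as a challenge that these reused index structures do not fully address.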