
    Real-time detection of moving crowds using spatio-temporal data streams

    Over the last decade we have seen a tremendous change in Location Based Services. From primitive reactive applications, explicitly invoked by users, they have evolved into complex proactive systems that automatically provide information based on context and user location. This change was driven by the rapid development of outdoor and indoor positioning technologies: GPS modules, now included in almost every device, together with indoor technologies based on WiFi fingerprinting or Bluetooth beacons, make it possible to determine a user's location almost anywhere and at any time. It has also led to an enormous growth of spatio-temporal data. Yet while current Location Based Services are very efficient in the user-centric, single-target setting, they remain quite primitive at multi-target knowledge extraction. This is rather surprising given the availability of the data and of current processing technologies. Discovering useful information from the locations of multiple objects is limited on one side by legal issues related to privacy and data ownership; on the other side, mining group location data over time is not a trivial task and requires special algorithms and technologies to be effective. Recent developments in data processing have caused a major shift from offline batch-processing engines, such as MapReduce, to real-time distributed streaming frameworks, such as Apache Flink or Apache Spark, which can process huge amounts of data, including spatio-temporal data streams. This thesis presents a system for detecting and analyzing crowds in a continuous spatio-temporal data stream, with the aim of providing relevant knowledge for proactive LBS. The motivation comes from the constant growth of spatio-temporal data and the recent rapid development of technologies for processing it.
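    As a rough illustration of the kind of processing such a system performs (the thesis's actual algorithm is not described in this abstract), the sketch below uses PySpark Structured Streaming to bucket location events into grid cells over one-minute windows and flag cells whose distinct-user count crosses a threshold. The input schema, source, cell size, and threshold are all assumptions made for the example.

```python
# Hypothetical sketch: windowed grid-cell crowd detection on a location stream.
# Schema, source, grid size, and threshold are illustrative assumptions,
# not the thesis's actual design.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("crowd-detection-sketch").getOrCreate()

CELL_DEG = 0.001        # ~100 m grid cell at mid latitudes (assumed)
CROWD_THRESHOLD = 50    # distinct users per cell per window (assumed)

# Assumed input: JSON records {"user_id": ..., "lat": ..., "lon": ..., "ts": ...}
events = (spark.readStream.format("socket")
          .option("host", "localhost").option("port", 9999).load()
          .select(F.from_json("value",
                              "user_id STRING, lat DOUBLE, lon DOUBLE, ts TIMESTAMP")
                  .alias("e"))
          .select("e.*"))

crowds = (events
          .withColumn("cell_x", F.floor(F.col("lon") / CELL_DEG))
          .withColumn("cell_y", F.floor(F.col("lat") / CELL_DEG))
          .withWatermark("ts", "2 minutes")
          .groupBy(F.window("ts", "1 minute"), "cell_x", "cell_y")
          .agg(F.approx_count_distinct("user_id").alias("users"))
          .where(F.col("users") >= CROWD_THRESHOLD))

query = crowds.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```

    Windowed grouping with a watermark keeps aggregation state bounded, which is what makes continuous crowd detection feasible on an unbounded stream.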

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking, and the transformation of abstract ideas and models into concrete data structures and algorithms.

    On-the-fly tracing for data-centric computing: parallelization, workflow and applications

    As data-centric computing becomes the trend in science and engineering, more and more hardware systems, as well as middleware frameworks, are emerging to handle the intensive computations associated with big data. At the programming level, it is crucial to have corresponding programming paradigms for dealing with big data. Although MapReduce is now a well-known programming model for data-centric computing, in which parallelization is replaced entirely by partitioning the computing task through the data, not all programs, particularly those using statistical computing and data mining algorithms with interdependence, can be refactored in this fashion. On the other hand, many traditional automatic parallelization methods emphasize formalism and may not achieve optimal performance with limited computing resources. In this work we propose a cross-platform programming paradigm, called on-the-fly data tracing, that provides source-to-source transformation, with the same framework also providing workflow optimization for larger applications. Using a big-data approximation, computations related to large-scale data input are identified in the code and workflow, and a simplified core dependence graph is built based on the computational load, taking big data into account. The code can then be partitioned into sections for efficient parallelization; at the workflow level, optimization can be performed by adjusting the scheduling for big-data considerations, including the I/O performance of the machine. Treating each unit in both source code and workflow as a model, this framework enables model-based parallel programming that matches the available computing resources. The dissertation presents the techniques used in model-based parallel programming, the design of the software framework for both parallelization and workflow optimization, and its implementations in multiple programming languages. Two sets of experiments validate the framework: i) benchmarking of parallelization speed-up using typical examples in data analysis and machine learning (e.g., naive Bayes, k-means), and ii) three real-world applications in data-centric computing: pattern detection from hurricane and storm surge simulations, road traffic flow prediction, and text mining from social media data. The applications illustrate how to build scalable workflows with the framework, along with the resulting performance enhancements.
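    The abstract's core construct, a simplified dependence graph partitioned into parallel sections, can be sketched as follows. This is a toy interpretation under assumed structures, not the dissertation's framework: units of code or workflow become weighted nodes, and any unit whose dependencies have completed may join the current parallel section.

```python
# Hypothetical sketch of a simplified core dependence graph: units of code or
# workflow become weighted nodes, dependencies become edges, and units with no
# unfinished dependencies are scheduled together as a parallel section.
# The structure and scheduling rule are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Unit:
    name: str
    load: float                      # estimated computational load (big-data weighted)
    deps: set = field(default_factory=set)

def parallel_sections(units):
    """Greedy level-by-level partition: each section holds units whose
    dependencies have all completed, so they can run in parallel."""
    done, sections = set(), []
    pending = {u.name: u for u in units}
    while pending:
        ready = [u for u in pending.values() if u.deps <= done]
        if not ready:
            raise ValueError("cyclic dependence graph")
        # Heaviest units first, so long-running work starts early.
        ready.sort(key=lambda u: u.load, reverse=True)
        sections.append([u.name for u in ready])
        for u in ready:
            done.add(u.name)
            del pending[u.name]
    return sections

workflow = [
    Unit("load_data", 10.0),
    Unit("clean", 4.0, {"load_data"}),
    Unit("stats", 6.0, {"clean"}),
    Unit("train_model", 9.0, {"clean"}),
    Unit("report", 1.0, {"stats", "train_model"}),
]
print(parallel_sections(workflow))
# [['load_data'], ['clean'], ['train_model', 'stats'], ['report']]
```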

    A cloud-based remote sensing data production system

    The data processing capability of existing remote sensing systems has not kept pace with the amount of data typically received and needing to be processed. Nor are existing product services capable of providing users with a variety of remote sensing data sources to select from. Therefore, in this paper we present a product generation programme that uses multisource remote sensing data across distributed data centers in a cloud environment, to compensate for the low production efficiency, limited product types, and simple services of existing systems. The programme adopts a “master–slave” architecture. Specifically, the master center is mainly responsible for receiving and parsing production orders, task and data scheduling, results feedback, and so on; the slave centers are the distributed remote sensing data centers, which store one or more types of remote sensing data and are mainly responsible for executing production tasks. In general, each production task runs on only one data center, and data scheduling among centers adopts a “minimum data transferring” strategy. The logical workflow of each production task is organized based on a knowledge base and then turned into the actual executable workflow by Kepler. In addition, the scheduling strategy for each production task depends mainly on Ganglia monitoring results, so computing resources can be allocated or expanded adaptively. Finally, we evaluated the proposed programme with test experiments performed at global, regional, and local scales, and the results showed that the proposed cloud-based remote sensing production system can handle massive remote sensing data and generate different products, as well as provide on-demand remote sensing computing and information services.
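    The “minimum data transferring” strategy can be illustrated with a small sketch: send each production task to the data center that already holds the largest share of its required input bytes. The function and figures below are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a "minimum data transferring" scheduling rule:
# run each production task at the data center that already holds the largest
# share of its required inputs, so the fewest bytes cross the network.
# Names and sizes are illustrative assumptions, not the paper's implementation.

def best_center(required, holdings):
    """required: {dataset: size_gb}; holdings: {center: set(datasets)}.
    Returns (center, gb_to_transfer) minimizing inbound transfer."""
    total = sum(required.values())
    def transfer_cost(center):
        local = sum(sz for ds, sz in required.items() if ds in holdings[center])
        return total - local
    center = min(holdings, key=transfer_cost)
    return center, transfer_cost(center)

required = {"landsat8_scene": 12, "modis_ndvi": 4, "dem_tile": 8}  # sizes in GB
holdings = {
    "center_a": {"modis_ndvi"},
    "center_b": {"landsat8_scene", "dem_tile"},
    "center_c": {"dem_tile"},
}
print(best_center(required, holdings))  # ('center_b', 4)
```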

    Privacy and trustworthiness management in moving object environments

    The use of location-based services (LBS) (e.g., Intel's Thing Finder) is expanding. Besides traditional centralized location-based services, distributed ones are also emerging due to the development of Vehicular Ad-hoc Networks (VANETs), dynamic networks that allow vehicles to communicate with one another. Because LBS inherently need to track users' locations, they have raised increasing concerns about location privacy. Although much research has been carried out to let users submit their locations anonymously, the collected anonymous location data may still be mapped to individuals when the adversary has related background knowledge. To improve location privacy, this dissertation addresses the problem of anonymizing collected location datasets so that they can be published for public use without violating privacy concerns. Specifically, a privacy-preserving trajectory publishing algorithm is proposed that preserves a high data utility rate. Moreover, the scalability issue that arises as location datasets grow enormously, due to continuous data collection as well as the increasing number of LBS users, is tackled by developing a distributed version of the trajectory publishing algorithm that leverages the MapReduce technique. As a consequence of users being anonymous, it becomes more challenging to evaluate the trustworthiness of messages disseminated by anonymous users. Existing research efforts mainly focus on privacy-preserving authentication of users, which helps trace malicious vehicles only after the damage is done. This is still not sufficient to prevent malicious behavior in cases where attackers do not care whether they are caught later on. Therefore, it would be more effective to also evaluate the content of the message. This dissertation presents a novel information-oriented trustworthiness evaluation that enables each individual user to evaluate message content and make informed decisions --Abstract, page iii
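    The abstract does not specify the publishing algorithm, so purely as a generic illustration of trajectory anonymization, the following sketch applies a common generalize-and-suppress scheme: coarsen each trajectory spatially and temporally, then publish only generalized trajectories shared by at least K users. All parameters are assumed.

```python
# Hypothetical illustration of one common trajectory-anonymization idea
# (generalize-and-suppress for k-anonymity); the dissertation's actual
# algorithm is not specified in the abstract, so this is only a stand-in.
from collections import Counter

K = 3            # anonymity threshold (assumed)
CELL = 0.01      # spatial generalization step in degrees (assumed)

def generalize(traj):
    """Coarsen each (lat, lon, t) point to a grid cell and hour bucket."""
    return tuple((round(lat / CELL), round(lon / CELL), t // 3600)
                 for lat, lon, t in traj)

def publish(trajectories):
    """Keep only generalized trajectories shared by at least K users."""
    generalized = [generalize(t) for t in trajectories]
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= K]
```

    With K = 3, a coarse path is released only if at least two other users share it, so any published trajectory is indistinguishable within a group of at least three.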

    Key concepts of group pattern discovery algorithms from spatio-temporal trajectories

    Over the years, the increasing development of location acquisition devices has generated a significant amount of spatio-temporal data. This data can be further analysed in search of interesting patterns and new information, or to construct predictive models such as next-location prediction. The goal of this paper is to contribute to future research and development of group pattern discovery algorithms for spatio-temporal trajectories by providing insight into algorithm design in this research area, based on a comprehensive classification of state-of-the-art models. This work covers static, big data, and data stream processing models, which, to the best of the authors' knowledge, is the first attempt at presenting them in this context. Furthermore, the currently available surveys and taxonomies in this research area neither focus on group pattern mining algorithms nor include the state-of-the-art models. The authors conclude with the proposal of a conceptual model of a Universal, Streaming, Distributed and Parameter-light (UDSP) algorithm that addresses current challenges in this research area.
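    To make “group pattern” concrete, the toy sketch below implements one classic pattern from this literature, the flock: objects that stay within a distance threshold of each other for several consecutive timestamps. The brute-force pairwise search and the thresholds are illustrative assumptions, not a model from the paper.

```python
# Toy sketch of a classic group pattern from this literature: a "flock" is a
# pair (or set) of objects that stay within EPS of each other for at least
# K consecutive timestamps. Brute-force over pairs; thresholds are assumed.
from itertools import combinations
from math import dist

EPS, K = 0.005, 3   # distance threshold (deg) and minimum duration (assumed)

def flock_pairs(tracks):
    """tracks: {object_id: [(lat, lon), ...]} aligned by timestamp index.
    Returns pairs that remain within EPS for K consecutive steps."""
    result = []
    for a, b in combinations(tracks, 2):
        run = 0
        for pa, pb in zip(tracks[a], tracks[b]):
            run = run + 1 if dist(pa, pb) <= EPS else 0
            if run >= K:
                result.append((a, b))
                break
    return result

tracks = {
    "u1": [(0.000, 0.000), (0.001, 0.001), (0.002, 0.002), (0.003, 0.003)],
    "u2": [(0.001, 0.000), (0.002, 0.001), (0.003, 0.002), (0.004, 0.003)],
    "u3": [(0.100, 0.100), (0.200, 0.200), (0.300, 0.300), (0.400, 0.400)],
}
print(flock_pairs(tracks))  # [('u1', 'u2')]
```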

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has great potential for gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. These can be imagined as laboratories devoted to gathering and processing enormous volumes of data on both natural systems, such as the Earth and its ecosystem, and human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which, however, should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge amounts of data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies, or institutions have limits where the public interest is affected, and public interest is not a sufficient justification to violate the human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society.
    Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c