68 research outputs found

    Big Data for Traffic Estimation and Prediction: A Survey of Data and Tools

    Big data has been widely used in many areas, including the transportation industry. Using various data sources, traffic states can be accurately estimated and further predicted to improve overall operational efficiency. Following this trend, this study presents an up-to-date survey of open data and big data tools used for traffic estimation and prediction. Different data types are categorized and off-the-shelf tools are introduced. To further promote the use of big data for traffic estimation and prediction tasks, challenges and future directions are given for future studies.

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book are the integration of computational thinking and spatial thinking, and the transformation of abstract ideas and models into concrete data structures and algorithms.

    Proceedings of the 3rd Open Source Geospatial Research & Education Symposium OGRS 2014

    The third Open Source Geospatial Research & Education Symposium (OGRS) was held in Helsinki, Finland, from 10 to 13 June 2014. The symposium was hosted and organized by the Department of Civil and Environmental Engineering, Aalto University School of Engineering, in partnership with the OGRS Community, on the Espoo campus of Aalto University. These proceedings contain the 20 papers presented at the symposium. OGRS is a meeting dedicated to exchanging ideas in, and results from, the development and use of open source geospatial software in both research and education. The symposium offers several opportunities for discussing, learning, and presenting results, principles, methods, and practices while supporting a primary theme: how to carry out research and educate university students using, contributing to, and launching open source geospatial initiatives. Participating in open source initiatives can potentially boost innovation as a value-creating process requiring joint collaboration between academia, foundations, associations, developer communities, and industry. Additionally, open source software can improve the efficiency and impact of university education by introducing open and freely usable tools and research results to students, and by encouraging them to get involved in projects. This may eventually lead to new community projects and businesses. The symposium contributes to the validation of the open source model in research and education in geoinformatics.

    Compact semantic representations of observational data

    The Internet of Things (IoT) concept has been widely adopted in several domains to enable devices to interact with each other and perform certain tasks. IoT devices encompass different concepts, e.g., sensors, programs, computers, and actuators. IoT devices observe their surroundings to collect information and communicate with each other in order to perform mutual tasks. These devices continuously generate observational data streams, which become historical data once the observations are stored. Due to the increase in the number of IoT devices, a large amount of streaming and historical observational data is being produced. Moreover, several ontologies, like the Semantic Sensor Network (SSN) Ontology, have been proposed for the semantic annotation of observational data, either streams or historical. The Resource Description Framework (RDF) is a widely adopted data model for semantically describing such datasets. Semantic annotation provides a shared understanding for the processing and analysis of observational data. However, adding semantics further increases the data size, especially when observation values are redundantly sensed by several devices. For example, several sensors can generate observations indicating the same value for relative humidity at a given timestamp and city. This situation can be represented in an RDF graph using four RDF triples, where observations are represented as triples describing the observed phenomenon, the unit of measurement, the timestamp, and the coordinates. The RDF triples of an observation are associated with the same subject. Such observations share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects.
    When the number of these subject entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. This thesis addresses the problem of identifying frequent star patterns in RDF graphs and develops computational methods to identify them and generate a factorized RDF graph in which the number of frequent star patterns is minimized. Furthermore, we apply these factorized RDF representations to historical semantic sensor data described using the SSN ontology and present tabular representations of factorized semantic sensor data in order to exploit Big Data frameworks. In addition, this thesis devises a knowledge-driven approach named DESERT that is able to on-Demand factorizE and Semantically Enrich stReam daTa. We evaluate the performance of our proposed techniques on several RDF graph benchmarks. The outcomes show that our techniques are able to detect frequent star patterns effectively and efficiently, and that RDF graph size can be reduced by up to 66.56% while preserving the data represented in the original RDF graph. Moreover, the compact representations are able to reduce the number of RDF triples by at least 53.25% in historical observational data and up to 94.34% in observational data streams. Additionally, query evaluation over historical data reduces query execution time by up to three orders of magnitude. In observational data streams, the size of the data required to answer a query is reduced by 92.53%, reducing the memory space required to answer queries. These results provide evidence that IoT data can be efficiently represented using the proposed compact representations, thus reducing the negative impact that semantic annotations may have on IoT data management.
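    The frequent-star-pattern factorization summarized above can be sketched in a few lines. This is a simplified illustration only: it matches subjects whose complete star is identical, while the thesis's methods also factorize patterns over subsets of properties, and the `ex:matchesPattern` property and blank-node names are hypothetical.

    ```python
    from collections import defaultdict

    def find_frequent_star_patterns(triples, min_subjects=2):
        # Group triples by subject: each subject's "star" is its
        # set of (property, object) pairs.
        stars = defaultdict(set)
        for s, p, o in triples:
            stars[s].add((p, o))
        # Count how many subjects share an identical star.
        by_pattern = defaultdict(list)
        for subject, po_pairs in stars.items():
            by_pattern[frozenset(po_pairs)].append(subject)
        return {pat: subs for pat, subs in by_pattern.items()
                if len(subs) >= min_subjects}

    def factorize(triples, min_subjects=2):
        # Replace each frequent star with a single surrogate node and
        # link the member subjects to it, removing the repeated triples.
        frequent = find_frequent_star_patterns(triples, min_subjects)
        factorized, covered = [], set()
        for i, (pattern, subjects) in enumerate(frequent.items()):
            node = f"_:star{i}"
            for p, o in sorted(pattern):
                factorized.append((node, p, o))
            for s in subjects:
                factorized.append((s, "ex:matchesPattern", node))
                covered.add(s)
        # Keep triples of subjects that matched no frequent star.
        factorized += [(s, p, o) for s, p, o in triples if s not in covered]
        return factorized
    ```

    For two observations that each contribute four identical-valued triples, the factorized graph stores the four pattern triples once plus one link per observation, mirroring the triple reductions reported in the abstract.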

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a final publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)" project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predicting and analysing natural and complex systems in science and engineering. As their level of abstraction rises to afford a better discernment of the domain at hand, their representations become increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    User-centric Visualization of Data Provenance

    The need to understand and track files (and, inherently, data) in cloud computing systems is in high demand. Over the past years, logs and graph-based data representations have become the main methods for tracking information and relating it to cloud users. While these remain in use, tracking and relating information with 'data provenance' (i.e., the series of chronicles and the derivation history of data, recorded as metadata) is the new trend for cloud users. However, there is still much room for improving how data activities in cloud systems are represented to end-users. In this thesis, we propose UVisP (User-centric Visualization of Data Provenance with Gestalt), a novel user-centric visualization technique for data provenance. This technique aims to provide the missing link between data movements in cloud computing environments and end-users' uncertain queries about the security and life cycle of their files within cloud systems. The proof of concept for the UVisP technique integrates D3 (an open-source visualization API) with the Gestalt theory of perception to provide a range of user-centric visualizations. UVisP allows users to transform and visualize provenance logs with implicit prior knowledge of the Gestalt theory of perception. We present the initial development of the UVisP technique, and our results show that integrating Gestalt principles and 'perceptual keys' into provenance visualization allows end-users to enhance their visualization capabilities, extract useful knowledge, and understand the visualizations better. The technique also enables end-users to develop methods and preferences when viewing different visualizations; for example, prior knowledge of the Gestalt theory of perception, combined with the available visualization types, offers a user-centric experience across different visualizations. We also outline significant future work that will help profile new user-centric visualizations for cloud users.
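    The transformation UVisP performs, from raw provenance logs to a visualizable graph, can be illustrated with a small sketch. The log format and field names here are hypothetical, not the thesis's actual schema; the output is a node-link structure of the kind D3's force layout can consume.

    ```python
    import json

    def provenance_to_graph(log_entries):
        # Build a D3-style node-link structure from provenance log
        # entries of the (hypothetical) form (actor, action, target).
        nodes, index, links = [], {}, []

        def node(name, kind):
            # Register each actor/file once, keyed by name.
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"id": name, "kind": kind})
            return name

        for actor, action, target in log_entries:
            links.append({"source": node(actor, "agent"),
                          "target": node(target, "file"),
                          "action": action})
        return {"nodes": nodes, "links": links}

    # A toy provenance log for one file in a cloud store.
    log = [("alice", "uploaded", "report.pdf"),
           ("backup-service", "copied", "report.pdf"),
           ("alice", "shared", "report.pdf")]
    print(json.dumps(provenance_to_graph(log), indent=2))
    ```

    Serializing the result to JSON is the natural hand-off point to a browser-side D3 rendering, where Gestalt principles such as proximity and similarity would then guide the visual encoding.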

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades, information modelling and knowledge bases have become essential subjects, not only in academic communities related to information systems and computer science, but also in business areas where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a cooperation initiative between Japan and Finland in 1982. The practical operations were then organised by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries. The conference has a workshop character: discussion, ample time for presentations, and a limited number of participants (50) and papers (30). Suggested topics include, but are not limited to:
    1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models.
    2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling.
    3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundations of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models.
    4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioural modelling and prediction.
    5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems.
    6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context-enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems.
    Overall, we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series "Frontiers in Artificial Intelligence" by IOS Press (Amsterdam). The books "Information Modelling and Knowledge Bases" are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in advancing research on and application of information modelling and knowledge bases. Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyoki

    Coastal management and adaptation: an integrated data-driven approach

    Coastal regions are among the most exposed to environmental hazards, yet the coast is the preferred settlement site for a high percentage of the global population, and most major global cities are located on or near the coast. This research adopts a predominantly anthropocentric approach to the analysis of coastal risk and resilience, centred on the pervasive hazards of coastal flooding and erosion. Coastal management decision-making practices are shown to be reliant on access to current and accurate information. However, constraints have been imposed on information flows between scientists, policy makers, and practitioners, due to a lack of awareness and utilisation of available data sources. This research seeks to tackle this issue by evaluating how innovations in the use of data and analytics can further the application of science within decision-making processes related to coastal risk adaptation. In achieving this aim, a range of research methodologies has been employed, and the progression of topics covered marks a shift from themes of risk to resilience. The work focuses on a case study region of East Anglia, UK, benefiting from the input of a partner organisation responsible for the region's coasts: Coastal Partnership East. An initial review revealed how data can be utilised effectively within coastal decision-making practices, highlighting scope for applying advanced Big Data techniques to the analysis of coastal datasets. The process of risk evaluation has been examined in detail, and the range of possibilities afforded by open source coastal datasets is revealed. Subsequently, open source coastal terrain and bathymetric point-cloud datasets were identified for 14 sites within the case study area. These were then utilised in a practical application of a geomorphological change detection (GCD) method.
    This revealed how analysis of point-cloud data of high spatial and temporal resolution can accurately reveal and quantify physical coastal impacts. Additionally, the research shows how data innovations can facilitate adaptation through insurance; more specifically, how the use of empirical evidence in the pricing of coastal flood insurance can result in both the communication and the distribution of risk. The various strands of knowledge generated throughout this study reveal how an extensive range of data types, sources, and advanced forms of analysis can together allow coastal resilience assessments to be founded on empirical evidence. This research serves to demonstrate how the application of advanced data-driven analytical processes can reduce the levels of uncertainty and subjectivity inherent in current coastal environmental management practices. Adoption of the methods presented in this research could further the possibilities for sustainable and resilient management of the incredibly valuable environmental resource that is the coast.
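    A common GCD formulation is the DEM of Difference: grid two successive point-cloud surveys into elevation rasters, subtract the older from the newer, and discard changes smaller than a level of detection that reflects survey uncertainty. A minimal sketch follows; the 0.3 m LoD and the toy grids are illustrative only, not values from the thesis.

    ```python
    import numpy as np

    def dem_of_difference(dem_new, dem_old, lod=0.3, cell_area=1.0):
        # DEM of Difference: elevation change per cell, with changes
        # below the level of detection (LoD) treated as noise.
        dod = dem_new - dem_old
        significant = np.abs(dod) >= lod
        # Sum significant losses and gains; multiplying by the cell
        # area converts elevation change to eroded/deposited volume.
        erosion = float(-dod[significant & (dod < 0)].sum()) * cell_area
        deposition = float(dod[significant & (dod > 0)].sum()) * cell_area
        return dod, erosion, deposition

    # Two toy 2x2 elevation grids (metres); real inputs would be
    # gridded from successive coastal point-cloud surveys.
    old = np.zeros((2, 2))
    new = np.array([[0.5, -0.4],
                    [0.1,  0.0]])
    dod, erosion, deposition = dem_of_difference(new, old)
    ```

    Here the 0.1 m change falls below the LoD and is ignored, illustrating how the threshold separates genuine geomorphological change from measurement noise.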


    Geospatial Computing: Architectures and Algorithms for Mapping Applications

    Beginning with the MapTube website (1), which was launched in 2007 for crowd-sourcing maps, this project investigates approaches to exploratory Geographic Information Systems (GIS) using web-based mapping, or 'web GIS'. Users can log in to upload their own maps and overlay different layers of GIS data sets. This work looks into the theory behind how web-based mapping systems function and whether their performance can be modelled and predicted. One of the important questions when dealing with different geospatial data sets is how they relate to one another. Internet data stores provide another source of information, which can be exploited if more generic geospatial data mining techniques are developed. The identification of similarities between thousands of maps is a GIS technique that can give structure to the overall fabric of the data, once the problems of scalability and comparison between different geographies are solved. After running MapTube for nine years to crowd-source data, this marks a natural progression from the visualisation of individual maps to wider questions about what additional knowledge can be discovered from the data collected. In the new 'data science' age, the introduction of real-time data sets presents a new challenge for web-based mapping applications. The mapping of real-time geospatial systems is technically challenging, but has the potential to show inter-dependencies as they emerge in the time series. Combined geospatial and temporal data mining of real-time sources can provide archives of transport and environmental data from which to accurately model the systems under investigation. By using techniques from machine learning, the models can be built directly from the real-time data stream. These models can then be used for analysis and experimentation, being derived directly from city data. This then leads to an analysis of the behaviours of the interacting systems. (1) The MapTube website: http://www.maptube.org
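    One simple way to quantify the similarity between map layers, once both are joined to a shared set of area codes, is a correlation over the matched attribute values. A minimal sketch, with hypothetical area codes and values (real map-similarity work must also handle differing geographies and scale, as the abstract notes):

    ```python
    import numpy as np

    def map_similarity(map_a, map_b):
        # Pearson correlation between two thematic map layers,
        # aligned on the area codes they share.
        shared = sorted(set(map_a) & set(map_b))
        a = np.array([map_a[k] for k in shared], dtype=float)
        b = np.array([map_b[k] for k in shared], dtype=float)
        return float(np.corrcoef(a, b)[0, 1])

    # Hypothetical per-area attribute values for two uploaded layers.
    density = {"E01": 1.2, "E02": 3.4, "E03": 2.1}
    rainfall = {"E01": 1.2, "E02": 3.4, "E03": 2.1}
    ```

    A score near 1.0 flags layers with closely related spatial patterns, which is the kind of structure-finding across thousands of crowd-sourced maps the project describes.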