9,578 research outputs found

    Searching and mining in enriched geo-spatial data

    Get PDF
    The emergence of new data collection mechanisms in geo-spatial applications paired with a heightened tendency of users to volunteer information provides an ever-increasing flow of data of high volume, complex nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Such data bearing additional information from multiple sources like probability distributions, text or numerical attributes, social context, or multimedia content can be called multi-enriched. Searching and mining this abundance of information holds many challenges, if all of the data's potential is to be released. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or were accepted into various renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks where traditional methods optimise results based on absolute and ofttimes singular metrics, i.e., finding the shortest paths based on distance or the best trade-off between distance and travel time. Integrating additional aspects like qualitative or social data by enriching the data model with knowledge derived from sources as mentioned above allows for queries that can be issued to fit a broader scope of needs or preferences. This thesis presents two implementations of incorporating multi-enriched data into road networks. In one case, a range of qualitative data sources is evaluated to gain knowledge about user preferences which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data. In a second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications for this include finding parking spots. Social media trends are an emerging research area giving insight in user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trend archetypes to predict future dissemination of a trend. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner. This thesis presents techniques for clustering uncertain data as well as query processing, that take the additional information from uncertainty models into account while preserving scalability using a sampling-based approach, while previous approaches could only provide one of the two. The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.Das Erscheinen neuer Methoden zur Datenerhebung in räumlichen Applikationen gepaart mit einer erhöhten Bereitschaft der Nutzer, Daten über sich preiszugeben, generiert einen stetig steigenden Fluss von Daten in großer Menge, komplexer Natur, und oft gepaart mit inhärenter Unsicherheit. Beispiele für solche Mechanismen sind Crowdsourcing, automatisierte Wissensinferenz, Tracking, und Daten aus sozialen Medien. Derartige Daten, angereichert mit mit zusätzlichen Informationen aus verschiedenen Quellen wie Wahrscheinlichkeitsverteilungen, Text- oder numerische Attribute, sozialem Kontext, oder Multimediainhalten, werden als multi-enriched bezeichnet. Suche und Datamining in dieser weiten Datenmenge hält viele Herausforderungen bereit, wenn das gesamte Potenzial der Daten genutzt werden soll. Diese Arbeit geht auf mehrere große Fragestellungen in diesem Feld ein, insbesondere Pfadanfragen in multi-enriched Daten, Trend-mining in Daten aus sozialen Netzwerken, und die Beherrschung von Unsicherheit in räumlichen Daten. In all diesen Fällen haben die entwickelten Methoden signifikante Forschungsbeiträge geleistet und wurden veröffentlicht oder angenommen zu diversen renommierten internationalen, von Experten begutachteten Konferenzen und Journals. Ein gängiges Anwendungsgebiet räumlicher Daten sind Pfadanfragen in Straßennetzwerken, wo traditionelle Methoden die Resultate anhand absoluter und oft auch singulärer Maße optimieren, d.h., der kürzeste Pfad in Bezug auf die Distanz oder der beste Kompromiss zwischen Distanz und Reisezeit. Durch die Integration zusätzlicher Aspekte wie qualitativer Daten oder Daten aus sozialen Netzwerken als Anreicherung des Datenmodells mit aus diesen Quellen abgeleitetem Wissen werden Anfragen möglich, die ein breiteres Spektrum an Anforderungen oder Präferenzen erfüllen. Diese Arbeit präsentiert zwei Ansätze, solche multi-enriched Daten in Straßennetze einzufügen. Zum einen wird eine Reihe qualitativer Datenquellen ausgewertet, um Wissen über Nutzerpräferenzen zu generieren, welches darauf mit Örtlichkeiten im Straßennetz abgeglichen und in das Netz integriert wird. Diverse Methoden werden präsentiert, die stark personalisierbare Pfadanfragen ermöglichen, die ein weites Spektrum an Daten mit einbeziehen. Im zweiten Fall wird ein Framework präsentiert, das eine Ressourcenverteilung im Straßennetzwerk modelliert, bei der einmal verbrauchte Ressourcen erneut auftauchen können. Resultierende Pfade ergeben einen maximalen Ertrag basieren auf einer probabilistischen Evaluation der verfügbaren Ressourcen. Eine Anwendung ist die Suche nach Parkplätzen. Trends in sozialen Medien sind ein entstehendes Forscchungsgebiet, das Einblicke in Benutzerverhalten und wichtige Themen zulässt. Solche Trends bestehen aus großen Mengen an Nachrichten zu einem bestimmten Thema innerhalb eines Zeitfensters, so dass die Auftrittsfrequenz signifikant über den durchschnittlichen Level liegt. Durch die Untersuchung der Fortpflanzung solcher Trends in Raum und Zeit präsentiert diese Arbeit Methoden, um Trends nach Archetypen zu klassifizieren und ihren zukünftigen Weg vorherzusagen. Die Anfragebearbeitung und Datamining in unsicheren Daten ist besonders herausfordernd, insbesondere im Hinblick auf das notwendige Zusatzwissen, um Resultate mit probabilistischen Garantien zu erzielen. Solches Wissen ist nicht immer verfügbar und Anfragen lassen sich aufgrund der \P-Vollständigkeit des Problems nicht ohne Weiteres auf größere Datensätze skalieren. Dennoch kann Datenunsicherheit wertvollen Einblick in die Struktur der Daten liefern, der mit deterministischen Methoden nicht erreichbar wäre. Diese Arbeit präsentiert Techniken zum Clustering unsicherer Daten sowie zur Anfragebearbeitung, die die Zusatzinformation aus dem Unsicherheitsmodell in Betracht ziehen, jedoch gleichzeitig die Skalierbarkeit des Ansatzes auf große Datenmengen sicherstellen

    IEEE Access Special Section Editorial: Big Data Technology and Applications in Intelligent Transportation

    Get PDF
    During the last few years, information technology and transportation industries, along with automotive manufacturers and academia, are focusing on leveraging intelligent transportation systems (ITS) to improve services related to driver experience, connected cars, Internet data plans for vehicles, traffic infrastructure, urban transportation systems, traffic collaborative management, road traffic accidents analysis, road traffic flow prediction, public transportation service plan, personal travel route plans, and the development of an effective ecosystem for vehicles, drivers, traffic controllers, city planners, and transportation applications. Moreover, the emerging technologies of the Internet of Things (IoT) and cloud computing have provided unprecedented opportunities for the development and realization of innovative intelligent transportation systems where sensors and mobile devices can gather information and cloud computing, allowing knowledge discovery, information sharing, and supported decision making. However, the development of such data-driven ITS requires the integration, processing, and analysis of plentiful information obtained from millions of vehicles, traffic infrastructures, smartphones, and other collaborative systems like weather stations and road safety and early warning systems. The huge amount of data generated by ITS devices is only of value if utilized in data analytics for decision-making such as accident prevention and detection, controlling road risks, reducing traffic carbon emissions, and other applications which bring big data analytics into the picture

    Spatial Big Data Analytics: Classification Techniques for Earth Observation Imagery

    Get PDF
    University of Minnesota Ph.D. dissertation. August 2016. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); xi, 120 pages.Spatial Big Data (SBD), e.g., earth observation imagery, GPS trajectories, temporally detailed road networks, etc., refers to geo-referenced data whose volume, velocity, and variety exceed the capability of current spatial computing platforms. SBD has the potential to transform our society. Vehicle GPS trajectories together with engine measurement data provide a new way to recommend environmentally friendly routes. Satellite and airborne earth observation imagery plays a crucial role in hurricane tracking, crop yield prediction, and global water management. The potential value of earth observation data is so significant that the White House recently declared that full utilization of this data is one of the nation's highest priorities. However, SBD poses significant challenges to current big data analytics. In addition to its huge dataset size (NASA collects petabytes of earth images every year), SBD exhibits four unique properties related to the nature of spatial data that must be accounted for in any data analysis. First, SBD exhibits spatial autocorrelation effects. In other words, we cannot assume that nearby samples are statistically independent. Current analytics techniques that ignore spatial autocorrelation often perform poorly such as low prediction accuracy and salt-and-pepper noise (i.e., pixels predicted as different from neighbors by mistake). Second, spatial interactions are not isotropic and vary across directions. Third, spatial dependency exists in multiple spatial scales. Finally, spatial big data exhibits heterogeneity, i.e., identical feature values may correspond to distinct class labels in different regions. Thus, learned predictive models may perform poorly in many local regions. My thesis investigates novel SBD analytics techniques to address some of these challenges. To date, I have been mostly focusing on the challenges of spatial autocorrelation and anisotropy via developing novel spatial classification models such as spatial decision trees for raster SBD (e.g., earth observation imagery). To scale up the proposed models, I developed efficient learning algorithms via computational pruning. The proposed techniques have been applied to real world remote sensing imagery for wetland mapping. I also had developed spatial ensemble learning framework to address the challenge of spatial heterogeneity, particularly the class ambiguity issues in geographical classification, i.e., samples with the same feature values belong to different classes in different spatial zones. Evaluations on three real world remote sensing datasets confirmed that proposed spatial ensemble learning outperforms current approaches such as bagging, boosting, and mixture of experts when class ambiguity exists

    A geographic knowledge discovery approach to property valuation

    Get PDF
    This thesis involves an investigation of how knowledge discovery can be applied in the area Geographic Information Science. In particular, its application in the area of property valuation in order to reveal how different spatial entities and their interactions affect the price of the properties is explored. This approach is entirely data driven and does not require previous knowledge of the area applied. To demonstrate this process, a prototype system has been designed and implemented. It employs association rule mining and associative classification algorithms to uncover any existing inter-relationships and perform the valuation. Various algorithms that perform the above tasks have been proposed in the literature. The algorithm developed in this work is based on the Apriori algorithm. It has been however, extended with an implementation of a ‘Best Rule’ classification scheme based on the Classification Based on Associations (CBA) algorithm. For the modelling of geographic relationships a graph-theoretic approach has been employed. Graphs have been widely used as modelling tools within the geography domain, primarily for the investigation of network-type systems. In the current context, the graph reflects topological and metric relationships between the spatial entities depicting general spatial arrangements. An efficient graph search algorithm has been developed, based on the Djikstra shortest path algorithm that enables the investigation of relationships between spatial entities beyond first degree connectivity. A case study with data from three central London boroughs has been performed to validate the methodology and algorithms, and demonstrate its effectiveness for computer aided property valuation. In addition, through the case study, the influence of location in the value of properties in those boroughs has been examined. The results are encouraging as they demonstrate the effectiveness of the proposed methodology and algorithms, provided that the data is appropriately pre processed and is of high quality

    Searching and mining in enriched geo-spatial data

    Get PDF
    The emergence of new data collection mechanisms in geo-spatial applications paired with a heightened tendency of users to volunteer information provides an ever-increasing flow of data of high volume, complex nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Such data bearing additional information from multiple sources like probability distributions, text or numerical attributes, social context, or multimedia content can be called multi-enriched. Searching and mining this abundance of information holds many challenges, if all of the data's potential is to be released. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or were accepted into various renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks where traditional methods optimise results based on absolute and ofttimes singular metrics, i.e., finding the shortest paths based on distance or the best trade-off between distance and travel time. Integrating additional aspects like qualitative or social data by enriching the data model with knowledge derived from sources as mentioned above allows for queries that can be issued to fit a broader scope of needs or preferences. This thesis presents two implementations of incorporating multi-enriched data into road networks. In one case, a range of qualitative data sources is evaluated to gain knowledge about user preferences which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data. In a second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications for this include finding parking spots. Social media trends are an emerging research area giving insight in user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trend archetypes to predict future dissemination of a trend. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner. This thesis presents techniques for clustering uncertain data as well as query processing, that take the additional information from uncertainty models into account while preserving scalability using a sampling-based approach, while previous approaches could only provide one of the two. The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.Das Erscheinen neuer Methoden zur Datenerhebung in räumlichen Applikationen gepaart mit einer erhöhten Bereitschaft der Nutzer, Daten über sich preiszugeben, generiert einen stetig steigenden Fluss von Daten in großer Menge, komplexer Natur, und oft gepaart mit inhärenter Unsicherheit. Beispiele für solche Mechanismen sind Crowdsourcing, automatisierte Wissensinferenz, Tracking, und Daten aus sozialen Medien. Derartige Daten, angereichert mit mit zusätzlichen Informationen aus verschiedenen Quellen wie Wahrscheinlichkeitsverteilungen, Text- oder numerische Attribute, sozialem Kontext, oder Multimediainhalten, werden als multi-enriched bezeichnet. Suche und Datamining in dieser weiten Datenmenge hält viele Herausforderungen bereit, wenn das gesamte Potenzial der Daten genutzt werden soll. Diese Arbeit geht auf mehrere große Fragestellungen in diesem Feld ein, insbesondere Pfadanfragen in multi-enriched Daten, Trend-mining in Daten aus sozialen Netzwerken, und die Beherrschung von Unsicherheit in räumlichen Daten. In all diesen Fällen haben die entwickelten Methoden signifikante Forschungsbeiträge geleistet und wurden veröffentlicht oder angenommen zu diversen renommierten internationalen, von Experten begutachteten Konferenzen und Journals. Ein gängiges Anwendungsgebiet räumlicher Daten sind Pfadanfragen in Straßennetzwerken, wo traditionelle Methoden die Resultate anhand absoluter und oft auch singulärer Maße optimieren, d.h., der kürzeste Pfad in Bezug auf die Distanz oder der beste Kompromiss zwischen Distanz und Reisezeit. Durch die Integration zusätzlicher Aspekte wie qualitativer Daten oder Daten aus sozialen Netzwerken als Anreicherung des Datenmodells mit aus diesen Quellen abgeleitetem Wissen werden Anfragen möglich, die ein breiteres Spektrum an Anforderungen oder Präferenzen erfüllen. Diese Arbeit präsentiert zwei Ansätze, solche multi-enriched Daten in Straßennetze einzufügen. Zum einen wird eine Reihe qualitativer Datenquellen ausgewertet, um Wissen über Nutzerpräferenzen zu generieren, welches darauf mit Örtlichkeiten im Straßennetz abgeglichen und in das Netz integriert wird. Diverse Methoden werden präsentiert, die stark personalisierbare Pfadanfragen ermöglichen, die ein weites Spektrum an Daten mit einbeziehen. Im zweiten Fall wird ein Framework präsentiert, das eine Ressourcenverteilung im Straßennetzwerk modelliert, bei der einmal verbrauchte Ressourcen erneut auftauchen können. Resultierende Pfade ergeben einen maximalen Ertrag basieren auf einer probabilistischen Evaluation der verfügbaren Ressourcen. Eine Anwendung ist die Suche nach Parkplätzen. Trends in sozialen Medien sind ein entstehendes Forscchungsgebiet, das Einblicke in Benutzerverhalten und wichtige Themen zulässt. Solche Trends bestehen aus großen Mengen an Nachrichten zu einem bestimmten Thema innerhalb eines Zeitfensters, so dass die Auftrittsfrequenz signifikant über den durchschnittlichen Level liegt. Durch die Untersuchung der Fortpflanzung solcher Trends in Raum und Zeit präsentiert diese Arbeit Methoden, um Trends nach Archetypen zu klassifizieren und ihren zukünftigen Weg vorherzusagen. Die Anfragebearbeitung und Datamining in unsicheren Daten ist besonders herausfordernd, insbesondere im Hinblick auf das notwendige Zusatzwissen, um Resultate mit probabilistischen Garantien zu erzielen. Solches Wissen ist nicht immer verfügbar und Anfragen lassen sich aufgrund der \P-Vollständigkeit des Problems nicht ohne Weiteres auf größere Datensätze skalieren. Dennoch kann Datenunsicherheit wertvollen Einblick in die Struktur der Daten liefern, der mit deterministischen Methoden nicht erreichbar wäre. Diese Arbeit präsentiert Techniken zum Clustering unsicherer Daten sowie zur Anfragebearbeitung, die die Zusatzinformation aus dem Unsicherheitsmodell in Betracht ziehen, jedoch gleichzeitig die Skalierbarkeit des Ansatzes auf große Datenmengen sicherstellen

    RouteKG: A knowledge graph-based framework for route prediction on road networks

    Full text link
    Short-term route prediction on road networks allows us to anticipate the future trajectories of road users, enabling a plethora of intelligent transportation applications such as dynamic traffic control or personalized route recommendation. Despite recent advances in this area, existing methods focus primarily on learning sequential transition patterns, neglecting the inherent spatial structural relations in road networks that can affect human routing decisions. To fill this gap, this paper introduces RouteKG, a novel Knowledge Graph-based framework for route prediction. Specifically, we construct a Knowledge Graph on the road network, thereby learning and leveraging spatial relations, especially moving directions, which are crucial for human navigation. Moreover, an n-ary tree-based algorithm is introduced to efficiently generate top-K routes in a batch mode, enhancing scalability and computational efficiency. To further optimize the prediction performance, a rank refinement module is incorporated to fine-tune the candidate route rankings. The model performance is evaluated using two real-world vehicle trajectory datasets from two Chinese cities, Chengdu and Shanghai, under various practical scenarios. The results demonstrate a significant improvement in accuracy over baseline methods.We further validate our model through a case study that utilizes the pre-trained model as a simulator for real-time traffic flow estimation at the link level. The proposed RouteKG promises wide-ranging applications in vehicle navigation, traffic management, and other intelligent transportation tasks

    A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data

    Get PDF
    Traffic safety is an important part of the roadway in sustainable development. Freeway traffic crashes typically cause serious casualties and property losses, being a serious threat to public safety. Figuring out the potential correlation between various risk factors and revealing their coupling mechanisms are of effective ways to explore and identity freeway crash causes. However, the existing association rule mining algorithms still have some limitations in both efficiency and accuracy. Based on this consideration, using the freeway traffic crash data obtained from WDOT (Washington Department of Transportation), this research constructed a multi-dimensional multilevel system for traffic crash analysis. Considering the load balancing, the FP-Growth (Frequent Pattern- Growth) algorithm was optimized parallelly based on Hadoop platform, to achieve an efficient and accurate association rule mining calculation for massive amounts of traffic crash data; then, according to the results of the coupling mechanism among the crash precursors, the causes of freeway traffic crashes were identified and revealed. The results show that the parallel FPgrowth algorithm with load balancing constraints has a better operating speed than both the conventional FP-growth algorithm and parallel FP-growth algorithm towards processing big data. This improved algorithm makes full use of Hadoop cluster resources and is more suitable for large traffic crash data sets mining while retaining the original advantages of conventional association rule mining algorithm. In addition, the mining association rules model with the improvement of multi-dimensional interaction proposed in this research can catch the occurrence mechanism of freeway traffic crash with serious consequences (lower support degree probably) accurately and efficiently

    A Simple Baseline for Travel Time Estimation using Large-Scale Trip Data

    Full text link
    The increased availability of large-scale trajectory data around the world provides rich information for the study of urban dynamics. For example, New York City Taxi Limousine Commission regularly releases source-destination information about trips in the taxis they regulate. Taxi data provide information about traffic patterns, and thus enable the study of urban flow -- what will traffic between two locations look like at a certain date and time in the future? Existing big data methods try to outdo each other in terms of complexity and algorithmic sophistication. In the spirit of "big data beats algorithms", we present a very simple baseline which outperforms state-of-the-art approaches, including Bing Maps and Baidu Maps (whose APIs permit large scale experimentation). Such a travel time estimation baseline has several important uses, such as navigation (fast travel time estimates can serve as approximate heuristics for A search variants for path finding) and trip planning (which uses operating hours for popular destinations along with travel time estimates to create an itinerary).Comment: 12 page

    Enhancing Road Infrastructure Monitoring: Integrating Drones for Weather-Aware Pothole Detection

    Get PDF
    The abstract outlines the research proposal focused on the utilization of Unmanned Aerial Vehicles (UAVs) for monitoring potholes in road infrastructure affected by various weather conditions. The study aims to investigate how different materials used to fill potholes, such as water, grass, sand, and snow-ice, are impacted by seasonal weather changes, ultimately affecting the performance of pavement structures. By integrating weather-aware monitoring techniques, the research seeks to enhance the rigidity and resilience of road surfaces, thereby contributing to more effective pavement management systems. The proposed methodology involves UAV image-based monitoring combined with advanced super-resolution algorithms to improve image refinement, particularly at high flight altitudes. Through case studies and experimental analysis, the study aims to assess the geometric precision of 3D models generated from aerial images, with a specific focus on road pavement distress monitoring. Overall, the research aims to address the challenges of traditional road failure detection methods by exploring cost-effective 3D detection techniques using UAV technology, thereby ensuring safer roadways for all users
    corecore