    Uncertain Voronoi cell computation based on space decomposition

    LNCS v. 9239, entitled Advances in Spatial and Temporal Databases: 14th International Symposium, SSTD 2015 ... Proceedings. The problem of computing Voronoi cells for spatial objects whose locations are uncertain has recently been studied. In this work, we propose a new approach to computing Voronoi cells for objects with rectangular uncertainty regions. Since exact computation of Voronoi cells is hard, we propose an approximate solution. The main idea of this solution is to apply hierarchical access methods for both data and object space. Our space index is used to efficiently find spatial regions that must (or must not) be inside a Voronoi cell. Our object index is used to efficiently identify Delaunay relations, i.e., data objects that affect the shape of a Voronoi cell. We develop three algorithms to explore the index structures and show that the approach that descends both index structures in parallel yields fast query processing times. Our experiments show that we are able to approximate uncertain Voronoi cells much more effectively than the state of the art and, at the same time, improve run-time performance.
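The space-decomposition idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the "possible nearest neighbor" semantics (a point belongs to the uncertain cell of object o if o could be its nearest neighbor), uses a plain quadtree over a 2D domain instead of the paper's index structures, and scans all objects linearly instead of using an object index. All names (`decompose`, `certainly_outside`, `certainly_inside`) are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[Tuple[float, float], Tuple[float, float]]  # ((xlo, ylo), (xhi, yhi))

def _dims(a: Rect, b: Rect):
    # Yield per-dimension intervals (a_lo, a_hi, b_lo, b_hi).
    for d in range(2):
        yield a[0][d], a[1][d], b[0][d], b[1][d]

def min_min(c: Rect, o: Rect) -> float:
    # Smallest squared min-distance from any point of cell c to rectangle o.
    return sum(max(ol - ch, cl - oh, 0.0) ** 2 for cl, ch, ol, oh in _dims(c, o))

def max_min(c: Rect, o: Rect) -> float:
    # Largest squared min-distance from any point of cell c to rectangle o.
    return sum(max(ol - cl, ch - oh, 0.0) ** 2 for cl, ch, ol, oh in _dims(c, o))

def max_max(c: Rect, o: Rect) -> float:
    # Largest squared max-distance from any point of cell c to rectangle o.
    return sum(max(ch - ol, oh - cl) ** 2 for cl, ch, ol, oh in _dims(c, o))

def min_max(c: Rect, o: Rect) -> float:
    # Smallest squared max-distance from any point of cell c to rectangle o.
    total = 0.0
    for cl, ch, ol, oh in _dims(c, o):
        m = min(max((ol + oh) / 2.0, cl), ch)  # midpoint of o clamped into c
        total += max(abs(m - ol), abs(m - oh)) ** 2
    return total

def certainly_outside(c: Rect, o: Rect, others: List[Rect]) -> bool:
    # Some other object is always closer than o, everywhere in the cell.
    return any(max_max(c, p) < min_min(c, o) for p in others)

def certainly_inside(c: Rect, o: Rect, others: List[Rect]) -> bool:
    # o may be the nearest neighbor of every point in the cell.
    return all(max_min(c, o) <= min_max(c, p) for p in others)

def decompose(domain: Rect, o: Rect, others: List[Rect],
              max_depth: int) -> Dict[str, List[Rect]]:
    result: Dict[str, List[Rect]] = {"in": [], "out": [], "undecided": []}

    def visit(c: Rect, depth: int) -> None:
        if certainly_outside(c, o, others):
            result["out"].append(c)
        elif certainly_inside(c, o, others):
            result["in"].append(c)
        elif depth == max_depth:
            result["undecided"].append(c)
        else:
            (xl, yl), (xh, yh) = c
            xm, ym = (xl + xh) / 2.0, (yl + yh) / 2.0
            for child in (((xl, yl), (xm, ym)), ((xm, yl), (xh, ym)),
                          ((xl, ym), (xm, yh)), ((xm, ym), (xh, yh))):
                visit(child, depth + 1)

    visit(domain, 0)
    return result
```

Both cell tests are conservative: a cell is labeled only when the bound holds for every point in it, and cells that cannot be decided at the depth limit stay "undecided", so a larger `max_depth` tightens the approximation at the cost of more cells.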

    Non-zero probability of nearest neighbor searching

    Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is to efficiently report the data nearest to a given query object. Most studies assume that both the data and the query are precise. However, in real applications of NN searching, such as tracking and location services, GIS and data mining, both may be imprecise. In this situation, a natural approach is to report the data that have a nonzero probability of being the nearest neighbor of a given query, called nonzero nearest neighbors. Formally, let P be a set of n uncertain points modeled by some regions. We first consider the following variation of the NN searching problem under uncertainty. If both the query and the data are uncertain points modeled by distinct unit segments parallel to the x-axis, we propose an efficient algorithm that reports nonzero nearest neighbors under the Manhattan metric in O(n^2 α(n^2)) preprocessing and O(log n + k) query time, where α(.) is the extremely slowly growing functional inverse of Ackermann's function. Finally, for segments of arbitrary length parallel to the x-axis, we propose an approximation algorithm that reports nonzero nearest neighbors with maximum error L in O(n^2 α(n^2)) preprocessing and O(log n + k) query time, where L is the length of the query.

    Voronoi classified and clustered constellation data structure for three-dimensional urban buildings

    In the past few years, urban areas have grown rapidly, producing an immense number of urban datasets. This situation makes issues related to urban areas harder to handle and manage. Huge datasets can degrade the performance of data retrieval and information analysis. In addition, urban environments are very difficult to manage because they involve various types of data, such as multiple types of zoning themes in urban mixed-use development. Thus, a special technique for efficient data handling and management is necessary. In this study, a new three-dimensional (3D) spatial access method, the Voronoi Classified and Clustered Data Constellation (VOR-CCDC), is introduced. The VOR-CCDC data structure operates on the basis of two filters, classification and clustering. To boost data retrieval performance, VOR-CCDC offers a minimal percentage of overlap among nodes and a minimal coverage area in order to avoid repetitive data entry and multi-path queries. In addition, the VOR-CCDC data structure is supplemented with nearest neighbour information. The neighbouring information encoded in the Voronoi diagram allows VOR-CCDC to explore the data efficiently. Three types of nearest neighbour queries are presented in this study to verify VOR-CCDC's ability to find nearest neighbour information: the Single Search Nearest Neighbour query, the k Nearest Neighbour (kNN) query and the Reverse k Nearest Neighbour (RkNN) query. Each query is tested with two types of 3D datasets, single layer and multi-layer. The tests demonstrate that VOR-CCDC performs less input/output than its best competitor, the 3D R-Tree. VOR-CCDC is also evaluated for response time. The results indicate that VOR-CCDC outperforms its competitor by responding 60 to 80 percent faster to query operations.
In the future, the VOR-CCDC structure is expected to be extended to temporal and dynamic objects. The VOR-CCDC structure can also be used in other applications, such as brain cell databases for analysing the spatial arrangement of neurons, or the analysis of protein chain reactions in bioinformatics.

    Voronoi-based nearest neighbor search for multi-dimensional uncertain databases

    In Voronoi-based nearest neighbor search, the Voronoi cell of every point p in a database can be used to check whether p is the closest to some query point q. We extend the notion of Voronoi cells to support uncertain objects, whose attribute values are inexact. Particularly, we propose the Possible Voronoi cell (or PV-cell). A PV-cell of a multi-dimensional uncertain object o is a region R, such that for any point p ∈ R, o may be the nearest neighbor of p. If the PV-cells of all objects in a database S are known, they can be used to identify objects that have a chance to be the nearest neighbor of q. However, there is no efficient algorithm for computing an exact PV-cell. We hence study how to derive an axis-parallel hyper-rectangle (called the Uncertain Bounding Rectangle, or UBR) that tightly contains a PV-cell. We further develop the PV-index, a structure that stores UBRs, to evaluate probabilistic nearest neighbor queries over uncertain data. An advantage of the PV-index is that upon updates on S, it can be incrementally updated. Extensive experiments on both synthetic and real datasets are carried out to validate the performance of the PV-index.
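The PV-cell definition above amounts to a point-wise test, sketched below in Python as a brute-force baseline. It is not the PV-index or the UBR construction: it assumes each uncertain object is an axis-parallel hyper-rectangle inside which the object may lie anywhere, independently of the others, and it uses Euclidean distance. The names `min_dist`, `max_dist` and `possible_nearest_neighbors` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# An uncertain object: (lower corner, upper corner) of its uncertainty region.
Rect = Tuple[Sequence[float], Sequence[float]]

def min_dist(q: Sequence[float], rect: Rect) -> float:
    # Distance from q to the closest possible location inside rect.
    lo, hi = rect
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))

def max_dist(q: Sequence[float], rect: Rect) -> float:
    # Distance from q to the farthest possible location inside rect.
    lo, hi = rect
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2
                         for x, l, h in zip(q, lo, hi)))

def possible_nearest_neighbors(q: Sequence[float],
                               objects: List[Rect]) -> List[int]:
    # Object i may be the nearest neighbor of q only if its best case is no
    # worse than the worst case of every object; equivalently, q lies in the
    # PV-cell of object i under the assumptions stated above.
    threshold = min(max_dist(q, r) for r in objects)
    return [i for i, r in enumerate(objects) if min_dist(q, r) <= threshold]
```

For example, with `objects = [((0, 0), (1, 1)), ((2, 0), (3, 1)), ((10, 10), (11, 11))]`, calling `possible_nearest_neighbors((1.5, 0.5), objects)` keeps the first two objects and prunes the third. The test scans every object per query, which is the cost an index over precomputed bounding rectangles is meant to avoid.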

    Processing Incomplete k Nearest Neighbor Search

    Searching and mining in enriched geo-spatial data

    The emergence of new data collection mechanisms in geo-spatial applications, paired with a heightened tendency of users to volunteer information, provides an ever-increasing flow of data that is high in volume, complex in nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Such data, bearing additional information from multiple sources like probability distributions, text or numerical attributes, social context, or multimedia content, can be called multi-enriched. Searching and mining this abundance of information holds many challenges if all of the data's potential is to be released. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or were accepted into various renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks, where traditional methods optimise results based on absolute and often singular metrics, i.e., finding the shortest paths based on distance or the best trade-off between distance and travel time. Integrating additional aspects like qualitative or social data, by enriching the data model with knowledge derived from the sources mentioned above, allows for queries that fit a broader scope of needs or preferences. This thesis presents two implementations of incorporating multi-enriched data into road networks. In one case, a range of qualitative data sources is evaluated to gain knowledge about user preferences, which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data.
In a second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications for this include finding parking spots. Social media trends are an emerging research area giving insight into user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trend archetypes in order to predict the future dissemination of a trend. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner. This thesis presents techniques for clustering uncertain data and for query processing that take the additional information from uncertainty models into account while preserving scalability through a sampling-based approach, whereas previous approaches could provide only one of the two.
The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.

    Coping with distance and location dependencies in spatial, temporal and uncertain data

