419 research outputs found

    Reverse k Nearest Neighbor Search over Trajectories

    Full text link
    GPS enables mobile devices to continuously provide new opportunities to improve our daily lives. For example, the data collected in applications created by Uber or Public Transport Authorities can be used to plan transportation routes, estimate capacities, and proactively identify low coverage areas. In this paper, we study a new kind of query-Reverse k Nearest Neighbor Search over Trajectories (RkNNT), which can be used for route planning and capacity estimation. Given a set of existing routes DR, a set of passenger transitions DT, and a query route Q, a RkNNT query returns all transitions that take Q as one of its k nearest travel routes. To solve the problem, we first develop an index to handle dynamic trajectory updates, so that the most up-to-date transition data are available for answering a RkNNT query. Then we introduce a filter refinement framework for processing RkNNT queries using the proposed indexes. Next, we show how to use RkNNT to solve the optimal route planning problem MaxRkNNT (MinRkNNT), which is to search for the optimal route from a start location to an end location that could attract the maximum (or minimum) number of passengers based on a pre-defined travel distance threshold. Experiments on real datasets demonstrate the efficiency and scalability of our approaches. To the best of our best knowledge, this is the first work to study the RkNNT problem for route planning.Comment: 12 page

    Towards a Scalable Dynamic Spatial Database System

    Get PDF
    With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a novel distributed architecture to solve the scalability issues.Comment: (2012

    Continuous Spatial Query Processing:A Survey of Safe Region Based Techniques

    Get PDF
    In the past decade, positioning system-enabled devices such as smartphones have become most prevalent. This functionality brings the increasing popularity of location-based services in business as well as daily applications such as navigation, targeted advertising, and location-based social networking. Continuous spatial queries serve as a building block for location-based services. As an example, an Uber driver may want to be kept aware of the nearest customers or service stations. Continuous spatial queries require updates to the query result as the query or data objects are moving. This poses challenges to the query efficiency, which is crucial to the user experience of a service. A large number of approaches address this efficiency issue using the concept of safe region . A safe region is a region within which arbitrary movement of an object leaves the query result unchanged. Such a region helps reduce the frequency of query result update and hence improves query efficiency. As a result, safe region-based approaches have been popular for processing various types of continuous spatial queries. Safe regions have interesting theoretical properties and are worth in-depth analysis. We provide a comparative study of safe region-based approaches. We describe how safe regions are computed for different types of continuous spatial queries, showing how they improve query efficiency. We compare the different safe region-based approaches and discuss possible further improvements

    Improving Performance of Spatial Network Queries

    Get PDF
    Spatial network queries, for example KNN or range, operate on systems where objects are constrained to locations on a network. Current spatial network query algorithms rely on forms of network traversal which have a high complexity proportional to the size of the network making, them poor for large real-world networks. In this thesis, an alternative method of approximating the results of spatial network queries with a high level of accuracy is introduced. Distances between network points are stored in an M-Tree index, a balanced tree index where metric distance determines data ordering. The M-Tree uses the chessboard metric on network points embedded in a higher dimensional space using tRNE. Using the M-Tree both KNN and range queries are computed more efficiently than network traversal. Error rates of the M-Tree are low, with accuracies of 97% possible on KNN queries and perfect accuracy with 2% extra results on range queries

    At-Most-Hexa Meshes

    Get PDF
    AbstractVolumetric polyhedral meshes are required in many applications, especially for solving partial differential equations on finite element simulations. Still, their construction bears several additional challenges compared to boundary‐based representations. Tetrahedral meshes and (pure) hex‐meshes are two popular formats in scenarios like CAD applications, offering opposite advantages and disadvantages. Hex‐meshes are more intricate to construct due to the global structure of the meshing, but feature much better regularity, alignment, are more expressive, and offer the same simulation accuracy with fewer elements. Hex‐dominant meshes, where most but not all cell elements have a hexahedral structure, constitute an attractive compromise, potentially unlocking benefits from both structures, but their generality makes their employment in downstream applications difficult. In this work, we introduce a strict subset of general hex‐dominant meshes, which we term 'at‐most‐hexa meshes', in which most cells are still hexahedral, but no cell has more than six boundary faces, and no face has more than four sides. We exemplify the ease of construction of at‐most‐hexa meshes by proposing a frugal and straightforward method to generate high‐quality meshes of this kind, starting directly from hulls or point clouds, for example, from a 3D scan. In contrast to existing methods for (pure) hexahedral meshing, ours does not require an intermediate parameterization of other costly pre‐computations and can start directly from surfaces or samples. We leverage a Lloyd relaxation process to exploit the synergistic effects of aligning an orientation field in a modified 3D Voronoi diagram using the norm for cubical cells. The extracted geometry incorporates regularity as well as feature alignment, following sharp edges and curved boundary surfaces. We introduce specialized operations on the three‐dimensional graph structure to enforce consistency during the relaxation. The resulting algorithm allows for an efficient evaluation with parallel algorithms on GPU hardware and completes even large reconstructions within minutes

    A Systematic Survey of Classification Algorithms for Cancer Detection

    Get PDF
    Cancer is a fatal disease induced by the occurrence of a count of inherited issues and also a count of pathological changes. Malignant cells are dangerous abnormal areas that could develop in any part of the human body, posing a life-threatening threat. To establish what treatment options are available, cancer, also referred as a tumor, should be detected early and precisely. The classification of images for cancer diagnosis is a complex mechanism that is influenced by a diverse of parameters. In recent years, artificial vision frameworks have focused attention on the classification of images as a key problem. Most people currently rely on hand-made features to demonstrate an image in a specific manner. Learning classifiers such as random forest and decision tree were used to determine a final judgment. When there are a vast number of images to consider, the difficulty occurs. Hence, in this paper, weanalyze, review, categorize, and discuss current breakthroughs in cancer detection utilizing machine learning techniques for image recognition and classification. We have reviewed the machine learning approaches like logistic regression (LR), Naïve Bayes (NB), K-nearest neighbors (KNN), decision tree (DT), and Support Vector Machines (SVM)

    Outlier Detection In Big Data

    Get PDF
    The dissertation focuses on scaling outlier detection to work both on huge static as well as on dynamic streaming datasets. Outliers are patterns in the data that do not conform to the expected behavior. Outlier detection techniques are broadly applied in applications ranging from credit fraud prevention, network intrusion detection to stock investment tactical planning. For such mission critical applications, a timely response often is of paramount importance. Yet the processing of outlier detection requests is of high algorithmic complexity and resource consuming. In this dissertation we investigate the challenges of detecting outliers in big data -- in particular caused by the high velocity of streaming data, the big volume of static data and the large cardinality of the input parameter space for tuning outlier mining algorithms. Effective optimization techniques are proposed to assure the responsiveness of outlier detection in big data. In this dissertation we first propose a novel optimization framework called LEAP to continuously detect outliers over data streams. The continuous discovery of outliers is critical for a large range of online applications that monitor high volume continuously evolving streaming data. LEAP encompasses two general optimization principles that utilize the rarity of the outliers and the temporal priority relationships among stream data points. Leveraging these two principles LEAP not only is able to continuously deliver outliers with respect to a set of popular outlier models, but also provides near real-time support for processing powerful outlier analytics workloads composed of large numbers of outlier mining requests with various parameter settings. Second, we develop a distributed approach to efficiently detect outliers over massive-scale static data sets. In this big data era, as the volume of the data advances to new levels, the power of distributed compute clusters must be employed to detect outliers in a short turnaround time. In this research, our approach optimizes key factors determining the efficiency of distributed data analytics, namely, communication costs and load balancing. In particular we prove the traditional frequency-based load balancing assumption is not effective. We thus design a novel cost-driven data partitioning strategy that achieves load balancing. Furthermore, we abandon the traditional one detection algorithm for all compute nodes approach and instead propose a novel multi-tactic methodology which adaptively selects the most appropriate algorithm for each node based on the characteristics of the data partition assigned to it. Third, traditional outlier detection systems process each individual outlier detection request instantiated with a particular parameter setting one at a time. This is not only prohibitively time-consuming for large datasets, but also tedious for analysts as they explore the data to hone in on the most appropriate parameter setting or on the desired results. We thus design an interactive outlier exploration paradigm that is not only able to answer traditional outlier detection requests in near real-time, but also offers innovative outlier analytics tools to assist analysts to quickly extract, interpret and understand the outliers of interest. Our experimental studies including performance evaluation and user studies conducted on real world datasets including stock, sensor, moving object, and Geolocation datasets confirm both the effectiveness and efficiency of the proposed approaches
    corecore