78 research outputs found

    Enhancing In-Memory Spatial Indexing with Learned Search

    Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems and applications for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing. Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within each partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioning techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based index structures outperform tree-based index structures (from 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial index structures outperform their original counterparts (from 1.44× to 53.34× faster), (iii) machine-learned search within a partition is faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishes in the presence of other compute-intensive operations (e.g. scan costs in higher selectivity queries, Haversine distance computation, and point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing the indexed partitions. Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatia
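
The idea of machine-learned search over single-dimensional sorted data can be illustrated with a minimal sketch. It is not the authors' implementation: the linear model, the function names `fit_linear_model` and `learned_lower_bound`, and the fallback to a full binary search are all assumptions made for illustration. A model predicts where a key should sit in the sorted array, and a binary search then runs only inside a small window around that prediction.

```python
from bisect import bisect_left


def fit_linear_model(keys):
    """Fit position ~ slope * key + intercept on sorted keys (least squares).

    Returns (slope, intercept, err), where err bounds the prediction error
    observed on the stored keys. Hypothetical helper, for illustration only.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = 0.0 if var == 0 else cov / var
    intercept = mean_p - slope * mean_k
    max_err = max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))
    return slope, intercept, int(max_err) + 1


def learned_lower_bound(keys, model, target):
    """Return the first index i with keys[i] >= target (same as bisect_left)."""
    slope, intercept, err = model
    n = len(keys)
    guess = int(slope * target + intercept)
    lo = max(0, min(n, guess - err))
    hi = max(0, min(n, guess + err + 1))
    # The window is only trusted if it brackets the answer; otherwise fall
    # back to an ordinary binary search over the whole array.
    left_ok = lo == 0 or keys[lo - 1] < target
    right_ok = hi == n or keys[hi] >= target
    if left_ok and right_ok:
        return bisect_left(keys, target, lo, hi)
    return bisect_left(keys, target)


# Example: one sorted coordinate inside a single partition (made-up values).
xs = sorted([3, 8, 8, 15, 21, 22, 40, 41, 57, 90])
model = fit_linear_model(xs)
start = learned_lower_bound(xs, model, 20)  # first index with xs[i] >= 20
```

A one-dimensional range filter can call `learned_lower_bound` once per range endpoint and scan the slice in between; the remaining dimensions still have to be checked during that scan.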

    Mutual information based feature subset selection in multivariate time series classification

    This paper deals with supervised classification of multivariate time series. In particular, the goal is to propose a filter method to select a subset of time series. Consequently, we adopt the framework proposed by Brown et al. [10]. The key point in this framework is the computation of the mutual information between the features, which allows us to measure the relevance of each feature subset. In our case, where the features are time series, we use an adaptation of existing nonparametric mutual information estimators based on the k-nearest neighbor. Specifically, for the purpose of bringing these methods to the time series scenario, we rely on the use of dynamic time warping dissimilarity. Our experimental results show that our method is able to strongly reduce the number of time series while keeping or increasing the classification accuracy. Grant agreement no. KK-2019/00095 IT1244-19 TIN2016-78365-R PID2019-104966GB-I0
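
The dynamic time warping dissimilarity mentioned above can be sketched as a short dynamic program. This is the textbook formulation with an absolute-difference local cost and no warping window; those choices and the function name `dtw_distance` are assumptions, and the paper's exact variant may differ. A k-nearest-neighbor mutual information estimator would use such a function in place of the Euclidean distance when ranking neighbors.

```python
def dtw_distance(a, b):
    """Dynamic time warping dissimilarity between two univariate series.

    Uses |x - y| as the local cost and allows the three standard steps
    (insertion, deletion, match). Runs in O(len(a) * len(b)) time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# Series of different lengths can still align perfectly after warping.
d = dtw_distance([0, 0, 1], [0, 1])
```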

    Efficient Sampling Algorithms for Approximate Motif Counting in Temporal Graph Streams

    A great variety of complex systems, from user interactions in communication networks to transactions in financial markets, can be modeled as temporal graphs consisting of a set of vertices and a series of timestamped and directed edges. Temporal motifs are generalized from subgraph patterns in static graphs which consider edge orderings and durations in addition to topologies. Counting the number of occurrences of temporal motifs is a fundamental problem for temporal network analysis. However, existing methods either cannot support temporal motifs or suffer from performance issues. Moreover, they cannot work in the streaming model where edges are observed incrementally over time. In this paper, we focus on approximate temporal motif counting via random sampling. We first propose two sampling algorithms for temporal motif counting in the offline setting. The first is an edge sampling (ES) algorithm for estimating the number of instances of any temporal motif. The second is an improved edge-wedge sampling (EWS) algorithm that hybridizes edge sampling with wedge sampling for counting temporal motifs with 3 vertices and 3 edges. Furthermore, we propose two algorithms to count temporal motifs incrementally in temporal graph streams by extending the ES and EWS algorithms referred to as SES and SEWS. We provide comprehensive analyses of the theoretical bounds and complexities of our proposed algorithms. Finally, we perform extensive experimental evaluations of our proposed algorithms on several real-world temporal graphs. The results show that ES and EWS have higher efficiency, better accuracy, and greater scalability than state-of-the-art sampling methods for temporal motif counting in the offline setting. Moreover, SES and SEWS achieve up to three orders of magnitude speedups over ES and EWS while having comparable estimation errors for temporal motif counting in the streaming setting. Comment: 27 pages, 11 figures; overlapped with arXiv:2007.1402

    Effectively Counting s-t Simple Paths in Directed Graphs

    An important tool in analyzing complex social and information networks is s-t simple path counting, which is known to be #P-complete. In this paper, we study efficient s-t simple path counting in directed graphs. For a given pair of vertices s and t in a directed graph, first we propose a pruning technique that can efficiently and considerably reduce the search space. Then, we discuss how this technique can be combined with exact and approximate algorithms to improve their efficiency. In the end, by performing extensive experiments over several networks from different domains, we show high empirical efficiency of our proposed technique. Our algorithm is not a competitor of existing methods; rather, it is a friend that can be used as a fast pre-processing step before applying any existing algorithm.
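
The role of pruning as a pre-processing step can be shown with a small sketch. The abstract does not describe the pruning rule, so the one used here is an assumption: keep only vertices that are reachable from s and can also reach t, since no s-t path can pass through any other vertex. The exact counter that follows is a plain backtracking search, and `count_simple_paths` is a hypothetical name.

```python
def count_simple_paths(edges, s, t):
    """Count simple s-t paths in a directed graph given as (u, v) pairs.

    Step 1 prunes vertices that cannot lie on any s-t path (an assumed,
    reachability-based rule). Step 2 counts paths exactly by backtracking,
    which is still exponential in the worst case.
    """
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    def reachable(start, adj):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    # Vertices both reachable from s and able to reach t.
    keep = reachable(s, succ) & reachable(t, pred)
    if s not in keep or t not in keep:
        return 0

    on_path = set()

    def dfs(u):
        if u == t:
            return 1
        on_path.add(u)
        total = 0
        for v in succ.get(u, ()):
            if v in keep and v not in on_path:
                total += dfs(v)
        on_path.remove(u)
        return total

    return dfs(s)


# "d" is a dead end and "x" is not reachable from "s"; both are pruned.
edges = [("s", "a"), ("s", "b"), ("a", "b"), ("b", "a"),
         ("a", "t"), ("b", "t"), ("b", "d"), ("x", "s")]
n_paths = count_simple_paths(edges, "s", "t")
```

Any other exact or approximate counter could replace the backtracking step and run on the pruned vertex set instead of the full graph.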

    Exploring Task-agnostic, ShapeNet-based Object Recognition for Mobile Robots

    This position paper presents an attempt to improve the scalability of existing object recognition methods, which largely rely on supervision and imply a huge availability of manually-labelled data points. Moreover, in the context of mobile robotics, data sets and experimental settings are often handcrafted based on the specific task the object recognition is aimed at, e.g. object grasping. In this work, we argue instead that publicly available open data such as ShapeNet can be used for object classification first, and then to link objects to their related concepts, leading to task-agnostic knowledge acquisition practices. To this aim, we evaluated five pipelines for object recognition, where target classes were all entities collected from ShapeNet and matching was based on: (i) shape-only features, (ii) RGB histogram comparison, (iii) a combination of shape and colour matching, (iv) image feature descriptors, and (v) inexact, normalised cross-correlation, resembling the Deep, Siamese-like NN architecture of Submariam et al. (2016). We discussed the relative impact of shape-derived and colour-derived features, as well as suitability of all tested solutions for future application to real-life use cases

    Extracting Contextualized Quantity Facts from Web Tables
