Enhancing In-Memory Spatial Indexing with Learned Search
Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems and applications for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing. Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within each partition to support point, range, distance, and spatial join queries. Following recent research trends, we tune the partitioning techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based index structures outperform tree-based index structures (by 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial index structures outperform their original counterparts (1.44× to 53.34× faster), (iii) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension, (iv) the benefit of machine-learned search diminishes in the presence of other compute-intensive operations (e.g., scan costs in higher-selectivity queries, Haversine distance computation, and point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing the indexed partitions.
Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatial
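To make "machine-learned search within a partition" concrete, here is a minimal sketch of learned search over one sorted partition, assuming the simplest possible model: a single least-squares line that predicts a key's array position, plus the maximum training error, which bounds the window in which a binary search must run. This is a generic illustration in the spirit of learned indexes, not the specific technique evaluated in the study; the class and method names (`LearnedSortedIndex`, `lower_bound`, `range_query`) are hypothetical, and only one dimension is handled.

```python
import bisect
from typing import List


class LearnedSortedIndex:
    """Learned search over one sorted partition (hypothetical sketch).

    A least-squares line predicts the position of a key; binary search is
    then confined to [pred - err, pred + err + 1], where err is the largest
    prediction error observed on the stored keys."""

    def __init__(self, keys: List[float]) -> None:
        if not keys:
            raise ValueError("partition must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        # Sorted keys give cov >= 0; clamping guards against float round-off,
        # so the prediction is non-decreasing in the key.
        self.slope = max(cov / var_k, 0.0) if var_k > 0 else 0.0
        self.intercept = mean_p - self.slope * mean_k
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key: float) -> int:
        return round(self.slope * key + self.intercept)

    def lower_bound(self, key: float) -> int:
        """Index of the first stored key >= key (same result as bisect_left)."""
        n = len(self.keys)
        p = self._predict(key)
        # Clamp the error window to [0, n] and keep lo <= hi.
        lo = min(max(p - self.err, 0), n)
        hi = max(lo, min(p + self.err + 1, n))
        return bisect.bisect_left(self.keys, key, lo, hi)

    def range_query(self, low: float, high: float) -> List[float]:
        """All stored keys in [low, high]: locate the start, then scan."""
        out = []
        i = self.lower_bound(low)
        while i < len(self.keys) and self.keys[i] <= high:
            out.append(self.keys[i])
            i += 1
        return out
```

For example, `LearnedSortedIndex([1, 3, 3, 7, 10, 15, 22]).range_query(3, 10)` returns `[3, 3, 7, 10]`. The narrower the error window, the fewer comparisons the final binary search needs; the scan that follows is unaffected, which matches the abstract's observation that the benefit shrinks when scan costs dominate.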
Mutual information based feature subset selection in multivariate time series classification
This paper deals with supervised classification of multivariate time series. In particular, the goal is to propose a filter method to select a subset of time series. To this end, we adopt the framework proposed by Brown et al. [10]. The key point in this framework is the computation of the mutual information between the features, which allows us to measure the relevance of each feature subset. In our case, where each feature is a time series, we use an adaptation of existing nonparametric mutual information estimators based on k-nearest neighbors. Specifically, to bring these methods to the time series scenario, we rely on the dynamic time warping dissimilarity. Our experimental results show that our method strongly reduces the number of time series while maintaining or increasing classification accuracy.
Grant agreement no. KK-2019/00095
IT1244-19
TIN2016-78365-R
PID2019-104966GB-I0
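The k-nearest-neighbour mutual information estimators mentioned above rely on distances to each sample's k-th neighbour; when each sample is a time series, the abstract states that dynamic time warping (DTW) supplies those distances. The sketch below shows only that building block, assuming classic unconstrained DTW with an absolute-difference local cost (the abstract does not specify a cost or warping window). The function names `dtw_distance` and `knn_dtw_radius` are hypothetical, and this is not the authors' estimator.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference
    local cost and no warping-window constraint."""
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j] = cost of the best alignment of a[:i] with b[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def knn_dtw_radius(series: List[Sequence[float]], idx: int, k: int) -> float:
    """DTW distance from series[idx] to its k-th nearest other series,
    the quantity that k-NN mutual information estimators consume."""
    dists = sorted(
        dtw_distance(series[idx], s) for j, s in enumerate(series) if j != idx
    )
    return dists[k - 1]
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0`, because warping lets the repeated `2` align with a single sample, whereas a pointwise Euclidean comparison would not even be defined for series of different lengths.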
Efficient Sampling Algorithms for Approximate Motif Counting in Temporal Graph Streams
A great variety of complex systems, from user interactions in communication networks to transactions in financial markets, can be modeled as temporal graphs consisting of a set of vertices and a series of timestamped, directed edges. Temporal motifs generalize subgraph patterns in static graphs by considering edge orderings and durations in addition to topologies. Counting the occurrences of temporal motifs is a fundamental problem in temporal network analysis. However, existing methods either cannot support temporal motifs or suffer from performance issues. Moreover, they cannot work in the streaming model, where edges are observed incrementally over time. In this paper, we focus on approximate temporal motif counting via random sampling. We first propose two sampling algorithms for temporal motif counting in the offline setting. The first is an edge sampling (ES) algorithm for estimating the number of instances of any temporal motif. The second is an improved edge-wedge sampling (EWS) algorithm that hybridizes edge sampling with wedge sampling for counting temporal motifs with vertices and edges. Furthermore, we propose two algorithms, referred to as SES and SEWS, that count temporal motifs incrementally in temporal graph streams by extending the ES and EWS algorithms, respectively. We provide comprehensive analyses of the theoretical bounds and complexities of our proposed algorithms. Finally, we perform extensive experimental evaluations of our proposed algorithms on several real-world temporal graphs. The results show that ES and EWS achieve higher efficiency, better accuracy, and greater scalability than state-of-the-art sampling methods for temporal motif counting in the offline setting. Moreover, SES and SEWS achieve speedups of up to three orders of magnitude over ES and EWS while having comparable estimation errors for temporal motif counting in the streaming setting.
Comment: 27 pages, 11 figures; overlapped with arXiv:2007.1402
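The idea behind edge sampling can be illustrated on a deliberately tiny case. The sketch below assumes a hypothetical motif, a 2-edge temporal path u→v then v→w with t1 < t2 ≤ t1 + δ, and keeps each edge independently with probability p as a candidate first edge. Each motif instance has exactly one first edge, so summing the instances anchored at kept edges and scaling by 1/p gives an unbiased estimate. This is a simplified illustration of the general principle, not the paper's ES algorithm, and all function names are hypothetical.

```python
import bisect
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int, int]  # (source, target, timestamp)


def build_out_index(edges: List[Edge]) -> Dict[int, List[int]]:
    """Map each vertex to the sorted timestamps of its outgoing edges."""
    out: Dict[int, List[int]] = defaultdict(list)
    for u, _, t in edges:
        out[u].append(t)
    for ts in out.values():
        ts.sort()
    return out


def instances_from(edge: Edge, out: Dict[int, List[int]], delta: int) -> int:
    """Count 2-edge temporal paths u->v->w that start with edge = (u, v, t1):
    the second edge leaves v at t2 with t1 < t2 <= t1 + delta."""
    _, v, t1 = edge
    ts = out.get(v, [])
    return bisect.bisect_right(ts, t1 + delta) - bisect.bisect_right(ts, t1)


def exact_count(edges: List[Edge], delta: int) -> int:
    out = build_out_index(edges)
    return sum(instances_from(e, out, delta) for e in edges)


def edge_sampling_estimate(edges: List[Edge], delta: int, p: float,
                           seed: int = 0) -> float:
    """Keep each edge independently with probability p, count the instances
    whose first edge was kept, and scale by 1/p (unbiased, since every
    instance has exactly one first edge)."""
    rng = random.Random(seed)
    out = build_out_index(edges)
    kept = sum(instances_from(e, out, delta) for e in edges if rng.random() < p)
    return kept / p
```

With `p = 1.0` every edge is kept and the estimate equals `exact_count`; smaller `p` trades variance for fewer per-edge counting calls.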
Effectively Counting s-t Simple Paths in Directed Graphs
An important tool in analyzing complex social and information networks is s-t simple path counting, which is known to be #P-complete. In this paper, we study efficient s-t simple path counting in directed graphs. For a given pair of vertices s and t in a directed graph, we first propose a pruning technique that can efficiently and considerably reduce the search space. Then, we discuss how this technique can be combined with exact and approximate algorithms to improve their efficiency. Finally, through extensive experiments on several networks from different domains, we show the high empirical efficiency of our proposed technique. Our algorithm is not a competitor to existing methods; rather, it is a complement that can be used as a fast pre-processing step before applying any existing algorithm.
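As an illustration of pruning used as a pre-processing step, the sketch below applies one standard, simple reduction: a vertex can lie on an s-t simple path only if it is reachable from s and can reach t, so every other vertex is discarded before an exact depth-first enumeration. This is an assumed stand-in, not the pruning technique proposed in the paper, which the abstract does not describe; the function names are hypothetical, and the plain DFS is exponential in the worst case, as expected for a #P-complete problem.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency lists of a directed graph


def reachable(graph: Graph, source: int) -> Set[int]:
    """Vertices reachable from source (BFS), including source itself."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def reverse(graph: Graph) -> Graph:
    rev: Graph = {}
    for u, nbrs in graph.items():
        for w in nbrs:
            rev.setdefault(w, []).append(u)
    return rev


def count_simple_paths(graph: Graph, s: int, t: int) -> int:
    # Pruning: keep vertices reachable from s that can also reach t.
    keep = reachable(graph, s) & reachable(reverse(graph), t)
    if s not in keep:  # t is not reachable from s
        return 0
    pruned = {u: [w for w in graph.get(u, []) if w in keep] for u in keep}

    count = 0
    on_path = {s}

    def dfs(u: int) -> None:
        nonlocal count
        if u == t:
            count += 1
            return
        for w in pruned[u]:
            if w not in on_path:  # simple paths never revisit a vertex
                on_path.add(w)
                dfs(w)
                on_path.remove(w)

    dfs(s)
    return count
```

Because the pruned graph is just a smaller input, the same filter could sit in front of any exact or sampling-based counter, which is the "pre-processing step" role the abstract describes.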
Exploring Task-agnostic, ShapeNet-based Object Recognition for Mobile Robots
This position paper presents an attempt to improve the scalability of existing object recognition methods, which largely rely on supervision and require large amounts of manually labelled data. Moreover, in the context of mobile robotics, data sets and experimental settings are often handcrafted for the specific task that object recognition serves, e.g., object grasping. In this work, we argue instead that publicly available open data such as ShapeNet can be used first for object classification and then to link objects to their related concepts, leading to task-agnostic knowledge acquisition practices. To this aim, we evaluated five object recognition pipelines, where the target classes were all entities collected from ShapeNet and matching was based on: (i) shape-only features, (ii) RGB histogram comparison, (iii) a combination of shape and colour matching, (iv) image feature descriptors, and (v) inexact, normalised cross-correlation, resembling the deep, Siamese-like NN architecture of Subramaniam et al. (2016). We discuss the relative impact of shape-derived and colour-derived features, as well as the suitability of all tested solutions for future application to real-life use cases.
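Pipeline (ii), RGB histogram comparison, can be sketched in a few lines. The abstract does not state the bin count or the comparison metric, so this sketch assumes 8 bins per channel, per-channel normalisation, and histogram intersection as the similarity; loading ShapeNet renderings is not shown, and the function names are hypothetical.

```python
import numpy as np


def rgb_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenated per-channel histograms of an HxWx3 uint8 image,
    each channel normalised to sum to 1."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    return np.concatenate(hists)


def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Sum of element-wise minima, divided by 3 so the result lies in [0, 1]."""
    return float(np.minimum(h1, h2).sum() / 3.0)


def best_match(query: np.ndarray, references: dict, bins: int = 8) -> str:
    """Name of the reference image whose colour histogram best matches the query."""
    q = rgb_histogram(query, bins)
    return max(
        references,
        key=lambda name: histogram_intersection(q, rgb_histogram(references[name], bins)),
    )
```

Colour histograms ignore geometry entirely, which is why such a pipeline is naturally contrasted with the shape-only and combined shape-and-colour variants listed above.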