
    Efficient Processing of k Nearest Neighbor Joins using MapReduce

    k nearest neighbor join (kNN join), designed to find the k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join efficiently on a centralized machine. In this paper, we investigate how to perform kNN join using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups, and the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering and hence reduces both the shuffling and computational costs. To further reduce the shuffling cost, we propose two approximate algorithms that minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable.
    Comment: VLDB201
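    To make the group-then-join idea concrete, here is a minimal single-process sketch, assuming points are tuples of floats and that the caller supplies the pivots. The map and reduce phases are simulated as plain functions, and the pruning step uses a simple triangle-inequality bound chosen for illustration; it is not the authors' mapping mechanism or their pruning rules, and all function names are hypothetical. The grouped version returns the same exact result as the brute-force join.

```python
import math
from collections import defaultdict


def knn_bruteforce(R, S, k):
    """Reference kNN join: for each object in R, the indices of its k nearest objects in S."""
    return {
        i: sorted(range(len(S)), key=lambda j: (math.dist(r, S[j]), j))[:k]
        for i, r in enumerate(R)
    }


def map_phase(R, pivots):
    """Simulated mapper: assign each R object to the group of its nearest pivot."""
    groups = defaultdict(list)
    for i, r in enumerate(R):
        g = min(range(len(pivots)), key=lambda p: math.dist(r, pivots[p]))
        groups[g].append(i)
    return groups


def candidates_for_group(pivot, members, R, S, k):
    """Illustrative pruning (assumes k <= len(S)): keep only S objects that could be a kNN of some member."""
    radius = max(math.dist(R[i], pivot) for i in members)
    d_pivot = [math.dist(pivot, s) for s in S]
    # Every member r satisfies dist(r, s) <= radius + dist(pivot, s), so the k objects
    # of S closest to the pivot bound each member's k-th neighbor distance by theta.
    theta = radius + sorted(d_pivot)[k - 1]
    # dist(r, s) >= dist(pivot, s) - radius; if that exceeds theta, s is never a kNN.
    return [j for j in range(len(S)) if d_pivot[j] - radius <= theta]


def reduce_phase(members, cand, R, S, k):
    """Simulated reducer: exact kNN join of one group against its candidate S objects."""
    return {i: sorted(cand, key=lambda j: (math.dist(R[i], S[j]), j))[:k] for i in members}


def knn_join_grouped(R, S, k, pivots):
    result = {}
    for g, members in map_phase(R, pivots).items():
        cand = candidates_for_group(pivots[g], members, R, S, k)
        result.update(reduce_phase(members, cand, R, S, k))
    return result
```

    Because an S object is discarded only when its distance to every group member provably exceeds that member's k-th neighbor distance, the pruning is exact; the savings depend on how compact the pivot groups are.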

    Potential Gains from Mergers in Local Public Transport: An Efficiency Analysis Applied to Germany

    We analyze potential gains from hypothetical mergers in local public transport using non-parametric Data Envelopment Analysis (DEA) with bias correction by means of bootstrapping. Our sample consists of 41 public transport companies from Germany's most densely populated region, North Rhine-Westphalia. We merge them into geographically meaningful, larger units that operate partially on a joint tram network. Merger gains are then decomposed into individual technical efficiency, synergy and size effects following the methodology of Bogetoft and Wang [Bogetoft, P., Wang, D., 2005. Estimating the Potential Gains from Mergers. Journal of Productivity Analysis, 23(2), 145-171]. Our empirical findings suggest that substantial gains of up to 16 percent of factor inputs are present, resulting mainly from synergy effects.
    Keywords: Merger, Public Transport, Efficiency, Data Envelopment Analysis
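    As a rough illustration of how a merger-gains index relates to DEA, the sketch below handles only the simplest case: one input, one output and constant returns to scale, where the input-oriented DEA score reduces to a unit's productivity divided by the best observed productivity. The overall index follows the idea of Bogetoft and Wang of scoring the pooled inputs and outputs of the merging units, so 1 - E is the share of pooled inputs that could be saved. The bootstrap bias correction, the multi-input model and the decomposition into technical efficiency, synergy and size effects are all omitted, and the function names and data are hypothetical.

```python
def crs_efficiency(x, y, units):
    """Input-oriented CRS DEA score for a single input x and single output y.

    With one input and one output, the score is the unit's productivity y / x
    relative to the best productivity among the observed (x, y) units.
    """
    best = max(yj / xj for xj, yj in units)
    return (y / x) / best


def merger_gain_index(group, units):
    """Overall gains index E for a hypothetical merger of the (x, y) units in group.

    The merged unit uses the summed inputs to produce the summed outputs;
    1 - E is the share of pooled inputs that could in principle be saved.
    """
    pooled_x = sum(x for x, _ in group)
    pooled_y = sum(y for _, y in group)
    return crs_efficiency(pooled_x, pooled_y, units)


# Hypothetical (input, output) pairs, e.g. (cost, vehicle-km).
units = [(10.0, 20.0), (8.0, 8.0), (4.0, 6.0)]
scores = [crs_efficiency(x, y, units) for x, y in units]
merged = merger_gain_index(units[1:], units)
```

    In this toy setting merging cannot create synergies, since pooling only averages productivities; the synergy effects reported in the abstract arise in richer multi-input, multi-output technologies.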

    From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity

    The equivalence of finite automata and regular expressions dates back to Kleene's seminal 1956 paper on events in nerve nets and finite automata. In the present paper we tour a fragment of the literature and summarize results on upper and lower bounds for the conversion of finite automata to regular expressions and vice versa. We also briefly recall the known bounds for the removal of spontaneous transitions (epsilon-transitions) from nondeterministic devices that are not epsilon-free. Moreover, we report on recent results on average-case descriptional complexity bounds for the conversion of regular expressions to finite automata, and on brand-new developments in the state elimination algorithm that converts finite automata to regular expressions.
    Comment: In Proceedings AFL 2014, arXiv:1405.527
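    For readers unfamiliar with state elimination, the following is a minimal textbook-style sketch, not one of the specific variants surveyed in the paper. It assumes a DFA given as a dict from (state, symbol) to state, adds fresh start and accept states, and removes the original states one by one, relabeling each bypass edge with in-label, star of the self-loop, out-label. The output uses Python `re` syntax so it can be checked directly; the function name is hypothetical.

```python
import re


def dfa_to_regex(states, delta, start, finals):
    """State elimination: convert a DFA into a Python-compatible regex string.

    delta maps (state, symbol) -> state. Returns None if the language is empty.
    """
    START, ACCEPT = object(), object()  # fresh initial and final states
    edges = {}  # (p, q) -> regex labelling the edge p -> q; absent means no edge

    def union(a, b):
        if a is None:
            return b
        if b is None or a == b:
            return a
        return f"(?:{a}|{b})"

    def concat(a, b):
        if a is None or b is None:  # concatenation with the empty set is empty
            return None
        return "".join(f"(?:{x})" for x in (a, b) if x != "")  # "" stands for epsilon

    def star(a):
        return "" if a in (None, "") else f"(?:{a})*"

    def add(p, q, r):
        edges[(p, q)] = union(edges.get((p, q)), r)

    for (p, sym), q in delta.items():
        add(p, q, re.escape(sym))
    add(START, start, "")
    for f in finals:
        add(f, ACCEPT, "")

    for k in states:  # the elimination order affects the size of the result
        loop = star(edges.pop((k, k), None))
        ins = [(p, r) for (p, q), r in edges.items() if q == k]
        outs = [(q, r) for (p, q), r in edges.items() if p == k]
        edges = {pq: r for pq, r in edges.items() if k not in pq}
        for p, r_in in ins:
            for q, r_out in outs:
                add(p, q, concat(concat(r_in, loop), r_out))
    return edges.get((START, ACCEPT))


# DFA over {a, b} accepting words with an even number of a's.
even_a = dfa_to_regex([0, 1], {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
```

    The resulting expression is correct but heavily parenthesized; how large such expressions must get, and how much the elimination order matters, is exactly the descriptional-complexity question the paper surveys.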