Efficient Processing of k Nearest Neighbor Joins using MapReduce

Authors: Chen Su, Lu Wei, Ooi Beng Chin, Shen Yanyan
Publication date: 01/01/2012

k nearest neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join on a centralized machine efficiently. In this paper, we investigate how to perform kNN join using MapReduce which is a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups; the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable.

Comment: VLDB201
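To make the operation in this abstract concrete, here is a minimal sketch of a brute-force kNN join on a single machine. It only illustrates the definition (for every object in R, find its k nearest neighbors in S); it is not the MapReduce method of the paper. The function name `knn_join`, the use of Euclidean distance, and the sample points are assumptions made for illustration.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force kNN join (illustrative baseline, not the paper's algorithm).

    R and S are lists of equal-dimension coordinate tuples. Returns a dict
    mapping each index i in R to the indices of its k nearest points in S,
    ordered from nearest to farthest (Euclidean distance is an assumption).
    """
    result = {}
    for i, r in enumerate(R):
        # Compare r against every object in S: O(|R| * |S|) distance computations.
        result[i] = heapq.nsmallest(
            k, range(len(S)), key=lambda j: math.dist(r, S[j])
        )
    return result


if __name__ == "__main__":
    R = [(0, 0), (10, 10)]
    S = [(1, 0), (0, 2), (9, 9), (20, 20)]
    print(knn_join(R, S, k=2))
```

The quadratic number of distance computations in this baseline is the cost that the abstract's grouping and distance-based pruning are meant to reduce.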

arXiv.org e-Print Archive

CiteSeerX

ScholarBank@NUS

Potential Gains from Mergers in Local Public Transport: An Efficiency Analysis Applied to Germany

Authors: Astrid Cullmann, Matthias Walter

We analyze potential gains from hypothetical mergers in local public transport using the non-parametric Data Envelopment Analysis with bias corrections by means of bootstrapping. Our sample consists of 41 public transport companies from Germany's most densely populated region, North Rhine-Westphalia. We merge them into geographically meaningful, larger units that operate partially on a joint tram network. Merger gains are then decomposed into individual technical efficiency, synergy and size effects following the methodology of Bogetoft and Wang [Bogetoft, P., Wang, D., 2005. Estimating the Potential Gains from Mergers. Journal of Productivity Analysis, 23(2), 145-171]. Our empirical findings suggest that substantial gains up to 16 percent of factor inputs are present, mainly resulting from synergy effects.

Keywords: Merger, Public Transport, Efficiency, Data Envelopment Analysis

Research Papers in Economics

From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity

Authors: Gruber Hermann, Holzer Markus
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2014

The equivalence of finite automata and regular expressions dates back to the seminal paper of Kleene on events in nerve nets and finite automata from 1956. In the present paper we tour a fragment of the literature and summarize results on upper and lower bounds on the conversion of finite automata to regular expressions and vice versa. We also briefly recall the known bounds for the removal of spontaneous transitions (epsilon-transitions) on non-epsilon-free nondeterministic devices. Moreover, we report on recent results on the average case descriptional complexity bounds for the conversion of regular expressions to finite automata and brand new developments on the state elimination algorithm that converts finite automata to regular expressions.

Comment: In Proceedings AFL 2014, arXiv:1405.527
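For readers unfamiliar with the state elimination algorithm mentioned in this abstract, the following is a minimal textbook-style sketch, not code from the surveyed paper. It assumes an automaton without epsilon-transitions whose transitions are labelled with single characters, and it emits a pattern in Python `re` syntax. The function name `nfa_to_regex` and the helper names are hypothetical.

```python
import re


def _union(a, b):
    # None stands for the empty language (no path).
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"


def _star(a):
    # The star of the empty language or of the empty word is the empty word.
    if a is None or a == "":
        return ""
    return f"(?:{a})*"


def nfa_to_regex(states, start, accepts, transitions):
    """State elimination sketch. transitions: iterable of (src, char, dst).

    Returns a Python regex string for the accepted language,
    or None if no word is accepted.
    """
    init, final = object(), object()  # fresh start and accept states
    label = {}

    def add(p, q, r):
        label[(p, q)] = _union(label.get((p, q)), r)

    for p, ch, q in transitions:
        add(p, q, re.escape(ch))
    add(init, start, "")
    for f in accepts:
        add(f, final, "")

    for k in states:  # eliminate the original states one at a time
        loop = _star(label.get((k, k)))
        preds = [(p, r) for (p, q), r in label.items() if q == k and p != k]
        succs = [(q, r) for (p, q), r in label.items() if p == k and q != k]
        for p, r_in in preds:
            for q, r_out in succs:
                # Bypass k: every path p -> k -> q becomes one edge p -> q.
                add(p, q, r_in + loop + r_out)
        label = {e: r for e, r in label.items() if k not in e}
    return label.get((init, final))


if __name__ == "__main__":
    # DFA over {a, b} accepting words with an even number of a's.
    dfa = [(0, "a", 1), (1, "a", 0), (0, "b", 0), (1, "b", 1)]
    pattern = nfa_to_regex([0, 1], 0, {0}, dfa)
    print(pattern)
    print(bool(re.fullmatch(pattern, "abab")))
```

The size of the resulting expression depends heavily on the order in which states are eliminated, which is the kind of descriptional complexity question the abstract refers to.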

arXiv.org e-Print Archive

Directory of Open Access Journals