24,324 research outputs found

Distributed top-k aggregation queries at large

Author: A. Marian
Gerhard Weikum
H. David
I.F. Ilyas
K. Church
K. Schnaitter
Matthias Bender
N. Bruno
Peter Triantafillou
R. Akbarinia
R. Fagin
Ralf Schenkel
S. Chaudhuri
S. Madden
Sebastian Michel
T. Cormen
Thomas Neumann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
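
For orientation, the prior TPUT framework named in the abstract is commonly described as a three-phase uniform-threshold protocol. The following is a minimal single-process sketch of that baseline only, not of the paper's optimizations (operator trees, adaptive scan depths, sampling). It assumes non-negative scores, sum aggregation, and that each node is a plain dictionary; the names `tput_top_k`, `kth_largest`, and `nodes` are hypothetical.

```python
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    ranked = sorted(values, reverse=True)
    return ranked[k - 1] if len(ranked) >= k else 0.0


def partial_sums(seen):
    """Sum the scores the coordinator has received so far, per item."""
    totals = defaultdict(float)
    for received in seen:
        for item, score in received.items():
            totals[item] += score
    return totals


def tput_top_k(nodes, k):
    """Sketch of a TPUT-style query; nodes is a list of {item: score} dicts.

    Assumes scores are non-negative and aggregated by summation.
    """
    m = len(nodes)
    seen = [dict() for _ in nodes]  # what each node has reported so far

    # Phase 1: every node reports its local top-k items.
    for i, local in enumerate(nodes):
        top = sorted(local.items(), key=lambda kv: kv[1], reverse=True)[:k]
        seen[i].update(top)
    tau1 = kth_largest(partial_sums(seen).values(), k)
    threshold = tau1 / m  # uniform per-node threshold

    # Phase 2: every node reports all items scoring at least the threshold.
    for i, local in enumerate(nodes):
        seen[i].update({it: sc for it, sc in local.items() if sc >= threshold})
    partial = partial_sums(seen)
    tau2 = kth_largest(partial.values(), k)

    # An item not reported by a node scores below the threshold there, so
    # partial sum + threshold * (number of silent nodes) is an upper bound.
    candidates = [
        item for item, total in partial.items()
        if total + threshold * sum(item not in s for s in seen) >= tau2
    ]

    # Phase 3: fetch exact scores for the surviving candidates only.
    exact = {it: sum(local.get(it, 0.0) for local in nodes) for it in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


nodes = [
    {"a": 10, "b": 8, "c": 1, "d": 3},
    {"a": 2, "b": 7, "c": 9, "d": 3},
    {"a": 9, "b": 1, "c": 2, "d": 3},
]
print(tput_top_k(nodes, 2))
```

In this toy input, item "d" is pruned after phase 2 because its upper bound falls below the phase-2 lower bound, so only three of the four items need exact lookups.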

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Enlighten

MPG.PuRe

Variable Selection in General Multinomial Logit Models

Author: Pößnecker Wolfgang
Tutz Gerhard
Uhlmann Lorenz
Publication date: 21/06/2012

The use of the multinomial logit model is typically restricted to applications with few predictors, because in high-dimensional settings maximum likelihood estimates tend to deteriorate. In this paper we propose a sparsity-inducing penalty that accounts for the special structure of multinomial models. In contrast to existing methods, it penalizes the parameters that are linked to one variable in a grouped way and thus yields variable selection instead of parameter selection. We develop a proximal gradient method that is able to efficiently compute stable estimates. In addition, the penalization is extended to the important case of predictors that vary across response categories. We apply our estimator to the modeling of party choice of voters in Germany, including voter-specific variables like age and gender but also party-specific features like stance on nuclear energy and immigration.
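
The abstract does not state the penalty's exact form. As an illustration only, the sketch below assumes a plain Euclidean-norm group penalty (as in the group lasso) and shows the proximal step such a penalty induces: all coefficients tied to one variable are shrunk together and vanish together. The function name `group_soft_threshold` and the toy coefficients are hypothetical.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2 (group soft-thresholding).

    `group` holds every coefficient linked to one variable, for example one
    coefficient per response category. `threshold` is step size times the
    penalty weight. Both names are illustrative assumptions.
    """
    norm = math.sqrt(sum(b * b for b in group))
    if norm <= threshold:
        # The whole variable is removed, not just single parameters.
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * b for b in group]


# Toy coefficients after a gradient step, grouped by variable.
coefficients = {"age": [0.3, -0.4], "gender": [3.0, 4.0]}
shrunk = {name: group_soft_threshold(g, 1.0) for name, g in coefficients.items()}
print(shrunk)
```

Here the "age" group has norm 0.5, below the threshold, so both of its coefficients become zero at once, while the "gender" group is only scaled down.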

Open Access LMU

Finding groups in data: Cluster analysis with ants

Author: Berger
Bonabeau
Bonabeau
Brito
Brucker
Chu
Deneubourg
Deneubourg
Dorigo
Dubes
Ester
Franks
Ganti
Gibson
Guha
Halkidi
Handl
Hansen
Jain
Karypis
Kaufman
Kennedy
Lee
Lumer
MacQueen
Ng
Oprisan
Rijsbergen
Urszula Boryczka
Welch
Zait
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the final clustering is affected by the dissimilarity metric used during classification: Euclidean, Cosine, or Gower. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms perform well when compared to other techniques.
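
The three dissimilarity measures named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: vectors are non-zero, and the Gower measure is shown only in its numeric-feature form (mean of range-normalized absolute differences) with feature ranges supplied by the caller. All function names are hypothetical.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_dissimilarity(x, y):
    """One minus the cosine of the angle; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def gower_numeric(x, y, ranges):
    """Gower dissimilarity restricted to numeric features.

    `ranges` holds max minus min of each feature over the dataset and is
    assumed to contain no zeros.
    """
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(x)


print(euclidean([0, 0], [3, 4]))
print(cosine_dissimilarity([1, 0], [0, 1]))
print(gower_numeric([0, 10], [5, 10], [10, 20]))
```

Because Euclidean distance is scale-sensitive while the cosine and Gower forms are normalized, the same pair of objects can look near under one measure and far under another, which is why the choice can change the final clustering.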

Crossref

Bournemouth University Research Online

Finding High-Dimensional D-Optimal Designs for Logistic Models via Differential Evolution

Author: Tan Kay Chen
Wong Weng Kee
Xu Jianxin
Xu Weinan
Publication venue: eScholarship, University of California
Publication date: 31/01/2020

D-optimal designs are frequently used in controlled experiments to obtain the most accurate estimate of model parameters at minimal cost. Finding them can be a challenging task, especially when there are many factors in a nonlinear model. As the number of factors becomes large and the factors interact with one another, there are many more variables to optimize and the D-optimal design problem becomes high-dimensional and non-separable. Consequently, premature convergence issues arise. Candidate solutions get trapped in local optima and the classical gradient-based optimization approaches to search for the D-optimal designs rarely succeed. We propose a specially designed version of differential evolution (DE), which is a representative gradient-free optimization approach, to solve such high-dimensional optimization problems. The proposed specially designed DE uses a new novelty-based mutation strategy to explore the various regions in the search space. The exploration of the regions will be carried out differently from the previously explored regions and the diversity of the population can be preserved. The proposed novelty-based mutation strategy is combined with two common DE mutation strategies to balance exploration and exploitation at the early or medium stage of the evolution. Additionally, we adapt the control parameters of DE as the evolution proceeds. Using logistic models with several factors on various design spaces as examples, our simulation results show our algorithm can find D-optimal designs efficiently and the algorithm outperforms its competitors. As an application, we apply our algorithm and re-design a 10-factor car refueling experiment with discrete and continuous factors and selected pairwise interactions. Our proposed algorithm was able to consistently outperform the other algorithms and find a more efficient D-optimal design for the problem.
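
For readers unfamiliar with differential evolution, the sketch below shows the classic DE/rand/1/bin scheme with fixed control parameters. It is a baseline illustration only: it does not include the novelty-based mutation strategy or the parameter adaptation the abstract describes, and a simple sphere function stands in for a D-optimality criterion. The function name, parameter names, and default values are assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           f=0.5, cr=0.9, seed=0):
    """Minimize `objective` over box `bounds` with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    # Binomial crossover takes the mutant coordinate.
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the design space
                else:
                    value = pop[i][d]
                trial.append(value)
            # Greedy selection: keep the trial only if it is no worse.
            trial_fitness = objective(trial)
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness

    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Stand-in objective; a real use would score a candidate design."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_f)
```

To search for a D-optimal design, one would typically replace `sphere` with the negative log-determinant of the design's information matrix, so that minimizing the objective maximizes the determinant.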

eScholarship - University of California