Search CORE

36,503 research outputs found

Incorporating Road Networks into Territory Design

Author: Ahuja Nitin
Bender Matthias
Sanders Peter
Schulz Christian
Wagner Andreas
Publication venue
Publication date: 05/05/2015
Field of study

Given a set of basic areas, the territory design problem asks to create a predefined number of territories, each containing at least one basic area, such that an objective function is optimized. Desired properties of territories often include a reasonable balance, compact form, contiguity and small average journey times which are usually encoded in the objective function or formulated as constraints. We address the territory design problem by developing graph theoretic models that also consider the underlying road network. The derived graph models enable us to tackle the territory design problem by modifying graph partitioning algorithms and mixed integer programming formulations so that the objective of the planning problem is taken into account. We test and compare the algorithms on several real world instances

arXiv.org e-Print Archive

Crossref

Efficient regularized isotonic regression with application to gene--gene interaction search

Author: Luss Ronny
Rosset Saharon
Shahar Moni
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 20/03/2012
Field of study

Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational and statistical overfitting issues in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through solution of progressively smaller "best cut" subproblems. This creates a regularized sequence of isotonic models of increasing model complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control through estimation of degrees of freedom along the path. Success of the regularized models in prediction and IRPs favorable computational properties are demonstrated through a series of simulated and real data experiments. We discuss application of IRP to the problem of searching for gene--gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS504 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Improving Table Compression with Combinatorial Optimization

Author: Buchsbaum Adam L.
Fowler Glenn S.
Giancarlo Raffaele
Publication venue
Publication date: 01/01/2002
Field of study

We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results experimentally. On various test files, the on-line algorithms provide 35-55% improvement over gzip with negligible slowdown; the off-line reordering provides up to 20% further improvement over partitioning alone. We also show that a variation of the table compression problem is MAX-SNP hard.Comment: 22 pages, 2 figures, 5 tables, 23 references. Extended abstract appears in Proc. 13th ACM-SIAM SODA, pp. 213-222, 200

arXiv.org e-Print Archive

CiteSeerX

Flexible Grid service management through resource partitioning

Author: De Leenheer Marc
De Turck Filip
Demeester Piet
Dhoedt Bart
Thysebaert Pieter
Volckaert Bruno
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2006
Field of study

Ghent University Academic Bibliography

An intelligent assistant for exploratory data analysis

Author: B. S. Everitt
C. A. O'Muircheartaigh
D. E. Goldberg
J. A. Davis
J. A. Davis
J. Dougherty
J. Fox
J. Healey
J. R. Quinlan
J. R. Quinlan
J. W. Tukey
K. M. Ho
L. Breiman
L. Davis
P. D. Scott
P. D. Scott
U. M. Fayyad
W. J. Frawley
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1997
Field of study

In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships ina data set rather than simply develop predictive models for selectd variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction and model selection using a genetic algorithm

Crossref

Kent Academic Repository