221,244 research outputs found

    On dual model-free variable selection with two groups of variables

    Get PDF
    In the presence of two groups of variables, existing model-free variable selection methods only reduce the dimensionality of the predictors. We extend the popular marginal coordinate hypotheses [3] in the sufficient dimension reduction literature and consider the dual marginal coordinate hypotheses, where the role of the predictor and the response is not important. Motivated by canonical correlation analysis (CCA), we propose a CCA-based test for the dual marginal coordinate hypotheses, and devise a joint backward selection algorithm for dual model-free variable selection. The performances of the proposed test and the variable selection procedure are evaluated through synthetic examples and a real data analysis

    Pairwise MRF Calibration by Perturbation of the Bethe Reference Point

    Get PDF
    We investigate different ways of generating approximate solutions to the pairwise Markov random field (MRF) selection problem. We focus mainly on the inverse Ising problem, but discuss also the somewhat related inverse Gaussian problem because both types of MRF are suitable for inference tasks with the belief propagation algorithm (BP) under certain conditions. Our approach consists in to take a Bethe mean-field solution obtained with a maximum spanning tree (MST) of pairwise mutual information, referred to as the \emph{Bethe reference point}, for further perturbation procedures. We consider three different ways following this idea: in the first one, we select and calibrate iteratively the optimal links to be added starting from the Bethe reference point; the second one is based on the observation that the natural gradient can be computed analytically at the Bethe point; in the third one, assuming no local field and using low temperature expansion we develop a dual loop joint model based on a well chosen fundamental cycle basis. We indeed identify a subclass of planar models, which we refer to as \emph{Bethe-dual graph models}, having possibly many loops, but characterized by a singly connected dual factor graph, for which the partition function and the linear response can be computed exactly in respectively O(N) and O(N2)O(N^2) operations, thanks to a dual weight propagation (DWP) message passing procedure that we set up. When restricted to this subclass of models, the inverse Ising problem being convex, becomes tractable at any temperature. Experimental tests on various datasets with refined L0L_0 or L1L_1 regularization procedures indicate that these approaches may be competitive and useful alternatives to existing ones.Comment: 54 pages, 8 figure. section 5 and refs added in V

    Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

    Get PDF
    For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance

    Proactive Assessment of Accident Risk to Improve Safety on a System of Freeways, Research Report 11-15

    Get PDF
    This report describes the development and evaluation of real-time crash risk-assessment models for four freeway corridors: U.S. Route 101 NB (northbound) and SB (southbound) and Interstate 880 NB and SB. Crash data for these freeway segments for the 16-month period from January 2010 through April 2011 are used to link historical crash occurrences with real-time traffic patterns observed through loop-detector data. \u27The crash risk-assessment models are based on a binary classification approach (crash and non-crash outcomes), with traffic parameters measured at surrounding vehicle detection station (VDS) locations as the independent variables. The analysis techniques used in this study are logistic regression and classification trees. Prior to developing the models, some data-related issues such as data cleaning and aggregation were addressed. The modeling efforts revealed that the turbulence resulting from speed variation is significantly associated with crash risk on the U.S. 101 NB corridor. The models estimated with data from U.S. 101 NB were evaluated on the basis of their classification performance, not only on U.S. 101 NB, but also on the other three freeway segments for transferability assessment. It was found that the predictive model derived from one freeway can be readily applied to other freeways, although the classification performance decreases. The models that transfer best to other roadways were determined to be those that use the least number of VDSs–that is, those that use one upstream or downstream station rather than two or three.\ The classification accuracy of the models is discussed in terms of how the models can be used for real-time crash risk assessment. The models can be applied to developing and testing variable speed limits (VSLs) and ramp-metering strategies that proactively attempt to reduce crash risk

    Optimization bounds from the branching dual

    Full text link
    We present a general method for obtaining strong bounds for discrete optimization problems that is based on a concept of branching duality. It can be applied when no useful integer programming model is available, and we illustrate this with the minimum bandwidth problem. The method strengthens a known bound for a given problem by formulating a dual problem whose feasible solutions are partial branching trees. It solves the dual problem with a “worst-bound” local search heuristic that explores neighboring partial trees. After proving some optimality properties of the heuristic, we show that it substantially improves known combinatorial bounds for the minimum bandwidth problem with a modest amount of computation. It also obtains significantly tighter bounds than depth-first and breadth-first branching, demonstrating that the dual perspective can lead to better branching strategies when the object is to find valid bounds.Accepted manuscrip
    • …
    corecore