35 research outputs found

    Subspace Selection via DR-Submodular Maximization on Lattices

    The subspace selection problem seeks a subspace that maximizes an objective function under some constraint. This problem includes several important machine learning problems, such as principal component analysis and the sparse dictionary selection problem. Often, these problems can be solved by greedy algorithms. Here, we are interested in why these problems can be solved by greedy algorithms, and what classes of objective functions and constraints admit this property. To answer this question, we formulate the problems as optimization problems on lattices. Then, we introduce a new class of functions, directional DR-submodular functions, to characterize the approximability of the problems. We see that principal component analysis, the sparse dictionary selection problem, and their generalizations have directional DR-submodularity. We show that, under several constraints, the directional DR-submodular function maximization problem can be solved efficiently with provable approximation factors.
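For orientation, the classical diminishing-returns property of a set function can be written as below. This is the standard textbook definition on subsets of a finite ground set, presumably the notion that the directional DR-submodularity of this abstract adapts to lattices; it is not the paper's own lattice definition, and the symbols $f$, $V$, $A$, $B$, $e$ are chosen here for illustration only.

```latex
f(A \cup \{e\}) - f(A) \;\ge\; f(B \cup \{e\}) - f(B)
\qquad \text{for all } A \subseteq B \subseteq V,\ e \in V \setminus B .
```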

    Discrete Optimization Methods for Segmentation and Matching

    This dissertation studies discrete optimization methods for several computer vision problems. In the first part, a new objective function for superpixel segmentation is proposed. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. I present a new graph construction for images and show that this construction induces a matroid. The segmentation is then given by the graph topology which maximizes the objective function under the matroid constraint. By exploiting submodular and monotonic properties of the objective function, I develop an efficient algorithm with a worst-case performance bound of $\frac{1}{2}$ for the superpixel segmentation problem. Extensive experiments on the Berkeley segmentation benchmark show the proposed algorithm outperforms the state of the art in all the standard evaluation metrics. Next, I propose a video segmentation algorithm by maximizing a submodular objective function subject to a matroid constraint. This function is similar to the standard energy function in computer vision with unary terms, pairwise terms from the Potts model, and a novel higher-order term based on appearance histograms. I show that the standard Potts model prior, which becomes non-submodular for multi-label problems, still induces a submodular function in a maximization framework. A new higher-order prior further enforces consistency in the appearance histograms both spatially and temporally across the video. The matroid constraint leads to a simple algorithm with a performance bound of $\frac{1}{2}$. A branch and bound procedure is also presented to improve the solution computed by the algorithm. The last part of the dissertation studies the object localization problem in images given a single hand-drawn example or a gallery of shapes as the object model.
Although many shape matching algorithms have been proposed for the problem, chamfer matching remains the preferred method when speed and robustness are considered. In this dissertation, I significantly improve the accuracy of chamfer matching while reducing the computational time from linear to sublinear (shown empirically). This is achieved by incorporating edge orientation information in the matching algorithm so that the resulting cost function is piecewise smooth and the cost variation is tightly bounded. Moreover, I present a sublinear time algorithm for exact computation of the directional chamfer matching score using techniques from 3D distance transforms and directional integral images. In addition, the smooth cost function allows one to bound the cost distribution of large neighborhoods and skip the bad hypotheses. Experiments show that the proposed approach speeds up the original chamfer matching by up to 45 times, and it is much faster than many state-of-the-art techniques while the accuracy is comparable. I further demonstrate the application of the proposed algorithm in providing seamless operation for a robotic bin picking system.
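The segmentation parts of this abstract rely on greedily maximizing a monotone submodular function under a matroid constraint, a setting in which the plain greedy rule is classically known to achieve a worst-case factor of 1/2. The following is a minimal sketch of that generic greedy rule, not the dissertation's implementation: the coverage objective, the one-item-per-group (partition matroid) constraint, and all names (`greedy_matroid`, `COVERS`, `GROUP`) are assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Set

def greedy_matroid(
    ground: List[str],
    f: Callable[[FrozenSet[str]], float],
    is_independent: Callable[[Set[str]], bool],
) -> Set[str]:
    """Repeatedly add the feasible element with the largest positive marginal gain."""
    selected: Set[str] = set()
    current = f(frozenset())
    while True:
        best, best_gain = None, 0.0
        for e in ground:
            # Skip elements already chosen or that would violate the constraint.
            if e in selected or not is_independent(selected | {e}):
                continue
            gain = f(frozenset(selected | {e})) - current
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # nothing feasible improves the objective
            return selected
        selected.add(best)
        current += best_gain

# Hypothetical toy instance: each item covers some points; at most one item per group.
COVERS: Dict[str, Set[int]] = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
GROUP: Dict[str, int] = {"a": 0, "b": 0, "c": 1, "d": 1}

def coverage(items: FrozenSet[str]) -> float:
    """Monotone submodular objective: number of points covered."""
    return float(len(set().union(*(COVERS[e] for e in items))))

def one_per_group(items: Set[str]) -> bool:
    """Independence oracle of a partition matroid with capacity 1 per group."""
    groups = [GROUP[e] for e in items]
    return len(groups) == len(set(groups))

picked = greedy_matroid(list(COVERS), coverage, one_per_group)
```

The independence test is passed in as an oracle so that the same loop works for any matroid; only the toy oracle above is specific to the example.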


    One of the most challenging tasks in Machine Learning is feature selection. It consists in selecting the best set of features to use in the training and prediction processes. Pruning the set of operational features has several benefits: reduced computation time, often better prediction quality, and the possibility of using less data to create a good predictor. In its most common form, the problem is called the single-view feature selection problem, to distinguish it from the feature selection task in Multi-view learning. In the latter, each view corresponds to a set of features and one would like to enact feature selection on each view, subject to some global constraints. A related problem in the context of Multi-view learning is Feature Partitioning: it consists in splitting the set of features of a single large view into two or more views so that it becomes possible to create a good predictor based on each view. In this case, the best features must be distributed between the views, each view should contain synergistic features, while features that interfere disruptively must be placed in different views. In the semi-supervised multi-view task known as Co-training, one also requires that each predictor trained on an individual view is able to teach something to the other views: in classification tasks, for instance, one view should learn to classify unlabelled examples based on the guess provided by the other views. There are several ways to address these problems. One set of techniques is inspired by Coalitional Game Theory. This theory defines several useful concepts, two of which are of high practical importance: the power index and the interaction index.
When used in the context of feature selection, they take the following meaning: the power index is a (context-dependent) summary measure of the predictive capability of a feature, while the interaction index is a (context-dependent) summary measure of the interaction (constructive or disruptive interference) between two features: it can be used to quantify how the collaboration between two features enhances their prediction capabilities. An important point is that the power index of a feature is different from the predictive power of the feature in isolation: it takes into account, by a suitable averaging, the context, i.e. the fact that the feature is acting, together with other features, to train a model. Similarly, the interaction index between two features takes into account the context, by suitably averaging the interaction with all the other features. In this work we address both the single-view and the multi-view problems as follows. The single-view feature selection problem is formalized as the problem of maximizing a pseudo-boolean function, i.e. a real-valued set function (that maps sets of features into a performance metric). Since one has to enact a search over (a considerable portion of) the Boolean lattice (without any special guarantees, except, perhaps, positivity), the problem is in general NP-hard. We address the problem by producing candidate maximum coalitions through the selection of the subset of features characterized by the highest power indices, and using the coalition to approximate the actual maximum. Although the exact computation of the power indices is an exponential task, estimates of the power indices sufficient for the purposes of the present problem can be obtained in polynomial time. The multi-view feature selection problem is formalized as the generalization of the above set-up to the case of multi-variable pseudo-boolean functions.
The multi-view splitting problem is formalized instead as the problem of maximizing a real function defined over the partition lattice. This problem is also typically NP-hard. However, candidate solutions can be found by suitably partitioning the top power-index features and keeping in different views the pairs of features that are less interactive or negatively interactive. The sum of the power indices of the participating features can be used to approximate the prediction capability of the view (i.e. it can be used as a proxy for the predictive power). The sum of the feature pair interactivity across views can be used as a proxy for the orthogonality of the views. The capability of a view to pass information (to teach) to other views within a co-training procedure can also benefit from the use of power indices based on a suitable definition of information transfer (a set of features, a coalition, classifies examples that are subsequently used in the training of a second set of features). As to the feature selection task, we not only demonstrate the use of state-of-the-art power index concepts (e.g. Shapley Value and Banzhaf Value) along the lines described above, but also define new power indices, within the more general class of probabilistic power indices, which contains the Shapley and the Banzhaf Values as special cases. Since the number of features to select is often a predefined parameter of the problem, we also introduce some novel power indices, namely the k-Power Index (and its specializations, the k-Shapley Value and the k-Banzhaf Value): they help select the features in a more efficient way. For the feature partitioning, we use the more general class of probabilistic interaction indices, which contains the Shapley and Banzhaf Interaction Indices as members. We also address the problem of evaluating the teaching ability of a view, introducing a suitable teaching capability index.
The last contribution of the present work consists in comparing the Game Theory approach to the classical Greedy Forward Selection approach for feature selection. In the latter, the candidate is obtained by adding one feature at a time to the current maximal coalition, always choosing the feature with the maximal marginal contribution. We show that in typical cases the two methods are complementary, and that when used in conjunction they reduce one another's error in the estimate of the maximum value. Moreover, the approach based on game theory has two advantages: it samples the space of all possible feature subsets, while the greedy algorithm scans a selected subspace and totally excludes the rest of it, and it is able to assign to each feature a score that describes a context-aware measure of its importance in the prediction process.
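The contrast drawn above between power-index ranking and Greedy Forward Selection can be sketched on a toy problem. The code below is an illustrative assumption, not the thesis's method: it estimates Shapley Values by the common permutation-sampling approach (one way to obtain polynomial-time estimates) and compares them with a plain greedy forward pass. The value function `value`, the weights `W`, and the synergy bonus between the hypothetical features `x` and `y` are invented for the example.

```python
import random
from typing import Callable, Dict, FrozenSet, List

def shapley_estimates(
    features: List[str],
    value: Callable[[FrozenSet[str]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Average each feature's marginal contribution over random orderings."""
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    for _ in range(n_permutations):
        order = list(features)
        rng.shuffle(order)
        coalition: set = set()
        prev = value(frozenset(coalition))
        for f in order:
            coalition.add(f)
            cur = value(frozenset(coalition))
            totals[f] += cur - prev  # marginal contribution in this ordering
            prev = cur
    return {f: t / n_permutations for f, t in totals.items()}

def greedy_forward(
    features: List[str], value: Callable[[FrozenSet[str]], float], k: int
) -> List[str]:
    """Add, k times, the feature that maximizes the value of the grown coalition."""
    chosen: List[str] = []
    for _ in range(k):
        candidates = [f for f in features if f not in chosen]
        if not candidates:
            break
        chosen.append(max(candidates, key=lambda f: value(frozenset(chosen + [f]))))
    return chosen

# Hypothetical toy value function: additive weights plus a synergy between x and y.
W = {"x": 1.0, "y": 1.0, "z": 1.5}

def value(coalition: FrozenSet[str]) -> float:
    total = sum(W[f] for f in coalition)
    if {"x", "y"} <= coalition:
        total += 2.0  # constructive interaction between x and y
    return total

FEATURES = ["x", "y", "z"]
phi = shapley_estimates(FEATURES, value)
top2_by_power_index = sorted(FEATURES, key=lambda f: phi[f], reverse=True)[:2]
greedy2 = greedy_forward(FEATURES, value, k=2)
```

In this toy case the greedy pass starts with `z`, the best feature in isolation, and therefore never reaches the synergistic pair, whereas the sampled power indices credit `x` and `y` for their joint contribution.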

    Proceedings of the 10th Japanese-Hungarian Symposium on Discrete Mathematics and Its Applications


    Extending data mining techniques for frequent pattern discovery : trees, low-entropy sets, and crossmining

    The idea of frequent pattern discovery is to find frequently occurring events in large databases. Such data mining techniques can be useful in various domains. For instance, in recommendation and e-commerce systems, frequently occurring product purchase combinations are essential in user preference modeling. In the ecological domain, patterns of frequently occurring groups of species can be used to reveal insight into species interaction dynamics. Over the past few years, most frequent pattern mining research has concentrated on the efficiency (speed) of mining algorithms. However, it has been argued within the community that while the efficiency of the mining task is no longer a bottleneck, there is still an urgent need for methods that derive compact, yet high-quality results with good application properties. The aim of this thesis is to address this need. The first part of the thesis discusses a new type of tree pattern class for expressing hierarchies of general and more specific attributes in unstructured binary data. The new pattern class is shown to have advantageous properties, and to discover relationships in data that cannot be expressed with the more traditional frequent itemset or association rule patterns alone. The second and third parts of the thesis discuss the use of entropy as a score measure for frequent pattern mining. A new pattern class is defined, low-entropy sets, which allows expressing more general types of occurrence structure than frequent itemsets. The concept can also be easily applied to tree-type patterns. Furthermore, by applying minimum description length in pattern selection for low-entropy sets, it is shown experimentally that in most cases the collections of selected patterns are much smaller than those obtained using frequent itemsets. The fourth part of the thesis examines the idea of crossmining itemsets, that is, relating itemsets to numerical variables in a database of mixed data types.
The problem is formally defined and turns out to be NP-hard, although it is approximately solvable within a constant factor of the optimum solution. Experiments show that the algorithm finds itemsets that convey structure in both the binary and the numerical part of the data.

    Graph Theoretic Algorithms Adaptable to Quantum Computing

    Computational methods are rapidly emerging as an essential tool for understanding and solving complex engineering problems, which complement the traditional tools of experimentation and theory. When considered in a discrete computational setting, many engineering problems can be reduced to a graph coloring problem. Examples range from systems design, airline scheduling, image segmentation to pattern recognition, where energy cost functions with discrete variables are extremized. However, using discrete variables over continuous variables introduces some complications when defining differential quantities, such as gradients and Hessians involved in scientific computations within solid and fluid mechanics. Consequently, graph techniques are under-utilized in this important domain. However, we have recently witnessed great developments in quantum computing where physical devices can solve discrete optimization problems faster than most well-known classical algorithms. This warrants further investigation into the re-formulation of scientific computation problems into graph-theoretic problems, thus enabling rapid engineering simulations in a soon-to-be quantum computing world. The computational techniques developed in this thesis allow the representation of surface scalars, such as perimeter and area, using discrete variables in a graph. Results from integral geometry, specifically Cauchy-Crofton relations, are used to estimate these scalars via submodular functions. With this framework, several quantities important to engineering applications can be represented in graph-based algorithms. These include the surface energy of cracks for fracture prediction, grain boundary energy to model microstructure evolution, and surface area estimates (of grains and fibers) for generating conformal meshes. Combinatorial optimization problems for these applications are presented first. 
The last two chapters describe two new graph coloring algorithms implemented on a physical quantum computing device: the D-wave quantum annealer. The first algorithm describes a functional minimization approach to solve differential equations. The second algorithm describes a realization of the Boltzmann machine learning algorithm on a quantum annealer. The latter allows generative and discriminative learning of data, which has vast applications in many fields. Theoretical aspects and the implementation of these problems are outlined with a focus on engineering applications.
    PhD, Aerospace Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/168116/1/sidsriva_1.pd
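To illustrate how a graph coloring problem can be posed in the binary quadratic form that annealers minimize, the sketch below builds the textbook one-hot QUBO for coloring and checks it by brute force on a triangle. It is an assumption-laden illustration, not one of the thesis's algorithms and not tied to any vendor API: the function names, the `penalty` weight, and the variable indexing are all chosen here.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

def coloring_qubo(
    n_vertices: int, edges: List[Tuple[int, int]], n_colors: int, penalty: float = 1.0
) -> Tuple[Dict[Tuple[int, int], float], float]:
    """QUBO for: each vertex gets exactly one color, adjacent vertices differ.

    Binary variable x[v * n_colors + c] = 1 means vertex v has color c.
    Returns (Q, offset) with energy(x) = offset + sum Q[i, j] * x[i] * x[j].
    """
    Q: Dict[Tuple[int, int], float] = {}

    def add(i: int, j: int, w: float) -> None:
        key = (min(i, j), max(i, j))
        Q[key] = Q.get(key, 0.0) + w

    for v in range(n_vertices):
        # Expansion of penalty * (1 - sum_c x[v, c])**2 using x**2 == x.
        for c in range(n_colors):
            add(v * n_colors + c, v * n_colors + c, -penalty)
        for c1, c2 in combinations(range(n_colors), 2):
            add(v * n_colors + c1, v * n_colors + c2, 2.0 * penalty)
    for u, v in edges:
        # Penalize both endpoints of an edge sharing a color.
        for c in range(n_colors):
            add(u * n_colors + c, v * n_colors + c, penalty)
    return Q, penalty * n_vertices

def energy(Q: Dict[Tuple[int, int], float], offset: float, x: Tuple[int, ...]) -> float:
    return offset + sum(w * x[i] * x[j] for (i, j), w in Q.items())

# Brute-force check on a triangle with three colors (9 binary variables).
TRIANGLE = [(0, 1), (1, 2), (0, 2)]
Q, OFFSET = coloring_qubo(3, TRIANGLE, 3)
energies = {x: energy(Q, OFFSET, x) for x in product((0, 1), repeat=9)}
min_energy = min(energies.values())
ground_states = [x for x, e in energies.items() if e == min_energy]
```

Zero-energy assignments correspond exactly to proper colorings; exhaustive enumeration is used here only because the instance is tiny, whereas an annealer would sample low-energy states of the same `Q`.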