2,457 research outputs found

    Selecting Near-Optimal Learners via Incremental Data Allocation

    We study a novel machine learning (ML) problem setting of sequentially allocating small subsets of training data amongst a large set of classifiers. The goal is to select a classifier that will give near-optimal accuracy when trained on all data, while also minimizing the cost of misallocated samples. This is motivated by large modern datasets and ML toolkits with many combinations of learning algorithms and hyper-parameters. Inspired by the principle of "optimism under uncertainty," we propose an innovative strategy, Data Allocation using Upper Bounds (DAUB), which robustly achieves these objectives across a variety of real-world datasets. We further develop substantial theoretical support for DAUB in an idealized setting where the expected accuracy of a classifier trained on n samples can be known exactly. Under these conditions we establish a rigorous sub-linear bound on the regret of the approach (in terms of misallocated data), as well as a rigorous bound on the suboptimality of the selected classifier. Our accuracy estimates on real-world datasets entail only mild violations of the theoretical scenario, suggesting that the practical behavior of DAUB is likely to approach the idealized behavior. Comment: AAAI-2016: The Thirtieth AAAI Conference on Artificial Intelligence
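
    The core idea lends itself to a short sketch. The Python fragment below is a minimal, hypothetical illustration of data allocation guided by optimistic accuracy projections, not the authors' DAUB implementation: each learner is trained on a growing prefix of the data, an optimistic estimate of its full-data accuracy is extrapolated from the last two points of its learning curve, and the next, larger batch goes to the learner with the highest estimate. The sklearn-style learner interface, the doubling schedule, and the linear extrapolation are assumptions made for illustration.

        import numpy as np

        def allocate_optimistically(learners, X, y, X_val, y_val,
                                    init_n=100, growth=2.0, rounds=20):
            """Toy optimism-under-uncertainty allocator (illustrative only).

            learners: sklearn-style objects with fit(X, y) and score(X, y).
            Each learner keeps a history of (train_size, validation_accuracy);
            its optimistic projection extrapolates the last two points of the
            learning curve to the full dataset size, capped at 1.0.
            """
            n_total = len(X)
            hist = [[] for _ in learners]

            # Bootstrap: give every learner one small allocation.
            for h, clf in zip(hist, learners):
                n = min(init_n, n_total)
                clf.fit(X[:n], y[:n])
                h.append((n, clf.score(X_val, y_val)))

            for _ in range(rounds):
                bounds = []
                for h in hist:
                    if len(h) < 2:
                        bounds.append(1.0)      # no learning curve yet: maximally optimistic
                        continue
                    (n1, a1), (n2, a2) = h[-2], h[-1]
                    slope = max(0.0, (a2 - a1) / (n2 - n1))  # accuracy assumed non-decreasing
                    bounds.append(min(1.0, a2 + slope * (n_total - n2)))

                # Allocate the next, larger batch to the most promising learner.
                i = int(np.argmax(bounds))
                n_next = min(n_total, int(hist[i][-1][0] * growth))
                learners[i].fit(X[:n_next], y[:n_next])
                hist[i].append((n_next, learners[i].score(X_val, y_val)))
                if n_next == n_total:
                    break

            # Return the index of the learner with the best measured accuracy so far.
            return int(np.argmax([h[-1][1] for h in hist]))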

    Integrating Economic Knowledge in Data Mining Algorithms

    The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others of the correctness and effectiveness of knowledge induced from data. Current data mining techniques contribute little to this process of persuasion. Part of this limitation can be removed by integrating knowledge from experts in the field, encoded in some accessible way, with knowledge derived from patterns stored in the database. In this paper we discuss, in particular, methods for implementing monotonicity constraints in economic decision problems. This prior knowledge is combined with data mining algorithms based on decision trees and neural networks. The method is illustrated in a hedonic price model. Keywords: knowledge; neural network; data mining; decision trees
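
    To make the notion of a monotonicity constraint concrete, the sketch below fits a one-hidden-layer network to synthetic hedonic-price-style data while keeping all weights non-negative; with an increasing activation this forces the predicted price to be non-decreasing in floor area. Both the synthetic data and the clipped-weights construction are assumptions chosen for illustration, not the method developed in the paper.

        import numpy as np

        rng = np.random.default_rng(0)

        # Synthetic hedonic data (assumed): house price vs. floor area, with noise.
        area = rng.uniform(40.0, 200.0, size=300)
        price = 1000.0 * area + 2e4 * np.tanh((area - 120.0) / 30.0) \
                + rng.normal(0.0, 1.5e4, size=300)
        x = ((area - area.mean()) / area.std()).reshape(-1, 1)
        y = ((price - price.mean()) / price.std()).reshape(-1, 1)

        # One-hidden-layer network made monotone by construction: tanh is increasing,
        # so non-negative weights make the output non-decreasing in the input.
        H, lr = 16, 0.05
        W1 = np.abs(rng.normal(0, 0.3, (1, H))); b1 = np.zeros((1, H))
        W2 = np.abs(rng.normal(0, 0.3, (H, 1))); b2 = np.zeros((1, 1))

        for step in range(3000):
            h = np.tanh(x @ W1 + b1)                  # forward pass
            yhat = h @ W2 + b2
            g = 2.0 * (yhat - y) / len(y)             # gradient of mean squared error

            dW2 = h.T @ g;  db2 = g.sum(axis=0, keepdims=True)
            dz = (g @ W2.T) * (1.0 - h ** 2)          # tanh'(z) = 1 - tanh(z)^2
            dW1 = x.T @ dz; db1 = dz.sum(axis=0, keepdims=True)

            W1 -= lr * dW1; b1 -= lr * db1            # gradient step ...
            W2 -= lr * dW2; b2 -= lr * db2
            W1 = np.clip(W1, 0.0, None)               # ... then project the weights
            W2 = np.clip(W2, 0.0, None)               #     back onto the monotone region

        # The fitted price curve is non-decreasing in area by construction.
        grid = np.linspace(x.min(), x.max(), 200).reshape(-1, 1)
        pred = np.tanh(grid @ W1 + b1) @ W2 + b2
        assert np.all(np.diff(pred.ravel()) >= -1e-9)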

    Network Inference from Co-Occurrences

    The recovery of network structure from experimental data is a basic and fundamental problem. Unfortunately, experimental data often do not directly reveal structure due to inherent limitations such as imprecision in timing or other observation mechanisms. We consider the problem of inferring network structure in the form of a directed graph from co-occurrence observations. Each observation arises from a transmission made over the network and indicates which vertices carry the transmission without explicitly conveying their order in the path. Without order information, there are exponentially many feasible graphs that agree with the observed data equally well. Yet the basic physical principles underlying most networks strongly suggest that not all feasible graphs are equally likely; in particular, vertices that co-occur in many observations are probably closely connected. Previous approaches to this problem are based on ad hoc heuristics. We model the experimental observations as independent realizations of a random walk on the underlying graph, subjected to a random permutation which accounts for the lack of order information. Treating the permutations as missing data, we derive an exact expectation-maximization (EM) algorithm for estimating the random walk parameters. For long transmission paths the exact E-step may be computationally intractable, so we also describe an efficient Monte Carlo EM (MCEM) algorithm and derive conditions which ensure convergence of the MCEM algorithm with high probability. Simulations and experiments with Internet measurements demonstrate the promise of this approach. Comment: Submitted to IEEE Transactions on Information Theory. An extended version is available as University of Wisconsin Technical Report ECE-06-
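
    A toy version of the exact E-step illustrates the structure of the algorithm. In the sketch below, each co-occurrence is treated as the unordered support of one random-walk path with no repeated vertices; the E-step enumerates all orderings and weights them by their likelihood under the current parameters, and the M-step re-normalizes the expected counts. The enumeration is only practical for short paths, which is the regime the abstract describes for the exact algorithm; an MCEM variant would replace it with sampled permutations. The function name and the small smoothing constants are assumptions for illustration.

        import numpy as np
        from itertools import permutations

        def em_cooccurrence(observations, n_vertices, n_iter=50, seed=0):
            """Toy exact EM for co-occurrence network inference (illustrative only).

            observations: list of unordered sets of vertex indices (0..n_vertices-1),
            each assumed to be the support of one random-walk path without
            repeated vertices. Returns an estimated initial distribution and
            transition matrix.
            """
            rng = np.random.default_rng(seed)
            pi = np.full(n_vertices, 1.0 / n_vertices)
            A = rng.random((n_vertices, n_vertices)) + 0.1
            A /= A.sum(axis=1, keepdims=True)

            for _ in range(n_iter):
                init_counts = np.full(n_vertices, 1e-6)            # smoothing avoids zero rows
                trans_counts = np.full((n_vertices, n_vertices), 1e-6)

                for obs in observations:
                    orders = list(permutations(obs))               # exact E-step: enumerate orderings
                    weights = np.empty(len(orders))
                    for j, order in enumerate(orders):
                        w = pi[order[0]]
                        for a, b in zip(order[:-1], order[1:]):
                            w *= A[a, b]
                        weights[j] = w
                    weights /= weights.sum()                       # posterior over orderings

                    for w, order in zip(weights, orders):          # expected sufficient statistics
                        init_counts[order[0]] += w
                        for a, b in zip(order[:-1], order[1:]):
                            trans_counts[a, b] += w

                pi = init_counts / init_counts.sum()               # M-step: normalize counts
                A = trans_counts / trans_counts.sum(axis=1, keepdims=True)

            return pi, A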

    Multi-agents adaptive estimation and coverage control using Gaussian regression

    We consider a scenario where the aim of a group of agents is to achieve optimal coverage of a region according to a sensory function. In particular, centroidal Voronoi partitions have to be computed. The difficulty of the task is that the sensory function is unknown and has to be reconstructed online from noisy measurements. Hence, estimation and coverage need to be performed at the same time. We cast the problem in a Bayesian regression framework, where the sensory function is seen as a Gaussian random field. We then design a set of control inputs that balance coverage and estimation, and we discuss convergence properties of the algorithm. Numerical experiments show the effectiveness of the new approach.
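
    The estimate-then-move loop can be sketched numerically. In the fragment below, agents take noisy measurements of an assumed sensory field over the unit square, a Gaussian-process posterior mean with an RBF kernel estimates the field on a grid, and each agent then takes a Lloyd-style step toward the centroid of its Voronoi cell weighted by the estimate. The kernel, gains, grid resolution, and synthetic field are assumptions for illustration; this sketch does not reproduce the paper's control law or its convergence analysis.

        import numpy as np

        rng = np.random.default_rng(0)

        def gp_posterior_mean(X_train, y_train, X_query, ell=0.2, noise=0.05):
            """Gaussian-process regression posterior mean with an RBF kernel."""
            def k(a, b):
                d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
                return np.exp(-0.5 * d2 / ell ** 2)
            K = k(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
            return k(X_query, X_train) @ np.linalg.solve(K, y_train)

        def sensory(p):
            """Unknown sensory field (assumed for illustration), peaked at (0.7, 0.3)."""
            return np.exp(-8.0 * ((p[:, 0] - 0.7) ** 2 + (p[:, 1] - 0.3) ** 2))

        grid = np.stack(np.meshgrid(np.linspace(0, 1, 40),
                                    np.linspace(0, 1, 40)), -1).reshape(-1, 2)
        agents = rng.random((5, 2))                    # initial agent positions
        samples, values = [], []

        for step in range(30):
            # Each agent takes a noisy measurement at its current position.
            samples.append(agents.copy())
            values.append(sensory(agents) + rng.normal(0.0, 0.05, len(agents)))
            X = np.concatenate(samples); y = np.concatenate(values)

            # Estimate the sensory field on the grid from all measurements so far.
            phi = np.clip(gp_posterior_mean(X, y, grid), 1e-6, None)

            # Lloyd-style coverage step: move each agent toward the centroid of its
            # Voronoi cell, weighted by the estimated sensory field.
            d = ((grid[:, None, :] - agents[None, :, :]) ** 2).sum(-1)
            owner = d.argmin(axis=1)                   # Voronoi assignment of grid points
            for i in range(len(agents)):
                cell = owner == i
                if cell.any():
                    w = phi[cell]
                    centroid = (grid[cell] * w[:, None]).sum(0) / w.sum()
                    agents[i] += 0.5 * (centroid - agents[i])   # proportional move toward centroid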