
    On the Bayes-optimality of F-measure maximizers

    The F-measure, which was originally introduced in information retrieval, is now routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate for the F-measure leads to a high worst-case regret. Subsequently, we perform a similar analysis for F-measure maximizing algorithms, showing that such algorithms are approximate and rely on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic number of parameters of the joint distribution (quadratic with respect to the number of binary responses). We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.
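
The regret incurred by a Hamming-loss surrogate can be illustrated with a small sketch. It is not the algorithm from the article: it assumes independent labels with known marginals (the article's algorithm explicitly avoids this assumption), it finds the F-maximizing prediction by brute-force enumeration, and it adopts the convention that the F-measure equals 1 when both the true and the predicted label vectors are all zeros. The names `f1`, `expected_f1`, `hamming_optimal`, and `f1_optimal_brute_force` are hypothetical.

```python
from itertools import product


def f1(y_true, y_pred):
    """F1 for two binary vectors; assumed to be 1.0 when both are all zeros."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    denom = sum(y_true) + sum(y_pred)
    return 1.0 if denom == 0 else 2.0 * tp / denom


def expected_f1(marginals, y_pred):
    """Expected F1 of y_pred, assuming independent labels (an assumption)."""
    total = 0.0
    for y in product((0, 1), repeat=len(marginals)):
        prob = 1.0
        for p, yi in zip(marginals, y):
            prob *= p if yi else 1.0 - p
        total += prob * f1(y, y_pred)
    return total


def hamming_optimal(marginals):
    """Hamming-loss minimizer: threshold each marginal at 0.5."""
    return tuple(int(p > 0.5) for p in marginals)


def f1_optimal_brute_force(marginals):
    """Expected-F1 maximizer by enumerating all 2^m predictions (tiny m only)."""
    candidates = product((0, 1), repeat=len(marginals))
    return max(candidates, key=lambda h: expected_f1(marginals, h))


marginals = [0.4, 0.4, 0.4]
h_hamming = hamming_optimal(marginals)
h_f1 = f1_optimal_brute_force(marginals)
print(h_hamming, round(expected_f1(marginals, h_hamming), 4))
print(h_f1, round(expected_f1(marginals, h_f1), 4))
```

With three labels that are each positive with probability 0.4, thresholding predicts no positives at all, while the brute-force search predicts all three, and the gap in expected F1 is the regret of the surrogate on this toy distribution.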

    Rough Sets and Near Sets in Medical Imaging: A Review


    Coverage and Time-optimal Motion Planning for Autonomous Vehicles

    Autonomous vehicles are rapidly advancing with a variety of applications, such as area surveillance, environment mapping, and intelligent transportation. These applications require coverage and/or time-optimal motion planning, where the major challenges include uncertainties in the environment, motion constraints of vehicles, limited energy resources, and potential failures. While dealing with these challenges in various capacities, this dissertation addresses three fundamental motion planning problems: (1) single-robot complete coverage in unknown environments, (2) multi-robot resilient and efficient coverage in unknown environments, and (3) time-optimal risk-aware motion planning for curvature-constrained vehicles. First, the ε* algorithm is developed for online coverage path planning in unknown environments using a single autonomous vehicle. It is computationally efficient and can generate the desired back-and-forth path with fewer turns and less overlap. ε* prevents the local extrema problem and can thus guarantee complete coverage. Second, the CARE algorithm is developed, which extends ε* to multi-robot resilient and efficient coverage in unknown environments. In case of failures, CARE guarantees complete coverage via dynamic task reallocation among the remaining vehicles, and hence provides resilience. Moreover, it reallocates idling vehicles to support others in their tasks, and hence improves efficiency. Finally, the T* algorithm is developed to find the time-optimal risk-aware path for curvature-constrained vehicles. We present a novel risk function based on the concept of collision time, and integrate it with the time cost for optimization. The above-mentioned algorithms have been validated via simulations in complex scenarios and/or real experiments, and the results have shown clear advantages over existing popular approaches.
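
The "back-and-forth path" mentioned in the abstract can be sketched for the simplest possible case. This is not ε*, which plans online in unknown environments with obstacles; the sketch below assumes a known, obstacle-free rectangular grid, and the function names `back_and_forth_path` and `count_turns` are hypothetical.

```python
def back_and_forth_path(rows, cols):
    """Visit every cell of an obstacle-free rows x cols grid exactly once,
    sweeping left-to-right on even rows and right-to-left on odd rows."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def count_turns(path):
    """Count the points where the direction of travel changes."""
    turns = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        step_in = (b[0] - a[0], b[1] - a[1])
        step_out = (c[0] - b[0], c[1] - b[1])
        if step_in != step_out:
            turns += 1
    return turns


path = back_and_forth_path(3, 4)
print(path)
print("cells visited:", len(set(path)), "turns:", count_turns(path))
```

On a 3 x 4 grid the sweep covers all 12 cells without revisiting any, and turns only at the row ends, which is why this pattern serves as the desired baseline shape for coverage paths.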

    Learning in the Real World: Constraints on Cost, Space, and Privacy

    The sheer demand for machine learning in fields as varied as healthcare, web-search ranking, factory automation, collision prediction, spam filtering, and many others frequently outpaces the intended use-case of machine learning models. In fact, a growing number of companies hire machine learning researchers to rectify this very problem: to tailor existing models, and/or design new state-of-the-art models, for the setting at hand. However, we can generalize a large set of the machine learning problems encountered in practical settings into three categories: cost, space, and privacy. The first category (cost) considers problems that need to balance the accuracy of a machine learning model with the cost required to evaluate it. These include problems in web-search, where results need to be delivered to a user in under a second and be as accurate as possible. The second category (space) collects problems that require running machine learning algorithms on low-memory computing devices. For instance, in search-and-rescue operations we may opt to use many small unmanned aerial vehicles (UAVs) equipped with machine learning algorithms for object detection to find a desired search target. These algorithms should be small enough to fit within the physical memory limits of the UAV (and be energy efficient) while reliably detecting objects. The third category (privacy) considers problems where one wishes to run machine learning algorithms on sensitive data. It has been shown that seemingly innocuous analyses on such data can be exploited to reveal information that individuals would prefer to keep private. Thus, nearly any algorithm that runs on patient or economic data falls under this set of problems. We devise solutions for each of these problem categories, including (i) a fast tree-based model for explicitly trading off accuracy and model evaluation time, (ii) a compression method for the k-nearest neighbor classifier, and (iii) a private causal inference algorithm that protects sensitive data.