
    DMLR: Data-centric Machine Learning Research -- Past, Present and Future

    Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and prior meetings, this report outlines the importance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods toward positive scientific, societal, and business impact. Comment: This editorial report accompanies the inaugural Data-centric Machine Learning Research (DMLR) Workshop that took place at ICML 2023. https://dmlr.ai

    DataPerf: Benchmarks for Data-Centric AI Development

    Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry. Comment: NeurIPS 2023 Datasets and Benchmarks Track

    The Heuristic Reasoning Manifesto

    We argue for heuristic reasoning as a solution to the brittleness problem. Heuristic reasoning methods exploit the information-processing structure of the reasoning system and the structure of the environment to produce reasonable answers when the knowledge and/or computational resources for finding the perfectly correct answer might not exist. Capturing all the heuristics needed to generate reasonable answers may not be as colossal a project as it might first seem: we conjecture that there are about fifteen heuristic domains, each with approximately ten heuristic methods.

    Symbolizing Quantity

    Quantities are ubiquitous and an important part of our understanding of the world: we talk of the horsepower, size, mileage, and price of cars; the GDP, population, and area of countries; the wingspan, weight, and surface area of birds; and so on. In this paper, we present cognitively plausible symbolic representations of quantity and principles for generating those representations. Bringing together evidence from linguistics and psychology, we argue that our representations must make two kinds of distinctions: dimensional distinctions, which denote changes of quantity (e.g., large and small), and structural distinctions, which denote changes of quality (e.g., boiling point and poverty line). We present the results of a pilot experiment suggesting that there is significant agreement between people about the dimensional distinctions. We then describe CARVE, a computational model that learns to make dimensional and structural distinctions on quantities by being exposed to examples.

    A Sketch of a Theory of Quantity

    Quantities are ubiquitous and an important part of our understanding of the world: we talk of the horsepower, size, mileage, and price of cars; the GDP, population, and area of countries; the wingspan, weight, and surface area of birds; and so on. In this paper, we present a sketch of a theory of quantity: cognitively sound representations and principles for generating those representations. We draw on evidence from psychology, natural language, and ecological constraints to argue for a cognitively plausible representation of quantities. We then propose a general principle for making the necessary and relevant distinctions. Structured models of retrieval, similarity, and generalization, and models involving symbolic representations in general, do not handle quantities adequately. This is an artifact of poor representations of quantity, and we believe that the representations proposed here will make these models more quantity-aware. This investigation lies at the intersection of qualitative reasoning, cognitive psychology, and linguistics, and builds on existing evidence in these fields to potentially contribute to the understanding of quantities in all three.