

    Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

    Humans have an inherent ability to learn novel concepts from only a few samples and to generalize these concepts to different situations. Even though today's machine learning models excel at standard recognition tasks given a plethora of training data, a considerable gap exists between machine-level pattern recognition and human-level concept learning. To narrow this gap, the Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems. Despite new advances in representation learning and learning to learn, BPs remain a daunting challenge for modern AI. Inspired by the original one hundred BPs, we propose a new benchmark, Bongard-LOGO, for human-level concept learning and reasoning. We develop a program-guided generation technique to produce a large set of human-interpretable visual cognition problems in an action-oriented LOGO language. Our benchmark captures three core properties of human cognition: 1) context-dependent perception, in which the same object may have disparate interpretations given different contexts; 2) analogy-making perception, in which some meaningful concepts are traded off for other meaningful concepts; and 3) perception with a few samples but an infinite vocabulary. In experiments, we show that state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core properties of human cognition. Finally, we discuss research directions towards a general architecture for visual reasoning to tackle this benchmark. Comment: 22 pages, NeurIPS 2020.
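    The action-oriented LOGO language can be pictured as turtle-style stroke programs. The following is a minimal sketch of the program-guided idea, assuming a hypothetical two-field action vocabulary (`(distance, turn)` strokes); the benchmark's real action set and API differ.

```python
import math

# Hypothetical action vocabulary: each stroke is (distance, turn_angle_deg).
# A "program" is a list of strokes executed turtle-style from the origin.
def execute_program(strokes):
    """Trace a LOGO-style action program and return the visited points."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for distance, turn in strokes:
        heading += math.radians(turn)       # turn first, then move forward
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
        points.append((round(x, 6), round(y, 6)))
    return points

# A square as an action program: four equal strokes with 90-degree turns.
square = [(1, 90)] * 4
pts = execute_program(square)
```

    Generating many such programs and rendering their traces yields large families of shapes whose concept (e.g. "closed convex curve") is defined by the program, not by pixels.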

    Machine Vision, Not Human Vision, Guided Compression Towards Low-Latency and Robust Deep Learning Systems

    Deep Neural Networks (DNNs) have achieved extraordinary performance across many exciting real-world applications, including image classification, speech recognition, natural language processing, medical diagnosis, self-driving cars, drones, anomaly detection, and recognition of voice commands. However, deploying DNNs in real life exposes two critical issues. First, the ever-increasing amounts of data generated by mobile devices, sensors, and the Internet of Things (IoT) challenge the performance of DNN systems: there is a lack of efficient solutions for reducing power-hungry data offloading and storage on terminal devices like edge sensors, especially in the face of stringent constraints on communication bandwidth, energy, and hardware resources. Second, DNN models are inherently vulnerable to adversarial examples (AEs), i.e., malicious inputs crafted by adding small and human-imperceptible perturbations to normal inputs, which strongly fool the cognitive function of DNNs. Though image compression techniques have been explored to mitigate adversarial examples, existing solutions are unable to offer a good balance between removing adversarial perturbations from malicious inputs and preserving classification accuracy on benign samples. This dissertation makes solid strides towards developing low-latency and robust deep learning systems by, for the first time, leveraging a deep understanding of the differences in image perception between human vision and deep learning systems (a.k.a. machine vision in this dissertation). In the first part, we propose three types of “machine-vision-guided” image compression frameworks, dedicated to accelerating both cloud-based deep learning image classification and 3D medical image segmentation with almost zero accuracy drop, by embracing the deep cascaded information-processing mechanism of DNN architectures.
    To the best of our knowledge, this is the first effort to systematically re-architect existing data compression techniques, which are centered around human vision, to be machine-vision favorable, thereby achieving significant service speed-ups. In the second part, we propose a JPEG-based defensive compression framework, namely “feature distillation”, to effectively rectify adversarial examples without impacting classification accuracy on benign images. Experimental results show that the very low-cost “feature distillation” delivers the best defense efficiency with negligible accuracy reduction among existing input-preprocessing-based defense techniques, serving as a new baseline and reference design for the development of future defense methods.
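    The defensive-compression idea can be illustrated with JPEG-style DCT quantization whose step size depends on frequency: coarse steps crush the high-frequency bands where small adversarial perturbations tend to live, while fine steps preserve the low frequencies classifiers rely on. This is a minimal numpy sketch with an illustrative two-step quantization table, not the dissertation's actual tables or pipeline.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def defensive_quantize(block, low_step=2.0, high_step=30.0, cutoff=4):
    """Quantize an 8x8 block: gentle on low frequencies, harsh on high ones."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T                   # forward 2-D DCT
    i, j = np.indices(coeffs.shape)
    steps = np.where(i + j < cutoff, low_step, high_step)
    coeffs = np.round(coeffs / steps) * steps  # frequency-dependent rounding
    return d.T @ coeffs @ d                    # inverse 2-D DCT
```

    A smooth block passes through almost unchanged, while a tiny single-pixel perturbation is rounded away, which is the behavior a defensive compressor wants.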

    Computational exploration of molecular receptive fields in the olfactory bulb reveals a glomerulus-centric chemical map

    © The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Progress in olfactory research is currently hampered by incomplete knowledge about the chemical receptive ranges of primary receptors. Moreover, the chemical logic underlying the arrangement of computational units in the olfactory bulb has still not been resolved. We undertook a large-scale approach to characterising molecular receptive ranges (MRRs) of glomeruli in the dorsal olfactory bulb (dOB) innervated by the MOR18-2 olfactory receptor, also known as Olfr78, with human ortholog OR51E2. Guided by an iterative approach that combined biological screening and machine learning, we selected 214 odorants to characterise the response of MOR18-2 and its neighbouring glomeruli. We found that a combination of conventional physico-chemical and vibrational molecular descriptors performed best in predicting glomerular responses using nonlinear Support-Vector Regression. We also discovered several previously unknown odorants activating MOR18-2 glomeruli, and obtained detailed MRRs of MOR18-2 glomeruli and their neighbours. Our results confirm earlier findings that demonstrated tunotopy, that is, glomeruli with similar tuning curves tend to be located in spatial proximity in the dOB. In addition, our results indicate chemotopy, that is, a preference for glomeruli with similar physico-chemical MRR descriptions to be located in spatial proximity. Together, these findings suggest the existence of a partial chemical map underlying glomerular arrangement in the dOB. Our methodology, which combines machine learning and physiological measurements, lights the way towards future high-throughput studies to deorphanise and characterise structure-activity relationships in olfaction. Peer reviewed.
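    The descriptor-to-response regression step can be sketched as follows. Kernel ridge regression with an RBF kernel is used here as a compact numpy stand-in for the paper's nonlinear Support-Vector Regression, and the descriptor values and response targets are synthetic.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel between two sets of descriptor vectors."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_kernel_ridge(X, y, gamma=0.5, lam=1e-3):
    """Fit a nonlinear regressor; returns a prediction function."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xnew: rbf_kernel(Xnew, X, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))           # 3 synthetic molecular descriptors
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]    # synthetic glomerular response
predict = fit_kernel_ridge(X, y)
```

    In the study's iterative loop, a model of this kind proposes informative odorants, whose measured responses then refine the next round of fitting.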

    RL-LIM: Reinforcement Learning-based Locally Interpretable Modeling

    Understanding black-box machine learning models is important for their widespread adoption. However, developing globally interpretable models that explain the behavior of an entire model is challenging. An alternative approach is to explain a black-box model by explaining its individual predictions using a locally interpretable model. In this paper, we propose a novel method for locally interpretable modeling: Reinforcement Learning-based Locally Interpretable Modeling (RL-LIM). RL-LIM employs reinforcement learning to select a small number of samples and distill the black-box model's predictions into a low-capacity, locally interpretable model. Training is guided by a reward that is obtained directly by measuring the agreement of the locally interpretable model's predictions with those of the black-box model. RL-LIM nearly matches the overall prediction performance of black-box models while yielding human-like interpretability, and it significantly outperforms state-of-the-art locally interpretable models in terms of overall prediction performance and fidelity. Comment: 18 pages, 7 figures, 7 tables.
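    The fidelity reward at the heart of this setup can be sketched in a few lines: a selector picks a handful of training points, a low-capacity linear model is fit to the black-box's outputs on them, and the reward is how closely that local model tracks the black-box at the probe point. The black-box function, data, and selection heuristic below are all illustrative, not RL-LIM's actual components.

```python
import numpy as np

def black_box(X):
    """Stand-in for an opaque trained model."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def fidelity_reward(X, selection, probe):
    """Reward = negative |local prediction - black-box prediction| at probe."""
    Xs = X[selection]
    ys = black_box(Xs)                           # distillation targets
    A = np.hstack([Xs, np.ones((len(Xs), 1))])   # linear model with intercept
    w, *_ = np.linalg.lstsq(A, ys, rcond=None)
    local = np.append(probe, 1.0) @ w
    return -abs(local - black_box(probe[None, :])[0])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))
probe = np.array([0.1, 0.2])
order = np.argsort(((X - probe) ** 2).sum(1))
# Selections concentrated near the probe typically earn a higher
# (less negative) reward, which is what the RL selector learns to exploit.
r_near = fidelity_reward(X, order[:8], probe)
r_far = fidelity_reward(X, order[-8:], probe)
```

    In the full method, this reward drives a policy-gradient update of the instance selector rather than a fixed nearest-neighbor rule.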

    Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

    Dancing to music has been one of humans' innate abilities since ancient times. In machine learning research, however, synthesizing dance movements from music is a challenging problem. Recently, researchers have synthesized human motion sequences with autoregressive models such as recurrent neural networks (RNNs). Such an approach often generates short sequences due to the accumulation of prediction errors that are fed back into the network, a problem that becomes even more severe in long motion sequence generation. Besides, the consistency between dance and music in terms of style, rhythm, and beat has yet to be taken into account during modeling. In this paper, we formalize music-driven dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture to efficiently process long sequences of music features and capture the fine-grained correspondence between music and dance. Furthermore, we propose a novel curriculum learning strategy to alleviate the error accumulation of autoregressive models in long motion sequence generation, which gently changes the training process from a fully guided teacher-forcing scheme using the previous ground-truth movements towards a less guided autoregressive scheme that mostly uses the generated movements instead. Extensive experiments show that our approach significantly outperforms the existing state of the art on automatic metrics and in human evaluation. We also include a demo video in the supplementary material to demonstrate the superior performance of our proposed approach. Comment: Accepted by ICLR 2021.
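    The curriculum described above is a scheduled-sampling-style mix of teacher forcing and autoregression: at each decoder step, training uses the ground-truth previous movement with a probability that decays over epochs, so the model is gradually weaned onto its own predictions. The exponential schedule below is one common choice, not necessarily the paper's exact schedule.

```python
import random

def teacher_forcing_prob(epoch, decay=0.95):
    """Probability of feeding the ground truth at a given training epoch."""
    return decay ** epoch

def choose_input(epoch, ground_truth, prediction, rng=random):
    """Pick the next decoder input for one step of curriculum training."""
    if rng.random() < teacher_forcing_prob(epoch):
        return ground_truth      # fully guided early in training
    return prediction            # increasingly autoregressive later
```

    Early epochs are therefore almost fully teacher-forced, while late epochs train the decoder under nearly the same feedback loop it faces at generation time.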

    Socially guided machine learning

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2006. Includes bibliographical references (p. 139-146). Social interaction will be key to enabling robots, and machines in general, to learn new tasks from ordinary people (not experts in robotics or machine learning). Everyday people who need to teach their machines new things will find it natural to rely on their interpersonal interaction skills. This thesis provides several contributions towards the understanding of this Socially Guided Machine Learning scenario. While the topic of human input to machine learning algorithms has been explored to some extent, prior works have not gone far enough to understand what people will try to communicate when teaching a machine, and how algorithms and learning systems can be modified to better accommodate a human partner. Interface techniques have been based on intuition and assumptions rather than grounded in human behavior, and often techniques are not demonstrated or evaluated with everyday people. Using a computer game, Sophie's Kitchen, an experiment with human subjects provides several insights about how people approach the task of teaching a machine. In particular, people want to direct and guide an agent's exploration process, they quickly use the behavior of the agent to infer a mental model of the learning process, and they utilize positive and negative feedback in asymmetric ways.
    Using a robotic platform, Leonardo, and 200 people in follow-up studies of modified versions of the Sophie's Kitchen game, four research themes are developed. Human guidance can be successfully incorporated into a machine learning exploration to improve learning performance. Novel learning approaches demonstrate aspects of goal-oriented learning. The transparency of the machine learner can have significant effects on the nature of the instruction received from the human teacher, which in turn positively impacts the learning process. Utilizing asymmetric interpretations of positive and negative feedback from a human partner can result in a more efficient and robust learning experience. By Andrea Lockerd Thomaz. Ph.D.
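    One of the findings above, asymmetric interpretation of human feedback, can be sketched as a tabular Q-update in which negative feedback is weighted more strongly than positive feedback and also prunes the agent's exploration set. The weights, threshold, and tabular setup are illustrative, not the thesis's actual implementation.

```python
def update_q(q, state, action, feedback, lr=0.1, neg_scale=2.0):
    """Q-update with asymmetric scaling of human feedback."""
    scaled = feedback if feedback > 0 else neg_scale * feedback
    q[(state, action)] = q.get((state, action), 0.0) + lr * scaled
    return q

def candidate_actions(q, state, actions, threshold=-0.1):
    """Exploration set: drop actions the teacher has strongly discouraged."""
    return [a for a in actions if q.get((state, a), 0.0) > threshold]
```

    Treating "no" as both a punishment and a hint about what not to try next reflects how the human subjects actually used negative feedback in the Sophie's Kitchen studies.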