183 research outputs found

    Emergent dimensions underlying human understanding of the reachable world

    Near-scale, reach-relevant environments, like work desks, restaurant place settings or lab benches, are the interface of our hand-based interactions with the world. How are our conceptual representations of these environments organized? For navigable-scale scenes, global properties such as openness, depth or naturalness have been identified, but the analogous organizing principles for reach-scale environments are not known. To uncover such principles, we obtained 1.25 million odd-one-out behavioral judgments on image triplets assembled from 990 reachspace images. Images were selected to comprehensively sample the variation both between and within reachspace categories. Using data-driven modeling, we generated a 30-dimensional embedding that predicts human similarity judgments among the images. First, examination of the embedding dimensions revealed key properties that distinguish among reachspaces, relating to their structural layout, affordances, visual appearance and functional roles. Second, clustering analyses performed over the embedding revealed four distinct, interpretable classes of reachspaces, with separate clusters for spaces related to food, electronics, analog activities, and storage or display. Finally, we found that the similarity structure among reachspace images was better predicted by the function of the spaces than by their locations, suggesting that reachspaces are largely conceptualized in terms of the actions they are designed to support. Altogether, these results reveal the behaviorally relevant principles that structure our internal representations of reach-relevant environments.
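    The abstract does not specify how the 30-dimensional embedding is fitted to the odd-one-out data. A common formulation, assumed here rather than taken from the paper, scores each image pair by the dot product of their embedding vectors, treats the most similar pair as the one grouped together (so the third image is the odd one out), and converts the three pair scores into choice probabilities with a softmax. The sketch below computes those probabilities and the log-likelihood of a set of judgments; the function names and the toy 2-D embedding are hypothetical, and the optimizer that would fit the embedding is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def similarity(u: Vector, v: Vector) -> float:
    """Pairwise similarity as the dot product of two embedding vectors (assumed model)."""
    return sum(a * b for a, b in zip(u, v))


def odd_one_out_probs(x: Vector, y: Vector, z: Vector) -> Dict[str, float]:
    """Choice probabilities for a triplet (x, y, z).

    Assumption: the most similar pair is kept together, so the remaining image
    is the odd one out; a softmax over the three pair similarities gives probabilities.
    """
    scores = {
        "z": similarity(x, y),  # x and y grouped together -> z is odd
        "y": similarity(x, z),  # x and z grouped together -> y is odd
        "x": similarity(y, z),  # y and z grouped together -> x is odd
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def triplet_log_likelihood(
    embedding: List[Vector], triplets: List[Tuple[int, int, int]]
) -> float:
    """Log-likelihood of observed judgments; each (i, j, k) means image k was the odd one out.

    Fitting the embedding would maximize this quantity (optimizer not shown).
    """
    return sum(
        math.log(odd_one_out_probs(embedding[i], embedding[j], embedding[k])["z"])
        for i, j, k in triplets
    )


if __name__ == "__main__":
    # Hypothetical 2-D embedding: two food-related images and one electronics-related image.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    probs = odd_one_out_probs(*toy)
    print(max(probs, key=probs.get))  # prints "z": the third image is the odd one out
```

    Because each similarity is a sum over dimensions, every dimension contributes additively to the predicted choices, which is what makes it possible to inspect and label dimensions one at a time.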

    Computer vision based classification of fruits and vegetables for self-checkout at supermarkets

    The field of machine learning, and in particular methods that improve the capability of machines to perform a wider variety of generalised tasks, is among the most rapidly growing research areas in today's world. Current applications of machine learning and artificial intelligence span several significant fields, namely computer vision, data science, real-time analytics and Natural Language Processing (NLP). All these applications help computer-based systems operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas, such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (e.g. crop estimation). Despite significant prior research, object recognition still has many topics left to explore. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision, and it remains a productive research topic because of the limited means available for representing features and classifying them with machine learning techniques. Fruits and vegetables exhibit significant inter- and intra-class variance in weight, shape, size, colour and texture, which makes the classification challenging. Effective fruit and vegetable classification has significant everyday applications, e.g. crop estimation, fruit classification, robotic harvesting and fruit quality assessment. One potential application of this capability is supermarket self-checkouts: increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster.
However, this raises a number of challenges, because not all goods can readily be sold with packaging and barcodes; loose fresh items such as fruits and vegetables are an example. Adding barcodes to these items individually is impractical, and pre-packaging limits customers' freedom of choice when selecting fruits and vegetables and creates additional waste, reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing, either through inadvertent error or through intentional fraudulent misclassification, resulting in financial losses for the store. To address this problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of the related computer vision literature and techniques is performed to identify and implement possible solutions. A progressive process distribution approach is used, in which the task of computer vision based fruit and vegetable classification is divided into pre-processing and classification stages. Different classification techniques have been implemented and evaluated as possible solutions to this problem. Both visual and non-visual features of fruits and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variable physical features of fruits and vegetables while taking advantage of both visual and non-visual features.
The classification techniques are tested both individually and as ensembles to achieve higher effectiveness. The results show that fruit and vegetable classification is a complex task with many challenges. It was also observed that a larger dataset better captures the complex, variable features of fruits and vegetables: complex multidimensional features can be extracted from larger datasets to generalise to a higher number of classes. However, developing a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by removing background occlusions and complexities. An ensemble of simple, less complicated classification techniques can also achieve effective results, even when applied to fewer features and a smaller number of classes. Combining visual and non-visual features can reduce the difficulty a classification technique faces when dealing with a higher number of classes that share similar physical features. Classifying fruits and vegetables with similar physical features (i.e. colour and texture) requires careful estimation and a hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as the loss function can achieve this goal, at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit and vegetable related computer vision applications. Considering more sophisticated loss function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of classification techniques for fruit and vegetable applications.
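    The thesis reports that ensembles of simple classifiers, and the combination of visual with non-visual features, help separate produce with similar appearance. As a minimal illustration only, and not the thesis's method, the sketch below trains three nearest-centroid classifiers on separate features and combines them by hard majority vote. The feature names, including the assumed non-visual reading "weight_g" from a checkout scale, and all values are hypothetical toy data.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence

Sample = Dict[str, float]  # feature name -> value


class NearestCentroid:
    """Simple member classifier: predicts the class whose feature mean is closest."""

    def __init__(self, features: Sequence[str]) -> None:
        self.features = list(features)
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, samples: Sequence[Sample], labels: Sequence[str]) -> "NearestCentroid":
        groups: Dict[str, List[Sample]] = {}
        for sample, label in zip(samples, labels):
            groups.setdefault(label, []).append(sample)
        # One centroid per class, computed only over this member's features.
        self.centroids = {
            label: [mean(s[f] for s in group) for f in self.features]
            for label, group in groups.items()
        }
        return self

    def predict(self, sample: Sample) -> str:
        x = [sample[f] for f in self.features]

        def sq_dist(centroid: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class MajorityVoteEnsemble:
    """Hard majority vote; ties go to the label voted for first (Counter ordering)."""

    def __init__(self, members: Sequence[NearestCentroid]) -> None:
        self.members = list(members)

    def fit(self, samples: Sequence[Sample], labels: Sequence[str]) -> "MajorityVoteEnsemble":
        for member in self.members:
            member.fit(samples, labels)
        return self

    def predict(self, sample: Sample) -> str:
        votes = Counter(member.predict(sample) for member in self.members)
        return votes.most_common(1)[0][0]


# Hypothetical toy data (invented values, not from the thesis): "hue" and "texture"
# stand in for visual features, "weight_g" for a non-visual checkout-scale reading.
TRAIN_X = [
    {"hue": 0.30, "texture": 0.20, "weight_g": 180.0},
    {"hue": 0.32, "texture": 0.22, "weight_g": 170.0},
    {"hue": 0.28, "texture": 0.60, "weight_g": 65.0},
    {"hue": 0.30, "texture": 0.58, "weight_g": 70.0},
]
TRAIN_Y = ["green_apple", "green_apple", "lime", "lime"]

if __name__ == "__main__":
    members = [NearestCentroid(["hue"]), NearestCentroid(["texture"]), NearestCentroid(["weight_g"])]
    ensemble = MajorityVoteEnsemble(members).fit(TRAIN_X, TRAIN_Y)
    query = {"hue": 0.285, "texture": 0.25, "weight_g": 175.0}
    print(members[0].predict(query))  # "lime": hue alone is misleading here
    print(ensemble.predict(query))    # "green_apple": texture and weight outvote hue
```

    In the toy query, hue alone points to the wrong class and the texture and weight members outvote it. Each member here uses a single feature, so no scaling is needed; members that mix features with different units (grams versus normalised colour values) would need feature scaling first.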

    Texture-based latent space disentanglement for enhancement of a training dataset for ANN-based classification of fruit and vegetables

    The capability of Convolutional Neural Networks (CNNs) for sparse representation has significant application to complex tasks such as Representation Learning (RL). However, labelled datasets large enough to learn this representation are not easily obtainable. The unsupervised learning capability of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) provides a promising solution to this issue through their capacity to learn representations for novel data samples and classification tasks. In this research, a texture-based latent space disentanglement technique is proposed to enhance the learning of representations for novel data samples. The proposed approach is compared with different VAEs and GANs for the synthesis of new data samples. Two VAE architectures are considered, a single-layer dense VAE and a convolution-based VAE, to compare the effectiveness of different architectures for representation learning. The GANs are selected based on the distance metric they use to estimate the divergence between disjoint distributions in complex representation learning tasks. The proposed texture-based disentanglement is shown to significantly improve the disentanglement of the representation learning process by conditioning the random noise and synthesising texture-rich images of fruits and vegetables.
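    The abstract does not detail the proposed texture-based disentanglement, so the sketch below shows only the standard VAE building blocks that such a comparison starts from: the reparameterization trick, the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, and the resulting loss. The optional beta weight follows the beta-VAE variant, a swapped-in illustration of how disentanglement is often encouraged, not the paper's technique. Plain Python lists stand in for tensors; in practice these would be framework operations with automatic differentiation.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(0.5 * log_var).

    Expressing the sample this way is what lets an autodiff framework pass
    gradients to mu and log_var; the randomness is isolated in eps.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def negative_elbo(
    x: Sequence[float],
    x_recon: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Loss to minimize: squared-error reconstruction plus beta times the KL term.

    Squared error assumes a Gaussian decoder with fixed variance (up to scale and
    constants). beta = 1 is the standard VAE; beta > 1 is the beta-VAE variant.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return reconstruction + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    z = reparameterize([0.5, -0.2], [0.0, -1.0], rng)  # hypothetical encoder outputs
    print(len(z))  # 2
    print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
```

    The KL term pulls every posterior toward the same prior, which is the pressure that weighting schemes such as beta adjust when trading reconstruction quality for a more factorised latent space.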

    Learning Sequential Acquisition Policies for Robot-Assisted Feeding

    A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up and feed a range of food items. Beyond these dexterous low-level skills, an assistive robot must also plan these strategies in sequence over a long horizon to clear a plate and complete a meal. Previous methods in robot-assisted feeding introduce highly specialized primitives for food handling without a means to compose them together. Meanwhile, existing approaches to long-horizon manipulation lack the flexibility to embed highly specialized primitives into their frameworks. We propose Visual Action Planning OveR Sequences (VAPORS), a framework for long-horizon food acquisition. VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation. To carry out sequential plans in the real world, VAPORS delegates action execution to visually parameterized primitives. We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans. Across 38 plates, VAPORS acquires food much more efficiently than baselines, generalizes across realistic plate variations such as toppings and sauces, and qualitatively aligns with user feeding preferences in a survey of 49 individuals. Code, datasets, videos, and supplementary materials can be found on our website: https://sites.google.com/view/vaporsbot

    HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

    In recent years, Text-to-Image (T2I) models have been extensively studied, especially with the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks rely heavily on subjective human evaluation, limiting their ability to holistically assess the models' capabilities. Furthermore, there is a significant gap between the effort devoted to developing new T2I architectures and the effort devoted to evaluating them. To address this, we introduce HRS-Bench, a concrete evaluation benchmark for T2I models that is Holistic, Reliable, and Scalable. Unlike existing benchmarks that focus on limited aspects, HRS-Bench measures 13 skills that can be grouped into five major categories: accuracy, robustness, generalization, fairness, and bias. In addition, HRS-Bench covers 50 scenarios, including fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover a wide range of skills. To probe the effectiveness of HRS-Bench, we conducted a human evaluation, which agreed with 95% of our evaluations on average. Our experiments demonstrate that existing models often struggle to generate images with the desired count of objects, visual text, or grounded emotions. We hope that our benchmark helps ease future text-to-image generation research. The code and data are available at https://eslambakr.github.io/hrsbench.github.i
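    The abstract notes that models struggle to produce the requested number of objects but does not describe the counting metric. One plausible way to score this skill, assumed here and not necessarily the metric HRS-Bench uses, is to run an object detector on each generated image and compare per-class counts against the prompt. The detector step is omitted; the function names, the prompt and the detection counts in the usage example are hypothetical.

```python
from typing import Dict, Tuple

Counts = Dict[str, int]  # object class -> number of instances


def count_precision_recall(requested: Counts, detected: Counts) -> Tuple[float, float, float]:
    """Precision, recall and F1 of object counts for one generated image.

    Assumed scoring rule: per class, min(requested, detected) instances count as
    correct; surplus detections lower precision, missing instances lower recall.
    """
    classes = set(requested) | set(detected)
    correct = sum(min(requested.get(c, 0), detected.get(c, 0)) for c in classes)
    n_requested = sum(requested.values())
    n_detected = sum(detected.values())
    precision = correct / n_detected if n_detected else 0.0
    recall = correct / n_requested if n_requested else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def _nonzero(counts: Counts) -> Counts:
    """Drop classes with a count of zero so they do not affect comparisons."""
    return {c: n for c, n in counts.items() if n}


def exact_count_match(requested: Counts, detected: Counts) -> bool:
    """True when every class appears exactly as often as the prompt asked."""
    return _nonzero(requested) == _nonzero(detected)


if __name__ == "__main__":
    # Hypothetical prompt "three apples and two cups" with invented detector output.
    requested = {"apple": 3, "cup": 2}
    detected = {"apple": 2, "cup": 2, "dog": 1}
    print(count_precision_recall(requested, detected))
    print(exact_count_match(requested, detected))  # False
```

    Reporting precision and recall separately distinguishes a model that adds spurious objects from one that omits requested ones, which a single exact-match rate cannot do.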

    Next Generation Internet of Things – Distributed Intelligence at the Edge and Human-Machine Interactions

    This book provides an overview of the next generation Internet of Things (IoT), ranging from research, innovation and development priorities to enabling technologies in a global context. It is intended as a standalone volume in a series covering the activities of the Internet of Things European Research Cluster (IERC), including research, technological innovation, validation, and deployment. The following chapters build on the ideas put forward by the European Research Cluster, the IoT European Platform Initiative (IoT–EPI), the IoT European Large-Scale Pilots Programme and the IoT European Security and Privacy Projects, presenting global views and state-of-the-art results regarding the next generation of IoT research, innovation, development, and deployment. The IoT and Industrial Internet of Things (IIoT) are evolving towards the next generation of Tactile IoT/IIoT, bringing together hyperconnectivity (5G and beyond), edge computing, Distributed Ledger Technologies (DLTs), virtual and augmented reality (VR/AR), and artificial intelligence (AI) transformation. Following the wider adoption of consumer IoT, the next generation of IoT/IIoT innovation for business is driven by industries, addressing interoperability issues and providing new end-to-end security solutions to face continuous threats. Advances in AI technology for vision, speech recognition, natural language processing and dialog are enabling the development of end-to-end intelligent systems that encapsulate multiple technologies and deliver services in real time using limited resources.
These developments focus on designing and delivering embedded and hierarchical AI solutions in IoT/IIoT and edge computing, using distributed architectures, DLT platforms and distributed end-to-end security, which provide real-time decisions using less data and fewer computational resources, while accessing each type of resource in a way that enhances the accuracy and performance of models in the various IoT/IIoT applications. The convergence and combination of IoT, AI and other related technologies to derive insights, decisions and revenue from sensor data provide new business models and sources of monetization. Meanwhile, scalable, IoT-enabled applications have become part of larger business objectives, enabling digital transformation with a focus on new services and applications. Serving the next generation of Tactile IoT/IIoT real-time use cases over 5G and Network Slicing technology is essential for consumer and industrial applications, and supports reducing operational costs, increasing efficiency and leveraging additional capabilities for real-time autonomous systems. New IoT distributed architectures, combined with system-level architectures for edge/fog computing, are evolving IoT platforms, including AI and DLTs, with embedded intelligence in the hyperconnectivity infrastructure. The next generation of IoT/IIoT technologies is highly transformational, enabling innovation at scale and autonomous decision-making in various application domains such as healthcare, smart homes, smart buildings, smart cities, energy, agriculture, transportation and autonomous vehicles, the military, logistics and supply chain, retail and wholesale, manufacturing, mining, and oil and gas.

    Generating referring expressions in a domain of objects and processes

    This thesis presents a collection of algorithms and data structures for the generation of pronouns, anaphoric definite noun phrases, and one-anaphoric phrases. After a close analysis of the particular kinds of referring expressions that appear in a particular domain, that of cookery recipes, the thesis presents an appropriate ontology and a corresponding representation language. This ontology is then integrated into a wider framework for language generation as a whole, whereupon we show how the representation language can be successfully used to produce appropriate referring expressions for a range of complex object types. Amongst the more important ideas explored in the thesis are the following:
    • We introduce the notion of a generalized physical object as a way of representing singular entities, mass entities, and entities which are sets.
    • We adopt the view that planning operators are essentially underspecified events, and use this, in conjunction with a simple model of the hearer, to determine the appropriate level of detail at which a given plan should be described.
    • We make use of a discourse model that distinguishes local and global focus and is closely tied to a notion of discourse structure; and we introduce a notion of DISCRIMINATORY POWER as a means of choosing the content of a referring expression.
    • We present a model of the generation of referring expressions that makes use of two levels of intermediate representation, and integrate this model with the use of a linguistically founded grammar for noun phrases.
    The thesis ends by making some suggestions for further extensions to the work reported here.
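    The thesis introduces DISCRIMINATORY POWER as a means of choosing the content of a referring expression, but the abstract does not define it. A minimal sketch, assuming that a property's discriminatory power is the fraction of still-confusable distractors it rules out, and that content is added greedily until the target is uniquely identified; the thesis's own definition and algorithm may differ. The entity attributes and the recipe example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Entity = Dict[str, str]  # attribute -> value, e.g. {"type": "carrot", "state": "grated"}
Property = Tuple[str, str]  # (attribute, value)


def discriminatory_power(attribute: str, value: str, distractors: List[Entity]) -> float:
    """Assumed definition: fraction of current distractors that (attribute, value) rules out."""
    if not distractors:
        return 0.0
    ruled_out = sum(1 for d in distractors if d.get(attribute) != value)
    return ruled_out / len(distractors)


def choose_content(target: Entity, distractors: List[Entity]) -> Optional[List[Property]]:
    """Greedily pick the target's most discriminating property until no distractor remains.

    Ties go to the property listed first in `target`. Returns None if the
    target's properties cannot distinguish it from every distractor.
    """
    remaining = list(distractors)
    candidates = dict(target)
    chosen: List[Property] = []
    while remaining:
        if not candidates:
            return None
        best = max(
            candidates.items(),
            key=lambda kv: discriminatory_power(kv[0], kv[1], remaining),
        )
        if discriminatory_power(best[0], best[1], remaining) == 0.0:
            return None  # no unused property excludes any remaining distractor
        chosen.append(best)
        del candidates[best[0]]
        # Keep only distractors that share the chosen property.
        remaining = [d for d in remaining if d.get(best[0]) == best[1]]
    return chosen


if __name__ == "__main__":
    # Hypothetical recipe context: which ingredient is "the grated carrot"?
    target = {"type": "carrot", "state": "grated"}
    distractors = [
        {"type": "carrot", "state": "whole"},
        {"type": "onion", "state": "grated"},
        {"type": "onion", "state": "whole"},
    ]
    print(choose_content(target, distractors))  # [('type', 'carrot'), ('state', 'grated')]
```

    Each step re-scores the unused properties against only the distractors still in play, so a property's usefulness can change as content is added: here "grated" becomes fully discriminating once "carrot" has removed the onions.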