12,490 research outputs found

Food places classification in egocentric images using Siamese neural networks.

Author: Abdel-Nasser Mohamed
Banu Syeda Furruka
Chambon Sylvie
Puig Domenec
Radeva Petia
Rashwan Hatem A.
Sarker Md Mostafa Kamal
Singh Vivek Kumar
Publication venue: 'IOS Press'
Publication date: 31/12/2019

Wearable cameras have become more popular in recent years for capturing unscripted moments from the first-person perspective, which helps in analysing the user's lifestyle. In this work, we aim to identify the daily food patterns of a person by recognising food-related places in egocentric (first-person) images. This could support a system that helps improve eating habits and prevent diet-related conditions. In this paper, we use Siamese Neural Networks (SNN) to learn similarities between images for one-shot "food places" classification. We tested our proposed method on "MiniEgoFoodPlaces", which covers 15 food-related locations. The proposed SNN model with MobileNet achieved overall classification accuracies of 76.74% and 77.53% on the validation and test sets of the "MiniEgoFoodPlaces" dataset, respectively, outperforming base models such as ResNet50, InceptionV3 and InceptionResNetV2.

Open Access Institutional Repository at Robert Gordon University
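
As a rough illustration of the one-shot classification idea in the abstract above, the sketch below labels a query image by comparing its embedding with a single reference embedding per class. It assumes the embeddings have already been produced by some trained network; the three-dimensional vectors, the class names and the use of cosine similarity are all hypothetical choices made for the example, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def one_shot_classify(query, support):
    """Return the label whose single support embedding is most similar to the query."""
    return max(support, key=lambda label: cosine_similarity(query, support[label]))


# Hypothetical embeddings: one reference example ("shot") per class.
support = {
    "bakery": [0.9, 0.1, 0.0],
    "restaurant": [0.1, 0.9, 0.1],
    "supermarket": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a new, unlabeled image.
query = [0.2, 0.8, 0.2]
predicted = one_shot_classify(query, support)
```

Because classification reduces to a similarity lookup, a new place category can in principle be added by supplying one more reference embedding, without retraining the comparison step.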

Deep Learning for Logo Detection: A Survey

Author: Hou Qiang
Hou Sujuan
Jiang Shuqiang
Li Jiacheng
Min Weiqing
Zhao Yanna
Zheng Yuanjie
Publication date: 09/10/2022

As more and more logos are created, logo detection has gradually become a research hotspot across many domains and tasks. Recent advances in this area are dominated by deep learning-based solutions, in which many datasets, learning strategies, network architectures, etc. have been employed. This paper reviews the advances in applying deep learning techniques to logo detection. First, we give a comprehensive account of the public datasets designed to facilitate performance evaluation of logo detection algorithms, which tend to be more diverse, more challenging, and more reflective of real life. Next, we perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy. Subsequently, we summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance. Finally, we analyze the potential challenges and present future directions for the development of logo detection to complete this survey.

arXiv.org e-Print Archive

Inner Monologue: Embodied Reasoning through Planning with Language Models

Author: Brown Noah
Chan Harris
Chebotar Yevgen
Florence Pete
Hausman Karol
Huang Wenlong
Ichter Brian
Jackson Tomas
Levine Sergey
Liang Jacky
Luu Linda
Mordatch Igor
Sermanet Pierre
Tompson Jonathan
Xia Fei
Xiao Ted
Zeng Andy
Publication date: 12/07/2022

Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real table top rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world. Comment: Project website: https://innermonologue.github.i

arXiv.org e-Print Archive

AGI for Agriculture

Author: Chai Lilong
Dai Haixing
Li Changying
Li Sheng
Liu Ninghao
Liu Tianming
Lu Guoyu
Mai Gengchen
Petti Daniel
Sun Haijian
Sun Jin
Wang Xianqiao
Xu Rui
Zhu Dajiang
Publication date: 12/04/2023

Artificial General Intelligence (AGI) is poised to revolutionize a variety of sectors, including healthcare, finance, transportation, and education. Within healthcare, AGI is being utilized to analyze clinical medical notes, recognize patterns in patient data, and aid in patient management. Agriculture is another critical sector that impacts the lives of individuals worldwide. It serves as a foundation for providing food, fiber, and fuel, yet faces several challenges, such as climate change, soil degradation, water scarcity, and food security. AGI has the potential to tackle these issues by enhancing crop yields, reducing waste, and promoting sustainable farming practices. It can also help farmers make informed decisions by leveraging real-time data, leading to more efficient and effective farm management. This paper delves into the potential future applications of AGI in agriculture, such as agriculture image processing, natural language processing (NLP), robotics, knowledge graphs, and infrastructure, and their impact on precision livestock and precision crops. By leveraging the power of AGI, these emerging technologies can provide farmers with actionable insights, allowing for optimized decision-making and increased productivity. The transformative potential of AGI in agriculture is vast, and this paper aims to highlight its potential to revolutionize the industry.

arXiv.org e-Print Archive

Egocentric Vision-based Action Recognition: A survey

Author: Arganda Carreras Ignacio
Azkune Galparsoro Gorka
Núñez Marcos Adrián
Publication venue: 'Elsevier BV'
Publication date: 01/02/2022

The field of egocentric action recognition (EAR) has recently grown in popularity thanks to the affordable and lightweight wearable cameras now available, such as the GoPro and similar devices. As a result, the amount of egocentric data generated has increased, triggering interest in the understanding of egocentric videos. More specifically, the recognition of actions in egocentric videos has gained popularity because of the challenge it poses: the wild movement of the camera and the lack of context make it hard to recognise actions with a performance similar to that of third-person vision solutions. This has ignited research interest in the field, and many public datasets and competitions can now be found in both the machine learning and the computer vision communities. In this survey, we aim to analyse the literature on egocentric vision methods and algorithms. To that end, we propose a taxonomy that divides the literature into various categories with subcategories, contributing a more fine-grained classification of the available methods. We also provide a review of the zero-shot approaches used by the EAR community, a methodology that could help to transfer EAR algorithms to real-world applications. Finally, we summarise the datasets used by researchers in the literature. We gratefully acknowledge the support of the Basque Government's Department of Education for the predoctoral funding of the first author. This work has been supported by the Spanish Government under the FuturAAL-Context project (RTI2018-101045-B-C21) and by the Basque Government under the Deustek project (IT-1078-16-D).

Archivo Digital para la Docencia y la Investigación

Fourteenth Biennial Status Report: March 2017 - February 2019

Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2019

MPG.PuRe

Automatic Food Intake Assessment Using Camera Phones

Author: Kong Fanyu
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2012

Obesity is becoming an epidemic phenomenon in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended. It is essential to monitor everyday food intake for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions. The accuracy of the results relies largely on uncertain factors such as the user's memory, food knowledge, and portion estimations. As a result, the accuracy is often compromised. Accurate and convenient dietary assessment methods are still lacking and are needed both by the general population and by the research community. In this thesis, an automatic food intake assessment method using the cameras and inertial measurement units (IMUs) on smart phones was developed to help people foster a healthy lifestyle. With this method, users use their smart phones before and after a meal to capture images or videos of the meal. The smart phone recognizes the food items, calculates the volume of the food consumed and provides the results to users. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation. This thesis comprises five publications that address four specific goals of this work: (1) to develop a prototype system with existing methods in order to review the methods in the literature, find their drawbacks and explore the feasibility of developing novel methods; (2) based on the prototype system, to investigate new food classification methods that improve the recognition accuracy to a field application level; (3) to design indexing methods for large-scale image databases to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel, convenient and accurate food volume estimation methods using only smart phones with cameras and IMUs. A prototype system was implemented to review existing methods.
An image feature detector and descriptor were developed, and a nearest neighbor classifier was implemented to classify food items. A credit card marker method was introduced for metric scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regularly shaped food items. To further increase the accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel image indexing method for large-scale image databases. Finally, the volume calculation was enhanced by reducing the marker and introducing IMUs. Sensor fusion techniques that combine measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as to reduce noise from these sensors.

Michigan Technological University
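
The credit card marker idea mentioned in the thesis abstract above can be illustrated with a small worked example: an image-based 3D reconstruction has arbitrary units, and an object of known size in the scene fixes the conversion to millimetres. The sketch below is only an assumption about how such a conversion could look, not the method of the thesis; the function names and the measured values (2.14 and 5.0 model units) are hypothetical, while the card width of 85.60 mm is the standard ID-1 card size.

```python
# Width of a standard ID-1 payment card in millimetres (ISO/IEC 7810).
CARD_WIDTH_MM = 85.60


def metric_scale(card_width_model_units):
    """Millimetres per model unit, from the card's width measured in the 3D model."""
    return CARD_WIDTH_MM / card_width_model_units


def volume_ml(volume_model_units, scale_mm_per_unit):
    """Convert a volume in cubic model units to millilitres.

    Lengths scale linearly, so volumes scale with the cube of the factor.
    """
    volume_mm3 = volume_model_units * scale_mm_per_unit ** 3
    return volume_mm3 / 1000.0  # 1 mL = 1000 cubic millimetres


# Hypothetical measurements taken from an unscaled reconstruction.
scale = metric_scale(2.14)            # card spans 2.14 model units
food_volume = volume_ml(5.0, scale)   # food occupies 5.0 cubic model units
```

With these made-up numbers the scale is 40 mm per model unit, so 5.0 cubic model units correspond to about 320 mL. The cubic dependence also shows why a small error in the measured marker width has a noticeably larger effect on the estimated volume.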