163 research outputs found
Spectral Subtraction of Robot Motion Noise for Improved Event Detection in Tactile Acceleration Signals
New robots for teleoperation and autonomous manipulation are increasingly being equipped with high-bandwidth accelerometers for measuring the transient vibrational cues that occur during contact with objects. Unfortunately, the robot's own internal mechanisms often generate significant high-frequency accelerations, which we term ego-vibrations. This paper presents an approach to characterizing and removing these signals from acceleration measurements. We adapt the audio processing technique of spectral subtraction over short time windows to remove the noise that is estimated to occur at the robot's present joint velocities. Implementation for the wrist roll and gripper joints on a Willow Garage PR2 robot demonstrates that spectral subtraction significantly increases signal-to-noise ratio, which should improve vibrotactile event detection in both teleoperation and autonomous robotics.
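A minimal sketch of the underlying technique, short-time spectral subtraction, follows. It assumes a per-bin noise magnitude spectrum has already been estimated (e.g., looked up from a table keyed by the current joint velocities); the function name, sampling rate, and spectral floor are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(accel, noise_mag, fs=3000, nperseg=256, floor=0.05):
    """Subtract an estimated ego-vibration magnitude spectrum from an
    acceleration signal, short-time window by window.

    accel     : 1-D acceleration signal
    noise_mag : per-bin noise magnitude estimate, shape (nperseg//2 + 1,),
                e.g. from a table keyed by the current joint velocities
    floor     : spectral floor, a standard guard against over-subtraction
                producing negative magnitudes ("musical noise")
    """
    f, t, Z = stft(accel, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    # Subtract the velocity-dependent noise estimate in every window,
    # clamping at a small fraction of the original magnitude.
    clean_mag = np.maximum(mag - noise_mag[:, None], floor * mag)
    _, clean = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return clean
```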
GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System
This technical paper introduces a chatting robot system that utilizes recent
advancements in large-scale language models (LLMs) such as GPT-3 and ChatGPT.
The system is integrated with a co-speech gesture generation system, which
selects appropriate gestures based on the conceptual meaning of speech. Our
motivation is to explore ways of utilizing the recent progress in LLMs for
practical robotic applications, which benefits the development of both chatbots
and LLMs. Specifically, it enables the development of highly responsive chatbot
systems by leveraging LLMs and adds visual effects to the user interface of
LLMs as an additional value. The source code for the system is available on
GitHub for our in-house robot
(https://github.com/microsoft/LabanotationSuite/tree/master/MSRAbotChatSimulation)
and GitHub for Toyota HSR
(https://github.com/microsoft/GPT-Enabled-HSR-CoSpeechGestures).
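As a rough illustration of such a pipeline (not the repositories' actual API), the sketch below chains an LLM chat call to a keyword-based gesture selector; the model name, gesture table, and robot-side stubs are all assumptions.

```python
from openai import OpenAI  # assumes the OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical mapping from concept keywords to gesture clip names; the
# paper's system selects gestures from the conceptual meaning of speech.
GESTURES = {"greeting": "wave", "agreement": "nod", "unknown": "idle"}

def play_gesture(name):   # stub: forward to the robot's gesture player
    print(f"[gesture] {name}")

def speak(text):          # stub: forward to the robot's TTS engine
    print(f"[speech] {text}")

def chat_with_gesture(user_utterance, history):
    """One turn of the chat loop: query the LLM, pick a gesture, speak."""
    history.append({"role": "user", "content": user_utterance})
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=history,  # model name illustrative
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})

    # Crude keyword stand-in for the paper's concept-based gesture selector.
    text = reply.lower()
    concept = ("greeting" if "hello" in text
               else "agreement" if "yes" in text
               else "unknown")
    play_gesture(GESTURES[concept])
    speak(reply)
    return reply
```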
Reduction of Robot Ego-Noise
In robotics, it is desirable to equip robots with a sense of hearing so that they can better interact with users and their environment. However, the noise caused by the robots' actuators, called ego-noise, considerably degrades the quality of the recorded audio. Consequently, the performance of speech recognition and sound event detection techniques is limited by the amount of noise the robot produces during its movements. The noise generated by robots varies considerably with the environment, the motors, the materials used, and even the condition of the various mechanical components. The goal of this project is to design a robust ego-noise reduction model that uses several microphones and can be calibrated quickly on a mobile robot.
This thesis presents an ego-noise reduction method that combines learned templates of noise covariance matrices with a minimum variance distortionless response (MVDR) beamforming algorithm. The covariance-matrix learning approach captures the spatial characteristics of the ego-noise in under two minutes for each new environment. The beamforming algorithm, in turn, reduces the ego-noise in the noisy signal without introducing nonlinear distortion into the resulting signal. The method is implemented under Robot Operating System for quick and simple use on different robots.
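As a minimal sketch of the beamforming step, the code below computes per-frequency MVDR weights w = R⁻¹d / (dᴴR⁻¹d) from a learned ego-noise covariance R and a steering vector d, then applies them to multichannel STFT frames; shapes and names are illustrative, not the thesis's ROS implementation.

```python
import numpy as np

def mvdr_weights(R_noise, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin.

    R_noise  : (M, M) ego-noise spatial covariance learned at calibration
    steering : (M,) steering vector toward the target source
    """
    Rinv_d = np.linalg.solve(R_noise, steering)
    return Rinv_d / (steering.conj() @ Rinv_d)

def apply_beamformer(stft_frames, R_noise, steering):
    """Beamform multichannel STFT frames.

    stft_frames : (F, M, T) complex STFT, F bins, M mics, T frames
    R_noise     : (F, M, M) learned noise covariances, one per bin
    steering    : (F, M) steering vectors, one per bin
    Returns the (F, T) single-channel enhanced STFT.
    """
    out = np.empty(stft_frames.shape[::2], dtype=complex)
    for f in range(stft_frames.shape[0]):
        w = mvdr_weights(R_noise[f], steering[f])
        out[f] = w.conj() @ stft_frames[f]  # distortionless in look direction
    return out
```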
The new method was evaluated on a real robot in three different environments: a small room, a large room, and an office corridor. The signal-to-noise ratio improves by about 10 dB, consistently across the three rooms. The word error rate of speech recognition drops by 30% to 55%. The model was also tested for sound event detection: an increase of 7% to 20% in average precision was measured for music detection, but no significant increase was found for speech, shouting, closing doors, or alarms. The proposed method makes speech recognition more practical on noisy robots.
In addition, an analysis of the main parameters validated their impact on system performance. Performance is better when the system is calibrated with more robot noise and when longer segments are used. The Short-Time Fourier Transform (STFT) size can be reduced to lower the system's processing time; however, the transform size also affects the feature resolution of the resulting signal. A trade-off must be made between low processing time and the quality of the system's output signal.
Audio-Motor Integration for Robot Audition
In the context of robotics, audio signal processing in the wild amounts to dealing with sounds recorded by a system that moves and whose actuators produce noise. This creates additional challenges in sound source localization, signal enhancement, and recognition. But the specificity of such platforms also brings interesting opportunities: can information about the robot actuators' states be meaningfully integrated into the audio processing pipeline to improve performance and efficiency? While robot audition has grown to become an established field, methods that explicitly use motor-state information as a complementary modality to audio are scarcer. This chapter proposes a unified view of this endeavour, referred to as audio-motor integration. A literature review and two learning-based methods for audio-motor integration in robot audition are presented, with application to single-microphone sound source localization and ego-noise reduction on real data.
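A minimal sketch of one form such audio-motor integration could take, assuming PyTorch: motor-state features are encoded alongside audio features and fused by concatenation before a task head (here, per-bin ego-noise magnitude estimation). The architecture is illustrative, not the chapter's specific models.

```python
import torch
import torch.nn as nn

class AudioMotorNet(nn.Module):
    """Illustrative fusion network: concatenates an audio embedding with
    the robot's motor state (e.g. joint positions and velocities) before
    a task head that predicts a per-frequency-bin noise magnitude."""

    def __init__(self, n_freq=257, n_motor=8, hidden=128):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(n_freq, hidden), nn.ReLU())
        self.motor_enc = nn.Sequential(nn.Linear(n_motor, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_freq)  # per-bin noise estimate

    def forward(self, audio_frame, motor_state):
        # Fuse the two modalities by simple concatenation.
        h = torch.cat([self.audio_enc(audio_frame),
                       self.motor_enc(motor_state)], dim=-1)
        return self.head(h)
```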
ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application
This paper demonstrates how OpenAI's ChatGPT can be used in a few-shot
setting to convert natural language instructions into an executable robot
action sequence. The paper proposes easy-to-customize input prompts for ChatGPT
that meet common requirements in practical applications, such as easy
integration with robot execution systems and applicability to various
environments while minimizing the impact of ChatGPT's token limit. The prompts
encourage ChatGPT to output a sequence of predefined robot actions, represent
the operating environment in a formalized style, and infer the updated state of
the operating environment. Experiments confirmed that the proposed prompts
enable ChatGPT to act according to requirements in various environments, and
users can adjust ChatGPT's output with natural language feedback for safe and
robust operation. The proposed prompts and source code are open-source and
publicly available at
https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts
Comment: 17 figures. Last updated April 11th, 2023
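As an illustration of the prompt style the paper describes (predefined actions, a formalized environment representation, and an inferred post-execution state), here is a hypothetical skeleton; the action set and schema are assumptions, not the repository's actual prompts.

```python
# Hypothetical prompt skeleton in the spirit of the paper: the model must
# reply with a sequence of predefined actions plus the updated environment.
PROMPT_HEADER = """You are a robot action planner.
Allowed actions: move_to(<place>), grasp(<object>), release(<object>).
Environment (formalized): {"objects": {"cup": "table"}, "robot_at": "door"}
Respond with:
1. actions: an ordered list of allowed actions only.
2. environment_after: the environment state after execution.
Instruction: """

def build_prompt(instruction):
    # Append the user's natural-language instruction to the fixed header.
    return PROMPT_HEADER + instruction
```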
Recognizing human activity using RGBD data
Traditional computer vision algorithms try to understand the world using visible-light cameras. However, this type of data source has inherent limitations. First, visible-light images are sensitive to illumination changes and background clutter. Second, the 3D structure of the scene is lost when the 3D world is projected onto 2D images, and recovering that 3D information from 2D images is a challenging problem. Range sensors, which capture the 3D characteristics of a scene, have existed for over thirty years; however, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided poor distance estimates. Recently, easy access to RGBD data at real-time frame rates has led to a revolution in perception and inspired much new research using RGBD data.

I propose algorithms to detect persons and understand their activities using RGBD data. I demonstrate that solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information can give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, features extracted from the depth channel may be combined with traditional features computed from the RGB channels to produce more robust systems with enhanced recognition abilities that can handle more challenging scenarios.

As a starting point, the first problem is to find persons of various poses in the scene, whether moving or static. Localizing humans from RGB images is limited by lighting conditions and background clutter; depth images offer alternative ways to find the humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work well for indoor person detection. In this thesis, I propose a model-based approach that detects persons using the structural information embedded in the depth image: a 2D head contour model and a 3D head surface model that look for the head-shoulder part of the person. A segmentation scheme is then proposed to segment the full human body from the background and extract its contour. I also present a tracking algorithm based on the detection result.

I then turn to recognizing human actions and activities, for which I propose two features. The first is drawn from the skeletal joint locations estimated from a depth image. It is a compact, view-invariant representation of the human posture called histograms of 3D joint locations (HOJ3D), and the whole algorithm runs in real time; this feature may benefit many applications that need a fast estimate of a human subject's posture and action. The second is a spatio-temporal feature for depth video called the Depth Cuboid Similarity Feature (DCSF). Interest points are extracted using an algorithm that effectively suppresses noise and finds salient human motions, and a DCSF is extracted centered on each interest point to describe the video contents. This descriptor recognizes activities without depending on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling, making it more flexible and widely applicable to many scenarios.
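A hard-binned sketch of the HOJ3D idea follows: joint positions relative to a reference joint are binned on a sphere to form a posture histogram. The original feature uses probabilistic (Gaussian) voting over bins and a body-aligned coordinate frame; the bin counts and names here are illustrative.

```python
import numpy as np

def hoj3d(joints, n_azimuth=12, n_inclination=6):
    """HOJ3D-style posture descriptor (hard-binned approximation).

    joints : (N, 3) joint coordinates relative to a reference joint
             (e.g. the hip center); aligning the frame with the body
             orientation is what gives the feature view invariance.
    Returns a flattened, normalized 2-D spherical histogram.
    """
    x, y, z = joints[:, 0], joints[:, 1], joints[:, 2]
    r = np.linalg.norm(joints, axis=1) + 1e-9
    azimuth = np.arctan2(y, x)                       # in [-pi, pi]
    inclination = np.arccos(np.clip(z / r, -1, 1))   # in [0, pi]
    hist, _, _ = np.histogram2d(
        azimuth, inclination,
        bins=[n_azimuth, n_inclination],
        range=[[-np.pi, np.pi], [0, np.pi]])
    return hist.ravel() / max(joints.shape[0], 1)
```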
Finally, all the features developed herein are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective; I propose to recognize activities from a first-person perspective with RGBD data. This task is novel and extremely challenging due to the large amount of camera motion caused either by self-exploration or by the response to interaction. I extract 3D optical flow features as motion descriptors, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask that guides the recognition procedure and separates the features in the ego-motion region from those in the independent-motion region; a simple flow-based stand-in for this idea is sketched below. The 3D features are very useful for summarizing the discriminative information of the activities. In addition, combining the 3D features with existing 2D features yields more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
Electrical and Computer Engineering
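As referenced above, here is a simple flow-based stand-in for the ego-motion/independent-motion split; the thesis's attention mask is learned, not this heuristic, and the threshold is an assumption.

```python
import numpy as np

def split_motion_features(flow, eps=1.5):
    """Separate flow vectors into ego-motion and independent-motion sets.

    flow : (H, W, 2) optical flow field. The dominant (median) flow is
    taken as a crude camera-motion estimate; pixels deviating from it
    by more than eps are treated as independent motion.
    """
    ego_est = np.median(flow.reshape(-1, 2), axis=0)
    dev = np.linalg.norm(flow - ego_est, axis=2)
    mask = dev > eps                    # True where motion is independent
    return flow[~mask], flow[mask]      # (ego-region, independent-region)
```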
Developing a Noise-Robust Beat Learning Algorithm for Music-Information Retrieval
The field of Music-Information Retrieval (Music-IR) involves the development of algorithms that can analyze musical audio and extract various high-level musical features. Many such algorithms have been developed, and systems now exist that can reliably identify features such as beat locations, tempo, and rhythm from musical sources. These features in turn are used to assist in a variety of music-related tasks ranging from automatically creating playlists that match specified criteria to synchronizing various elements, such as computer graphics, with a performance. These Music-IR systems thus help humans to enjoy and interact with music.
While current systems for identifying beats in music have found widespread utility, most of them were developed on music that is relatively free of acoustic noise. Much of the music that humans listen to, though, is performed in noisy environments. People often enjoy music in crowded clubs and noisy rooms, but this music is much more challenging for Music-IR systems to analyze, and current beat trackers generally perform poorly on musical audio heard in such conditions. If our algorithms could accurately process this music, it would enable such music to be used in applications like automatic song selection, which are currently limited to music taken directly from professionally produced digital files with little acoustic noise. Noise-robust beat learning algorithms would also allow additional types of performance augmentation that create noise and thus cannot be used with current algorithms. Such a system, for instance, could aid robots in performing synchronously with music, whereas current systems are generally unable to accurately process audio heard in conjunction with noisy robot motors.
This work aims to present a new approach for learning beats and identifying both their temporal locations and their spectral characteristics for music recorded in the presence of noise. First, datasets of musical audio recorded in environments with multiple types of noise were collected and annotated. Noise sources used for these datasets included HVAC sounds from a room, chatter from a crowded bar, and fans and motor noises from a moving robot. Second, an algorithm for learning and locating musical beats was developed which incorporates signal processing and machine learning techniques such as Harmonic-Percussive Source Separation and Probabilistic Latent Component Analysis. A representation of the musical signal called the stacked spectrogram was also utilized in order to better represent the time-varying nature of the beats. Unlike many current systems, which assume that the beat locations will be correlated with some hand-crafted features, this system learns the beats directly from the acoustic signal. Finally, the algorithm was tested against several state-of-the-art beat trackers on the audio datasets. The resultant system was found to significantly outperform the state-of-the-art when evaluated on audio played in realistically noisy conditions.
Ph.D., Electrical Engineering -- Drexel University, 201
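As a minimal sketch of one component named above, median-filtering Harmonic-Percussive Source Separation (Fitzgerald-style) splits a spectrogram into a part smooth along time (harmonic) and a part smooth along frequency (percussive, which carries most beat information). Kernel sizes are illustrative, and this is not the dissertation's exact implementation.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft

def hpss_masks(audio, fs, nperseg=1024, kernel=17):
    """Median-filtering HPSS: harmonic content is smooth across time
    frames, percussive content is smooth across frequency bins."""
    _, _, Z = stft(audio, fs=fs, nperseg=nperseg)
    S = np.abs(Z)
    harm = median_filter(S, size=(1, kernel))  # smooth across time frames
    perc = median_filter(S, size=(kernel, 1))  # smooth across freq bins
    # Soft masks split the complex spectrogram into the two components.
    total = harm + perc + 1e-9
    return Z * (harm / total), Z * (perc / total)
```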