4,303 research outputs found

    Ultra high frequency (UHF) radio-frequency identification (RFID) for robot perception and mobile manipulation

    Get PDF
    Personal robots with autonomy, mobility, and manipulation capabilities have the potential to dramatically improve quality of life for various user populations, such as older adults and individuals with motor impairments. Unfortunately, unstructured environments present many challenges that hinder robot deployment in ordinary homes. This thesis seeks to address some of these challenges through a new robotic sensing modality that leverages a small amount of environmental augmentation in the form of Ultra High Frequency (UHF) Radio-Frequency Identification (RFID) tags. Previous research has demonstrated the utility of infrastructure tags (affixed to walls) for robot localization; in this thesis, we specifically focus on tagging objects. Owing to their low-cost and passive (battery-free) operation, users can apply UHF RFID tags to hundreds of objects throughout their homes. The tags provide two valuable properties for robots: a unique identifier and receive signal strength indicator (RSSI, the strength of a tag's response). This thesis explores robot behaviors and radio frequency perception techniques using robot-mounted UHF RFID readers that enable a robot to efficiently discover, locate, and interact with UHF RFID tags applied to objects and people of interest. The behaviors and algorithms explicitly rely on the robot's mobility and manipulation capabilities to provide multiple opportunistic views of the complex electromagnetic landscape inside a home environment. The electromagnetic properties of RFID tags change when applied to common household objects. Objects can have varied material properties, can be placed in diverse orientations, and be relocated to completely new environments. We present a new class of optimization-based techniques for RFID sensing that are robust to the variation in tag performance caused by these complexities. We discuss a hybrid global-local search algorithm where a robot employing long-range directional antennas searches for tagged objects by maximizing expected RSSI measurements; that is, the robot attempts to position itself (1) near a desired tagged object and (2) oriented towards it. The robot first performs a sparse, global RFID search to locate a pose in the neighborhood of the tagged object, followed by a series of local search behaviors (bearing estimation and RFID servoing) to refine the robot's state within the local basin of attraction. We report on RFID search experiments performed in Georgia Tech's Aware Home (a real home). Our optimization-based approach yields superior performance compared to state of the art tag localization algorithms, does not require RF sensor models, is easy to implement, and generalizes to other short-range RFID sensor systems embedded in a robot's end effector. We demonstrate proof of concept applications, such as medication delivery and multi-sensor fusion, using these techniques. Through our experimental results, we show that UHF RFID is a complementary sensing modality that can assist robots in unstructured human environments.PhDCommittee Chair: Kemp, Charles C.; Committee Member: Abowd, Gregory; Committee Member: Howard, Ayanna; Committee Member: Ingram, Mary Ann; Committee Member: Reynolds, Matt; Committee Member: Tentzeris, Emmanoui

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Full text link
    Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.Comment: Accepted for publication in Nature Communication

    Towards Unifying Grounded and Distributional Semantics Using the Words-as-Classifiers Model of Lexical Semantics

    Get PDF
    Automated systems that make use of language, such as personal assistants, need some means of representing words such that 1) the representation is computable and 2) captures form and meaning. Recent advancements in the field of natural language processing have resulted in useful approaches to representing computable word meanings. In this thesis, I consider two such approaches: distributional embeddings and grounded models. Distributional embeddings are represented as high-dimensional vectors; words with similar meanings tend to cluster together in embedding space. Embeddings are easily learned using large amounts of text data. However, embeddings suffer from a lack of real world knowledge; for example, the knowledge of identifying colors or objects as they appear. In contrast to embeddings, grounded models learn a mapping between language and the physical world, such as visual information in pictures. Grounded models, however, tend to focus only on the mapping between language and the physical world and lack the knowledge that could be gained from considering abstract information found in text. In this thesis, I evaluate wac2vec, a model that brings together grounded and distributional semantics to work towards leveraging the relative strengths of both, and use empirical analysis to explore whether wac2vec adds semantic information to traditional embeddings. Starting with the words-as-classifiers (WAC) model of grounded semantics, I use a large repository of images and the keywords that were used to retrieve those images. From the grounded model, I extract classifier coefficients as word-level vector embeddings (hence, wac2vec), then combine those with embeddings from distributional word representations. I show that combining grounded embeddings with traditional embeddings results in improved performance in a visual task, demonstrating the viability of using the wac2vec model to enrich traditional embeddings, and showing that wac2vec provides important semantic information that these embeddings do not have on their own

    Confluence of Vision and Natural Language Processing for Cross-media Semantic Relations Extraction

    Get PDF
    In this dissertation, we focus on extracting and understanding semantically meaningful relationships between data items of various modalities; especially relations between images and natural language. We explore the ideas and techniques to integrate such cross-media semantic relations for machine understanding of large heterogeneous datasets, made available through the expansion of the World Wide Web. The datasets collected from social media websites, news media outlets and blogging platforms usually contain multiple modalities of data. Intelligent systems are needed to automatically make sense out of these datasets and present them in such a way that humans can find the relevant pieces of information or get a summary of the available material. Such systems have to process multiple modalities of data such as images, text, linguistic features, and structured data in reference to each other. For example, image and video search and retrieval engines are required to understand the relations between visual and textual data so that they can provide relevant answers in the form of images and videos to the users\u27 queries presented in the form of text. We emphasize the automatic extraction of semantic topics or concepts from the data available in any form such as images, free-flowing text or metadata. These semantic concepts/topics become the basis of semantic relations across heterogeneous data types, e.g., visual and textual data. A classic problem involving image-text relations is the automatic generation of textual descriptions of images. This problem is the main focus of our work. In many cases, large amount of text is associated with images. Deep exploration of linguistic features of such text is required to fully utilize the semantic information encoded in it. A news dataset involving images and news articles is an example of this scenario. We devise frameworks for automatic news image description generation based on the semantic relations of images, as well as semantic understanding of linguistic features of the news articles

    Deep Architectures for Visual Recognition and Description

    Get PDF
    In recent times, digital media contents are inherently of multimedia type, consisting of the form text, audio, image and video. Several of the outstanding computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research work has already been carried out in the field of Automatic Image Annotation (AIA), Image Captioning and Video Tagging. Video Captioning, i.e., automatic description generation from digital video, however, is a different and complex problem altogether. This study compares various existing video captioning approaches available today and attempts their classification and analysis based on different parameters, viz., type of captioning methods (generation/retrieval), type of learning models employed, the desired output description length generated, etc. This dissertation also attempts to critically analyze the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the final quality of the resultant video descriptions generated. A detailed study of important existing models, highlighting their comparative advantages as well as disadvantages are also included. In this study a novel approach for video captioning on the Microsoft Video Description (MSVD) dataset and Microsoft Video-to-Text (MSR-VTT) dataset is proposed using supervised learning techniques to train a deep combinational framework, for achieving better quality video captioning via predicting semantic tags. We develop simple shallow CNN (2D and 3D) as feature extractors, Deep Neural Networks (DNNs and Bidirectional LSTMs (BiLSTMs) as tag prediction models and Recurrent Neural Networks (RNNs) (LSTM) model as the language model. The aim of the work was to provide an alternative narrative to generating captions from videos via semantic tag predictions and deploy simpler shallower deep model architectures with lower memory requirements as solution so that it is not very memory extensive and the developed models prove to be stable and viable options when the scale of the data is increased. This study also successfully employed deep architectures like the Convolutional Neural Network (CNN) for speeding up automation process of hand gesture recognition and classification of the sign languages of the Indian classical dance form, ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single hand gestures belonging to 27 classes that were collected from (i) Google search engine (Google images), (ii) YouTube videos (dynamic and with background considered) and (iii) professional artists under staged environment constraints (plain backgrounds). 2) exploring the effectiveness of CNNs for identifying and classifying the single hand gestures by optimizing the hyperparameters, and 3) evaluating the impacts of transfer learning and double transfer learning, which is a novel concept explored for achieving higher classification accuracy
    • …
    corecore