10 research outputs found

    A Real-Time ASL Recognition System Using Leap Motion Sensors

    2015-2016 > Academic research: refereed > Refereed conference paper. Accepted Manuscript. Publishe

    Visual recognition of American sign language using hidden Markov models

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 48-52). By Thad Eugene Starner. M.S.

    Recognizing classical ballet steps using phase space constraints

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 64-68). By Lee Winston Campbell. M.S.

    On-line handwriting recognition using hidden Markov models

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (p. 84-86). By Han Shu. M.Eng.

    Incorporation of relational information in feature representation for online handwriting recognition of Arabic characters

    Interest in online handwriting recognition is increasing due to market demand for both improved performance and support for more scripts on digital devices. Robust handwriting recognition of complex patterns of arbitrary scale, orientation and location remains elusive because reaching a target recognition rate is not trivial for most applications in this field. Cursive scripts such as Arabic and Persian, with their complex character shapes, make the recognition task even more difficult. The discrimination capability of handwriting recognition systems depends heavily on the effectiveness of the features used to represent the data, the types of classifiers deployed, and the availability of inclusive databases for learning and recognition that cover the variations in writing style which introduce natural deformations in character shapes. This thesis aims to improve the efficiency of online recognition systems for Persian and Arabic characters by presenting new formal feature representations, algorithms, and a comprehensive database for online Arabic characters. The thesis contains the development of the first public collection of online handwritten data for the Arabic complete-shape character set. New ideas for incorporating relational information in a feature representation for this type of data are presented. The proposed techniques are computationally efficient and provide compact, yet representative, feature vectors. For the first time, a hybrid classifier is used for recognition of online Arabic complete-shape characters, based on the idea of decomposing the input data into variables representing factors of the complete-shape characters and on the combined use of Bayesian network inference and support vector machines. We argue for the usefulness and practicality of the features and recognition methods with respect to conventional metrics, such as accuracy and timeliness, as well as unconventional metrics. 
In particular, we evaluate a feature representation for different character class instances by its level of separation in the feature space. Our evaluation results for the available databases and for our own database of the characters' main shapes confirm a higher efficiency than previously reported techniques with respect to all metrics analyzed. For the complete-shape characters, our techniques resulted in a recognition efficiency comparable with the state-of-the-art results for main-shape characters.

    Advancement and application of sparse coding approaches for the analysis of arm movement trajectories

    One modality for the transmission of information during interaction between a human and a machine is gesticulation, by which concepts and emotions can be communicated. In this work, the temporal evolution of the position of a limb during the performance of a gesture, the so-called movement trajectory, is analyzed. For a machine to be able to perceive a gesture, these trajectories must be recorded via sensors, and the data must be interpreted by a suitable data processing mechanism. This processing is usually implemented as a pattern recognition pipeline consisting of multiple steps. One of those steps is feature extraction, the purpose of which is to represent the incoming data in a compact form. In this work, the applicability of Sparse Coding as a feature extraction step in the pattern recognition pipeline is investigated. The main motivation for this research is the ability of Sparse Coding to represent a dataset with a minimal set of representative and recurring features. These features are learnt in a learning phase and are detected in an unknown signal in the application phase. Compared to conventional methods for feature extraction in temporal signals, this approach has the advantage that the features are adapted to the domain-specific data and can thus optimally capture the characteristics of the training data. In this work, a general Sparse Coding approach is adapted for application to movement trajectories. 
The work investigates under which preconditions Sparse Coding is applicable to movement trajectories and how the data must be pre-processed. Further, the effects of the approach on downstream processing steps, such as classification and generation of movement trajectories, are examined. Particularly for classification, interesting advantages arise from the way Sparse Coding represents the data. The feasibility of the approach for processing movement trajectories is demonstrated in experiments on a gesture dataset that was recorded as part of this work. To show the generalizability of the approach to other application domains, it is applied to benchmark datasets from the fields of activity recognition and handwriting recognition. Finally, a real-time capable implementation of the approach in the form of a demonstrator application is described.
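
The abstract above describes detecting a small number of learnt features in an unknown signal. As a rough illustration only, the sketch below codes a short signal over a fixed dictionary using greedy matching pursuit. The choice of matching pursuit, the function name `matching_pursuit`, and the toy identity dictionary are assumptions made for this example; the abstract does not say which sparse coding algorithm the thesis uses, and the dictionary learning phase is omitted here.

```python
def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse code of `signal` over unit-norm `atoms`.

    `atoms` is a list of equal-length lists (an assumed, already learnt
    dictionary). Returns the coefficient list and the final residual.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate the current residual with every atom.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        # Pick the atom that explains the most remaining energy.
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        coeffs[best] += scores[best]
        # Subtract that atom's contribution from the residual.
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Toy dictionary: four unit-norm atoms (hypothetical, not learnt from data).
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
trajectory = [0.0, 3.0, 0.0, -1.0]
coeffs, residual = matching_pursuit(trajectory, atoms, n_nonzero=2)
```

In a pipeline of the kind the abstract outlines, the coefficient list would serve as the compact feature vector handed to a later classification step.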

    Wearable computing and contextual awareness

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 1999. Includes bibliographical references (leaves 231-248). Computer hardware continues to shrink in size and increase in capability. This trend has allowed the prevailing concept of a computer to evolve from the mainframe to the minicomputer to the desktop. Just as the physical hardware changes, so does the use of the technology, tending towards more interactive and personal systems. Currently, another physical change is underway, placing computational power on the user's body. These wearable machines encourage new applications that were formerly infeasible and, correspondingly, will result in new usage patterns. This thesis suggests that the fundamental improvement offered by wearable computing is an increased sense of user context. I hypothesize that on-body systems can sense the user's context with little or no assistance from environmental infrastructure. These body-centered systems that "see" as the user sees and "hear" as the user hears, provide a unique "first-person" viewpoint of the user's environment. By exploiting models recovered by these systems, interfaces are created which require minimal directed action or attention by the user. In addition, more traditional applications are augmented by the contextual information recovered by these systems. To investigate these issues, I provide perceptually sensible tools for recovering and modeling user context in a mobile, everyday environment. These tools include a downward-facing, camera-based system for establishing the location of the user; a tag-based object recognition system for augmented reality; and several on-body gesture recognition systems to identify various user tasks in constrained environments. To address the practicality of contextually-aware wearable computers, issues of power recovery, heat dissipation, and weight distribution are examined. 
In addition, I have encouraged a community of wearable computer users at the Media Lab through design, management, and support of hardware and software infrastructure. This unique community provides a heightened awareness of the use and social issues of wearable computing. As much as possible, the lessons from this experience will be conveyed in the thesis. By Thad Eugene Starner. Ph.D.

    Machine learning techniques for music information retrieval

    Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2015. The advent of digital music has changed the rules of music consumption, distribution and sales. With it has emerged the need to effectively search and manage vast music collections. Music information retrieval is an interdisciplinary field of research that focuses on the development of new techniques with that aim in mind. This dissertation addresses a specific aspect of this field: methods that automatically extract musical information exclusively based on the audio signal. We propose a method for automatic music-based classification, label inference, and music similarity estimation. Our method consists of representing the audio with a finite set of symbols and then modeling the symbols' time evolution. The symbols are obtained via vector quantization in which a single codebook is used to quantize the audio descriptors. The symbols' time evolution is modeled via a first-order Markov process. Based on systematic evaluations we carried out on publicly available sets, we show that our method achieves performance on par with most techniques found in the literature. We also present and discuss the problems that appear when computers try to classify or annotate songs using the audio as the only source of information. In our method, the separation of the quantization process from the creation and training of classification models helped us in that analysis. It enabled us to examine how instantaneous sound attributes (henceforth features) are distributed in terms of musical genre, and how designing codebooks specially tailored for these distributions affects the performance of our and other classification systems commonly used for this task. On this issue, we show that there is no apparent benefit in seeking a thorough representation of the feature space. 
This is a bit unexpected since it goes against the assumption that features carry equally relevant information loads and somehow capture the specificities of musical facets, implicit in many genre recognition methods. Label inference is the task of automatically annotating songs with semantic words; this task is also known as autotagging. In this context, we illustrate the importance of a number of issues that, in our view, are often overlooked. We show that current techniques are fragile in the sense that small alterations in the set of labels may lead to dramatically different results. Furthermore, through a series of experiments, we show that autotagging systems fail to learn tag models capable of generalizing to datasets of different origins. We also show that the performance achieved with these techniques is not sufficient to be able to take advantage of the correlations between tags. Fundação para a Ciência e a Tecnologia (FCT
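
The abstract above outlines a two-stage idea: quantize audio descriptors into symbols with a single codebook, then model how the symbols evolve with a first-order Markov process. The sketch below illustrates that idea on toy data. All function names, the additive smoothing constant `alpha`, the one-dimensional toy codebook, and the maximum-likelihood decision rule in `classify` are assumptions made for this example; the abstract does not specify how the Markov models are turned into class decisions.

```python
import math


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in frames]


def fit_markov(symbols, n_symbols, alpha=1.0):
    """Estimate a first-order transition matrix with additive smoothing."""
    counts = [[alpha] * n_symbols for _ in range(n_symbols)]
    for prev, nxt in zip(symbols, symbols[1:]):
        counts[prev][nxt] += 1
    # Normalize each row so it is a probability distribution.
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(symbols, trans):
    """Log-probability of the observed transitions under one model."""
    return sum(math.log(trans[p][n]) for p, n in zip(symbols, symbols[1:]))


def classify(symbols, models):
    """Assumed decision rule: pick the model with the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(symbols, models[label]))


# Toy data: a two-word codebook and two hypothetical classes.
codebook = [[0.0], [1.0]]
models = {
    "alternating": fit_markov([0, 1, 0, 1, 0, 1], n_symbols=2),
    "steady": fit_markov([0, 0, 0, 1, 1, 1], n_symbols=2),
}
symbols = quantize([[0.1], [0.9], [0.2], [0.8]], codebook)
predicted = classify(symbols, models)
```

Keeping `quantize` separate from `fit_markov` mirrors the separation of quantization from model training that the abstract mentions, so a different codebook can be swapped in without touching the sequence models.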

    Un nouvel algorithme de sélection de caractéristiques : application à la lecture automatique de l'écriture manuscrite

    This thesis addresses off-line handwriting recognition, with automatic mail sorting as its industrial application. The Service de Recherche Technique de La Poste (France) gave us the mandate to improve its handwriting recognition system. An in-depth analysis of the existing system identified one main research direction: improving the representation of the information supplied to the recognition system. This representation is characterized by two finite sets of primitives, which are combined by means of a Cartesian product before being integrated into the system. Improving the representation of the information requires extracting new primitives. To that end, three new representation spaces were developed. A vector quantization algorithm is used to build several sets of primitives. To increase the discriminating power of these primitives, several strategies were evaluated: linear discriminant analysis, the zoning technique, and, in association with zoning, a zone-weighting strategy. Combining the representation spaces with the improvement strategies led to the construction of several recognition systems that outperform the baseline system. The technique used to combine the sets of primitives in the baseline system cannot be used, so a new algorithm was developed to integrate new sets of primitives. The basic idea is to replace the least discriminating primitives of an initial set with new ones. A strategy that groups non-discriminating primitives makes it possible to decompose the overall recognition task into sub-problems. The definition and dynamic selection of new primitives is then guided by this decomposition. 
Applying the algorithm yields an improved representation of the information, characterized by a hierarchy of primitives. Because the algorithm runs automatically, it adapts quickly to new data or to the availability of a new representation space. The performance of the baseline system, which combines two sets of primitives, is 89.5% with a lexicon of size 1,000. Improving one of the two sets leads to a performance of 94.3% while reducing the number of primitives used by 20%.