34 research outputs found

Discrete-Mixture HMMs-based Approach for Noisy Speech Recognition

Author: Masaharu Katoh
Masaki Kohda
Tetsuo Kosaka
Publication venue: IntechOpen
Publication date: 01/06/2007

Hidden-Markov-Models-Based Dynamic Hand Gesture Recognition

Author: Carlo Cattani
Huiwen Cai
Ming Xia
Xiaoyan Wang
Yong Gao
Publication date: 01/01/2012

This paper is concerned with the recognition of dynamic hand gestures. A method based on Hidden Markov Models (HMMs) is presented for dynamic gesture trajectory modeling and recognition. The AdaBoost algorithm is used to detect the user's hand, and a contour-based hand tracker is formed by combining condensation and partitioned sampling. A cubic B-spline is adopted to approximately fit the trajectory points into a curve. Invariant curve moments as global features and orientation as local features are computed to represent the trajectory of the hand gesture. The proposed method can achieve automatic online hand gesture recognition and can successfully reject atypical gestures. The experimental results show that the proposed algorithm achieves better recognition results than the traditional hand recognition method.

Directory of Open Access Journals

Unitus DSpace

Archivio della Ricerca - Università di Salerno

Open Access Repository

Survey of error concealment schemes for real-time audio transmission systems

Author: Robles Moya Aránzazu
Publication date: 18/09/2012

This thesis presents an overview of the main strategies employed for error detection and error concealment in different real-time transmission systems for digital audio. The “Adaptive Differential Pulse-Code Modulation (ADPCM)”, the “Audio Processing Technology Apt-x100”, the “Extended Adaptive Multi-Rate Wideband (AMR-WB+)”, the “Advanced Audio Coding (AAC)”, the “MPEG-1 Audio Layer II (MP2)”, the “MPEG-1 Audio Layer III (MP3)” and finally the “Adaptive Transform Coder 3 (AC3)” are considered. As an example of error management, a simulation of the AMR-WB+ codec is included. The simulation allows an evaluation of the mechanisms included in the codec definition and also enables an evaluation of the different bit error sensitivities of the encoded audio payload. Ingeniería Técnica en Telemátic

Universidad Carlos III de Madrid e-Archivo

Complex queries and complex data

Author: Niedermayer Johannes
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 30/10/2015

With the widespread availability of wearable computers, equipped with sensors such as GPS or cameras, and with the ubiquitous presence of micro-blogging platforms, social media sites and digital marketplaces, data can be collected and shared on a massive scale. A necessary building block for taking advantage of this vast amount of information is efficient and effective similarity search algorithms that are able to find objects in a database which are similar to a query object. Due to the general applicability of similarity search over different data types and applications, the formalization of this concept and the development of strategies for evaluating similarity queries have evolved into an important field of research in the database community, the spatio-temporal database community, and others, such as information retrieval and computer vision. This thesis concentrates on a special instance of similarity queries, namely k-Nearest Neighbor (kNN) Queries and their close relative, Reverse k-Nearest Neighbor (RkNN) Queries. As a first contribution we provide an in-depth analysis of the RkNN join. While the problem of reverse nearest neighbor queries has received a vast amount of research interest, the problem of performing such queries in bulk has not seen an in-depth analysis so far. We first formalize the RkNN join, identifying its monochromatic and bichromatic versions and their self-join variants. After pinpointing the monochromatic RkNN join as an important and interesting instance, we develop solutions for this class, including a self-pruning and a mutual pruning algorithm. We then evaluate these algorithms extensively on a variety of synthetic and real datasets. From this starting point of similarity queries on certain data we shift our focus to uncertain data, addressing nearest neighbor queries in uncertain spatio-temporal databases.
Starting from the traditional definition of nearest neighbor queries and a data model for uncertain spatio-temporal data, we develop efficient query mechanisms that consider temporal dependencies during query evaluation. We define intuitive query semantics, aiming not only at returning the objects closest to the query but also their probability of being a nearest neighbor. After theoretically evaluating these query predicates, we develop efficient querying algorithms for them. Given the findings of this research on nearest neighbor queries, we extend these results to reverse nearest neighbor queries. Finally, we address the problem of querying large datasets containing set-based objects, namely image databases, where images are represented by (multi-)sets of vectors and additional metadata describing the position of features in the image. We aim at reducing the number of kNN queries performed during query processing and evaluate a modified pipeline that aims to optimize query accuracy with a small number of kNN queries. Additionally, as feature representations in object recognition are moving more and more from the real-valued domain to the binary domain, where bit vectors offer lower memory requirements and higher runtime efficiency, we evaluate efficient indexing techniques for binary feature vectors.

User state estimation for advanced communication in multimodal spoken dialogue systems

Author: Chiba Yuya
Publication date: 19/12/2017

Tohoku University, 伊藤彰則 (Akinori Ito), 課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Graphical models for visual object recognition and tracking

Author: Sudderth Erik B. (Erik Blaine), 1977-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2006

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 277-301). We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities.
Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories. by Erik B. Sudderth. Ph.D

DSpace@MIT

Large-deviation analysis and applications of learning tree-structured graphical models

Author: Tan Vincent Yan Fu
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 213-228). The design and analysis of complexity-reduced representations for multivariate data is important in many scientific and engineering domains. This thesis explores such representations from two different perspectives: deriving and analyzing performance measures for learning tree-structured graphical models and salient feature subset selection for discrimination. Graphical models have proven to be a flexible class of probabilistic models for approximating high-dimensional data. Learning the structure of such models from data is an important generic task. It is known that if the data are drawn from tree-structured distributions, then the algorithm of Chow and Liu (1968) provides an efficient algorithm for finding the tree that maximizes the likelihood of the data. We leverage this algorithm and the theory of large deviations to derive the error exponent of structure learning for discrete and Gaussian graphical models. We determine the extremal tree structures for learning, that is, the structures that lead to the highest and lowest exponents. We prove that the star minimizes the exponent and the chain maximizes the exponent, which means that among all unlabeled trees, the star and the chain are the worst and best for learning respectively. The analysis is also extended to learning forest-structured graphical models by augmenting the Chow-Liu algorithm with a thresholding procedure. We prove scaling laws on the number of samples and the number of variables for structure learning to remain consistent in high dimensions. The next part of the thesis is concerned with discrimination.
We design computationally efficient tree-based algorithms to learn pairs of distributions that are specifically adapted to the task of discrimination and show that they perform well on various datasets vis-à-vis existing tree-based algorithms. We define the notion of a salient set for discrimination using information-theoretic quantities and derive scaling laws on the number of samples so that the salient set can be recovered asymptotically. by Vincent Yan Fu Tan. Ph.D

DSpace@MIT

Systematic hybrid analog/digital signal coding

Author: Barron Richard J. (Richard John)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2000

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 201-206). This thesis develops low-latency, low-complexity signal processing solutions for systematic source coding, or source coding with side information at the decoder. We consider an analog source signal transmitted through a hybrid channel that is the composition of two channels: a noisy analog channel through which the source is sent unprocessed and a secondary rate-constrained digital channel; the source is processed prior to transmission through the digital channel. The challenge is to design a digital encoder and decoder that provide a minimum-distortion reconstruction of the source at the decoder, which has observations of analog and digital channel outputs. The methods described in this thesis have importance to a wide array of applications. For example, in the case of in-band on-channel (IBOC) digital audio broadcast (DAB), an existing noisy analog communications infrastructure may be augmented by a low-bandwidth digital side channel for improved fidelity, while compatibility with existing analog receivers is preserved. Another application is a source coding scheme which devotes a fraction of available bandwidth to the analog source and the rest of the bandwidth to a digital representation. This scheme is applicable in a wireless communications environment (or any environment with unknown SNR), where analog transmission has the advantage of a gentle roll-off of fidelity with SNR. A very general paradigm for low-latency, low-complexity source coding is composed of three basic cascaded elements: 1) a space rotation, or transformation, 2) quantization, and 3) lossless bitstream coding. The paradigm has been applied with great success to conventional source coding, and it applies equally well to systematic source coding.
Focusing on the case involving a Gaussian source, Gaussian channel and mean-squared distortion, we determine optimal or near-optimal components for each of the three elements, each of which has analogous components in conventional source coding. The space rotation can take many forms such as linear block transforms, lapped transforms, or subband decomposition, for all of which we derive conditions of optimality. For a very general case we develop algorithms for the design of locally optimal quantizers. For the Gaussian case, we describe a low-complexity scalar quantizer, the nested lattice scalar quantizer, that has performance very near that of the optimal systematic scalar quantizer. Analogous to entropy coding for conventional source coding, Slepian-Wolf coding is shown to be an effective lossless bitstream coding stage for systematic source coding. by Richard J. Barron. Ph.D

DSpace@MIT