    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not yet been thoroughly evaluated is deformable face tracking "in-the-wild". Until now, its performance has mainly been assessed qualitatively, by visually inspecting the results of a deformable face tracker on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, and (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.

    Generalized Kernel-based Visual Tracking

    In this work we generalize plain mean shift (MS) trackers and attempt to overcome two limitations of the standard approach. It is well known that modeling and maintaining a representation of a target object is an important component of a successful visual tracker. However, little work has been done on building a robust template model for kernel-based MS tracking. In contrast to building a template from a single frame, we train a robust object representation model from a large amount of data. Tracking is viewed as a binary classification problem, and a discriminative classification rule is learned to distinguish between the object and the background. We adopt a support vector machine (SVM) for training. The tracker is then implemented by maximizing the classification score. An iterative optimization scheme very similar to MS is derived for this purpose. Comment: 12 pages.
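
    For orientation, the plain mean shift update that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration of standard Gaussian-kernel mean shift on 2-D points, not the SVM-based scheme the paper derives; the function name `mean_shift_mode` and the `bandwidth`, `tol` and `max_iter` parameters are assumptions made for this sketch.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Climb from `start` to a local density mode of 2-D `points`.

    Standard mean shift with a Gaussian kernel (illustrative only; the
    paper replaces the density by an SVM classification score).
    """
    x, y = start
    for _ in range(max_iter):
        wsum = wx = wy = 0.0
        for px, py in points:
            d2 = (px - x) ** 2 + (py - y) ** 2
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # kernel weight
            wsum += w
            wx += w * px
            wy += w * py
        if wsum == 0.0:  # every weight underflowed: nothing to move towards
            break
        nx, ny = wx / wsum, wy / wsum  # weighted mean = next estimate
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y
```

    Each iteration moves the estimate to the kernel-weighted mean of the samples, which is why the procedure needs no step size.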

    Accurate, Very Low Computational Complexity Spike Sorting Using Unsupervised Matched Subspace Learning

    This paper presents an adaptable dictionary-based feature extraction approach for spike sorting offering high accuracy and low computational complexity for implantable applications. It extracts and learns identifiable features from evolving subspaces through matched unsupervised subspace filtering. To provide compatibility with the strict constraints in implantable devices such as the chip area and power budget, the dictionary contains arrays with entries in {-1, 0, 1} and the algorithm need only process addition and subtraction operations. Three types of such dictionary were considered. To quantify and compare the performance of the resulting three feature extractors with existing systems, a neural signal simulator based on several different libraries was developed. For noise levels $\sigma_N$ between 0.05 and 0.3 and groups of 3 to 6 clusters, all three feature extractors provide robust high performance with average classification errors of less than 8% over five iterations, each consisting of 100 generated data segments. To our knowledge, the proposed adaptive feature extractors are the first able to classify reliably 6 clusters for implantable applications. An ASIC implementation of the best performing dictionary-based feature extractor was synthesized in a 65-nm CMOS process. It occupies an area of 0.09 mm² and dissipates up to about 10.48 μW from a 1 V supply voltage, when operating with 8-bit resolution at 30 kHz operating frequency.
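
    The claim that a dictionary with entries in {-1, 0, 1} needs only additions and subtractions can be illustrated with a short sketch. It shows the projection step alone, under the assumption that each feature is the inner product of a spike window with one dictionary row; the function name `ternary_features` and the example values are hypothetical, and the paper's dictionary learning is not reproduced here.

```python
def ternary_features(window, dictionary):
    """Project a spike window onto rows with entries in {-1, 0, 1}.

    No multiplications are used: a coefficient of 1 adds the sample,
    -1 subtracts it, and 0 skips it.
    """
    features = []
    for atom in dictionary:
        if len(atom) != len(window):
            raise ValueError("atom and window must have the same length")
        acc = 0
        for coeff, sample in zip(atom, window):
            if coeff == 1:
                acc += sample
            elif coeff == -1:
                acc -= sample
            # coeff == 0: the sample does not contribute
        features.append(acc)
    return features


# Hypothetical 4-sample window and a 2-row dictionary.
print(ternary_features([3, -1, 4, 2], [[1, 0, -1, 0], [0, 1, 1, -1]]))
```

    With the values above, the first feature is 3 - 4 and the second is -1 + 4 - 2.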

    Joint cross-domain classification and subspace learning for unsupervised adaptation

    Domain adaptation aims at adapting the knowledge acquired on a source domain to a new, different but related target domain. Several approaches have been proposed for classification tasks in the unsupervised scenario, where no labeled target data are available. Most of the attention has been dedicated to searching for a new domain-invariant representation, leaving the definition of the prediction function to a second stage. Here we propose to learn both jointly. Specifically, we learn the source subspace that best matches the target subspace while at the same time minimizing a regularized misclassification loss. We provide an alternating optimization technique based on stochastic sub-gradient descent to solve the learning problem and we demonstrate its performance on several domain adaptation tasks. Comment: Paper is under consideration at Pattern Recognition Letters.

    Gender and gaze gesture recognition for human-computer interaction

    © 2016 Elsevier Inc. The identification of visual cues in facial images has been widely explored in the broad area of computer vision. However, theoretical analyses are often not transformed into widespread assistive Human-Computer Interaction (HCI) systems, due to factors such as inconsistent robustness, low efficiency, large computational expense or strong dependence on complex hardware. We present a novel gender recognition algorithm, a modular eye centre localisation approach and a gaze gesture recognition method, aiming to improve the intelligence, adaptability and interactivity of HCI systems by combining demographic data (gender) and behavioural data (gaze) to enable development of a range of real-world assistive-technology applications. The gender recognition algorithm utilises Fisher Vectors as facial features which are encoded from low-level local features in facial images. We experimented with four types of low-level features: greyscale values, Local Binary Patterns (LBP), LBP histograms and Scale Invariant Feature Transform (SIFT). The corresponding Fisher Vectors were classified using a linear Support Vector Machine. The algorithm has been tested on the FERET database, the LFW database and the FRGCv2 database, yielding 97.7%, 92.5% and 96.7% accuracy respectively. The eye centre localisation algorithm has a modular approach, following a coarse-to-fine, global-to-regional scheme and utilising isophote and gradient features. A Selective Oriented Gradient filter has been specifically designed to detect and remove strong gradients from eyebrows, eye corners and self-shadows (which sabotage most eye centre localisation methods). The trajectories of the eye centres are then defined as gaze gestures for active HCI. The eye centre localisation algorithm has been compared with 10 other state-of-the-art algorithms with similar functionality and has outperformed them in terms of accuracy while maintaining excellent real-time performance.
    The above methods have been used to develop a data recovery system that can serve as a basis for advanced assistive technology tools. The high accuracy, reliability and real-time performance achieved for attention monitoring, gaze gesture control and recovery of demographic data can enable the advanced human-robot interaction that is needed for developing systems that provide assistance with everyday actions, thereby improving the quality of life for the elderly and/or disabled.
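
    Of the low-level features listed in the abstract above, Local Binary Patterns and their histograms are the simplest to illustrate. The sketch below computes the basic 8-neighbour LBP code and a 256-bin histogram on a greyscale image given as a list of rows. It is a generic textbook variant, not necessarily the configuration used in the paper; the function names and the clockwise bit ordering are assumptions of this sketch.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner
    # (the ordering is a convention chosen for this sketch).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:  # neighbour at least as bright: set the bit
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels of a greyscale image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

    Border pixels are skipped because they lack a full 3x3 neighbourhood; other border policies are possible.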

    Automatic speech recognition: from study to practice

    Today, automatic speech recognition (ASR) is widely used for purposes such as robotics, multimedia, and medical and industrial applications. Although much research has been carried out in this field over the past decades, many problems remain open. Starting work in this area requires a thorough knowledge of ASR systems and of their weak points and problems. In addition, practical experience reinforces theoretical understanding. With this in mind, in this master's thesis we first review the principal structure of standard HMM-based ASR systems from a technical point of view, covering feature extraction, acoustic modeling, language modeling and decoding. We then discuss the most significant challenges in ASR systems, which concern either the characteristics of internal components or external agents that affect system performance. Furthermore, we have implemented a Spanish language recognizer using the HTK toolkit. Finally, based on a study of different sources in the field of ASR, two open research lines are suggested for future work.