
    Integrated Inference and Learning of Neural Factors in Structural Support Vector Machines

    Pattern recognition problems in areas such as computer vision, bioinformatics, and speech or text recognition are often best tackled by taking into account task-specific statistical relations between output variables. In structured prediction, this internal structure is used to predict multiple outputs simultaneously, leading to more accurate and coherent predictions. Structural support vector machines (SSVMs) are nonprobabilistic models that optimize a joint input-output function through margin-based learning. Because SSVMs generally disregard the interplay between unary and interaction factors during the training phase, their final parameters are suboptimal. Moreover, their factors are often restricted to linear combinations of input features, limiting their generalization power. To improve prediction accuracy, this paper proposes: (i) joint inference and learning by integrating back-propagation and loss-augmented inference into SSVM subgradient descent; and (ii) extending SSVM factors to neural networks that form highly nonlinear functions of input features. Image segmentation benchmark results demonstrate improvements in accuracy over conventional SSVM training methods, highlighting the feasibility of end-to-end SSVM training with neural factors.
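As a rough illustration of the conventional baseline this abstract improves on, the following Python/NumPy sketch trains a linear SSVM on short label chains by stochastic subgradient descent, calling loss-augmented inference (Hamming loss, brute force over all labelings) at every step. It is not the paper's method: the factors here are linear (a unary weight matrix `W` and a pairwise matrix `P`, names chosen for illustration), and exhaustive inference only works for tiny label spaces. In the approach the abstract describes, the factors are neural networks instead, and roughly speaking the same subgradient is back-propagated into their weights.

```python
import itertools

import numpy as np


def score(W, P, X, y):
    """Joint score: unary factors W[y_t] . x_t plus pairwise factors P[y_{t-1}, y_t]."""
    unary = sum(W[label] @ x for label, x in zip(y, X))
    pairwise = sum(P[a, b] for a, b in zip(y[:-1], y[1:]))
    return unary + pairwise


def hamming(y, y_true):
    """Task loss Delta(y, y_true): number of mislabelled positions."""
    return sum(a != b for a, b in zip(y, y_true))


def joint_features(X, y, n_labels):
    """Joint feature map psi(X, y), split into unary and pairwise blocks."""
    gW = np.zeros((n_labels, X.shape[1]))
    gP = np.zeros((n_labels, n_labels))
    for label, x in zip(y, X):
        gW[label] += x
    for a, b in zip(y[:-1], y[1:]):
        gP[a, b] += 1.0
    return gW, gP


def loss_augmented_inference(W, P, X, y_true, n_labels):
    """argmax_y Delta(y, y_true) + score(y), by brute force (tiny chains only)."""
    labelings = itertools.product(range(n_labels), repeat=len(X))
    return max(labelings, key=lambda y: hamming(y, y_true) + score(W, P, X, y))


def predict(W, P, X, n_labels):
    """Plain MAP inference: argmax_y score(y), again by brute force."""
    labelings = itertools.product(range(n_labels), repeat=len(X))
    return max(labelings, key=lambda y: score(W, P, X, y))


def train_ssvm(data, n_labels, n_features, lam=0.01, lr=0.1, epochs=50):
    """Stochastic subgradient descent on lam/2 ||w||^2 + structured hinge loss."""
    W = np.zeros((n_labels, n_features))
    P = np.zeros((n_labels, n_labels))
    for _ in range(epochs):
        for X, y_true in data:
            y_hat = loss_augmented_inference(W, P, X, y_true, n_labels)
            gW_hat, gP_hat = joint_features(X, y_hat, n_labels)
            gW_true, gP_true = joint_features(X, y_true, n_labels)
            # Hinge subgradient is psi(y_hat) - psi(y_true); it vanishes when y_hat == y_true.
            W -= lr * (lam * W + gW_hat - gW_true)
            P -= lr * (lam * P + gP_hat - gP_true)
    return W, P


# Toy data: chains of length 4, two labels, features are noisy one-hot vectors.
rng = np.random.default_rng(0)
data = []
for _ in range(20):
    y = tuple(int(v) for v in rng.integers(0, 2, size=4))
    X = np.eye(2)[list(y)] + 0.1 * rng.standard_normal((4, 2))
    data.append((X, y))

W, P = train_ssvm(data, n_labels=2, n_features=2)
```

Swapping the brute-force search for a dynamic-programming (Viterbi-style) decoder would be the natural next step for longer chains; it is left out here to keep the sketch short.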

    Extended LBP based Facial Expression Recognition System for Adaptive AI Agent Behaviour

    Automatic facial expression recognition is widely used in applications such as health care, surveillance and human-robot interaction. In this paper, we present a novel system that uses automatic facial emotion recognition for adaptive AI agent behaviour. The system combines Kirsch-operator-based local binary patterns for feature extraction with diverse classifiers for emotion recognition. First, we propose a novel variant of the local binary pattern (LBP) for feature extraction that copes with illumination changes and with scaling and rotation variations. The extracted features are then fed to the classifier to recognize seven emotions. The detected emotion is used to enhance the behaviour selection of the artificial intelligence (AI) agents in a shooter game. The proposed system is evaluated on multiple facial expression datasets and outperforms other state-of-the-art models by a significant margin.
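For readers unfamiliar with LBP, the following Python/NumPy sketch computes the standard 8-neighbour LBP code, which thresholds each neighbour against the centre pixel and packs the results into one byte, plus the normalised code histogram typically fed to a classifier. It is only the basic operator, not the Kirsch-operator-based variant the paper proposes, and the function names are illustrative. Because each bit depends only on whether a neighbour is at least as bright as the centre, adding a constant to every pixel leaves the codes unchanged, which is the usual starting point for LBP's robustness to illumination changes.

```python
import numpy as np

# Neighbour offsets (row, col), clockwise from the top-left pixel; bit k <- OFFSETS[k].
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D grayscale array."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Shifted view: the neighbour at (dr, dc) of every interior pixel.
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes


def lbp_histogram(img, bins=256):
    """Normalised histogram of LBP codes, usable as a feature vector for a classifier."""
    codes = lbp_image(img)
    hist = np.bincount(codes.ravel(), minlength=bins).astype(float)
    return hist / hist.sum()


example = np.array([[5, 9, 1],
                    [4, 6, 7],
                    [2, 6, 3]])
print(lbp_image(example))  # → [[42]]
```

In the example, the neighbours 9 (bit 1), 7 (bit 3) and 6 (bit 5) are not darker than the centre value 6, giving 2 + 8 + 32 = 42.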

    Multi-Level Representation of Gesture as Command for Human Computer Interaction

    The paper addresses the multiple forms of representation that human gesture takes at different levels of human computer interaction, ranging from gesture acquisition to a mathematical model for analysis, a pattern for recognition and a record for the database, up to end-level application event triggers. A mathematical model for gesture as command is presented. We also identify and provide specific models for four different types of gestures by considering both posture information and the underlying motion trajectories. The problem of constructing gesture dictionaries is further addressed by taking into account similarity measures and discriminative dictionary features.
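The abstract does not say which similarity measure is used for the gesture dictionary. As one common choice for comparing motion trajectories sampled at different speeds, the following Python sketch uses dynamic time warping (DTW) and classifies an observed trajectory as its nearest dictionary entry. DTW, the dictionary contents and the function names are all assumptions for illustration, not the paper's model.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two trajectories of (x, y) points."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognise(trajectory, dictionary):
    """Return the command whose template trajectory is nearest under DTW."""
    return min(dictionary, key=lambda command: dtw_distance(trajectory, dictionary[command]))


# Hypothetical two-entry gesture dictionary: command name -> template trajectory.
dictionary = {
    "swipe_right": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "swipe_up": [(0, 0), (0, 1), (0, 2), (0, 3)],
}
# A slower, slightly noisy rightward swipe with more samples than the template.
observed = [(0, 0), (0.5, 0.1), (1, 0), (1.5, 0.1), (2, 0), (2.5, 0), (3, 0.1)]
print(recognise(observed, dictionary))  # → swipe_right
```

DTW aligns the seven observed samples with the four template samples without resampling, which is why it suits gestures performed at varying speeds; the quadratic table is fine for short trajectories.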

    Signatures of Associative Memory Behavior in a Multimode Dicke Model

    © 2020 American Physical Society. Dicke-like models can describe a variety of physical systems, such as atoms in a cavity or vibrating ion chains. In equilibrium, these systems often change their behavior radically when switching from weak to strong spin-boson interaction. This usually manifests as a transition from a "dark" to a "superradiant" phase. However, understanding the out-of-equilibrium physics of these models is extremely challenging, and even more so for strong spin-boson coupling. Here we show that the nonequilibrium, strongly interacting multimode Dicke model can mimic some fundamental properties of an associative memory, a system that permits the recognition of patterns such as the letters of an alphabet. Patterns are encoded in the couplings between spins and bosons, and we discuss the dynamics of the spins from the perspective of pattern retrieval in associative memory models. We identify two phases, a "paramagnetic" and a "ferromagnetic" one, and a crossover behavior between these regimes. The "ferromagnetic" phase is reminiscent of pattern retrieval. We highlight similarities and differences with the thermal dynamics of a Hopfield associative memory and show that elements of "machine learning behavior" indeed emerge in the strongly coupled multimode Dicke model.
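For comparison with the Hopfield associative memory mentioned in this abstract, the following Python/NumPy sketch implements a classical Hopfield network at zero temperature: patterns are stored in Hebbian couplings between spins, and a corrupted pattern is retrieved by asynchronous sign updates. The overlap with the stored pattern is the order parameter whose non-zero value signals retrieval, loosely analogous to the "ferromagnetic" phase. This is the textbook zero-temperature Hopfield model, not the multimode Dicke model or the thermal dynamics studied in the paper; the pattern count, system size and noise level are arbitrary choices.

```python
import numpy as np


def hebbian_couplings(patterns):
    """Hebbian rule J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with no self-coupling."""
    xi = np.asarray(patterns, dtype=float)
    J = xi.T @ xi / xi.shape[1]
    np.fill_diagonal(J, 0.0)
    return J


def retrieve(J, state, sweeps=10, seed=0):
    """Zero-temperature asynchronous dynamics: s_i <- sign(sum_j J_ij s_j)."""
    rng = np.random.default_rng(seed)
    s = np.array(state, dtype=float)
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1.0 if J[i] @ s >= 0 else -1.0
    return s


def overlap(s, pattern):
    """Overlap m = (1/N) sum_i s_i xi_i; m = 1 means perfect retrieval."""
    return float(np.dot(s, pattern)) / len(s)


# Store three random +/-1 patterns of 100 spins, then start from a copy of the
# first pattern with 10 spins flipped and let the dynamics relax.
rng = np.random.default_rng(42)
patterns = rng.choice([-1.0, 1.0], size=(3, 100))
J = hebbian_couplings(patterns)
noisy = patterns[0].copy()
noisy[rng.choice(100, size=10, replace=False)] *= -1
recovered = retrieve(J, noisy)
print(overlap(noisy, patterns[0]), overlap(recovered, patterns[0]))
```

With only three patterns on 100 spins the network is far below its storage capacity, so the overlap should climb from 0.8 back to, or very close to, 1.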

    Interactive handwriting recognition with limited user effort

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10032-013-0204-5. Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. Although post-editing the output of automatic handwritten text recognition is feasible, it is not clearly better than ignoring that output and transcribing the document from scratch. A more effective approach is an interactive one, in which the system is guided by the user and the user is assisted by the system to complete the transcription task as efficiently as possible. Nevertheless, in some applications the user effort available for transcription is limited, and full supervision of the system output is not realistic. To circumvent these problems, we propose a novel interactive approach that uses the available user effort efficiently by improving three different aspects. First, the system spends the limited effort solely on supervising recognised words that are likely to be incorrect. Thus, user effort is focused on the words about which the system is not confident enough. Second, it refines the initial transcription provided to the user by recomputing it constrained to the user supervisions. In this way, incorrect words in unsupervised parts can be amended automatically, without user supervision. Finally, it improves the underlying system models by retraining the system from partially supervised transcriptions. To support these claims, empirical results are presented on two real databases showing that the proposed approach can notably reduce user effort in the transcription of handwritten text in (old) documents. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No 287755 (transLectures).
Also supported by the Spanish Government (MICINN, MITyC, "Plan E", under Grants MIPRCV "Consolider Ingenio 2010", MITTRAL (TIN2009-14633-C03-01), erudito.com (TSI-020110-2009-439), iTrans2 (TIN2009-14511), and FPU (AP2007-02867)) and the Generalitat Valenciana (Grants Prometeo/2009/014 and GV/2010/067). Serrano Martinez Santos, N.; Giménez Pastor, A.; Civera Saiz, J.; Sanchis Navarro, JA.; Juan Císcar, A. (2014). Interactive handwriting recognition with limited user effort. International Journal on Document Analysis and Recognition. 17(1):47-59. https://doi.org/10.1007/s10032-013-0204-5
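The first of the three ideas above, spending a fixed supervision budget only on words the recogniser is unsure about, can be sketched in a few lines of Python. The `RecognisedWord` type, its `confidence` field and the per-line word budget are assumptions for illustration: the abstract does not specify how confidence is computed or how the budget is expressed, and the constrained recomputation and retraining steps are not shown.

```python
from dataclasses import dataclass


@dataclass
class RecognisedWord:
    text: str
    confidence: float  # hypothetical recogniser confidence in [0, 1]


def select_for_supervision(words, max_words):
    """Return the positions of the least-confident words, at most max_words of them,
    in reading order so the user can check them left to right."""
    by_confidence = sorted(range(len(words)), key=lambda i: words[i].confidence)
    return sorted(by_confidence[:max_words])


line = [
    RecognisedWord("the", 0.98),
    RecognisedWord("qvick", 0.41),
    RecognisedWord("brown", 0.90),
    RecognisedWord("f0x", 0.35),
]
print(select_for_supervision(line, max_words=2))  # → [1, 3]
```

Expressing the budget as a word count rather than a fraction avoids rounding surprises; any words left unselected are accepted as recognised, which is exactly where the constrained recomputation step would try to fix remaining errors.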

    The VEX-93 environment as a hybrid tool for developing knowledge systems with different problem solving techniques

    The paper describes VEX-93, a hybrid environment for developing knowledge-based and problem-solving systems. It integrates methods and techniques from artificial intelligence, image and signal processing, and data analysis, which can be combined. Its intelligent toolbox has two hierarchical levels of reasoning, with one upper strategic inference engine and four lower ones implementing specific reasoning models: truth-functional (rule-based), probabilistic (causal networks), fuzzy (rule-based) and case-based (frames). Image and signal processing and analysis capabilities are provided as programming languages with more than one hundred primitive functions. User-written programs can be embedded within knowledge bases, allowing perception and reasoning to be combined. The data analyzer toolbox contains a collection of numerical classification, pattern recognition and ordination methods, with neural network tools and a database query language at the inference engines' disposal. VEX-93 is an open system able to communicate with external computer programs relevant to a particular application. Metaknowledge can be used to elaborate conclusions, and man-machine interaction includes, besides windows and graphical interfaces, acceptance of voice commands and production of speech output. The system was conceived for real-world applications in general domains, but a concrete medical diagnostic support system, currently being completed as a Cuban-Spanish project, is mentioned as an example. The present version of VEX-93 is a large system of about one and a half million lines of C code and runs on microcomputers under Windows 3.1.

    The Evolution of First Person Vision Methods: A Survey

    The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches combine particular image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among other things, the most commonly used features, methods, challenges and opportunities within the field.
    Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction