185 research outputs found

    A knowledge-based system for multilingual spoken human-machine communication

    Get PDF
    Automatic Speech Recognition (ASR) is at the heart of spoken human-machine communication. It can be seen as the management of information emanating from the acoustic microstructure of the speech signal, with the aim of transforming that information so that it can be represented by the implicit phonetic macrostructure. Matching these two structures with as few errors as possible requires integrating a priori knowledge about the phonetic macrostructure into expert systems dedicated to the management of acoustic-phonetic information. This paper investigates aspects related both to the management of the phonetic information carried by the speech signal and to the topology of expert systems capable of conducting a multilingual phonemic recognition process. The proposed approach consists of enriching the knowledge base of these expert systems with cues representative of the majority of human languages, in order to enhance the identification performance for phonetic macro-classes and features. Results obtained on corpora of logatomes and sentences in French and Arabic show that it is possible to orient system design towards a unified recognition process adapted to multilingual phonemic identification
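    As a rough, hedged illustration of the kind of rule-based macro-class decision such an expert system might encode (the cue names and thresholds below are illustrative assumptions, not values taken from the article):

```python
# Toy rule-based sketch of phonetic macro-class decisions from frame-level
# acoustic cues. Cue names and thresholds are illustrative assumptions only;
# the article's expert systems rely on a far richer, multilingual knowledge base.
def macro_class(energy, zero_crossing_rate, voiced):
    """Assign a frame to a coarse phonetic macro-class from simple cues."""
    if energy < 0.05:
        return "silence"
    if voiced and zero_crossing_rate < 0.1:
        return "vowel-like"          # voiced, low-frequency energy
    if not voiced and zero_crossing_rate > 0.3:
        return "fricative-like"      # unvoiced, noisy, high zero-crossing rate
    return "other-consonant"
```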

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research

    An Introduction to Neural Data Compression

    Full text link
    Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression algorithms to be learned end-to-end from data using powerful generative models such as normalizing flows, variational autoencoders, diffusion probabilistic models, and generative adversarial networks. The present article aims to introduce this field of research to a broader machine learning audience by reviewing the necessary background in information theory (e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image quality assessment, perceptual metrics), and providing a curated guide through the essential ideas and methods in the literature thus far
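    As background for the rate-distortion theory mentioned above, the generic training objective of a learned codec can be written as follows (standard textbook notation, not taken from the article):

```latex
% Rate-distortion Lagrangian of a learned codec (generic notation):
% f_\phi encoder, g_\phi decoder, p_\theta learned entropy model,
% \lfloor\cdot\rceil quantization, d a distortion measure such as MSE.
\mathcal{L}(\phi,\theta)
  = \underbrace{\mathbb{E}_{x}\bigl[-\log_2 p_\theta(\hat{y})\bigr]}_{\text{rate } R}
  + \lambda\,\underbrace{\mathbb{E}_{x}\bigl[d\bigl(x,\, g_\phi(\hat{y})\bigr)\bigr]}_{\text{distortion } D},
  \qquad \hat{y} = \lfloor f_\phi(x) \rceil
```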

    Multimodal Computational Attention for Scene Understanding

    Get PDF
    Robotic systems have limited computational capacities. Hence, computational attention models are important to focus on specific stimuli and allow for complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models

    Speech-driven animation using multi-modal hidden Markov models

    Get PDF
    The main objective of this thesis was the synthesis of speech-synchronised motion, in particular head motion. The hypothesis that head motion can be estimated from the speech signal was confirmed. In order to achieve satisfactory results, a motion capture database was recorded, a definition of head motion in terms of articulation was developed, a continuous stream mapping procedure was devised, and finally the synthesis was evaluated. Based on previous research into non-verbal behaviour, basic types of head motion were defined that could function as modelling units. The stream mapping method investigated in this thesis is based on Hidden Markov Models (HMMs), which employ modelling units to map between continuous signals. The objective evaluation of the modelling parameters confirmed that head motion types could be predicted from the speech signal with an accuracy above chance, close to 70%. Furthermore, a special type of HMM called the trajectory HMM was used because it enables synthesis of continuous output. However, head motion is a stochastic process, so the trajectory HMM was further extended to allow for non-deterministic output. Finally, the resulting head motion synthesis was perceptually evaluated. The effects of the "uncanny valley" were also considered in the evaluation, confirming that rendering quality influences our judgement of the movement of virtual characters. In conclusion, a general method for synthesising speech-synchronised behaviour was developed that can be applied to a whole range of behaviours
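    The following is a minimal sketch of the per-class HMM mapping idea summarized above, assuming the hmmlearn and librosa libraries; the motion classes, file paths, and feature choices are placeholders, and the thesis's trajectory HMM extension is not reproduced here:

```python
# Toy sketch: predict head-motion type from speech, one Gaussian HMM per class.
# Class labels and paths are hypothetical placeholders, not from the thesis.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Extract MFCC frames (T x n_mfcc) from a speech file."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_models(training_data, n_states=5):
    """training_data: dict {class label: list of wav paths}; returns {label: HMM}."""
    models = {}
    for label, paths in training_data.items():
        feats = [mfcc_features(p) for p in paths]
        X = np.vstack(feats)
        lengths = [f.shape[0] for f in feats]
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        hmm.fit(X, lengths)
        models[label] = hmm
    return models

def classify(models, wav_path):
    """Pick the head-motion class whose HMM gives the highest log-likelihood."""
    X = mfcc_features(wav_path)
    return max(models, key=lambda label: models[label].score(X))
```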

    Extraction and representation of semantic information in digital media

    Get PDF

    Contributions To Automatic Particle Identification In Electron Micrographs: Algorithms, Implementation, And Applications

    Get PDF
    Three-dimensional reconstruction of large macromolecules like viruses at resolutions below 8-10 Å requires a large set of projection images, and the particle identification step becomes a bottleneck. Several automatic and semi-automatic particle detection algorithms have been developed over the years. We present a general technique designed to automatically identify the projection images of particles. The method utilizes Markov random field modelling of the projected images and involves a preprocessing of electron micrographs followed by image segmentation and post-processing for boxing of the particle projections. Due to the typically extensive computational requirements for extracting hundreds of thousands of particle projections, parallel processing becomes essential. We present parallel algorithms and load balancing schemes for our algorithms. The lack of a standard benchmark for relative performance analysis of particle identification algorithms has prompted us to develop a benchmark suite. Further, we present a collection of metrics for the relative performance analysis of particle identification algorithms on the micrograph images in the suite, and discuss the design of the benchmark suite
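    Below is a heavily simplified, hedged sketch of the preprocess-segment-box pipeline described above, using a plain global threshold in place of the Markov random field segmentation and assuming SciPy; parameter values are illustrative only:

```python
# Simplified particle-boxing sketch: denoise, threshold, label connected
# components, and return bounding boxes. The paper's MRF segmentation and
# parallel load-balancing schemes are not reproduced here.
import numpy as np
from scipy import ndimage

def box_particles(micrograph, sigma=2.0, min_area=200):
    """micrograph: 2D float array. Returns a list of (row_slice, col_slice) boxes."""
    smoothed = ndimage.gaussian_filter(micrograph, sigma=sigma)   # preprocessing
    threshold = smoothed.mean() + smoothed.std()                  # crude global threshold
    mask = smoothed > threshold                                   # segmentation stand-in
    labels, _ = ndimage.label(mask)                               # connected components
    boxes = []
    for i, obj in enumerate(ndimage.find_objects(labels), start=1):
        if obj is None:
            continue
        if (labels[obj] == i).sum() >= min_area:                  # post-processing: drop specks
            boxes.append(obj)
    return boxes
```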

    24th International Conference on Information Modelling and Knowledge Bases

    Get PDF
    In the last three decades, information modelling and knowledge bases have become essential subjects, not only in academic communities related to information systems and computer science but also in the business area where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries as well. The workshop character of the conference - discussion, ample time for presentations, and a limited number of participants (50) and papers (30) - is typical for the series. Suggested topics include, but are not limited to:
    1. Conceptual modelling: modelling and specification languages; domain-specific conceptual modelling; concepts, concept theories and ontologies; conceptual modelling of large and heterogeneous systems; conceptual modelling of spatial, temporal and biological data; methods for developing, validating and communicating conceptual models.
    2. Knowledge and information modelling and discovery: knowledge discovery, knowledge representation and knowledge management; advanced data mining and analysis methods; conceptions of knowledge and information; modelling information requirements; intelligent information systems; information recognition and information modelling.
    3. Linguistic modelling: models of HCI; information delivery to users; intelligent informal querying; linguistic foundations of information and knowledge; fuzzy linguistic models; philosophical and linguistic foundations of conceptual models.
    4. Cross-cultural communication and social computing: cross-cultural support systems; integration, evolution and migration of systems; collaborative societies; multicultural web-based software systems; intercultural collaboration and support systems; social computing, behavioural modelling and prediction.
    5. Environmental modelling and engineering: environmental information systems (architecture); spatial, temporal and observational information systems; large-scale environmental systems; collaborative knowledge base systems; agent concepts and conceptualisation; hazard prediction, prevention and steering systems.
    6. Multimedia data modelling and systems: modelling multimedia information and knowledge; content-based multimedia data management; content-based multimedia retrieval; privacy and context enhancing technologies; semantics and pragmatics of multimedia data; metadata for multimedia information systems.
    Overall we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the programme committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series "Frontiers in Artificial Intelligence" by IOS Press (Amsterdam). The books "Information Modelling and Knowledge Bases" are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in advancing the research and application of information modelling and knowledge bases. Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyoki

    Visual saliency computation for image analysis

    Full text link
    Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, what objects are visually prominent in a scene, and so on. Such algorithms could be useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems.
    1) Eye Fixation Prediction. Eye fixation prediction aims to predict where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model which leverages the global surroundedness cue using a Boolean map representation. We draw a theoretical connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experimental results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets.
    2) Salient Region Detection. Salient region detection entails computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the Minimum Barrier Distance (MBD) transform. We present a fast approximate MBD transform algorithm with an error bound analysis. Powered by this fast MBD transform algorithm, our method can run at about 80 FPS and achieves state-of-the-art performance on four benchmark datasets.
    3) Salient Object Detection. Salient object detection aims at localizing each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as non-maximum suppression, and our method substantially outperforms previous methods on three challenging datasets.
    4) Salient Object Subitizing. We propose a new visual saliency computation task, called Salient Object Subitizing, which is to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present an image dataset of about 14K everyday images annotated via an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process. A method is proposed to further improve the training of the CNN subitizing model by leveraging synthetic images.
    5) Top-down Saliency Detection. Unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach by modeling the top-down attention of a CNN image classifier. We propose Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our top-down saliency detection method achieves superior performance in weakly supervised localization tasks on challenging datasets. The usefulness of our method is further validated in the text-to-region association task, where our method provides state-of-the-art performance using only weakly labeled web images for training
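    As a hedged illustration of the global surroundedness cue behind the BMS model summarized above, here is a drastically simplified single-channel sketch (the published algorithm operates on multiple colour channels and applies additional normalization steps):

```python
# Toy Boolean-map surroundedness sketch: threshold a single-channel image at
# several levels, keep only regions not connected to the image border
# ("surrounded" regions), and average the resulting boolean maps.
import numpy as np
from scipy import ndimage

def bms_saliency(gray, n_thresholds=16):
    """gray: 2D array scaled to [0, 1]. Returns a saliency map in [0, 1]."""
    attention = np.zeros_like(gray, dtype=float)
    for t in np.linspace(0.05, 0.95, n_thresholds):
        for bmap in (gray > t, gray <= t):            # a boolean map and its complement
            labels, _ = ndimage.label(bmap)
            border = np.unique(np.concatenate([
                labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]))
            surrounded = bmap & ~np.isin(labels, border)  # drop border-touching regions
            attention += surrounded
    return attention / attention.max() if attention.max() > 0 else attention
```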