84 research outputs found

    Highly efficient low-level feature extraction for video representation and retrieval.

    PhD thesis. Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and on robust, scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall in the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.

    Motion recognition using nonparametric image motion models estimated from temporal and multiscale co-occurrence statistics


    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling

    Development of software for automatic synchronization between tonalities and colours in audiovisual music therapy

    This bachelor's thesis project automates the synchronization of music and colour for use in music therapy. The underlying idea is to use colour as a new dimension for visually interpreting the complex variations in music, and this project contributes by improving efficiency through automation. Studies have shown that the tensions in a musical piece are related to the emotion or mood the piece conveys [1], just as different colours have been shown to induce particular moods or emotions in people [2]. A doctoral student at the Technical University of Eindhoven has used these conclusions to support the proposal of making emotion the common variable between music and colour [3][4] for music therapy purposes. This project is the technological part of that research. Developing it requires a thorough understanding of the doctoral student's work, in order to propose software that meets its demands. To begin, this report studies Music Information Retrieval (MIR) research, the MIDI file format, and the relation between them. This is necessary to develop a program capable of reading a MIDI file and converting it into a list of notes with their corresponding timings. Next, high-level music features such as chord and tonality changes are extracted, using music theory knowledge to reinforce MIR methods. Once the MIDI file has been interpreted as a list of chords, the chords are expressed as musical intervals, i.e. the relative distances between them. This step precedes an automatic mapping of musical intervals to colours, which follows published findings and conclusions [3][4] to match the two disciplines coherently. The music-to-colour mapping yields a list of colours associated with timings. Finally, to visualize the outcome of the previous steps, coloured lights change according to the colour list while the music plays in synchrony. This last stage is controlled by software that connects to the lamps via Wi-Fi and executes the colour-change commands. To conclude the project, external software used in it is evaluated with a method based on speaker diarization. Lastly, conclusions regarding the project's expectations are drawn, and ideas for future work and improvement are suggested.
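
The chord-to-interval and interval-to-colour steps described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's software: the pitch-class table is standard music theory, but the `INTERVAL_COLOURS` palette, the function names, and the sample chord list are hypothetical. The actual mapping follows the findings cited as [3][4], which are not reproduced in this listing.

```python
# Pitch classes in semitones above C (standard music theory).
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

# HYPOTHETICAL palette: one arbitrary RGB colour per interval (0-11 semitones).
# The real interval-to-colour table comes from [3][4] and is not shown here.
INTERVAL_COLOURS = {
    i: ((i * 23) % 256, (i * 47) % 256, (i * 89) % 256) for i in range(12)
}


def chords_to_intervals(chords):
    """Turn [(time_s, root_name), ...] into [(time_s, interval), ...].

    The interval is the upward distance in semitones (mod 12) from the
    previous chord root; the first chord gets interval 0.
    """
    intervals = []
    previous = None
    for time_s, root in chords:
        current = PITCH_CLASS[root]
        interval = 0 if previous is None else (current - previous) % 12
        intervals.append((time_s, interval))
        previous = current
    return intervals


def intervals_to_colours(intervals):
    """Attach a colour from the (hypothetical) palette to each timing."""
    return [(time_s, INTERVAL_COLOURS[i]) for time_s, i in intervals]


# Made-up chord roots with their start times in seconds.
chords = [(0.0, "C"), (2.0, "F"), (4.0, "G"), (6.0, "C")]
timeline = intervals_to_colours(chords_to_intervals(chords))
for time_s, rgb in timeline:
    print(f"{time_s:4.1f} s -> RGB {rgb}")
```

The resulting timeline of (time, colour) pairs is the kind of list a lamp controller could step through while the music plays; the lamp connection itself is left out because the abstract does not specify it.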

    New insights into hierarchical clustering and linguistic normalization for speaker diarization

    The ever-expanding volume of available audio and multimedia data has elevated technologies related to content indexing and structuring to the forefront of research. Speaker diarization, commonly referred to as the 'who spoke when?' task, is one such example and has emerged as a prominent, core enabling technology in the wider speech processing research community. Speaker diarization involves the detection of speaker turns within an audio document (segmentation) and the grouping together of all same-speaker segments (clustering). Much progress has been made in the field over recent years, partly spearheaded by the NIST Rich Transcription evaluations, which focus on the meeting domain and in whose proceedings two general approaches are found: top-down and bottom-up. Even though the best-performing systems over recent years have all been bottom-up approaches, we show in this thesis that the top-down approach is not without significant merit. Indeed, we first introduce a new purification component that makes the top-down approach competitive with the bottom-up approach. Moreover, while investigating the two diarization approaches more thoroughly, we show that they behave differently in discriminating between individual speakers and in normalizing unwanted acoustic variation, i.e. that which does not pertain to different speakers. This difference in behaviour leads to a new top-down/bottom-up system combination that outperforms the respective baseline systems. Finally, we introduce a new technology able to limit the influence of linguistic effects, which are responsible for biasing the convergence of the diarization system. Our novel approach is referred to as Phone Adaptive Training (PAT), by analogy with Speaker Adaptive Training. PARIS-Télécom ParisTech (751132302) / Sudoc, France.
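
To make the bottom-up idea in this abstract concrete, here is a toy sketch of the clustering stage. It assumes segmentation has already been done and that each segment is summarised by a single number, whereas real systems use acoustic models. The function name, the 1-D feature, and the `stop_distance` threshold are hypothetical and are not taken from the thesis.

```python
def bottom_up_diarization(segments, stop_distance):
    """Agglomerative (bottom-up) clustering of speech segments.

    segments: list of (start_s, end_s, feature), where `feature` is a toy
    1-D stand-in for a speaker representation (an assumption of this sketch).
    Starts with one cluster per segment and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `stop_distance`. Returns one sorted list of (start_s, end_s) per cluster.
    """
    clusters = [[seg] for seg in segments]

    def centroid(cluster):
        return sum(feature for _, _, feature in cluster) / len(cluster)

    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(centroid(clusters[i]) - centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > stop_distance:
            break  # remaining clusters are treated as different speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return [sorted((start, end) for start, end, _ in c) for c in clusters]


# Made-up segments: two "speakers" with features near 1.0 and near 4.0.
segments = [(0.0, 2.0, 1.0), (2.0, 5.0, 4.1), (5.0, 6.5, 1.2), (6.5, 9.0, 3.9)]
speakers = bottom_up_diarization(segments, stop_distance=1.0)
for index, turns in enumerate(speakers):
    print(f"speaker {index}: {turns}")
```

A top-down system would go the other way, starting from a single cluster for the whole recording and splitting it as new speakers are introduced.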

    Diverse Contributions to Implicit Human-Computer Interaction

    When people interact with computers, a great deal of information is provided unintentionally. By studying these implicit interactions, it is possible to understand which features of the user interface are beneficial (or not), and thus to derive implications for the design of future interactive systems. The main advantage of exploiting implicit user data in computer applications is that any interaction with the system can contribute to improving its usefulness. Moreover, such data remove the cost of having to interrupt the user to explicitly submit information on a topic that, in principle, need not be related to the intention of using the system. On the other hand, implicit interactions sometimes do not provide clear and concrete data. Special attention must therefore be paid to how this source of information is handled. The purpose of this research is twofold: 1) to apply a new perspective to both the design and the development of applications that can react accordingly to implicit user interactions, and 2) to provide a set of methodologies for evaluating such interactive systems. Five scenarios illustrate the feasibility and suitability of the thesis framework. Empirical results with real users show that exploiting implicit interaction is both an adequate and a convenient means of improving interactive systems in multiple ways. Leiva Torres, LA. (2012). Diverse Contributions to Implicit Human-Computer Interaction [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17803

    Estimation de cartes d'énergie du bruit apériodique de la marche humaine avec une caméra de profondeur pour la détection de pathologies et modèles légers de détection d'objets saillants basés sur l'opposition de couleurs

    The purpose of this thesis is to study three problems: the estimation of saliency maps of the aperiodic noise energy of human gait using depth perception for pathology detection; salient object detection models in general; and lightweight models based on color opposition in particular. As our first contribution, we propose a system based on a depth camera and a treadmill, which analyzes the parts of the patient's body that move irregularly, in terms of periodicity, during walking. We assume that the gait of a healthy subject presents, anywhere on the body and throughout the gait cycles, a depth signal with a periodic pattern without noise. The presence of noise and its magnitude can be used to indicate the presence and extent of the subject's pathologies. From each video sequence, our system estimates a color saliency map showing the areas of strong gait irregularity, in terms of periodicity, called aperiodic noise energy, for each subject. Our system also makes it possible to automatically distinguish the saliency maps of healthy subjects from those of sick subjects. We then present two approaches to salient object detection. Although it has been the subject of many research works, salient object detection remains a challenge. Most models treat color and texture separately and therefore implicitly, and erroneously, consider them independent features. As a second contribution, we propose a new strategy through a simple model, almost without internal parameters, that generates a robust saliency map for a natural image. This strategy consists in integrating color into texture patterns to characterize a colored micro-texture, using the local ternary pattern (LTP), a simple but powerful texture descriptor, applied to color pairs. The dissimilarity between each pair of colored micro-textures is computed by taking into account the non-linearity of the colored micro-textures while preserving their distances. This gives an intermediate saliency map for each color space. The final saliency map combines these intermediate maps for robustness. The development of deep neural networks has recently enabled high performance. However, it remains a challenge to develop models with the same performance for devices with limited resources. As a third contribution, we propose a new approach for a lightweight deep neural network model for salient object detection, inspired by the double-opponent process in the primary visual cortex, which inextricably links color and shape in human color perception. Our proposed model, CoSOV1Net, is trained from scratch, without using backbones from image classification or other tasks. Experiments on the most widely used and challenging datasets for salient object detection show that CoSOV1Net achieves performance competitive with state-of-the-art models while remaining lightweight, so it can be adapted to mobile environments and resource-constrained devices.

    Contexts and Contributions: Building the Distributed Library

    This report updates and expands on A Survey of Digital Library Aggregation Services, originally commissioned by the DLF as an internal report in summer 2003, and released to the public later that year. It highlights major developments affecting the ecosystem of scholarly communications and digital libraries since the last survey and provides an analysis of OAI implementation demographics, based on a comparative review of repository registries and cross-archive search services. Secondly, it reviews the state-of-practice for a cohort of digital library aggregation services, grouping them in the context of the problem space to which they most closely adhere. Based in part on responses collected in fall 2005 from an online survey distributed to the original core services, the report investigates the purpose, function and challenges of next-generation aggregation services. On a case-by-case basis, the advances in each service are of interest in isolation from each other, but the report also attempts to situate these services in a larger context and to understand how they fit into a multi-dimensional and interdependent ecosystem supporting the worldwide community of scholars. Finally, the report summarizes the contributions of these services thus far and identifies obstacles requiring further attention to realize the goal of an open, distributed digital library system

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Day-to-day operations management in hospital units is difficult due to continuously varying situations, several actors involved and a vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process. Peer reviewed.

    Consensus-Based Data Management within Fog Computing For the Internet of Things

    The Internet of Things (IoT) infrastructure forms a gigantic network of interconnected and interacting devices. This infrastructure involves a new generation of service delivery models, more advanced data management and policy schemes, sophisticated data analytics tools, and effective decision making applications. IoT technology brings automation to a new level wherein nodes can communicate and make autonomous decisions without human intervention. IoT-enabled solutions generate and process enormous volumes of heterogeneous data exchanged among billions of nodes. This results in Big Data congestion, data management and storage issues, and various inefficiencies. Fog Computing aims at solving the issues with data management as it includes intelligent computational components and storage closer to the data sources. Often, an IoT-enabled infrastructure is shared among many users with various requirements. Sharing resources, sharing operational costs and collective decision making (consensus) among many stakeholders are frequently neglected. This research addresses an essential requirement for adaptive, autonomous and consensus-based Fog computational solutions which are able to support distributed and in-network schemes and policies. These network schemes and policies need to meet the requirements of many users. In this work, innovative consensus-based computational solutions are investigated. These proposed solutions aim to correlate and organise data for effective management and decision making in Fog. Instead of individual decision making, the algorithms aim to aggregate several decisions into a consensus decision representing a collective agreement, benefiting from the individuals' varied knowledge and meeting multiple stakeholders' requirements. In order to validate the proposed solutions, a hybrid research methodology is used, which includes the design of a test-bed and the execution of several experiments.
In order to investigate the effectiveness of the paradigm, three experiments were designed and validated. Real-life sensor data and synthetic statistical data were collected, processed and analysed. Bayesian Machine Learning models and Analytics were used to consolidate the design and evaluate the performance of the algorithms. In the Fog environment, the first scenario tests the Aggregation by Distribution algorithm. The solution contributes to achieving a notable efficiency of data delivery with a minimal loss in precision. The second scenario validates the merits of the approach in predicting the activities of high mobility IoT applications. The third scenario tests the applications related to smart home IoT. All proposed Consensus algorithms use statistical analysis to support effective decision making in Fog and enable data aggregation for optimal storage, data transmission, processing and analytics. The final results of all experiments showed that all the implemented consensus approaches surpass the individual ones in different performance terms. Formal results also showed that the paradigm is a good fit in many IoT environments and can be suitable for different scenarios when applying data analysis to correlate data with the design. Finally, the design demonstrates that Fog Computing can compete with Cloud Computing in terms of accuracy with the added benefit of locality.
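
The idea of aggregating several individual decisions into one consensus decision can be illustrated with a very small sketch. It is not the thesis's Aggregation by Distribution algorithm, whose details are not given in this abstract. The function names, the sample values, and the choice of majority vote and median as the aggregation rules are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def consensus_vote(decisions):
    """Majority vote over categorical decisions proposed by individual nodes.

    Ties go to the decision seen first, because Counter.most_common keeps
    first-encountered order for elements with equal counts.
    """
    return Counter(decisions).most_common(1)[0][0]


def consensus_reading(readings):
    """Median of numeric readings: one value forwarded instead of many.

    The median is used here (an assumption of this sketch) because a single
    faulty sensor cannot drag it far from the other readings.
    """
    return median(readings)


# Made-up inputs from five nodes attached to the same fog node.
votes = ["open_window", "start_fan", "start_fan", "do_nothing", "start_fan"]
temperatures_c = [21.4, 21.9, 21.6, 35.0, 21.7]  # 35.0 mimics a faulty sensor

print(consensus_vote(votes))
print(consensus_reading(temperatures_c))
```

Forwarding one aggregated value per fog node instead of every raw reading is one simple way that aggregation near the data source can cut transmission and storage, at the cost of some precision.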