Optimized mobile thin clients through a MPEG-4 BiFS semantic remote display framework
According to the thin-client computing principle, the user interface is physically separated from the application logic. In practice, only a viewer component is executed on the client device, rendering the display updates received from the distant application server and capturing the user interaction. Existing remote display frameworks are not optimized to encode the complex scenes of modern applications, which are composed of objects with very diverse graphical characteristics. To tackle this challenge, we propose to transfer to the client, in addition to the binary-encoded objects, semantic information about the characteristics of each object. Through this semantic knowledge, the client can react autonomously to user input and does not have to wait for a display update from the server. Because it reduces interaction latency and mitigates the bursty remote display traffic pattern, the presented framework is of particular interest in a wireless context, where bandwidth is limited and expensive. In this paper, we describe a generic architecture for a semantic remote display framework. Furthermore, we have developed a prototype that uses the MPEG-4 Binary Format for Scenes (BiFS) to convey the semantic information to the client. We experimentally compare the bandwidth consumption of MPEG-4 BiFS with existing, non-semantic remote display frameworks. In a text-editing scenario, we achieve an average reduction of 23% in the data peaks observed in remote display protocol traffic.
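The idea of letting the client act on semantic knowledge instead of waiting for a server round-trip can be sketched as follows. This is a minimal illustration, not the paper's actual protocol; the object types, class names, and return values are all assumptions.

```python
# Hypothetical sketch of a semantic remote display client: the server
# sends each scene object with a semantic tag, and the client handles
# input locally for objects it understands, falling back to a server
# round-trip otherwise. All names are illustrative.

class SceneObject:
    def __init__(self, object_id, semantic_type, state=""):
        self.object_id = object_id
        self.semantic_type = semantic_type  # e.g. "text-field", "bitmap"
        self.state = state

class SemanticClient:
    def __init__(self, objects):
        self.objects = {o.object_id: o for o in objects}
        self.pending_server_updates = 0

    def on_key_press(self, object_id, char):
        obj = self.objects[object_id]
        if obj.semantic_type == "text-field":
            # Semantic knowledge: echo the character locally, no need
            # to wait for a display update from the server.
            obj.state += char
            return "handled-locally"
        # Unknown semantics: request a display update from the server.
        self.pending_server_updates += 1
        return "sent-to-server"

client = SemanticClient([SceneObject("editor", "text-field"),
                         SceneObject("logo", "bitmap")])
print(client.on_key_press("editor", "a"))  # handled-locally
print(client.on_key_press("logo", "x"))    # sent-to-server
```

Local handling of the common case (typing into a text field) is what flattens the traffic peaks the paper measures.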
Agent AI: Surveying the Horizons of Multimodal Interaction
Multi-modal AI systems will likely become a ubiquitous presence in our
everyday lives. A promising approach to making these systems more interactive
is to embody them as agents within physical and virtual environments. At
present, systems leverage existing foundation models as the basic building
blocks for the creation of embodied agents. Embedding agents within such
environments facilitates the ability of models to process and interpret visual
and contextual data, which is critical for the creation of more sophisticated
and context-aware AI systems. For example, a system that can perceive user
actions, human behavior, environmental objects, audio expressions, and the
collective sentiment of a scene can be used to inform and direct agent
responses within the given environment. To accelerate research on agent-based
multimodal intelligence, we define "Agent AI" as a class of interactive systems
that can perceive visual stimuli, language inputs, and other
environmentally-grounded data, and can produce meaningful embodied actions. In
particular, we explore systems that aim to improve agents based on
next-embodied action prediction by incorporating external knowledge,
multi-sensory inputs, and human feedback. We argue that by developing agentic
AI systems in grounded environments, one can also mitigate the hallucinations
of large foundation models and their tendency to generate environmentally
incorrect outputs. The emerging field of Agent AI subsumes the broader embodied
and agentic aspects of multimodal interactions. Beyond agents acting and
interacting in the physical world, we envision a future where people can easily
create any virtual reality or simulated scene and interact with agents embodied
within the virtual environment.
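The perceive-then-act loop that defines "Agent AI" above can be sketched in a few lines. The policy here is a trivial stand-in for a foundation model, and the grounding check against external knowledge is illustrative; all names are assumptions.

```python
# Minimal sketch of the Agent AI loop: multimodal observations in,
# embodied action out, with external knowledge used to avoid
# "environmentally incorrect" actions. All logic is a stand-in.

from dataclasses import dataclass

@dataclass
class Observation:
    visual: str      # e.g. a caption of the current frame
    language: str    # user instruction
    audio: str = ""  # optional audio transcript

def predict_next_action(obs, knowledge):
    # Stand-in for next-embodied-action prediction: ground the
    # instruction's target against known environment entities.
    target = obs.language.split()[-1]
    if target in knowledge:
        return f"move_to({target})"
    return "ask_for_clarification()"

knowledge = {"kitchen", "desk"}
print(predict_next_action(Observation("a hallway", "go to the kitchen"),
                          knowledge))  # move_to(kitchen)
```

Rejecting targets absent from the knowledge set is a toy version of the hallucination mitigation the survey argues grounded environments enable.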
Computational analysis of nonverbal communication aspects in group settings (Análise computacional de aspetos de comunicação não verbal em contextos de grupo)
Human communication is a major field of study in psychology and the social sciences.
Topics such as emergent leadership and group dynamics are commonly studied in
group settings, and experiments in this area are usually analyzed in conversational
and collaborative-task environments in order to study the communication process
in small groups.
Early studies relied on human analysis and manual annotation of participants'
behavior in communication settings; later studies replace these time-consuming
and error-prone annotations with computational methods.
Starting from a custom, newly gathered audiovisual dataset from an experiment
conducted by the Department of Education and Psychology of the University of Aveiro,
a multidisciplinary group from the same institution, with members from psychology
and engineering backgrounds, took the initiative to create computational methods
to facilitate the analysis of the collected data.
For that purpose, this work presents a multimodal computational framework using
state-of-the-art computer vision methods, capable of enriching image data with
annotations of a broad range of nonverbal communication aspects at both the
individual and group level, thus facilitating the study of nonverbal communication
and group dynamics.
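The enrichment of per-frame data with individual- and group-level annotations can be sketched as a small aggregation step. The feature names (speaking activity, gaze targets) are common nonverbal cues in this literature but are assumptions here, not the thesis's actual outputs.

```python
# Hypothetical sketch: per-person nonverbal features for one frame are
# aggregated into group-level descriptors. Feature names are illustrative.

def enrich_frame(per_person):
    """per_person: dict person_id -> {'speaking': bool, 'gaze_at': id|None}"""
    group = {
        "n_speaking": sum(p["speaking"] for p in per_person.values()),
        # Visual attention received by each person, a common
        # emergent-leadership cue in the literature.
        "gaze_received": {pid: 0 for pid in per_person},
    }
    for p in per_person.values():
        if p["gaze_at"] in group["gaze_received"]:
            group["gaze_received"][p["gaze_at"]] += 1
    return {"individual": per_person, "group": group}

frame = {
    "A": {"speaking": True,  "gaze_at": None},
    "B": {"speaking": False, "gaze_at": "A"},
    "C": {"speaking": False, "gaze_at": "A"},
}
print(enrich_frame(frame)["group"])
```

Running this on the toy frame shows person A speaking while receiving the group's gaze, the kind of humanly interpretable meaning the thesis aims to surface from raw features.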
This work contributes to the community by presenting methods that directly increase
knowledge about the human communication process, including data transformation
processes that turn raw feature data into humanly interpretable meanings, as well as
a visualization tool capable of displaying the results of those methods applied to
the input data.
Mestrado em Engenharia Informática (Master's in Informatics Engineering)
Rapid Prototyping for Virtual Environments
Development of Virtual Environment (VE) applications is challenging: application developers are required to have expertise in the target VE technologies along with problem-domain expertise, and new VE technologies impose a significant learning curve on even the most experienced VE developer. The proposed solution relies on synthesis to automate the migration of a VE application to a new, unfamiliar VE platform or technology. To solve the problem, the Common Scene Definition Framework (CSDF) is developed, which serves as a superset/model representation of the target virtual world. Input modules are developed to populate the framework with the capabilities of the virtual world imported from the VRML 2.0 and X3D formats. A synthesis capability is built into the framework to synthesize the virtual world into a subset of the VRML 2.0, VRML 1.0, X3D, Java3D, JavaFX, JavaME, and OpenGL technologies, which may reside on different platforms. Interfaces are designed to keep the framework extensible to different and new VE formats/technologies. The framework demonstrated the ability to quickly synthesize a working prototype of the input virtual environment in different VE formats.
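The extensibility idea behind CSDF — one common scene model, one synthesizer per target technology — can be sketched with a small interface. The class and method names are assumptions for illustration, not CSDF's real API, and the X3D output is deliberately simplified.

```python
# Illustrative sketch: a common scene model plus pluggable per-technology
# synthesizers. Names are hypothetical, not CSDF's actual interfaces.

class SceneNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

class Synthesizer:
    """One subclass per target VE technology keeps the framework extensible."""
    def synthesize(self, node):
        raise NotImplementedError

class X3DSynthesizer(Synthesizer):
    def synthesize(self, node):
        # Recursively emit the scene graph as (simplified) X3D markup.
        inner = "".join(self.synthesize(c) for c in node.children)
        return f"<{node.name}>{inner}</{node.name}>"

scene = SceneNode("Scene", [SceneNode("Shape"), SceneNode("Viewpoint")])
print(X3DSynthesizer().synthesize(scene))
# <Scene><Shape></Shape><Viewpoint></Viewpoint></Scene>
```

Adding a new target (say, Java3D) then means adding one subclass, without touching the importers or the scene model.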
Test automation in DesignWare HDMI RX
The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency. HDMI (High-Definition Multimedia Interface) is a leading audio/video interface technology for transmitting uncompressed digital multimedia data. The goal of this project is to automate the test procedure for HDMI RX using developed or customized software tools (Jenkins, Python scripts, etc.), with the aim of reducing engineering effort. The focus is not only on reducing testing time but also on maintaining an effective record of the final design parameters and ensuring exact result reproducibility. The testing of hardware components involves proceeding through several steps based on design parameters, evaluating results, and storing records to a file.
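The step-evaluate-record flow described above can be sketched as a small driver. The parameter names (`pixel_clock`, `hdcp_version`) and the JSON record format are invented for illustration; the real flow is driven by Jenkins jobs and the project's own scripts.

```python
# Hedged sketch of the automation flow: run each test step, compare
# against its design parameter, persist the record for reproducibility.
# Parameter names and record format are assumptions.

import json, os, tempfile

def run_tests(steps, params):
    record = []
    for name, measure in steps:
        value = measure()
        record.append({"step": name,
                       "value": value,
                       "pass": value == params[name]})
    return record

def store_record(record, path):
    with open(path, "w") as f:
        json.dump(record, f, indent=2)  # stable, diffable on-disk record

steps = [("pixel_clock", lambda: 148.5), ("hdcp_version", lambda: 2.2)]
params = {"pixel_clock": 148.5, "hdcp_version": 2.2}
record = run_tests(steps, params)
path = os.path.join(tempfile.gettempdir(), "hdmi_rx_record.json")
store_record(record, path)
print(all(r["pass"] for r in record))  # True
```

Writing the record to a file after every run is what makes results exactly reproducible and comparable across builds.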
Fast human behavior analysis for scene understanding
Human behavior analysis has become an active topic of great interest and relevance for a number of applications and areas of research. The research in recent years has been considerably driven by the growing level of criminal behavior in large urban areas and increase of terroristic actions. Also, accurate behavior studies have been applied to sports analysis systems and are emerging in healthcare. When compared to conventional action recognition used in security applications, human behavior analysis techniques designed for embedded applications should satisfy the following technical requirements: (1) Behavior analysis should provide scalable and robust results; (2) High-processing efficiency to achieve (near) real-time operation with low-cost hardware; (3) Extensibility for multiple-camera setup including 3-D modeling to facilitate human behavior understanding and description in various events. The key to our problem statement is that we intend to improve behavior analysis performance while preserving the efficiency of the designed techniques, to allow implementation in embedded environments. More specifically, we look into (1) fast multi-level algorithms incorporating specific domain knowledge, and (2) 3-D configuration techniques for overall enhanced performance. If possible, we explore the performance of the current behavior-analysis techniques for improving accuracy and scalability. To fulfill the above technical requirements and tackle the research problems, we propose a flexible behavior-analysis framework consisting of three processing-layers: (1) pixel-based processing (background modeling with pixel labeling), (2) object-based modeling (human detection, tracking and posture analysis), and (3) event-based analysis (semantic event understanding). In Chapter 3, we specifically contribute to the analysis of individual human behavior. A novel body representation is proposed for posture classification based on a silhouette feature. 
Only pure binary-shape information is used for posture classification without texture/color or any explicit body models. To this end, we have studied an efficient HV-PCA shape-based descriptor with temporal modeling, which achieves a posture-recognition accuracy rate of about 86% and outperforms other existing proposals. As our human motion scheme is efficient and achieves a fast performance (6-8 frames/second), it enables a fast surveillance system or further analysis of human behavior. In addition, a body-part detection approach is presented. The color and body ratio are combined to provide clues for human body detection and classification. The conventional assumption of up-right body posture is not required. Afterwards, we design and construct a specific framework for fast algorithms and apply them in two applications: tennis sports analysis and surveillance. Chapter 4 deals with tennis sports analysis and presents an automatic real-time system for multi-level analysis of tennis video sequences. First, we employ a 3-D camera model to bridge the pixel-level, object-level and scene-level of tennis sports analysis. Second, a weighted linear model combining the visual cues in the real-world domain is proposed to identify various events. The experimentally found event extraction rate of the system is about 90%. Also, audio signals are combined to enhance the scene analysis performance. The complete proposed application is efficient enough to obtain a real-time or near real-time performance (2-3 frames/second for 720×576 resolution, and 5-7 frames/second for 320×240 resolution, with a P-IV PC running at 3GHz). Chapter 5 addresses surveillance and presents a full real-time behavior-analysis framework, featuring layers at pixel, object, event and visualization level. More specifically, this framework captures the human motion, classifies its posture, infers the semantic event exploiting interaction modeling, and performs the 3-D scene reconstruction. 
We have introduced our system design based on a specific software architecture, by employing the well-known "4+1" view model. In addition, human behavior analysis algorithms are directly designed for real-time operation and embedded in an experimental runtime AV content-analysis architecture. This executable system is designed to be generic for multiple streaming applications with component-based architectures. To evaluate the performance, we have applied this networked system in a single-camera setup. The experimental platform operates with two Pentium Quadcore engines (2.33 GHz) and 4-GB memory. Performance evaluations have shown that this networked framework is efficient and achieves a fast performance (13-15 frames/second) for monocular video sequences. Moreover, a dual-camera setup is tested within the behavior-analysis framework. After automatic camera calibration is conducted, the 3-D reconstruction and communication among different cameras are achieved. The extra view in the multi-camera setup improves the human tracking and event detection in case of occlusion. This extension of multiple-view fusion improves the event-based semantic analysis by 8.3-16.7% in accuracy rate. The detailed studies of two experimental intelligent applications, i.e., tennis sports analysis and surveillance, have proven their value in several extensive tests in the framework of the European Candela and Cantata ITEA research programs, where our proposed system has demonstrated competitive performance with respect to accuracy and efficiency
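The three processing layers described above (pixel-based, object-based, event-based) compose into a pipeline. The sketch below shows only the layering; each stage is a trivial stub standing in for the real algorithms (background modeling, posture classification, interaction modeling).

```python
# Minimal sketch of the pixel -> object -> event layering. All stage
# logic is an illustrative stub, not the thesis's actual algorithms.

def pixel_layer(frame):
    # Background modeling with pixel labeling (stub: intensity threshold).
    return [px > 128 for px in frame]

def object_layer(mask):
    # Human detection / posture analysis (stub: count foreground pixels).
    n = sum(mask)
    return {"person_detected": n > 0, "size": n}

def event_layer(obj):
    # Semantic event understanding (stub).
    return "person-present" if obj["person_detected"] else "empty-scene"

frame = [0, 200, 220, 10]
event = event_layer(object_layer(pixel_layer(frame)))
print(event)  # person-present
```

Keeping the layers as separate stages is what lets the framework swap a single-camera object layer for the dual-camera, 3-D variant without touching the event analysis.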
Real-time video content analysis tool for consumer media storage system
Current loop buffer organizations for very long instruction word (VLIW) processors are essentially centralized. As a consequence, they are energy inefficient and their scalability is limited. To alleviate this problem, we propose a clustered loop buffer organization, where the loop buffers are partitioned and functional units are logically grouped to form clusters, along with two schemes for buffer control, which regulate the activity in each cluster. Furthermore, we propose a design-time scheme to generate clusters by analyzing an application profile and grouping closely related functional units. The simulation results indicate that the energy consumed in the clustered loop buffers is, on average, 63 percent lower than in an uncompressed centralized loop buffer scheme, 35 percent lower than in a centralized compressed loop buffer scheme, and 22 percent lower than in a randomly clustered loop buffer scheme.
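The design-time clustering step can be illustrated with a toy heuristic: group functional units (FUs) whose activity profiles from the application trace are most correlated, so a cluster's loop buffer is only active when its FUs are. The profile data and the greedy pairing rule are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch of profile-driven FU clustering. The co-activity
# metric and greedy seeding are illustrative stand-ins.

def co_activity(a, b):
    # Fraction of cycles in which both FUs are active together.
    return sum(x and y for x, y in zip(a, b)) / len(a)

def cluster_fus(profiles, n_clusters):
    # Greedy: seed clusters with the first FUs, then attach each
    # remaining FU to the seed it co-activates with most.
    names = list(profiles)
    seeds = names[:n_clusters]
    clusters = {s: [s] for s in seeds}
    for fu in names[n_clusters:]:
        best = max(seeds, key=lambda s: co_activity(profiles[s], profiles[fu]))
        clusters[best].append(fu)
    return clusters

profiles = {                 # 1 = FU active in that cycle of the trace
    "alu0": [1, 1, 0, 0],
    "mul0": [0, 0, 1, 1],
    "alu1": [1, 1, 0, 0],    # tracks alu0 -> same cluster
    "ldst": [0, 0, 1, 1],    # tracks mul0 -> same cluster
}
print(cluster_fus(profiles, 2))
```

Grouping FUs that fire together is what lets an idle cluster's buffer be shut off, which is the source of the energy savings the paper quantifies.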