2,748 research outputs found
Efficient Semantic Segmentation on Edge Devices
Semantic segmentation is the computer vision task of assigning
each pixel of an image to a class. The task should
be performed with both accuracy and efficiency. Most existing deep FCNs
require heavy computation and are very power hungry, which makes them
unsuitable for real-time applications on portable devices. This project
analyzes current semantic segmentation models to explore the feasibility of
applying these models for emergency response during catastrophic events. We
compare the performance of real-time semantic segmentation models with
non-real-time counterparts constrained by aerial images under oppositional
settings. Furthermore, we train several models on the Flood-Net dataset,
containing UAV images captured after Hurricane Harvey, and benchmark their
execution on special classes such as flooded buildings vs. non-flooded
buildings or flooded roads vs. non-flooded roads. In this project, we developed
a real-time UNet-based model and deployed that network on a Jetson AGX Xavier
module.
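As a minimal sketch of how results on classes such as flooded vs. non-flooded buildings might be compared, the snippet below computes per-class intersection-over-union (IoU) from a predicted and a ground-truth label map. The abstract does not say which metric the authors used; IoU, the class ids, and the tiny 2x3 label maps are illustrative assumptions.

```python
from collections import Counter


def per_class_iou(pred, truth, num_classes):
    """Return one IoU per class id (None if the class appears in neither map)."""
    inter = Counter()        # pixels where prediction and truth agree, per class
    pred_count = Counter()   # pixels predicted as each class
    truth_count = Counter()  # pixels labelled as each class
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            pred_count[p] += 1
            truth_count[t] += 1
            if p == t:
                inter[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + truth_count[c] - inter[c]
        ious.append(inter[c] / union if union else None)
    return ious


# Hypothetical class ids: 0 = background, 1 = flooded building,
# 2 = non-flooded building.
truth = [[0, 1, 1],
         [2, 2, 0]]
pred = [[0, 1, 2],
        [2, 2, 0]]
ious = per_class_iou(pred, truth, num_classes=3)
print(ious)
```

In this toy case one flooded-building pixel is mistaken for a non-flooded building, which lowers the IoU of both classes while leaving the background score untouched.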
Multi-sensor human action recognition with particular application to tennis event-based indexing
The ability to automatically classify human actions and activities using visual sensors or by analysing body worn sensor data has been an active research area for many years. Only recently with advancements in both fields and the ubiquitous nature of low cost sensors in our everyday lives has automatic human action recognition become a reality. While traditional sports coaching systems rely on manual indexing of events from a single modality, such as visual or inertial sensors, this thesis investigates the possibility of capturing and automatically indexing events from multimodal sensor streams. In this work, we detail a novel approach to infer human actions by fusing multimodal sensors to improve recognition accuracy. State of the art visual action recognition approaches are also investigated. Firstly we apply these action recognition detectors to basic human actions in a non-sporting context. We then perform action recognition to infer tennis events in a tennis court instrumented with cameras and inertial sensing infrastructure. The system proposed in this thesis can use either visual or inertial sensors to automatically recognise the main tennis events during play. A complete event retrieval system is also presented to allow coaches to build advanced queries, which existing sports coaching solutions cannot facilitate, without an inordinate amount of manual indexing. The event retrieval interface is evaluated against a leading commercial sports coaching tool in terms of both usability and efficiency.
Unifying terrain awareness for the visually impaired through real-time semantic segmentation.
Navigational assistance aims to help visually-impaired people to move through the environment safely and independently. This topic is challenging because it requires detecting a wide variety of scenes to provide higher level assistive awareness. Vision-based technologies with monocular detectors or depth sensors have sprung up within several years of research. These separate approaches have achieved remarkable results with relatively low processing time and have improved the mobility of impaired people to a large extent. However, running all detectors jointly increases the latency and burdens the computational resources. In this paper, we propose using pixel-wise semantic segmentation to cover navigation-related perception needs in a unified way. This is critical not only for terrain awareness regarding traversable areas, sidewalks, stairs and water hazards, but also for the avoidance of short-range obstacles, fast-approaching pedestrians and vehicles. The core of our unification proposal is a deep architecture aimed at attaining efficient semantic understanding. We have integrated the approach in a wearable navigation system by incorporating robust depth segmentation. A comprehensive set of experiments demonstrates competitive accuracy relative to state-of-the-art methods while maintaining real-time speed. We also present a closed-loop field test involving real visually-impaired users, demonstrating the effectiveness and versatility of the assistive framework.
A review on intelligent monitoring and activity interpretation
This survey paper provides a tour of the various monitoring and activity interpretation frameworks found in the literature. The needs of monitoring and interpretation systems are presented in relation to the area where they have been developed or applied. Their evolution is studied to better understand the characteristics of current systems. After this, the main features of monitoring and activity interpretation systems are defined. This work was partially supported by the Spanish Ministerio de Economía y Competitividad / FEDER under grant DPI2016-80894-R.
Wireless End-to-End Image Transmission System using Semantic Communications
Semantic communication is considered the future of mobile communication. It
aims to transmit data beyond the limits of Shannon's theorem of communications by
transmitting the semantic meaning of the data rather than reconstructing the
data bit by bit at the receiver's end. The semantic communication
paradigm aims to ease the limited-bandwidth problems of modern
high-volume multimedia content transmission. Integrating AI
technologies with 6G communication networks has paved the way to develop
semantic communication-based end-to-end communication systems. In this study,
we have implemented a semantic communication-based end-to-end image
transmission system, and we discuss potential design considerations in
developing semantic communication systems in conjunction with physical channel
characteristics. A pre-trained GAN is used at the receiver
to reconstruct a realistic image from the semantically
segmented image at the receiver input. The semantic segmentation model at the
transmitter (encoder) and the GAN at the receiver (decoder) are trained
on a common knowledge base, the COCO-Stuff dataset. The research shows that the
resource gain in the form of bandwidth saving is immense when transmitting the
semantic segmentation map through the physical channel instead of the ground
truth image, as conventional communication systems do. Furthermore, the
research studies the effect of physical channel distortions and quantization
noise on semantic communication-based multimedia content transmission. Comment: Accepted for IEEE Access
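To give a rough sense of where the bandwidth saving comes from, the sketch below compares the uncompressed size of an RGB image with that of a segmentation map storing one class id per pixel. The abstract reports no figures; the 256x256 resolution, the 64-class label set, and the fixed-width label encoding are all assumptions for illustration, and the function names are hypothetical.

```python
def raw_image_bits(height, width, channels=3, bits_per_channel=8):
    """Uncompressed size of an image in bits."""
    return height * width * channels * bits_per_channel


def label_map_bits(height, width, num_classes):
    """Uncompressed size of a label map using a fixed-width class id per pixel."""
    # Smallest number of bits that can represent ids 0 .. num_classes - 1.
    bits_per_label = max(1, (num_classes - 1).bit_length())
    return height * width * bits_per_label


# Assumed example values, not taken from the paper.
height, width, num_classes = 256, 256, 64
image_bits = raw_image_bits(height, width)
map_bits = label_map_bits(height, width, num_classes)
ratio = image_bits / map_bits
print(image_bits, map_bits, ratio)
```

This counts raw bits only. Segmentation maps contain large uniform regions and usually compress far better than natural images, so the practical saving could be considerably larger than this fixed-width estimate.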