UMSL Bulletin 2023-2024
The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.
SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion
Semantic scene completion (SSC) jointly predicts the semantics and geometry
of the entire 3D scene, which plays an essential role in 3D scene understanding
for autonomous driving systems. SSC has achieved rapid progress with the help
of semantic context in segmentation. However, how to effectively exploit the
relationships between the semantic context in semantic segmentation and
geometric structure in scene completion remains under exploration. In this
paper, we propose to solve outdoor SSC from the perspective of representation
separation and BEV fusion. Specifically, we present the network, named SSC-RS,
which uses separate branches with deep supervision to explicitly disentangle
the learning procedure of the semantic and geometric representations. A BEV
fusion network equipped with the proposed Adaptive Representation Fusion (ARF)
module is presented to aggregate the multi-scale features effectively and
efficiently. Due to the low computational burden and powerful representation
ability, our model has good generality while running in real-time. Extensive
experiments on SemanticKITTI demonstrate our SSC-RS achieves state-of-the-art
performance.
Comment: 8 pages, 5 figures, IROS202
Focused Decoding Enables 3D Anatomical Detection by Transformers
Detection Transformers represent end-to-end object detection approaches based
on a Transformer encoder-decoder architecture, exploiting the attention
mechanism for global relation modeling. Although Detection Transformers deliver
results on par with or even superior to their highly optimized CNN-based
counterparts operating on 2D natural images, their success is closely coupled
to access to a vast amount of training data. This, however, restricts the
feasibility of employing Detection Transformers in the medical domain, as
access to annotated data is typically limited. To tackle this issue and
facilitate the advent of medical Detection Transformers, we propose a novel
Detection Transformer for 3D anatomical structure detection, dubbed Focused
Decoder. Focused Decoder leverages information from an anatomical region atlas
to simultaneously deploy query anchors and restrict the cross-attention's field
of view to regions of interest, which allows for a precise focus on relevant
anatomical structures. We evaluate our proposed approach on two publicly
available CT datasets and demonstrate that Focused Decoder not only provides
strong detection results and thus alleviates the need for a vast amount of
annotated data but also exhibits exceptional and highly intuitive
explainability of results via attention weights. Our code is available at
https://github.com/bwittmann/transoar.
Comment: Accepted for publication at the Journal of Machine Learning for
Biomedical Imaging (MELBA) https://melba-journal.org/2023:00
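The idea of restricting cross-attention to regions of interest can be
illustrated with a masked softmax. This is a minimal sketch, not the authors'
implementation (which lives in the repository linked above): the function name
`masked_attention_weights`, the toy scores, and the boolean mask are all
hypothetical, and the mask simply stands in for whatever the region atlas
would mark as relevant.

```python
import math


def masked_attention_weights(scores, allowed):
    """Softmax over `scores`, restricted to positions where `allowed` is True."""
    if not any(allowed):
        raise ValueError("at least one position must be allowed")
    # Disallowed positions get a score of -inf, so they receive zero weight.
    masked = [s if a else -math.inf for s, a in zip(scores, allowed)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical query-to-voxel scores; only the first two voxels lie inside
# the (assumed) region of interest for this query.
scores = [2.0, 1.0, 0.5, 3.0]
in_region = [True, True, False, False]
weights = masked_attention_weights(scores, in_region)
print(weights)
```

The highest raw score (3.0) lies outside the region, so it contributes
nothing; the weights are renormalized over the allowed positions only.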
Semantic-aware Transmission for Robust Point Cloud Classification
As three-dimensional (3D) data acquisition devices become increasingly
prevalent, the demand for 3D point cloud transmission is growing. In this
study, we introduce a semantic-aware communication system for robust point
cloud classification that capitalizes on the advantages of pre-trained
Point-BERT models. Our proposed method comprises four main components: the
semantic encoder, channel encoder, channel decoder, and semantic decoder. By
employing a two-stage training strategy, our system facilitates efficient and
adaptable learning tailored to the specific classification tasks. The results
show that the proposed system achieves classification accuracy of over 89%
when SNR is higher than 10 dB and still maintains accuracy above 66.6% even at
SNR of 4 dB. Compared to the existing method, our approach performs 0.8% to
48% better across different SNR values, demonstrating robustness to channel
noise. Our system also achieves a balance between accuracy and speed, being
computationally efficient while maintaining high classification performance
under noisy channel conditions. This adaptable and resilient approach holds
considerable promise for a wide array of 3D scene understanding applications,
effectively addressing the challenges posed by channel noise.
Comment: submitted to globecom 202
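To make the SNR figures concrete, the sketch below shows how a noisy channel
at a given SNR in dB is commonly simulated. It assumes a real-valued additive
white Gaussian noise (AWGN) channel, which the abstract does not state; the
function names and the random "features" are hypothetical placeholders for
the encoder output, not part of the proposed system.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Return `signal` plus white Gaussian noise at the requested SNR (dB)."""
    # Average power of the real-valued input signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Hypothetical encoded features standing in for the channel encoder output.
features = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
for snr_db in (4, 10):
    received = add_awgn(features, snr_db, rng)
    print(snr_db, round(measured_snr_db(features, received), 2))
```

At 4 dB the noise power is roughly 40% of the signal power, versus 10% at
10 dB, which gives a sense of how much harsher the low-SNR setting is.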
IoT-Based Vehicle Monitoring and Driver Assistance System Framework for Safety and Smart Fleet Management
Curbing road accidents has always been a top priority in every country. In Malaysia, the Traffic Investigation and Enforcement Department reported that the country's total number of road accidents increased from 373,071 to 533,875 in the last decade. One significant cause of road accidents is driver behaviour, which is difficult for enforcement teams and fleet operators to regulate, especially for heavy vehicles. We proposed adopting the Internet of Things (IoT) and its emerging technologies to monitor drivers' behaviour and driving patterns and to alert them, with the aim of reducing road accidents. In this work, we proposed a lane tracking algorithm and an iris detection algorithm that alert the driver when the vehicle sways out of its lane and when the driver becomes drowsy, respectively. We installed electronic devices, namely cameras, a global positioning system (GPS) module, a global system communication module, and a microcontroller, as an in-vehicle intelligent transportation system. Using the same in-vehicle camera, we implemented face recognition to identify the driver and recorded the working duration, for authentication and operational health monitoring, respectively. With the GPS module, we monitored the vehicle's speed and issued alerts against the permissible limit. We integrated IoT into the system so that the fleet centre can monitor drivers' behaviour and alert them in real time through the user access portal. We validated the system successfully on Malaysian roads. The outcome of this pilot project benefits the safety of drivers, public road users, and passengers. The impact of this framework extends to new regulation by government agencies towards a merit and demerit system, real-time fleet monitoring of intelligent transportation systems, and socio-economic benefits such as cheaper health premiums. The collected data can also be used to predict drivers' behaviour in the future.
Modular lifelong machine learning
Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand.
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
HEAL-SWIN: A Vision Transformer On The Sphere
High-resolution wide-angle fisheye images are becoming more and more
important for robotics applications such as autonomous driving. However, using
ordinary convolutional neural networks or vision transformers on this data is
problematic due to projection and distortion losses introduced when projecting
to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer,
which combines the highly uniform Hierarchical Equal Area iso-Latitude
Pixelation (HEALPix) grid used in astrophysics and cosmology with the
Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and
flexible model capable of training on high-resolution, distortion-free
spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used
to perform the patching and windowing operations of the SWIN transformer,
resulting in a one-dimensional representation of the spherical data with
minimal computational overhead. We demonstrate the superior performance of our
model for semantic segmentation and depth regression tasks on both synthetic
and real automotive datasets. Our code is available at
https://github.com/JanEGerken/HEAL-SWIN.
Comment: Main body: 10 pages, 7 figures. Appendices: 4 pages, 2 figure
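The role of the nested HEALPix structure in patching can be sketched in a few
lines. The sketch relies on two standard HEALPix properties: a grid with
resolution parameter nside (a power of two) has 12 * nside^2 pixels, and in
NESTED ordering the four children of pixel p are pixels 4p to 4p + 3 at the
next finer resolution. The helper names below are hypothetical and this is
not the HEAL-SWIN code; in particular, window shifting is not covered.

```python
def npix(nside):
    """Total number of HEALPix pixels at resolution parameter `nside`."""
    return 12 * nside * nside


def parent(pixel, levels=1):
    """Index of the coarser pixel containing `pixel` (NESTED ordering)."""
    # Each pixel splits into 4 children per level, so dropping two bits per
    # level recovers the ancestor index.
    return pixel >> (2 * levels)


def to_patches(values, levels=1):
    """Group a NESTED-ordered 1D pixel list into patches of 4**levels pixels."""
    size = 4 ** levels
    if len(values) % size != 0:
        raise ValueError("pixel count must be a multiple of the patch size")
    return [values[i:i + size] for i in range(0, len(values), size)]


nside = 4
pixels = list(range(npix(nside)))  # stand-in for per-pixel features
patches = to_patches(pixels, levels=1)
print(len(pixels), len(patches))
```

Because consecutive runs of 4^k pixels share one ancestor, patching reduces
to slicing a one-dimensional array, with no neighbour lookup on the sphere.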
The State of the Art in Deep Learning Applications, Challenges, and Future Prospects: A Comprehensive Review of Flood Forecasting and Management
Floods are a devastating natural calamity that may seriously harm both infrastructure and people. Accurate flood forecasts and control are essential to lessen these effects and safeguard populations. By utilizing its capacity to handle massive amounts of data and provide accurate forecasts, deep learning has emerged as a potent tool for improving flood prediction and control. The current state of deep learning applications in flood forecasting and management is thoroughly reviewed in this work. The review discusses a variety of subjects, such as the data sources utilized, the deep learning models used, and the assessment measures adopted to judge their efficacy. It assesses current approaches critically and points out their advantages and disadvantages. The article also examines challenges with data accessibility, the interpretability of deep learning models, and ethical considerations in flood prediction. The report also describes potential directions for deep-learning research to enhance flood predictions and control. Incorporating uncertainty estimates into forecasts, integrating many data sources, developing hybrid models that mix deep learning with other methodologies, and enhancing the interpretability of deep learning models are a few of these. These research goals can help deep learning models become more precise and effective, which will result in better flood control plans and forecasts. Overall, this review is a useful resource for academics and professionals working on the topic of flood forecasting and management. By reviewing the current state of the art, emphasizing difficulties, and outlining potential areas for future study, it lays a solid basis. Communities may better prepare for and lessen the destructive effects of floods by implementing cutting-edge deep learning algorithms, thereby protecting people and infrastructure
Autonomy 2.0: The Quest for Economies of Scale
With the advancement of robotics and AI technologies in the past decade, we
have now entered the age of autonomous machines. In this new age of information
technology, autonomous machines, such as service robots, autonomous drones,
delivery robots, and autonomous vehicles, rather than humans, will provide
services. In this article, through examining the technical challenges and
economic impact of the digital economy, we argue that scalability is both
highly necessary from a technical perspective and significantly advantageous
from an economic perspective, and is thus the key for the autonomy industry to
achieve its full potential. Nonetheless, the current development paradigm,
dubbed Autonomy 1.0, scales with the number of engineers, instead of with the
amount of data or compute resources, hence preventing the autonomy industry from
fully benefiting from the economies of scale, especially the exponentially
cheapening compute cost and the explosion of available data. We further analyze
the key scalability blockers and explain how a new development paradigm, dubbed
Autonomy 2.0, can address these problems to greatly boost the autonomy
industry.
Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques
The rapid growth of demanding applications in domains applying multimedia
processing and machine learning has marked a new era for edge and cloud
computing. These applications involve massive data and compute-intensive tasks,
and thus, typical computing paradigms in embedded systems and data centers are
stressed to meet the worldwide demand for high performance. Concurrently, the
landscape of the semiconductor field in the last 15 years has constituted power
as a first-class design concern. As a result, the community of computing
systems is forced to find alternative design approaches to facilitate
high-performance and/or power-efficient computing. Among the examined
solutions, Approximate Computing has attracted an ever-increasing interest,
with research works applying approximations across the entire traditional
computing stack, i.e., at software, hardware, and architectural levels. Over
the last decade, a plethora of approximation techniques has emerged in software
(programs, frameworks, compilers, runtimes, languages), hardware (circuits,
accelerators), and architectures (processors, memories). The current article is
Part I of our comprehensive survey on Approximate Computing, and it reviews its
motivation, terminology and principles, and classifies and presents the
technical details of the state-of-the-art software and hardware approximation
techniques.
Comment: Under Review at ACM Computing Survey