O2SAT: Object-Oriented-Segmentation-Guided Spatial-Attention Network for 3D Object Detection in Autonomous Vehicles
Autonomous vehicles (AVs) strive to adapt to the specific characteristics of sustainable urban environments. Accurate 3D object detection with LiDAR is paramount for autonomous driving. However, existing research predominantly relies on the 3D object-based assumption, which overlooks the complexity of real-world road environments. Consequently, current methods suffer performance degradation when they target only local features and overlook the intersection of object and road features, especially in uneven road conditions. This study proposes a 3D Object-Oriented-Segmentation Spatial-Attention (O2SAT) approach to distinguish object points from road points and enhance keypoint feature learning through a channel-wise spatial attention mechanism. O2SAT consists of three modules: Object-Oriented Segmentation (OOS), Spatial-Attention Feature Reweighting (SFR), and a Road-Aware 3D Detection Head (R3D). OOS distinguishes object and road points and performs object-aware downsampling to augment data by learning to identify the hidden connection between landscape and object; SFR performs weight augmentation to learn crucial neighboring relationships and dynamically adjusts feature weights through spatial attention mechanisms, which enhances long-range interactions and contextual feature discrimination for noise suppression, improving overall detection performance; and R3D utilizes refined object segmentation and optimized feature representations. Our system injects prediction confidence into existing point-based backbones. Our method's effectiveness and robustness have been demonstrated through extensive experiments on the KITTI dataset. The proposed modules seamlessly integrate into existing point-based frameworks, following a plug-and-play approach.
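The channel-wise reweighting idea behind the SFR module can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bottleneck MLP, weight shapes, and gating nonlinearity are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_reweight(feats, w1, w2):
    """Reweight per-point features (N, C) with channel-wise attention
    gates, in the spirit of spatial-attention feature reweighting.
    w1 (C//r, C) and w2 (C, C//r) form a hypothetical bottleneck MLP."""
    # Global context: average each channel over all points.
    context = feats.mean(axis=0)            # (C,)
    # Bottleneck MLP produces one gate per channel.
    hidden = np.maximum(0.0, w1 @ context)  # ReLU, (C//r,)
    gates = sigmoid(w2 @ hidden)            # (C,) in (0, 1)
    # Scale every point's feature vector by the channel gates.
    return feats * gates[None, :]
```

In a real detector these gates would be learned end-to-end and combined with spatial (per-point) attention; the sketch only shows the channel-gating mechanics.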
State of the art: iterative CT reconstruction techniques
Owing to recent advances in computing power, iterative reconstruction (IR) algorithms have become a clinically viable option in computed tomographic (CT) imaging. Substantial evidence is accumulating about the advantages of IR algorithms over established analytical methods, such as filtered back projection. IR improves image quality through cyclic image processing. Although all available solutions share the common mechanism of artifact reduction and/or potential for radiation dose savings, chiefly due to image noise suppression, the magnitude of these effects depends on the specific IR algorithm. In the first section of this contribution, the technical bases of IR are briefly reviewed and the currently available algorithms released by the major CT manufacturers are described. In the second part, the current status of their clinical implementation is surveyed. Regardless of the applied IR algorithm, the available evidence attests to the substantial potential of IR algorithms for overcoming traditional limitations in CT imaging.
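The "cyclic image processing" at the core of IR can be illustrated with the simplest such scheme, a Landweber iteration on a toy linear system. This is a didactic sketch of the iterative principle only; vendor algorithms use far more sophisticated forward models and regularization.

```python
import numpy as np

def landweber(A, b, n_iter=50, step=None):
    """Toy iterative reconstruction: repeatedly correct the image
    estimate x toward data consistency with the measurements b,
        x_{k+1} = x_k + step * A^T (b - A x_k),
    where A is a toy system (projection) matrix."""
    if step is None:
        # A convergent step size: 1 / sigma_max(A)^2.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        residual = b - A @ x       # mismatch in projection domain
        x += step * A.T @ residual # back-project the correction
    return x
```

Each cycle forward-projects the current estimate, compares it with the measured data, and back-projects the residual; statistical and model-based IR methods elaborate on exactly this loop.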
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a multi-layer convolutional network architecture and a modified learning technique that learns low-level features and higher-level weak spatial models. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning scheme show significant improvement over the current state-of-the-art results. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to outperform all existing traditional architectures on this task. The paper also discusses several lessons learned while researching alternatives, most notably that it is possible to learn strong low-level feature detectors on features that might cover only a few pixels in the image. Higher-level spatial models improve the overall result somewhat, but to a much lesser extent than expected. Many researchers previously argued that kinematic structure and top-down information are crucial for this domain, but with our purely bottom-up approach and weak spatial model, we could improve upon more complicated architectures that currently produce the best results. This mirrors what many other researchers, such as those in speech recognition, object recognition, and other domains, have experienced.
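The "strong low-level feature detectors on features that cover only a few pixels" can be pictured as small learned filters slid over the image. The sketch below shows a single such filter producing a part-detection heatmap; the filter, its size, and the single-layer setup are illustrative assumptions, whereas the paper's model stacks many filters in a multi-layer convolutional network.

```python
import numpy as np

def part_heatmap(image, kernel):
    """Valid cross-correlation of one small learned filter with a
    grayscale image, yielding a per-location detection score map.
    A pose network would learn many such filters end-to-end."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Response of the filter at location (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

The argmax of such a heatmap gives a candidate location for the body part the filter responds to; a weak spatial model would then re-score these candidates jointly.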
Context-aware Human Motion Prediction
The problem of predicting human motion given a sequence of past observations is at the core of many applications in robotics and computer vision. Current state-of-the-art approaches formulate this problem as a sequence-to-sequence task, in which a history of 3D skeletons feeds a Recurrent Neural Network (RNN) that predicts future movements, typically on the order of 1 to 2 seconds. However, one aspect that has been overlooked so far is the fact that human motion is inherently driven by interactions with objects and/or other humans in the environment. In this paper, we explore this scenario using a novel context-aware motion prediction architecture. We use a semantic-graph model in which the nodes parameterize the human and objects in the scene and the edges their mutual interactions. These interactions are iteratively learned through a graph attention layer, fed with the past observations, which now include both object and human body motions. Once this semantic graph is learned, we inject it into a standard RNN to predict future movements of the human(s) and object(s). We consider two variants of our architecture, either freezing the contextual interactions in the future or updating them. A thorough evaluation in the "Whole-Body Human Motion Database" shows that in both cases, our context-aware networks clearly outperform baselines in which the context information is not considered. Comment: Accepted at CVPR2
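The graph attention layer that aggregates contextual interactions can be sketched as a single attention head over a fully connected scene graph of human and object nodes. This is a generic graph-attention sketch under assumed shapes, not the paper's exact layer; in particular, the attention parameterization and the absence of a LeakyReLU on the logits are simplifications.

```python
import numpy as np

def graph_attention(nodes, W, a):
    """One attention pass over a fully connected scene graph.
    nodes: (N, F) features for human/object nodes; W: (F, Fp)
    projection; a: (2*Fp,) attention vector (both hypothetical)."""
    h = nodes @ W                       # (N, Fp) projected features
    N = h.shape[0]
    # Attention logits e_ij from concatenated pairs [h_i || h_j].
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            logits[i, j] = a @ np.concatenate([h[i], h[j]])
    # Row-wise softmax: interaction weights of each node's neighbors.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)
    # Each node aggregates its context as a weighted sum of neighbors.
    return alpha @ h
```

The aggregated node features (for both humans and objects) would then be fed to the RNN at each prediction step, either frozen or re-estimated as the forecast unrolls.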