SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Estimating 3D orientation and translation of objects is essential for
infrastructure-less autonomous navigation and driving. In the case of monocular
vision, successful methods have been mainly based on two ingredients: (i) a
network generating 2D region proposals, (ii) an R-CNN structure predicting 3D
object pose by utilizing the acquired regions of interest. We argue that the 2D
detection network is redundant and introduces non-negligible noise for 3D
detection. Hence, in this paper we propose a novel 3D object detection method,
named SMOKE, that predicts a 3D bounding box for each detected object by
combining a single keypoint estimate with regressed 3D variables. As a second
contribution, we propose a multi-step disentangling approach for constructing
the 3D bounding box, which significantly improves both training convergence and
detection accuracy. In contrast to previous 3D detection techniques, our method
does not require complicated pre/post-processing, extra data, or a refinement
stage. Despite its structural simplicity, our proposed SMOKE network
outperforms all existing monocular 3D detection methods on the KITTI dataset,
achieving state-of-the-art results on both 3D object detection and bird's-eye
view evaluation. The code will be made publicly available.
Comment: 8 pages, 6 figures
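As a rough illustration of how a single keypoint and a few regressed 3D variables can determine a 3D box, the sketch below lifts an image keypoint to a 3D center with a pinhole camera model and then builds the eight box corners from dimensions and a yaw angle. This is not the authors' implementation: the function names, the (length, height, width) ordering, the axis convention (rotation about the vertical y axis) and the intrinsics used in the example are all assumptions made for illustration.

```python
import math


def backproject_keypoint(u, v, depth, fx, fy, cx, cy):
    """Lift an image keypoint (u, v) at a known depth to a 3D point (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def box_corners(center, dims, yaw):
    """Return the 8 corners of a box centred at `center`.

    `dims` is (length, height, width); at yaw = 0 the length runs along x.
    The box is rotated by `yaw` about the vertical (y) axis. Both conventions
    are assumptions of this sketch.
    """
    length, height, width = dims
    x0, y0, z0 = center
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate the local offset (dx, dz) about the y axis, then translate.
                corners.append((x0 + c * dx + s * dz, y0 + dy, z0 - s * dx + c * dz))
    return corners


# Hypothetical intrinsics and regressed values, chosen only for the example.
center = backproject_keypoint(u=600.0, v=180.0, depth=20.0,
                              fx=720.0, fy=720.0, cx=600.0, cy=180.0)
corners = box_corners(center, dims=(4.0, 1.5, 2.0), yaw=0.0)
```

A keypoint on the principal point back-projects onto the optical axis, so the box in the example is centred at (0, 0, 20).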
Graph-Segmenter: Graph Transformer with Boundary-aware Attention for Semantic Segmentation
The transformer-based semantic segmentation approaches, which divide the
image into different regions by sliding windows and model the relation inside
each window, have achieved outstanding success. However, since the relation
modeling between windows was not the primary emphasis of previous work, it was
not fully utilized. To address this issue, we propose a Graph-Segmenter,
including a Graph Transformer and a Boundary-aware Attention module, which is
an effective network for simultaneously modeling the more profound relation
between windows in a global view and various pixels inside each window as a
local one, and for substantial low-cost boundary adjustment. Specifically, we
treat every window and pixel inside the window as nodes to construct graphs for
both views and devise the Graph Transformer. The introduced boundary-aware
attention module optimizes the edge information of the target objects by
modeling the relationships among the pixels on the object's edge. Extensive
experiments on three widely used semantic segmentation datasets (Cityscapes,
ADE-20k and PASCAL Context) demonstrate that our proposed network, a Graph
Transformer with Boundary-aware Attention, can achieve state-of-the-art
segmentation performance.
Host Range, Biology, and Species Specificity of Seven-Segmented Influenza Viruses—A Comparative Review on Influenza C and D
Beyond genome structure, the seven-segmented influenza C (ICV) and D (IDV) viruses differ biologically from the eight-segmented influenza A (IAV) and B (IBV) viruses in carrying the hemagglutinin–esterase fusion protein, which combines the functions of hemagglutinin (receptor binding and fusion) and neuraminidase (receptor-destroying enzymatic activity). Whereas ICV, with humans as its primary host, emerged nearly 74 years ago, IDV, a distant relative of ICV with bovines as its primary host, was isolated in 2011. Despite its initial emergence in swine, IDV has turned out to be a transboundary bovine pathogen with a broader host range, similar to influenza A viruses (IAV). The receptor specificities of ICV and IDV determine the host range and the species specificity. Recent findings of the IDV genome in human respiratory samples and high-traffic human environments indicate its public health significance. Conversely, the presence of ICV in pigs and cattle also raises the possibility of gene segment interactions/virus reassortment between ICV and IDV where these viruses co-exist. This review takes a holistic approach to discussing the ecology of seven-segmented influenza viruses by focusing on what is known so far about the host range, seroepidemiology, biology, receptor, phylodynamics, species specificity, and cross-species transmission of ICV and IDV.
ADD: An Automatic Desensitization Fisheye Dataset for Autonomous Driving
Autonomous driving systems require many images for analyzing the surrounding
environment. However, there is little data protection for private information
among these captured images, such as pedestrian faces or vehicle license
plates, which has become a significant issue. In this paper, in response to the
call for data security laws and regulations and based on the advantages of
the large field of view (FoV) of the fisheye camera, we build the first Autopilot
Desensitization Dataset, called ADD, and formulate the first
deep-learning-based image desensitization framework, to promote the study of
image desensitization in autonomous driving scenarios. The compiled dataset
consists of 650K images, including different face and vehicle license plate
information captured by the surround-view fisheye camera. It covers various
autonomous driving scenarios, including diverse facial characteristics and
license plate colors. Then, we propose an efficient multitask desensitization
network called DesCenterNet as a benchmark on the ADD dataset, which can
perform face and vehicle license plate detection and desensitization tasks.
Based on ADD, we further provide an evaluation criterion for desensitization
performance, and extensive comparison experiments have verified the
effectiveness and superiority of our method on image desensitization.
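To make the detect-then-desensitize idea concrete, the sketch below masks detected regions of a grayscale image by replacing each box with its mean intensity. The abstract does not say which desensitization operator DesCenterNet applies, so mean-masking is a stand-in; the function name `mask_regions`, the nested-list image format and the half-open box convention are all assumptions made for illustration.

```python
def mask_regions(image, boxes):
    """Return a copy of `image` with every box replaced by its mean intensity.

    `image` is a list of rows of numbers; each box is (x0, y0, x1, y1) and
    covers columns x0..x1-1 and rows y0..y1-1 (half-open, assumed convention).
    """
    out = [row[:] for row in image]  # copy so the input stays untouched
    for x0, y0, x1, y1 in boxes:
        pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        if not pixels:
            continue  # empty box: nothing to mask
        mean = sum(pixels) / len(pixels)
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = mean
    return out


# A toy 4x4 image with one hypothetical detection covering the centre 2x2 block.
image = [
    [0, 0, 0, 0],
    [0, 10, 20, 0],
    [0, 30, 40, 0],
    [0, 0, 0, 0],
]
masked = mask_regions(image, [(1, 1, 3, 3)])
```

In practice the boxes would come from the face and license plate detector, and a stronger operator such as blurring or in-painting might replace the mean fill.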
Characterizing the Influence of Graph Elements
The influence function, a method from robust statistics, measures the change in
model parameters, or in some function of the model parameters, caused by the
removal or modification of training instances. It is an efficient and useful
post-hoc method for studying the interpretability of machine learning models
without the need for expensive model re-training. Recently, graph convolution
networks (GCNs), which operate on graph data, have attracted a great deal of
attention. However, there is no preceding research on the influence functions
of GCNs to shed light on the effects of removing training nodes/edges from an
input graph. Since the nodes/edges in a graph are interdependent in GCNs, it is
challenging to derive influence functions for GCNs. To fill this gap, we
started with the simple graph convolution (SGC) model that operates on an
attributed graph and formulated an influence function to approximate the
changes in model parameters when a node or an edge is removed from an
attributed graph. Moreover, we theoretically analyzed the error bound of the
estimated influence of removing an edge. We experimentally validated the
accuracy and effectiveness of our influence estimation function. In addition,
we showed that the influence function of an SGC model could be used to estimate
the impact of removing training nodes/edges on the test performance of the SGC
without re-training the model. Finally, we demonstrated how to use influence
functions to guide adversarial attacks on GCNs effectively.
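For readers unfamiliar with influence functions, the sketch below shows the classical (non-graph) form on the simplest possible model: estimating a mean under squared loss. Removing a training point z changes the fitted parameter by approximately (1/n) * H^{-1} * grad L(z, theta), where H is the average Hessian of the loss. This is not the paper's SGC-specific derivation, which must also account for the interdependence of nodes and edges; the function names and data are hypothetical.

```python
def influence_of_removal(data, index):
    """First-order estimate of the parameter change when data[index] is removed.

    Model: theta minimises the average of 0.5 * (theta - z)**2, so theta is the mean.
    """
    n = len(data)
    theta = sum(data) / n
    grad = theta - data[index]   # gradient of the per-sample loss at theta
    hessian = 1.0                # average second derivative of the loss
    return grad / (hessian * n)  # (1/n) * H^{-1} * grad


def exact_change(data, index):
    """Parameter change obtained by actually refitting without data[index]."""
    theta = sum(data) / len(data)
    rest = data[:index] + data[index + 1:]
    return sum(rest) / len(rest) - theta


data = [1.0, 2.0, 3.0, 4.0, 10.0]
estimate = influence_of_removal(data, 4)  # no refit needed
actual = exact_change(data, 4)            # requires a refit
```

The estimate and the refit agree in sign and roughly in magnitude; for this model the estimate divides by n where the exact change divides by n - 1, so the gap shrinks as the training set grows.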
LineMarkNet: Line Landmark Detection for Valet Parking
We aim for accurate and efficient line landmark detection for valet parking,
which is a long-standing yet unsolved problem in autonomous driving. To this
end, we present a deep line landmark detection system where we carefully design
the modules to be lightweight. Specifically, we first empirically design four
general line landmarks including three physical lines and one novel mental
line. The four line landmarks are effective for valet parking. We then develop
a deep network (LineMarkNet) to detect line landmarks from surround-view
cameras. Via the pre-calibrated homography, we fuse context from four separate
cameras into a unified bird's-eye-view (BEV) space; specifically, we fuse the
surround-view features and BEV features. We then employ a multi-task decoder
to detect multiple line landmarks, applying a center-based strategy for the
object detection task and designing a graph transformer that enhances the
vision transformer with hierarchical-level graph reasoning for the semantic
segmentation task. Finally, we parameterize the detected line landmarks
(e.g., in intercept-slope form), and a novel filtering backend incorporates
temporal and multi-view consistency to achieve smooth and stable detection.
Moreover, we annotate a large-scale dataset to validate our method.
Experimental results show that our framework outperforms several line
detection methods and confirm the multi-task network's efficiency for
real-time line landmark detection on the Qualcomm 820A platform while
maintaining superior accuracy.
Comment: 29 pages, 12 figures
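Two of the geometric steps mentioned above can be sketched in a few lines: mapping image points into the BEV plane with a 3x3 homography, and parameterizing a detected line in intercept-slope form. The sketch uses an ordinary least-squares fit as a stand-in for the paper's parameterization and omits the filtering backend; the function names, the homography matrix and the sample points are hypothetical.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (list of rows)."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]  # projective scale
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return (u, v)


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to 2D points.

    Assumes the points do not all share the same x (a vertical line has no
    intercept-slope form).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration: scale by 2, then shift by (1, 3).
H = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 3.0],
     [0.0, 0.0, 1.0]]
bev_point = apply_homography(H, (1.0, 1.0))
slope, intercept = fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

A temporal filter would then operate on the (slope, intercept) pairs from successive frames rather than on raw pixels.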
Complete Solution for Vehicle Re-ID in Surround-view Camera System
Vehicle re-identification (Re-ID) is a critical component of the autonomous
driving perception system, and research in this area has accelerated in recent
years. However, there is as yet no perfect solution to the vehicle
re-identification issue associated with the car's surround-view camera system.
Our analysis identifies two significant issues in the aforementioned scenario:
i) It is difficult to identify the same vehicle in many picture frames due to
the unique construction of the fisheye camera. ii) The appearance of the same
vehicle when seen via the surround vision system's several cameras is rather
different. To overcome these issues, we propose an integrated vehicle Re-ID
solution. On the one hand, we provide a technique for determining the
consistency of the tracking box drift with respect to the target. On the other
hand, we combine a Re-ID network based on the attention mechanism with spatial
limitations to increase performance in situations involving multiple cameras.
Finally, our approach combines state-of-the-art accuracy with real-time
performance. We will soon make the source code and annotated fisheye dataset
available.
Comment: 11 pages, 10 figures. arXiv admin note: substantial text overlap with
arXiv:2006.1650
GlycoNMR: Dataset and benchmarks for NMR chemical shift prediction of carbohydrates with graph neural networks
Molecular representation learning (MRL) is a powerful tool for bridging the
gap between machine learning and chemical sciences, as it converts molecules
into numerical representations while preserving their chemical features. These
encoded representations serve as a foundation for various downstream
biochemical studies, including property prediction and drug design. MRL has had
great success with proteins and general biomolecule datasets. Yet, in the
growing sub-field of glycoscience (the study of carbohydrates, where longer
carbohydrates are also called glycans), MRL methods have been barely explored.
This under-exploration can be primarily attributed to the limited availability
of comprehensive and well-curated carbohydrate-specific datasets and a lack of
machine learning (ML) pipelines specifically tailored to meet the unique
problems presented by carbohydrate data. Since interpreting and annotating
carbohydrate-specific data is generally more complicated than protein data,
domain experts are usually required to get involved. The existing MRL methods,
predominantly optimized for proteins and small biomolecules, also cannot be
directly used in carbohydrate applications without special modifications. To
address this challenge, accelerate progress in glycoscience, and enrich the
data resources of the MRL community, we introduce GlycoNMR. GlycoNMR contains
two laboriously curated datasets with 2,609 carbohydrate structures and 211,543
annotated nuclear magnetic resonance (NMR) chemical shifts for precise
atomic-level prediction. We tailored carbohydrate-specific features and adapted
existing MRL models to tackle this problem effectively. For illustration, we
benchmark four modified MRL models on our new datasets.