178 research outputs found
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Transformer, as an alternative to CNN, has been proven effective in many
modalities (e.g., texts and images). For 3D point cloud transformers, existing
efforts focus primarily on pushing their accuracy to the state-of-the-art
level. However, their latency lags behind sparse convolution-based models (3x
slower), hindering their usage in resource-constrained, latency-sensitive
applications (such as autonomous driving). This inefficiency comes from point
clouds' sparse and irregular nature, whereas transformers are designed for
dense, regular workloads. This paper presents FlatFormer to close this latency
gap by trading spatial proximity for better computational regularity. We first
flatten the point cloud with window-based sorting and partition points into
groups of equal sizes rather than windows of equal shapes. This effectively
avoids expensive structuring and padding overheads. We then apply
self-attention within groups to extract local features, alternate sorting axis
to gather features from different directions, and shift windows to exchange
features across groups. FlatFormer delivers state-of-the-art accuracy on Waymo
Open Dataset with 4.6x speedup over (transformer-based) SST and 1.4x speedup
over (sparse convolutional) CenterPoint. This is the first point cloud
transformer that achieves real-time performance on edge GPUs and is faster than
sparse convolutional methods while achieving on-par or even superior accuracy
on large-scale benchmarks. Code to reproduce our results will be made publicly
available. Comment: The first two authors contributed equally to this work.
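The flatten-and-group step described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `flatten_and_group`, the 2D integer coordinates, the window shape, and the group size are all assumptions chosen for the example. It sorts points by window index and then by in-window position, and cuts the sorted sequence into groups of equal size, so no padding is needed except possibly in the last group.

```python
import numpy as np

def flatten_and_group(coords, window=(4, 4), group_size=4, axis_order="xy"):
    """Hypothetical sketch: sort points window by window, then cut the
    flattened sequence into equal-size groups.

    coords: (N, 2) integer (x, y) coordinates of non-empty points.
    Returns a list of index arrays into `coords` (last one may be shorter).
    """
    coords = np.asarray(coords)
    win = coords // np.asarray(window)      # window index of every point
    wx, wy = win[:, 0], win[:, 1]
    x, y = coords[:, 0], coords[:, 1]
    # np.lexsort treats the LAST key as the primary sort key.
    if axis_order == "xy":
        order = np.lexsort((y, x, wy, wx))  # wx, then wy, then x, then y
    else:
        order = np.lexsort((x, y, wx, wy))  # wy, then wx, then y, then x
    # Equal-size groups instead of equal-shape windows.
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

points = np.array([[0, 0], [5, 1], [1, 1], [4, 4], [0, 3], [7, 7]])
groups = flatten_and_group(points, window=(4, 4), group_size=3)
```

Calling the function once with `axis_order="xy"` and once with `"yx"` mimics the alternating sorting axis mentioned in the abstract; self-attention would then run independently inside each returned group.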
Indoor simultaneous localization and mapping based on fringe projection profilometry
Simultaneous Localization and Mapping (SLAM) plays an important role in
outdoor and indoor applications ranging from autonomous driving to indoor
robotics. Outdoor SLAM has been widely used with the assistance of LiDAR or
GPS. For indoor applications, the LiDAR technique does not satisfy the accuracy
requirement and the GPS signals will be lost. An accurate and efficient scene
sensing technique is required for indoor SLAM. As the most promising 3D sensing
technique, the opportunities for indoor SLAM with fringe projection
profilometry (FPP) systems are obvious, but methods to date have not fully
leveraged the accuracy and speed of sensing that such systems offer. In this
paper, we propose a novel FPP-based indoor SLAM method based on the coordinate
transformation relationship of FPP, where 2D-to-3D descriptor assistance is
used for mapping and localization. The correspondences generated by matching
descriptors are used for fast and accurate mapping, and the transform
estimation between the 2D and 3D descriptors is used to localize the sensor.
The experimental results demonstrate that the proposed indoor SLAM method can
achieve localization and mapping accuracy of around one millimeter.
A 5G DMRS-based Signal for Integrated Sensing and Communication System
Integrated sensing and communication (ISAC) is considered a potential key
technology for future mobile communication systems. Signal design is
fundamental to an ISAC system. The reference signals in mobile communication
systems have good detection performance and are worth further research.
Existing studies applied a single reference signal to radar sensing. In this
paper, a collaborative sensing scheme using multiple reference signals is
designed. Specifically, we jointly apply the channel state information
reference signal (CSI-RS), positioning reference signal (PRS) and demodulation
reference signal (DMRS) in radar sensing, which improves sensing performance by
obtaining continuous time-frequency resource mapping. The Cramér-Rao lower
bound (CRLB) of the joint reference signal for distance and velocity estimation
is derived. The impacts of carrier frequency and subcarrier spacing on the
performance of distance and velocity estimation are revealed. Simulation
results show that, compared with the single reference signal sensing scheme,
the multiple reference signal collaborative sensing scheme effectively improves
the sensing accuracy. Moreover, because the OFDM symbols are discontinuous, the
accuracy of velocity estimation could be further improved via compressed
sensing (CS). This paper verifies that multiple reference signals, rather than
a single reference signal, deliver superior radar sensing performance, making
this a practical and efficient approach to designing ISAC signals.
TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
Sparse convolution plays a pivotal role in emerging workloads, including
point cloud processing in AR/VR, autonomous driving, and graph understanding in
recommendation systems. Since the computation pattern is sparse and irregular,
specialized high-performance kernels are required. Existing GPU libraries offer
two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is
easy to implement but not optimal in performance, while the dataflows with
overlapped computation and memory access (e.g., implicit GEMM) are highly
performant but have very high engineering costs. In this paper, we introduce
TorchSparse++, a new GPU library that achieves the best of both worlds. We
create a highly efficient Sparse Kernel Generator that generates performant
sparse convolution kernels at less than one-tenth of the engineering cost of
the current state-of-the-art system. On top of this, we design the Sparse
Autotuner, which extends the design space of existing sparse convolution
libraries and searches for the best dataflow configurations for training and
inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x and
1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art
MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is
1.2-1.3x faster than SpConv v2 in mixed precision training across seven
representative autonomous driving benchmarks. It also seamlessly supports graph
convolutions, achieving 2.6-7.6x faster inference speed compared with
state-of-the-art graph deep learning libraries. Comment: MICRO 2023; Haotian
Tang and Shang Yang contributed equally to this project.
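For readers unfamiliar with the gather-GEMM-scatter dataflow named in this abstract, a minimal CPU sketch in NumPy is given below. It is an illustration only and is not TorchSparse++ code: the function name `gather_gemm_scatter` and the representation of the kernel maps as per-offset `(in_idx, out_idx)` index pairs are assumptions made for the example. For each kernel offset, the matching input features are gathered, multiplied by that offset's weight matrix, and scatter-accumulated into the output.

```python
import numpy as np

def gather_gemm_scatter(feats, weights, maps, num_out):
    """Hypothetical sketch of sparse convolution via gather-GEMM-scatter.

    feats:   (N_in, C_in) features of the non-zero input points.
    weights: (K, C_in, C_out), one weight matrix per kernel offset.
    maps:    list of K (in_idx, out_idx) integer-array pairs; pair k lists
             which input point contributes to which output point at offset k.
    """
    out = np.zeros((num_out, weights.shape[2]), dtype=feats.dtype)
    for k, (in_idx, out_idx) in enumerate(maps):
        if len(in_idx) == 0:
            continue                          # no point pairs for this offset
        gathered = feats[in_idx]              # gather: (M_k, C_in)
        partial = gathered @ weights[k]       # GEMM:   (M_k, C_out)
        np.add.at(out, out_idx, partial)      # scatter-accumulate into output
    return out

feats = np.array([[1.0, 2.0], [3.0, 4.0]])
weights = np.stack([np.eye(2), 2.0 * np.eye(2)])
maps = [(np.array([0, 1]), np.array([0, 1])), (np.array([0]), np.array([1]))]
out = gather_gemm_scatter(feats, weights, maps, num_out=2)
```

The three steps run one after another for every offset, with no overlap between memory access and computation. This is why the abstract calls the dataflow easy to implement but not optimal in performance.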
Hounsfield unit for assessing bone mineral density distribution within lumbar vertebrae and its clinical values
Study Design: Retrospective radiological analysis. Objective: The aim of this study is to evaluate the distribution of bone mineral density (BMD) in lumbar vertebrae using the Hounsfield unit (HU) measurement method and to investigate the clinical implications of HU values for assessing lumbar vertebrae BMD. Method: Two hundred and ninety-six patients were retrospectively reviewed and divided into six groups according to age: Group 1 (20–29 years old), Group 2 (30–39 years old), Group 3 (40–49 years old), Group 4 (50–59 years old), Group 5 (60–69 years old), Group 6 (70–79 years old). Six different locations from each vertebra of L1-L5 were selected as regions of interest: the anterior, middle and posterior parts of the upper and lower slices of the vertebrae. HU values were measured for the six regions of interest, followed by statistical analysis. Results: The HU values of vertebrae showed a decreasing trend from young patients to elderly patients in Group 1 to Group 5. There was no significant difference in HU values among different vertebrae in the same age group. In all age groups, the HU values of the anterior and posterior parts of the vertebral body were significantly different from L1 to L3, with the anterior part of the vertebral body having lower HU values than the posterior part. The difference between the HU values of the anterior and posterior parts of the vertebral body of L4 and L5 was statistically significant only in Group 5 and Group 6, and the HU values of the anterior part of the vertebral body were lower than those of the posterior part. The HU values of the posterior part of L4 and L5 in Group 6 were higher than those in Group 5. Conclusion: Bone mineral density in the lumbar vertebrae is not uniformly distributed, potentially attributed to varying stress stimuli. The assessment of local HU values in the lumbar spine is of significant importance for surgical treatment.
Defects in efferent duct multiciliogenesis underlie male infertility in GEMC1-, MCIDAS- or CCNO-deficient mice
GEMC1 and MCIDAS are geminin family proteins that transcriptionally activate E2F4/5-target genes during multiciliogenesis, including Foxj1 and Ccno. Male mice that lacked Gemc1, Mcidas or Ccno were found to be infertile, but the origin of this defect has remained unclear. Here, we show that all three genes are necessary for the generation of functional multiciliated cells in the efferent ducts that are required for spermatozoa to enter the epididymis. In mice that are mutant for Gemc1, Mcidas or Ccno, we observed a similar spectrum of phenotypes, including thinning of the seminiferous tubule epithelia, dilation of the rete testes, sperm agglutinations in the efferent ducts and lack of spermatozoa in the epididymis (azoospermia). These data suggest that defective efferent duct development is the dominant cause of male infertility in these mouse models, and this likely extends to individuals with the ciliopathy reduced generation of multiple motile cilia with mutations in MCIDAS and CCNO.
ChemiQ: A Chemistry Simulator for Quantum Computer
Quantum computing, an innovative computing paradigm with prominent processing
speed, is expected to provide solutions to problems in many fields. Among
these fields, the most intuitive application is helping chemistry researchers
correctly describe strong correlation and complex systems, which are a great
challenge in current chemistry simulation. In this paper, we present a
standalone quantum simulation tool for chemistry, ChemiQ, which is designed to
help people carry out chemical research or molecular calculations on real or
virtual quantum computers. Following the idea of modular programming in the C++
language, the software is designed as a full-stack tool without third-party
physics or chemistry application packages. It provides the following services:
visually constructing molecular structures, quickly simulating ground-state
energy, scanning molecular potential energy curves by distance or angle,
studying chemical reactions, and returning calculation results graphically
after analysis. Comment: software, 7 pages, 5 figures