178 research outputs found

    FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

    Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind sparse convolution-based models (3x slower), hindering their usage in resource-constrained, latency-sensitive applications (such as autonomous driving). This inefficiency comes from point clouds' sparse and irregular nature, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal sizes rather than windows of equal shapes. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate the sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on Waymo Open Dataset with 4.6x speedup over (transformer-based) SST and 1.4x speedup over (sparse convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks. Code to reproduce our results will be made publicly available. Comment: The first two authors contributed equally to this work.
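
    The flatten-and-group step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `flatten_and_group`, the 2D points, and the exact sort key (window coordinates first, then raw coordinates) are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def flatten_and_group(
    points: Sequence[Point],
    window: float,
    group_size: int,
    major_axis: str = "x",
) -> List[List[int]]:
    """Sort points by window, then cut the sorted list into equal-size groups.

    Returns lists of point indices. Every group has exactly `group_size`
    entries except possibly the last, so no per-window padding is needed.
    (Hypothetical helper; the sort key is an assumption for illustration.)
    """

    def sort_key(i: int):
        x, y = points[i]
        wx, wy = int(x // window), int(y // window)  # window coordinates
        if major_axis == "x":
            return (wx, wy, x, y)
        return (wy, wx, y, x)  # alternate axis: gathers a different neighborhood

    order = sorted(range(len(points)), key=sort_key)
    # Equal-size groups, not equal-shape windows: a group may straddle
    # two windows, which is the spatial-proximity trade-off.
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


points = [(0.5, 0.5), (3.2, 0.1), (0.1, 0.9), (2.9, 2.9), (1.1, 1.1), (3.9, 3.9)]
groups_x = flatten_and_group(points, window=2.0, group_size=2, major_axis="x")
groups_y = flatten_and_group(points, window=2.0, group_size=2, major_axis="y")
```

    In this toy input, three points fall in the first window, so the second group mixes points from two windows. Self-attention would then run independently inside each index group.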

    Indoor simultaneous localization and mapping based on fringe projection profilometry

    Simultaneous Localization and Mapping (SLAM) plays an important role in outdoor and indoor applications ranging from autonomous driving to indoor robotics. Outdoor SLAM has been widely used with the assistance of LiDAR or GPS. For indoor applications, the LiDAR technique does not satisfy the accuracy requirement and GPS signals are lost. An accurate and efficient scene sensing technique is therefore required for indoor SLAM. Fringe projection profilometry (FPP) is the most promising 3D sensing technique, and the opportunities for indoor SLAM with FPP systems are obvious, but methods to date have not fully leveraged the accuracy and speed of sensing that such systems offer. In this paper, we propose a novel FPP-based indoor SLAM method based on the coordinate transformation relationship of FPP, where a 2D-to-3D descriptor-assisted approach is used for mapping and localization. The correspondences generated by matching descriptors are used for fast and accurate mapping, and the transform estimation between the 2D and 3D descriptors is used to localize the sensor. The experimental results demonstrate that the proposed indoor SLAM can achieve localization and mapping accuracy of around one millimeter.

    A 5G DMRS-based Signal for Integrated Sensing and Communication System

    Integrated sensing and communication (ISAC) is considered a potential key technology for future mobile communication systems. Signal design is fundamental to the ISAC system. The reference signals in mobile communication systems have good detection performance, which merits further research. Existing studies have applied a single reference signal to radar sensing. In this paper, a collaborative sensing scheme using multiple reference signals is designed. Specifically, we jointly apply the channel state information reference signal (CSI-RS), positioning reference signal (PRS) and demodulation reference signal (DMRS) in radar sensing, which improves the performance of radar sensing by obtaining continuous time-frequency resource mapping. The Cramér-Rao lower bound (CRLB) of the joint reference signal for distance and velocity estimation is derived. The impacts of carrier frequency and subcarrier spacing on the performance of distance and velocity estimation are revealed. The results of simulation experiments show that, compared with the single reference signal sensing scheme, the multiple reference signals collaborative sensing scheme effectively improves the sensing accuracy. Moreover, because the OFDM symbols are discontinuous, the accuracy of velocity estimation could be further improved via compressed sensing (CS). This paper verifies that multiple reference signals, instead of a single reference signal, offer markedly better performance in radar sensing, which makes this a practical and efficient approach to designing ISAC signals.

    TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

    Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to implement but not optimal in performance, while the dataflows with overlapped computation and memory access (e.g., implicit GEMM) are highly performant but have very high engineering costs. In this paper, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that generates performant sparse convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x and 1.7x measured end-to-end speedup on an NVIDIA A100 GPU over state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference; and is 1.2-1.3x faster than SpConv v2 in mixed precision training across seven representative autonomous driving benchmarks. It also seamlessly supports graph convolutions, achieving 2.6-7.6x faster inference speed compared with state-of-the-art graph deep learning libraries. Comment: MICRO 2023; Haotian Tang and Shang Yang contributed equally to this project.
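
    The gather-GEMM-scatter dataflow mentioned in this abstract can be sketched in plain Python. This is a rough illustration only and does not use or mirror the TorchSparse++ API: the function `gather_gemm_scatter`, the list-of-lists feature layout, and the precomputed `maps` of (input index, output index) pairs per kernel offset are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gather_gemm_scatter(
    features: Matrix,                         # [num_inputs][c_in]
    weights: Sequence[Matrix],                # one [c_in][c_out] matrix per kernel offset
    maps: Sequence[Sequence[Tuple[int, int]]],  # per offset: (input_idx, output_idx) pairs
    num_outputs: int,
) -> Matrix:
    """Hypothetical reference loop for the gather-GEMM-scatter dataflow."""
    c_out = len(weights[0][0])
    out = [[0.0] * c_out for _ in range(num_outputs)]
    for k, pairs in enumerate(maps):
        if not pairs:
            continue
        # Gather: copy the input rows this kernel offset touches into a dense buffer.
        gathered = [features[i] for i, _ in pairs]
        # GEMM: one dense matrix multiply per kernel offset.
        w = weights[k]
        product = [
            [sum(row[c] * w[c][d] for c in range(len(w))) for d in range(c_out)]
            for row in gathered
        ]
        # Scatter: accumulate each result row into its output location.
        for (_, j), row in zip(pairs, product):
            for d in range(c_out):
                out[j][d] += row[d]
    return out


features = [[1.0, 2.0], [3.0, 4.0]]
weights = [[[1.0], [1.0]], [[2.0], [0.0]]]
maps = [[(0, 0), (1, 1)], [(1, 0)]]
result = gather_gemm_scatter(features, weights, maps, num_outputs=2)
```

    The three phases run one after another with intermediate buffers in between, which is the simplicity the abstract describes; overlapped dataflows such as implicit GEMM fuse these phases instead.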

    Hounsfield unit for assessing bone mineral density distribution within lumbar vertebrae and its clinical values

    Study Design: Retrospective radiological analysis. Objective: The aim of this study is to evaluate the distribution of bone mineral density (BMD) in lumbar vertebrae using the Hounsfield unit (HU) measurement method and investigate the clinical implications of HU values for assessing lumbar vertebrae BMD. Method: Two hundred and ninety-six patients were retrospectively reviewed and divided into six groups according to age: Group 1 (20–29 years old), Group 2 (30–39 years old), Group 3 (40–49 years old), Group 4 (50–59 years old), Group 5 (60–69 years old), Group 6 (70–79 years old). Six different locations from each vertebra of L1-L5 were selected as regions of interest: the anterior, middle and posterior parts of the upper and lower slices of the vertebrae. HU values were measured for the six regions of interest, followed by statistical analysis. Results: The HU values of vertebrae showed a decreasing trend from young patients to elderly patients in Group 1 to Group 5. There was no significant difference in HU values among different vertebrae in the same age group. In all age groups, the HU values of the anterior and posterior parts of the vertebral body were significantly different from L1 to L3, with the anterior part of the vertebral body having lower HU values than the posterior part. The HU values of the anterior and posterior parts of the vertebral body of L4 and L5 differed significantly only in Group 5 and Group 6, and the HU values of the anterior part of the vertebral body were lower than those of the posterior part. The HU values of the posterior part of L4 and L5 in Group 6 were higher than those in Group 5. Conclusion: Bone mineral density in the lumbar vertebrae is not uniformly distributed, potentially attributed to varying stress stimuli. The assessment of local HU values in the lumbar spine is of significant importance for surgical treatment.

    Defects in efferent duct multiciliogenesis underlie male infertility in GEMC1-, MCIDAS- or CCNO-deficient mice

    GEMC1 and MCIDAS are geminin family proteins that transcriptionally activate E2F4/5-target genes during multiciliogenesis, including Foxj1 and Ccno. Male mice that lacked Gemc1, Mcidas or Ccno were found to be infertile, but the origin of this defect has remained unclear. Here, we show that all three genes are necessary for the generation of functional multiciliated cells in the efferent ducts that are required for spermatozoa to enter the epididymis. In mice that are mutant for Gemc1, Mcidas or Ccno, we observed a similar spectrum of phenotypes, including thinning of the seminiferous tubule epithelia, dilation of the rete testes, sperm agglutinations in the efferent ducts and lack of spermatozoa in the epididymis (azoospermia). These data suggest that defective efferent duct development is the dominant cause of male infertility in these mouse models, and this likely extends to individuals with the ciliopathy reduced generation of multiple motile cilia with mutations in MCIDAS and CCNO.

    ChemiQ: A Chemistry Simulator for Quantum Computer

    Quantum computing, an innovative computing paradigm with a prominent processing rate, is expected to provide solutions to problems in many fields. Among these, the most intuitive application is to help chemistry researchers correctly describe strong correlation and complex systems, which are the great challenge in current chemistry simulation. In this paper, we present a standalone quantum simulation tool for chemistry, ChemiQ, which is designed to assist people in carrying out chemical research or molecular calculations on real or virtual quantum computers. Following the idea of modular programming in the C++ language, the software is designed as a full-stack tool without third-party physics or chemistry application packages. It provides the following services: visually constructing molecular structures, quickly simulating ground-state energy, scanning the molecular potential energy curve by distance or angle, studying chemical reactions, and returning calculation results graphically after analysis. Comment: software, 7 pages, 5 figures.