144 research outputs found

    Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

    Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture. With the success of transformers, we introduce the Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends intra-block temporal modeling via its Temporal Pyramidal Compression-and-Amplification (TPCA) structure and refines inter-block feature interaction with a Cross-Layer Refinement (XLR) module. In particular, the TPCA block exploits a temporal pyramid paradigm, reinforcing key and value representation capabilities and seamlessly extracting spatial semantics from motion sequences. We stitch these TPCA blocks with XLR, which promotes rich semantic representation through continuous interaction of queries, keys, and values. This strategy embodies early-stage information with current flows, addressing typical deficits in detail and stability seen in other transformer-based methods. We demonstrate the effectiveness of RTPCA by achieving state-of-the-art results on Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks with minimal computational overhead. The source code is available at https://github.com/hbing-l/RTPCA. Comment: 11 pages, 5 figures

    DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving

    Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research. To address this gap, we present DAMO-StreamNet, an optimized framework that combines recent advances from the YOLO series with a comprehensive analysis of spatial and temporal perception mechanisms, delivering a cutting-edge solution. The key innovations of DAMO-StreamNet are: (1) A robust neck structure incorporating deformable convolution, enhancing the receptive field and feature alignment capabilities. (2) A dual-branch structure that integrates short-path semantic features and long-path temporal features, improving motion state prediction accuracy. (3) Logits-level distillation for efficient optimization, aligning the logits of teacher and student networks in semantic space. (4) A real-time forecasting mechanism that updates support frame features with the current frame, ensuring seamless streaming perception during inference. Our experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200, 1920)) sAP without using extra data. This work not only sets a new benchmark for real-time perception but also provides valuable insights for future research. Additionally, DAMO-StreamNet can be applied to various autonomous systems, such as drones and robots, paving the way for real-time perception. The code is available at https://github.com/zhiqic/DAMO-StreamNet

    PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

    Existing 3D human pose estimators face challenges in adapting to new datasets due to the lack of 2D-3D pose pairs in training sets. To overcome this issue, we propose the Multi-Hypothesis Pose Synthesis Domain Adaptation (PoSynDA) framework to bridge this data disparity gap in the target domain. Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain. By incorporating a multi-hypothesis network, PoSynDA generates diverse pose hypotheses and aligns them with the target domain. To do this, it first utilizes target-specific source augmentation to obtain the target domain distribution data from the source domain by decoupling the scale and position parameters. The process is then further refined through the teacher-student paradigm and low-rank adaptation. With extensive comparison of benchmarks such as Human3.6M and MPI-INF-3DHP, PoSynDA demonstrates competitive performance, even comparable to the target-trained MixSTE model (Zhang et al., 2022). This work paves the way for the practical application of 3D human pose estimation in unseen domains. The code is available at https://github.com/hbing-l/PoSynDA. Comment: Accepted to ACM Multimedia 2023; 10 pages, 4 figures, 8 tables

    KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

    In the realm of facial analysis, accurate landmark detection is crucial for various applications, ranging from face recognition and expression analysis to animation. Conventional heatmap or coordinate regression-based techniques, however, often face challenges in terms of computational burden and quantization errors. To address these issues, we present the KeyPoint Positioning System (KeyPosS), a groundbreaking facial landmark detection framework that stands out from existing methods. The framework utilizes a fully convolutional network to predict a distance map, which computes the distance between a Point of Interest (POI) and multiple anchor points. These anchor points are ingeniously harnessed to triangulate the POI's position through the True-range Multilateration algorithm. Notably, the plug-and-play nature of KeyPosS enables seamless integration into any decoding stage, ensuring a versatile and adaptable solution. We conducted a thorough evaluation of KeyPosS's performance by benchmarking it against state-of-the-art models on four different datasets. The results show that KeyPosS substantially outperforms leading methods in low-resolution settings while requiring a minimal time overhead. The code is available at https://github.com/zhiqic/KeyPosS. Comment: Accepted to ACM Multimedia 2023; 10 pages, 7 figures, 6 tables
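
The true-range multilateration step mentioned in the KeyPosS abstract can be illustrated with a small, generic sketch. This is not the authors' implementation: the function name `multilaterate`, the anchor layout, and the choice of a linearized least-squares solver are all assumptions made for illustration. The idea is that each anchor and its predicted distance define a circle around the anchor; subtracting the first circle equation from the others cancels the quadratic term and leaves a linear system in the unknown 2D position.

```python
from math import hypot


def multilaterate(anchors, distances):
    """Estimate a 2D point from its distances to known anchor points.

    Hypothetical helper for illustration only (not taken from KeyPosS).
    Uses the standard linearization: subtracting the first anchor's circle
    equation from each of the others gives 2*(a_i - a_0) . p = b_i, which
    is then solved in the least-squares sense via the 2x2 normal equations.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least three anchors and one distance per anchor")

    (x0, y0), d0 = anchors[0], distances[0]
    sxx = sxy = syy = sxb = syb = 0.0
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        ax, ay = 2.0 * (xi - x0), 2.0 * (yi - y0)
        b = (xi ** 2 + yi ** 2) - (x0 ** 2 + y0 ** 2) - di ** 2 + d0 ** 2
        # Accumulate A^T A and A^T b for the normal equations.
        sxx += ax * ax
        sxy += ax * ay
        syy += ay * ay
        sxb += ax * b
        syb += ay * b

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not unique")

    # Solve the symmetric 2x2 system with Cramer's rule.
    x = (syy * sxb - sxy * syb) / det
    y = (sxx * syb - sxy * sxb) / det
    return x, y


# Usage: four anchors on a small grid and exact distances to a known point.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
true_point = (1.3, 2.6)
distances = [hypot(true_point[0] - ax, true_point[1] - ay) for ax, ay in anchors]
estimate = multilaterate(anchors, distances)
```

With exact distances the estimate recovers the sub-cell position of the point even though the anchors sit on a coarse grid, which is the intuition behind using distances rather than a quantized heatmap peak. With noisy distances the same code returns the least-squares compromise among the anchors.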

    WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

    This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on the Large Language Model (LLM). The system incorporates four key modules: the LLM Engine, SemTypo, StyTypo, and TexTypo modules. 1) The LLM Engine, empowered by the LLM (e.g., GPT-3.5), interprets user inputs and generates actionable prompts for the other modules, thereby transforming abstract concepts into tangible designs. 2) The SemTypo module optimizes font designs using semantic concepts, striking a balance between artistic transformation and readability. 3) Building on the semantic layout provided by the SemTypo module, the StyTypo module creates smooth, refined images. 4) The TexTypo module further enhances the design's aesthetics through texture rendering, enabling the generation of inventive textured fonts. Notably, WordArt Designer highlights the fusion of generative AI with artistic typography. Experience its capabilities on ModelScope: https://www.modelscope.cn/studios/WordArt/WordArt. Comment: Accepted by EMNLP 2023, 10 pages, 11 figures, 1 table

    Comparative evaluation of the diagnosis, reporting and investigation of malaria cases in China, 2005-2014: transition from control to elimination for the national malaria programme

    Background: The elimination of malaria requires high-quality surveillance data to enable rapid detection and response to individual cases. Evaluation of the performance of a national malaria surveillance system could identify shortcomings which, if addressed, will improve the surveillance program for malaria elimination. Methods: Case-level data for the period 2005–2014 were extracted from the China National Notifiable Infectious Disease Reporting Information System and Malaria Enhanced Surveillance Information System. The occurrence of cases, and the accuracy and timeliness of case diagnosis, reporting and investigation, were assessed and compared between the malaria control stage (2005–2010) and elimination stage (2011–2014) in mainland China. Results: A total of 210 730 malaria cases were reported in mainland China in 2005–2014. The average annual incidence declined dramatically from 2.5 per 100 000 people at the control stage to 0.2 per 100 000 at the elimination stage, but the proportion of migrant cases increased from 9.8 % to 41.0 %. Since the initiation of the National Malaria Elimination Programme in 2010, the overall proportion of cases diagnosed by laboratory testing consistently improved, with the highest of 99.0 % in 2014. However, this proportion was significantly lower in non-endemic provinces (79.0 %) than in endemic provinces (91.4 %) during 2011–2014. The median interval from illness onset to diagnosis was 3 days at the elimination stage, one day shorter than at the control stage. Since 2011, more than 99 % of cases were reported within 1 day after being diagnosed, while the proportion of cases reported within one day after diagnosis was lowest in Tibet (37.5 %). The predominant source of case reporting shifted from town-level hospitals at the control stage (67.9 % of cases) to city-level hospitals and public health institutes at the elimination stage (69.4 % of cases). The proportion of investigations carried out within 3 days after case reporting improved from 74.6 % in 2010 to 98.5 % in 2014. Conclusions: The individual case-based malaria surveillance system in China operated well during the malaria elimination stage. This ensured that malaria cases could be diagnosed, reported and investigated in a timely manner at the local level. However, domestic migrants and overseas populations, as well as cases in the historically malarial non-endemic areas and hard-to-reach areas, are new challenges in the surveillance for malaria elimination.

    Inactivated COVID-19 Vaccination did not affect In vitro fertilization (IVF) / Intra-Cytoplasmic Sperm Injection (ICSI) cycle outcomes

    Background: The objective of this study is to evaluate the impact of COVID-19 inactivated vaccine administration on the outcomes of in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) cycles in infertile couples in China. Methods: We collected data from the CYART prospective cohort, which included couples undergoing IVF treatment from January 2021 to September 2022 at Sichuan Jinxin Xinan Women & Children's Hospital. Based on whether they received vaccination before ovarian stimulation, the couples were divided into the vaccination group and the non-vaccination group. We compared the laboratory parameters and pregnancy outcomes between the two groups. Findings: After performing propensity score matching (PSM), the analysis demonstrated similar clinical pregnancy rates, biochemical pregnancy and ongoing pregnancy rates between vaccinated and unvaccinated women. No significant disparities were found in terms of embryo development and laboratory parameters among the groups. Moreover, male vaccination had no impact on patient performance or pregnancy outcomes in assisted reproductive technology treatments. Additionally, there were no significant differences observed in the effects of vaccination on embryo development and pregnancy outcomes among couples undergoing ART. Interpretation: The findings suggest that COVID-19 vaccination did not have a significant effect on patients undergoing IVF/ICSI with fresh embryo transfer. Therefore, it is recommended that couples should receive COVID-19 vaccination as scheduled to help mitigate the COVID-19 pandemic. Comment: 26 pages, 4 figures and 5 tables

    Observation of Dirac hierarchy in three-dimensional acoustic topological insulators

    Dirac cones (DCs) play a pivotal role in various unique phenomena ranging from massless electrons in graphene to robust surface states in topological insulators (TIs). Recent studies have theoretically revealed a full Dirac hierarchy comprising an eightfold bulk DC, a fourfold surface DC, and a twofold hinge DC, associated with a hierarchy of topological phases including first-order to third-order three-dimensional (3D) topological insulators, using the same 3D base lattice. Here, we report the first experimental observation of the Dirac hierarchy in 3D acoustic TIs. Using acoustic measurements, we unambiguously reveal that lifting of multifold DCs in each hierarchy can induce two-dimensional (2D) topological surface states with a fourfold DC in a first-order 3D TI, one-dimensional (1D) topological hinge states with a twofold DC in a second-order 3D TI, and zero-dimensional (0D) topological corner states in a third-order 3D TI. Our work not only expands the fundamental research scope of Dirac physics, but also opens up a new route for multidimensional robust wave manipulation.