144 research outputs found
Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation
Accurately estimating the 3D pose of humans in video sequences requires both
accuracy and a well-structured architecture. With the success of transformers,
we introduce the Refined Temporal Pyramidal Compression-and-Amplification
(RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends
intra-block temporal modeling via its Temporal Pyramidal
Compression-and-Amplification (TPCA) structure and refines inter-block feature
interaction with a Cross-Layer Refinement (XLR) module. In particular, the TPCA
block exploits a temporal pyramid paradigm, reinforcing key and value
representation capabilities and seamlessly extracting spatial semantics from
motion sequences. We stitch these TPCA blocks with XLR that promotes rich
semantic representation through continuous interaction of queries, keys, and
values. This strategy embodies early-stage information with current flows,
addressing typical deficits in detail and stability seen in other
transformer-based methods. We demonstrate the effectiveness of RTPCA by
achieving state-of-the-art results on Human3.6M, HumanEva-I, and MPI-INF-3DHP
benchmarks with minimal computational overhead. The source code is available at
https://github.com/hbing-l/RTPCA.
Comment: 11 pages, 5 figures
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving
Real-time perception, or streaming perception, is a crucial aspect of
autonomous driving that has yet to be thoroughly explored in existing research.
To address this gap, we present DAMO-StreamNet, an optimized framework that
combines recent advances from the YOLO series with a comprehensive analysis of
spatial and temporal perception mechanisms, delivering a cutting-edge solution.
The key innovations of DAMO-StreamNet are: (1) A robust neck structure
incorporating deformable convolution, enhancing the receptive field and feature
alignment capabilities. (2) A dual-branch structure that integrates short-path
semantic features and long-path temporal features, improving motion state
prediction accuracy. (3) Logits-level distillation for efficient optimization,
aligning the logits of teacher and student networks in semantic space. (4) A
real-time forecasting mechanism that updates support frame features with the
current frame, ensuring seamless streaming perception during inference. Our
experiments demonstrate that DAMO-StreamNet surpasses existing state-of-the-art
methods, achieving 37.8% (normal size (600, 960)) and 43.3% (large size (1200,
1920)) sAP without using extra data. This work not only sets a new benchmark
for real-time perception but also provides valuable insights for future
research. Additionally, DAMO-StreamNet can be applied to various autonomous
systems, such as drones and robots, paving the way for real-time perception.
The code is available at https://github.com/zhiqic/DAMO-StreamNet
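Item (3) of the abstract above, logits-level distillation, can be illustrated with a minimal sketch. The abstract does not state the loss that DAMO-StreamNet uses, so the temperature-scaled KL divergence below is an assumption borrowed from generic knowledge distillation, and the names `softmax` and `distillation_loss` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over softened class distributions.

    Assumption: a generic distillation loss, not the one from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher, temperature=2.0)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is the sense in which the student is "aligned" with the teacher.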
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation
Existing 3D human pose estimators face challenges in adapting to new datasets
due to the lack of 2D-3D pose pairs in training sets. To overcome this issue,
we propose the Multi-Hypothesis Pose Synthesis Domain Adaptation (PoSynDA)
framework to bridge this data disparity gap in the target domain. Specifically,
PoSynDA uses a
diffusion-inspired structure to simulate 3D pose distribution in the target
domain. By incorporating a multi-hypothesis network, PoSynDA generates diverse
pose hypotheses and aligns them with the target domain. To do this, it first
utilizes target-specific source augmentation to obtain the target domain
distribution data from the source domain by decoupling the scale and position
parameters. The process is then further refined through the teacher-student
paradigm and low-rank adaptation. With extensive comparison of benchmarks such
as Human3.6M and MPI-INF-3DHP, PoSynDA demonstrates competitive performance,
even comparable to the target-trained MixSTE model (Zhang et al., 2022). This
work paves the way for the practical application of 3D human pose estimation in
unseen domains. The code is available at https://github.com/hbing-l/PoSynDA.
Comment: Accepted to ACM Multimedia 2023; 10 pages, 4 figures, 8 tables.
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration
In the realm of facial analysis, accurate landmark detection is crucial for
various applications, ranging from face recognition and expression analysis to
animation. Conventional heatmap or coordinate regression-based techniques,
however, often face challenges in terms of computational burden and
quantization errors. To address these issues, we present the KeyPoint
Positioning System (KeyPosS) - a groundbreaking facial landmark detection
framework that stands out from existing methods. The framework utilizes a fully
convolutional network to predict a distance map, which computes the distance
between a Point of Interest (POI) and multiple anchor points. These anchor
points are ingeniously harnessed to triangulate the POI's position through the
True-range Multilateration algorithm. Notably, the plug-and-play nature of
KeyPosS enables seamless integration into any decoding stage, ensuring a
versatile and adaptable solution. We conducted a thorough evaluation of
KeyPosS's performance by benchmarking it against state-of-the-art models on
four different datasets. The results show that KeyPosS substantially
outperforms leading methods in low-resolution settings while requiring a
minimal time overhead. The code is available at
https://github.com/zhiqic/KeyPosS.
Comment: Accepted to ACM Multimedia 2023; 10 pages, 7 figures, 6 tables.
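The decoding idea in the abstract above, recovering a point from its distances to known anchors, can be sketched as follows. This is a minimal 2D illustration of true-range multilateration using a linearized least-squares solve, not the KeyPosS implementation; the function name `multilaterate` and the example coordinates are hypothetical.

```python
import math


def multilaterate(anchors, distances):
    """Estimate a 2D point from its distances to three or more anchors."""
    if len(anchors) != len(distances) or len(anchors) < 3:
        raise ValueError("need at least three anchors with one distance each")
    (x0, y0), d0 = anchors[0], distances[0]
    # Subtracting the first circle equation from each of the others cancels
    # the quadratic terms and leaves a linear system A [x, y]^T = b.
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        rows.append((2.0 * (xi - x0), 2.0 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)
    # Solve the 2x2 normal equations (A^T A) p = A^T b in closed form.
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    t1 = sum(a * r for (a, _), r in zip(rows, rhs))
    t2 = sum(b * r for (_, b), r in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Hypothetical example: four anchors on a grid and exact distances to a point.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
target = (1.5, 2.25)
distances = [math.hypot(target[0] - ax, target[1] - ay) for ax, ay in anchors]
estimate = multilaterate(anchors, distances)
```

With exact distances the estimate matches the target up to floating-point error. With noisy predicted distances, using more than three anchors lets the least-squares solve average out part of the error, and the result is continuous rather than snapped to a grid cell, which is how this style of decoding avoids quantization error.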
WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models
This paper introduces WordArt Designer, a user-driven framework for artistic
typography synthesis, relying on the Large Language Model (LLM). The system
incorporates four key modules: the LLM Engine, SemTypo, StyTypo, and TexTypo
modules. 1) The LLM Engine, empowered by the LLM (e.g., GPT-3.5), interprets
user inputs and generates actionable prompts for the other modules, thereby
transforming abstract concepts into tangible designs. 2) The SemTypo module
optimizes font designs using semantic concepts, striking a balance between
artistic transformation and readability. 3) Building on the semantic layout
provided by the SemTypo module, the StyTypo module creates smooth, refined
images. 4) The TexTypo module further enhances the design's aesthetics through
texture rendering, enabling the generation of inventive textured fonts.
Notably, WordArt Designer highlights the fusion of generative AI with artistic
typography. Experience its capabilities on ModelScope:
https://www.modelscope.cn/studios/WordArt/WordArt.
Comment: Accepted by EMNLP 2023; 10 pages, 11 figures, 1 table.
Comparative evaluation of the diagnosis, reporting and investigation of malaria cases in China, 2005-2014: transition from control to elimination for the national malaria programme
Background: The elimination of malaria requires high-quality surveillance data to enable rapid detection and response to individual cases. Evaluation of the performance of a national malaria surveillance system could identify shortcomings which, if addressed, will improve the surveillance program for malaria elimination. Methods: Case-level data for the period 2005–2014 were extracted from the China National Notifiable Infectious Disease Reporting Information System and Malaria Enhanced Surveillance Information System. The occurrence of cases and the accuracy and timeliness of case diagnosis, reporting and investigation were assessed and compared between the malaria control stage (2005–2010) and elimination stage (2011–2014) in mainland China. Results: A total of 210 730 malaria cases were reported in mainland China in 2005–2014. The average annual incidence declined dramatically from 2.5 per 100 000 people at the control stage to 0.2 per 100 000 at the elimination stage, but the proportion of migrant cases increased from 9.8 % to 41.0 %. Since the initiation of the National Malaria Elimination Programme in 2010, the overall proportion of cases diagnosed by laboratory testing consistently improved, reaching a high of 99.0 % in 2014. However, this proportion was significantly lower in non-endemic provinces (79.0 %) than in endemic provinces (91.4 %) during 2011–2014. The median interval from illness onset to diagnosis was 3 days at the elimination stage, one day shorter than at the control stage. Since 2011, more than 99 % of cases were reported within 1 day after being diagnosed, while the proportion of cases reported within one day after diagnosis was lowest in Tibet (37.5 %). The predominant source of case reporting shifted from town-level hospitals at the control stage (67.9 % of cases) to city-level hospitals and public health institutes at the elimination stage (69.4 % of cases). The proportion of cases investigated within 3 days after reporting improved from 74.6 % in 2010 to 98.5 % in 2014. Conclusions: The individual case-based malaria surveillance system in China operated well during the malaria elimination stage. This ensured that malaria cases could be diagnosed, reported and investigated in a timely manner at the local level. However, domestic migrants and overseas populations, as well as cases in the historically malarial non-endemic areas and hard-to-reach areas, are new challenges in the surveillance for malaria elimination.
Inactivated COVID-19 Vaccination did not affect In vitro fertilization (IVF) / Intra-Cytoplasmic Sperm Injection (ICSI) cycle outcomes
Background: The objective of this study is to evaluate the impact of COVID-19
inactivated vaccine administration on the outcomes of in vitro fertilization
(IVF) and intracytoplasmic sperm injection (ICSI) cycles in infertile couples
in China. Methods: We collected data from the CYART prospective cohort, which
included couples undergoing IVF treatment from January 2021 to September 2022
at Sichuan Jinxin Xinan Women & Children's Hospital. Based on whether they
received vaccination before ovarian stimulation, the couples were divided into
the vaccination group and the non-vaccination group. We compared the laboratory
parameters and pregnancy outcomes between the two groups. Findings: After
performing propensity score matching (PSM), the analysis demonstrated similar
clinical pregnancy rates, biochemical pregnancy and ongoing pregnancy rates
between vaccinated and unvaccinated women. No significant disparities were
found in terms of embryo development and laboratory parameters among the
groups. Moreover, male vaccination had no impact on patient performance or
pregnancy outcomes in assisted reproductive technology treatments.
Additionally, there were no significant differences observed in the effects of
vaccination on embryo development and pregnancy outcomes among couples
undergoing ART. Interpretation: The findings suggest that COVID-19 vaccination
did not have a significant effect on patients undergoing IVF/ICSI with fresh
embryo transfer. Therefore, it is recommended that couples should receive
COVID-19 vaccination as scheduled to help mitigate the COVID-19 pandemic.
Comment: 26 pages, 4 figures and 5 tables
Observation of Dirac hierarchy in three-dimensional acoustic topological insulators
Dirac cones (DCs) play a pivotal role in various unique phenomena ranging
from massless electrons in graphene to robust surface states in topological
insulators (TIs). Recent studies have theoretically revealed a full Dirac
hierarchy comprising an eightfold bulk DC, a fourfold surface DC, and a twofold
hinge DC, associated with a hierarchy of topological phases including
first-order to third-order three-dimensional (3D) topological insulators, using
the same 3D base lattice. Here, we report the first experimental observation of
the Dirac hierarchy in 3D acoustic TIs. Using acoustic measurements, we
unambiguously reveal that lifting of multifold DCs in each hierarchy can induce
two-dimensional (2D) topological surface states with a fourfold DC in a
first-order 3D TI, one-dimensional (1D) topological hinge states with a twofold
DC in a second-order 3D TI, and zero-dimensional (0D) topological corner states
in a third-order 3D TI. Our work not only expands the fundamental research
scope of Dirac physics, but also opens up a new route for multidimensional
robust wave manipulation.