228 research outputs found
MotionBERT: A Unified Perspective on Learning Human Motion Representations
We present a unified perspective on tackling various human-centric video
tasks by learning human motion representations from large-scale and
heterogeneous data resources. Specifically, we propose a pretraining stage in
which a motion encoder is trained to recover the underlying 3D motion from
noisy partial 2D observations. The motion representations acquired in this way
incorporate geometric, kinematic, and physical knowledge about human motion,
which can be easily transferred to multiple downstream tasks. We implement the
motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer)
neural network. It could capture long-range spatio-temporal relationships among
the skeletal joints comprehensively and adaptively, exemplified by the lowest
3D pose estimation error so far when trained from scratch. Furthermore, our
proposed framework achieves state-of-the-art performance on all three
downstream tasks by simply finetuning the pretrained motion encoder with a
simple regression head (1-2 layers), which demonstrates the versatility of the
learned motion representations. Code and models are available at
https://motionbert.github.io/
Comment: ICCV 2023 Camera Read
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
Understanding the behavior of non-human primates is crucial for improving
animal welfare, modeling social behavior, and gaining insights into
distinctively human and phylogenetically shared behaviors. However, the lack of
datasets on non-human primate behavior hinders in-depth exploration of primate
social interactions, posing challenges to research on our closest living
relatives. To address these limitations, we present ChimpACT, a comprehensive
dataset for quantifying the longitudinal behavior and social relations of
chimpanzees within a social group. Spanning from 2015 to 2018, ChimpACT
features videos of a group of over 20 chimpanzees residing at the Leipzig Zoo,
Germany, with a particular focus on documenting the developmental trajectory of
one young male, Azibo. ChimpACT is both comprehensive and challenging,
consisting of 163 videos with a cumulative 160,500 frames, each richly
annotated with detection, identification, pose estimation, and fine-grained
spatiotemporal behavior labels. We benchmark representative methods of three
tracks on ChimpACT: (i) tracking and identification, (ii) pose estimation, and
(iii) spatiotemporal action detection of the chimpanzees. Our experiments
reveal that ChimpACT offers ample opportunities for both devising new methods
and adapting existing ones to solve fundamental computer vision tasks applied
to chimpanzee groups, such as detection, pose estimation, and behavior
analysis, ultimately deepening our comprehension of communication and sociality
in non-human primates.
Comment: NeurIPS 202
Recent Advances on Internet of Things
Meng, X.; Lloret, J.; Zhu, X.; Zhou, Z. (2014). Recent Advances on Internet of Things. Scientific World Journal. doi:10.1155/2014/709345
Effect of intraoperative use of low-dose dexmedetomidine on the prognosis of patients undergoing breast cancer surgery
Objective·To investigate the influence of intraoperative dexmedetomidine infusion on the recurrence-free survival (RFS) and overall survival (OS) rates of patients undergoing breast cancer surgery.
Methods·A retrospective analysis was performed on patients who underwent breast cancer surgery at the Breast Disease Diagnosis and Treatment Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine from July 2013 to June 2014. Patients were divided into a dexmedetomidine group and a control group according to whether dexmedetomidine was injected intraoperatively at 0.7‒0.8 μg/kg. After correcting for confounding factors between the two groups by propensity score matching, the factors affecting the prognosis and survival of patients undergoing breast cancer surgery were investigated by univariate and multivariate analysis, and Kaplan-Meier survival curves were used to analyze the effect of intraoperative dexmedetomidine on RFS and OS five years after breast cancer surgery.
Results·Before propensity score matching, there were significant differences between the two groups in age, progesterone receptor (PR) positive ratio, proliferating cell nuclear antigen (Ki67) expression score, American Society of Anesthesiologists (ASA) grade, and anesthesia duration. After propensity score matching, a total of 239 pairs were successfully matched, and there was no significant difference in baseline data or perioperative data between the two groups (P>0.05). Univariate analysis showed that age (P=0.032), postoperative radiotherapy (P=0.041), Ki67 score (P=0.021), and tumor TNM stage (P=0.029) were significantly correlated with postoperative five-year OS. Multivariate analysis showed that postoperative radiotherapy, Ki67 score, and tumor TNM stage were significantly correlated with postoperative five-year OS (P<0.05). Compared with the control group, the five-year postoperative RFS and OS in the dexmedetomidine group did not decrease significantly (P>0.05).
Conclusion·Intraoperative use of low-dose dexmedetomidine (0.7‒0.8 μg/kg) has no significant effect on RFS and OS at five years postoperatively in patients undergoing breast cancer surgery, providing a theoretical reference for the rational selection of anesthetics for tumor patients. The effect of higher doses of dexmedetomidine needs to be confirmed by prospective multicenter randomized controlled studies.
Human Motion Generation: A Survey
Human motion generation aims to generate natural human pose sequences and
shows immense potential for real-world applications. Substantial progress has
been made recently in motion data collection technologies and generation
methods, laying the foundation for increasing interest in human motion
generation. Most research within this field focuses on generating human motions
based on conditional signals, such as text, audio, and scene contexts. While
significant advancements have been made in recent years, the task continues to
pose challenges due to the intricate nature of human motion and its implicit
relationship with conditional signals. In this survey, we present a
comprehensive literature review of human motion generation, which, to the best
of our knowledge, is the first of its kind in this field. We begin by
introducing the background of human motion and generative models, followed by
an examination of representative methods for three mainstream sub-tasks:
text-conditioned, audio-conditioned, and scene-conditioned human motion
generation. Additionally, we provide an overview of common datasets and
evaluation metrics. Lastly, we discuss open problems and outline potential
future research directions. We hope that this survey could provide the
community with a comprehensive glimpse of this rapidly evolving field and
inspire novel ideas that address the outstanding challenges.
Comment: 20 pages, 5 figure
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Recent advances in large language models (LLMs) have demonstrated notable
progress on many mathematical benchmarks. However, most of these benchmarks
only feature problems grounded in junior and senior high school subjects,
contain only multiple-choice questions, and are confined to a limited scope of
elementary arithmetic operations. To address these issues, this paper
introduces an expansive benchmark suite SciBench that aims to systematically
examine the reasoning capabilities required for complex scientific problem
solving. SciBench contains two carefully curated datasets: an open set
featuring a range of collegiate-level scientific problems drawn from
mathematics, chemistry, and physics textbooks, and a closed set comprising
problems from undergraduate-level exams in computer science and mathematics.
Based on the two datasets, we conduct an in-depth benchmark study of two
representative LLMs with various prompting strategies. The results reveal that
current LLMs fall short of delivering satisfactory performance, with an overall
score of merely 35.80%. Furthermore, through a detailed user study, we
categorize the errors made by LLMs into ten problem-solving abilities. Our
analysis indicates that no single prompting strategy significantly outperforms
others and some strategies that demonstrate improvements in certain
problem-solving skills result in declines in other skills. We envision that
SciBench will catalyze further developments in the reasoning abilities of LLMs,
thereby ultimately contributing to scientific research and discovery.
Comment: Work in progress, 18 page
Estuarine plastisphere as an overlooked source of N2O production
The “plastisphere”, the microbial communities colonizing plastic debris, has sparked global concern for marine ecosystems. The microbiome inhabiting this novel human-made niche has been increasingly characterized; however, whether the plastisphere plays a crucial role in biogeochemical cycling remains largely unknown. Here we evaluate the potential of the plastisphere for biotic and abiotic denitrification and nitrous oxide (N2O) production in estuaries. Biofilm formation provides anoxic conditions favoring denitrifiers. Compared with the surrounding bulk water, the plastisphere exhibits higher denitrifying activity and N2O production, suggesting an overlooked N2O source. In both the plastisphere and bulk water, bacterial and fungal denitrification, rather than chemodenitrification, are the main regulators of N2O production. However, the contributions of bacteria and fungi in the plastisphere differ from those in bulk water, indicating a distinct N2O production pattern in the plastisphere. These findings pinpoint the plastisphere as an N2O source and provide insights into the roles of this new biotope in biogeochemical cycling in the Anthropocene.
PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology
As advances in large language models (LLMs) and multimodal techniques
continue to mature, the development of general-purpose multimodal large
language models (MLLMs) has surged, with significant applications in natural
image interpretation. However, the field of pathology has largely remained
untapped in this regard, despite the growing need for accurate, timely, and
personalized diagnostics. To bridge the gap in pathology MLLMs, we present the
PathAsst in this study, which is a generative foundation AI assistant to
revolutionize diagnostic and predictive analytics in pathology. To develop
PathAsst, we collect over 142K high-quality pathology image-text pairs from a
variety of reliable sources, including PubMed, comprehensive pathology
textbooks, reputable pathology websites, and private data annotated by
pathologists. Leveraging the advanced capabilities of ChatGPT/GPT-4, we
generate over 180K instruction-following samples. Furthermore, we devise
additional instruction-following data, specifically tailored for the invocation
of the pathology-specific models, allowing the PathAsst to effectively interact
with these models based on the input image and user intent, consequently
enhancing the model's diagnostic capabilities. Subsequently, our PathAsst is
trained based on Vicuna-13B language model in coordination with the CLIP vision
encoder. The results of PathAsst show the potential of harnessing the
AI-powered generative foundation model to improve pathology diagnosis and
treatment processes. We are committed to open-sourcing our meticulously curated
dataset, as well as a comprehensive toolkit designed to aid researchers in the
extensive collection and preprocessing of their own datasets. Resources can be
obtained at
https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology.
Comment: 13 pages, 5 figures, conferenc