NeuS-PIR: Learning Relightable Neural Surface using Pre-Integrated Rendering
Recent advances in neural implicit fields enable rapid reconstruction of 3D
geometry from multi-view images. Beyond that, recovering physical properties
such as material and illumination is essential for enabling more applications.
This paper presents a new method that effectively learns a relightable neural
surface using pre-integrated rendering, which simultaneously learns geometry,
material and illumination within the neural implicit field. The key insight of
our work is that these properties are closely related to each other, and
optimizing them in a collaborative manner would lead to consistent
improvements. Specifically, we propose NeuS-PIR, a method that factorizes the
radiance field into a spatially varying material field and a differentiable
environment cubemap, and jointly learns them with the geometry represented by a
neural surface. Our experiments demonstrate that the proposed method outperforms
state-of-the-art methods on both synthetic and real datasets.
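The kind of pre-integrated shading such a factorization enables can be illustrated with a toy sketch. This is not the paper's implementation: `env_sharp` and `env_diffuse` stand in for mip levels of a pre-filtered environment cubemap, and the roughness blend is a simplification of a real pre-filtered lookup.

```python
import numpy as np

def prefiltered_env(direction, roughness, env_sharp, env_diffuse):
    # Pre-integrated lookup: blend a mirror-direction sample with the fully
    # blurred (diffuse) irradiance as roughness increases. env_sharp and
    # env_diffuse stand in for mip levels of a pre-filtered cubemap.
    return (1.0 - roughness) * env_sharp(direction) + roughness * env_diffuse(direction)

def shade(albedo, roughness, normal, view_dir, env_sharp, env_diffuse):
    # Split-sum-style shading: spatially varying material term multiplied by
    # pre-integrated environment lighting, so both factors stay differentiable.
    refl = view_dir - 2.0 * np.dot(view_dir, normal) * normal  # mirror reflection
    light = prefiltered_env(refl, roughness, env_sharp, env_diffuse)
    return albedo * light
```

Because every term is a differentiable function of the material field and the cubemap, gradients from image loss can flow to geometry, material, and illumination jointly, which is the collaborative optimization the abstract describes.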
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Large language models have become a potential pathway toward achieving
artificial general intelligence. Recent works on multi-modal large language
models have demonstrated their effectiveness in handling visual modalities. In
this work, we extend research on MLLMs to point clouds and present the
LAMM-Dataset and LAMM-Benchmark for 2D image and 3D point cloud understanding.
We also establish an extensible framework to facilitate the extension of MLLMs
to additional modalities. Our main contribution is three-fold: 1) We present
the LAMM-Dataset and LAMM-Benchmark, which cover almost all high-level vision
tasks for 2D and 3D vision. Extensive experiments validate the effectiveness of
our dataset and benchmark. 2) We demonstrate the detailed methods of
constructing instruction-tuning datasets and benchmarks for MLLMs, which will
enable future research on MLLMs to scale up and extend to other domains, tasks,
and modalities faster. 3) We provide a preliminary yet promising MLLM training
framework optimized for extension to new modalities. We also provide baseline models,
comprehensive experimental observations, and analysis to accelerate future
research. Codes and datasets are now available at
https://github.com/OpenLAMM/LAMM.
Comment: 37 pages, 33 figures. Code available at https://github.com/OpenLAMM/LAMM; project page: https://openlamm.github.io
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in
generating reasonable responses with respect to multi-modal contents. However,
there is still a wide gap between the performance of recent MLLM-based
applications and the expectations of the broad public, even though the most
powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper
strives to enhance understanding of the gap through the lens of a qualitative
study on the generalizability, trustworthiness, and causal reasoning
capabilities of recent proprietary and open-source MLLMs across four
modalities, i.e., text, code, image, and video, ultimately aiming to improve the
transparency of MLLMs. We believe these properties are representative factors
that define the reliability of MLLMs in supporting various downstream
applications. Specifically, we evaluate the closed-source GPT-4 and Gemini and
six open-source LLMs and MLLMs. Overall, we evaluate 230 manually designed
cases, and the qualitative results are summarized into 12 scores (i.e., 4
modalities × 3 properties). In total, we uncover 14 empirical findings that
are useful to understand the capabilities and limitations of both proprietary
and open-source MLLMs, towards more reliable downstream multi-modal
applications.
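The summarization of per-case judgments into 12 modality-by-property scores can be sketched as a simple aggregation. The case records below are invented placeholders, not the paper's actual data or rubric.

```python
from collections import defaultdict

# Each manually designed case is judged (here: pass = 1 / fail = 0) for one
# modality-property pair. Hypothetical records for illustration only.
cases = [
    ("text", "generalizability", 1),
    ("text", "trustworthiness", 0),
    ("image", "causality", 1),
    ("image", "causality", 0),
]

def summarize(cases):
    totals, passed = defaultdict(int), defaultdict(int)
    for modality, prop, ok in cases:
        totals[(modality, prop)] += 1
        passed[(modality, prop)] += ok
    # One score per (modality, property) cell: up to 4 x 3 = 12 scores.
    return {k: passed[k] / totals[k] for k in totals}
```

Aggregating this way turns qualitative per-case verdicts into a small score matrix that is easy to compare across models.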
DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer
Generating 3D dances from music is an emerging research task that benefits many
applications in vision and graphics. Previous works treat this task as
sequence generation, however, it is challenging to render a music-aligned
long-term sequence with high kinematic complexity and coherent movements. In
this paper, we reformulate it as a two-stage process, i.e., key pose generation
followed by in-between parametric motion curve prediction, where the key poses
are easier to synchronize with the music beats and the parametric curves can be
efficiently regressed to render fluent, rhythm-aligned movements. We name the
proposed method DanceFormer, which includes two cascading
kinematics-enhanced transformer-guided networks (called DanTrans) that tackle
each stage, respectively. Furthermore, we propose a large-scale music
conditioned 3D dance dataset, called PhantomDance, accurately labeled by
experienced animators rather than obtained by reconstruction or motion capture. This
dataset also encodes dances as key poses and parametric motion curves apart
from pose sequences, thus benefiting the training of our DanceFormer. Extensive
experiments demonstrate that the proposed method, even trained by existing
datasets, can generate fluent, performative, and music-matched 3D dances that
surpass previous works quantitatively and qualitatively. Moreover, the proposed
DanceFormer, together with the PhantomDance dataset, are seamlessly compatible
with industrial animation software, thus facilitating the adaptation for
various downstream applications.
Comment: This is the version accepted by AAAI-2
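The two-stage formulation (sparse beat-aligned key poses, then regressed parametric in-between curves) can be sketched with cubic Hermite interpolation between key poses. This is a generic parametric-curve illustration under assumed names, not DanceFormer's actual curve model.

```python
import numpy as np

def hermite(p0, p1, m0, m1, t):
    # Cubic Hermite basis: interpolates pose p0 -> p1 with end tangents m0, m1.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

def render_motion(key_poses, tangents, frames_per_segment):
    # Stage-2 idea: regress a parametric curve (here, Hermite tangents) per
    # segment and sample it densely to get fluent in-between frames.
    frames = []
    for i in range(len(key_poses) - 1):
        for f in range(frames_per_segment):
            t = f / frames_per_segment
            frames.append(hermite(key_poses[i], key_poses[i + 1],
                                  tangents[i], tangents[i + 1], t))
    frames.append(key_poses[-1])
    return np.array(frames)
```

Because only key poses must align to beats and the curves are smooth by construction, the rendered motion stays coherent over long sequences.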
Research on the FLC + PID switching control strategy based on real-time error for the pneumatic polishing force regulating system
This paper designs an active pneumatic polishing force control system and investigates its control strategy. A straightforward experiment-based method is proposed for identifying and modeling the pneumatic system, and a convenient and effective control strategy based on real-time error is proposed for pneumatic polishing force regulation. A mapping function between the input duty cycles and the output force values is created from the experimental data measured by the system, and the pneumatic force regulating system is then modeled by this function. Based on the results of simulations and experiments, an FLC (Fuzzy Logic Control) + PID (Proportion-Integration-Differentiation) switching control strategy is designed, with a switching mechanism that chooses between the two control algorithms according to the real-time error and thereby takes the advantages of both PID and FLC. The FLC + PID switching controller eliminates the steady-state errors occurring in the FLC controller and achieves the same accuracy as the PID controller but with a faster response. The approach effectively stabilizes the polishing force: the polishing force errors are distributed within ± 1 N, the mean errors are less than 0.2 N, and the absolute mean errors hover around 0.3 N. The pneumatic polishing force control system regulated by the FLC + PID switching control exhibits effectiveness, stability, high control precision, and fast response speed, and satisfies the force control requirements of the polishing process.
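The real-time-error switching logic can be sketched as follows. The gains, the threshold, and the stand-in "fuzzy" rule are illustrative assumptions, not the paper's identified model or tuned values.

```python
class SwitchingForceController:
    """Switch between a coarse fuzzy-style action far from the setpoint and
    PID near it, so large errors are corrected quickly while PID removes the
    steady-state error. All parameters here are illustrative."""

    def __init__(self, kp=1.0, ki=0.5, kd=0.1, threshold=2.0, dt=0.01):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.threshold, self.dt = threshold, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def fuzzy_action(self, error):
        # Stand-in for the FLC: a saturating rule that pushes hard toward
        # the setpoint when the error is large.
        return 5.0 if error > 0 else -5.0

    def pid_action(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        return self.kp * error + self.ki * self.integral + self.kd * derivative

    def update(self, setpoint, measured):
        error = setpoint - measured
        # Real-time-error switching: FLC outside the threshold for speed,
        # PID inside it for accuracy.
        action = (self.fuzzy_action(error) if abs(error) > self.threshold
                  else self.pid_action(error))
        self.prev_error = error
        return action
```

The switching threshold trades off the two modes: too large and PID's slow response dominates, too small and the controller chatters between modes near the setpoint.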
Assessment of Multimodal Large Language Models in Alignment with Human Values
Large Language Models (LLMs) aim to serve as versatile assistants aligned
with human values, as defined by the principles of being helpful, honest, and
harmless (hhh). However, in terms of Multimodal Large Language Models (MLLMs),
despite their commendable performance in perception and reasoning tasks, their
alignment with human values remains largely unexplored, given the complexity of
defining hhh dimensions in the visual world and the difficulty in collecting
relevant data that accurately mirrors real-world situations. To address this
gap, we introduce Ch3Ef, a Compreh3ensive Evaluation dataset and strategy for
assessing alignment with human expectations. The Ch3Ef dataset contains 1002
human-annotated data samples, covering 12 domains and 46 tasks based on the hhh
principle. We also present a unified evaluation strategy supporting assessment
across various scenarios and different perspectives. Based on the evaluation
results, we summarize over 10 key findings that deepen the understanding of
MLLM capabilities, limitations, and the dynamic relationships between
evaluation levels, guiding future advancements in the field.
Comment: arXiv admin note: text overlap with arXiv:2311.0269
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
The ultimate goal of robotic learning is to acquire a comprehensive and
generalizable robotic system capable of performing both seen skills within the
training distribution and unseen skills in novel environments. Recent progress
in utilizing language models as high-level planners has demonstrated that the
complexity of tasks can be reduced through decomposing them into
primitive-level plans, making it possible to generalize on novel robotic tasks
in a composable manner. Despite the promising future, the community is not yet
adequately prepared for composable generalization agents, particularly due to
the lack of primitive-level real-world robotic datasets. In this paper, we
propose a primitive-level robotic dataset, namely RH20T-P, which contains about
33000 video clips covering 44 diverse and complicated robotic tasks. Each clip
is manually annotated according to a set of meticulously designed primitive
skills, facilitating the future development of composable generalization
agents. To validate the effectiveness of RH20T-P, we also construct a potential
and scalable agent based on RH20T-P, called RA-P. Equipped with two planners
specialized in task decomposition and motion planning, RA-P can adapt to novel
physical skills through composable generalization. Our website and videos can
be found at https://sites.google.com/view/rh20t-primitive/main. Dataset and
code will be made available soon.
Comment: 24 pages, 12 figures, 6 tables
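The idea of representing a task as a sequence of annotated primitive skills can be sketched as follows. The skill names, frame ranges, and the tiny lookup "planner" are hypothetical, not RH20T-P's actual taxonomy or RA-P's planners.

```python
from dataclasses import dataclass

@dataclass
class Primitive:
    # A primitive-level annotation: skill name plus the clip frame range it spans.
    name: str
    start_frame: int
    end_frame: int

def decompose(task_plan):
    """Hypothetical high-level planner: map a task description to a
    sequence of primitive skills. A real planner would use a language
    model; this lookup table only illustrates the output format."""
    library = {
        "pick up the cup": [("move_to", 0, 30), ("grasp", 30, 45), ("lift", 45, 60)],
    }
    return [Primitive(n, s, e) for n, s, e in library.get(task_plan, [])]
```

Annotating clips at this granularity is what lets an agent recombine known primitives into plans for unseen tasks, i.e., generalize compositionally.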
Orthogeriatric co-management lowers early mortality in long-lived elderly hip fracture patients: a post-hoc analysis of a prospective study
Abstract
Objective: To evaluate the clinical effectiveness of orthogeriatric co-management care in long-lived elderly hip fracture patients (age ≥ 90).
Methods: A secondary analysis was conducted in long-lived hip fracture patients treated between 2018 and 2019 in 6 hospitals in Beijing, China. Patients were divided into an orthogeriatric co-management group (CM group) and a traditional consultation mode group (TC group) depending on the management mode. With 30-day mortality as the primary outcome, multivariate regression analyses were performed after adjusting for potential covariates. Thirty-day mobility and quality of life were compared between groups.
Results: A total of 233 patients were included, 223 of whom completed follow-up (125 in the CM group, 98 in the TC group). The average age was 92.4 ± 2.5 years (range 90–102). The 30-day mortality in the CM group was significantly lower than that in the TC group after adjustment for covariates (2.4% vs. 10.2%; OR = 0.231; 95% CI 0.059 ~ 0.896; P = 0.034). The proportion of patients undergoing surgery and of surgery performed within 48 h also favored the CM group (97.6% vs. 85.7%, P = 0.002; 74.4% vs. 24.5%, P < 0.05).
Conclusions: For long-lived elderly hip fracture patients, orthogeriatric co-management care lowered early mortality and improved early mobility and quality of life compared with the traditional consultation mode.
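As a sanity check, the reported proportions imply roughly 3 of 125 deaths in the CM group and 10 of 98 in the TC group (2.4% and 10.2% of the group sizes), giving an unadjusted odds ratio close to the adjusted value of 0.231:

```python
# Unadjusted 30-day mortality odds ratio from the reported proportions.
# Counts are inferred from 2.4% of 125 and 10.2% of 98; the paper's
# OR = 0.231 is covariate-adjusted, so the two need not match exactly.
cm_deaths, cm_total = 3, 125
tc_deaths, tc_total = 10, 98

odds_cm = cm_deaths / (cm_total - cm_deaths)   # 3 / 122
odds_tc = tc_deaths / (tc_total - tc_deaths)   # 10 / 88
odds_ratio = odds_cm / odds_tc                 # about 0.22
```

That the crude and adjusted estimates are this close suggests the covariate adjustment did not drive the headline result.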