
    NeuS-PIR: Learning Relightable Neural Surface using Pre-Integrated Rendering

    Recent advances in neural implicit fields enable rapid reconstruction of 3D geometry from multi-view images. Beyond geometry, recovering physical properties such as material and illumination is essential for enabling more applications. This paper presents a new method that effectively learns a relightable neural surface using pre-integrated rendering, simultaneously learning geometry, material, and illumination within the neural implicit field. The key insight of our work is that these properties are closely related to each other, and optimizing them in a collaborative manner leads to consistent improvements. Specifically, we propose NeuS-PIR, a method that factorizes the radiance field into a spatially varying material field and a differentiable environment cubemap, and jointly learns them with geometry represented by a neural surface. Our experiments demonstrate that the proposed method outperforms state-of-the-art methods on both synthetic and real datasets.
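Pre-integrated rendering of the kind the abstract names typically rests on the split-sum approximation, which factors the lighting out of the reflection integral so the environment term can be pre-filtered into a cubemap. A standard form of that approximation (a general identity from real-time rendering, not taken from this paper) is:

```latex
% Split-sum approximation behind pre-integrated rendering:
% the reflection integral is factored into a pre-filtered
% environment (cubemap) term and a pre-integrated BRDF term.
\int_{\Omega} L_i(\mathbf{l})\, f(\mathbf{l},\mathbf{v})\,
  (\mathbf{n}\cdot\mathbf{l})\, d\mathbf{l}
\;\approx\;
\underbrace{\frac{\int_{\Omega} L_i(\mathbf{l})\,(\mathbf{n}\cdot\mathbf{l})\, d\mathbf{l}}
                 {\int_{\Omega} (\mathbf{n}\cdot\mathbf{l})\, d\mathbf{l}}}_{\text{pre-filtered cubemap}}
\cdot
\underbrace{\int_{\Omega} f(\mathbf{l},\mathbf{v})\,(\mathbf{n}\cdot\mathbf{l})\, d\mathbf{l}}_{\text{pre-integrated BRDF}}
```

Because both factors can be tabulated offline, the rendering step reduces to texture lookups, which is what makes the factorized material/illumination representation differentiable and fast to optimize.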

    LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

    Large language models have become a potential pathway toward achieving artificial general intelligence. Recent works on multi-modal large language models (MLLMs) have demonstrated their effectiveness in handling visual modalities. In this work, we extend the research on MLLMs to point clouds and present the LAMM-Dataset and LAMM-Benchmark for 2D image and 3D point cloud understanding. We also establish an extensible framework to facilitate the extension of MLLMs to additional modalities. Our main contribution is three-fold: 1) We present the LAMM-Dataset and LAMM-Benchmark, which cover almost all high-level vision tasks for 2D and 3D vision. Extensive experiments validate the effectiveness of our dataset and benchmark. 2) We detail the methods of constructing instruction-tuning datasets and benchmarks for MLLMs, which will enable future research on MLLMs to scale up and extend to other domains, tasks, and modalities faster. 3) We provide a primary but promising MLLM training framework optimized for modality extension. We also provide baseline models, comprehensive experimental observations, and analysis to accelerate future research. Code and datasets are available at https://github.com/OpenLAMM/LAMM.
    Comment: 37 pages, 33 figures. Code available at https://github.com/OpenLAMM/LAMM ; Project page: https://openlamm.github.io

    From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

    Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses to multi-modal content. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectations of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper strives to enhance understanding of this gap through a qualitative study of the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities, i.e., text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are representative factors that define the reliability of MLLMs in supporting various downstream applications. Specifically, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall, we evaluate 230 manually designed cases, and the qualitative results are summarized into 12 scores (i.e., 4 modalities × 3 properties). In total, we uncover 14 empirical findings that are useful for understanding the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.
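The 230-cases-into-12-scores aggregation described above amounts to grouping per-case results by (modality, property) cell. A minimal sketch of that bookkeeping, with hypothetical placeholder cases and a simple pass-rate scoring rule (the paper's actual cases and rubric are not given in the abstract):

```python
from collections import defaultdict

# Hypothetical per-case results: (modality, property, passed).
# Stand-ins for illustration only, not the paper's 230 cases.
cases = [
    ("text", "generalizability", True),
    ("text", "generalizability", False),
    ("code", "trustworthiness", True),
    ("image", "causality", True),
    ("video", "trustworthiness", False),
]

def summarize(cases):
    """Collapse per-case pass/fail results into one score per
    (modality, property) cell, i.e. up to 4 x 3 = 12 scores."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [passed, total]
    for modality, prop, passed in cases:
        cell = (modality, prop)
        totals[cell][0] += int(passed)
        totals[cell][1] += 1
    return {cell: passed / total for cell, (passed, total) in totals.items()}

scores = summarize(cases)
print(scores[("text", "generalizability")])  # 0.5
```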

    DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer

    Generating 3D dances from music is an emerging research task that benefits many applications in vision and graphics. Previous works treat this task as sequence generation; however, it is challenging to render a music-aligned long-term sequence with high kinematic complexity and coherent movements. In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction, where the key poses are easier to synchronize with the music beats and the parametric curves can be efficiently regressed to render fluent, rhythm-aligned movements. We name the proposed method DanceFormer; it includes two cascading kinematics-enhanced transformer-guided networks (called DanTrans) that tackle each stage, respectively. Furthermore, we propose a large-scale music-conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators rather than by reconstruction or motion capture. This dataset also encodes dances as key poses and parametric motion curves in addition to pose sequences, thus benefiting the training of our DanceFormer. Extensive experiments demonstrate that the proposed method, even when trained on existing datasets, can generate fluent, performative, and music-matched 3D dances that surpass previous works quantitatively and qualitatively. Moreover, the proposed DanceFormer, together with the PhantomDance dataset, is seamlessly compatible with industrial animation software, thus facilitating adaptation for various downstream applications.
    Comment: This is the version accepted by AAAI-2
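The two-stage idea above (key poses pinned to beat times, then parametric curves filling the in-betweens) can be illustrated with a generic cubic Hermite in-betweener of the kind animation software uses. The curve form, beat times, and joint values here are illustrative assumptions, not DanceFormer's actual parameterization:

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation between key values p0 and p1
    with tangents m0 and m1, at normalized time t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * p0
            + (t3 - 2 * t2 + t) * m0
            + (-2 * t3 + 3 * t2) * p1
            + (t3 - t2) * m1)

# Hypothetical key poses: one joint angle keyed on two music beats.
beat_times = [0.0, 0.5]   # seconds, aligned to detected beats
key_values = [0.0, 1.0]   # joint angle at each key pose
tangents = [0.0, 0.0]     # zero tangents give ease-in/ease-out

def sample(time):
    """Evaluate the in-between motion curve at an arbitrary frame time."""
    t = (time - beat_times[0]) / (beat_times[1] - beat_times[0])
    return hermite(key_values[0], key_values[1], tangents[0], tangents[1], t)

print(sample(0.25))  # midpoint of the ease curve -> 0.5
```

Because the curve is defined by a handful of parameters per segment, a network can regress those parameters directly instead of predicting every frame, which is the efficiency argument the abstract makes.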

    Research of the FLC + PID switching control strategy based on real-time error for the pneumatic polishing force regulating system

    This paper designs an active pneumatic polishing force control system and investigates its control strategy. A straightforward experiment-based method is proposed for identifying and modeling the pneumatic system, and a convenient and effective control strategy based on real-time error is proposed for pneumatic polishing force regulation. A mapping function between the input duty cycles and the output force values is fitted from experimental data measured on the system, and the pneumatic force regulating system is then modeled by this function. Based on the results of simulations and experiments, an FLC (Fuzzy Logic Control) + PID (Proportion-Integration-Differentiation) switching control strategy is designed: a switching mechanism chooses between the two control algorithms according to the real-time error, combining the advantages of both PID and FLC. The FLC + PID switching controller eliminates the steady-state errors of the FLC controller and achieves the same accuracy as the PID controller but with a faster response speed. The approach effectively stabilizes the polishing force, with polishing force errors distributed within ±1 N, mean errors below 0.2 N, and absolute mean errors around 0.3 N. The pneumatic polishing force control system regulated by the FLC + PID switching control exhibits effectiveness, stability, high control precision, and fast response speed, and satisfies the force control requirements of the polishing process.
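A minimal sketch of such an error-based switching scheme, assuming a hypothetical switching threshold and simple stand-ins for the fuzzy and PID laws (the paper's actual rule base, gains, and threshold are not given in the abstract):

```python
class SwitchingForceController:
    """Use FLC when the error is large (fast approach) and PID when it
    is small (no steady-state offset), switching on the real-time error."""

    def __init__(self, threshold=2.0, kp=0.8, ki=0.4, kd=0.05, dt=0.01):
        self.threshold = threshold  # N, hypothetical switching boundary
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def fuzzy_step(self, error):
        # Stand-in for the fuzzy rule base: a saturated aggressive action.
        return max(-1.0, min(1.0, 0.5 * error))

    def pid_step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

    def update(self, target_force, measured_force):
        error = target_force - measured_force
        if abs(error) > self.threshold:  # far from target: fuzzy control
            return self.fuzzy_step(error)
        return self.pid_step(error)      # near target: PID removes offset

ctrl = SwitchingForceController()
print(ctrl.update(10.0, 3.0))  # large error -> saturated fuzzy action
```

The switch realizes the trade-off the abstract describes: the fuzzy branch drives the force toward the setpoint quickly, while the integral term of the PID branch removes the residual steady-state error once the error falls inside the threshold band.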

    Assessment of Multimodal Large Language Models in Alignment with Human Values

    Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh). However, for Multimodal Large Language Models (MLLMs), despite their commendable performance in perception and reasoning tasks, their alignment with human values remains largely unexplored, given the complexity of defining hhh dimensions in the visual world and the difficulty of collecting relevant data that accurately mirrors real-world situations. To address this gap, we introduce Ch3Ef, a Compreh3ensive Evaluation dataset and strategy for assessing alignment with human expectations. The Ch3Ef dataset contains 1002 human-annotated data samples, covering 12 domains and 46 tasks based on the hhh principle. We also present a unified evaluation strategy supporting assessment across various scenarios and from different perspectives. Based on the evaluation results, we summarize over 10 key findings that deepen the understanding of MLLM capabilities, limitations, and the dynamic relationships between evaluation levels, guiding future advancements in the field.
    Comment: arXiv admin note: text overlap with arXiv:2311.0269

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced by decomposing them into primitive-level plans, making it possible to generalize to novel robotic tasks in a composable manner. Despite this promising future, the community is not yet adequately prepared for composable generalization agents, particularly due to the lack of primitive-level real-world robotic datasets. In this paper, we propose a primitive-level robotic dataset, namely RH20T-P, which contains about 33000 video clips covering 44 diverse and complicated robotic tasks. Each clip is manually annotated according to a set of meticulously designed primitive skills, facilitating the future development of composable generalization agents. To validate the effectiveness of RH20T-P, we also construct a potential and scalable agent based on RH20T-P, called RA-P. Equipped with two planners specialized in task decomposition and motion planning, RA-P can adapt to novel physical skills through composable generalization. Our website and videos can be found at https://sites.google.com/view/rh20t-primitive/main. Dataset and code will be made available soon.
    Comment: 24 pages, 12 figures, 6 tables

    Orthogeriatric co-managements lower early mortality in long-lived elderly hip fracture: a post-hoc analysis of a prospective study

    Abstract

    Objective: To evaluate the clinical effectiveness of orthogeriatric co-management care in long-lived elderly hip fracture patients (age ≥ 90).

    Methods: Secondary analysis was conducted on long-lived hip fracture patients treated between 2018 and 2019 in 6 hospitals in Beijing, China. Patients were divided into the orthogeriatric co-management group (CM group) and the traditional consultation mode group (TC group) depending on the management mode. With 30-day mortality as the primary outcome, multivariate regression analyses were performed after adjusting for potential covariates. 30-day mobility and quality of life were compared between groups.

    Results: A total of 233 patients were included, 223 of whom completed follow-up (125 in the CM group, 98 in the TC group). The average age was 92.4 ± 2.5 years (range 90–102). The 30-day mortality in the CM group was significantly lower than that in the TC group after adjustment (2.4% vs. 10.2%; OR = 0.231; 95% CI 0.059 to 0.896; P = 0.034). The proportion of patients undergoing surgery and of surgery performed within 48 h also favored the CM group (97.6% vs. 85.7%, P = 0.002; 74.4% vs. 24.5%, P < 0.05).

    Conclusions: For long-lived elderly hip fracture patients, orthogeriatric co-management care lowered early mortality and improved early mobility compared with the traditional consultation mode.
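The reported odds ratio and confidence interval are consistent with exponentiating a logistic-regression coefficient and its Wald interval. The coefficient and standard error below are back-derived from the reported numbers purely for illustration; they are assumptions, not values taken from the study:

```python
import math

# Back-derived (hypothetical) logistic-regression estimates for the
# CM-group indicator: beta = ln(OR), SE chosen to match the reported CI.
beta = math.log(0.231)   # coefficient, so OR = exp(beta) = 0.231
se = 0.693               # standard error (assumption)

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and 95% Wald confidence interval from a
    logistic-regression coefficient and its standard error."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

or_, lo, hi = odds_ratio_ci(beta, se)
print(round(or_, 3), round(lo, 3), round(hi, 3))  # 0.231 0.059 0.898
```

This recovers essentially the reported interval (0.059 to 0.896), showing how the adjusted OR and its bounds relate to the underlying regression output.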