SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models
A common limitation of diagnostic tests for detecting social biases in NLP
models is that they may only detect stereotypic associations that are
pre-specified by the designer of the test. Since enumerating all possible
problematic associations is infeasible, it is likely these tests fail to detect
biases that are present in a model but not pre-specified by the designer. To
address this limitation, we propose SODAPOP (SOcial bias Discovery from Answers
about PeOPle) in social commonsense question-answering. Our pipeline generates
modified instances from the Social IQa dataset (Sap et al., 2019) by (1)
substituting names associated with different demographic groups, and (2)
generating many distractor answers from a masked language model. By using a
social commonsense model to score the generated distractors, we are able to
uncover the model's stereotypic associations between demographic groups and an
open set of words. We also test SODAPOP on debiased models and show the
limitations of multiple state-of-the-art debiasing algorithms.
Comment: EACL 202
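The probing loop described above can be sketched in a toy form. This is a hypothetical illustration only: the names, template, and the lexical-overlap scorer below are stand-ins, whereas the real pipeline uses Social IQa instances, a masked language model to propose distractors, and a trained social-commonsense model as the scorer.

```python
# Toy sketch of a SODAPOP-style probing loop (hypothetical names, template,
# and scoring function; not the paper's actual models).

def substitute_name(template, name):
    """Step 1: build a modified instance by swapping in a demographic name."""
    return template.replace("[NAME]", name)

def score_answer(question, answer):
    """Stand-in for a social commonsense model's answer score.
    Here: a trivial lexical-overlap heuristic, purely for illustration."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return len(q_words & a_words) / max(len(a_words), 1)

template = "[NAME] stayed late to help a coworker. How would others describe [NAME]?"
distractors = ["a helpful person", "a lazy person", "a person who stayed late"]

# Steps 2-3: score every (name, distractor) pair; diverging score rankings
# across names would signal a stereotypic association.
per_name_scores = {}
for name in ["Alice", "Bob"]:
    question = substitute_name(template, name)
    per_name_scores[name] = {d: score_answer(question, d) for d in distractors}

print(per_name_scores["Alice"])
```

With a real scorer, a systematic gap between the two names' score distributions over an open set of distractor words is what the method surfaces; the dummy scorer here is name-blind, so the two distributions coincide.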
HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields
Human hands are highly articulated and versatile at handling objects. Jointly
estimating the 3D poses of a hand and the object it manipulates from a
monocular camera is challenging due to frequent occlusions. Thus, existing
methods often rely on intermediate 3D shape representations to increase
performance. These representations are typically explicit, such as 3D point
clouds or meshes, and thus provide information in the direct surroundings of
the intermediate hand pose estimate. To address this, we introduce HOISDF, a
Signed Distance Field (SDF) guided hand-object pose estimation network, which
jointly exploits hand and object SDFs to provide a global, implicit
representation over the complete reconstruction volume. Specifically, the role
of the SDFs is threefold: equip the visual encoder with implicit shape
information, help to encode hand-object interactions, and guide the hand and
object pose regression via SDF-based sampling and by augmenting the feature
representations. We show that HOISDF achieves state-of-the-art results on
hand-object pose estimation benchmarks (DexYCB and HO3Dv2). Code is available
at https://github.com/amathislab/HOISDF
Comment: Accepted at CVPR 2024. 9 figures, many tables
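The core idea of SDF-based feature augmentation can be illustrated in miniature. This is not the HOISDF architecture: an analytic sphere SDF stands in for the learned hand and object SDFs, and the "visual features" are placeholder vectors.

```python
import math

# Minimal illustration of SDF-guided feature sampling: query points receive
# a signed distance value that is appended to their feature vectors, giving
# downstream regression an implicit notion of where the surface is.

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def augment_features(points, features):
    """Concatenate the SDF value to each point's feature vector."""
    return [feat + [sphere_sdf(p)] for p, feat in zip(points, features)]

points = [(0.0, 0.0, 0.5), (2.0, 0.0, 0.0)]   # inside vs. outside the sphere
features = [[0.1, 0.2], [0.3, 0.4]]           # stand-in visual features
print(augment_features(points, features))
# first point: SDF = -0.5 (inside); second: SDF = 1.0 (outside)
```

Because an SDF is defined everywhere in the volume, this kind of cue remains available even for occluded regions, which is the motivation for preferring it over explicit point clouds or meshes.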
Invariant-Feature Subspace Recovery: A New Class of Provable Domain Generalization Algorithms
Domain generalization asks for models trained over a set of training
environments to generalize well in unseen test environments. Recently, a series
of algorithms such as Invariant Risk Minimization (IRM) have been proposed for
domain generalization. However, Rosenfeld et al. (2021) shows that in a simple
linear data model, even if non-convexity issues are ignored, IRM and its
extensions cannot generalize to unseen environments with fewer than d_s + 1
training environments, where d_s is the dimension of the spurious-feature
subspace. In this work, we propose Invariant-feature Subspace Recovery (ISR): a
new class of algorithms to achieve provable domain generalization across the
settings of classification and regression problems. First, in the binary
classification setup of Rosenfeld et al. (2021), we show that our first
algorithm, ISR-Mean, can identify the subspace spanned by invariant features
from the first-order moments of the class-conditional distributions, and
achieve provable domain generalization with d_s + 1 training environments. Our
second algorithm, ISR-Cov, further reduces the required number of training
environments to O(1) using the information of second-order moments. Notably,
unlike IRM, our algorithms bypass non-convexity issues and enjoy global
convergence guarantees. Next, we extend ISR-Mean to the more general setting of
multi-class classification and propose ISR-Multiclass, which leverages class
information and provably recovers the invariant-feature subspace with
⌈d_s/k⌉ + 1 training environments for k-class classification. Finally, for
regression problems, we propose ISR-Regression that can identify the
invariant-feature subspace with d_s + 1 training environments. Empirically, we
demonstrate the superior performance of our ISRs on synthetic benchmarks.
Further, ISR can be used as post-processing methods for feature extractors such
as neural nets.
Comment: Submitted to JMLR. This journal version significantly extends our
ICML 2022 paper, arXiv:2201.1291
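The ISR-Mean step can be sketched in an axis-aligned toy setting. This is a simplification: the actual algorithm recovers an arbitrary invariant subspace from the first-order moments of the class-conditional distributions (e.g. via an SVD across environment means), whereas here the features are assumed already axis-aligned so invariance can be read off coordinate by coordinate.

```python
# Axis-aligned toy version of the ISR-Mean idea: coordinates whose
# class-conditional mean is identical across environments are treated as
# the invariant features; coordinates whose mean shifts are spurious.

def class_conditional_mean(samples):
    """Mean feature vector of a list of samples (one class, one environment)."""
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(dim)]

def invariant_coordinates(env_means, tol=1e-6):
    """Indices whose class-conditional mean agrees across all environments."""
    dim = len(env_means[0])
    invariant = []
    for i in range(dim):
        vals = [m[i] for m in env_means]
        if max(vals) - min(vals) < tol:
            invariant.append(i)
    return invariant

# Feature 0 is invariant (same class mean in both environments);
# feature 1 is spurious (its mean shifts between environments).
env1 = [[1.0, 2.0], [1.0, 2.0]]
env2 = [[1.0, 5.0], [1.0, 5.0]]
means = [class_conditional_mean(env1), class_conditional_mean(env2)]
print(invariant_coordinates(means))   # -> [0]
```

A classifier restricted to the recovered coordinates then ignores the spurious directions entirely, which is why the approach sidesteps the non-convex optimization that IRM-style penalties require.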
Distantly-Supervised Named Entity Recognition with Uncertainty-aware Teacher Learning and Student-student Collaborative Learning
Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates
the burden of annotation, but meanwhile suffers from label noise. Recent
works attempt to adopt the teacher-student framework to gradually refine the
training labels and improve the overall robustness. However, we argue that
these teacher-student methods achieve limited performance because poor network
calibration produces incorrectly pseudo-labeled samples, leading to error
propagation. Therefore, we attempt to mitigate this issue by proposing: (1)
Uncertainty-aware Teacher Learning that leverages the prediction uncertainty to
guide the selection of pseudo-labels, reducing the number of incorrect
pseudo-labels in the self-training stage. (2) Student-student Collaborative
Learning that allows the transfer of reliable labels between two student
networks instead of completely relying on all pseudo-labels from its teacher.
Meanwhile, this approach allows a full exploration of mislabeled samples rather
than simply filtering unreliable pseudo-labeled samples. Extensive experimental
results on five DS-NER datasets demonstrate that our method is superior to
state-of-the-art teacher-student methods.
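The general mechanism of uncertainty-gated pseudo-label selection can be sketched as follows. The entropy criterion and the threshold below are illustrative stand-ins, not the paper's exact uncertainty estimator.

```python
import math

# Hedged sketch of uncertainty-gated pseudo-label selection: the teacher's
# predictive entropy gates which pseudo-labels reach the student, so that
# poorly calibrated, uncertain predictions do not propagate errors.

def entropy(probs):
    """Predictive entropy of a categorical distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_pseudo_labels(teacher_probs, threshold=0.5):
    """Keep only positions whose teacher prediction is confident enough."""
    kept = []
    for i, probs in enumerate(teacher_probs):
        if entropy(probs) < threshold:
            label = max(range(len(probs)), key=lambda j: probs[j])
            kept.append((i, label))
    return kept

teacher_probs = [
    [0.95, 0.03, 0.02],   # confident -> kept
    [0.40, 0.35, 0.25],   # uncertain -> filtered out
]
print(select_pseudo_labels(teacher_probs))
```

The filtered-out positions need not be discarded outright; in a student-student scheme, the second student can supply labels for them, which is the abstract's point about fully exploring mislabeled samples.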
Multi-view 3D Face Reconstruction Based on Flame
At present, face 3D reconstruction has broad application prospects in various
fields, but the research on it is still in the development stage. In this
paper, we aim to achieve better face 3D reconstruction quality by combining a
multi-view training framework with the parametric face model Flame, and propose
a multi-view training and testing model, MFNet (Multi-view Flame Network). We
build a self-supervised training framework and implement constraints such as a
multi-view optical flow loss and a face landmark loss, finally obtaining a
complete MFNet. We propose innovative implementations of the multi-view
optical flow loss and the covisible mask. We test our model on AFLW and
facescape datasets and also take pictures of our faces to reconstruct 3D faces
while simulating actual scenarios as much as possible, which achieves good
results. Our work mainly addresses the problem of combining parametric models
of faces with multi-view face 3D reconstruction and explores the implementation
of a Flame based multi-view training and testing framework for contributing to
the field of face 3D reconstruction.
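One of the constraints mentioned above, the face landmark loss, is a generic supervised term that can be sketched directly. This is an illustrative L2 landmark loss, not MFNet's exact multi-view optical-flow formulation.

```python
# Illustrative face landmark loss: mean squared 2D distance between the
# landmarks projected from the reconstructed mesh and the detected
# ground-truth landmarks. One of several constraints in such frameworks.

def landmark_loss(pred, gt):
    """Mean squared 2D distance between predicted and ground-truth landmarks."""
    assert len(pred) == len(gt)
    total = 0.0
    for (px, py), (gx, gy) in zip(pred, gt):
        total += (px - gx) ** 2 + (py - gy) ** 2
    return total / len(pred)

pred = [(10.0, 10.0), (20.0, 22.0)]   # landmarks projected from the mesh
gt = [(10.0, 12.0), (20.0, 20.0)]     # detected landmarks in the image
print(landmark_loss(pred, gt))        # -> 4.0
```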
Biodynamic features of the Syuantszy Chzhuanti 720°
This paper presents the internal parameters and a visualization of the Syuantszy Chzhuanti 720° element. It is shown that during execution of the element the center of gravity shifts by 2.94 m, 1.71 m, and 1.22 m along the X, Y, and Z axes, while velocity varies along X from 4.22 m/s to 0, along Y from 2.42 m/s to 0, and along Z from 3.68 m/s to 3.86 m/s. The total execution time of the element is 1.4 s: 0.41 s for the first rotation and 0.33 s for the second. At the end of the run-up, the strike forces of the left and right foot are 1147.2 N and 1005 N. When pressing with the second, third, fourth, and fifth toes and part of the metatarsal of the right foot, the maximum pressure intensity is 146.1 N; when pressing with the first toe and part of the metatarsal, it is 280.8 N. The torque that increases or decreases rotation speed is shown to depend on bringing body parts toward or away from the vertical axis.
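The reported rotation timings imply the second turn is faster than the first, consistent with the torque mechanism described. A quick arithmetic check of the average angular velocities, using only the 0.41 s and 0.33 s durations from the abstract:

```python
# Average angular velocity of each 360-degree turn of the 720-degree element,
# computed from the per-rotation durations reported in the abstract.

def avg_angular_velocity(degrees, seconds):
    return degrees / seconds

first = avg_angular_velocity(360, 0.41)    # first rotation, ~878 deg/s
second = avg_angular_velocity(360, 0.33)   # second rotation, ~1091 deg/s
print(round(first), round(second))
```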
Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond
In this study, we explore the potential of Multimodal Large Language Models
(MLLMs) in improving embodied decision-making processes for agents. While Large
Language Models (LLMs) have been widely used due to their advanced reasoning
skills and vast world knowledge, MLLMs like GPT4-Vision offer enhanced visual
understanding and reasoning capabilities. We investigate whether
state-of-the-art MLLMs can handle embodied decision-making in an end-to-end
manner and whether collaborations between LLMs and MLLMs can enhance
decision-making. To address these questions, we introduce a new benchmark
called PCA-EVAL, which evaluates embodied decision-making from the perspectives
of Perception, Cognition, and Action. Additionally, we propose HOLMES, a
multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs
to gather multimodal information for informed decision-making. We compare
end-to-end embodied decision-making and HOLMES on our benchmark and find that
the GPT4-Vision model demonstrates strong end-to-end embodied decision-making
abilities, outperforming GPT4-HOLMES in terms of average decision accuracy
(+3%). However, this performance is exclusive to the latest GPT4-Vision model,
which surpasses the open-source state-of-the-art MLLM by 26%. Our results indicate
that powerful MLLMs like GPT4-Vision hold promise for decision-making in
embodied agents, offering new avenues for MLLM research. Code and data are open
at https://github.com/pkunlp-icler/PCA-EVAL/.
Comment: FMDM@NeurIPS2023, Code and data:
https://github.com/pkunlp-icler/PCA-EVAL
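The benchmark's aggregate metric, average decision accuracy over the three PCA-EVAL dimensions, reduces to a simple mean. The per-dimension scores below are made-up placeholders, not results from the paper.

```python
# Hypothetical illustration of aggregating per-dimension scores into the
# average decision accuracy reported on PCA-EVAL (placeholder numbers).

def average_decision_accuracy(scores):
    """Mean accuracy across the Perception, Cognition, Action dimensions."""
    return sum(scores.values()) / len(scores)

scores = {"perception": 0.80, "cognition": 0.70, "action": 0.75}
print(average_decision_accuracy(scores))   # -> 0.75
```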
Constant real-space fractal dimensionality and structure evolution in Ti62Cu38 metallic glass under high pressure
The structure of binary Ti62Cu38 metallic glass is investigated under pressures up to 33.8 GPa using pair distribution function analysis based on high-energy x-ray scattering and reverse Monte Carlo (RMC) simulations. At a global scale, its relative volume shows a continuously smooth curve as a function of pressure. The isothermal bulk modulus of Ti62Cu38 metallic glass is estimated as B0 = 132(3) GPa with B0' = 5.8(0.4). At a local scale, the atomic packing structure under compression, extracted from the RMC simulations, shows that the topological short-range order is dominated by deformed icosahedral polyhedra and remains basically stable. From the relationship between the relative volume and the ratio of change of the atomic separation distances, the real-space fractal dimensionality of this metallic glass is determined to be about 2.5 for all of the first four peaks. This experimental result reveals the consistent fractal character, i.e., the degree of self-similarity, of this sample within the entire experimental pressure range.
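The fractal dimensionality extracted from such data follows from a power-law relation between relative volume and relative atomic separation, V/V0 = (r/r0)^D, so D = ln(V/V0) / ln(r/r0). The numbers below are illustrative, chosen so D comes out near the reported ~2.5; they are not measured values from the paper.

```python
import math

# Fractal dimensionality from the power law V/V0 = (r/r0)**D:
# D = ln(V/V0) / ln(r/r0). A D of 3 would mean ordinary Euclidean packing;
# D ~ 2.5 indicates fractal-like compression behavior.

def fractal_dimension(v_ratio, r_ratio):
    return math.log(v_ratio) / math.log(r_ratio)

# e.g. a 5% reduction in volume paired with a ~2% reduction in a peak's
# atomic separation distance (illustrative inputs only)
print(round(fractal_dimension(0.95, 0.98), 2))
```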
Why does Prediction Accuracy Decrease over Time? Uncertain Positive Learning for Cloud Failure Prediction
With the rapid growth of cloud computing, a variety of software services have
been deployed in the cloud. To ensure the reliability of cloud services, prior
studies focus on failure instance (disk, node, and switch, etc.) prediction.
Once the output of prediction is positive, mitigation actions are taken to
rapidly resolve the underlying failure. According to our real-world practice in
Microsoft Azure, we find that the prediction accuracy may decrease by about 9%
after retraining the models. This is because the mitigation actions may result
in uncertain positive instances, which cannot be verified after mitigation and
may introduce more noise while updating the prediction model. To the best
of our knowledge, we are the first to identify this Uncertain Positive Learning
(UPLearning) issue in the real-world cloud failure prediction scenario. To
tackle this problem, we design an Uncertain Positive Learning Risk Estimator
(Uptake) approach. Using two real-world datasets of disk failure prediction and
conducting node prediction experiments in Microsoft Azure, which is a top-tier
cloud provider that serves millions of users, we demonstrate Uptake can
significantly improve the failure prediction accuracy by 5% on average.
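The core intuition, that unverifiable positives should not contribute to training at full strength, can be sketched with a down-weighted loss. The weighting rule below is illustrative, not Uptake's actual risk estimator.

```python
# Hedged sketch of the uncertain-positive idea: samples whose positive label
# cannot be verified after mitigation contribute to the training loss with a
# reduced weight, limiting the noise they inject into model updates.

def weighted_loss(losses, uncertain_flags, uncertain_weight=0.5):
    """Weighted average of per-sample losses, down-weighting uncertain positives."""
    total, weight_sum = 0.0, 0.0
    for loss, uncertain in zip(losses, uncertain_flags):
        w = uncertain_weight if uncertain else 1.0
        total += w * loss
        weight_sum += w
    return total / weight_sum

losses = [1.0, 2.0, 4.0]
flags = [False, False, True]   # third sample is an uncertain positive
print(weighted_loss(losses, flags))   # -> (1 + 2 + 0.5*4) / 2.5 = 2.0
```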