204 research outputs found
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better
variant RoBERTa, has been a common practice for advancing performance in
natural language understanding (NLU) tasks. Recent advances in representation
learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings
can significantly improve performance on downstream tasks with faster
convergence and better generalization. The isotropy of the pre-trained
embeddings in PTLMs, however, is relatively under-explored. In this paper, we
analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with
straightforward visualization, and point out two major issues: high variance in
their standard deviation, and high correlation between different dimensions. We
also propose a new network regularization method, isotropic batch normalization
(IsoBN), to address these issues, towards learning more isotropic representations
in fine-tuning by dynamically penalizing dominating principal components. This
simple yet effective fine-tuning method yields about a 1.0-point absolute
improvement on the average of seven NLU tasks.
Comment: AAAI 202
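As an illustrative sketch (not the authors' implementation), the high-variance issue can be pictured by rescaling each dimension of a batch of [CLS] embeddings toward unit standard deviation; the full IsoBN method additionally penalizes dominating principal components dynamically during fine-tuning:

```python
from statistics import pstdev

def standardize_dims(batch, eps=1e-5):
    """Rescale each embedding dimension to roughly unit standard deviation.

    A simplified sketch of the variance half of IsoBN; the actual method
    also dynamically penalizes dominating principal components.
    """
    dims = len(batch[0])
    stds = [pstdev(vec[d] for vec in batch) for d in range(dims)]
    return [[x / (s + eps) for x, s in zip(vec, stds)] for vec in batch]

# Toy batch of three 2-dimensional "[CLS]" embeddings with very different
# per-dimension scales; after rescaling, both dimensions have std close to 1.
batch = [[2.0, 0.1], [4.0, 0.3], [6.0, 0.5]]
normed = standardize_dims(batch)
```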
An Improved Baseline for Sentence-level Relation Extraction
Sentence-level relation extraction (RE) aims at identifying the relationship
between two entities in a sentence. Many efforts have been devoted to this
problem, while the best performing methods are still far from perfect. In this
paper, we revisit two problems that affect the performance of existing RE
models, namely entity representation and noisy or ill-defined labels. Our
improved baseline model, incorporated with entity representations with typed
markers, achieves an F1 of 74.6% on TACRED, significantly outperforming previous
SOTA methods. Furthermore, the presented new baseline achieves an F1 of 91.1%
on the refined Re-TACRED dataset, demonstrating that the pre-trained language
models achieve unexpectedly high performance on this task. We release our code
to the community for future research.
Comment: Code available at https://github.com/wzhouad/RE_improved_baselin
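A minimal sketch of the typed entity-marker idea (the exact marker strings in the paper differ; the format here is illustrative):

```python
def add_typed_markers(tokens, subj_span, subj_type, obj_span, obj_type):
    """Wrap the subject and object spans with typed entity markers.

    Spans are (start, end) token indices, end exclusive. The marker
    format here is illustrative; the paper compares several variants.
    """
    spans = [
        (subj_span, f"[S:{subj_type}]", f"[/S:{subj_type}]"),
        (obj_span, f"[O:{obj_type}]", f"[/O:{obj_type}]"),
    ]
    out = list(tokens)
    # Insert around the rightmost span first so earlier indices stay valid.
    for (start, end), open_m, close_m in sorted(
            spans, key=lambda s: s[0][0], reverse=True):
        out.insert(end, close_m)
        out.insert(start, open_m)
    return out

tokens = "Bill Gates founded Microsoft".split()
marked = add_typed_markers(tokens, (0, 2), "PERSON", (3, 4), "ORG")
# → ['[S:PERSON]', 'Bill', 'Gates', '[/S:PERSON]', 'founded',
#    '[O:ORG]', 'Microsoft', '[/O:ORG]']
```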
Learning to Grasp the Ungraspable with Emergent Extrinsic Dexterity
A simple gripper can solve more complex manipulation tasks if it can utilize
the external environment, such as by pushing the object against the table or a
vertical wall, a capability known as "Extrinsic Dexterity." Previous work in
extrinsic dexterity usually relies on careful assumptions about contacts, which
impose restrictions on robot design, robot motions, and the variations of the
physical parameters. In this work, we develop a system based on reinforcement learning
(RL) to address these limitations. We study the task of "Occluded Grasping"
which aims to grasp the object in configurations that are initially occluded;
the robot needs to move the object into a configuration from which these grasps
can be achieved. We present a system with model-free RL that successfully
achieves this task using a simple gripper with extrinsic dexterity. The policy
learns emergent behaviors of pushing the object against the wall to rotate and
then grasp it without additional reward terms on extrinsic dexterity. We
discuss important components of the system including the design of the RL
problem, multi-grasp training and selection, and policy generalization with
automatic curriculum. Most importantly, the policy trained in simulation is
zero-shot transferred to a physical robot. It demonstrates dynamic and
contact-rich motions with a simple gripper and generalizes across objects of
various sizes, densities, surface frictions, and shapes, with a 78% success rate.
Videos can be found at https://sites.google.com/view/grasp-ungraspable/
Automated Surface Grinder "Slicer"
ME450 Capstone Design and Manufacturing Experience: Fall 2015

Mr. Bernn Hitch, President of Island Ceramic Grinding, tasked Team 16 with automating a manual Chevalier FSG-618M Surface Grinder so that it can run a simple, repetitive program for slicing ceramic pieces with minimal operator interaction beyond the initial setup of the program. To accomplish this goal, our sponsor gave us requirements that include automation of the grinder's three axes through one cohesive interface that allows the program to be edited while in use, no measurable increase in the tolerance of the parts being manufactured, and a budget under $5000. Alumina is to be used for all testing, as it will ensure that any product Island Ceramic Grinding currently uses on their grinders will not present any problems for the automation. Engineering specifications, such as the tolerance of motion accuracy and precision, were generated according to these requirements.

After performing a functional decomposition and brainstorming, several concepts were generated to meet the sponsor's requirements and the corresponding engineering specifications. Five Pugh Charts were created for the different functionalities to be realized, with weightings between one and five for each criterion. The Pugh Charts determined that the transmission would use timing belts; stepper motors would drive the in-out and up-down directions; a DC motor would drive the left-right direction; a sealed keypad would serve as the interface; the microcontroller would be a PLC; and a Hall effect sensor would be used to control the motion of the grinder. Engineering analysis was performed to determine the required specifications for the motors and transmissions, and theoretical modeling and empirical testing were used to find the maximum torque needed to turn the hand wheels during normal operation.

The component selection was narrowed down using the specifications from this analysis, and CAD models of the mountings for each axis were generated. A detailed plan for the control system is also described. FMEA and risk analyses were done to discover, evaluate, and minimize potential problems. Design verification testing was then performed to verify that the individual components performed as expected and would allow the machine to function as intended. The precision of the Y and Z axes was confirmed to exceed expectations, and the speed of the X axis was also deemed acceptable. Verification could not be performed on the system as a whole due to additional components that were needed and delays in assembling the electrical system.

http://deepblue.lib.umich.edu/bitstream/2027.42/117355/1/ME450-F15-Project16-FinalReport.pd
Sharpness-Aware Minimization with Dynamic Reweighting
Deep neural networks are often overparameterized and may not easily achieve
model generalization. Adversarial training has shown effectiveness in improving
generalization by regularizing the change of loss on top of adversarially
chosen perturbations. The recently proposed sharpness-aware minimization (SAM)
algorithm conducts adversarial weight perturbation, encouraging the model to
converge to a flat minimum. SAM finds a common adversarial weight perturbation
per batch. Although per-instance adversarial weight perturbations are stronger
adversaries and can potentially lead to better generalization performance,
their computational cost is prohibitively high, making it impractical to apply
per-instance perturbations efficiently in SAM. In this paper, we tackle this
efficiency bottleneck and propose sharpness-aware minimization with dynamic
reweighting (δ-SAM). Our theoretical analysis shows that the stronger
per-instance adversarial weight perturbations can be approximated by
reweighted per-batch weight perturbations. δ-SAM dynamically reweights the
perturbation within each batch according to theoretically principled
weighting factors, serving as a good approximation to per-instance
perturbation. Experiments on various natural language understanding tasks
demonstrate the effectiveness of δ-SAM.
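To make the underlying SAM update concrete, here is a toy one-dimensional sketch of the standard per-batch SAM step on a quadratic loss (δ-SAM's dynamic reweighting is not reproduced here):

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update on a scalar weight.

    1) Ascend to the adversarially perturbed weight w + rho * sign(grad),
       the 1-D analogue of w + rho * grad / ||grad||.
    2) Take the gradient at the perturbed point.
    3) Apply the usual descent step with that gradient.
    """
    g = grad_fn(w)
    eps = rho if g >= 0 else -rho
    g_adv = grad_fn(w + eps)
    return w - lr * g_adv

grad = lambda w: 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2
w = 0.0
for _ in range(200):
    w = sam_step(w, grad)
# w settles near the minimum at 3.0, hovering within ~rho of it
```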
Context-faithful Prompting for Large Language Models
Large language models (LLMs) encode parametric knowledge about world facts
and have shown remarkable performance in knowledge-driven NLP tasks. However,
their reliance on parametric knowledge may cause them to overlook contextual
cues, leading to incorrect predictions in context-sensitive NLP tasks (e.g.,
knowledge acquisition tasks). In this paper, we seek to assess and enhance
LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction
with abstention. We demonstrate that LLMs' faithfulness can be significantly
improved using carefully designed prompting strategies. In particular, we
identify opinion-based prompts and counterfactual demonstrations as the most
effective methods. Opinion-based prompts reframe the context as a narrator's
statement and inquire about the narrator's opinions, while counterfactual
demonstrations use instances containing false facts to improve faithfulness in
knowledge conflict situations. Neither technique requires additional training.
We conduct experiments on three datasets of two standard NLP tasks, machine
reading comprehension and relation extraction, and the results demonstrate
significant improvement in faithfulness to contexts. Code and data are released
at https://github.com/wzhouad/context-faithful-llm.
Comment: Accepted at EMNLP 2023 Findings.
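The opinion-based reframing can be sketched as a simple prompt template (the wording here is illustrative, not the exact template from the paper; "Bob" is a hypothetical narrator name):

```python
def opinion_based_prompt(context: str, question: str) -> str:
    """Reframe the context as a narrator's statement and ask for the
    narrator's opinion, nudging the model to answer from the given
    context rather than from its parametric knowledge."""
    return (
        f'Bob said, "{context}"\n'
        f"Q: {question.rstrip('?')} in Bob's opinion?\n"
        "A:"
    )

prompt = opinion_based_prompt(
    "The capital of Kwara is Ilorin.",
    "What is the capital of Kwara?",
)
```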
Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Document-level relation extraction (RE) poses new challenges compared to its
sentence-level counterpart. One document commonly contains multiple entity
pairs, and one entity pair occurs multiple times in the document associated
with multiple possible relations. In this paper, we propose two novel
techniques, adaptive thresholding and localized context pooling, to solve the
multi-label and multi-entity problems. Adaptive thresholding replaces the
global threshold for multi-label classification in prior work with a
learnable, entity-pair-dependent threshold. Localized context pooling directly
transfers attention from pre-trained language models to locate relevant context
that is useful to decide the relation. We experiment on three document-level RE
benchmark datasets: DocRED, a recently released large-scale RE dataset, and two
datasets, CDR and GDA, in the biomedical domain. Our ATLOP (Adaptive Thresholding
and Localized cOntext Pooling) model achieves an F1 score of 63.4 on DocRED, and also
significantly outperforms existing models on both CDR and GDA.
Comment: Accepted by AAAI 2021. Code available at
https://github.com/wzhouad/ATLO
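At inference time, adaptive thresholding amounts to comparing each relation logit against the logit of a dedicated threshold class. A simplified sketch of that prediction rule:

```python
def predict_relations(logits, th_index=0):
    """Return relation indices whose logit exceeds the learned
    threshold-class logit; if none do, the entity pair is predicted
    to express no relation."""
    th = logits[th_index]
    return [i for i, score in enumerate(logits)
            if i != th_index and score > th]

# Index 0 is the threshold class; relations 1 and 3 clear it, relation 2 does not.
preds = predict_relations([0.5, 1.2, 0.1, 0.9])
# → [1, 3]
```

Because the threshold is itself a class whose logit depends on the entity pair, each pair effectively gets its own decision boundary instead of one global cutoff.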
Parameter-Efficient Tuning with Special Token Adaptation
Parameter-efficient tuning aims at updating only a small subset of parameters
when adapting a pretrained model to downstream tasks. In this work, we
introduce PASTA, in which we only modify the special token representations
(e.g., [SEP] and [CLS] in BERT) before the self-attention module at each layer
in Transformer-based models. PASTA achieves performance comparable to
fine-tuning on natural language understanding tasks, including text
classification and NER, while training only up to 0.029% of the total parameters. Our
work not only provides a simple yet effective way of parameter-efficient
tuning, which has a wide range of practical applications when deploying
finetuned models for multiple tasks, but also demonstrates the pivotal role of
special tokens in pretrained language models.
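PASTA's modification can be sketched as adding a trainable offset vector at each special-token position before self-attention (function and variable names here are hypothetical; this simplifies the actual implementation):

```python
def apply_special_token_offsets(hidden_states, special_positions, offsets):
    """Add a trainable offset vector to each special-token position.

    hidden_states: list of per-token vectors entering one layer.
    special_positions: indices of special tokens (e.g., [CLS], [SEP]).
    offsets: one trainable vector per special position; in PASTA-style
    tuning these offsets are the only parameters that get updated.
    """
    out = [list(vec) for vec in hidden_states]
    for pos, off in zip(special_positions, offsets):
        out[pos] = [h + o for h, o in zip(out[pos], off)]
    return out

hidden = [[1.0, 1.0], [0.5, 0.5], [2.0, 2.0]]  # [CLS], word, [SEP]
tuned = apply_special_token_offsets(hidden, [0, 2], [[1.0, -1.0], [2.0, 2.0]])
# → [[2.0, 0.0], [0.5, 0.5], [4.0, 4.0]]  (only the special tokens change)
```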
GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding
Humans subconsciously engage in geospatial reasoning when reading articles.
We recognize place names and their spatial relations in text and mentally
associate them with their physical locations on Earth. Although pretrained
language models can mimic this cognitive process using linguistic context, they
do not utilize valuable geospatial information in large, widely available
geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a
geospatially grounded language model that enhances the understanding of
geo-entities in natural language. GeoLM leverages geo-entity mentions as
anchors to connect linguistic information in text corpora with geospatial
information extracted from geographical databases. GeoLM connects the two types
of context through contrastive learning and masked language modeling. It also
incorporates a spatial coordinate embedding mechanism to encode distance and
direction relations to capture geospatial context. In the experiment, we
demonstrate that GeoLM exhibits promising capabilities in supporting toponym
recognition, toponym linking, relation extraction, and geo-entity typing, which
bridge the gap between natural language processing and geospatial sciences. The
code is publicly available at https://github.com/knowledge-computing/geolm.
Comment: Accepted to EMNLP23 mai
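The distance and direction relations such a coordinate embedding must capture can be illustrated with standard great-circle formulas (a generic geodesy sketch, not GeoLM's actual encoder):

```python
import math

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) and initial bearing (degrees clockwise
    from north) between two (lat, lon) points on a spherical Earth."""
    R = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp against floating-point drift before acos.
    dist = math.acos(max(-1.0, min(1.0, cos_angle))) * R
    y = math.sin(dlon) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dlon))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return dist, bearing

# A quarter of the equator: roughly 10,008 km, heading due east.
d, b = distance_and_bearing(0.0, 0.0, 0.0, 90.0)
```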
- …