    IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

    Fine-tuning pre-trained language models (PTLMs), such as BERT and its more robust variant RoBERTa, has been a common practice for advancing performance in natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, is relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization and point out two major issues: high variance in their standard deviations, and high correlation between different dimensions. We also propose a new network regularization method, isotropic batch normalization (IsoBN), to address these issues and learn more isotropic representations during fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields an absolute improvement of about 1.0 point on average across seven NLU tasks. Comment: AAAI 2021
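
    The two issues named above are easy to check directly. The following is a minimal diagnostic sketch (not the IsoBN regularizer itself) that measures the spread of per-dimension standard deviations and the average cross-dimension correlation; the embedding matrix is a synthetic placeholder for a batch of pre-trained [CLS] embeddings.

        import numpy as np

        # Synthetic stand-in for a batch of pre-trained [CLS] embeddings.
        rng = np.random.default_rng(0)
        embeddings = rng.normal(size=(256, 768))
        embeddings[:, 0] *= 10.0  # simulate one dominating dimension

        # Issue 1: high variance in the per-dimension standard deviations.
        stds = embeddings.std(axis=0)
        # Issue 2: correlation between different dimensions.
        corr = np.corrcoef(embeddings, rowvar=False)
        off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]

        print(f"std spread: min={stds.min():.2f}, max={stds.max():.2f}")
        print(f"mean |correlation| across dimensions: {np.abs(off_diag).mean():.3f}")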

    An Improved Baseline for Sentence-level Relation Extraction

    Sentence-level relation extraction (RE) aims to identify the relationship between two entities in a sentence. Many efforts have been devoted to this problem, yet the best-performing methods are still far from perfect. In this paper, we revisit two problems that affect the performance of existing RE models: entity representation and noisy or ill-defined labels. Our improved baseline model, incorporating entity representations with typed markers, achieves an F1 of 74.6% on TACRED, significantly outperforming previous SOTA methods. Furthermore, the presented new baseline achieves an F1 of 91.1% on the refined Re-TACRED dataset, demonstrating that pre-trained language models achieve unexpectedly high performance on this task. We release our code to the community for future research. Comment: Code available at https://github.com/wzhouad/RE_improved_baseline
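
    As a concrete illustration of "entity representations with typed markers," the sketch below wraps the subject and object spans with type-bearing punctuation markers before the sentence is fed to an encoder. The exact marker symbols here are illustrative and may differ from the paper's.

        def mark_entities(tokens, subj, obj):
            """subj/obj are (start, end, type) spans over `tokens`, end exclusive.
            Assumes the two spans do not overlap."""
            out = []
            for i, tok in enumerate(tokens):
                if i == subj[0]:
                    out += ["@", "*", subj[2], "*"]   # open subject with its type
                if i == obj[0]:
                    out += ["#", "^", obj[2], "^"]    # open object with its type
                out.append(tok)
                if i == subj[1] - 1:
                    out.append("@")                   # close subject
                if i == obj[1] - 1:
                    out.append("#")                   # close object
            return out

        tokens = "Bill Gates founded Microsoft".split()
        print(" ".join(mark_entities(tokens, (0, 2, "person"), (3, 4, "organization"))))
        # @ * person * Bill Gates @ founded # ^ organization ^ Microsoft #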

    Learning to Grasp the Ungraspable with Emergent Extrinsic Dexterity

    A simple gripper can solve more complex manipulation tasks if it can utilize the external environment, such as pushing the object against a table or a vertical wall, a capability known as "Extrinsic Dexterity." Previous work on extrinsic dexterity usually makes careful assumptions about contacts, which impose restrictions on robot design, robot motions, and the variation of physical parameters. In this work, we develop a system based on reinforcement learning (RL) to address these limitations. We study the task of "Occluded Grasping," which aims to grasp an object in configurations that are initially occluded; the robot needs to move the object into a configuration from which these grasps can be achieved. We present a system with model-free RL that successfully achieves this task using a simple gripper with extrinsic dexterity. The policy learns emergent behaviors of pushing the object against the wall to rotate and then grasp it, without additional reward terms on extrinsic dexterity. We discuss important components of the system, including the design of the RL problem, multi-grasp training and selection, and policy generalization with an automatic curriculum. Most importantly, the policy trained in simulation transfers zero-shot to a physical robot. It demonstrates dynamic and contact-rich motions with a simple gripper that generalize across objects of various sizes, densities, surface frictions, and shapes, with a 78% success rate. Videos can be found at https://sites.google.com/view/grasp-ungraspable/
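
    A minimal sketch of the automatic-curriculum idea mentioned above: the range of initial object poses is widened whenever the policy's success rate on the current range is high enough. `evaluate_policy` is a hypothetical stand-in for rolling out the trained RL policy; real training would happen between expansions.

        import random

        def evaluate_policy(pose_range, episodes=100):
            # Hypothetical stand-in for policy rollouts; here, success
            # simply gets harder as the initial-pose range widens.
            return sum(random.random() > pose_range for _ in range(episodes)) / episodes

        pose_range, step, target = 0.1, 0.1, 0.8     # fraction of the full pose space
        while pose_range < 1.0:
            success = evaluate_policy(pose_range)
            print(f"range={pose_range:.1f}  success={success:.2f}")
            if success >= target:
                pose_range = min(1.0, pose_range + step)  # widen the curriculum
            else:
                break  # keep training at this range before widening further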

    Automated Surface Grinder "Slicer"

    ME450 Capstone Design and Manufacturing Experience: Fall 2015. Mr. Bernn Hitch, President of Island Ceramic Grinding, tasked Team 16 with automating a manual Chevalier FSG-618M surface grinder so that it can run a simple, repetitive program for slicing ceramic pieces with minimal operator interaction beyond the initial setup of the program. To accomplish this goal, our sponsor's requirements included automation of the grinder's three axes through one cohesive interface that allows the program to be edited while in use, no measurable increase in the tolerance of the parts being manufactured, and a budget under $5,000. Alumina is to be used for all testing, as it will ensure that any product Island Ceramic Grinding currently runs on their grinders will not present problems for the automation. Engineering specifications, such as the accuracy and precision of motion, were generated according to these requirements. After performing a functional decomposition and brainstorming, several concepts were generated to meet the sponsor's requirements and the corresponding engineering specifications. Five Pugh charts were created for the different functionalities to be realized, with each criterion weighted between one and five. The Pugh charts determined that the transmission would use timing belts; stepper motors would drive the in-out and up-down directions; a DC motor would drive the left-right direction; a sealed keypad would serve as the interface; the microcontroller would be a PLC; and a Hall-effect sensor would be used to control the motion of the grinder. Engineering analysis was performed to determine the required specifications for the motors and transmissions, and theoretical modeling and empirical testing were used to find the maximum torque needed to turn the hand wheels during normal operation. Component selection was narrowed down using the specifications from this analysis, and CAD models of the mountings for each axis were generated. A detailed plan for the control system is also described. FMEA and risk analyses were performed to discover, evaluate, and minimize potential problems. Design verification testing was then performed to verify that the individual components performed as expected and would allow the machine to function as intended. The precision of the Y- and Z-axes was confirmed to exceed expectations, and the speed of the X-axis was also deemed acceptable. Verification could not be performed on the system as a whole due to additional components that were needed and delays in assembling the electrical system. http://deepblue.lib.umich.edu/bitstream/2027.42/117355/1/ME450-F15-Project16-FinalReport.pdf
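
    As a rough illustration of the control scheme chosen above (a motor driving an axis, with a Hall-effect sensor bounding the motion), the sketch below simulates a reciprocating axis that reverses at each sensor trigger. `read_hall_sensor` and `set_motor` are hypothetical stand-ins for the PLC's I/O, simulated here in software.

        import time

        position, direction = 0, +1                  # simulated axis state

        def read_hall_sensor():
            # Hypothetical stand-in: the sensor trips at either end of travel.
            return abs(position) >= 10

        def set_motor(step):
            # Hypothetical stand-in for the PLC output driving the axis motor.
            global position
            position += step

        def run_passes(num_passes):
            global direction
            done = 0
            while done < num_passes:
                set_motor(direction)
                if read_hall_sensor():               # end of travel reached
                    direction = -direction           # reverse for the next pass
                    done += 1
                time.sleep(0.01)                     # control-loop period

        run_passes(4)
        print("completed 4 passes")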

    Sharpness-Aware Minimization with Dynamic Reweighting

    Deep neural networks are often overparameterized and may not easily achieve good generalization. Adversarial training has shown effectiveness in improving generalization by regularizing the change of loss under adversarially chosen perturbations. The recently proposed sharpness-aware minimization (SAM) algorithm performs adversarial weight perturbation, encouraging the model to converge to a flat minimum. SAM finds a common adversarial weight perturbation per batch. Although per-instance adversarial weight perturbations are stronger adversaries and can potentially lead to better generalization, their computational cost is prohibitively high, making efficient per-instance perturbation impossible in SAM. In this paper, we tackle this efficiency bottleneck and propose sharpness-aware minimization with dynamic reweighting (δ-SAM). Our theoretical analysis shows that reweighted per-batch weight perturbations can approximate the stronger per-instance adversarial weight perturbations. δ-SAM dynamically reweights the perturbation within each batch according to theoretically principled weighting factors, serving as a good approximation to per-instance perturbation. Experiments on various natural language understanding tasks demonstrate the effectiveness of δ-SAM.
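
    To make the mechanism concrete, the following is a toy NumPy sketch of a SAM step on a least-squares objective, with an optional per-batch reweighting of the perturbation direction. The weighting used here (normalized per-instance loss) is only a stand-in to show the structure of the idea; it is not the paper's theoretically derived factors.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(32, 5))
        y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=32)

        def weighted_grad(w, weights):
            residual = X @ w - y                          # per-instance error
            return X.T @ (weights * residual) / weights.sum()

        def sam_step(w, rho=0.05, lr=0.1, reweight=False):
            losses = 0.5 * (X @ w - y) ** 2
            # Stand-in weighting: emphasize high-loss instances in the perturbation.
            weights = losses / losses.mean() if reweight else np.ones_like(losses)
            g = weighted_grad(w, weights)
            eps = rho * g / (np.linalg.norm(g) + 1e-12)   # adversarial weight perturbation
            uniform = np.ones_like(losses)
            return w - lr * weighted_grad(w + eps, uniform)  # descend at perturbed point

        w = np.zeros(5)
        for _ in range(200):
            w = sam_step(w, reweight=True)
        print("final training loss:", 0.5 * np.mean((X @ w - y) ** 2))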

    Context-faithful Prompting for Large Language Models

    Large language models (LLMs) encode parametric knowledge about world facts and have shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks (e.g., knowledge acquisition tasks). In this paper, we seek to assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention. We demonstrate that LLMs' faithfulness can be significantly improved using carefully designed prompting strategies. In particular, we identify opinion-based prompts and counterfactual demonstrations as the most effective methods. Opinion-based prompts reframe the context as a narrator's statement and inquire about the narrator's opinions, while counterfactual demonstrations use instances containing false facts to improve faithfulness in knowledge-conflict situations. Neither technique requires additional training. We conduct experiments on three datasets covering two standard NLP tasks, machine reading comprehension and relation extraction, and the results demonstrate significant improvements in faithfulness to contexts. Code and data are released at https://github.com/wzhouad/context-faithful-llm. Comment: Accepted at EMNLP 2023 Findings.
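
    For illustration, an opinion-based prompt of the kind described can be built with a simple template; the narrator name and exact wording below are illustrative.

        def opinion_prompt(context, question):
            # Reframe the context as a narrator's statement and ask for the
            # narrator's opinion, steering the model toward the given context
            # rather than its parametric knowledge.
            return (
                f'Bob said, "{context}"\n'
                f"Q: {question}, in Bob's opinion?\n"
                "A:"
            )

        # A deliberately counterfactual context for a knowledge-conflict test.
        print(opinion_prompt("The capital of Kenya is Mombasa.",
                             "What is the capital of Kenya"))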

    Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling

    Document-level relation extraction (RE) poses new challenges compared to its sentence-level counterpart. One document commonly contains multiple entity pairs, and one entity pair may occur multiple times in a document in association with multiple possible relations. In this paper, we propose two novel techniques, adaptive thresholding and localized context pooling, to address the multi-label and multi-entity problems. Adaptive thresholding replaces the global threshold for multi-label classification in prior work with a learnable, entity-dependent threshold. Localized context pooling directly transfers attention from pre-trained language models to locate relevant context that is useful for deciding the relation. We experiment on three document-level RE benchmark datasets: DocRED, a recently released large-scale RE dataset, and two datasets in the biomedical domain, CDR and GDA. Our ATLOP (Adaptive Thresholding and Localized cOntext Pooling) model achieves an F1 score of 63.4 on DocRED and also significantly outperforms existing models on both CDR and GDA. Comment: Accepted by AAAI 2021. Code available at https://github.com/wzhouad/ATLOP
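
    The adaptive-thresholding idea can be sketched as follows: a dedicated threshold class TH is scored alongside the relation classes, the loss pushes positive classes above TH and negative classes below it, and at inference only classes scoring above TH are predicted. This is a minimal NumPy sketch consistent with that description, not the released implementation.

        import numpy as np

        def logsumexp(x):
            m = x.max()
            return m + np.log(np.exp(x - m).sum())

        def adaptive_threshold_loss(logits, label_mask, th=0):
            """logits: (C,) relation scores for one entity pair; label_mask: (C,)
            bool gold labels; class index `th` is the threshold class (never gold)."""
            pos = np.where(label_mask)[0]
            neg = np.where(~label_mask)[0]
            neg = neg[neg != th]
            # Push every positive logit above the threshold logit.
            l1 = -np.sum(logits[pos] - logsumexp(logits[np.append(pos, th)]))
            # Push the threshold logit above every negative logit.
            l2 = -(logits[th] - logsumexp(logits[np.append(neg, th)]))
            return l1 + l2

        def predict(logits, th=0):
            return np.where(logits > logits[th])[0]   # classes above the threshold

        logits = np.array([0.5, 2.0, -1.0, 1.2])      # index 0 plays the role of TH
        labels = np.array([False, True, False, True])
        print(adaptive_threshold_loss(logits, labels), predict(logits))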

    Parameter-Efficient Tuning with Special Token Adaptation

    Parameter-efficient tuning aims to update only a small subset of parameters when adapting a pretrained model to downstream tasks. In this work, we introduce PASTA, which modifies only the special token representations (e.g., [SEP] and [CLS] in BERT) before the self-attention module at each layer of Transformer-based models. PASTA achieves performance comparable to full fine-tuning on natural language understanding tasks, including text classification and NER, while training at most 0.029% of the total parameters. Our work not only provides a simple yet effective approach to parameter-efficient tuning, with a wide range of practical applications when deploying fine-tuned models for multiple tasks, but also demonstrates the pivotal role of special tokens in pretrained language models.
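
    A minimal sketch of the idea: at each layer, a small trainable vector is added to the hidden states at the special-token positions before self-attention, while every other parameter stays frozen. Shapes and names are illustrative, not the released implementation.

        import numpy as np

        num_layers, hidden = 12, 768
        # The only trainable parameters: one vector per layer for special tokens.
        pasta_vectors = [np.zeros(hidden) for _ in range(num_layers)]

        def apply_pasta(hidden_states, special_positions, layer):
            """hidden_states: (seq_len, hidden) inputs to this layer's self-attention."""
            out = hidden_states.copy()
            out[special_positions] += pasta_vectors[layer]  # shift special tokens only
            return out

        states = np.random.default_rng(0).normal(size=(16, hidden))
        states = apply_pasta(states, special_positions=[0, 15], layer=0)  # [CLS], [SEP]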

    GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding

    Humans subconsciously engage in geospatial reasoning when reading articles: we recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize the valuable geospatial information in large, widely available geographical databases such as OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases, and it connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism that encodes distance and direction relations to capture geospatial context. In experiments, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing, bridging the gap between natural language processing and the geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm. Comment: Accepted to the EMNLP 2023 main conference.
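
    As one plausible form of the spatial coordinate embedding mentioned above, the sketch below maps a (longitude, latitude) pair to a fixed-size vector using sinusoids at geometric frequencies; the exact scheme GeoLM uses may differ.

        import numpy as np

        def coord_embedding(lon, lat, dim=64):
            """Encode (lon, lat) in degrees as a dim-sized sinusoidal vector."""
            freqs = 2.0 ** np.arange(dim // 4)            # geometric frequency ladder
            angles = np.concatenate([lon * freqs, lat * freqs]) * np.pi / 180.0
            return np.concatenate([np.sin(angles), np.cos(angles)])

        print(coord_embedding(-118.24, 34.05).shape)      # e.g., Los Angeles -> (64,)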