Dependency Parsing with Dilated Iterated Graph CNNs
Dependency parses are an effective way to inject linguistic knowledge into
many downstream tasks, and many practitioners wish to efficiently parse
sentences at scale. Recent advances in GPU hardware have enabled neural
networks to achieve significant gains over the previous best models, but these
models still fail to leverage GPUs' capability for massive parallelism due to
their requirement of sequential processing of the sentence. In response, we
propose Dilated Iterated Graph Convolutional Neural Networks (DIG-CNNs) for
graph-based dependency parsing, a graph convolutional architecture that allows
for efficient end-to-end GPU parsing. In experiments on the English Penn
TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best
neural network parsers.
Comment: 2nd Workshop on Structured Prediction for Natural Language Processing (at EMNLP '17)
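To make the architecture concrete, the following is a minimal sketch of the core idea described in the abstract: token vectors are combined into an n x n grid of head-dependent pair features, a block of dilated 2D convolutions is applied repeatedly with shared parameters, and a final 1x1 convolution produces arc scores, all of which can be computed in parallel on a GPU. The layer sizes, dilation schedule, and pair-feature construction here are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of a dilated iterated graph CNN for arc scoring.
# Hyperparameters and the pair-feature construction are assumptions.
import torch
import torch.nn as nn


class DilatedPairBlock(nn.Module):
    """One block of dilated 2D convolutions over token-pair features."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, n, n)
        for conv in self.convs:
            x = self.act(conv(x))
        return x


class DigCnnParserSketch(nn.Module):
    def __init__(self, d_token=64, channels=64, iterations=3):
        super().__init__()
        self.proj = nn.Conv2d(2 * d_token, channels, kernel_size=1)
        self.block = DilatedPairBlock(channels)  # shared across iterations
        self.iterations = iterations
        self.scorer = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, tokens):  # tokens: (batch, n, d_token)
        b, n, d = tokens.shape
        # Pair features: concatenate candidate head and dependent vectors.
        heads = tokens.unsqueeze(2).expand(b, n, n, d)
        deps = tokens.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([heads, deps], dim=-1).permute(0, 3, 1, 2)
        x = self.proj(pairs)
        for _ in range(self.iterations):       # iterate the same dilated block
            x = self.block(x)
        return self.scorer(x).squeeze(1)        # (batch, n, n) arc scores


if __name__ == "__main__":
    embeddings = torch.randn(2, 10, 64)         # e.g. embedded token sequence
    scores = DigCnnParserSketch()(embeddings)
    heads = scores.argmax(dim=1)                # predicted head for each token
    print(scores.shape, heads.shape)
```

Because every convolution is applied to all positions of all sentences in a batch at once, the arc scores come out of a handful of matrix operations, which is the parallelism argument made in the abstract.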
Modeling the Spread of Biologically-Inspired Internet Worms
Infections by malicious software, such as Internet worms, spreading on computer networks can have devastating consequences, resulting in loss of information, time, and money. To better understand how these worms spread, and thus how to more effectively limit future infections, we apply the household model from epidemiology to simulate the proliferation of adaptive and non-adaptive preference-scanning worms, which take advantage of biologically-inspired strategies. From scans of the actual distribution of Web servers on the Internet, we find that vulnerable machines seem to be highly clustered in Internet Protocol version 4 (IPv4) address space, and our simulations suggest that this organization fosters the quick and comprehensive proliferation of preference-scanning Internet worms.
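As an illustration of the mechanism the abstract describes, below is a toy simulation of preference scanning over a clustered address space: vulnerable hosts are grouped into a few blocks, and each infected host probes nearby addresses with high probability and random addresses otherwise. The address-space size, cluster sizes, and scan rates are invented for illustration and are not taken from the paper's measurements.

```python
# Toy sketch: preference scanning over a clustered toy address space.
# All parameters below are illustrative assumptions.
import random

ADDRESS_SPACE = 2 ** 16        # stand-in for IPv4 space
NUM_CLUSTERS = 20
HOSTS_PER_CLUSTER = 50
LOCAL_PREFIX = 256             # "nearby" = same 256-address block
P_LOCAL = 0.8                  # preference for scanning nearby addresses
SCANS_PER_STEP = 20

random.seed(0)

# Vulnerable hosts clustered around a few base addresses.
vulnerable = set()
for _ in range(NUM_CLUSTERS):
    base = random.randrange(0, ADDRESS_SPACE - LOCAL_PREFIX)
    vulnerable.update(base + random.randrange(LOCAL_PREFIX)
                      for _ in range(HOSTS_PER_CLUSTER))

infected = {next(iter(vulnerable))}            # single initial infection


def scan_target(source: int) -> int:
    """Preference scanning: usually probe an address near the source."""
    if random.random() < P_LOCAL:
        block = (source // LOCAL_PREFIX) * LOCAL_PREFIX
        return block + random.randrange(LOCAL_PREFIX)
    return random.randrange(ADDRESS_SPACE)     # occasional global scan


for step in range(50):
    newly_infected = set()
    for host in infected:
        for _ in range(SCANS_PER_STEP):
            target = scan_target(host)
            if target in vulnerable and target not in infected:
                newly_infected.add(target)
    infected |= newly_infected
    print(f"step {step:2d}: {len(infected)} / {len(vulnerable)} infected")
    if len(infected) == len(vulnerable):
        break
```

Because vulnerable hosts are clustered, local probes hit new victims far more often than uniform random scanning would, which is why the infection saturates each cluster quickly in this toy run.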
Machine Learning Models for Efficient and Robust Natural Language Processing
Natural language processing (NLP) has come of age. For example, semantic role labeling (SRL), which automatically annotates sentences with a labeled graph representing who did what to whom, has in the past ten years seen nearly 40% reduction in error, bringing it to useful accuracy. As a result, a myriad of practitioners now want to deploy NLP systems on billions of documents across many domains. However, state-of-the-art NLP systems are typically not optimized for cross-domain robustness nor computational efficiency. In this dissertation I develop machine learning methods to facilitate fast and robust inference across many common NLP tasks.
First, I describe paired learning and inference algorithms for dynamic feature selection, which accelerate inference in linear classifiers, the heart of the fastest NLP models, by 5-10 times. I then present iterated dilated convolutional neural networks (ID-CNNs), a distinct combination of network structure, parameter sharing and training procedures that increase inference speed by 14-20 times with accuracy matching bidirectional LSTMs, the most accurate models for NLP sequence labeling. Finally, I describe linguistically-informed self-attention (LISA), a neural network model that combines multi-head self-attention with multi-task learning to facilitate improved generalization to new domains. We show that incorporating linguistic structure in this way leads to substantial improvements over the previous state-of-the-art (syntax-free) neural network models for SRL, especially when evaluating out-of-domain. I conclude with a brief discussion of potential future directions stemming from my thesis work.
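Of the three contributions summarized above, the ID-CNN is the most compact to sketch. The snippet below shows the general shape under stated assumptions: a small stack of 1D convolutions with growing dilation is applied repeatedly with shared parameters, so the receptive field covers the whole sentence without adding weights, and every position is labeled in parallel. The dilation schedule and layer sizes are illustrative, not the dissertation's exact settings.

```python
# Hedged sketch of an iterated dilated CNN tagger; sizes are assumptions.
import torch
import torch.nn as nn


class IDCnnTaggerSketch(nn.Module):
    def __init__(self, d_in=100, channels=128, n_labels=9,
                 dilations=(1, 2, 4), iterations=4):
        super().__init__()
        self.input_proj = nn.Conv1d(d_in, channels, kernel_size=3, padding=1)
        # One dilated block, reused `iterations` times (parameter sharing).
        self.block = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        self.iterations = iterations
        self.out = nn.Conv1d(channels, n_labels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):                 # x: (batch, seq_len, d_in)
        h = self.act(self.input_proj(x.transpose(1, 2)))
        for _ in range(self.iterations):  # iterate the shared dilated block
            for conv in self.block:
                h = self.act(conv(h))
        return self.out(h).transpose(1, 2)  # (batch, seq_len, n_labels)


if __name__ == "__main__":
    word_vectors = torch.randn(8, 30, 100)   # e.g. embedded token sequence
    logits = IDCnnTaggerSketch()(word_vectors)
    print(logits.shape)                       # torch.Size([8, 30, 9])
```

Unlike a bidirectional LSTM, nothing here is computed sequentially along the sentence, which is where the reported speedups come from.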
Learning Dynamic Feature Selection for Fast Sequential Prediction
We present paired learning and inference algorithms for significantly
reducing computation and increasing speed of the vector dot products in the
classifiers that are at the heart of many NLP components. This is accomplished
by partitioning the features into a sequence of templates which are ordered
such that high confidence can often be reached using only a small fraction of
all features. Parameter estimation is arranged to maximize accuracy and early
confidence in this sequence. Our approach is simpler and better suited to NLP
than other related cascade methods. We present experiments in left-to-right
part-of-speech tagging, named entity recognition, and transition-based
dependency parsing. On the typical benchmarking datasets we can preserve POS
tagging accuracy above 97% and parsing LAS above 88.5% both with over a
five-fold reduction in run-time, and NER F1 above 88 with more than 2x increase
in speed.
Comment: Appears in The 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China, July 2015
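The following sketch illustrates the inference side of this idea under simplifying assumptions: feature templates are scored in a fixed cheap-to-expensive order, and the classifier stops adding templates as soon as the margin between the top two labels clears a per-template threshold. The templates, weights, and thresholds below are invented for illustration; in the actual system they would be estimated from data.

```python
# Hedged sketch of early-exit inference over ordered feature templates.
from collections import defaultdict

# Per-class weights keyed by (template, feature value); a real model learns these.
WEIGHTS = {
    "NN": defaultdict(float, {("word", "dog"): 2.0, ("prev_tag", "DT"): 1.5}),
    "VB": defaultdict(float, {("word", "dog"): -1.0, ("suffix3", "ing"): 2.5}),
    "JJ": defaultdict(float, {("suffix3", "ing"): 0.5}),
}
TEMPLATE_ORDER = ["word", "prev_tag", "suffix3"]      # cheap/informative first
MARGIN_THRESHOLDS = {"word": 3.0, "prev_tag": 2.0, "suffix3": 0.0}


def classify(features: dict) -> str:
    """Add one template at a time; stop as soon as the margin is large enough."""
    scores = {label: 0.0 for label in WEIGHTS}
    for template in TEMPLATE_ORDER:
        value = features.get(template)
        if value is not None:
            for label in scores:
                scores[label] += WEIGHTS[label][(template, value)]
        best, runner_up = sorted(scores.values(), reverse=True)[:2]
        if best - runner_up >= MARGIN_THRESHOLDS[template]:
            break                                     # confident: skip the rest
    return max(scores, key=scores.get)


print(classify({"word": "dog", "prev_tag": "DT", "suffix3": "dog"}))
```

Easy tokens exit after the first template or two, so most dot products are never computed, which is where the five-fold speedup in the abstract comes from.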
Understanding the Effect of Model Compression on Social Bias in Large Language Models
Large Language Models (LLMs) trained with self-supervision on vast corpora of
web text fit to the social biases of that text. Without intervention, these
social biases persist in the model's predictions in downstream tasks, leading
to representational harm. Many strategies have been proposed to mitigate the
effects of inappropriate social biases learned during pretraining.
Simultaneously, methods for model compression have become increasingly popular
to reduce the computational burden of LLMs. Despite the popularity and need for
both approaches, little work has been done to explore the interplay between
these two. We perform a carefully controlled study of the impact of model
compression via quantization and knowledge distillation on measures of social
bias in LLMs. Longer pretraining and larger models led to higher social bias,
and quantization showed a regularization effect, with its best trade-off around 20% of the original pretraining time.
Comment: EMNLP 2023 Main
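To make the kind of measurement involved more tangible, here is a minimal sketch: a small pretrained language model is quantized to int8 with PyTorch's dynamic quantization, and a toy paired-sentence probe compares how strongly each version prefers a stereotypical completion. The model name (facebook/opt-125m), the sentence pair, and the use of dynamic quantization are stand-in assumptions; the paper's controlled study uses its own models, compression settings, and bias benchmarks.

```python
# Hedged sketch: quantize a small LM and compare a toy paired-sentence probe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-125m"               # illustrative stand-in model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def sentence_loss(m, text: str) -> float:
    """Average token negative log-likelihood under model m."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return m(**enc, labels=enc["input_ids"]).loss.item()


def bias_gap(m, stereotypical: str, anti_stereotypical: str) -> float:
    """Positive gap = model prefers the stereotypical sentence."""
    return sentence_loss(m, anti_stereotypical) - sentence_loss(m, stereotypical)


pair = ("The nurse said she was tired.", "The nurse said he was tired.")

# Dynamic int8 quantization of the linear layers, as one compression setting.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

print("fp32 bias gap:", bias_gap(model, *pair))
print("int8 bias gap:", bias_gap(quantized, *pair))
```

A real study would average such gaps over an established bias benchmark and control for pretraining time and model size, as the abstract describes.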
Training for Fast Sequential Prediction Using Dynamic Feature Selection
We present paired learning and inference algorithms for significantly
reducing computation and increasing speed of the vector dot products in the
classifiers that are at the heart of many NLP components. This is accomplished
by partitioning the features into a sequence of templates which are ordered
such that high confidence can often be reached using only a small fraction of
all features. Parameter estimation is arranged to maximize accuracy and early
confidence in this sequence. We present experiments in left-to-right
part-of-speech tagging on WSJ, demonstrating that we can preserve accuracy
above 97% with over a five-fold reduction in run-time.
Comment: 5 pages, NIPS Modern ML + NLP Workshop 2014
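Complementing the inference sketch above, the snippet below illustrates one simple way the training side could be arranged under stated assumptions: a perceptron-style update is applied at every template prefix, not only the full feature set, which pushes the model toward being both correct and confident after the first few templates. The update rule, templates, and toy examples are illustrative and not the paper's estimator.

```python
# Hedged sketch of training for early confidence over template prefixes.
from collections import defaultdict

TEMPLATE_ORDER = ["word", "prev_tag", "suffix3"]
LABELS = ["NN", "VB", "JJ"]
weights = {label: defaultdict(float) for label in LABELS}


def prefix_scores(features, k):
    """Score every label using only the first k feature templates."""
    return {label: sum(weights[label][(t, features[t])]
                       for t in TEMPLATE_ORDER[:k] if t in features)
            for label in LABELS}


def train_example(features, gold):
    # Update at every prefix so short prefixes already separate the labels.
    for k in range(1, len(TEMPLATE_ORDER) + 1):
        scores = prefix_scores(features, k)
        predicted = max(scores, key=scores.get)
        if predicted != gold:
            for t in TEMPLATE_ORDER[:k]:
                if t in features:
                    weights[gold][(t, features[t])] += 1.0
                    weights[predicted][(t, features[t])] -= 1.0


train_example({"word": "dog", "prev_tag": "DT", "suffix3": "dog"}, "NN")
train_example({"word": "running", "prev_tag": "VBZ", "suffix3": "ing"}, "VB")
# After training, the first template alone already separates the labels.
print(prefix_scores({"word": "running", "prev_tag": "VBZ", "suffix3": "ing"}, 1))
```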
Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training
In this work we propose a pragmatic method that reduces the annotation cost
for structured label spaces using active learning. Our approach leverages
partial annotation, which reduces labeling costs for structured outputs by
selecting only the most informative sub-structures for annotation. We also
utilize self-training to incorporate the current model's automatic predictions
as pseudo-labels for un-annotated sub-structures. A key challenge in
effectively combining partial annotation with self-training to reduce
annotation cost is determining which sub-structures to select to label. To
address this challenge, we adopt an error estimator to adaptively decide the
partial selection ratio according to the current model's capability. In
evaluations spanning four structured prediction tasks, we show that our
combination of partial annotation and self-training using an adaptive selection
ratio reduces annotation cost over strong full annotation baselines under a
fair comparison scheme that takes reading time into consideration.
Comment: Findings of EMNLP 2023
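The selection step described here can be sketched as follows, under illustrative assumptions: each sub-structure (here, a token) carries a model confidence, a toy error estimator turns average confidence into an expected error rate, the least-confident fraction, sized by that rate, goes to human annotators, and the rest keep the model's predictions as pseudo-labels for self-training. The estimator and the ratio rule are stand-ins, not the paper's formulation.

```python
# Hedged sketch of one partial-annotation selection round with self-training.
import random

random.seed(0)

# Simulated model output: (token_index, predicted_label, confidence).
pool = [(i, random.choice(["B-PER", "O", "B-LOC"]), random.uniform(0.4, 1.0))
        for i in range(100)]


def estimated_error_rate(items) -> float:
    """Toy estimator: expected error = mean of (1 - confidence)."""
    return sum(1.0 - conf for _, _, conf in items) / len(items)


def select_round(items):
    # Adaptive ratio: select more for annotation when the model looks weaker.
    ratio = min(0.5, 2.0 * estimated_error_rate(items))
    ranked = sorted(items, key=lambda x: x[2])       # least confident first
    n_manual = int(ratio * len(ranked))
    to_annotate = ranked[:n_manual]                  # ask humans for these
    pseudo_labeled = [(i, label) for i, label, _ in ranked[n_manual:]]
    return to_annotate, pseudo_labeled


manual, pseudo = select_round(pool)
print(f"selected for annotation: {len(manual)}, "
      f"self-trained pseudo-labels: {len(pseudo)}")
```

Tying the selection ratio to the estimated error rate is what lets the budget shrink as the model improves, which is the cost argument the abstract makes against full annotation.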