Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation
The success of deep learning methods hinges on the availability of large
training datasets annotated for the task of interest. In contrast to human
intelligence, these methods lack versatility and struggle to learn and adapt
quickly to new tasks, where labeled data is scarce. Meta-learning aims to solve
this problem by training a model on a large number of few-shot tasks, with an
objective to learn new tasks quickly from a small number of examples. In this
paper, we propose a meta-learning framework for few-shot word sense
disambiguation (WSD), where the goal is to learn to disambiguate unseen words
from only a few labeled instances. Meta-learning approaches have so far been
typically tested in an N-way, K-shot classification setting where each task
has N classes with K examples per class. Owing to its nature, WSD deviates
from this controlled setup and requires the models to handle a large number of
highly unbalanced classes. We extend several popular meta-learning approaches
to this scenario, and analyze their strengths and weaknesses in this new
challenging setting.
Comment: Added additional experiment
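As a rough illustration of the controlled N-way, K-shot setup that the abstract contrasts with WSD, the sketch below samples one balanced episode from a labeled pool. It is not the paper's code: the function name `sample_episode`, its parameters, and the toy data are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def sample_episode(examples, n_way, k_shot, rng=None):
    """Sample a balanced support set: n_way classes, k_shot items per class.

    `examples` is a list of (item, label) pairs. This helper is hypothetical
    and only illustrates the standard N-way, K-shot episode structure.
    """
    rng = rng or random.Random()

    # Group items by their class label.
    by_label = defaultdict(list)
    for item, label in examples:
        by_label[label].append(item)

    # Only classes with at least k_shot items can join a balanced episode.
    eligible = [label for label, items in by_label.items() if len(items) >= k_shot]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with at least k_shot examples")

    # Pick the classes, then pick k_shot items from each chosen class.
    support = []
    for label in rng.sample(eligible, n_way):
        for item in rng.sample(by_label[label], k_shot):
            support.append((item, label))
    return support


# Toy pool (assumed data): 50 items spread evenly over 5 classes.
pool = [(f"x{i}", i % 5) for i in range(50)]
episode = sample_episode(pool, n_way=3, k_shot=2, rng=random.Random(0))
```

Every class in such an episode contributes exactly K examples. The WSD setting described above breaks this assumption, since a word's senses are numerous and highly unbalanced.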
MINER: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective
NER models have achieved promising performance on standard NER benchmarks.
However, recent studies show that previous approaches may over-rely on entity
mention information, resulting in poor performance on out-of-vocabulary (OOV)
entity recognition. In this work, we propose MINER, a novel NER learning
framework, to remedy this issue from an information-theoretic perspective. The
proposed approach contains two mutual information-based training objectives: i)
generalizing information maximization, which enhances representation via deep
understanding of context and entity surface forms; ii) superfluous information
minimization, which discourages representation from rote memorizing entity
names or exploiting biased cues in data. Experiments on various settings and
datasets demonstrate that MINER achieves better performance in predicting OOV
entities.
Zero-Shot Learning with Common Sense Knowledge Graphs
Zero-shot learning relies on semantic class representations such as
hand-engineered attributes or learned embeddings to predict classes without any
labeled examples. We propose to learn class representations from common sense
knowledge graphs. Common sense knowledge graphs are an untapped source of
explicit high-level knowledge that requires little human effort to apply to a
range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a
general-purpose framework with a novel transformer graph convolutional network
(TrGCN) for generating class representations. Our proposed TrGCN architecture
computes non-linear combinations of the node neighbourhood and shows
improvements on zero-shot learning tasks in language and vision. Our results
show ZSL-KG outperforms the best performing graph-based zero-shot learning
framework by an average of 2.1 accuracy points with improvements as high as 3.4
accuracy points. Our ablation study on ZSL-KG with alternate graph neural
networks shows that our TrGCN adds up to 1.2 accuracy points improvement on
these tasks.
SIGL: Securing Software Installations Through Deep Graph Learning
Many users implicitly assume that software can only be exploited after it is
installed. However, recent supply-chain attacks demonstrate that application
integrity must be ensured during installation itself. We introduce SIGL, a new
tool for detecting malicious behavior during software installation. SIGL
collects traces of system call activity, building a data provenance graph that
it analyzes using a novel autoencoder architecture with a graph long short-term
memory network (graph LSTM) for the encoder and a standard multilayer
perceptron for the decoder. SIGL flags suspicious installations as well as the
specific installation-time processes that are likely to be malicious. Using a
test corpus of 625 malicious installers containing real-world malware, we
demonstrate that SIGL has a detection accuracy of 96%, outperforming similar
systems from industry and academia by up to 87% in precision and recall and 45%
in accuracy. We also demonstrate that SIGL can pinpoint the processes most
likely to have triggered malicious behavior, works on different audit platforms
and operating systems, and is robust to training data contamination and
adversarial attack. It can be used with application-specific models, even in
the presence of new software versions, as well as application-agnostic
meta-models that encompass a wide range of applications and installers.
Comment: 18 pages, to appear in the 30th USENIX Security Symposium (USENIX
Security '21)