4,835 research outputs found
Dual Long Short-Term Memory Networks for Sub-Character Representation Learning
Characters have commonly been regarded as the minimal processing unit in
Natural Language Processing (NLP). But many non-latin languages have
hieroglyphic writing systems, involving a big alphabet with thousands or
millions of characters. Each character is composed of even smaller parts, which
are often ignored by the previous work. In this paper, we propose a novel
architecture employing two stacked Long Short-Term Memory Networks (LSTMs) to
learn sub-character level representation and capture deeper level of semantic
meanings. To build a concrete study and substantiate the efficiency of our
neural architecture, we take Chinese Word Segmentation as a research case
example. Among those languages, Chinese is a typical case, for which every
character contains several components called radicals. Our networks employ a
shared radical level embedding to solve both Simplified and Traditional Chinese
Word Segmentation, without extra Traditional to Simplified Chinese conversion,
in such a highly end-to-end way the word segmentation can be significantly
simplified compared to the previous work. Radical level embeddings can also
capture deeper semantic meaning below character level and improve the system
performance of learning. By tying radical and character embeddings together,
the parameter count is reduced whereas semantic knowledge is shared and
transferred between two levels, boosting the performance largely. On 3 out of 4
Bakeoff 2005 datasets, our method surpassed state-of-the-art results by up to
0.4%. Our results are reproducible, source codes and corpora are available on
GitHub.Comment: Accepted & forthcoming at ITNG-201
Long short-term memory networks for earthquake detection in Venezuelan regions
Reliable earthquake detection and location algorithms are necessary to properly catalog and analyze the continuously growing seismic records. This paper reports the results of applying Long Short-Term Memory (LSTM) networks to single-station three-channel waveforms for P-wave earthquake detection in western and north central regions of Venezuela. Precisely, we apply our technique to study the seismicity along the dextral strike-slip Boconó and La Victoria - San Sebastián faults, with complex tectonics driven by the interactions between the South American and Caribbean plates.Peer ReviewedPostprint (author's final draft
Response Characterization for Auditing Cell Dynamics in Long Short-term Memory Networks
In this paper, we introduce a novel method to interpret recurrent neural
networks (RNNs), particularly long short-term memory networks (LSTMs) at the
cellular level. We propose a systematic pipeline for interpreting individual
hidden state dynamics within the network using response characterization
methods. The ranked contribution of individual cells to the network's output is
computed by analyzing a set of interpretable metrics of their decoupled step
and sinusoidal responses. As a result, our method is able to uniquely identify
neurons with insightful dynamics, quantify relationships between dynamical
properties and test accuracy through ablation analysis, and interpret the
impact of network capacity on a network's dynamical distribution. Finally, we
demonstrate generalizability and scalability of our method by evaluating a
series of different benchmark sequential datasets
Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Path
Relation classification is an important research arena in the field of
natural language processing (NLP). In this paper, we present SDP-LSTM, a novel
neural network to classify the relation of two entities in a sentence. Our
neural architecture leverages the shortest dependency path (SDP) between two
entities; multichannel recurrent neural networks, with long short term memory
(LSTM) units, pick up heterogeneous information along the SDP. Our proposed
model has several distinct features: (1) The shortest dependency paths retain
most relevant information (to relation classification), while eliminating
irrelevant words in the sentence. (2) The multichannel LSTM networks allow
effective information integration from heterogeneous sources over the
dependency paths. (3) A customized dropout strategy regularizes the neural
network to alleviate overfitting. We test our model on the SemEval 2010
relation classification task, and achieve an -score of 83.7\%, higher than
competing methods in the literature.Comment: EMNLP '1
- …