Explanation and Downscalability of Google's Dependency Parser Parsey McParseface
Using the data collected during the hyperparameter tuning of Google's dependency parser Parsey McParseface, feedforward neural networks and the correlations between their hyperparameters during training are explained and analysed in depth.
1 Introduction to Neural Networks 4
1.1 History of AI 4
1.2 The role of Neural Networks in AI Research 6
1.2.1 Artificial Intelligence 6
1.2.2 Machine Learning 6
1.2.3 Neural Network 8
1.3 Structure of Neural Networks 8
1.3.1 Biology Analogy of Artificial Neural Networks 9
1.3.2 Architecture of Artificial Neural Networks 9
1.3.3 Biological Model of Nodes – Neurons 11
1.3.4 Structure of Artificial Neurons 12
1.4 Training a Neural Network 21
1.4.1 Data 21
1.4.2 Hyperparameters 22
1.4.3 Training process 26
1.4.4 Overfitting 27
2 Natural Language Processing (NLP) 29
2.1 Data Preparation 29
2.1.1 Text Preprocessing 29
2.1.2 Part-of-Speech Tagging 30
2.2 Dependency Parsing 31
2.2.1 Dependency Grammar 31
2.2.2 Dependency Parsing Rule-Based & Data-Driven Approach 33
2.2.3 Syntactic Parser 33
2.3 Parsey McParseface 34
2.3.1 SyntaxNet 34
2.3.2 Corpus 34
2.3.3 Architecture 34
2.3.4 Improvements to the Feed Forward Neural Network 38
3 Training of Parsey’s Cousins 41
3.1 Training a Model 41
3.1.1 Building the Framework 41
3.1.2 Corpus 41
3.1.3 Training Process 43
3.1.4 Settings for the Training 44
3.2 Results and Analysis 46
3.2.1 Results from Google’s Models 46
3.2.2 Effect of Hyperparameter 47
4 Conclusion 63
5 Bibliography 65
6 Appendix 7
On Multilingual Training of Neural Dependency Parsers
We show that a recently proposed neural dependency parser can be improved by
joint training on multiple languages from the same family. The parser is
implemented as a deep neural network whose only input is orthographic
representations of words. In order to successfully parse, the network has to
discover how linguistically relevant concepts can be inferred from word
spellings. We analyze the representations of characters and words that are
learned by the network to establish which properties of languages were
accounted for. In particular we show that the parser has approximately learned
to associate Latin characters with their Cyrillic counterparts and that it can
group Polish and Russian words that have a similar grammatical function.
Finally, we evaluate the parser on selected languages from the Universal
Dependencies dataset and show that it is competitive with other recently
proposed state-of-the-art methods, while having a simple structure.
Comment: preprint accepted into the TSD201
Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing
We present a novel technique to remove spurious ambiguity from transition
systems for dependency parsing. Our technique chooses a canonical sequence of
transition operations (computation) for a given dependency tree. Our technique
can be applied to a large class of bottom-up transition systems, including for
instance Nivre (2004) and Attardi (2006).
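As a hedged illustration (a sketch, not the paper's own construction), one common way to obtain a single canonical computation per tree in an arc-standard system like Nivre (2004) is an oracle that performs each arc as early as it becomes legal; the `heads` encoding and the example sentence below are hypothetical:

```python
def canonical_oracle(heads):
    """Derive a canonical arc-standard transition sequence for a
    projective dependency tree, attaching each dependent as early as
    possible so that exactly one computation is chosen per tree.

    heads[i] is the head of token i (tokens are 1..n, 0 is the root)."""
    n = len(heads) - 1
    # number of dependents each token still has to collect
    remaining = [0] * len(heads)
    for d in range(1, n + 1):
        remaining[heads[d]] += 1

    stack, buffer, transitions = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]
            # RIGHT-ARC: s2 -> s1, legal once s1 has collected all dependents
            if heads[s1] == s2 and remaining[s1] == 0:
                transitions.append(("RIGHT-ARC", s2, s1))
                stack.pop()
                remaining[s2] -= 1
                continue
            # LEFT-ARC: s1 -> s2 (the root at position 0 is never a dependent)
            if s2 != 0 and heads[s2] == s1 and remaining[s2] == 0:
                transitions.append(("LEFT-ARC", s1, s2))
                stack.pop(-2)
                remaining[s1] -= 1
                continue
        if not buffer:
            raise ValueError("tree is not projective")
        transitions.append(("SHIFT", buffer[0]))
        stack.append(buffer.pop(0))
    return transitions
```

For "She ate fish" with heads = [0, 2, 0, 2], this oracle yields SHIFT, SHIFT, LEFT-ARC, SHIFT, RIGHT-ARC, RIGHT-ARC; delaying the LEFT-ARC until after the third SHIFT would derive the same tree, which is exactly the spurious ambiguity that fixing a canonical sequence removes.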
Deterministic choices in a data-driven parser.
Data-driven parsers rely on recommendations from parse models,
which are generated from a set of training data using a machine learning classifier,
to perform parse operations. However, in some cases a parse model cannot
recommend a parse action to a parser unless it has learned from the training
data what parse action(s) to take in every possible situation. It is therefore
hard for a parser to make an informed decision about which parse operation
to perform when the parse model recommends either no parse action or several.
Here we examine the effect of various deterministic choices on a data-driven
parser when it is presented with no recommendation, or several, from a
parse model.
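The kind of deterministic choice discussed above can be sketched as follows; the confidence threshold, the SHIFT default, and the alphabetical tie-break are hypothetical choices for illustration, not the strategies evaluated in the paper:

```python
def choose_action(scores, threshold=0.5, fallback="SHIFT"):
    """Deterministically pick a parse action from model scores.

    scores: dict mapping parse actions to model confidences.
    If no action clears the threshold, fall back to a default action;
    if several actions tie at the top score, break the tie alphabetically."""
    viable = [a for a, s in scores.items() if s >= threshold]
    if not viable:
        return fallback  # no recommendation: take the default action
    top = max(scores[a] for a in viable)
    tied = sorted(a for a in viable if scores[a] == top)
    return tied[0]  # several recommendations: deterministic tie-break
```

A policy like this guarantees the parser always has exactly one action to execute, regardless of how many recommendations the model produces.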
OCRonym: Entity Extraction and Retrieval for Scanned Books
In the past five years, massive book-scanning projects have produced an explosion in the number of sources for the humanities, available on-line to the broadest possible audiences. Transcribing page images by optical character recognition makes many searching and browsing tasks practical for scholars. But even low OCR error rates compound into a high probability of error in a given sentence, and the error rate is even higher for names. We propose to build a prototype system for information extraction and retrieval of noisy OCR. In particular, we will optimize the extraction and retrieval of names, which are highly informative features for detecting topics and events in documents. We will build statistical models of characters and words from scanned books to improve lexical coverage, and we will improve name categorization and disambiguation by linking document contexts to external sources such as Wikipedia. Our testbed comes from over one million scanned books from the Internet Archive.