POS Tagging and its Applications for Mathematics
Content analysis of scientific publications is a nontrivial task, but a
useful and important one for scientific information services. In the Gutenberg
era it was a domain of human experts; in the digital age many machine-based
methods, e.g., graph analysis tools and machine-learning techniques, have been
developed for it. Natural Language Processing (NLP) is a powerful
machine-learning approach to semiautomatic speech and language processing,
which is also applicable to mathematics. The well-established methods of NLP
have to be adjusted for the special needs of mathematics, in particular for
handling mathematical formulae. We demonstrate a mathematics-aware
part-of-speech tagger and give a short overview of our adaptation of NLP
methods for mathematical publications. We show the use of the tools developed
for key phrase extraction and classification in the database zbMATH.
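One common way to keep formulae from confusing a general-purpose tagger is to replace each formula with a single placeholder token before tokenizing. The sketch below illustrates that idea only. The abstract does not say how the authors' tagger treats formulae, so the placeholder name `FORMULA`, the `$...$` delimiter pattern, and both helper functions are assumptions made for illustration.

```python
import re

# Hypothetical placeholder token; not taken from the paper.
FORMULA_TOKEN = "FORMULA"

# Assumption: formulae appear as inline LaTeX delimited by single dollar signs.
INLINE_MATH = re.compile(r"\$[^$]+\$")


def mask_formulae(text):
    """Replace each inline formula with a placeholder.

    Returns the masked text and the list of formulae that were removed,
    in order of appearance, so they can be restored after tagging.
    """
    formulae = INLINE_MATH.findall(text)
    masked = INLINE_MATH.sub(FORMULA_TOKEN, text)
    return masked, formulae


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Let $x^2 + y^2 = 1$ be the unit circle in $\\mathbb{R}^2$."
masked, formulae = mask_formulae(sentence)
tokens = tokenize(masked)
```

A downstream tagger could then assign the placeholder its own tag, so that each formula behaves like a single noun-like token in the sentence.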
An approach to graph-based analysis of textual documents
In this paper a new graph-based model is proposed for the representation of textual documents. Graph structures are obtained from textual documents by making use of the well-known Part-Of-Speech (POS) tagging technique. More specifically, a simple rule-based (re)classifier is used to map each tag onto graph vertices and edges. As a result, a decomposition of textual documents is obtained in which tokens are automatically parsed and attached to either a vertex or an edge. It is shown how textual documents can be aggregated through their graph structures and, finally, how vertex-ranking methods can be used to find relevant tokens.
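The mapping from POS tags to vertices and edges can be sketched roughly as follows. This is a minimal illustration, not the paper's rule set: it assumes Penn Treebank tags, treats nouns as vertices and verbs or prepositions as edge labels, and uses plain degree counting as a stand-in for the vertex-ranking methods the abstract mentions. The function names and the hand-tagged input are hypothetical.

```python
from collections import defaultdict

# Assumed rules (not from the paper): nouns become vertices,
# verbs and prepositions become edge labels.
VERTEX_TAGS = {"NN", "NNS", "NNP", "NNPS"}
EDGE_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "IN"}


def build_graph(tagged_tokens):
    """Link consecutive vertex tokens using the edge tokens seen between them.

    Returns a list of (source, label, target) triples.
    """
    edges = []
    previous_vertex = None
    pending_labels = []
    for token, tag in tagged_tokens:
        if tag in VERTEX_TAGS:
            # Only connect two vertices if at least one edge token separates them.
            if previous_vertex is not None and pending_labels:
                edges.append((previous_vertex, " ".join(pending_labels), token))
            previous_vertex = token
            pending_labels = []
        elif tag in EDGE_TAGS and previous_vertex is not None:
            pending_labels.append(token)
    return edges


def rank_vertices(edges):
    """Rank vertices by degree, highest first (ties broken alphabetically)."""
    degree = defaultdict(int)
    for source, _label, target in edges:
        degree[source] += 1
        degree[target] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hand-tagged toy input; a real pipeline would obtain tags from a POS tagger.
tagged = [
    ("graph", "NN"), ("links", "VBZ"), ("tokens", "NNS"),
    ("graph", "NN"), ("ranks", "VBZ"), ("vertices", "NNS"),
]
edges = build_graph(tagged)
ranking = rank_vertices(edges)
```

Because vertices are keyed by token text, graphs built from several documents can be merged simply by concatenating their edge lists before ranking, which is one simple reading of the aggregation step.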
A Machine Learning Approach For Opinion Holder Extraction In Arabic Language
Opinion mining aims at extracting useful subjective information from reliable
amounts of text. Opinion holder recognition is a task that has not yet been
considered for the Arabic language. This task essentially requires a deep
understanding of clause structures. Unfortunately, the lack of a robust,
publicly available Arabic parser further complicates the research. This paper
presents pioneering research on opinion holder extraction in Arabic news that
is independent of any lexical parsers. We investigate constructing a
comprehensive feature set to compensate for the lack of structural parsing
output. The proposed feature set is adapted from previous work on English and
coupled with our proposed semantic field and named entity features. Our feature
analysis is based on Conditional Random Fields (CRF) and semi-supervised
pattern recognition techniques. Different research models are evaluated via
cross-validation experiments, achieving an F-measure of 54.03. We publicly
release our research corpus and lexicon to the opinion mining community to
encourage further research.