Better Document-level Sentiment Analysis from RST Discourse Parsing
Discourse structure is the hidden link between surface features and
document-level properties, such as sentiment polarity. We show that the
discourse analyses produced by Rhetorical Structure Theory (RST) parsers can
improve document-level sentiment analysis, via composition of local information
up the discourse tree. First, we show that reweighting discourse units
according to their position in a dependency representation of the rhetorical
structure can yield substantial improvements on lexicon-based sentiment
analysis. Next, we present a recursive neural network over the RST structure,
which offers significant improvements over classification-based methods.
Comment: Published at Empirical Methods in Natural Language Processing (EMNLP 2015)
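The reweighting idea in this abstract can be illustrated with a minimal sketch. It assumes each discourse unit comes with its depth in the dependency representation of the RST structure, as produced by an external parser. The toy lexicon, the linear decay in `depth_weight`, and all function names are hypothetical choices made for illustration, not the weighting used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real system would load a full sentiment lexicon.
LEXICON: Dict[str, float] = {"great": 1.0, "good": 1.0, "bad": -1.0, "boring": -1.0}


def unit_score(text: str) -> float:
    """Sum the lexicon polarities of the whitespace-separated tokens in one unit."""
    return sum(LEXICON.get(token, 0.0) for token in text.lower().split())


def depth_weight(depth: int, decay: float = 0.25, floor: float = 0.25) -> float:
    """Assumed weighting: units deeper in the discourse tree count less."""
    return max(floor, 1.0 - decay * depth)


def document_score(units: List[Tuple[str, int]]) -> float:
    """Combine unit scores, each scaled by the weight for its tree depth."""
    return sum(depth_weight(depth) * unit_score(text) for text, depth in units)


# Each entry is (discourse unit text, depth in the discourse dependency tree).
units = [
    ("the film was great", 0),
    ("although the middle was boring", 2),
    ("and the sound was bad", 3),
]

flat = sum(unit_score(text) for text, _ in units)  # -1.0: plain lexicon sum
weighted = document_score(units)                   # 0.25: root unit dominates
```

In this toy example the unweighted sum is negative, while the depth-weighted score is positive because the unit at the root of the tree carries the most weight.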
Designing a structured lexicon for document image analysis
This paper presents a structured, multi-level lexicon architecture that is a central component of our knowledge-based document analysis system. The system automatically transforms incoming business letters into an equivalent electronic representation. In addition, partial text analysis and understanding of the letter's body and other relevant parts are initiated to enrich the conceptual knowledge about the document (e.g., by classification). In such an application domain, a well-designed lexicon must meet the requirements of both text recognition and text analysis. To that end, we propose an appropriate lexicon architecture and an internal structure for the corresponding lexical entries, which is a prerequisite for successful higher-level interpretation of documents.
Improving Document Representation Using Retrofitting
Data-driven learning of document vectors that capture the linkage between documents is of great importance in natural language processing (NLP). These document vectors can, in turn, be used for tasks such as information retrieval, document classification, and clustering. Documents are inherently connected, through hyperlinks in the case of web pages or citations in the case of academic papers. Methods such as PV-DM and PV-DBOW capture the semantic representation of a document using only its text and ignore the network information altogether. Conversely, network representation learning methods such as node2vec and DeepWalk capture the linkage between documents but ignore the text. In this thesis, we propose a method based on retrofitting, a technique originally used to refine word embeddings with a semantic lexicon, that incorporates both text and network information when learning the document representation. We also analyze the optimal weight for the network information, that is, the weight that yields the best embedding. Our experimental results show that the method improves the classification score by 4%. We also introduce a new dataset containing both network and content information.
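As a rough illustration of how retrofitting can blend text and network information, the sketch below applies the general retrofitting update: each document vector is repeatedly replaced by a weighted average of its original text-based vector and the current vectors of its linked neighbours. The abstract does not give the thesis's exact objective, so the function name `retrofit`, the parameters `alpha` and `beta`, and the uniform neighbour weighting are assumptions.

```python
from typing import Dict, List

Vector = List[float]


def retrofit(
    text_vecs: Dict[str, Vector],
    links: Dict[str, List[str]],
    alpha: float = 1.0,   # assumed weight on the original text-based vector
    beta: float = 1.0,    # assumed weight on each linked neighbour
    iterations: int = 10,
) -> Dict[str, Vector]:
    """Pull each document vector towards the vectors of its linked documents."""
    new_vecs = {doc: list(vec) for doc, vec in text_vecs.items()}
    for _ in range(iterations):
        for doc, neighbours in links.items():
            # Skip unknown documents and ignore links to documents without vectors.
            known = [n for n in neighbours if n in new_vecs]
            if doc not in new_vecs or not known:
                continue
            total = alpha + beta * len(known)
            new_vecs[doc] = [
                (alpha * text_vecs[doc][k]
                 + beta * sum(new_vecs[n][k] for n in known)) / total
                for k in range(len(text_vecs[doc]))
            ]
    return new_vecs


# Toy input: "a" and "b" cite each other, "c" has no links.
text_vecs = {"a": [0.0, 0.0], "b": [2.0, 2.0], "c": [5.0, 5.0]}
links = {"a": ["b"], "b": ["a"]}
retrofitted = retrofit(text_vecs, links, iterations=20)
```

Raising `beta` relative to `alpha` gives the network more influence, which is the kind of trade-off the abstract describes tuning. Documents without links keep their text-based vectors unchanged.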
Using NLP tools in the specification phase
Software quality control is one of the main topics in software
engineering. Investing effort in quality control during the
specification phase lets us detect possible mistakes at an early
stage and correct them easily before the design and implementation
steps start. In this framework, the goal of the SAREL system, a
knowledge-based system, is twofold: on the one hand, to help software
engineers create high-quality Software Requirements
Specifications; on the other hand, to analyze the correspondence between
two different conceptual representations associated with two different
Software Requirements Specification documents.
For the first goal, a set of NLP and knowledge management tools is
applied to obtain a conceptual representation that can be validated and
managed by the software engineer.
For the second goal, we have established some correspondence measures in
order to compare two conceptual representations. This
information will be useful during the interaction.
Postprint (published version)