Jointly Modeling Topics and Intents with Global Order Structure
Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture document intent structure by modeling documents as a mixture of topic words and rhetorical words. While topics remain relatively stable throughout a document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, an unsupervised Bayesian model based on topic modeling, to analyze document intent structure while incorporating order information. The model is flexible: it can incorporate annotations and perform supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings, and the results show the superiority of our model over several state-of-the-art baselines.
Comment: Accepted by AAAI 201
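The abstract's core modeling idea (a topic that stays fixed across a document, plus sentence-level rhetorical intents that change in a consistent order, with each word coming from one or the other) can be illustrated with a toy generative sketch. This is not GMM-LDA and does not reproduce its inference or its treatment of ordering; the vocabularies, the intent labels, the fixed `INTENT_ORDER`, the 0.5 advance probability, and the `topic_prob` switch are all hypothetical assumptions chosen only to make the mixture visible.

```python
import random

# Hypothetical toy vocabularies; not taken from the paper.
TOPIC_WORDS = {"parsing": ["parser", "tree", "grammar", "syntax"],
               "retrieval": ["query", "index", "ranking", "document"]}
INTENT_WORDS = {"background": ["previous", "known", "studied"],
                "method": ["propose", "model", "approach"],
                "result": ["outperforms", "shows", "improves"]}
# Assumed canonical intent order; the real model learns order structure from data.
INTENT_ORDER = ["background", "method", "result"]

def generate_document(num_sentences, words_per_sentence, topic_prob, rng):
    """Generate (intent, words) pairs for one synthetic document.

    The topic is drawn once per document, so it stays fixed across sentences,
    while sentence intents only move forward through INTENT_ORDER.
    """
    topic = rng.choice(sorted(TOPIC_WORDS))
    position = 0
    sentences = []
    for i in range(num_sentences):
        # After the first sentence, either keep the current intent or advance
        # to the next one, so intents never go backwards.
        if i > 0 and position < len(INTENT_ORDER) - 1 and rng.random() < 0.5:
            position += 1
        intent = INTENT_ORDER[position]
        words = []
        for _ in range(words_per_sentence):
            # Switch variable: a topic word with probability topic_prob,
            # otherwise a rhetorical word tied to the sentence's intent.
            if rng.random() < topic_prob:
                words.append(rng.choice(TOPIC_WORDS[topic]))
            else:
                words.append(rng.choice(INTENT_WORDS[intent]))
        sentences.append((intent, words))
    return topic, sentences

rng = random.Random(0)
topic, sentences = generate_document(5, 6, 0.6, rng)
print("topic:", topic)
for intent, words in sentences:
    print(intent, " ".join(words))
```

An inference model would run this story in reverse, recovering the per-document topic, the per-sentence intents, and the word-level switch from observed text.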
TechMiner: Extracting Technologies from Academic Publications
In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach that combines NLP, machine learning, and semantic technologies to mine technologies from research publications and generate an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard, and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features significantly improve both recall and precision.
Automatic Title Generation in Scientific Articles for Authorship Assistance: A Summarization Approach
This paper presents a study on automatic title generation for scientific articles that takes into account sentence information types known as rhetorical categories. A title can be seen as a high-compression summary of a document. A rhetorical category is the type of information an author conveys in each textual unit, for example the background, method, or result of the research. The experiment in this study focused on extracting research purpose and research method information for inclusion in a computer-generated title. Sentences are classified into rhetorical categories and then filtered using three methods. Three title candidates whose contents reflect the filtered sentences are then generated using either a template-based or an adaptive K-nearest neighbor approach. The experiment was conducted on datasets from two domains: computational linguistics and chemistry. Compared with the original titles, the computer-generated titles obtained average F1-measure scores of 0.109 to 0.255. In a human evaluation, the automatically generated titles were deemed 'relatively acceptable' in the computational linguistics domain and 'not acceptable' in the chemistry domain. It can be concluded that rhetorical categories have unexplored potential to improve the performance of summarization tasks in general.
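The abstract reports F1-measure scores for generated titles against the original titles but does not say how words are matched. The sketch below assumes a common choice, unigram overlap with lowercase whitespace tokenization and clipped word counts (similar in spirit to ROUGE-1 F), to show how precision, recall, and F1 combine; the function name `title_f1` and the example titles are illustrative, and the paper's actual scoring may differ.

```python
from collections import Counter

def title_f1(generated, reference):
    """Unigram-overlap F1 between a generated title and the original title.

    Assumption: lowercase whitespace tokenization and clipped (multiset)
    word counts; the paper's exact matching scheme may differ.
    """
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count of each shared word.
    overlap = sum((gen & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())  # shared words / generated length
    recall = overlap / sum(ref.values())     # shared words / original length
    return 2 * precision * recall / (precision + recall)

generated = "Automatic title generation using rhetorical categories"
original = "Automatic Title Generation in Scientific Articles for Authorship Assistance"
print(round(title_f1(generated, original), 3))  # 0.4
```

Here 3 of the 6 generated words match (precision 0.5) and they cover 3 of the 9 original words (recall about 0.333), giving F1 = 0.4. The numbers are only a worked example and say nothing about the paper's reported scores.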