
    Automatic Term Identification for Bibliometric Mapping

    A term map visualizes the structure of a scientific field by showing the relations between important terms in the field. The terms shown in a term map are usually selected manually with the help of domain experts. Manual term selection has the disadvantages of being subjective and labor-intensive. To overcome these disadvantages, we propose a methodology for automatic term identification, and we use this methodology to select the terms to be included in a term map. To evaluate the proposed methodology, we use it to construct a term map of the field of operations research. The quality of the map is assessed by a number of operations research experts. It turns out that, in general, the proposed methodology performs quite well.
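
    As a toy illustration of the kind of pipeline the abstract describes, the sketch below selects candidate terms by document frequency and counts their co-occurrences, which form the raw input of a term map. The candidate extraction (plain bigrams) and the frequency threshold are illustrative assumptions, not the paper's actual termhood measure.

```python
# A minimal sketch of automatic term selection for a term map.
# Assumptions (not from the paper): candidate terms are bigrams,
# and "termhood" is approximated by simple document frequency.
from collections import Counter
from itertools import combinations

docs = [
    "integer programming models for vehicle routing",
    "vehicle routing with time windows",
    "stochastic models for inventory control",
]

def candidate_terms(doc):
    words = doc.split()
    return [" ".join(p) for p in zip(words, words[1:])]  # bigrams

# Document frequency of each candidate term
term_df = Counter(t for d in docs for t in set(candidate_terms(d)))
selected = [t for t, df in term_df.most_common(50) if df >= 2]

# Co-occurrence counts between selected terms: the edges of a term map
cooc = Counter()
for d in docs:
    present = [t for t in selected if t in d]
    for a, b in combinations(sorted(present), 2):
        cooc[(a, b)] += 1

print(selected, dict(cooc))
```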

    Using Natural Language Parsers for Authorship Attribution

    The goal of authorship attribution is to find a set of unconscious writing characteristics or style features that distinguish text written by one person from text written by another. Once these features are found, they can be used to pair a text with the individual who wrote it. It is now well accepted that authors develop distinct and unconscious writing features. Over one thousand stylometric features (style markers) have been proposed in a variety of research disciplines [44], but none of that research has looked at the syntactic structure of the text. I conjecture that the distinct writing features of an author are not limited to the features already studied, but also include syntactic features. To support this hypothesis, I ran experiments using two open source parsing programs and analyzed the results to see whether the features produced by these programs were enough to determine the most probable author of a text. Parsing programs are designed to determine syntactic structures in natural language. They take a text or a writing sample and produce output showing the grammatical relationships between the words in the text. They provide a means to test the hypothesis that authors' syntactic use of words provides enough identifying characteristics to differentiate between them. Using two open source natural language parsing programs, the Link Grammar Parser and Collins' Parser, this research tested whether an author's sentence structure is unique enough to provide a means of recognizing the probable author of a text. Initial data was collected on a pool of test authors. Sample texts by each author were run through both parsers. The output of each parser was analyzed using two multivariate analysis methods: discriminant analysis and k-means clustering. My results show that syntactic sentence structures may be a viable method for authorship attribution. The Link Grammar Parser shows promise as a way to augment existing authorship attribution methods. Collins' Parser provided even better results that should be solid enough to stand on their own as a new and viable alternative to existing methods. Collins' Parser also provided new predictors that might improve current authorship attribution methods. For example, elements and phrases with wh- words and the length of noun phrases are highly correlated with authorship in this study.
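
    Below is a minimal sketch of the two multivariate analyses named in the abstract, k-means clustering and discriminant analysis, applied to stand-in syntactic feature vectors. The toy numbers and feature choices are assumptions; in the study itself the features come from Link Grammar and Collins' Parser output, not from hand-typed values.

```python
# Sketch: cluster and classify writing samples by syntactic features.
# Assumptions: feature vectors (e.g., link-type or phrase-length counts
# per sample) have already been extracted from parser output; the toy
# numbers below merely stand in for that extraction step.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# rows = writing samples, columns = hypothetical syntactic feature values
X = np.array([
    [12, 3, 0.4], [11, 4, 0.5],   # samples by author A
    [25, 9, 1.2], [27, 8, 1.1],   # samples by author B
])
y = np.array([0, 0, 1, 1])        # known authors for training

# Unsupervised view: do samples group by author without labels?
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised view: attribute a new sample via discriminant analysis
lda = LinearDiscriminantAnalysis().fit(X, y)
print(clusters, lda.predict([[26, 9, 1.0]]))
```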

    Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods

    Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be categorized both by their intended application and proposed solution. The goal is to show that various problems and methodologies that appear quite different on the surface are in fact very closely related. The axes by which these categorizations are made include the format of the contexts (headed versus headless), the way in which the contexts are to be measured (first-order versus second-order similarity), and the information used to represent the features in the contexts (micro versus macro views). The unifying thread that binds together many short context applications and methods is the fact that similarity decisions must be made between contexts that share few (if any) words in common.
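
    The sketch below illustrates the first-order versus second-order distinction mentioned in the abstract with toy data: first-order similarity compares the words the two contexts share directly, while second-order similarity compares corpus co-occurrence profiles of the words in each context. The profile values are invented for illustration, not taken from the article.

```python
# Sketch: first-order vs. second-order context similarity.
# Assumption: the tiny co-occurrence profiles below stand in for
# statistics that would normally be derived from a large corpus.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) or 1.0))

# Toy second-order vectors: each word's co-occurrence profile, with
# columns counting co-occurrence with "doctor", "nurse", "hospital", "guitar"
profiles = {
    "physician": np.array([8.0, 5.0, 9.0, 0.0]),
    "surgeon":   np.array([7.0, 4.0, 8.0, 0.0]),
}

c1 = "the physician operated".split()
c2 = "a surgeon operated".split()

# First-order: direct word overlap between the two contexts (low here)
overlap = len(set(c1) & set(c2)) / max(len(set(c1) | set(c2)), 1)

# Second-order: average the profiles of known words in each context
v1 = np.mean([profiles[w] for w in c1 if w in profiles], axis=0)
v2 = np.mean([profiles[w] for w in c2 if w in profiles], axis=0)

# The contexts share almost no words, yet are second-order similar
print(overlap, cosine(v1, v2))
```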

    Term-community-based topic detection with variable resolution

    Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is especially designed with the requirements of domain experts in mind. Like similar methods, it employs community detection in term co-occurrence graphs, but it is enhanced by a resolution parameter that can be used to change the targeted topic granularity. We also establish a term ranking and use semantic word embeddings to present term communities in a way that facilitates their interpretation. We demonstrate the application of our method on a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations of detected topics at various resolutions. A comparison with topics detected by Latent Dirichlet Allocation is also included. Finally, we discuss factors that influence topic interpretation.
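
    As a rough illustration of resolution-controlled community detection on a term co-occurrence graph, the sketch below uses networkx's Louvain implementation as a stand-in for the paper's own community algorithm, which the abstract does not name; the edge weights are invented.

```python
# Sketch: term communities at varying resolution in a co-occurrence graph.
# Assumptions: Louvain (networkx >= 2.8) stands in for the paper's
# community detection method, and the tiny weighted edge list below
# replaces real co-occurrence counts from a corpus.
import networkx as nx

edges = [  # (term, term, co-occurrence weight)
    ("election", "party", 9), ("party", "vote", 8), ("election", "vote", 7),
    ("match", "goal", 9), ("goal", "league", 8), ("match", "league", 7),
    ("vote", "match", 1),  # weak cross-topic link
]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# Higher resolution favors more, smaller term communities (finer topics)
for res in (0.5, 1.0, 2.0):
    comms = nx.community.louvain_communities(
        G, weight="weight", resolution=res, seed=42)
    print(res, [sorted(c) for c in comms])
```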