1,929 research outputs found
Using noun phrases extraction for the improvement of hybrid clustering with text- and citation-based components. The example of “Information Systems Research”
The hybrid clustering approach combining lexical and link-based similarities suffered for a long time from the different properties of the underlying networks. We propose a method based on noun phrase extraction using natural language processing to improve the measurement of the lexical component. Term shingles of different length are created form each of the extracted noun phrases. Hybrid networks are built based on weighted combination of the two types of similarities with seven different weights. We conclude that removing all single term shingles provides the best results at the level of computational feasibility, comparability with bibliographic coupling and also in a community detection application
Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization
Topic models are a class of unsupervised learning algorithms for detecting
the semantic structure within a text corpus. Together with a subsequent
dimensionality reduction algorithm, topic models can be used for deriving
spatializations for text corpora as two-dimensional scatter plots, reflecting
semantic similarity between the documents and supporting corpus analysis.
Although the choice of the topic model, the dimensionality reduction, and their
underlying hyperparameters significantly impact the resulting layout, it is
unknown which particular combinations result in high-quality layouts with
respect to accuracy and perception metrics. To investigate the effectiveness of
topic models and dimensionality reduction methods for the spatialization of
corpora as two-dimensional scatter plots (or basis for landscape-type
visualizations), we present a large-scale, benchmark-based computational
evaluation. Our evaluation consists of (1) a set of corpora, (2) a set of
layout algorithms that are combinations of topic models and dimensionality
reductions, and (3) quality metrics for quantifying the resulting layout. The
corpora are given as document-term matrices, and each document is assigned to a
thematic class. The chosen metrics quantify the preservation of local and
global properties and the perceptual effectiveness of the two-dimensional
scatter plots. By evaluating the benchmark on a computing cluster, we derived a
multivariate dataset with over 45 000 individual layouts and corresponding
quality metrics. Based on the results, we propose guidelines for the effective
design of text spatializations that are based on topic models and
dimensionality reductions. As a main result, we show that interpretable topic
models are beneficial for capturing the structure of text corpora. We
furthermore recommend the use of t-SNE as a subsequent dimensionality
reduction.Comment: To be published at IEEE VIS 2023 conferenc
Pragmatic Ontology Evolution: Reconciling User Requirements and Application Performance
Increasingly, organizations are adopting ontologies to describe their large catalogues of items. These ontologies need to evolve regularly in response to changes in the domain and the emergence of new requirements. An important step of this process is the selection of candidate concepts to include in the new version of the ontology. This operation needs to take into account a variety of factors and in particular reconcile user requirements and application performance. Current ontology evolution methods focus either on ranking concepts according to their relevance or on preserving compatibility with existing applications. However, they do not take in consideration the impact of the ontology evolution process on the performance of computational tasks – e.g., in this work we focus on instance tagging, similarity computation, generation of recommendations, and data clustering. In this paper, we propose the Pragmatic Ontology Evolution (POE) framework, a novel approach for selecting from a group of candidates a set of concepts able to produce a new version of a given ontology that i) is consistent with the a set of user requirements (e.g., max number of concepts in the ontology), ii) is parametrised with respect to a number of dimensions (e.g., topological considerations), and iii) effectively supports relevant computational tasks. Our approach also supports users in navigating the space of possible solutions by showing how certain choices, such as limiting the number of concepts or privileging trendy concepts rather than historical ones, would reflect on the application performance. An evaluation of POE on the real-world scenario of the evolving Springer Nature taxonomy for editorial classification yielded excellent results, demonstrating a significant improvement over alternative approaches
Improving Bag of Visual Words Representations with Genetic Programming
The bag of visual words is a well established representation in diverse computer vision problems. Taking inspiration from the fields of text mining and retrieval, this representation has proved to be very effective in a large number of domains. In most cases, a standard term-frequency weighting scheme is considered for representing images and videos in computer vision. This is somewhat surprising, as there are many alternative ways of generating bag of words representations within the text processing community. This paper explores the use of alternative weighting schemes for landmark tasks in computer vision: image categorization and gesture recognition. We study the suitability of using well-known supervised and unsupervised weighting schemes for such tasks. More importantly, we devise a genetic program that learns new ways of representing images and videos under the bag of visual words representation. The proposed method learns to combine term-weighting primitives trying to maximize the classification performance. Experimental results are reported in standard image and video data sets showing the effectiveness of the proposed evolutionary algorithm
What attracts vehicle consumers’ buying:A Saaty scale-based VIKOR (SSC-VIKOR) approach from after-sales textual perspective?
Purpose:
The increasingly booming e-commerce development has stimulated vehicle consumers to express individual reviews through online forum. The purpose of this paper is to probe into the vehicle consumer consumption behavior and make recommendations for potential consumers from textual comments viewpoint.
Design/methodology/approach:
A big data analytic-based approach is designed to discover vehicle consumer consumption behavior from online perspective. To reduce subjectivity of expert-based approaches, a parallel NaĂŻve Bayes approach is designed to analyze the sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain specific sentimental value of attribute class, contributing to the multi-grade sentiment classification. To achieve the intelligent recommendation for potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytical viewpoint.
Findings:
The big data analytics argue that “cost-effectiveness” characteristic is the most important factor that vehicle consumers care, and the data mining results enable automakers to better understand consumer consumption behavior.
Research limitations/implications:
The case study illustrates the effectiveness of the integrated method, contributing to much more precise operations management on marketing strategy, quality improvement and intelligent recommendation.
Originality/value:
Researches of consumer consumption behavior are usually based on survey-based methods, and mostly previous studies about comments analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill the gap from the big data perspective
- …