72 research outputs found
Preserving Bibliographic Relationships in Mappings from FRBR to BIBFRAME 2.0
In the environment of the World Wide Web, large volumes of library data have been published following different conceptual models. Navigating these volumes and interlinking the data require the development of mappings between the conceptual models. Library conceptual models provide constructs for the representation of bibliographic families and the relationships between Works. A key requirement for successful mappings between different conceptual models is to preserve such content relationships. This paper studies a set of cases (Work with single Expression, Work with multiple Expressions, translation, adaptation) to examine if and how bibliographic content relationships and families could be preserved in mappings from FRBR to BIBFRAME 2.0. Even though relationships between Works of the same bibliographic family may be preserved, the progenitor Work is not always represented in BIBFRAME after mappings.
Topic Modeling and Text Analysis for Qualitative Policy Research
This paper contributes to a critical methodological discussion that has direct ramifications for policy studies: how computational methods can be concretely incorporated into existing processes of textual analysis and interpretation without compromising scientific integrity. We focus on the computational method of topic modeling and investigate how it interacts with two larger families of qualitative methods: content and classification methods characterized by interest in words as communication units and discourse and representation methods characterized by interest in the meaning of communicative acts. Based on analysis of recent academic publications that have used topic modeling for textual analysis, our findings show that different mixed‐method research designs are appropriate when combining topic modeling with the two groups of methods. Our main concluding argument is that topic modeling enables scholars to apply policy theories and concepts to much larger sets of data. That said, the use of computational methods requires genuine understanding of these techniques to obtain substantially meaningful results. We encourage policy scholars to reflect carefully on methodological issues, and offer a simple heuristic to help identify and address critical points when designing a study using topic modeling.
A unified latent variable model for contrastive opinion mining
There are large and growing textual corpora in which people express contrastive opinions about the same topic. This has led to an increasing number of studies about contrastive opinion mining. However, there are several notable issues with the existing studies. They mostly focus on mining contrastive opinions from multiple data collections, which need to be separated into their respective collections beforehand. In addition, existing models are opaque in terms of the relationship between the topics that are extracted and the sentences in the corpus that express them; this opacity does not help us understand the opinions expressed in the corpus. Finally, contrastive opinion is mostly analysed qualitatively rather than quantitatively. This paper addresses these matters and proposes a novel unified latent variable model (contraLDA), which mines contrastive opinions from both single and multiple data collections, extracts the sentences that project the contrastive opinion, and measures the strength of opinion contrastiveness towards the extracted topics. Experimental results show the effectiveness of our model in mining contrasted opinions: it outperformed our baselines in extracting coherent and informative sentiment-bearing topics. We further show the accuracy of our model in classifying topics and sentiments of textual data, and we compare our results to five strong baselines.
Flying to Quality: Cultural Influences on Online Reviews
Customers increasingly consult opinions expressed online before making their final decisions. However, inherent factors such as culture may moderate the criteria and the weights individuals use to form their expectations and evaluations. Therefore, not all opinions expressed online match customers’ personal preferences, nor can firms use this information to draw general conclusions. Our study explores this issue in the context of airline services using Hofstede’s framework as a theoretical anchor. We gauge the effect of each dimension, as well as that of cultural distance between the passenger and the airline, on overall satisfaction with the flight and on specific service factors. Using topic modeling, we also capture the effect of culture on review text and identify factors that are not captured by conventional rating scales. Our results provide significant insights for airline managers about service factors that affect passengers from specific cultures more strongly, leading to higher satisfaction or dissatisfaction.
Mining a Digital Library for Influential Authors
When browsing a digital library of research papers, it is natural to ask which authors are most influential in a particular topic. We present a probabilistic model that ranks authors based on their influence in particular areas of scientific research. This model combines several sources of information: citation information between documents as represented by PageRank scores, authorship data gathered through automatic information extraction, and the words in paper abstracts. We propose a topic model on the words, and compare performance versus a smoothed language model by assessing the number of major award winners in the resulting ranked list of researchers
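As a rough illustration of the PageRank scores that the abstract above lists as one of its information sources, the following is a minimal sketch of PageRank computed by power iteration over a citation graph. It is not the authors' model: the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy citation data are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each paper id to the list of paper ids it cites.
    All names and default values here are illustrative assumptions.
    """
    # Collect every paper that appears as a citing or cited document.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution over papers.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Each citing paper passes an equal share of its rank to each cited paper.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical toy citation graph: A and B cite C, and C cites D.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = pagerank(citations)
```

In this toy graph, paper C is cited by two papers and so ends up with a higher score than A or B, which are never cited. Per-author influence could then be derived by aggregating document scores over each author's papers, though how the paper combines these scores with authorship and abstract text is not specified in the abstract.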
Organizing the OCA: Learning Faceted Subjects from a Library of Digital Books
Large scale library digitization projects such as the Open Content Alliance are producing vast quantities of text, but little has been done to organize this data. Subject headings inherited from card catalogs are useful but limited, while full-text indexing is most appropriate for readers who already know exactly what they want. Statistical topic models provide a complementary function. These models can identify semantically coherent "topics" that are easily recognizable and meaningful to humans, but they have been too computationally intensive to run on library-scale corpora. This paper presents DCM-LDA, a topic model based on Dirichlet Compound Multinomial distributions. This model is simultaneously better able to represent observed properties of text and more scalable to extremely large text collections. We train the topic model using a form of stochastic EM. We begin by dividing the words in each book into topics independently of the other books. We then gather all the resulting topics and cluster them, learning Dirichlet parameters from each topic cluster. The resulting topical clusters can be interpreted as subject facets, allowing readers to browse the topics of a collection quickly, search for topic clusters using keywords, and explore topical relations between books. We demonstrate this method on 300 million words from 8000 books, and it easily could scale well beyond this.