22,099 research outputs found

Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

Authors: Griciūtė Bernadeta; Han Lifeng; Li Hao; Nenadic Goran
Publication date: 10/03/2023

Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP) that facilitates insightful analysis of large documents and datasets, such as summarising the main topics and how they change. This kind of discovery is becoming more popular in real-life applications due to its impact on big data analytics. In this study, from the social-media and healthcare domain, we apply popular Latent Dirichlet Allocation (LDA) methods to model the topic changes in Swedish newspaper articles about Coronavirus. We describe the corpus we created, comprising 6515 articles, the methods applied, and statistics on topic changes over a period of approximately one year and two months, from 17th January 2020 to 13th March 2021. We hope this work can be an asset for grounding applications of topic modelling and can inspire similar case studies in an era of pandemics, to support socio-economic impact research as well as clinical and healthcare analytics. Our data and source code are openly available at https://github.com/poethan/Swed_Covid_TM

Keywords: Latent Dirichlet Allocation (LDA); Topic Modelling; Coronavirus; Pandemics; Natural Language Understanding; BERT-topic

Comment: 14 pages, 14 figures

arXiv.org e-Print Archive

Is Stack Overflow Overflowing With Questions and Tags

Authors: K. Ranjitha R.; Singh Sanjay
Publication date: 01/01/2015

Programming question and answer (Q&A) websites, such as Quora, Stack Overflow, and Yahoo! Answers, help us understand programming concepts easily and quickly in a way that has been tested and applied by many software developers. Stack Overflow is one of the most frequently used programming Q&A websites, where the questions and answers posted are presently analyzed manually, which requires a huge amount of time and resources. To save this effort, we present a topic modeling based technique that analyzes the words of the original texts to discover the themes that run through them. We also propose a method to automate the process of reviewing the quality of questions in the Stack Overflow dataset, in order to avoid ballooning Stack Overflow with insignificant questions. The proposed method also recommends appropriate tags for a new post, which averts the creation of unnecessary tags on Stack Overflow.

Comment: 11 pages, 7 figures, 3 tables. Presented at Third International Symposium on Women in Computing and Informatics (WCI-2015)

arXiv.org e-Print Archive

Crossref

Sequences of purchases in credit card data reveal life styles in urban populations

Authors: Di Clemente Riccardo; González Marta C.; Luengo-Oroz Miguel; Travizano Matias; Vaitla Bapu; Xu Sharon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/08/2018

Zipf-like distributions characterize a wide set of phenomena in physics, biology, economics and social sciences. In human activities, Zipf's law describes, for example, the frequency with which words appear in a text or the purchase types in shopping patterns. In the latter, the uneven distribution of transaction types is bound up with the temporal sequences of purchases that reflect individual choices. In this work, we define a framework using a text compression technique on the sequences of credit card purchases to detect ubiquitous patterns of collective behavior. Clustering the consumers by the similarity of their purchase sequences, we detect five consumer groups. Remarkably, on post hoc checking, individuals in each group are also similar in their age, total expenditure, gender, and the diversity of their social and mobility networks extracted from their mobile phone records. By properly deconstructing transaction data with Zipf-like distributions, this method uncovers sets of significant sequences that reveal insights into collective human behavior.

Comment: 30 pages, 26 figures
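The Zipf-like behaviour mentioned in the abstract above can be illustrated with a rank-frequency table. The following is only a minimal sketch, not the authors' framework: the helper `rank_frequency` and the `purchases` sequence are hypothetical and were invented for illustration. Under an idealised Zipf law with exponent 1, the product of rank and count stays roughly constant across items.

```python
from collections import Counter


def rank_frequency(items):
    """Return (rank, item, count) tuples, most frequent item first."""
    counts = Counter(items)
    # Sort by descending count; break ties alphabetically so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, item, count) for rank, (item, count) in enumerate(ordered, start=1)]


# Hypothetical purchase-category sequence (assumption, not data from the paper).
purchases = (
    ["grocery"] * 12 + ["restaurant"] * 6 + ["fuel"] * 4 + ["taxi"] * 3 + ["cinema"] * 2
)

for rank, item, count in rank_frequency(purchases):
    # For Zipf-like data with exponent close to 1, rank * count varies little.
    print(rank, item, count, rank * count)
```

On real transaction data the exponent would normally be estimated, for example by a fit on the log-log rank-frequency curve, rather than assumed to be 1.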

arXiv.org e-Print Archive

Directory of Open Access Journals

eScholarship - University of California

A comparative study on face recognition techniques and neural network

Author: Rahman Meftah Ur
Publication date: 06/10/2012

In modern times, face recognition has become one of the key aspects of computer vision. There are at least two reasons for this trend: the first is its commercial and law enforcement applications, and the second is the availability of feasible technologies after years of research. Due to the very nature of the problem, computer scientists, neuroscientists and psychologists all share a keen interest in this field. In plain words, it is a computer application for automatically identifying a person from a still image or video frame. One of the ways to accomplish this is by comparing selected features from the image with a facial database. There are hundreds, if not thousands, of factors associated with this. In this paper, some of the most common techniques available, including applications of neural networks in facial recognition, are studied and compared with respect to their performance.

Comment: 8 pages
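The idea of comparing selected features from an image with a facial database can be sketched as a nearest-neighbour lookup over feature vectors. This is an illustrative assumption, not one of the techniques the paper compares: the function names, the three-dimensional feature vectors, and the distance threshold are all hypothetical, and a real system would obtain its features from a dedicated extraction step.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, database, threshold):
    """Return the enrolled name closest to `probe`, or None if no match is within `threshold`."""
    best_name, best_dist = None, float("inf")
    for name, features in database.items():
        dist = euclidean(probe, features)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


# Hypothetical enrolled feature vectors (assumption, for illustration only).
database = {
    "alice": [0.1, 0.9, 0.3],
    "bob": [0.8, 0.2, 0.5],
}

print(identify([0.15, 0.85, 0.35], database, threshold=0.5))
```

The threshold trades false accepts against false rejects; here it is an arbitrary value chosen for the toy data.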

arXiv.org e-Print Archive

CiteSeerX