73 research outputs found

    Extracting scientific trends by mining topics from Call for Papers

    © 2019, Emerald Publishing Limited. Purpose: The purpose of this paper is to present a novel approach for mining scientific trends using topics from Call for Papers (CFP). The work contributes a valuable input for researchers, academics, funding institutes and research administration departments by sharing trends that can help set research directions. Design/methodology/approach: The authors procure an innovative CFP data set to analyse scientific evolution and the prestige of conferences that set scientific trends using scientific publications indexed in DBLP. Using the Field of Research code 804 from the Australian Research Council, the authors classify 146 conferences (from 2006 to 2015) into different thematic areas by matching the terms extracted from publication titles with the Association for Computing Machinery Computing Classification System. Furthermore, the authors enrich the vocabulary of terms from the WordNet dictionary and the Growbag data set. To measure the significance of terms, the authors adopt the following weighting schemas: probabilistic, gram, relative, accumulative and hierarchical. Findings: The results indicate the rise of “big data analytics” in CFP topics in the last few years. Whereas the topics related to “privacy and security” show an exponential increase, the topics related to “semantic web” show a decline in recent years. Analysing publication output in DBLP that matches CFP of ERA Core A* to C rank conferences, the authors found that A* and A tier conferences are not alone in setting publication trends, since B or C tier conferences target similar CFP. Originality/value: Overall, the analyses presented in this research are useful for the scientific community and research administrators in studying research trends and improving data management of digital libraries pertaining to the scientific literature.

    Analysis of Family-Health-Related Topics on Wikipedia

    New concepts, terms, and topics always emerge; and meanings of existing terms and topics keep changing all the time. These phenomena occur more frequently on social media than on conventional media because social media allows a huge number of users to generate information online. Retrieving relevant results in different time periods of a fast-changing topic becomes one of the most difficult challenges in the information retrieval field. Among numerous topics discussed on social media, health-related topics are a major category which attracts increasing attention from the general public. This study investigated and explored the evolution patterns of family-health-related topics on Wikipedia. Three family-health-related topics (Child Maltreatment, Family Planning, and Women’s Health) were selected from the World Health Organization Website and their associated entries were retrieved on Wikipedia. Historical numeric and text data of the entries from 2010 to 2017 were collected from a Wikipedia data dump and the Wikipedia Web pages. Four periods were defined: 2010 to 2011, 2012 to 2013, 2014 to 2015, and 2016 to 2017. Coding, subject analysis, descriptive statistical analysis, inferential statistical analysis, SOM approach, and n-gram approach were employed to explore the internal characteristics and external popularity evolutions of the topics. The findings illustrate that the external popularities of the family-health-related topics declined from 2010 to 2017, although their content on Wikipedia kept increasing. The emerged entries had three features: specialization, summarization, and internationalization. The subjects derived from the entries became increasingly diverse during the investigated periods. Meanwhile, the developing trajectories of the subjects varied from one to another. According to the developing trajectories, the subjects were grouped into three categories: growing subject, diminishing subject, and fluctuating subject. 
    The popularity of the topics was consistent among Wikipedia viewers but not among editors. For each topic, the popularity trend among editors differed from the trend among viewers. Child Maltreatment was the most popular of the three topics, Women’s Health was the second most popular, and Family Planning was the least popular. The implications of this study include: (1) helping health professionals and general users get a more comprehensive understanding of the investigated topics; (2) contributing to the development of health ontologies and consumer health vocabularies; (3) assisting Website designers in organizing online health information and helping them identify popular family-health-related topics; (4) providing a new approach for query recommendation in information retrieval systems; (5) supporting temporal information retrieval by presenting the temporal changes of family-health-related topics; and (6) providing a new combination of data collection and analysis methods for researchers.

    Statistical data mining for Sina Weibo, a Chinese micro-blog: sentiment modelling and randomness reduction for topic modelling

    Before the arrival of modern information and communication technology, it was not easy to capture people’s thoughts and sentiments; however, the development of statistical data mining techniques and the prevalence of mass social media provide opportunities to capture those trends. Among all types of social media, micro-blogs make use of the word limit of 140 characters to force users to get straight to the point, thus making the posts brief but content-rich resources for investigation. The data mining object of this thesis is Weibo, the most popular Chinese micro-blog. In the first part of the thesis, we perform exploratory data mining on Weibo. After the literature review of micro-blogs, the initial steps of data collection and data pre-processing are introduced. This is followed by analysis of the time of the posts, analysis of the relationship between the intensity of posts and share price, term frequency and cluster analysis. Secondly, we conduct time series modelling on the sentiment of Weibo posts. Considering the properties of Weibo sentiment, we mainly adopt the framework of ARMA mean with GARCH type conditional variance to fit the patterns. Other distinct models are also considered for negative sentiment because of its complexity. Model selection and validation are introduced to verify the fitted models. Thirdly, Latent Dirichlet Allocation (LDA) is explained in depth as a way to discover topics from large sets of textual data. The major contribution is a Randomness Reduction Algorithm that post-processes the output of topic models, filtering out the insignificant topics and utilising topic distributions to find the most persistent topics. At the end of this chapter, evidence of the effectiveness of the Randomness Reduction is presented from empirical studies. The topic classification and evolution are also presented.

    Detecting consumer emotions on social networking websites

    The social networking environment goes beyond connecting friends. It also connects customers with companies and vice versa. Customers share their experiences with friends, followers, and companies, and these experiences carry sentiments and emotions, thereby creating big data. There is an ocean of data available for companies to extract and make meaning of by applying it to different business contexts such as consumer feedback analysis and marketing & communications. For companies to benefit from consumer emotion data, they must make use of computational methods that can save the time and work consumed by traditional consumer research methods such as questionnaires and interviews. The objective of this research is to explore the existing literature on detecting consumer emotions from social networking data. The author carried out a systematic literature review of research articles from three bibliographic databases with the intent to find out the social networking data extraction process, dataset sizes, computational methods used, consumer sentiments, emotions studied, limitations and applications in a managerial context. To further understand consumer emotion detection, a case study in the form of a Twitter marketing campaign was conducted to emulate the process of consumer emotion detection for a company that sells stress management products and services. The results indicate that most companies use the Twitter networking platform to carry out consumer emotion analysis. The dataset sizes range from small to very large. The studies have used a variety of computational methods, some reporting accuracies to measure performance. These methods have been applied in various industries such as travel, restaurant, healthcare, and finance, to name a few. Managerial applications include marketing, supply chain, feedback analysis, product development, and customer satisfaction. A few limitations of these methods were identified.
    The case study results and discussion with the case company CIO indicated the potential of some of the methods for consumer behavior research. The valuable feedback from the CIO revealed that, by customizing existing methods, their company can create new tools and methods to understand their customers, provide better recommendations, and customize their offerings to individual customers.

    Event detection in social networks


    Sentiment Analysis of Twitter Data

    The rapid expansion and acceptance of social media has opened doors into users’ opinions and perceptions that were never as accessible as they are with today’s prevalence of mobile technology. Harvested data, analyzed for opinions and sentiment, can provide powerful insight into a population. This research uses Twitter data, owing to its widespread global use, to examine the sentiment associated with tweets. Twitter #hashtags and Latent Dirichlet Allocation topic modeling were used to differentiate between tweet topics. A lexicographical dictionary was then used to classify sentiment. This method provides a framework for an analyst to ingest Twitter data, conduct an analysis and provide insight into the sentiment contained within the data.

    AI approaches to understand human deceptions, perceptions, and perspectives in social media

    Social media platforms have created virtual space for sharing user-generated information, connecting, and interacting among users. However, there are research and societal challenges: 1) users generate and share disinformation; 2) it is difficult to understand citizens’ perceptions or opinions expressed on a wide variety of topics; and 3) there are information overload and echo chamber problems, without an overall understanding of the different perspectives taken by different people or groups. This dissertation addresses these three research challenges with advanced AI and Machine Learning approaches. To address fake news, as deceptions on the facts, this dissertation presents Machine Learning approaches for fake news detection models, and a hybrid method for topic identification, whether they are fake or real. To understand users’ perceptions of or attitudes toward some topics, this study analyzes the sentiments expressed in social media text. The sentiment analysis of posts can be used as an indicator to measure how topics are perceived by the users and how their perceptions as a whole can affect decision makers in government and industry, especially during the COVID-19 pandemic. It is difficult to measure the public perception of government policies issued during the pandemic. The citizen responses to the government policies are diverse, ranging from security or goodwill to confusion, fear, or anger. This dissertation provides a near real-time approach to track and monitor public reactions toward government policies by continuously collecting and analyzing Twitter posts about the COVID-19 pandemic. To address social media’s overwhelming number of posts, content echo chambers, and information isolation, this dissertation provides a multiple view-based summarization framework where the same contents can be summarized according to different perspectives.
    This framework includes components for choosing the perspectives and advanced text summarization approaches. The proposed approaches in this dissertation are demonstrated with a prototype system that continuously collects Twitter data about COVID-19 government health policies and provides analysis of citizen concerns toward the policies; the data is also analyzed for fake news detection and for generating multiple-view summaries.

    Statistical models for the analysis of short user-generated documents: author identification for conversational documents

    In recent years short user-generated documents have been gaining popularity on the Internet and attention in the research communities. Documents of this kind are generated by users of various online services: platforms for instant messaging, for real-time status posting, for discussion and for writing reviews. Each of these services allows users to generate written texts with particular properties, which might require specific algorithms for analysis. In this dissertation we present our work, which aims at analysing this kind of document. We conducted qualitative and quantitative studies to identify the properties that might allow for characterising them. We compared the properties of these documents with the properties of standard documents employed in the literature, such as newspaper articles, and defined a set of characteristics that are distinctive of the documents generated online. We also observed two classes within the online user-generated documents: the conversational documents and those involving group discussions. We later focused on the class of conversational documents, which are short and spontaneous. We created a novel collection of real conversational documents retrieved online (e.g. Internet Relay Chat) and distributed it as part of an international competition (PAN @ CLEF'12). The competition was about author characterisation, which is one of the possible studies of authorship attribution documented in the literature. Another field of study is authorship identification, which became our main topic of research. We approached the authorship identification problem in its closed-class variant. For each problem we employed documents from the collection we released and from a collection of Twitter messages, as representative of conversational or short user-generated documents.
    We demonstrated the unsuitability of standard authorship identification techniques for conversational documents and proposed novel methods capable of reaching better accuracy rates. As opposed to standard methods, which worked well only for a few authors, the proposed technique reached significant results even for hundreds of users.