7 research outputs found

    Tracking Events in Social Media

    Get PDF
    Tracking topical events in social media streams, such as Twitter, provides a means for users to keep up-to-date on topics of interest to them. This tracking may last a period of days, or even weeks. These events and topics might be provided by users explicitly, or generated for users from selected news articles. Push notification from social media provides a method to push the updates directly to the users on their mobile devices or desktops. In this thesis, we start with a lexical comparison between carefully edited prose and social media posts, providing an improved understanding of word usage within social media. Compared with carefully edited prose, such as news articles and Wikipedia articles, the language of social media is informal in the extreme. By using word embeddings, we identify words whose usage differs greatly between a Wikipedia corpus and a Twitter corpus. Following from this work, we explore a general method for developing succinct queries, reflecting the topic of a given news article, for the purpose of tracking the associated news event within a social media stream. A series of probe queries are generated from an initial set of candidate keywords extracted from the article. By analyzing the results of these probes, we rank and trim the candidate set to create a succinct query. The method can also be used for linking and searching among different collections. Given a query for topical events, push notification to users directly from social media streams provides a method for them to keep up-to-date on topics of personal interest. We determine that the key to effective notification lies in controlling of update volume, by establishing and maintaining appropriate thresholds for pushing updates. We explore and evaluate multiple threshold setting strategies. Push notifications should be relevant to the personal interest, and timely, with pushes occurring as soon as after the actual event occurrence as possible and novel for providing non-duplicate information. An analysis of existing evaluation metrics for push notification reflects different assumptions regarding user requirements. This analysis leads to a framework that places different weights and penalties on different behaviours and can guide the future development of a family of evaluation metrics that more accurately models user needs. Throughout the thesis, rank similarity measures are applied to compare rankings generated by various experiments. As a final component, we develop a family of rank similarity metrics based on maximized effectiveness difference, each derived from a traditional information retrieval evaluation measure. Computing this maximized effectiveness difference (MED) requires the solution of an optimization problem that varies in difficulty, depending on the associated measure. We present solutions for several standard effectiveness measures, including nDCG, MAP, and ERR. Through experimental validation, we show that MED reveals meaningful differences between retrieval runs. Mathematically, MED is a metric, regardless of the associated measure. Prior work has established a number of other desiderata for rank similarity in the context of search, and we demonstrate that MED satisfies these requirements. Unlike previous proposals, MED allows us to directly translate assumptions about user behavior from any established effectiveness measure to create a corresponding rank similarity measure. In addition, MED cleanly accommodates partial relevance judgments, and if complete relevance information is available, it reduces to a simple difference between effectiveness values

    Natural Language Processing: Emerging Neural Approaches and Applications

    Get PDF
    This Special Issue highlights the most recent research being carried out in the NLP field to discuss relative open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, as well as on their potential or real applications in different domains

    Exploring the value of big data analysis of Twitter tweets and share prices

    Get PDF
    Over the past decade, the use of social media (SM) such as Facebook, Twitter, Pinterest and Tumblr has dramatically increased. Using SM, millions of users are creating large amounts of data every day. According to some estimates ninety per cent of the content on the Internet is now user generated. Social Media (SM) can be seen as a distributed content creation and sharing platform based on Web 2.0 technologies. SM sites make it very easy for its users to publish text, pictures, links, messages or videos without the need to be able to program. Users post reviews on products and services they bought, write about their interests and intentions or give their opinions and views on political subjects. SM has also been a key factor in mass movements such as the Arab Spring and the Occupy Wall Street protests and is used for human aid and disaster relief (HADR). There is a growing interest in SM analysis from organisations for detecting new trends, getting user opinions on their products and services or finding out about their online reputation. Companies such as Amazon or eBay use SM data for their recommendation engines and to generate more business. TV stations buy data about opinions on their TV programs from Facebook to find out what the popularity of a certain TV show is. Companies such as Topsy, Gnip, DataSift and Zoomph have built their entire business models around SM analysis. The purpose of this thesis is to explore the economic value of Twitter tweets. The economic value is determined by trying to predict the share price of a company. If the share price of a company can be predicted using SM data, it should be possible to deduce a monetary value. There is limited research on determining the economic value of SM data for “nowcasting”, predicting the present, and for forecasting. This study aims to determine the monetary value of Twitter by correlating the daily frequencies of positive and negative Tweets about the Apple company and some of its most popular products with the development of the Apple Inc. share price. If the number of positive tweets about Apple increases and the share price follows this development, the tweets have predictive information about the share price. A literature review has found that there is a growing interest in analysing SM data from different industries. A lot of research is conducted studying SM from various perspectives. Many studies try to determine the impact of online marketing campaigns or try to quantify the value of social capital. Others, in the area of behavioural economics, focus on the influence of SM on decision-making. There are studies trying to predict financial indicators such as the Dow Jones Industrial Average (DJIA). However, the literature review has indicated that there is no study correlating sentiment polarity on products and companies in tweets with the share price of the company. The theoretical framework used in this study is based on Computational Social Science (CSS) and Big Data. Supporting theories of CSS are Social Media Mining (SMM) and sentiment analysis. Supporting theories of Big Data are Data Mining (DM) and Predictive Analysis (PA). Machine learning (ML) techniques have been adopted to analyse and classify the tweets. In the first stage of the study, a body of tweets was collected and pre-processed, and then analysed for their sentiment polarity towards Apple Inc., the iPad and the iPhone. Several datasets were created using different pre-processing and analysis methods. The tweet frequencies were then represented as time series. The time series were analysed against the share price time series using the Granger causality test to determine if one time series has predictive information about the share price time series over the same period of time. For this study, several Predictive Analytics (PA) techniques on tweets were evaluated to predict the Apple share price. To collect and analyse the data, a framework has been developed based on the LingPipe (LingPipe 2015) Natural Language Processing (NLP) tool kit for sentiment analysis, and using R, the functional language and environment for statistical computing, for correlation analysis. Twitter provides an API (Application Programming Interface) to access and collect its data programmatically. Whereas no clear correlation could be determined, at least one dataset was showed to have some predictive information on the development of the Apple share price. The other datasets did not show to have any predictive capabilities. There are many data analysis and PA techniques. The techniques applied in this study did not indicate a direct correlation. However, some results suggest that this is due to noise or asymmetric distributions in the datasets. The study contributes to the literature by providing a quantitative analysis of SM data, for example tweets about Apple and its most popular products, the iPad and iPhone. It shows how SM data can be used for PA. It contributes to the literature on Big Data and SMM by showing how SM data can be collected, analysed and classified and explore if the share price of a company can be determined based on sentiment time series. It may ultimately lead to better decision making, for instance for investments or share buyback

    On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism

    Full text link
    Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012Palanci

    Hot topic detection in news blogs from the perspective of W2T

    Get PDF
    News blog hot topics are important for the information recommendation service and marketing. However, information overload and personalized management make the information arrangement more difficult. Moreover, what influences the formation and development of blog hot topics is seldom paid attention to. In order to correctly detect news blog hot topics, the paper first analyzes the development of topics in a new perspective based on W2T (Wisdom Web of Things) methodology. Namely, the characteristics of blog users, context of topic propagation and information granularity are unified to analyze the related problems. Some factors such as the user behavior pattern, network opinion and opinion leader are subsequently identified to be important for the development of topics. Then the topic model based on the view of event reports is constructed. At last, hot topics are identified by the duration, topic novelty, degree of topic growth and degree of user attention. The experimental results show that the proposed method is feasible and effective

    Collected Papers (on Neutrosophic Theory and Applications), Volume VI

    Get PDF
    This sixth volume of Collected Papers includes 74 papers comprising 974 pages on (theoretic and applied) neutrosophics, written between 2015-2021 by the author alone or in collaboration with the following 121 co-authors from 19 countries: Mohamed Abdel-Basset, Abdel Nasser H. Zaied, Abduallah Gamal, Amir Abdullah, Firoz Ahmad, Nadeem Ahmad, Ahmad Yusuf Adhami, Ahmed Aboelfetouh, Ahmed Mostafa Khalil, Shariful Alam, W. Alharbi, Ali Hassan, Mumtaz Ali, Amira S. Ashour, Asmaa Atef, Assia Bakali, Ayoub Bahnasse, A. A. Azzam, Willem K.M. Brauers, Bui Cong Cuong, Fausto Cavallaro, Ahmet Çevik, Robby I. Chandra, Kalaivani Chandran, Victor Chang, Chang Su Kim, Jyotir Moy Chatterjee, Victor Christianto, Chunxin Bo, Mihaela Colhon, Shyamal Dalapati, Arindam Dey, Dunqian Cao, Fahad Alsharari, Faruk Karaaslan, Aleksandra Fedajev, Daniela Gîfu, Hina Gulzar, Haitham A. El-Ghareeb, Masooma Raza Hashmi, Hewayda El-Ghawalby, Hoang Viet Long, Le Hoang Son, F. Nirmala Irudayam, Branislav Ivanov, S. Jafari, Jeong Gon Lee, Milena Jevtić, Sudan Jha, Junhui Kim, Ilanthenral Kandasamy, W.B. Vasantha Kandasamy, Darjan Karabašević, Songül Karabatak, Abdullah Kargın, M. Karthika, Ieva Meidute-Kavaliauskiene, Madad Khan, Majid Khan, Manju Khari, Kifayat Ullah, K. Kishore, Kul Hur, Santanu Kumar Patro, Prem Kumar Singh, Raghvendra Kumar, Tapan Kumar Roy, Malayalan Lathamaheswari, Luu Quoc Dat, T. Madhumathi, Tahir Mahmood, Mladjan Maksimovic, Gunasekaran Manogaran, Nivetha Martin, M. Kasi Mayan, Mai Mohamed, Mohamed Talea, Muhammad Akram, Muhammad Gulistan, Raja Muhammad Hashim, Muhammad Riaz, Muhammad Saeed, Rana Muhammad Zulqarnain, Nada A. Nabeeh, Deivanayagampillai Nagarajan, Xenia Negrea, Nguyen Xuan Thao, Jagan M. Obbineni, Angelo de Oliveira, M. Parimala, Gabrijela Popovic, Ishaani Priyadarshini, Yaser Saber, Mehmet Șahin, Said Broumi, A. A. Salama, M. Saleh, Ganeshsree Selvachandran, Dönüș Șengür, Shio Gai Quek, Songtao Shao, Dragiša Stanujkić, Surapati Pramanik, Swathi Sundari Sundaramoorthy, Mirela Teodorescu, Selçuk Topal, Muhammed Turhan, Alptekin Ulutaș, Luige Vlădăreanu, Victor Vlădăreanu, Ştefan Vlăduţescu, Dan Valeriu Voinea, Volkan Duran, Navneet Yadav, Yanhui Guo, Naveed Yaqoob, Yongquan Zhou, Young Bae Jun, Xiaohong Zhang, Xiao Long Xin, Edmundas Kazimieras Zavadskas

    Synthesis report

    Get PDF
    corecore