127 research outputs found

    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in various domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches, and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment, emotion, and the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and we relate a few success stories.

    Contribution to Financial Modeling and Financial Forecasting

    This thesis consists of three chapters, each an independent piece of research conducted during my studies, all concentrating on financial time series modeling and forecasting. The first chapter aims to show that abnormal behavior in a firm's debt level is a signal of future unexpected returns for firms listed in the indexes covered by this study, and hence a signal to buy. To test this theory, multiple indexes from around the world were taken into consideration; the behavior is consistent across most of them. The second chapter investigates the effect of speeches by the President of the United States on the value of the US dollar in the foreign exchange market. The analysis shows that while the president is delivering a speech there are distinctive changes in USD value and volatility in global markets; this effect cannot be captured by linear models, and the impact of a presidential speech is short-term. Finally, the third chapter, the major research of this thesis, suggests two new methods that can potentially enhance financial time series forecasting. First, a new ARMA-RNN model is presented: it inherits the process of the extensively studied autoregressive moving average (ARMA) model and trains a recurrent neural network on it, combining the well-understood behavior of the ARMA model with the strength and nonlinearity of artificial neural networks. Second, the research investigates using data of different frequencies in the input layer to predict the same series in the output layer; in other words, artificial neural networks are trained on higher-frequency data to predict lower-frequency data. Finally, both methods are combined to achieve a superior predictive model.
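The two-stage ARMA-RNN idea described above can be sketched as follows. This is a toy illustration under stated assumptions (an AR(2) fit via ordinary least squares on a synthetic differenced series; the thesis's actual ARMA estimation and RNN architecture are not specified here). The linear model's fitted values and residuals become the inputs a recurrent network would then be trained on.

```python
import numpy as np

# Stage 1: fit a linear autoregressive model to the (differenced) series.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))   # toy non-stationary price-like series
r = np.diff(y)                        # difference to (roughly) stationarize

p = 2  # AR order (assumption for illustration)
# Design matrix: row j holds [r[j], r[j+1]] to predict r[j+2].
X = np.column_stack([r[i:len(r) - p + i] for i in range(p)])
target = r[p:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)  # OLS estimate of AR(2)

fitted = X @ coef
residuals = target - fitted

# Stage 2 (not shown): a recurrent network would consume these features,
# learning the nonlinear structure the linear AR model leaves in residuals.
rnn_features = np.column_stack([fitted, residuals])
print(rnn_features.shape)
```

The design choice mirrored here is that the neural network does not replace the ARMA model but is trained on top of its output, so the linear component captures what it explains well and the network only has to model the remainder.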

    On the Detection of False Information: From Rumors to Fake News

    In recent years, the development of social media and online news agencies has brought several challenges and threats to the Web. These threats have taken the attention of the Natural Language Processing (NLP) research community as they are polluting the online social media platforms.
One of the examples of these threats is false information, in which false, inaccurate, or deceptive information is spread and shared by online users. False information is not limited to verifiable information, but it also involves information that is used for harmful purposes. Also, one of the challenges that researchers have to face is the massive number of users in social media platforms, where detecting false information spreaders is not an easy job. Previous work that has been proposed for limiting or studying the issue of detecting false information has focused on understanding the language of false information from a linguistic perspective. In the case of verifiable information, approaches have been proposed in a monolingual setting. Moreover, detecting the sources or the spreaders of false information in social media has not been investigated much. In this thesis we study false information from several aspects. First, since previous work focused on studying false information in a monolingual setting, in this thesis we study false information in a cross-lingual one. We propose different cross-lingual approaches and we compare them to a set of monolingual baselines. Also, we provide systematic studies for the evaluation results of our approaches for better understanding. Second, we noticed that the role of affective information was not investigated in depth. Therefore, the second part of our research work studies the role of the affective information in false information and shows how the authors of false content use it to manipulate the reader. Here, we investigate several types of false information to understand the correlation between affective information and each type (Propaganda, Hoax, Clickbait, Rumor, and Satire). Last but not least, in an attempt to limit its spread, we also address the problem of detecting false information spreaders in social media. 
    In this research direction, we focus on exploiting several text-based features extracted from the online profile messages of those spreaders. We study different feature sets that have the potential to help discriminate false information spreaders from fact checkers. Ghanem, BHH. (2020). On the Detection of False Information: From Rumors to Fake News [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/158570
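As a rough illustration of the kind of text-based profile features such a study might extract, the sketch below aggregates a few surface cues over a user's messages. The feature names and cues are assumptions for illustration, not the thesis's actual feature set.

```python
import re

def profile_features(tweets):
    """Aggregate simple surface features over a user's recent messages."""
    n = len(tweets)
    urls = sum(len(re.findall(r"https?://\S+", t)) for t in tweets)
    exclaims = sum(t.count("!") for t in tweets)
    caps = sum(1 for t in tweets
               for w in t.split() if w.isupper() and len(w) > 1)
    return {
        "avg_urls": urls / n,          # link-sharing intensity
        "avg_exclaims": exclaims / n,  # emotional punctuation per message
        "avg_allcaps": caps / n,       # shouting words per message
    }

feats = profile_features([
    "BREAKING!!! you won't believe this http://example.com",
    "Fact check: the claim is FALSE, see https://example.org",
])
print(feats)
```

Feature vectors like this would then feed a downstream classifier separating spreaders from fact checkers.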

    Emotion classification and crowd source sensing: a lexicon-based approach

    In today's world, social media provide a valuable platform for conveying expressions, thoughts, and points of view, and for communication between people from diverse walks of life. There are currently approximately 2.62 billion active social network users, a figure expected to exceed 3 billion by 2021. Social networks are used to share ideas and information, allowing interaction across communities, organizations, and so forth. Recent studies have found that the typical individual uses these platforms between 2 and 3 hours a day. This creates a vast and rich source of data that can play a critical role in decision-making for companies, political campaigns, and administrative management and welfare. Twitter is one of the important players in the social network arena: companies of every scale, celebrities, organizations of different types, and leaders use Twitter as an instrument for communicating and engaging with their followers. In this paper, we build upon the idea that Twitter data can be analyzed for crowd source sensing and decision-making, and present a new framework that obtains real-time Twitter data and analyzes it for emotion classification using a lexicon-based approach. Previous work has found that weather, understandably, has an impact on mood, and we consider these effects on crowd mood. For the experiments, weather data are collected through an application programming interface (API) in R, and the impact of weather on human sentiments is analyzed. Visualizations of the data are presented and their usefulness for policy and decision makers in different applications is discussed.
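A minimal lexicon-based emotion classifier in the spirit of this approach might look like the following sketch; the tiny lexicon is invented for illustration and is not the one used in the paper.

```python
# Toy emotion lexicon: each emotion maps to a set of cue words (assumption).
EMOTION_LEXICON = {
    "joy": {"happy", "great", "love", "sunny"},
    "anger": {"hate", "angry", "terrible"},
    "sadness": {"sad", "rain", "gloomy"},
}

def classify_emotion(text):
    """Count lexicon hits per emotion and return the best-scoring one."""
    tokens = text.lower().split()
    scores = {emo: sum(tok in words for tok in tokens)
              for emo, words in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"

print(classify_emotion("such a sunny happy day, love it"))  # joy
```

The appeal of the lexicon approach for crowd sensing is that it needs no labeled training data and runs fast enough for real-time streams, at the cost of missing negation, sarcasm, and out-of-lexicon wording.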

    Twitter and social bots: an analysis of the 2021 Canadian election

    Social media have now become essential communication tools, including within the context of electoral campaigns. However, the prevalence of online communication platforms has raised concerns in Western democracies about the risks of voter manipulation, particularly through social bot accounts. Social bots are automated computer algorithms which can be used to produce or amplify online content while posing as authentic users. Some studies, mostly focused on the case of the United States, analyzed the propagation of disinformation content by social bots during electoral periods, while others have also examined the role of partisanship on social bots' behaviors and activities. However, the question of whether social bots' partisan leaning impacts the amount of political disinformation content they generate online remains unanswered. Therefore, the main goal of this study is to determine whether partisan differences could be observed in (i) the number of active social bots during the 2021 Canadian election campaign, (ii) their interactions with humans, and (iii) the amount of disinformation content they propagated. In order to reach this research objective, this master's thesis relies on an original Twitter dataset of more than 11.3 million English tweets from roughly 1.1 million distinct users, as well as diverse models to distinguish between social bot and human accounts, determine the partisan leaning of users, and detect political disinformation content. Based on these distinct methods, the results indicate limited differences in the behavior of social bots in the 2021 federal election.
    It was nevertheless possible to observe that conservative-leaning social bots were more numerous than their liberal-leaning counterparts, but that liberal-leaning bots were the ones that interacted the most with authentic accounts through retweets and replies and shared the most disinformation content.
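The aggregation step described above, comparing bot counts and disinformation rates across partisan leanings, can be sketched as follows, assuming upstream classifiers have already labeled each account; the toy data and field layout are assumptions.

```python
# Each tuple: (is_bot, leaning, n_disinfo_tweets, n_tweets) - toy labels that
# upstream bot-detection / partisanship / disinformation models would produce.
accounts = [
    (True, "conservative", 4, 50),
    (True, "conservative", 1, 30),
    (True, "liberal", 6, 40),
    (False, "liberal", 0, 20),
]

def bot_stats(accounts, leaning):
    """Return (bot count, share of bot tweets that are disinformation)."""
    bots = [a for a in accounts if a[0] and a[1] == leaning]
    n_bots = len(bots)
    rate = (sum(a[2] for a in bots) / sum(a[3] for a in bots)) if bots else 0.0
    return n_bots, rate

print(bot_stats(accounts, "conservative"))  # (2, 0.0625)
```

On this toy data, conservative-leaning bots are more numerous while the liberal-leaning bot posts disinformation at a higher rate, mirroring the shape of the comparison the thesis performs at scale.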

    Data science methods for the analysis of controversial social media discussions

    Social media communities like Reddit and Twitter allow users to express their views on topics of their interest, and to engage with other users who may share or oppose these views. This can lead to productive discussions towards a consensus, or to contentious debates, where disagreements frequently arise. Prior work on such settings has primarily focused on identifying notable instances of antisocial behavior such as hate speech and “trolling”, which represent possible threats to the health of a community. These, however, are exceptionally severe phenomena, and do not encompass controversies stemming from user debates, differences of opinions, and off-topic content, all of which can naturally come up in a discussion without going so far as to compromise its development. This dissertation proposes a framework for the systematic analysis of social media discussions that take place in the presence of controversial themes, disagreements, and mixed opinions from participating users. For this, we develop a feature-based model to describe key elements of a discussion, such as its salient topics, the level of activity from users, the sentiments it expresses, and the user feedback it receives. Initially, we build our feature model to characterize adversarial discussions surrounding political campaigns on Twitter, with a focus on the factual and sentimental nature of their topics and the role played by different users involved. We then extend our approach to Reddit discussions, leveraging community feedback signals to define a new notion of controversy and to highlight conversational archetypes that arise from frequent and interesting interaction patterns. We use our feature model to build logistic regression classifiers that can predict future instances of controversy in Reddit communities centered on politics, world news, sports, and personal relationships.
    Finally, our model also provides the basis for a comparison of different communities in the health domain, where topics and activity vary considerably despite their shared overall focus. In each of these cases, our framework provides insight into how user behavior can shape a community's individual definition of controversy and its overall identity.
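The controversy prediction step can be illustrated with a plain logistic regression trained by gradient descent on synthetic discussion features; the feature semantics and data here are invented stand-ins for the dissertation's richer feature model.

```python
import numpy as np

# Synthetic stand-in for discussion feature vectors (e.g. topic salience,
# user activity, sentiment, community feedback) and controversy labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
w_true = np.array([1.5, -2.0, 0.5, 0.0])   # hidden generating weights
y = (X @ w_true + rng.normal(scale=0.3, size=200) > 0).astype(float)

# Logistic regression fit with plain batch gradient descent.
w = np.zeros(4)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))         # predicted controversy probability
    w -= 0.1 * X.T @ (p - y) / len(y)      # gradient of the log-loss

pred = 1 / (1 + np.exp(-(X @ w))) > 0.5
accuracy = (pred == y).mean()
print(f"train accuracy: {accuracy:.2f}")
```

One practical advantage of logistic regression in this setting, consistent with its use in the dissertation, is that the learned weights are directly interpretable as the contribution of each discussion feature to predicted controversy.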

    Towards the Understanding of Private Content – Content-based Privacy Assessment and Protection in Social Networks

    In the wake of the Facebook data breach scandal, users begin to realize how vulnerable their personal data is and how blindly they trust the online social networks (OSNs) by giving them an inordinate amount of private data that touches on unlimited areas of their lives. In particular, studies show that users sometimes reveal too much information or unintentionally release regretful messages, especially when they are careless, emotional, or unaware of privacy risks. Additionally, friends on social media platforms are also found to be adversarial and may leak one’s private information. Threats from within users’ friend networks – insider threats by humans or bots – may be more concerning because they are much less likely to be mitigated through existing solutions, e.g., the use of privacy settings. Therefore, we argue that the key component of privacy protection in social networks is protecting sensitive/private content, i.e., privacy as having the ability to control dissemination of information. A mechanism to automatically identify potentially sensitive/private posts and alert users before they are posted is urgently needed. In this dissertation, we propose a context-aware, text-based quantitative model for private information assessment, namely PrivScore, which is expected to serve as the foundation of a privacy leakage alerting mechanism. We first explicitly research and study topics that might contain private content. Based on this knowledge, we solicit diverse opinions on the sensitiveness of private information from crowdsourcing workers, and examine the responses to discover a perceptual model behind the consensuses and disagreements. We then develop a computational scheme using deep neural networks to compute a context-free PrivScore (i.e., the “consensus” privacy score among average users). Finally, we integrate tweet histories, topic preferences, and social contexts to generate a personalized context-aware PrivScore.
    This privacy scoring mechanism could be employed to identify potentially private messages and alert users to think again before posting them to OSNs. It could also benefit non-human users such as social media chatbots.
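A heavily simplified sketch of the PrivScore idea follows; the cue words, weights, and personalization rule below are assumptions for illustration, not the dissertation's actual deep-learning model. It shows the shape of the pipeline: a context-free score from sensitive-topic cues, then an adjustment for the user's own context.

```python
# Toy sensitive-topic cues with hand-picked weights (assumption).
SENSITIVE_CUES = {"ssn": 1.0, "password": 0.9, "pregnant": 0.7,
                  "salary": 0.6, "address": 0.5}

def context_free_score(post):
    """Stand-in for the 'consensus' score a trained model would output."""
    tokens = post.lower().split()
    return min(1.0, sum(SENSITIVE_CUES.get(t, 0.0) for t in tokens))

def personalized_score(post, topic_preference=0.0):
    """Adjust the consensus score by user context (illustrative rule).

    topic_preference in [-1, 1]: how freely this user already discusses
    the topic; frequent posters on a topic get a lower alert score.
    """
    base = context_free_score(post)
    return max(0.0, min(1.0, base - 0.2 * topic_preference))

print(personalized_score("my password is hunter2"))  # 0.9
```

In the real system the context-free score comes from a deep neural network trained on crowdsourced sensitivity judgments, and the personalization draws on tweet histories and social context rather than a single scalar preference.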

    14th Conference on DATA ANALYSIS METHODS for Software Systems

    DAMSS-2023 is the 14th International Conference on Data Analysis Methods for Software Systems, held in Druskininkai, Lithuania, at the same venue and time every year. The exception was 2020, when the world was gripped by the Covid-19 pandemic and the movement of people was severely restricted. After a year’s break, the conference was back on track, and the next conference succeeded in achieving its primary goal of lively scientific communication. The conference focuses on live interaction among participants, and for better efficiency of communication most of the presentations are poster presentations; this format has proven to be highly effective, though there are several oral sessions as well. The history of the conference dates back to 2009, when 16 papers were presented; it began as a workshop and has evolved into a well-known conference. The idea of such a workshop originated at the Institute of Mathematics and Informatics, now the Institute of Data Science and Digital Technologies of Vilnius University. The Lithuanian Academy of Sciences and the Lithuanian Computer Society supported this idea, which gained enthusiastic acceptance from both the Lithuanian and international scientific communities. This year’s conference features 84 presentations, with 137 registered participants from 11 countries. The conference serves as a gathering point for researchers from six Lithuanian universities, making it the main annual meeting for Lithuanian computer scientists. The primary aim of the conference is to showcase research conducted at Lithuanian and foreign universities in the fields of data science and software engineering. The annual organization of the conference facilitates the rapid exchange of new ideas within the scientific community. Seven IT companies supported the conference this year, indicating the relevance of the conference topics to the business sector.
    In addition, the conference is supported by the Lithuanian Research Council and the National Science and Technology Council (Taiwan, R. O. C.). The conference covers a wide range of topics, including Applied Mathematics, Artificial Intelligence, Big Data, Bioinformatics, Blockchain Technologies, Business Rules, Software Engineering, Cybersecurity, Data Science, Deep Learning, High-Performance Computing, Data Visualization, Machine Learning, Medical Informatics, Modelling Educational Data, Ontological Engineering, Optimization, Quantum Computing, and Signal Processing. This book provides an overview of all presentations from the DAMSS-2023 conference.