    DEVELOPING A REAL-TIME DATA ANALYTICS FRAMEWORK FOR TWITTER STREAMING DATA

    Twitter is an online social networking service with more than 300 million users who generate a huge amount of information every day. Its most important characteristic is that users can tweet about events, situations, feelings, opinions, or even something totally new, in real time. Several workflows currently offer real-time analysis of Twitter data, each presenting general-purpose processing over streaming data. This study attempts to develop an analytical framework with in-memory processing that extracts and analyzes structured and unstructured Twitter data. The proposed framework comprises data ingestion, stream processing, and data visualization components; the Apache Kafka messaging system performs the data ingestion task, and Spark makes it possible to run sophisticated data processing and machine learning algorithms in real time. We conducted a case study on tweets about the earthquake in Japan and the reactions of people around the world, analyzing the time and origin of the tweets.

    Towards an automated journalism framework for social data monitoring

    Presented at: Nordic AI young researcher symposium, Oslo, 14.11. - 15.11.22
    News and information dissemination have long been a vital human practice. Alongside traditional media channels such as radio and television, online social networks (OSNs) are regarded as a new generation of media that seems able to compete with traditional media. Millions of individuals around the world can communicate breaking news on social media platforms during the hours after midnight. The spread of misinformation and disinformation aside, the process of publishing news on OSNs is, to a large extent, more open and less biased. Automated journalism, defined in [1] as “the auto generation of journalistic stories through software and algorithms, without any human input”, can be used in newsrooms to supplement or replace traditional journalism in a variety of ways, such as providing real-time reporting of events or generating stories from data that would otherwise be difficult to mine. Because of their real-time and open nature, OSNs, particularly Twitter, are among the strongest candidate data sources to explore in this context. MediaFutures, Centre for Research-Based Innovation (SFI), is a research centre in Bergen, Norway, formed as a consortium of the most important media players in Norway and beyond. The centre is hosted and led by the University of Bergen’s Department of Information Science and Media Studies. In this research, in collaboration with MediaFutures SFI, we are developing a platform that assists journalists in newsrooms in real time and enables them to easily obtain and monitor the newsworthy content they want from the mass of unverified content on Twitter. At MediaFutures SFI we develop innovative tools that journalists can use in the newsroom daily. Existing work falls short of this goal in two ways. First, AI techniques have been applied to analyse social media data, but many of them do not function in real time. Second, most prior work either focuses on collecting, filtering, and analysing tweets using predefined metrics [2] (such as the number of replies, likes, etc.) or focuses only on analysing tweets’ content [3][4]. Given the lack of a comprehensive framework suited to the needs of journalists, we present our own visual analytical framework, which is based not only on information retrieval from Twitter but is also enriched by machine learning and network science. In this work, we intend to use state-of-the-art techniques such as community detection, influential node identification and monitoring, and fake news, deepfake and cheapfake detection.

    An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark

    Streaming big data from social networking sites such as Twitter and Facebook has been an attractive subject of investigation for researchers in recent decades because of its timeliness, accessibility and popularity, although there may be a trade-off in accuracy. Clustering of Twitter data in particular has caught researchers' attention, so an algorithm that can cluster data in less computational time, especially for streaming data, is needed. The presented adaptive clustering and classification algorithm for data streaming in Apache Spark addresses these problems in two phases. In the first phase, the pre-processed Twitter input data is clustered using an Improved Fuzzy C-means clustering, which is further improved by an Adaptive Particle Swarm Optimization (PSO) algorithm; the clustered data stream is then evaluated using the Spark engine. In the second phase, the pre-processed Higgs input data is classified using a modified support vector machine (MSVM) classifier with grid search optimization. Finally, the optimized data is evaluated in the Spark engine, and the resulting values are used to build a confusion matrix. The proposed work uses a Twitter dataset and the Higgs dataset for data streaming in Apache Spark. The computational experiments demonstrate the superiority of the presented approach over existing methods in terms of precision, recall, F-score, convergence, ROC curve and accuracy.
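The abstract above does not specify its "Improved" Fuzzy C-means variant or the PSO refinement, so the following is only a minimal sketch of the standard fuzzy C-means update it builds on, written in plain Python rather than Spark. The function name `fuzzy_c_means`, its parameters, and the toy points are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which points[i] belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Start from random memberships, normalized so each row sums to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([value / total for value in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            weight_sum = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / weight_sum
                for d in range(dim)
            ))

        # Membership update from the distances to the new centers.
        max_change = 0.0
        for i, point in enumerate(points):
            dists = [math.dist(point, c) for c in centers]
            if min(dists) == 0.0:
                # The point sits exactly on one or more centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [h / sum(hits) for h in hits]
            else:
                new_row = [
                    1.0 / sum((d_k / d_j) ** exponent for d_j in dists)
                    for d_k in dists
                ]
            change = max(abs(a - b) for a, b in zip(new_row, memberships[i]))
            max_change = max(max_change, change)
            memberships[i] = new_row

        if max_change < tol:
            break

    return centers, memberships


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for pre-processed tweets.
    toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    centers, memberships = fuzzy_c_means(toy, n_clusters=2)
    for point, row in zip(toy, memberships):
        print(point, [round(u, 3) for u in row])
```

Unlike hard clustering, each point receives a graded membership in every cluster; the fuzzifier `m` controls how soft those memberships are, with values closer to 1 giving crisper assignments.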

    Benefits of using data mining techniques to extract and analyze Twitter data for higher education applications: a systematic literature review

    In recent years, education actors have shown growing interest in bringing ICT into their institutions. Social networks are one example: far from being a problem, with guided use they make it possible to innovate on traditional classes and improve communication between teachers and students. This study has two objectives: (1) to conduct a systematic literature review, searching papers published between January 2007 and March 2019 in databases such as ACM, IEEE, ScienceDirect, Springer and others, to identify research that applies data mining techniques to extract and analyze Twitter data in higher education; and (2) to highlight pedagogical practices that incorporate Twitter and data mining to improve educational processes. Of the 315 papers obtained, only 65 fulfilled the inclusion criteria. The main results indicate that: (1) the most used data mining techniques are predictive, with classification tasks; (2) Twitter is principally used to (a) determine student perception, (b) share information, materials and resources, (c) generate communication and participation, (d) promote skills, and (e) improve oral expression and academic performance; (3) the United States has the largest number of studies in this area, whereas findings from Latin American countries are scarce, which opens a new area of research in this region; and (4) the studies used models, methods, strategies, theories and instruments as pedagogical practice, so there is no consensus on how data extracted from Twitter could be incorporated into higher education to improve teaching and learning processes.

    A Comparison of Real Time Stream Processing Frameworks

    The need to process the ever-expanding volumes of information generated daily in the modern world is driving radical changes in traditional data analysis techniques. As a result, a number of open source tools for handling real-time data streams have become available in recent years. Four in particular have gained significant traction: Apache Flink, Apache Samza, Apache Spark and Apache Storm. Despite the rising popularity of these frameworks, however, few studies analyse their performance in terms of important metrics such as throughput and latency. This study aims to fill that gap by running several benchmarks against these frameworks.
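The abstract above names throughput and latency as its metrics but does not describe the benchmark setup. As a rough illustration of what the two metrics measure, here is a single-process Python sketch; the function `measure`, the result keys, and the stand-in workload are assumptions for illustration and say nothing about how the study benchmarked Flink, Samza, Spark or Storm.

```python
import math
import statistics
import time


def measure(process, records):
    """Time `process` over `records`; report throughput and latency.

    Throughput is records completed per second of wall-clock time;
    latency is the time each individual record took to process.
    """
    latencies = []
    start = time.perf_counter()
    for record in records:
        t0 = time.perf_counter()
        process(record)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    if not latencies:
        raise ValueError("records must not be empty")

    ordered = sorted(latencies)
    # Nearest-rank 95th percentile of the per-record latencies.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "records": len(ordered),
        "throughput_per_s": len(ordered) / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_s": statistics.fmean(ordered),
        "p95_latency_s": ordered[p95_index],
    }


if __name__ == "__main__":
    # Stand-in workload: upper-casing short strings.
    print(measure(str.upper, ["tweet text"] * 10_000))
```

Reporting a tail percentile alongside the mean matters for streaming systems, because a small fraction of slow records can dominate the delay a user actually sees even when average latency looks good.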
