822 research outputs found

    Stance detection on social media: State of the art and trends

    Get PDF
    Stance detection on social media is an emerging opinion mining paradigm for various social and political applications in which sentiment analysis may be sub-optimal. There has been a growing research interest for developing effective methods for stance detection methods varying among multiple communities including natural language processing, web science, and social computing. This paper surveys the work on stance detection within those communities and situates its usage within current opinion mining techniques in social media. It presents an exhaustive review of stance detection techniques on social media, including the task definition, different types of targets in stance detection, features set used, and various machine learning approaches applied. The survey reports state-of-the-art results on the existing benchmark datasets on stance detection, and discusses the most effective approaches. In addition, this study explores the emerging trends and different applications of stance detection on social media. The study concludes by discussing the gaps in the current existing research and highlights the possible future directions for stance detection on social media.Comment: We request withdrawal of this article sincerely. We will re-edit this paper. Please withdraw this article before we finish the new versio

    Academic Writing and the Pedagogical Practices of Effective Teachers

    Get PDF
    Composition, particularly when academic register is required, is a complex task. Because cognitive flexibility theory explains how humans can spontaneously restructure knowledge and adapt to situational demands, it is ideally suited to the ill-structured domain of transactional writing. Global aspects related to paragraph and whole-text structure and local operations related to word and sentence-level features define academic writing. A mixed-methods design used quantitative methods for investigation of five corpora of 10th grade students\u27 work. Qualitative methods were used to explore the means teachers used in promoting academic writing and the interactions they intended to promote via teaching cues, including prompts. Students\u27 perceptions were similarly explored for contrastive purposes. Descriptive statistical and qualitative analysis of five corpora of student writing samples, high school exit exam results, surveys of students and teachers, and interviews with students and teachers were employed. This study suggests that interaction with students, while they compose, is critical to successful academic writing on the part of students. Systems are slow to change; however, this study may provide some models and descriptions of successful performance needed to encourage teachers and school systems to improve practice and academic outcomes in writing and content areas that include writing as a means of learning and assessment. Increased instructional precision may be of more value than simple prescription. Results suggest that cross-disciplinary activities may improve the uptake of academic words found on an academic word list. In addition, the type and quality of the prompts or directions for writing students are given affect the quality of students\u27 written work. As well, students and teachers valued the cues and oral feedback provided on drafts of student compositions. The results of this study suggest that when students are provided a contextually rich environment, challenging writing tasks, and support with appropriate cues, they may succeed as writers and thinkers about complex topics within and across disciplines

    Comparative Summarization of Document Collections

    Get PDF
    Comparing documents is an important task that help us in understanding the differences between documents. Example of document comparisons include comparing laws on same related subject matter in different jurisdictions, comparing the specifications of similar product from different manufacturers. One can see that the need for comparison does not stop at individual documents, and extends to large collections of documents. For example comparing the writing styles of an author early vs late in their life, identifying linguistic and lexical patterns of different political ideologies, or discover commonalities of political arguments in disparate events. Comparing large document collections calls for automated algorithms to do so. Every day a huge volume of documents are produced in social and news media. There has been a lot of research in summarizing individual document such as a news article, or document collections such as a collection of news articles on a related topic or event. However, comparatively summarizing different document collections, or comparative summarization is under-explored problem in terms of methodology, datasets, evaluations and applicability in different domains. To address this, in this thesis, we make three types of contributions to comparative summarization, methodology, datasets and evaluation, and empirical measurements on a range of settings where comparative summarization can be applied. We propose a new formulation the problem of comparative summarization as competing binary classifiers. This formulation help us to develop new unsupervised and supervised methods for comparative summarization. Our methods are based on Maximum Mean Discrepancy (MMD), a metric that measures the distance between two sets of datapoints (or documents). The unsupervised methods incorporate information coverage, information diversity and discriminativeness of the prototypes based on global-model of sentence-sentence similarity, and be optimized with greedy and gradient methods. We show the efficacy of the approach in summarizing a long running news topic over time. Our supervised method improves the unsupervised methods, and can learn the importance of prototypes based on surface features (e.g., position, length, presence of cue words) and combine different text feature representations. Our supervised method meets or exceeds the state-of-the-art performance in benchmark datasets. We design new scalable automatic and crowd-sourced extrinsic evaluations of comparative summaries when human written ground truth summaries are not available. To evaluate our methods, we develop two new datasets on controversial news topics -- CONTROVNEWS2017 and NEWS2019+BIAS datasets which we use in different experiments. We use CONTROVNEWS2017, which consists of news articles on controversial topics to evaluate our unsupervised methods in summarizing over time. We use NEWS2019+BIAS, which again consists of news articles on controversial news topics, along with media bias labels to empirically study the applicability of methods. Finally, we measure the distinguishability and summarizability of document collections to quantify the applicability of our methods in different domains. We measure these metrics in a newly curated NEWS2019+BIAS dataset in comparing articles over time, and across ideological leanings of media outlets. First, we observe that the summarizability is proportional to the distinguishability, and identify the groups of articles that are less or more distinguishable.Second, better distinguishability and summarizability is amenable to the choice of document representations according to the comparisons we make, either over time, or across ideological leanings of media outlets. We also apply the comparative summarization method to the task of comparing stances in the social media domain

    Attention-based Approaches for Text Analytics in Social Media and Automatic Summarization

    Full text link
    [ES] Hoy en día, la sociedad tiene acceso y posibilidad de contribuir a grandes cantidades de contenidos presentes en Internet, como redes sociales, periódicos online, foros, blogs o plataformas de contenido multimedia. Todo este tipo de medios han tenido, durante los últimos años, un impacto abrumador en el día a día de individuos y organizaciones, siendo actualmente medios predominantes para compartir, debatir y analizar contenidos online. Por este motivo, resulta de interés trabajar sobre este tipo de plataformas, desde diferentes puntos de vista, bajo el paraguas del Procesamiento del Lenguaje Natural. En esta tesis nos centramos en dos áreas amplias dentro de este campo, aplicadas al análisis de contenido en línea: análisis de texto en redes sociales y resumen automático. En paralelo, las redes neuronales también son un tema central de esta tesis, donde toda la experimentación se ha realizado utilizando enfoques de aprendizaje profundo, principalmente basados en mecanismos de atención. Además, trabajamos mayoritariamente con el idioma español, por ser un idioma poco explorado y de gran interés para los proyectos de investigación en los que participamos. Por un lado, para el análisis de texto en redes sociales, nos enfocamos en tareas de análisis afectivo, incluyendo análisis de sentimientos y detección de emociones, junto con el análisis de la ironía. En este sentido, se presenta un enfoque basado en Transformer Encoders, que consiste en contextualizar \textit{word embeddings} pre-entrenados con tweets en español, para abordar tareas de análisis de sentimiento y detección de ironía. También proponemos el uso de métricas de evaluación como funciones de pérdida, con el fin de entrenar redes neuronales, para reducir el impacto del desequilibrio de clases en tareas \textit{multi-class} y \textit{multi-label} de detección de emociones. Adicionalmente, se presenta una especialización de BERT tanto para el idioma español como para el dominio de Twitter, que tiene en cuenta la coherencia entre tweets en conversaciones de Twitter. El desempeño de todos estos enfoques ha sido probado con diferentes corpus, a partir de varios \textit{benchmarks} de referencia, mostrando resultados muy competitivos en todas las tareas abordadas. Por otro lado, nos centramos en el resumen extractivo de artículos periodísticos y de programas televisivos de debate. Con respecto al resumen de artículos, se presenta un marco teórico para el resumen extractivo, basado en redes jerárquicas siamesas con mecanismos de atención. También presentamos dos instancias de este marco: \textit{Siamese Hierarchical Attention Networks} y \textit{Siamese Hierarchical Transformer Encoders}. Estos sistemas han sido evaluados en los corpora CNN/DailyMail y NewsRoom, obteniendo resultados competitivos en comparación con otros enfoques extractivos coetáneos. Con respecto a los programas de debate, se ha propuesto una tarea que consiste en resumir las intervenciones transcritas de los ponentes, sobre un tema determinado, en el programa "La Noche en 24 Horas". Además, se propone un corpus de artículos periodísticos, recogidos de varios periódicos españoles en línea, con el fin de estudiar la transferibilidad de los enfoques propuestos, entre artículos e intervenciones de los participantes en los debates. Este enfoque muestra mejores resultados que otras técnicas extractivas, junto con una transferibilidad de dominio muy prometedora.[CA] Avui en dia, la societat té accés i possibilitat de contribuir a grans quantitats de continguts presents a Internet, com xarxes socials, diaris online, fòrums, blocs o plataformes de contingut multimèdia. Tot aquest tipus de mitjans han tingut, durant els darrers anys, un impacte aclaparador en el dia a dia d'individus i organitzacions, sent actualment mitjans predominants per compartir, debatre i analitzar continguts en línia. Per aquest motiu, resulta d'interès treballar sobre aquest tipus de plataformes, des de diferents punts de vista, sota el paraigua de l'Processament de el Llenguatge Natural. En aquesta tesi ens centrem en dues àrees àmplies dins d'aquest camp, aplicades a l'anàlisi de contingut en línia: anàlisi de text en xarxes socials i resum automàtic. En paral·lel, les xarxes neuronals també són un tema central d'aquesta tesi, on tota l'experimentació s'ha realitzat utilitzant enfocaments d'aprenentatge profund, principalment basats en mecanismes d'atenció. A més, treballem majoritàriament amb l'idioma espanyol, per ser un idioma poc explorat i de gran interès per als projectes de recerca en els que participem. D'una banda, per a l'anàlisi de text en xarxes socials, ens enfoquem en tasques d'anàlisi afectiu, incloent anàlisi de sentiments i detecció d'emocions, juntament amb l'anàlisi de la ironia. En aquest sentit, es presenta una aproximació basada en Transformer Encoders, que consisteix en contextualitzar \textit{word embeddings} pre-entrenats amb tweets en espanyol, per abordar tasques d'anàlisi de sentiment i detecció d'ironia. També proposem l'ús de mètriques d'avaluació com a funcions de pèrdua, per tal d'entrenar xarxes neuronals, per reduir l'impacte de l'desequilibri de classes en tasques \textit{multi-class} i \textit{multi-label} de detecció d'emocions. Addicionalment, es presenta una especialització de BERT tant per l'idioma espanyol com per al domini de Twitter, que té en compte la coherència entre tweets en converses de Twitter. El comportament de tots aquests enfocaments s'ha provat amb diferents corpus, a partir de diversos \textit{benchmarks} de referència, mostrant resultats molt competitius en totes les tasques abordades. D'altra banda, ens centrem en el resum extractiu d'articles periodístics i de programes televisius de debat. Pel que fa a l'resum d'articles, es presenta un marc teòric per al resum extractiu, basat en xarxes jeràrquiques siameses amb mecanismes d'atenció. També presentem dues instàncies d'aquest marc: \textit{Siamese Hierarchical Attention Networks} i \textit{Siamese Hierarchical Transformer Encoders}. Aquests sistemes s'han avaluat en els corpora CNN/DailyMail i Newsroom, obtenint resultats competitius en comparació amb altres enfocaments extractius coetanis. Pel que fa als programes de debat, s'ha proposat una tasca que consisteix a resumir les intervencions transcrites dels ponents, sobre un tema determinat, al programa "La Noche en 24 Horas". A més, es proposa un corpus d'articles periodístics, recollits de diversos diaris espanyols en línia, per tal d'estudiar la transferibilitat dels enfocaments proposats, entre articles i intervencions dels participants en els debats. Aquesta aproximació mostra millors resultats que altres tècniques extractives, juntament amb una transferibilitat de domini molt prometedora.[EN] Nowadays, society has access, and the possibility to contribute, to large amounts of the content present on the internet, such as social networks, online newspapers, forums, blogs, or multimedia content platforms. These platforms have had, during the last years, an overwhelming impact on the daily life of individuals and organizations, becoming the predominant ways for sharing, discussing, and analyzing online content. Therefore, it is very interesting to work with these platforms, from different points of view, under the umbrella of Natural Language Processing. In this thesis, we focus on two broad areas inside this field, applied to analyze online content: text analytics in social media and automatic summarization. Neural networks are also a central topic in this thesis, where all the experimentation has been performed by using deep learning approaches, mainly based on attention mechanisms. Besides, we mostly work with the Spanish language, due to it is an interesting and underexplored language with a great interest in the research projects we participated in. On the one hand, for text analytics in social media, we focused on affective analysis tasks, including sentiment analysis and emotion detection, along with the analysis of the irony. In this regard, an approach based on Transformer Encoders, based on contextualizing pretrained Spanish word embeddings from Twitter, to address sentiment analysis and irony detection tasks, is presented. We also propose the use of evaluation metrics as loss functions, in order to train neural networks for reducing the impact of the class imbalance in multi-class and multi-label emotion detection tasks. Additionally, a specialization of BERT both for the Spanish language and the Twitter domain, that takes into account inter-sentence coherence in Twitter conversation flows, is presented. The performance of all these approaches has been tested with different corpora, from several reference evaluation benchmarks, showing very competitive results in all the tasks addressed. On the other hand, we focused on extractive summarization of news articles and TV talk shows. Regarding the summarization of news articles, a theoretical framework for extractive summarization, based on siamese hierarchical networks with attention mechanisms, is presented. Also, we present two instantiations of this framework: Siamese Hierarchical Attention Networks and Siamese Hierarchical Transformer Encoders. These systems were evaluated on the CNN/DailyMail and the NewsRoom corpora, obtaining competitive results in comparison to other contemporary extractive approaches. Concerning the TV talk shows, we proposed a text summarization task, for summarizing the transcribed interventions of the speakers, about a given topic, in the Spanish TV talk shows of the ``La Noche en 24 Horas" program. In addition, a corpus of news articles, collected from several Spanish online newspapers, is proposed, in order to study the domain transferability of siamese hierarchical approaches, between news articles and interventions of debate participants. This approach shows better results than other extractive techniques, along with a very promising domain transferability.González Barba, JÁ. (2021). Attention-based Approaches for Text Analytics in Social Media and Automatic Summarization [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172245TESI

    Text Analysis of Air Force References in Twitter

    Get PDF
    Social media has grown to become a rich source for opinions, authored by individuals who volunteer them, unedited and in real-time. Armed with this information, an organization like the Air Force can understand the perceptions of consumers and learn to better serve the American taxpayer. To accomplish this goal, this research takes a qualitative approach, utilizing social media analytics in combination with various Text Mining methodologies (word frequency, word relationships, sentiment analysis, topic modeling) to provide insight on Air Force related content shared on Twitter. To provide a well-rounded analysis of the overall perception of the Air Force enterprise, the methods mentioned are conducted on Tweets related to the Air Force’s five core missions: Space/Cyberspace, Nuclear Deterrence, Air Superiority, Advancements in Technology, and Intelligence, Surveillance, Reconnaissance. This research also identifies the key players that publish the most engaged Tweets related to the Air Force. By understanding the types of users who possess the most influence (Regular Users, Bloggers, Celebrities, Military Leaders, Politicians, Professional Organizations), Air Force leaders are better equipped to react to content and protect the Air Force brand

    Mining user viewpoints in online discussions

    Get PDF
    corecore