3,109 research outputs found
Statistical analysis of grouped text documents
L'argomento di questa tesi sono i modelli statistici per l'analisi dei dati testuali, con particolare attenzione ai contesti in cui i campioni di testo sono raggruppati.
Quando si ha a che fare con dati testuali, il primo problema è quello di elaborarli, per renderli compatibili dal punto di vista computazionale e metodologico con i metodi matematici e statistici prodotti e continuamente sviluppati dalla comunità scientifica. Per questo motivo, la tesi passa in rassegna i metodi esistenti per la rappresentazione analitica e l'elaborazione di campioni di dati testuali, compresi i "Vector Space Models", le "rappresentazioni distribuite" di parole e documenti e i "contextualized embeddings". Questa rassegna comporta la standardizzazione di una notazione che, anche all'interno dello stesso approccio di rappresentazione, appare molto eterogenea in letteratura.
Vengono poi esplorati due domini di applicazione: i social media e il turismo culturale. Per quanto riguarda il primo, viene proposto uno studio sull'autodescrizione di gruppi diversi di individui sulla piattaforma StockTwits, dove i mercati finanziari sono gli argomenti dominanti. La metodologia proposta ha integrato diversi tipi di dati, sia testuali che variabili categoriche. Questo studio ha agevolato la comprensione sul modo in cui le persone si presentano online e ha trovato stutture di comportamento ricorrenti all'interno di gruppi di utenti.
Per quanto riguarda il turismo culturale, la tesi approfondisce uno studio condotto nell'ambito del progetto "Data Science for Brescia - Arts and Cultural Places", in cui è stato addestrato un modello linguistico per classificare le recensioni online scritte in italiano in quattro aree semantiche distinte relative alle attrazioni culturali della città di Brescia. Il modello proposto permette di identificare le attrazioni nei documenti di testo, anche quando non sono esplicitamente menzionate nei metadati del documento, aprendo così la possibilità di espandere il database relativo a queste attrazioni culturali con nuove fonti, come piattaforme di social media, forum e altri spazi online.
Infine, la tesi presenta uno studio metodologico che esamina la specificità di gruppo delle parole, analizzando diversi stimatori di specificità di gruppo proposti in letteratura. Lo studio ha preso in considerazione documenti testuali raggruppati con variabile di "outcome" e variabile di gruppo. Il suo contributo consiste nella proposta di modellare il corpus di documenti come una distribuzione multivariata, consentendo la simulazione di corpora di documenti di testo con caratteristiche predefinite. La simulazione ha fornito preziose indicazioni sulla relazione tra gruppi di documenti e parole. Inoltre, tutti i risultati possono essere liberamente esplorati attraverso un'applicazione web, i cui componenti sono altresì descritti in questo manoscritto.
In conclusione, questa tesi è stata concepita come una raccolta di studi, ognuno dei quali suggerisce percorsi di ricerca futuri per affrontare le sfide dell'analisi dei dati testuali raggruppati.The topic of this thesis is statistical models for the analysis of textual data, emphasizing contexts in which text samples are grouped.
When dealing with text data, the first issue is to process it, making it computationally and methodologically compatible with the existing mathematical and statistical methods produced and continually developed by the scientific community. Therefore, the thesis firstly reviews existing methods for analytically representing and processing textual datasets, including Vector Space Models, distributed representations of words and documents, and contextualized embeddings. It realizes this review by standardizing a notation that, even within the same representation approach, appears highly heterogeneous in the literature.
Then, two domains of application are explored: social media and cultural tourism. About the former, a study is proposed about self-presentation among diverse groups of individuals on the StockTwits platform, where finance and stock markets are the dominant topics. The methodology proposed integrated various types of data, including textual and categorical data. This study revealed insights into how people present themselves online and found recurring patterns within groups of users.
About the latter, the thesis delves into a study conducted as part of the "Data Science for Brescia - Arts and Cultural Places" Project, where a language model was trained to classify Italian-written online reviews into four distinct semantic areas related to cultural attractions in the Italian city of Brescia. The model proposed allows for the identification of attractions in text documents, even when not explicitly mentioned in document metadata, thus opening possibilities for expanding the database related to these cultural attractions with new sources, such as social media platforms, forums, and other online spaces.
Lastly, the thesis presents a methodological study examining the group-specificity of words, analyzing various group-specificity estimators proposed in the literature. The study considered grouped text documents with both outcome and group variables. Its contribution consists of the proposal of modeling the corpus of documents as a multivariate distribution, enabling the simulation of corpora of text documents with predefined characteristics. The simulation provided valuable insights into the relationship between groups of documents and words. Furthermore, all its results can be freely explored through a web application, whose components are also described in this manuscript.
In conclusion, this thesis has been conceived as a collection of papers. It aimed to contribute to the field with both applications and methodological proposals, and each study presented here suggests paths for future research to address the challenges in the analysis of grouped textual data
UMSL Bulletin 2023-2024
The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.https://irl.umsl.edu/bulletin/1088/thumbnail.jp
Multidisciplinary perspectives on Artificial Intelligence and the law
This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.info:eu-repo/semantics/publishedVersio
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
UMSL Bulletin 2022-2023
The 2022-2023 Bulletin and Course Catalog for the University of Missouri St. Louis.https://irl.umsl.edu/bulletin/1087/thumbnail.jp
UNPUBLISHING THE NEWS: AN ASSESSMENT OF U.S. PUBLIC OPINION, NEWSROOM ACCOUNTABILITY, AND JOURNALISTS’ AUTHORITY AS “THE FIRST DRAFT OF HISTORY”
Unpublishing, or the manipulation, deindexing, or removal of published content on a news organization’s website, is a hotly debated issue in the news industry that disrupts fundamental beliefs about the nature of news and the roles of journalists. This dissertation’s premise is that unpublishing as a phenomenon challenges the authority of journalism as “the first draft of history,” questions the assumed relevance of traditional norms, and creates an opportunity to reconsider how news organizations demonstrate their accountability to the public. The study identifies public opinions related to unpublishing practices and approval of related journalistic norms through a public opinion survey of 1,350 U.S. adults. In tandem, a qualitative analysis of 62 editorial policies related to unpublishing offers the first inventory and assessment of emerging journalistic practices and the normative values journalists demonstrate through them. These contributions are valuable to both the academy and the news industry, as they identify a path forward for future research and provide desired guidance to U.S. news organizations. Findings suggest that in response to the unpublishing phenomenon, American journalists defend their professionalism primarily through the traditional professional paradigm of accuracy, invoking it to legitimize new guidelines whether those policies permitted or denounced unpublishing as a newsroom practice. Findings also show newsrooms are pledging increased levels of accountability to their communities and society at large, but how they might demonstrate that accountability more tactically was absent from policy discourse. In addition, both American adults and news organizations place a high value on the accuracy of previously published news content, yet the groups’ temporal conceptions of accuracy must be reconciled. Ultimately, the unpublishing phenomenon presents an opportunity for journalists to redefine their notions of accountability to their communities. Based on these findings, the study concludes with a call for American news organizations to abandon claims as the “first draft of history” in the digital era and assume the role of information custodians, proactively establishing and managing the lifecycle of content.Doctor of Philosoph
2023-2024 Catalog
The 2023-2024 Governors State University Undergraduate and Graduate Catalog is a comprehensive listing of current information regarding:Degree RequirementsCourse OfferingsUndergraduate and Graduate Rules and Regulation
Investigating the learning potential of the Second Quantum Revolution: development of an approach for secondary school students
In recent years we have witnessed important changes: the Second Quantum Revolution is in the spotlight of many countries, and it is creating a new generation of technologies.
To unlock the potential of the Second Quantum Revolution, several countries have launched strategic plans and research programs that finance and set the pace of research and development of these new technologies (like the Quantum Flagship, the National Quantum Initiative Act and so on).
The increasing pace of technological changes is also challenging science education and institutional systems, requiring them to help to prepare new generations of experts.
This work is placed within physics education research and contributes to the challenge by developing an approach and a course about the Second Quantum Revolution. The aims are to promote quantum literacy and, in particular, to value from a cultural and educational perspective the Second Revolution.
The dissertation is articulated in two parts. In the first, we unpack the Second Quantum Revolution from a cultural perspective and shed light on the main revolutionary aspects that are elevated to the rank of principles implemented in the design of a course for secondary school students, prospective and in-service teachers. The design process and the educational reconstruction of the activities are presented as well as the results of a pilot study conducted to investigate the impact of the approach on students' understanding and to gather feedback to refine and improve the instructional materials.
The second part consists of the exploration of the Second Quantum Revolution as a context to introduce some basic concepts of quantum physics. We present the results of an implementation with secondary school students to investigate if and to what extent external representations could play any role to promote students’ understanding and acceptance of quantum physics as a personal reliable description of the world
NEMISA Digital Skills Conference (Colloquium) 2023
The purpose of the colloquium and events centred around the central role that data plays
today as a desirable commodity that must become an important part of massifying digital
skilling efforts. Governments amass even more critical data that, if leveraged, could
change the way public services are delivered, and even change the social and economic
fortunes of any country. Therefore, smart governments and organisations increasingly
require data skills to gain insights and foresight, to secure themselves, and for improved
decision making and efficiency. However, data skills are scarce, and even more
challenging is the inconsistency of the associated training programs with most curated for
the Science, Technology, Engineering, and Mathematics (STEM) disciplines.
Nonetheless, the interdisciplinary yet agnostic nature of data means that there is
opportunity to expand data skills into the non-STEM disciplines as well.College of Engineering, Science and Technolog
A Review of Deep Learning Models for Twitter Sentiment Analysis: Challenges and Opportunities
Microblogging site Twitter (re-branded to X since July 2023) is one of the most influential online social media websites, which offers a platform for the masses to communicate, expresses their opinions, and shares information on a wide range of subjects and products, resulting in the creation of a large amount of unstructured data. This has attracted significant attention from researchers who seek to understand and analyze the sentiments contained within this massive user-generated text. The task of sentiment analysis (SA) entails extracting and identifying user opinions from the text, and various lexicon-and machine learning-based methods have been developed over the years to accomplish this. However, deep learning (DL)-based approaches have recently become dominant due to their superior performance. This study briefs on standard preprocessing techniques and various word embeddings for data preparation. It then delves into a taxonomy to provide a comprehensive summary of DL-based approaches. In addition, the work compiles popular benchmark datasets and highlights evaluation metrics employed for performance measures and the resources available in the public domain to aid SA tasks. Furthermore, the survey discusses domain-specific practical applications of SA tasks. Finally, the study concludes with various research challenges and outlines future outlooks for further investigation
- …