Statistical analysis of grouped text documents
The topic of this thesis is statistical models for the analysis of textual data, with an emphasis on contexts in which text samples are grouped.
When dealing with text data, the first issue is to process it, making it computationally and methodologically compatible with the mathematical and statistical methods produced and continually developed by the scientific community. Therefore, the thesis first reviews existing methods for analytically representing and processing textual datasets, including Vector Space Models, distributed representations of words and documents, and contextualized embeddings. The review also standardizes a notation that, even within the same representation approach, appears highly heterogeneous in the literature.
Then, two domains of application are explored: social media and cultural tourism. For the former, a study is presented on self-presentation among diverse groups of individuals on the StockTwits platform, where finance and stock markets are the dominant topics. The proposed methodology integrated various types of data, both textual and categorical. This study revealed insights into how people present themselves online and found recurring behavioral patterns within groups of users.
For the latter, the thesis delves into a study conducted as part of the "Data Science for Brescia - Arts and Cultural Places" project, in which a language model was trained to classify online reviews written in Italian into four distinct semantic areas related to cultural attractions in the Italian city of Brescia. The proposed model makes it possible to identify attractions in text documents even when they are not explicitly mentioned in the document metadata, which opens the possibility of expanding the database of these cultural attractions with new sources such as social media platforms, forums, and other online spaces.
Lastly, the thesis presents a methodological study examining the group-specificity of words, analyzing various group-specificity estimators proposed in the literature. The study considered grouped text documents with both outcome and group variables. Its contribution is a proposal to model the corpus of documents as a multivariate distribution, which enables the simulation of corpora of text documents with predefined characteristics. The simulation provided valuable insights into the relationship between groups of documents and words. Furthermore, all of its results can be freely explored through a web application, whose components are also described in this manuscript.
In conclusion, this thesis was conceived as a collection of papers. It aimed to contribute to the field with both applications and methodological proposals, and each study presented here suggests paths for future research to address the challenges in the analysis of grouped textual data.
Hybrid energy system integration and management for solar energy: a review
The conventional grid is increasingly integrating renewable energy sources such as solar energy to lower carbon and other greenhouse gas emissions. While energy management systems support grid integration by balancing power supply with demand, they are usually either predictive or real-time and therefore unable to utilise the full array of supply and demand responses, which limits grid integration of renewable energy sources. An integrated energy management system overcomes this limitation. This review examines various concepts related to the integrated energy management system, such as the power system configurations it operates in and the types of supply-side and demand-side responses. These concepts and approaches are particularly relevant for power systems that rely heavily on solar energy and have constraints on energy supply and costs. Building on this, the review provides a comprehensive overview of current research and progress on the development of integrated energy management system frameworks that have both predictive and real-time energy management capabilities. The potential benefits of an energy management system that integrates solar power forecasting, demand-side management, and supply-side management are explored. Furthermore, design considerations are proposed for creating solar energy forecasting models. The findings from this review have the potential to inform ongoing studies on the design and implementation of integrated energy management systems and their effect on power systems.
Mitigating Data Scarcity for Neural Language Models
In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm, achieving new benchmarks and state-of-the-art performance. These models often rely heavily on annotated data, which may not always be available. Data scarcity is common in specialized domains, such as medicine, and in low-resource languages that are underexplored by AI research. In this dissertation, we focus on mitigating data scarcity using data augmentation and neural ensemble learning techniques for neural language models. In both research directions, we implement neural network algorithms and evaluate their impact on assisting neural language models in downstream NLP tasks. Specifically, for data augmentation, we explore two techniques: 1) creating positive training data by moving an answer span around its original context and 2) using text simplification techniques to introduce a variety of writing styles to the original training data. Our results indicate that these simple and effective solutions improve the performance of neural language models considerably in low-resource NLP domains and tasks. For neural ensemble learning, we use a multi-label neural classifier to select the best prediction outcome from a variety of individual pretrained neural language models trained for a low-resource medical text simplification task.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volume
A Critical Review Of Post-Secondary Education Writing During A 21st Century Education Revolution
Educational materials are effective instruments which provide information and report new discoveries uncovered by researchers in specific areas of academia. Higher education, like other education institutions, relies on instructional materials to inform its practice of educating adult learners. In post-secondary education, developmental English programs are tasked with meeting the needs of dynamic populations, so there is a continuous need for research in this area to support its changing landscape. However, the majority of scholarly thought in this area centers on K-12 reading and writing. This paucity presents a phenomenon to the post-secondary community. This research study uses a qualitative content analysis to examine peer-reviewed journals from 2003-2017, developmental online websites, and a government-issued document directed toward reforming post-secondary developmental education programs. These highly relevant sources aid educators in discovering informational support to apply best practices for student success. Developmental education serves the purpose of addressing literacy gaps for students transitioning to college-level work. The findings here illuminate the dearth of material offered to developmental educators. This study suggests the field of literacy research is fragmented and highlights an apparent blind spot in scholarly literature with regard to English writing instruction. This poses a quandary for post-secondary literacy researchers in the 21st century and establishes the necessity for the literacy research community to commit future scholarship toward equipping college educators who teach writing to underprepared adult learners.
VIRD: Immersive Match Video Analysis for High-Performance Badminton Coaching
Badminton is a fast-paced sport that requires a strategic combination of
spatial, temporal, and technical tactics. To gain a competitive edge at
high-level competitions, badminton professionals frequently analyze match
videos to gain insights and develop game strategies. However, the current
process for analyzing matches is time-consuming and relies heavily on manual
note-taking, due to the lack of automatic data collection and appropriate
visualization tools. As a result, there is a gap in effectively analyzing
matches and communicating insights among badminton coaches and players. This
work proposes an end-to-end immersive match analysis pipeline designed in close
collaboration with badminton professionals, including Olympic and national
coaches and players. We present VIRD, a VR Bird (i.e., shuttle) immersive
analysis tool, that supports interactive badminton game analysis in an
immersive environment based on 3D reconstructed game views of the match video.
We propose a top-down analytic workflow that allows users to seamlessly move
from a high-level match overview to a detailed game view of individual rallies
and shots, using situated 3D visualizations and video. We collect 3D spatial
and dynamic shot data and player poses with computer vision models and
visualize them in VR. Through immersive visualizations, coaches can
interactively analyze situated spatial data (player positions, poses, and shot
trajectories) with flexible viewpoints while navigating between shots and
rallies effectively with embodied interaction. We evaluated the usefulness of
VIRD with Olympic and national-level coaches and players in real matches.
Results show that immersive analytics supports effective badminton match
analysis with reduced context-switching costs and enhances spatial
understanding with a high sense of presence.
Comment: To appear in IEEE Transactions on Visualization and Computer Graphics
(IEEE VIS), 202
Psychosocial aspects of living with a visible neurological condition
This thesis examines the psychosocial aspects of experience for people living with visible neurological conditions. Section one reports on a systematic literature review of qualitative studies exploring how individuals and families cope with Tourette’s syndrome. A systematic search using keywords related to coping and Tourette’s syndrome was conducted on four academic databases. A meta-ethnographic approach led to the construction of three themes: redefining the self and social identity; controlling the body; and challenging the narrative. The findings support a biopsychosocial approach to understanding the condition. This has clinical implications for the treatment of Tourette’s syndrome and future research should seek to expand on this knowledge. Section two reports on an empirical study exploring how people with neck dystonia navigate the social world. Ten participants were interviewed using a semi-structured, qualitative approach. Three themes were constructed from the data: dismissed by others for having an unfamiliar condition; negotiating a new social identity; and managing the stigma of a visible condition. The findings highlight the importance of social identity and the impact of stigma on people with visible health conditions. Further research should seek to explore the nature of distress arising from these psychosocial difficulties with the aim of tailoring clinical interventions for people with neck dystonia. Section three includes a critical appraisal with reflections on the process of conducting this project. Consideration is also given to the role of psychology in addressing systematic societal concerns such as stigma
Generalized Planning as Heuristic Search: A new planning search-space that leverages pointers over objects
Planning as heuristic search is one of the most successful approaches to
classical planning but unfortunately, it does not extend trivially to
Generalized Planning (GP). GP aims to compute algorithmic solutions that are
valid for a set of classical planning instances from a given domain, even if
these instances differ in the number of objects, the number of state variables,
their domain size, or their initial and goal configuration. The generalization
requirements of GP make it impractical to perform the state-space search that
is usually implemented by heuristic planners. This paper adapts the planning as
heuristic search paradigm to the generalization requirements of GP, and
presents the first native heuristic search approach to GP. First, the paper
introduces a new pointer-based solution space for GP that is independent of the
number of classical planning instances in a GP problem and the size of those
instances (i.e. the number of objects, state variables and their domain sizes).
Second, the paper defines a set of evaluation and heuristic functions for
guiding a combinatorial search in our new GP solution space. The computation of
these evaluation and heuristic functions does not require grounding states or
actions in advance. Therefore our GP as heuristic search approach can handle
large sets of state variables with large numerical domains, e.g. integers.
Lastly, the paper defines an upgraded version of our novel algorithm for GP
called Best-First Generalized Planning (BFGP), that implements a best-first
search in our pointer-based solution space, and that is guided by our
evaluation/heuristic functions for GP.
Comment: Under review in the Artificial Intelligence Journal (AIJ)
Current issues of the management of socio-economic systems in terms of globalization challenges
The authors of this scientific monograph conclude that managing socio-economic systems in the face of global challenges requires mechanisms that ensure security, optimise the use of resource potential, increase competitiveness, and provide state support to economic entities. The basic research focuses on the assessment of economic entities under global challenges and on the analysis of the financial system, migration flows, logistics and product exports, and territorial development. The research results have been implemented in various decision-making models in the context of global challenges, strategic planning, financial and food security, education management, information technology and innovation. The results of the study can be used in developing directions, programmes and strategies for the sustainable development of economic entities and regions, in increasing the competitiveness of products and services, and in decision-making at the level of ministries and agencies that regulate the management of socio-economic systems. The results can also be used by students and young scientists in the educational process and in scientific research on the management of socio-economic systems under global challenges.