8,280 research outputs found

    Undergraduate Catalog of Studies, 2023-2024

    Integrating expert-based objectivist and nonexpert-based subjectivist paradigms in landscape assessment

    This thesis explores the integration of objective and subjective measures of landscape aesthetics, particularly focusing on crowdsourced geo-information. It addresses the increasing importance of considering public perceptions in national landscape governance, in line with the European Landscape Convention's emphasis on public involvement. Despite this, national landscape assessments often remain expert-centric and top-down, facing challenges in resource constraints and limited public engagement. The thesis leverages Web 2.0 technologies and crowdsourced geographic information, examining correlations between expert-based metrics of landscape quality and public perceptions. The Scenic-Or-Not initiative for Great Britain, GIS-based Wildness spatial layers, and LANDMAP dataset for Wales serve as key datasets for analysis. The research investigates the relationships between objective measures of landscape wildness quality and subjective measures of aesthetics. Multiscale geographically weighted regression (MGWR) reveals significant correlations, with different wildness components exhibiting varying degrees of association. The study suggests the feasibility of incorporating wildness and scenicness measures into formal landscape aesthetic assessments. Comparing expert and public perceptions, the research identifies preferences for water-related landforms and variations in upland and lowland typologies. The study emphasizes the agreement between experts and non-experts on extreme scenic perceptions but notes discrepancies in mid-spectrum landscapes. To overcome limitations in systematic landscape evaluations, an integrative approach is proposed. Utilizing XGBoost models, the research predicts spatial patterns of landscape aesthetics across Great Britain, based on the Scenic-Or-Not initiatives, Wildness spatial layers, and LANDMAP data. 
The models achieve comparable accuracy to traditional statistical models, offering insights for Landscape Character Assessment practices and policy decisions. While acknowledging data limitations and biases in crowdsourcing, the thesis discusses the necessity of an aggregation strategy to manage computational challenges. Methodological considerations include addressing the modifiable areal unit problem (MAUP) associated with aggregating point-based observations. The thesis comprises three studies published or submitted for publication, each contributing to the understanding of the relationship between objective and subjective measures of landscape aesthetics. The concluding chapter discusses the limitations of data and methods, providing a comprehensive overview of the research

    Deep generative models for network data synthesis and monitoring

    Measurement and monitoring are fundamental tasks in all networks, enabling the downstream management and optimization of the network. Although networks inherently produce abundant monitoring data, accessing and measuring it effectively is another story. The challenges are manifold. First, network monitoring data is inaccessible to external users, and it is hard to provide a high-fidelity dataset without leaking commercially sensitive information. Second, effective data collection across a large-scale network system can be very expensive given the growing size of networks, i.e., the number of cells in a radio network and the number of flows in an Internet Service Provider (ISP) network. Third, it is difficult to ensure fidelity and efficiency simultaneously in network monitoring, as the resources available in a network element to support the measurement function are too limited to implement sophisticated mechanisms. Finally, understanding and explaining the behavior of the network becomes challenging due to its size and complex structure. Various emerging optimization-based solutions (e.g., compressive sensing) and data-driven solutions (e.g., deep learning) have been proposed for these challenges. However, the fidelity and efficiency of existing methods cannot yet meet current network requirements. The contributions made in this thesis significantly advance the state of the art in network measurement and monitoring techniques. We leverage deep generative modeling, a cutting-edge machine learning technology, throughout the thesis. First, we design and realize APPSHOT, which enables efficient city-scale network traffic sharing with a conditional generative model and only requires open-source contextual data during inference (e.g., land use information and population distribution). 
Second, we develop GENDT, an efficient drive testing system based on a generative model, which combines graph neural networks, conditional generation, and quantified model uncertainty to enhance the efficiency of mobile drive testing. Third, we design and implement DISTILGAN, a high-fidelity, efficient, versatile, and real-time network telemetry system with latent GANs and spectral-temporal networks. Finally, we propose SPOTLIGHT, an accurate, explainable, and efficient anomaly detection system for the Open RAN (Radio Access Network) system. The lessons learned through this research are summarized, and interesting topics are discussed for future work in this domain. All proposed solutions have been evaluated with real-world datasets and applied to support different applications in real systems.

    Robust myoelectric pattern recognition methods for reducing users’ calibration burden: challenges and future

    Myoelectric pattern recognition (MPR) has evolved into a sophisticated technology widely employed in controlling myoelectric interface (MI) devices like prosthetic and orthotic robots. Current MIs not only enable multi-degree-of-freedom control of prosthetic limbs but also demonstrate substantial potential in consumer electronics. However, the non-stationary random characteristics of myoelectric signals pose challenges, leading to performance degradation in practical scenarios such as electrode shifts and switching to new users. Conventional MIs often necessitate meticulous calibration, imposing a significant burden on users. To address user frustration during the calibration process, researchers have focused on identifying MPR methods that alleviate this burden. This article categorizes common scenarios that incur calibration burdens into those based on data distribution shift and those based on dynamic data categories. It then investigates and summarizes the popular robust MPR algorithms used to reduce the user’s calibration burden. We categorize these algorithms as based on data manipulation, feature manipulation, and model structure, and describe the scenarios to which each method is applicable and the conditions required for calibration. Finally, the review concludes with the advantages of robust MPR and the remaining challenges and future opportunities.

    CFD Modelling of the Mixture Preparation in a Modern Gasoline Direct Injection Engine and Correlations with Experimental PN Emissions

    A detailed 3D CFD analysis of a modern gasoline direct injection (GDI) engine is carried out to reveal the connections between pre-combustion mixture indicators and PN emissions. Firstly, a novel calibration methodology is introduced to accurately predict the widely used characteristics of the high-pressure fuel spray. The methodology utilised the Siemens STAR-CD 3D CFD software environment and employed a combination of statistical and optimization methods supported by experimental data. The calibration process identified dominant factors influencing spray properties and established their optimal levels. The two most used models for fuel atomisation were investigated. The Kelvin–Helmholtz/Rayleigh–Taylor (KH–RT) and Reitz–Diwakar (RD) break-up models were calibrated in conjunction with the Rosin–Rammler (RR) mono-modal droplet size distribution. RD outperformed KH–RT in terms of prediction when comparing numerical spray tip penetration and droplet size characteristics to the experimental counterparts. Then, the modelling protocol incorporated droplet-wall interaction models and a multi-component surrogate fuel blend model. The comprehensive digital model was validated using published data and applied to a modern small-capacity GDI engine. The study explored various engine operating conditions and highlights the contribution of fuel mal-distribution and liquid film retention at spark timing to Particle Number (PN) emissions. Finally, a novel surrogate model was developed to predict the engine-out PN. An extensive CFD analysis was conducted considering part-load operating conditions and variations of engine control variables. The PN surrogate model was developed using an Elastic Net (EN) regression technique, establishing relationships between experimental PN emission levels and modelled, pre-combustion, air-fuel mixture quality indicators. 
The approach enabled the reliable prediction of engine sooting tendencies without relying on complex measurements of combustion characteristics. These research efforts aim to enhance engine efficiency, reduce emissions, and contribute to the development of a reliable and cost-effective digital toolset for engine development and diagnostics

    Statistical analysis of grouped text documents

    The topic of this thesis is statistical models for the analysis of textual data, emphasizing contexts in which text samples are grouped. When dealing with text data, the first issue is to process it, making it computationally and methodologically compatible with the existing mathematical and statistical methods produced and continually developed by the scientific community. Therefore, the thesis first reviews existing methods for analytically representing and processing textual datasets, including Vector Space Models, distributed representations of words and documents, and contextualized embeddings. 
This review involves standardizing a notation that, even within the same representation approach, appears highly heterogeneous in the literature. Then, two domains of application are explored: social media and cultural tourism. Regarding the former, the thesis presents a study of self-presentation among diverse groups of individuals on the StockTwits platform, where finance and stock markets are the dominant topics. The proposed methodology integrates various types of data, including textual and categorical data. This study revealed insights into how people present themselves online and found recurring patterns within groups of users. Regarding the latter, the thesis delves into a study conducted as part of the "Data Science for Brescia - Arts and Cultural Places" Project, where a language model was trained to classify Italian-written online reviews into four distinct semantic areas related to cultural attractions in the Italian city of Brescia. The proposed model allows for the identification of attractions in text documents, even when not explicitly mentioned in document metadata, thus opening possibilities for expanding the database related to these cultural attractions with new sources, such as social media platforms, forums, and other online spaces. Lastly, the thesis presents a methodological study examining the group-specificity of words, analyzing various group-specificity estimators proposed in the literature. The study considered grouped text documents with both outcome and group variables. Its contribution consists of the proposal of modeling the corpus of documents as a multivariate distribution, enabling the simulation of corpora of text documents with predefined characteristics. The simulation provided valuable insights into the relationship between groups of documents and words. Furthermore, all its results can be freely explored through a web application, whose components are also described in this manuscript. 
In conclusion, this thesis has been conceived as a collection of papers. It aimed to contribute to the field with both applications and methodological proposals, and each study presented here suggests paths for future research to address the challenges in the analysis of grouped textual data

    Online semi-supervised learning in non-stationary environments

    Existing Data Stream Mining (DSM) algorithms assume the availability of labelled and balanced data, immediately or after some delay, to extract worthwhile knowledge from the continuous and rapid data streams. However, in many real-world applications such as Robotics, Weather Monitoring, Fraud Detection Systems, Cyber Security, and Computer Network Traffic Flow, an enormous amount of high-speed data is generated by Internet of Things sensors and real-time data on the Internet. Manual labelling of these data streams is not practical due to time consumption and the need for domain expertise. Another challenge is learning under Non-Stationary Environments (NSEs), which occurs due to changes in the data distributions in a set of input variables and/or class labels. The problem of Extreme Verification Latency (EVL) under NSEs is referred to as Initially Labelled Non-Stationary Environment (ILNSE). This is a challenging task because the learning algorithms have no access to the true class labels directly when the concept evolves. Several approaches exist that deal with NSE and EVL in isolation. However, few algorithms address both issues simultaneously. This research directly responds to ILNSE’s challenge in proposing two novel algorithms “Predictor for Streaming Data with Scarce Labels” (PSDSL) and Heterogeneous Dynamic Weighted Majority (HDWM) classifier. PSDSL is an Online Semi-Supervised Learning (OSSL) method for real-time DSM and is closely related to label scarcity issues in online machine learning. The key capabilities of PSDSL include learning from a small amount of labelled data in an incremental or online manner and being available to predict at any time. To achieve this, PSDSL utilises both labelled and unlabelled data to train the prediction models, meaning it continuously learns from incoming data and updates the model as new labelled or unlabelled data becomes available over time. 
Furthermore, it can predict under NSE conditions under the scarcity of class labels. PSDSL is built on top of the HDWM classifier, which preserves the diversity of the classifiers. PSDSL and HDWM can intelligently switch and adapt to the conditions. The PSDSL adapts to learning states between self-learning, micro-clustering and CGC, whichever approach is beneficial, based on the characteristics of the data stream. HDWM makes use of “seed” learners of different types in an ensemble to maintain its diversity. The ensembles are simply the combination of predictive models grouped to improve the predictive performance of a single classifier. PSDSL is empirically evaluated against COMPOSE, LEVELIW, SCARGC and MClassification on benchmarks, NSE datasets as well as Massive Online Analysis (MOA) data streams and real-world datasets. The results showed that PSDSL performed significantly better than existing approaches on most real-time data streams including randomised data instances. PSDSL performed significantly better than ‘Static’ i.e. the classifier is not updated after it is trained with the first examples in the data streams. When applied to MOA-generated data streams, PSDSL ranked highest (1.5) and thus performed significantly better than SCARGC, while SCARGC performed the same as the Static. PSDSL achieved better average prediction accuracies in a short time than SCARGC. The HDWM algorithm is evaluated on artificial and real-world data streams against existing well-known approaches such as the heterogeneous WMA and the homogeneous Dynamic DWM algorithm. The results showed that HDWM performed significantly better than WMA and DWM. Also, when recurring concept drifts were present, the predictive performance of HDWM showed an improvement over DWM. 
In both drift and real-world streams, significance tests and post hoc comparisons found significant differences between algorithms: HDWM performed significantly better than DWM and WMA when applied to MOA data streams and four real-world datasets (Electric, Spam, Sensor and Forest cover). The seeding mechanism and dynamic inclusion of new base learners in the HDWM algorithm benefit from both forgetting and retaining models. The algorithm also provides the freedom to select the optimal base classifier in its ensemble depending on the problem. A new approach, Envelope-Clustering, is introduced to resolve cluster overlap conflicts during the cluster labelling process. In this process, PSDSL transforms the centroid information of micro-clusters into micro-instances and generates new clusters called Envelopes. The nearest envelope clusters assist the conflicted micro-clusters and successfully guide the cluster labelling process after concept drifts in the absence of true class labels. PSDSL has been evaluated on the real-world problem of ‘keystroke dynamics’, and the results show that PSDSL achieved higher prediction accuracy (85.3%) than SCARGC (81.6%), while the performance of Static (49.0%) degraded significantly due to changes in users’ typing patterns. Furthermore, the predictive accuracy of SCARGC fluctuated widely (41.1% to 81.6%) depending on the value of the parameter ‘k’ (number of clusters), while PSDSL automatically determines the best value for this parameter.

    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus from then on is the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. 
Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks

    UMSL Bulletin 2023-2024

    The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.

    Evaluation of Data Processing and Artifact Removal Approaches Used for Physiological Signals Captured Using Wearable Sensing Devices during Construction Tasks

    Wearable sensing devices (WSDs) have enormous promise for monitoring construction worker safety. They can track workers and send safety-related information in real time, allowing for more effective and preventative decision making. WSDs are particularly useful on construction sites since they can track workers’ health, safety, and activity levels, among other metrics that could help optimize their daily tasks. WSDs may also assist workers in recognizing health-related safety risks (such as physical fatigue) and taking appropriate action to mitigate them. The data produced by these WSDs, however, is highly noisy and contaminated with artifacts that could have been introduced by the surroundings, the experimental apparatus, or the subject’s physiological state. These artifacts are very strong and frequently found during field experiments, and signal quality drops when they are abundant. Recently, developments in signal processing have greatly improved artifact removal. Thus, this review aims to provide an in-depth analysis of the approaches currently used to analyze data and remove artifacts from physiological signals obtained via WSDs during construction-related tasks. First, this study provides an overview of the physiological signals that are likely to be recorded from construction workers to monitor their health and safety. Second, this review identifies the most prevalent artifacts that have the most detrimental effect on the utility of the signals. Third, a comprehensive review of existing artifact-removal approaches is presented. Fourth, each identified artifact detection and removal approach is analyzed for its strengths and weaknesses. Finally, this review provides a few suggestions for future research on improving the quality of captured physiological signals for monitoring the health and safety of construction workers using artifact removal approaches.