
    The PREVENT Study: Preventing hospital admissions attributable to gout

    Background: Gout is the most common form of inflammatory arthritis, affecting 1 in 40 people in the UK. Despite highly effective treatments, hospital admissions for gout flares have doubled in England over the last 20 years. Many of these admissions could have been prevented if optimal gout management had been delivered to patients.
    Objectives: 1. Describe the epidemiology of gout management in primary and secondary care in the UK. 2. Develop an intervention package for implementation during hospitalisations for gout flares, with the aim of improving care and reducing hospitalisations. 3. Implement and evaluate this intervention in people hospitalised for gout.
    Methods: I used population-level health datasets (CPRD, OpenSAFELY, NHS Digital Hospital Episode Statistics) to evaluate outcomes for people with incident gout diagnoses over a 20-year period. I used multivariable regression and survival modelling to analyse factors associated with outcomes, including: i) initiation of urate-lowering therapies (ULT); ii) attainment of serum urate targets; and iii) hospitalisations for gout flares. With extensive stakeholder input, I developed an evidence-based intervention package to optimise hospital gout care, incorporating the findings of a systematic literature review and process mapping of the admitted patient journey in a cohort of hospitalised gout patients. The intervention consisted of a care pathway, based on British (BSR), European (EULAR) and American (ACR) gout management guidelines, which encouraged ULT initiation prior to discharge, followed by a nurse-led, post-discharge review to facilitate handover to primary care. I implemented this intervention in patients hospitalised for gout flares at King's College Hospital over a 12-month period, and evaluated outcomes including ULT initiation, urate target attainment and re-admission rates.
    Results: In the UK, between 2004 and 2020, only 29% of patients with gout were initiated on ULT within 12 months of diagnosis, and only 36% attained urate targets. No significant improvements in these outcomes were observed after publication of updated BSR and EULAR gout management guidelines. Comorbidities, including chronic kidney disease, heart failure and obesity, were associated with increased odds of ULT initiation but decreased odds of attaining urate targets. For patients diagnosed with gout during the COVID-19 pandemic, ULT initiation improved modestly relative to before the pandemic, while urate target attainment trends were similar. Underlying these trends was a 31% decrease in incident gout diagnoses in England during the first year of the pandemic. Using linked primary and secondary care data, I showed that the risk of hospitalisation for gout flares is greatest within the first 6 months after diagnosis. ULT initiation is associated with more hospitalisations for flares within the first 6 months of diagnosis, but a reduced risk of hospitalisation beyond 12 months, particularly when urate targets are attained. After process mapping the admitted patient journey and systematically appraising the evidence base, I developed and implemented a multi-faceted intervention at King's College Hospital with the aim of improving hospital gout care. Following implementation of this intervention, the proportion of hospitalised gout patients who initiated ULT increased from 49% to 92%; more patients achieved serum urate targets; and there were 38% fewer repeat hospitalisations for gout flares.
    Conclusions: At a population level, ULT initiation and urate target attainment remain sub-optimal for people with gout in the UK, despite updated management guidelines. Initiation of ULT is associated with long-term reductions in hospitalisations for flares; however, only a minority of patients hospitalised for gout flares are initiated on ULT. After designing and implementing a strategy to optimise hospital gout care, over 90% of patients were initiated on ULT, urate target attainment improved, and repeat hospitalisations decreased. My findings suggest that improved primary-secondary care integration is essential if we are to reverse the epidemic of gout hospitalisations.
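    The survival modelling mentioned in the Methods can be sketched with a Cox proportional hazards model of time from diagnosis to first hospitalisation for a flare, adjusted for ULT initiation and comorbidities; the column names and toy data below are assumptions for illustration, not the study's datasets or code.

```python
# Minimal sketch: Cox proportional hazards model for time to gout-flare hospitalisation.
# All column names and values are toy assumptions for illustration.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "years_to_event_or_censor": [0.4, 2.1, 5.0, 1.3, 3.7, 4.2],
    "hospitalised_for_flare":   [1,   0,   0,   1,   0,   1],
    "ult_initiated_12m":        [0,   1,   1,   0,   1,   1],
    "ckd":                      [1,   0,   0,   1,   0,   0],
    "heart_failure":            [0,   0,   1,   0,   0,   1],
    "obesity":                  [1,   1,   0,   0,   1,   0],
})

# A small ridge penalty helps the fit converge on such a tiny illustrative sample.
cph = CoxPHFitter(penalizer=0.1)
cph.fit(df, duration_col="years_to_event_or_censor", event_col="hospitalised_for_flare")
cph.print_summary()   # hazard ratios for ULT initiation and each comorbidity
```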

    Statistical analysis of grouped text documents

    The topic of this thesis is statistical models for the analysis of textual data, emphasizing contexts in which text samples are grouped. When dealing with text data, the first issue is to process it, making it computationally and methodologically compatible with the existing mathematical and statistical methods produced and continually developed by the scientific community.
    The thesis therefore first reviews existing methods for analytically representing and processing textual datasets, including Vector Space Models, distributed representations of words and documents, and contextualized embeddings. This review standardizes a notation that, even within a single representation approach, is highly heterogeneous in the literature. Two application domains are then explored: social media and cultural tourism. For the former, a study of self-presentation among diverse groups of individuals is proposed on the StockTwits platform, where finance and stock markets are the dominant topics. The proposed methodology integrated various types of data, both textual and categorical. This study revealed insights into how people present themselves online and found recurring behavioural patterns within groups of users. For the latter, the thesis delves into a study conducted as part of the "Data Science for Brescia - Arts and Cultural Places" project, in which a language model was trained to classify Italian-language online reviews into four distinct semantic areas related to cultural attractions in the Italian city of Brescia. The proposed model identifies attractions in text documents even when they are not explicitly mentioned in the document metadata, opening the possibility of expanding the database of these cultural attractions with new sources such as social media platforms, forums, and other online spaces. Lastly, the thesis presents a methodological study examining the group-specificity of words, analyzing various group-specificity estimators proposed in the literature. The study considered grouped text documents with both an outcome variable and a group variable. Its contribution is a proposal to model the corpus of documents as a multivariate distribution, enabling the simulation of corpora of text documents with predefined characteristics. The simulation provided valuable insights into the relationship between groups of documents and words. Furthermore, all of its results can be freely explored through a web application whose components are also described in this manuscript. In conclusion, this thesis has been conceived as a collection of papers; it aims to contribute to the field with both applications and methodological proposals, and each study presented here suggests paths for future research on the challenges of analysing grouped textual data.
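    As a concrete illustration of the grouped-text setting the abstract describes, the sketch below builds a simple Vector Space Model representation and computes one possible per-word group-specificity score (a smoothed log-ratio of relative frequencies between two groups). The estimator, the toy corpus, and the group labels are illustrative assumptions, not the thesis's actual data or chosen estimator.

```python
# Minimal sketch: document-term matrix plus a simple group-specificity estimator.
# All data and the specific estimator are assumptions made for illustration.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the museum was stunning", "great exhibition at the museum",
        "stocks rallied today", "the market closed higher today"]
groups = np.array([0, 0, 1, 1])  # 0 = cultural tourism, 1 = finance (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(docs).toarray()          # document-term matrix (Vector Space Model)
vocab = vec.get_feature_names_out()

counts_g0 = X[groups == 0].sum(axis=0) + 1.0   # add-one smoothing avoids division by zero
counts_g1 = X[groups == 1].sum(axis=0) + 1.0
specificity = np.log((counts_g0 / counts_g0.sum()) / (counts_g1 / counts_g1.sum()))

for word, score in sorted(zip(vocab, specificity), key=lambda t: t[1]):
    print(f"{word:12s} {score:+.2f}")  # negative = finance-leaning, positive = tourism-leaning
```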

    Redefining Disproportionate Arrest Rates: An Exploratory Quasi-Experiment that Reassesses the Role of Skin Tone

    The New York Times reported that Black Lives Matter was the third most-read subject of 2020. These articles brought to the forefront the question of disparity in arrest rates for darker-skinned people. Questioning arrest disparity is understandable: virtually everything known about disproportionate arrest rates has been a guess, and virtually all prior research on the topic is questionable because of improper benchmarking (the denominator effect). Recent research has highlighted the need to switch from demographic data to skin tone data and start over on disproportionate arrest rate research; therefore, this study explored the relationship between skin tone and disproportionate arrest rates. The study also sought to determine which of three theories surrounding disproportionate arrests is most predictive of those rates: disproportionate arrests increase as skin tone gets darker (stereotype threat theory), disproportionate rates differ between Black and Brown people (self-categorization theory), or disproportionate rates apply equally across all darker skin colors (social dominance theory). The study used a quantitative, exploratory, quasi-experimental design, applying linear spline regression to arrest rates in Alachua County, Florida, before and after the county's mandate to reduce arrests as much as possible during the COVID-19 pandemic to protect the prison population. The study was exploratory because no previous study has used skin tone analysis to examine arrest disparity. The findings redefine the understanding of the existence and nature of disparities in arrest rates and offer a solid foundation for additional studies of the relationship between disproportionate arrest rates and skin color.
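    The design described above hinges on a piecewise linear (spline) time trend with a knot at the mandate date. The sketch below, on synthetic weekly arrest counts, shows one way such a model could be specified; the series, knot location, and variable names are assumptions for illustration, not the study's data.

```python
# Minimal sketch of a linear spline (piecewise linear) regression with one knot
# at an assumed mandate date. The data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
weeks = np.arange(104)                               # two years of weekly observations
knot = 60                                            # assumed week of the arrest-reduction mandate
post = np.clip(weeks - knot, 0, None)                # 0 before the knot, rises linearly after it
arrests = 200 - 0.2 * weeks - 1.5 * post + rng.normal(0, 5, weeks.size)

# Columns: intercept, pre-mandate trend, change in trend after the mandate.
X = sm.add_constant(np.column_stack([weeks, post]))
fit = sm.OLS(arrests, X).fit()
print(fit.summary())                                 # the 'post' coefficient is the change in slope
```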

    Essays on Corporate Disclosure of Value Creation

    Information on a firm's business model helps investors understand an entity's resource requirements, priorities for action, and prospects (FASB, 2001, pp. 14-15; IASB, 2010, p. 12). Disclosures of strategy and business model (SBM) are therefore considered a central element of effective annual report commentary (Guillaume, 2018; IIRC, 2011). By applying natural language processing techniques, I explore what SBM disclosures look like when management are pressed to say something, analyse determinants of cross-sectional variation in SBM reporting properties, and assess whether and how managers respond to regulatory interventions seeking to promote SBM annual report commentary. This dissertation contains three main chapters. Chapter 2 presents a systematic review of the academic literature on non-financial reporting and the emerging literature on SBM reporting; here, I also introduce my institutional setting. Chapters 3 and 4 form the empirical sections of this thesis. In Chapter 3, I construct the first large-sample corpus of SBM annual report commentary and provide the first systematic analysis of the properties of such disclosures. My topic modelling analysis rejects the hypothesis that such disclosure is merely padding, instead finding that themes align with popular strategy frameworks and that management tailors the mix of SBM topics to reflect their unique approach to value creation. However, SBM commentary is less specific, less precise about time horizon (short- and long-term), and less balanced (more positive) in tone relative to general management commentary. My findings suggest symbolic compliance and legitimisation characterize the typical annual report discussion of SBM. Further analysis identifies proprietary cost considerations and obfuscation incentives as key determinants of symbolic reporting. In Chapter 4, I seek evidence on how managers respond to regulatory mandates by adapting the properties of disclosure and investigate whether the form of the mandate matters. Using a difference-in-differences research design, my results suggest a modest incremental response by treatment firms to the introduction of a comply-or-explain provision to provide disclosure on strategy and business model. In contrast, I find a substantial response to enacting the same requirements in law. My analysis provides clear and consistent evidence that treatment firms incrementally increase the volume of SBM disclosure, improve coverage across a broad range of topics, and provide commentary with greater focus on the long term. My results point to substantial changes in SBM reporting properties following regulatory mandates, but the form of the mandate does matter. Overall, this dissertation contributes to the accounting literature by examining how firms discuss a topic central to economic decision making in annual reports and how firms respond to different forms of disclosure mandate. Furthermore, the results of my analysis are likely to be of value for regulators and policymakers currently reviewing or considering mandating disclosure requirements. By examining how companies adapt their reporting to different types of regulation, this study provides an empirical basis for recalibrating SBM disclosure mandates, thereby enhancing the information set of capital market participants and promoting stakeholder engagement in a landscape increasingly shaped by non-financial information.
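    The topic modelling analysis mentioned in Chapter 3 can be illustrated with a small sketch: fit a latent Dirichlet allocation model to a toy corpus of disclosure sentences and inspect the top words per topic. The corpus, the number of topics, and the preprocessing are assumptions for illustration and do not reproduce the dissertation's pipeline.

```python
# Minimal sketch: LDA topic model over toy strategy-and-business-model disclosures.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

disclosures = [
    "our strategy focuses on long term value creation for shareholders",
    "the business model relies on a subscription platform and recurring revenue",
    "we prioritise sustainable supply chains and stakeholder engagement",
    "cost leadership and operational efficiency drive our competitive advantage",
]

vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(disclosures)                      # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top_words)}")
```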

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    A Simple and Effective Method of Cross-Lingual Plagiarism Detection

    We present a simple cross-lingual plagiarism detection method applicable to a large number of languages. The approach leverages open multilingual thesauri for the candidate retrieval task and pre-trained multilingual BERT-based language models for detailed analysis. Because the method does not rely on machine translation or word sense disambiguation at detection time, it is suitable for a large number of languages, including under-resourced ones. The effectiveness of the proposed approach is demonstrated on several existing and new benchmarks, achieving state-of-the-art results for French, Russian, and Armenian.
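    The detailed-analysis step described above can be sketched with a pre-trained multilingual sentence encoder that scores a suspicious sentence against a source-language candidate; the specific model checkpoint and decision threshold below are assumptions for illustration, not the paper's configuration.

```python
# Minimal sketch: cross-lingual similarity scoring with a multilingual sentence encoder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint

source = "The experiment confirmed a significant increase in reaction speed."
suspect = "L'expérience a confirmé une augmentation significative de la vitesse de réaction."

embeddings = model.encode([source, suspect], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"cross-lingual similarity: {score:.2f}")
if score > 0.8:                    # assumed threshold for flagging a pair
    print("flag pair for manual plagiarism review")
```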

    Low- and high-resource opinion summarization

    Customer reviews play a vital role in the online purchasing decisions we make. Reviews express user opinions that are useful for setting realistic expectations and uncovering important details about products. However, some products receive hundreds or even thousands of reviews, making them time-consuming to read. Moreover, many reviews contain uninformative content, such as irrelevant personal experiences. Automatic summarization offers an alternative: short text summaries capturing the essential information expressed in reviews. Automatically produced summaries can reflect overall or particular opinions and be tailored to user preferences; besides being presented on major e-commerce platforms, they can also be vocalized by home assistants. This approach can improve user satisfaction by assisting in making faster and better decisions. Modern summarization approaches are based on neural networks, often requiring thousands of annotated samples for training. However, human-written summaries for products are expensive to produce because annotators need to read many reviews. This has led to annotated data scarcity, with only a few datasets available. Data scarcity is the central theme of our work, and we propose a number of approaches to alleviate the problem. The thesis consists of two parts, covering low- and high-resource data settings. In the first part, we propose self-supervised learning methods applied to customer reviews and few-shot methods for learning from small annotated datasets. Customer reviews without summaries are available in large quantities, contain a breadth of in-domain specifics, and provide a powerful training signal. We show that reviews can be used for learning summarizers via a self-supervised objective. Further, we address two main challenges associated with learning from small annotated datasets. First, large models rapidly overfit on small datasets, leading to poor generalization. Second, it is not possible to learn a wide range of in-domain specifics (e.g., product aspects and usage) from a handful of gold samples, which leads to subtle semantic mistakes in generated summaries, such as 'great dead on arrival battery.' We address the first challenge by explicitly modeling summary properties (e.g., content coverage and sentiment alignment). Furthermore, we leverage small modules (adapters) that are more robust to overfitting; as we show, despite their size, these modules can store in-domain knowledge that reduces semantic mistakes. Lastly, we propose a simple method for learning personalized summarizers based on aspects such as 'price,' 'battery life,' and 'resolution.' This task is harder to learn, and we present a few-shot method for training a query-based summarizer on small annotated datasets. In the second part, we focus on the high-resource setting and present a large dataset with summaries collected from various online resources. The dataset has more than 33,000 human-written summaries, each linked to as many as thousands of reviews. This, however, makes it challenging to apply an 'expensive' deep encoder due to memory and computational costs. To address this problem, we propose selecting small subsets of informative reviews; only these subsets are encoded by the deep encoder and subsequently summarized. We show that the selector and summarizer can be trained end-to-end via amortized inference and policy gradient methods.
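    The self-supervised objective mentioned above is often realised by treating one review of a product as a pseudo-summary of the remaining reviews; the leave-one-out pairing below is a minimal sketch of that idea, not the thesis's exact training setup, and the data are invented.

```python
# Minimal sketch: build leave-one-out (input reviews, pseudo-summary) training pairs.
from typing import Dict, List, Tuple

def build_pseudo_pairs(reviews_by_product: Dict[str, List[str]]) -> List[Tuple[List[str], str]]:
    """For each product, pair every review (pseudo-summary) with the remaining reviews (input)."""
    pairs = []
    for reviews in reviews_by_product.values():
        if len(reviews) < 2:
            continue                                   # need at least one input review
        for i, pseudo_summary in enumerate(reviews):
            inputs = reviews[:i] + reviews[i + 1:]
            pairs.append((inputs, pseudo_summary))
    return pairs

reviews = {"B001": ["Great battery life and screen.",
                    "Battery lasts two days and the display is sharp.",
                    "Solid phone that charges quickly."]}
for inputs, target in build_pseudo_pairs(reviews):
    print(len(inputs), "input reviews ->", target)
```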

    The Application of Data Analytics Technologies for the Predictive Maintenance of Industrial Facilities in Internet of Things (IoT) Environments

    In industrial production environments, the maintenance of equipment has a decisive influence on costs and on the plannability of production capacities. In particular, unplanned failures during production times cause high costs, unplanned downtimes and possibly additional collateral damage. Predictive Maintenance addresses this by trying to predict a possible failure and its cause early enough that preventive action can be prepared and carried out in time. In order to predict malfunctions and failures, the industrial plant, with its characteristics as well as its wear and ageing processes, must be modelled. Such modelling can be done by replicating the plant's physical properties; however, this is very complex and requires enormous expert knowledge about the plant and about the wear and ageing processes of each individual component. Neural networks and machine learning make it possible to train such models from data and offer an alternative, especially when very complex and non-linear behaviour is evident. For models to make predictions, as much data as possible about the condition of a plant, its environment and production planning is needed. In Industrial Internet of Things (IIoT) environments, the amount of available data is constantly increasing: intelligent sensors and highly interconnected production facilities produce a steady stream of data. The sheer volume of data, but also the steady stream in which it is transmitted, places high demands on data processing systems. If a participating system wants to perform live analyses on the incoming data streams, it must be able to process the incoming data at least as fast as the continuous data stream delivers it; if this is not the case, the system falls further and further behind in its processing and thus in its analyses. This also applies to Predictive Maintenance systems, especially if they use complex and computationally intensive machine learning models. If sufficiently scalable hardware resources are available, this may not be a problem at first. However, if this is not the case, or if processing takes place on decentralised units with limited hardware resources (e.g. edge devices), the runtime behaviour and resource requirements of the type of neural network used can become an important criterion. This thesis addresses Predictive Maintenance systems in IIoT environments using neural networks and Deep Learning, where runtime behaviour and resource requirements are relevant. The question is whether it is possible to achieve better runtimes with similar result quality using a new type of neural network. The focus is on reducing the complexity of the network and improving its parallelisability. Inspired by projects in which complexity was distributed to less complex neural subnetworks by upstream measures, two hypotheses presented in this thesis emerged: a) the distribution of complexity into simpler subnetworks leads to faster processing overall, despite the overhead this creates, and b) if a neural cell has a deeper internal structure, this leads to a less complex network. Within the framework of a qualitative study, an overall impression of Predictive Maintenance applications in IIoT environments using neural networks was developed. Based on the findings, a novel model layout was developed, named the Sliced Long Short-Term Memory Neural Network (SlicedLSTM), which implements the assumptions made in the aforementioned hypotheses in its internal model architecture.
    Within the framework of a quantitative study, the runtime behaviour of the SlicedLSTM was compared with that of a reference model in laboratory tests. The study uses synthetically generated data from a NASA project for predicting failures of aircraft gas turbine modules. The dataset contains 1,414 multivariate time series with 104,897 samples of test data and 160,360 samples of training data. For this specific application and the data used, the SlicedLSTM was shown to deliver faster processing times with similar result accuracy, thus clearly outperforming the reference model in this respect. The hypotheses about the influence of complexity in the internal structure of the neural cells were confirmed by the study carried out in the context of this thesis.
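    To make the modelling task concrete, the sketch below shows a conventional LSTM of the kind such a SlicedLSTM could be benchmarked against: it classifies whether a multivariate sensor window precedes a failure. The layer sizes, window length and sensor count are illustrative assumptions, and the SlicedLSTM's internal slicing architecture itself is not reproduced here.

```python
# Minimal sketch: a conventional LSTM reference model for failure prediction on
# multivariate sensor windows. All dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class ReferenceLSTM(nn.Module):
    def __init__(self, n_sensors: int = 24, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_sensors, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)     # probability that a failure follows this window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(x)           # h_n: (num_layers, batch, hidden)
        return torch.sigmoid(self.head(h_n[-1]))

model = ReferenceLSTM()
window = torch.randn(8, 50, 24)              # batch of 8 windows, 50 time steps, 24 sensors
print(model(window).shape)                   # torch.Size([8, 1])
```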

    TOWARD AUTOMATED THREAT MODELING BY ADVERSARY NETWORK INFRASTRUCTURE DISCOVERY

    Threat modeling can help defenders ascertain potential attacker capabilities and resources, allowing better protection of critical networks and systems from sophisticated cyber-attacks. One aspect of the adversary profile that is of interest to defenders is the means to conduct a cyber-attack, including malware capabilities and network infrastructure. Even though most defenders collect data on cyber incidents, extracting knowledge about adversaries to build and improve the threat model can be time-consuming. This thesis applies machine learning methods to historical cyber incident data to enable automated threat modeling of adversary network infrastructure. Using network data on attacker command-and-control servers from real-world cyber incidents, specific adversary datasets can be created and enriched using the capabilities of internet-scanning search engines. Mixing these datasets with data from benign or non-associated hosts with similar port-service mappings allows for building an interpretable machine learning model of attackers. Additionally, creating internet-scanning search engine queries based on the machine learning model's predictions allows threat modeling of adversary infrastructure to be automated, enabling the search for unknown or emerging threat actor infrastructure on the Internet.
    Major, Ukrainian Ground Forces. Approved for public release; distribution is unlimited.
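    As a toy illustration of the approach described, the sketch below fits an interpretable classifier on port/service features separating known command-and-control hosts from benign hosts, then turns the most informative features into a search query string. The feature names, data, and query syntax are assumptions for illustration, not the thesis's datasets or tooling.

```python
# Minimal sketch: interpretable model of adversary infrastructure from port/service
# features, plus a derived hunt query. All data here are toy assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

features = ["port:22/ssh", "port:80/http", "port:443/https", "port:50050/unknown"]
# Rows are hosts; columns indicate whether the host exposes that port/service.
X = np.array([[1, 1, 1, 1],    # known C2 server
              [0, 1, 1, 1],    # known C2 server
              [1, 1, 1, 0],    # benign host
              [1, 0, 1, 0]])   # benign host
y = np.array([1, 1, 0, 0])     # 1 = adversary infrastructure, 0 = benign

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
top = np.argsort(clf.feature_importances_)[::-1][:2]
query = " AND ".join(features[i] for i in top if clf.feature_importances_[i] > 0)
print("candidate hunt query:", query)  # could seed an internet-scanning search engine query
```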

    DECEPTION BASED TECHNIQUES AGAINST RANSOMWARES: A SYSTEMATIC REVIEW

    Ransomware is the most prevalent emerging business risk today, seriously affecting business continuity and operations. According to the Deloitte Cyber Security Landscape 2022, up to 4,000 ransomware attacks occur daily, while the average number of days an organization takes to identify a breach is 191. Sophisticated cyber-attacks such as ransomware typically must go through multiple consecutive phases (initial foothold, network propagation, and action on objectives) before accomplishing their final objective. This study analyzed decoy-based solutions as an approach (detection, prevention, or mitigation) to overcoming ransomware. A systematic literature review was conducted, and its results show that deception-based techniques deliver effective and significant performance against ransomware with minimal resources. The review also identified that, contrary to general belief, deception techniques mainly associated with passive approaches (i.e., prevention and detection) possess other, active capabilities, such as ransomware traceback and obstruction (thwarting), file decryption, and decryption key recovery. Based on the literature review, several evaluation methods are also analyzed to measure the effectiveness of these deception-based techniques during the implementation process.
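    One of the decoy-based detection ideas surveyed in such reviews can be sketched with a simple canary file that no legitimate workflow touches: if its contents change or it disappears, the change is treated as a sign of possible ransomware activity. The file name, location, and polling interval below are assumptions for illustration.

```python
# Minimal sketch: poll a decoy ("canary") file and alert on any modification.
import hashlib
import time
from pathlib import Path

DECOY = Path("decoy_invoice.xlsx")            # assumed bait file never used by real workflows
DECOY.write_bytes(b"canary-content")          # deploy the decoy with known contents

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

baseline = digest(DECOY)
while True:
    time.sleep(5)                              # assumed polling interval in seconds
    if not DECOY.exists() or digest(DECOY) != baseline:
        print("ALERT: decoy file altered or removed; possible ransomware activity")
        break
```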