806 research outputs found
Data- and expert-driven feature selection for predictive models in healthcare: towards increased interpretability in underdetermined machine learning problems
Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Some concrete examples are electronic healthcare record systems, genomics, and medical images. This leads to situations with often unstructured, high-dimensional heterogeneous patient cohort data where classical statistical methods may not be sufficient for optimal utilization of the data and informed decision-making. Instead, investigating such data structures with modern machine learning techniques promises to improve the understanding of patient health issues and may provide a better platform for informed decision-making by clinicians. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both aspects in parallel is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models encounter mathematically underdetermined systems and may overfit easily on the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of informative features from the original set of features with respect to the target variable. While potentially raising the predictive performance, feature selection fosters model interpretability by identifying a low number of relevant model parameters to better understand the underlying biological processes that lead to health issues.
Interpretability requires that feature selection is stable, i.e., small changes in the dataset do not lead to changes in the selected feature set. A concept to address instability is ensemble feature selection, i.e. the process of repeating the feature selection multiple times on subsets of samples of the original dataset and aggregating results in a meta-model. This thesis presents two approaches for ensemble feature selection, which are tailored towards high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds upon elastic net regularized models, UBayFS is a general framework for ensembles with the capabilities to include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice.
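The ensemble idea described above (repeat the selection on subsets of the samples, then aggregate) can be sketched in a few lines. This is only an illustration, not RENT or UBayFS: the elementary selector here is a simple correlation ranking swapped in for the elastic net models, and the names `ensemble_select`, `frac`, and `cutoff`, as well as the toy data, are assumptions made for this sketch.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def select_top_k(X, y, k):
    """Elementary selector (assumed): the k features most correlated with y."""
    n_features = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(n_features)]
    return set(sorted(range(n_features), key=lambda j: -scores[j])[:k])


def ensemble_select(X, y, k, n_runs=50, frac=0.7, cutoff=0.8, seed=0):
    """Repeat the selection on random sample subsets and keep stable features."""
    rng = random.Random(seed)
    n = len(X)
    counts = [0] * len(X[0])
    for _ in range(n_runs):
        idx = rng.sample(range(n), max(2, int(frac * n)))
        for j in select_top_k([X[i] for i in idx], [y[i] for i in idx], k):
            counts[j] += 1
    # Selection frequency per feature: a simple stability indicator.
    freq = [c / n_runs for c in counts]
    return [j for j, f in enumerate(freq) if f >= cutoff], freq


# Toy data (hypothetical): 80 samples, 6 features; only features 0 and 1 matter.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
y = [2.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]
selected, freq = ensemble_select(X, y, k=2)
print(selected, [round(f, 2) for f in freq])
```

Features whose selection frequency stays high across the subsets are the ones a stable selector would report; a frequency close to the cutoff signals that the selection depends on which samples happen to be drawn.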
Beyond the selection of single features, UBayFS also allows for selecting whole feature groups (feature blocks) that were acquired from multiple data sources, such as those mentioned above. Importance quantification of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. Such information on feature block importance may lead to positive effects on the use of human, technical, and financial resources if systematically integrated into the planning of patient treatment by excluding the acquisition of non-informative features. Since a generalization of feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for feature block importance rankings.
This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented approaches for feature selection. Experimental evaluations demonstrate favorable properties of both predictive performance, stability, as well as interpretability of results, which carries a high potential for better data-driven decision support in clinical practice.
Specialized translation at work for a small, expanding company: my experience of internationalization into Chinese for Bioretics© S.r.l.
Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has appeared to propel these unifying drives: Artificial Intelligence, together with its high technologies aiming to implement human cognitive abilities in machinery. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation has been conceived. Indeed, its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices.
Talking about personal recovery in bipolar disorder: Integrating health research, natural language processing, and corpus linguistics to analyse peer online support forum posts
Background: Personal recovery, ‘living a satisfying, hopeful and contributing life even with the limitations caused by the illness’ (Anthony, 1993), is of particular value in bipolar disorder, where symptoms often persist despite treatment. So far, personal recovery has only been studied in researcher-constructed environments (interviews, focus groups). Support forum posts can serve as a complementary naturalistic data source. Objective: The overarching aim of this thesis was to study personal recovery experiences that people living with bipolar disorder have shared in online support forums through integrating health research, NLP, and corpus linguistics in a mixed methods approach within a pragmatic research paradigm, while considering ethical issues and involving people with lived experience. Methods: This mixed-methods study analysed: 1) previous qualitative evidence on personal recovery in bipolar disorder from interviews and focus groups; 2) who self-reports a bipolar disorder diagnosis on the online discussion platform Reddit; 3) the relationship of mood and posting in mental health-specific Reddit forums (subreddits); and 4) discussions of personal recovery in bipolar disorder subreddits. Results: A systematic review of qualitative evidence resulted in the first framework for personal recovery in bipolar disorder, POETIC (Purpose & meaning, Optimism & hope, Empowerment, Tensions, Identity, Connectedness). Mainly young or middle-aged US-based adults self-report a bipolar disorder diagnosis on Reddit. Of these, those experiencing more intense emotions appear to be more likely to post in mental health support subreddits. Their personal recovery-related discussions in bipolar disorder subreddits primarily focussed on three domains: Purpose & meaning (particularly reproductive decisions, work), Connectedness (romantic relationships, social support), Empowerment (self-management, personal responsibility).
Support forum data highlighted personal recovery issues that exclusively or more frequently came up online compared to previous evidence from interviews and focus groups. Conclusion: This project is the first to analyse non-reactive data on personal recovery in bipolar disorder. Indicating the key areas that people focus on in personal recovery when posting freely and the language they use provides a helpful starting point for formal and informal carers to understand the concerns of people diagnosed with bipolar disorder and to consider how best to offer support.
The use of scRNA-seq to characterise the tumour microenvironment of high grade serous ovarian carcinoma (HGSOC)
High Grade Serous Ovarian Carcinoma (HGSOC) is the most common type of ovarian cancer. Patients with this disease typically relapse following surgical debulking and initially effective chemotherapy. HGSOC has been intensely studied at the genomic and transcriptomic levels in efforts to advance knowledge of the biological mechanisms that drive the behaviour of this malignancy, so that new treatment strategies may curb disease progression and relapse.
This body of work contributes an optimised protocol for generating robust 10X scRNA-seq libraries from fresh and preserved HGSOC tissue, aiming to dissect the cellular heterogeneity of HGSOC’s tumour microenvironment (TME). Through unsupervised clustering analysis, it uncovers distinct cellular communities, elucidates transcriptomic signatures across HGSOC tumours, and augments bulk RNA-seq datasets via computational deconvolution, enhancing understanding of HGSOC's cellular complexity across an expanded clinical cohort.
The sequencing and analysis of these HGSOC patient tumours revealed 11 distinct cell types, including 2 that are novel in this tumour type, namely ciliated epithelial cells and metallothionein-expressing T-cells. These 11 distinct cell types can be broadly categorised into 3 TME components (Tumour, Stroma and Immune), as in previous tumour scRNA-seq studies. An additional analysis of these components examined the copy number variation (CNV) in the profiled cells and revealed HGSOC tumour cells to be mostly aneuploid while ciliated epithelial cells were diploid. A novel integrative subcluster analysis of HGSOC aneuploid tumour cells identified several apparently tumourigenic gene expression signatures. These include a KRT17+, protease inhibitory signature, an increased cellular metabolism signature, and an immune-reactive signature. Additionally, a ciliated cluster re-emerged within the HGSOC tumour cells, even though the diploid ciliated epithelial cells were not included in the integrative analysis.
Finally, the high granularity of HGSOC cellular composition revealed by scRNA-seq is utilised to perform deconvolution analyses to estimate cellular proportions and infer the TME of earlier bulk RNA-seq profiled HGSOC tumour samples. This investigation of earlier sequenced HGSOC samples revealed heterogeneity in the proportions of the TME compartments across the patient cohorts. Survival analysis using these inferred cellular proportions suggests that immune cell presence alone is not associated with survival, but metastatic fibroblast burden in tumour samples is significantly associated with worse overall survival in HGSOC patients.
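As a rough illustration of what deconvolution means here, the sketch below estimates cell-type proportions from a bulk expression profile by non-negative least squares, solved with projected gradient descent. The abstract does not state which deconvolution method the thesis uses, so this is a generic stand-in; the signature matrix, the bulk values, and the function name `deconvolve` are hypothetical.

```python
def deconvolve(signature, bulk, n_iter=1000):
    """Estimate non-negative proportions p such that signature * p ~ bulk.

    signature: one row per gene, one column per cell type (reference profiles).
    bulk: one bulk expression value per gene.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Step size 1 / trace(S^T S), which is at most 1 / (largest eigenvalue).
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(row[t] * p[t] for t in range(n_types)) - b
                    for row, b in zip(signature, bulk)]
        grad = [sum(signature[g][t] * residual[g] for g in range(n_genes))
                for t in range(n_types)]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[t] - step * grad[t]) for t in range(n_types)]
    total = sum(p)
    return [v / total for v in p] if total > 0 else p


# Hypothetical reference profiles: 4 genes x 3 compartments
# (columns: tumour, stroma, immune).
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [5.0, 0.0, 1.0],
]
# Bulk profile constructed as a 0.5 / 0.3 / 0.2 mixture of the columns.
bulk = [5.3, 3.1, 2.4, 2.7]
proportions = deconvolve(signature, bulk)
print([round(v, 3) for v in proportions])
```

In practice the reference profiles would come from the scRNA-seq clusters and the bulk vector from an earlier RNA-seq sample; real data is noisy, so the recovered proportions are estimates rather than an exact mixture.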
In conclusion, the laboratory protocol, the scRNA-seq datasets produced, and their analysis and application presented in this work expand the collective knowledge base of HGSOC, specifically by characterising the cells of the HGSOC tumour microenvironment and the nuances of the expression signatures of the malignant cells. The deconvolution approach showcases how scRNA-seq data can expand the clinical utility of earlier RNA-seq HGSOC datasets in a way that is scalable.
Performance Analysis Of Data-Driven Algorithms In Detecting Intrusions On Smart Grid
The traditional power grid is no longer a practical solution for power delivery due to several shortcomings, including chronic blackouts, energy storage issues, high cost of assets, and high carbon emissions. Therefore, there is a serious need for better, cheaper, and cleaner power grid technology that addresses the limitations of traditional power grids. A smart grid is a holistic solution to these issues that consists of a variety of operations and energy measures. This technology can deliver energy to end-users through a two-way flow of communication. It is expected to generate reliable, efficient, and clean power by integrating multiple technologies. It promises reliability, improved functionality, and economical means of power transmission and distribution. This technology also decreases greenhouse emissions by transferring clean, affordable, and efficient energy to users. The smart grid provides several benefits, such as increasing grid resilience, self-healing, and improving system performance. Despite these benefits, this network has been the target of a number of cyber-attacks that violate the availability, integrity, confidentiality, and accountability of the network. For instance, in 2021, a cyber-attack targeted a U.S. power system that shut down the power grid, leaving approximately 100,000 people without power. Another threat to U.S. smart grids occurred in March 2018, targeting multiple nuclear power plants and water equipment. These incidents show why strong security approaches are needed in smart grids to detect and mitigate sophisticated cyber-attacks. For this purpose, the US National Electric Sector Cybersecurity Organization and the Department of Energy have joined their efforts with other federal agencies, including the Cybersecurity for Energy Delivery Systems and the Federal Energy Regulatory Commission, to investigate the security risks of smart grid networks.
Their investigation shows that the smart grid requires reliable solutions to defend against and prevent cyber-attacks and vulnerability issues. This investigation also shows that with emerging technologies, including 5G and 6G, the smart grid may become more vulnerable to multistage cyber-attacks. A number of studies have been done to identify, detect, and investigate the vulnerabilities of smart grid networks. However, the existing techniques have fundamental limitations, such as low detection rates, high rates of false positives, high rates of misdetection, data poisoning, data quality and processing, lack of scalability, and issues regarding handling huge volumes of data. Consequently, these techniques cannot ensure safe, efficient, and dependable communication for smart grid networks. The goal of this dissertation is therefore to investigate the efficiency of machine learning in detecting cyber-attacks on smart grids. The proposed methods are based on supervised and unsupervised machine learning, deep learning, reinforcement learning, and online learning models. These models have to be trained, tested, and validated using a reliable dataset. In this dissertation, CICDDoS 2019 was used to train, test, and validate the efficiency of the proposed models. The results show that, for supervised machine learning models, the ensemble models outperform other traditional models. Among the deep learning models, the densely connected neural network family provides satisfactory results for detecting and classifying intrusions on the smart grid. Among unsupervised models, the variational auto-encoder provides the highest performance compared to the other unsupervised models. In reinforcement learning, the proposed Capsule Q-learning provides higher detection and lower misdetection rates compared to the other models in the literature.
In online learning, the Online Sequential Euclidean Distance Routing Capsule Network model provides significantly better results in detecting intrusion attacks on the smart grid compared to the other deep online models.
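The detection, misdetection, and false-positive rates mentioned above are usually derived from a confusion matrix. The sketch below shows one common set of definitions; the abstract does not spell out its formulas, so treating misdetection as the share of attacks that go unflagged is an assumption, and the function name and toy labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Rates for binary labels: 1 = attack, 0 = benign traffic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    attacks, benign = tp + fn, fp + tn
    return {
        "detection_rate": tp / attacks if attacks else 0.0,
        # Assumed definition: fraction of attacks that were not detected.
        "misdetection_rate": fn / attacks if attacks else 0.0,
        "false_positive_rate": fp / benign if benign else 0.0,
    }


# Toy labels (hypothetical): four attack flows followed by four benign flows.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Under these definitions the detection and misdetection rates always sum to one, which is why a model with a higher detection rate is also reported as having a lower misdetection rate.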
Security and Privacy for Modern Wireless Communication Systems
This reprint focuses on the latest protocol research, software/hardware development and implementation, and system architecture design in addressing emerging security and privacy issues for modern wireless communication networks. Relevant topics include, but are not limited to, the following: deep-learning-based security and privacy design; covert communications; information-theoretical foundations for advanced security and privacy techniques; lightweight cryptography for power constrained networks; physical layer key generation; prototypes and testbeds for security and privacy solutions; encryption and decryption algorithm for low-latency constrained networks; security protocols for modern wireless communication networks; network intrusion detection; physical layer design with security consideration; anonymity in data transmission; vulnerabilities in security and privacy in modern wireless communication networks; challenges of security and privacy in node–edge–cloud computation; security and privacy design for low-power wide-area IoT networks; security and privacy design for vehicle networks; security and privacy design for underwater communications networks.
Proceedings XXIII Congresso SIAMOC 2023
The annual congress of the Società Italiana di Analisi del Movimento in Clinica (SIAMOC), now in its twenty-third edition, returns to Rome this year.
As every year, the SIAMOC congress is an opportunity for all professionals working in the field of movement analysis to meet, present the results of their research, and stay up to date on the latest innovations in procedures and technologies for movement analysis in clinical practice.
The SIAMOC 2023 congress in Rome aims to give further impetus to the already excellent Italian research activity in the field of movement analysis and to lend it further international scope and impact.
In addition to the defining traditional topics concerning basic and applied research in clinical and sports settings, the SIAMOC 2023 congress intends to explore further topics of particular scientific interest and societal impact. These include the workplace integration of people with disabilities, aided in part by the exponential spread of collaborative robotic technologies in clinical-occupational settings, and innovative prosthetics to support people with amputations. Finally, the congress will address new artificial intelligence algorithms for optimizing the real-time classification of motor patterns in various fields of application.
Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1011198), the Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) under the ICT Creative Consilience Program (IITP-2021-2020-0-01821), and AI Platform to Fully Adapt and Reflect Privacy-Policy Changes (No. 2022-0-00688). Artificial intelligence (AI) is currently being utilized in a wide range of sophisticated applications, but the outcomes of many AI models are challenging to comprehend and trust due to their black-box nature. Usually, it is essential to understand the reasoning behind an AI model's decision-making. Thus, the need for eXplainable AI (XAI) methods for improving trust in AI models has arisen. XAI has become a popular research subject within the AI field in recent years. Existing survey papers have tackled the concepts of XAI, its general terms, and post-hoc explainability methods, but there have not been any reviews that have looked at the assessment methods, available tools, XAI datasets, and other related aspects. Therefore, in this comprehensive study, we provide readers with an overview of the current research and trends in this rapidly emerging area with a case study example. The study starts by explaining the background of XAI, common definitions, and summarizing recently proposed techniques in XAI for supervised machine learning. The review divides XAI techniques into four axes using a hierarchical categorization system: (i) data explainability, (ii) model explainability, (iii) post-hoc explainability, and (iv) assessment of explanations. We also introduce available evaluation metrics as well as open-source packages and datasets with future research directions. Then, the significance of explainability in terms of legal demands, user viewpoints, and application orientation is outlined, termed as XAI concerns.
This paper advocates for tailoring explanation content to specific user types. An examination of XAI techniques and evaluation was conducted by looking at 410 critical articles, published between January 2016 and October 2022, in reputed journals and using a wide range of research databases as a source of information. The article is aimed at XAI researchers who are interested in making their AI models more trustworthy, as well as at researchers from other disciplines who are looking for effective XAI methods to complete tasks with confidence while communicating meaning from data.