26 research outputs found

    Survival trees: a pathway among features and open issues of the main R packages

    Survival analysis studies the occurrence of a particular event during a follow-up period. Recently, many machine learning methods have been used to analyze right-censored data. Among these, survival trees are a useful recursive partitioning tool for defining groups that are homogeneous in terms of survival probability. However, how to work with these methods remains unclear in several respects, both theoretical and practical. Although many methods have been proposed, many of them are poorly documented and the proposals have not been harmonized. This work aims to shed light on the topic and to provide a practical guide to simulating survival data, fitting survival trees and evaluating their performance with the statistical software R.
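The abstract gives no simulation details, so the following is only a minimal sketch of what right-censored data look like and how a survival probability can be estimated from them. It is written in Python rather than R. The exponential event and censoring times, the rate values, and the names `simulate_right_censored` and `kaplan_meier` are illustrative assumptions and are not taken from the paper.

```python
import random


def simulate_right_censored(n, event_rate=0.1, censor_rate=0.05, seed=0):
    """Return n pairs (observed_time, event), where event is 1 if the event
    was observed and 0 if the subject was censored first.

    Assumption: event and censoring times are independent exponentials.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t_event = rng.expovariate(event_rate)
        t_censor = rng.expovariate(censor_rate)
        data.append((min(t_event, t_censor), int(t_event <= t_censor)))
    return data


def kaplan_meier(data):
    """Kaplan-Meier estimate: list of (time, S(time)) at each event time."""
    ordered = sorted(data)
    at_risk = len(ordered)
    surv = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(ordered) and ordered[i][0] == t:
            deaths += ordered[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    sample = simulate_right_censored(200)
    for time, s in kaplan_meier(sample)[:5]:
        print(f"t={time:.3f}  S(t)={s:.3f}")
```

Censored subjects lower the number at risk without lowering the estimated survival probability, which is the property that separates right-censored analysis from treating all observed times as event times.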

    DAily time use, Physical Activity, quality of care and interpersonal relationships in patients with Schizophrenia spectrum disorders (DiAPASon): an Italian multicentre study

    Background: Schizophrenia spectrum disorders (SSD) rank among the leading causes of disability worldwide. Many people with SSD spend most of their day inactive, and this is related to the severity of negative symptoms. Here, we present the 3-year DiAPASon project, which aims at (1) evaluating daily time use among patients with SSD living in Residential Facilities (RFs) compared with outpatients with SSD and with the general population (Study 1); (2) evaluating the quality of staff-patient relationships, its association with specific patient outcomes and the quality of care provided in RFs (Study 2); and (3) assessing daily activity patterns in residential patients, outpatients with SSD and healthy controls using real-time methodologies (Study 3). Methods: Study 1 will include 300 patients with SSD living in RFs and 300 outpatients; data obtained in these clinical populations will be compared with normative data collected by the National Institute of Statistics (ISTAT) in the national survey on daily time use. Time use assessments will consist of daily diaries asking participants to retrospectively report the time spent in different activities. In Study 2, a series of questionnaires will be administered to the 300 residential patients recruited for Study 1 to evaluate the quality of care and of staff-patient relationships, the level of well-being and burnout of RF staff, and the quality of RFs using a European standardized questionnaire (QuIRC-SA). In Study 3, daily time use will be evaluated in a subgroup of 50 residential patients, 50 outpatients and 50 healthy controls using the Experience Sampling Method (participants will complete a brief questionnaire about time use, mood and perceived energy on a smartphone 8 times a day for 1 week) to compare retrospective and real-time reports. Moreover, their level of physical activity, sleep patterns and energy expenditure will be monitored through a multi-sensor device. Discussion: This project is highly innovative because it combines different types of assessment (i.e., retrospective and real-time reports; multi-sensor monitoring) to draw an accurate picture of daily time use and levels of physical activity, which will help identify the best therapeutic options for promoting daily activities and physical exercise in patients with SSD. Trial registration: ISRCTN registry ID ISRCTN21141466.

    Neurocognition and social cognition in patients with schizophrenia spectrum disorders with and without a history of violence: results of a multinational European study

    Objective: Neurocognitive impairment has been extensively studied in people with schizophrenia spectrum disorders and seems to be one of the major determinants of functional outcome in this clinical population. Evidence on the link between neuropsychological deficits and the risk of violence in schizophrenia has been less consistent. In this study, we analyse the differential predictive potential of neurocognition and social cognition to discriminate between patients with schizophrenia spectrum disorders with and without a history of severe violence. Methods: Overall, 398 patients (221 cases and 177 controls) were recruited in forensic and general psychiatric settings across five European countries and assessed using a standardized battery. Results: Education and processing speed were the strongest discriminators between forensic and non-forensic patients, followed by emotion recognition. In particular, increased accuracy for anger recognition was the most distinctive feature of the forensic group. Conclusions: These results may have important clinical implications, suggesting potential enhancements to the assessment and treatment of patients with schizophrenia spectrum disorders with a history of violence, who may benefit from consideration of socio-cognitive skills commonly neglected in ordinary clinical practice.

    Recursive Partitioning for Survival Data

    Over the years, many machine learning methods have been introduced for analyzing survival data. Among these, survival trees are a useful method for defining groups that are homogeneous in terms of survival probability. In this context, some points remain unclear, concerning both theoretical and practical issues in model fitting and performance evaluation. The aim of this contribution is to shed light on some of these points.
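To illustrate how recursive partitioning can form groups with different survival, here is a hedged Python sketch of a single split search. It uses the two-sample log-rank statistic as the splitting criterion, which is one common choice for survival trees. The abstract does not say which criterion the contribution uses, so this choice, the `min_size` parameter and the function names are assumptions.

```python
def logrank_statistic(left, right):
    """Chi-square log-rank statistic for two lists of (time, event) pairs."""
    event_times = sorted({t for t, e in left + right if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t in each group.
        n_left = sum(1 for u, _ in left if u >= t)
        n_right = sum(1 for u, _ in right if u >= t)
        n = n_left + n_right
        # Events occurring exactly at time t.
        d_left = sum(1 for u, e in left if u == t and e == 1)
        d = d_left + sum(1 for u, e in right if u == t and e == 1)
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (n_right / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_split(x, data, min_size=2):
    """Return (threshold, statistic) maximizing the log-rank statistic when
    splitting on x <= threshold; (None, 0.0) if no admissible split exists."""
    best = (None, 0.0)
    for c in sorted(set(x))[:-1]:
        left = [d for xi, d in zip(x, data) if xi <= c]
        right = [d for xi, d in zip(x, data) if xi > c]
        if len(left) < min_size or len(right) < min_size:
            continue
        stat = logrank_statistic(left, right)
        if stat > best[1]:
            best = (c, stat)
    return best


if __name__ == "__main__":
    covariate = [1, 2, 3, 4, 5, 6]
    outcomes = [(1, 1), (2, 1), (3, 1), (10, 1), (11, 1), (12, 1)]
    print(best_split(covariate, outcomes))
```

A full tree would apply `best_split` recursively to each child node over all covariates and then prune. That part is omitted here.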

    Statistical Models and Machine Learning for Survival Data Analysis

    The main topic of this thesis is survival analysis, a collection of methods used in longitudinal studies in which the interest lies not only in whether a particular event occurs, but also in the time needed to observe it. Over the years, statistical models were proposed first, followed by machine learning methods, to address survival analysis studies. The first part of the work provides an introduction to the basic concepts of survival analysis and an extensive review of the existing literature. In particular, it focuses on the main statistical models (nonparametric, semiparametric and parametric) and, among machine learning methods, on survival trees and random survival forests. For these methods, the main proposals introduced in recent decades are described. The second part of the thesis reports my research contributions. These works focused mainly on two aims: (1) rationalizing into a unified protocol the computational approach, which is currently based on several existing packages with little documentation, several points that are still obscure and some bugs, and (2) applying survival data analysis methods in an unusual context where, to the best of our knowledge, this approach had never been used. Specifically, the first contribution consisted of writing a tutorial to enable interested users to approach these methods, bringing order to the many existing algorithms and packages and providing solutions to the related computational issues. It covers the main steps to follow when carrying out a simulation study, paying particular attention to: (i) survival data simulation, (ii) model fitting and (iii) performance assessment. The second contribution applied survival analysis methods, both statistical models and machine learning algorithms, to analyze the offensive performance of National Basketball Association (NBA) players. In particular, variable selection was performed to determine the main variables associated with the probability of exceeding a given number of points scored during the post All-Star Game season segment and with the time needed to do so. In conclusion, this thesis aims to lay the groundwork for the development of a unified framework that harmonizes the existing fragmented approaches and is free of computational issues. Moreover, its findings suggest that a survival analysis approach can also be extended to new contexts.

    Survival trees: a pathway among features and open issues of the main R packages

    Survival analysis studies the occurrence of a particular event during a follow-up period. Recently, many machine learning methods have been used to analyze right-censored data. Among these, survival trees are a useful recursive partitioning tool for defining groups that are homogeneous in terms of survival probability. However, how to work with these methods in practice remains unclear in several respects. Although many methods have been proposed, many of them are poorly documented, mainly as regards the corresponding R functions. Moreover, the proposals have not been harmonized. This work aims to shed light on the topic and to provide a practical guide to simulating survival data, fitting survival trees and evaluating their performance with the statistical software R.

    Which achievements are associated with a better offensive performance in NBA? A survival analysis study

    Data analytics has spread steadily over the years as a way to answer questions in many fields, including sports. The aim of this work is to analyze the offensive performance of NBA players in terms of the number of minutes taken to exceed a given point threshold during the post All-Star Game season segment. The final goal is to perform variable selection, identifying the players' achievements that significantly affect the outcome. Survival analysis methods, in particular Cox regression and LASSO Cox, were used. Results suggest that attempting more two- and three-point shots, having been selected for the All-Star Game and recording more double-doubles increase the probability of exceeding the threshold and of doing so in a shorter time. Another interesting result concerns the number of steals, the only variable related to defense and game construction selected by the models, which was negatively associated with the outcome.
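For readers unfamiliar with LASSO Cox, the sketch below writes out the quantity such a model minimizes: the negative Cox partial log-likelihood plus an L1 penalty on the coefficients, which is what drives some coefficients to zero and so performs variable selection. It is an illustration only and not the authors' implementation. The 1/n scaling, the Breslow-style handling of tied times and the function name are assumptions.

```python
import math


def lasso_cox_objective(beta, X, times, events, lam):
    """Penalized negative Cox partial log-likelihood.

    beta   : list of coefficients, one per covariate
    X      : list of covariate rows, one per subject
    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    lam    : L1 penalty weight (lam = 0 gives plain Cox regression)
    """
    n = len(times)
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i in range(n):
        if events[i] == 1:
            # Risk set: subjects whose observed time is at least times[i].
            risk = sum(math.exp(scores[j]) for j in range(n) if times[j] >= times[i])
            loglik += scores[i] - math.log(risk)
    penalty = lam * sum(abs(b) for b in beta)
    return -loglik / n + penalty


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0]]
    times = [1.0, 2.0, 3.0]
    events = [1, 1, 1]
    print(lasso_cox_objective([0.0], X, times, events, lam=0.1))
```

In practice the objective is minimized over `beta` by a dedicated solver for a grid of `lam` values. This function only evaluates it at a given point.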

    Multivariate Statistical Techniques to Manage Multiple Data in Psychology

    Introduction: In big-data contexts, multivariate statistical techniques and machine learning methods play a crucial role in assessing the interrelations between and within sets of variables. In the social and behavioural sciences in particular, which require exploring patterns and mutual interrelations among subject features, proper use of these techniques becomes paramount. Methods: A series of multivariate techniques (clustering, decision trees, principal component, multiple correspondence and partial least squares discriminant analysis) was applied to a sample of patients diagnosed with borderline personality disorder (BPD) and bipolar disorder (BD), in order to outline specific socio-demographic and clinical profiles for the two diagnoses. Results: Although BPD and BD patients are clinically difficult to distinguish, some features appeared to discriminate well between the two diagnoses. BPD patients were more likely to be females who had shown self-harm behaviours and/or suicide attempts, while BD patients were more likely to be males who had never shown self-harm behaviours and had not attempted suicide. Moreover, the assessment variables with the most discriminant power were BIS-11, SCL-90 and STAI-T. In particular, patients with an SCL-90 total score <36 were more likely to be BD patients (probability p=87%), whereas patients with an SCL-90 score ≥36 and a BIS-11 score ≥64 were more likely to be BPD patients (p=83%). Conclusions: The application of multivariate statistical analyses and machine learning techniques allows the definition of specific clinical and diagnostic profiles that can be crucial for appropriately taking charge of patients in a context of precision medicine and an ad hoc diagnostic and care pattern.
