7 research outputs found

    A theoretical and methodological framework for machine learning in survival analysis: Enabling transparent and accessible predictive modelling on right-censored time-to-event data

    Survival analysis is an important field of statistics concerned with making time-to-event predictions from 'censored' data. Machine learning, specifically supervised learning, is concerned with using state-of-the-art algorithms to make predictions on unseen data. This thesis looks at unifying these two fields, as current research into them is still disjoint, with 'classical survival' on one side and supervised learning (primarily classification and regression) on the other. This PhD aims to improve the quality of machine learning research in survival analysis by focusing on transparency, accessibility, and predictive performance in model building and evaluation. This is achieved by examining historic and current proposals and implementations for models and measures (both classical and machine learning) in survival analysis, and by making novel contributions. In particular this includes: i) a survey of survival models, including a critical and technical survey of almost all supervised learning model classes currently utilised in survival analysis, as well as novel adaptations; ii) a survey of evaluation measures for survival models, including key definitions, proofs and theorems for survival scoring rules that had previously been missing from the literature; iii) the introduction and formalisation of composition and reduction in survival analysis, with a view to increasing the transparency of modelling strategies and improving predictive performance; iv) the implementation of several R software packages, in particular mlr3proba for machine learning in survival analysis; and v) the first large-scale benchmark experiment on right-censored time-to-event data, with 24 survival models and 66 datasets. Survival analysis has many important applications in medical statistics, engineering and finance, and as such requires the same level of rigour as other machine learning fields such as regression and classification; this thesis aims to make this clear by describing a framework spanning prediction, evaluation and implementation.
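    For readers unfamiliar with the setting, the following minimal sketch illustrates the data format the thesis builds on: each observation carries a duration and an event indicator (0 = right-censored), and a survival model predicts a survival function from them. It uses the Python library lifelines for brevity rather than the thesis's mlr3proba (an R package), so it illustrates the setup only, not the thesis's framework; all data here are synthetic.

```python
# Minimal illustration of right-censored time-to-event data, using the
# `lifelines` library (not the thesis's mlr3proba, which is an R package).
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)

# Hypothetical data: true event times and independent censoring times.
true_times = rng.exponential(scale=10.0, size=200)
censor_times = rng.exponential(scale=15.0, size=200)

# We observe the earlier of the two, plus an event indicator:
# observed == 1 means the event happened; 0 means right-censored.
durations = np.minimum(true_times, censor_times)
observed = (true_times <= censor_times).astype(int)

# Fit the classical non-parametric Kaplan-Meier estimator of S(t).
kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed)

# Predicted survival probability at t = 10: a distributional prediction,
# the kind of output that survival scoring rules evaluate.
print(kmf.predict(10.0))
```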

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    Suggested approach for establishing a rehabilitation engineering information service for the state of California

    An ever-expanding body of rehabilitation engineering technology is developing in this country, but it rarely reaches the people for whom it is intended. The increasing concern of state and federal departments of rehabilitation over this technology lag was the stimulus for a series of problem-solving workshops held in California during 1977. As a result of the workshops, the recommendation emerged that the California Department of Rehabilitation take the lead in developing a coordinated delivery system that would eventually serve the entire state and be a model for similar systems across the nation.

    Sensing and Visualizing Social Context from Spatial Proximity

    The concept of pervasive computing, as introduced by Mark Weiser under the name ubiquitous computing in the early 90s, spurred research into various kinds of context-aware systems and applications. There is a wide range of contextual parameters, including location, time, temperature, and the devices and people in proximity, which have been part of the initial ideas about context-aware computing. While locational context is already a well-understood concept, social context, based on the people around us, proves harder to grasp and to operationalize. This work continues the line of research into social context based on the proximity and meeting patterns of people in physical space. It takes this research out of the lab and out of well-controlled situations into our urban environments, which are full of ambiguity and opportunity. The key to this research is the tool that has caused dramatic change in individual and collective behavior during the last 20 years and which is a manifestation of many of the ideas of the pervasive computing paradigm: the mobile phone. In this work, the mobile phone is regarded as a proxy for people. Through it, the social environment becomes accessible to digital measurement and processing. To understand the large amount of data that now becomes available to automatic measurement, we turn to the discipline of social network analysis, which provides powerful methods that can condense data and extract relevant meaning. Visualization helps to understand and interpret the results. This thesis contains a number of experiments that demonstrate how the automatic measurement of social proximity data through Bluetooth can be used to measure variables of personal behavior, group behavior and the behavior of groups in relation to places. The principal contributions are:

    * A methodology to visualize personal social context using an ego proximity network, in which specific episodes can be localized and compared.
    * A method to compare different days in terms of social context, e.g. to support automatic diary applications.
    * A method to compose social geographic maps, in which locations of similar social context are detected and combined.
    * Functions to measure short-term changes in social activity, based on the distinction between strange and familiar devices (see the sketch after this list).
    * The characterization of Bluetooth inquiries for social proximity sensing.
    * A dataset of Bluetooth sightings from an ego perspective in seven different settings; some settings additionally feature multiple stationary scanners and Cell-ID measurements.
    * Software and hardware to capture, collect, store and analyze Bluetooth proximity data.
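    To make the strange/familiar distinction concrete, the following is a minimal illustrative sketch, not the thesis's actual implementation: given a time-ordered log of Bluetooth sightings (timestamp, device address), it counts how many devices seen in a time window are familiar (seen often before) versus strange. The function name, threshold and data layout are hypothetical.

```python
# Illustrative sketch (not the thesis's implementation): classify Bluetooth
# devices seen in a time window as "familiar" or "strange" based on how
# often they appeared in the preceding history. All names are hypothetical.
from collections import Counter

def social_activity(sightings, window_start, window_end, familiar_threshold=3):
    """sightings: time-ordered list of (timestamp, device_address) tuples.

    Returns the number of familiar and strange devices seen in the window,
    a crude proxy for short-term changes in social activity.
    """
    # How often each device was sighted before the window began.
    history = Counter(addr for ts, addr in sightings if ts < window_start)
    in_window = {addr for ts, addr in sightings
                 if window_start <= ts < window_end}
    familiar = {a for a in in_window if history[a] >= familiar_threshold}
    strange = in_window - familiar
    return len(familiar), len(strange)

# Hypothetical usage: timestamps in seconds, shortened device addresses.
log = [(10, "aa"), (20, "bb"), (30, "aa"), (40, "aa"), (50, "cc"),
       (100, "aa"), (110, "dd")]
print(social_activity(log, window_start=90, window_end=120))  # -> (1, 1)
```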

    Reliability and Maintenance of Structures under Severe Uncertainty

    Maintenance of structures and infrastructure is of increasing importance for reaching an acceptable level of safety despite unavoidable uncertainty, while keeping the economic effort reasonable. These two goals represent competing objectives in the overall optimization of very complex systems and structures that involve significant uncertainties. In fact, all civil engineering structures and engineering systems are subject to degradation by fatigue cracks and corrosion under varying loads. As cracks propagate or corrosion grows, the structural system accumulates damage, leading to loss of serviceability and eventual collapse. These failures can be prevented by appropriate maintenance scheduling and repair, even in the presence of uncertainties of various natures and scales, such as fluctuations and changes in the structural and environmental parameters and conditions of the models describing fatigue crack and corrosion growth. Degradation models used to predict the future state of components often involve simplifications and assumptions that compensate for a lack of data, imprecision and vagueness, which cannot be ignored. To overcome these issues, an imprecise-probabilities framework and a Markovian approach are proposed for performing reliability analysis, decision-making, and risk-based design and maintenance. It is shown how these approaches can improve current practice based on the B31G, Modified B31G, DNV-101 and Shell-92 failure pressure models. The reliability assessment takes into account the simultaneous action of many natural and technological loads; these loads are random by nature and can be adequately described only by stochastic processes, yet such analyses are rarely performed for lack of valid calculation methods. The methodology has been applied to study the reliability of Arctic pipeline infrastructure. Finally, a robust and efficient probabilistic framework for selecting optimal inspection and maintenance schedules for corroded pipelines and for fatigue cracks in bridges is presented. The optimal solution is obtained through a single reliability assessment, removing the huge computational cost of the reliability-based optimization approach and making the analysis of industrial-size problems feasible.

    Fisher networks: A principled approach to retrieval-based classification

    Due to technological advances in the acquisition and processing of information, current data mining applications involve databases of sizes that would have been unthinkable just two decades ago. However, real-world datasets are often riddled with irrelevant variables that not only generate no meaningful information about the process of interest, but may also obstruct the contribution of the truly informative features. Taking the relevance of the different measurements available into consideration can make the difference between reaching an accurate reflection of the underlying truth and obtaining misleading results that lead to erroneous conclusions. Another important consideration in data analysis is the interpretability of the models used to fit the data. Performance must clearly be a key aspect in deciding which methodology to use, but it should not be the only one. Models with an obscure internal operation see their practical usefulness diminished by the difficulty of understanding the reasoning behind their inferences, which makes them less appealing to users who are not familiar with their theoretical basis. This thesis proposes a novel framework for the visualisation and categorisation of data in classification contexts that tackles the two issues discussed above and provides an informative output of intuitive interpretation. The system is based on a Fisher information metric that automatically filters the contribution of variables depending on their relevance to the classification problem at hand, measured by their influence on the posterior class probabilities. Fisher distances can then be used to calculate rigorous, problem-specific similarity measures, which can be grouped into a pairwise adjacency matrix, thus defining a network. This construction process yields a principled visualisation of the data, organised in communities, that highlights the structure of the underlying class membership probabilities. Furthermore, the relational nature of the network can be used to reproduce the probabilistic predictions of the original estimates in a case-based approach, making them explainable by means of known cases in the dataset. The potential applications and usefulness of the framework are illustrated using several real-world datasets, giving examples of the typical output that the end user receives and how they can use it to learn more about the cases of interest as well as about the dataset as a whole.
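    As a rough illustration of the retrieval-based classification idea, the sketch below replaces the thesis's Fisher information metric with a crude proxy, the distance between estimated posterior class probabilities, builds a pairwise distance matrix from it, and classifies a case by its nearest neighbours in that network. All names and parameters are hypothetical.

```python
# Rough illustration of retrieval-based classification over a similarity
# network (NOT the thesis's Fisher metric; distances here are a crude proxy
# based on differences in estimated posterior probabilities).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Step 1: estimate posterior class probabilities p(c|x) with any model.
clf = LogisticRegression(max_iter=1000).fit(X, y)
P = clf.predict_proba(X)                  # shape (n_samples, n_classes)

# Step 2: pairwise distances that reflect how differently the model judges
# two cases, rather than how far apart they are in raw feature space.
Dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

# Step 3: retrieval: a query case is explained (and classified) by the most
# similar known cases in the network.
def retrieve_and_classify(i, k=5):
    neighbours = np.argsort(Dist[i])[1:k + 1]   # skip the case itself
    votes = np.bincount(y[neighbours], minlength=P.shape[1])
    return neighbours, votes.argmax()

neighbours, label = retrieve_and_classify(0)
print("explained by cases:", neighbours, "-> predicted class:", label)
```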