881 research outputs found

    Data- and expert-driven feature selection for predictive models in healthcare: towards increased interpretability in underdetermined machine learning problems

    Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Some concrete examples are electronic healthcare record systems, genomics, and medical images. This leads to situations with often unstructured, high-dimensional heterogeneous patient cohort data where classical statistical methods may not be sufficient for optimal utilization of the data and informed decision-making. Instead, investigating such data structures with modern machine learning techniques promises to improve the understanding of patient health issues and may provide a better platform for informed decision-making by clinicians. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both aspects in parallel is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models encounter mathematically underdetermined systems and may overfit easily on the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of informative features from the original set of features with respect to the target variable. While potentially raising the predictive performance, feature selection fosters model interpretability by identifying a low number of relevant model parameters to better understand the underlying biological processes that lead to health issues. Interpretability requires that feature selection is stable, i.e., small changes in the dataset do not lead to changes in the selected feature set. A concept to address instability is ensemble feature selection, i.e. the process of repeating the feature selection multiple times on subsets of samples of the original dataset and aggregating results in a meta-model. 
This thesis presents two approaches for ensemble feature selection, which are tailored towards high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds upon elastic net regularized models, UBayFS is a general framework for ensembles with the capabilities to include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice. Beyond the selection of single features, UBayFS also allows for selecting whole feature groups (feature blocks) that were acquired from multiple data sources, as those mentioned above. Importance quantification of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. Such information on feature block importance may lead to positive effects on the use of human, technical, and financial resources if systematically integrated into the planning of patient treatment by excluding the acquisition of non-informative features. Since a generalization of feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for feature block importance rankings. This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented approaches for feature selection. 
Experimental evaluations demonstrate favorable predictive performance, stability, and interpretability of results, which carries a high potential for better data-driven decision support in clinical practice.
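The ensemble idea described in this abstract (repeat a feature selector on subsamples, then aggregate) can be sketched as follows. This is only an illustrative sketch, not RENT or UBayFS: the base selector here is a simple absolute-correlation ranking standing in for the elastic net models, and the function names, the synthetic data, and the parameters `n_runs`, `sample_frac`, `top_k`, and `min_freq` are all assumptions made for the example.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def ensemble_select(X, y, n_runs=50, sample_frac=0.8, top_k=2, min_freq=0.9, seed=0):
    """Run a base selector on random subsamples and keep features chosen often enough."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_runs):
        # Draw a subsample of rows (patients) without replacement.
        rows = rng.sample(range(n), int(sample_frac * n))
        ys = [y[r] for r in rows]
        # Stand-in base selector: rank features by absolute correlation with y.
        scores = [abs_correlation([X[r][j] for r in rows], ys) for j in range(p)]
        for j in sorted(range(p), key=scores.__getitem__, reverse=True)[:top_k]:
            counts[j] += 1
    # Aggregate: selection frequency per feature across all runs.
    freqs = [c / n_runs for c in counts]
    return [j for j, f in enumerate(freqs) if f >= min_freq], freqs


# Synthetic data (hypothetical): only features 0 and 3 carry signal.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(80)]
y = [row[0] + row[3] + data_rng.gauss(0, 0.1) for row in X]
selected, freqs = ensemble_select(X, y)
print(selected, freqs)
```

The selection frequencies in `freqs` are a rough stability indicator: features that are picked in nearly every run are insensitive to small changes in the sample, which is the property the abstract ties to interpretability.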

    Fully-Automated Packaging Structure Recognition of Standardized Logistics Assets on Images

    Within a logistics supply chain, a wide variety of transported goods must be handled, re-identified, and checked at numerous nodes. This often requires considerable manual effort to recognize or verify package identity or packaging structure. Such steps are necessary, for example, to check a delivery for completeness. We investigate the design and implementation of a method for fully automating the recognition of the packaging structure of logistics shipments. The aim of this method is to accurately localize one or more transport units based on a single color image and to recognize relevant characteristics, such as the total number or the arrangement of the contained packaging units. We present an image processing pipeline consisting of several components that is intended to solve this packaging structure recognition task. Our first implementation of the method uses several deep learning models, more precisely convolutional neural networks for instance segmentation, as well as image processing methods and heuristic components. We use our own dataset of real images from a logistics environment for training and evaluating our method. We show that our solution is able to recognize the correct packaging structure in about 85% of the test cases in our dataset, and that even higher accuracy is achieved when only the most frequently occurring package types are considered. For a selected image recognition component of our algorithm, we compare the potential of using less computationally intensive, custom-designed image processing methods with the previously implemented deep learning methods. From this investigation we conclude that the learning-based methods are better suited, which we attribute to their very good generalization ability. 
In addition, we formulate the problem of object localization in images based on self-chosen feature points, such as the corner points of logistics transport units. The aim is to localize objects more precisely than is possible with conventional bounding rectangles, while at the same time enforcing the object shape through prior knowledge of the object geometry. We present a specific deep learning model that solves this task for objects that can be described by four corner points. The resulting model, named TetraPackNet, is evaluated using general and application-specific metrics. We demonstrate the applicability of the solution in our image recognition pipeline and argue its relevance for other use cases, such as license plate recognition.
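To illustrate why four corner points can localize an object more tightly than an enclosing rectangle, the short sketch below compares the area of a quadrilateral with the area of its axis-aligned bounding box. It is not TetraPackNet and makes no claim about that model; the helper names and the example corner coordinates are assumptions chosen for illustration.

```python
def polygon_area(points):
    """Area of a simple polygon given its corners in order (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box_area(points):
    """Area of the smallest axis-aligned rectangle containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return float((max(xs) - min(xs)) * (max(ys) - min(ys)))


# Hypothetical corner points of a rotated, square-shaped transport unit.
corners = [(1, 0), (4, 1), (3, 4), (0, 3)]
quad = polygon_area(corners)
box = bounding_box_area(corners)
print(f"quadrilateral area: {quad}, bounding box area: {box}, ratio: {quad / box:.3f}")
```

For this example the bounding box covers noticeably more area than the object itself, and the excess grows or shrinks with the object's rotation, which is the imprecision a corner-point description avoids.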

    Contributions to improve the technologies supporting unmanned aircraft operations

    International Mention in the doctoral degree. Unmanned Aerial Vehicles (UAVs), in their smaller versions known as drones, are becoming increasingly important in today's societies. The systems that make them up present a multitude of challenges, of which error can be considered the common denominator. The perception of the environment is measured by sensors that have errors, and the models that interpret the information and/or define behaviors are approximations of the world and therefore also have errors. Explaining error allows extending the limits of deterministic models to address real-world problems. The performance of the technologies embedded in drones depends on our ability to understand, model, and control the error of the systems that integrate them, as well as of new technologies that may emerge. Flight controllers integrate various subsystems that are generally dependent on other systems. Guidance systems are one example. These systems provide the engine's propulsion controller with the necessary information to accomplish a desired mission. For this purpose, they are made up of a guidance control law that reacts to the information perceived by the perception and navigation systems. The error of any of the subsystems propagates through the ecosystem of the controller, so the study of each of them is essential. On the other hand, among the strategies for error control are state-space estimators, where the Kalman filter has been a great ally of engineers since its appearance in the 1960s. Kalman filters are at the heart of information fusion systems, minimizing the error covariance of the system and allowing the measured states to be filtered and estimated in the absence of observations. State Space Models (SSM) are developed based on a set of hypotheses for modeling the world. Among the assumptions are that the models of the world must be linear and Markovian, and that the error of their models must be Gaussian. 
In general, systems are not linear, so linearizations are performed on models that are already approximations of the world. In other cases, the noise to be controlled is not Gaussian, but it is approximated by that distribution in order to be able to deal with it. On the other hand, many systems are not Markovian, i.e., their states do not depend only on the previous state, but there are other dependencies that state space models cannot handle. This thesis presents a collection of studies in which error is formulated and reduced. First, the error in a computer vision-based precision landing system is studied, then estimation and filtering problems are addressed from the deep learning approach. Finally, classification concepts with deep learning over trajectories are studied. The first case of the collection studies the consequences of error propagation in a machine vision-based precision landing system. This work proposes a set of strategies to reduce the impact on the guidance system, and ultimately reduce the error. The next two studies approach the estimation and filtering problem from the deep learning perspective, where error is a function to be minimized by learning. The last case of the collection deals with a trajectory classification problem with real data. This work completes the two main fields in deep learning, regression and classification, where the error is considered as a probability function of class membership.
I would like to thank the Ministry of Science and Innovation for granting me the funding with reference PRE2018-086793, associated with the project TEC2017-88048-C2-2-R, which provided me the opportunity to carry out all my PhD activities, including completing an international research internship.
Doctoral Programme in Computer Science and Technology, Universidad Carlos III de Madrid. President: Antonio Berlanga de Jesús. Secretary: Daniel Arias Medina. Member: Alejandro Martínez Cav
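The abstract's description of the Kalman filter (filtering measured states and continuing to estimate them when observations are missing) can be illustrated with a minimal scalar sketch. It assumes a random-walk state model with Gaussian noise; the function name and the parameters `q` (process noise variance), `r` (measurement noise variance), `x0`, and `p0` are illustrative assumptions and are not taken from the thesis.

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state; None marks a missing observation."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is carried over and its variance grows by q.
        p = p + q
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        # Without an observation, the prediction is kept as the estimate.
        estimates.append((x, p))
    return estimates


# Hypothetical measurement sequence with one missing observation.
est = kalman_1d([1.0, None, 1.0], q=0.0, r=1.0)
for step, (state, variance) in enumerate(est):
    print(step, round(state, 4), round(variance, 4))
```

With `q` greater than zero, the variance would increase at the step where the observation is missing, reflecting growing uncertainty until the next measurement arrives.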

    Data analysis with merge trees

    Today’s data are increasingly complex and classical statistical techniques need increasingly refined mathematical tools to be able to model and investigate them. Paradigmatic situations are represented by data which need to be considered up to some kind of transformation and all those circumstances in which the analyst needs to define a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We contribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called merge tree. To analyze sets of merge trees we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees and investigate the metric space obtained with the aforementioned metric. Different geometric and topological properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.
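For readers unfamiliar with merge trees, the sketch below builds the sublevel-set merge tree of a function sampled on a line: leaves are local minima, and internal nodes record the heights at which two components join. This only illustrates the topological summary itself, not the metric or algorithms developed in the thesis; the function name, the output format, and the sample values are assumptions made for the example.

```python
def merge_tree_1d(values):
    """Return (leaves, merges) of the sublevel-set merge tree of a sampled 1D function.

    leaves: indices of local minima, in order of appearance.
    merges: (height, (leaf_a, leaf_b)) pairs, where the leaves are the lowest
            points of the two components joined at that height.
    """
    n = len(values)
    parent = [None] * n  # None means the sample has not been added yet
    lowest = {}          # component root -> index of the component's minimum

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    leaves, merges = [], []
    # Sweep the samples from the lowest value to the highest.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        roots = sorted({find(j) for j in (i - 1, i + 1)
                        if 0 <= j < n and parent[j] is not None})
        if not roots:
            # No neighbor is present yet: a new component (leaf) is born.
            leaves.append(i)
            lowest[i] = i
            continue
        if len(roots) == 2:
            # Two existing components meet at this sample: an internal node.
            merges.append((values[i], tuple(sorted(lowest[r] for r in roots))))
        # Attach everything to the component with the lowest minimum.
        keep = min(roots, key=lambda r: values[lowest[r]])
        for r in roots:
            parent[r] = keep
        parent[i] = keep
    return leaves, merges


# Hypothetical sampled function with three local minima.
leaves, merges = merge_tree_1d([1, 3, 0, 2, 0.5])
print(leaves, merges)
```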

    Single-cell analysis of cell competition using quantitative microscopy and machine learning

    Cell competition is a widely conserved, fundamental biological quality control mechanism. The cell competition assay of MDCK wild-type versus mutant MDCK Scribble-knockdown (ScribKD) relies on a mechanical mechanism of competition, which posits that the emergence of compressive stresses within the tissue at high confluency drives the competitive outcome. According to this mechanism, proliferating wild-type cells out-compete mutant ScribKD cells, resulting in their apoptosis and apical extrusion. Previous studies show that there is an increased division rate of wild-type cells in neighbourhoods with high numbers of ScribKD cells, but it remains unknown whether this is a cause or a consequence of increased apoptosis in the “loser” cell population. This project also interrogated the competitive assay of wild-type versus RasV12, which is hypothesized to operate on a biochemical mechanism and results in the apical extrusion (but not apoptosis) of the loser RasV12 population. For both these mechanisms of competition it is still unknown which population of cells is driving the winner/loser outcome. Is the winner cell proliferation prompting the loser cell demise? Or is an autonomous loser elimination prompting a subsequent winner cell proliferation? In my research, I have employed multi-modal, time-lapse microscopy to image competition assays continuously for several days. These data were then segmented into wild-type or mutant instances using a Convolutional Neural Network (CNN) that can differentiate between the cell types, after which they were tracked across cellular generations using a Bayesian multi-object tracker. A conjugate analysis of fluorescent cell-cycle indicator probes was then utilised to automatically identify key time points of cellular fate commitment using deep-learning image classification. A spatio-temporal analysis was then conducted in order to quantify any correlation between wild-type proliferation and mutant cell demise. 
For the case of wild-type versus ScribKD, there was no clear evidence that wild-type cell mitoses directly impact ScribKD cell apoptotic elimination. Instead, a subsequent analysis found that a more subtle mechanism of pre-emptive, local density increases around the apoptosis site appeared to be determining the eventual ScribKD fate. On the other hand, there was clear evidence of a direct impact of wild-type mitoses on the subsequent apical extrusion and competitive elimination of RasV12 cells. Both of these conclusions agree with the prevailing classification of cell competition types: mechanical interactions are more diffuse and occur over a larger spatio-temporal domain, whereas biochemical interactions are constrained to nearest neighbour cells. These analyses further quantified the hypothesized density-dependency of ScribKD elimination on a single-cell scale and offer a potential new understanding of RasV12 extrusion. Most interestingly, it appears that there is a clear biophysical mechanism to the elimination in the biochemical RasV12 cell competition. This suggests that perhaps a new semantic approach is needed in the field of cell competition in order to accurately classify different mechanisms of elimination.

    Morris Catalog 2023-2025

    This document serves as an official historical record for a specific period in time. The information found is subject to change without notice. Colleges and departments make changes to their degree requirements and course descriptions frequently. More information is available at catalogs.umn.edu.