
    Skeletonization and segmentation of binary voxel shapes

    Preface. This dissertation is the result of research that I conducted between January 2005 and December 2008 in the Visualization research group of the Technische Universiteit Eindhoven. I am pleased to have the opportunity to thank a number of people who made this work possible. I owe my sincere gratitude to Alexandru Telea, my supervisor and first promotor. I did not consider pursuing a PhD until my Master’s project, which he also supervised. Thanks to our pleasant collaboration, from which I learned quite a lot, I became convinced that becoming a doctoral student would be the right thing for me to do. Indeed, I can say it has greatly increased my knowledge and professional skills. Alex, thank you for our interesting discussions and the freedom you gave me in conducting my research. You made these four years a pleasant experience. I am further grateful to Jack van Wijk, my second promotor. Our monthly discussions were insightful, and he continuously encouraged me to take a more formal and scientific stance. I would also like to thank Prof. Jan de Graaf from the department of mathematics for our discussions on some of my conjectures. His mathematical rigor was inspiring. I am greatly indebted to the Netherlands Organisation for Scientific Research (NWO) for funding my PhD project (grant number 612.065.414). I thank Prof. Kaleem Siddiqi, Prof. Mark de Berg, and Dr. Remco Veltkamp for taking part in the core doctoral committee, and Prof. Deborah Silver and Prof. Jos Roerdink for participating in the extended committee. Our Visualization group provides a great atmosphere to do research in. In particular, I would like to thank my fellow doctoral students Frank van Ham, Hannes Pretorius, Lucian Voinea, Danny Holten, Koray Duhbaci, Yedendra Shrinivasan, Jing Li, Niels Willems, and Romain Bourqui. They enabled me to take my mind off research from time to time by discussing political and economic affairs, and more trivial topics. Furthermore, I would like to thank the senior researchers of our group, Huub van de Wetering, Kees Huizing, and Michel Westenberg. In particular, I thank Andrei Jalba for our fruitful collaboration in the last part of my work. On a personal level, I would like to thank my parents and sister for their love and support over the years, my friends for providing distractions outside of the office, and Michelle for her unconditional love and ability to light up my mood when needed.

    Formalisation and study of credit risk scoring problems: reject inference, discretization of variables and interactions, logistic regression trees

    This manuscript deals with model-based statistical learning in the binary classification setting. As an application, credit scoring is examined in depth, with special attention to its specificities. Proposed and existing approaches are illustrated on real data from Crédit Agricole Consumer Finance, a financial institution specialized in consumer loans which financed this PhD through CIFRE funding. First, we consider the so-called reject inference problem, which aims at taking advantage of the information collected on rejected credit applicants, for whom no repayment performance can be observed (i.e. unlabelled observations). This industrial problem led to a research one by reinterpreting unlabelled observations as an information loss that can be compensated by modelling missing data. This interpretation sheds light on existing reject inference methods and allows us to conclude that none of them should be recommended, since they lack the modelling assumptions that would make them amenable to classical statistical model selection tools. Next, another industrial problem was tackled: the discretization of continuous features, or the grouping of levels of categorical features, before any modelling step. This is motivated by both practical (interpretability) and theoretical (predictive power) reasons. To perform these quantizations, ad hoc heuristics are often used, which are empirical and time-consuming for practitioners. They are seen here as a latent variable problem, bringing us back to a model selection problem. The high combinatorics of this model space necessitated a new cost-effective and automatic exploration strategy, which involves either a particular neural network architecture or a Stochastic-EM algorithm and gives precise statistical guarantees. Third, as an extension of the preceding problem, interactions between covariates may be introduced in order to improve predictive performance. This task, until now also processed manually by practitioners and highly combinatorial, carries an increased risk of failing to select a "good" model. It is performed here with a Metropolis-Hastings sampling procedure which finds the best interactions automatically while preserving its standard convergence properties, so that good predictive performance is guaranteed. Finally, contrary to the preceding problems, which tackled a particular scorecard, we look at the scoring system as a whole. It generally consists of a tree-like structure composed of many scorecards (each relative to a particular population segment), which is often not optimized but rather imposed by the company's culture and/or history. Again, ad hoc industrial procedures are used, which lead to suboptimal performance. We propose some lines of approach to optimize this logistic regression tree, which yield good empirical performance and open new research directions illustrating the predictive strength and interpretability of a mix of parametric and non-parametric models. The manuscript concludes with a discussion of potential scientific obstacles, among which is high dimensionality (in the number of features). The financial industry is indeed investing massively in unstructured data storage, which to this day remains largely unused for credit scoring applications. Exploiting it will require statistical guarantees in order to achieve the additional predictive performance that is hoped for.
    This thesis is set within the framework of machine learning models for binary classification. The application case is credit risk scoring. In particular, both the proposed methods and the existing approaches are illustrated on real data from Crédit Agricole Consumer Finance, a major European player in consumer credit, which initiated this thesis through CIFRE funding. First, we address the so-called "reject inference" problem. The objective is to exploit the information collected on rejected applicants, who by definition have no known label regarding the repayment of their credit. The challenge was to reformulate this classical industrial problem in a rigorous framework, namely that of modelling with missing data. This approach first sheds new light on the standard reject inference methods, and then leads to the conclusion that none of them can really be recommended as long as their modelling, incomplete as it stands, prevents the use of statistical model selection methods. Another classical industrial problem corresponds to the discretization of continuous variables and the grouping of the levels of categorical variables before any modelling step. The underlying motivation is both practical (interpretability) and theoretical (predictive performance). To perform these quantizations, heuristics, often manual and time-consuming, are nevertheless used. We therefore reformulated this common practice of information loss as a latent variable modelling problem, thus returning to model selection. Moreover, the combinatorics associated with this model space led us to propose exploration strategies based either on a neural network with stochastic gradient descent or on a stochastic EM-type algorithm. As an extension of the preceding problem, it is also common to introduce interactions between variables in order, as always, to improve the predictive performance of the models. The commonly adopted practice is again manual and time-consuming, with increased risks given the additional combinatorial layer it creates. We therefore proposed a Metropolis-Hastings algorithm that searches for the best interactions in a quasi-automatic way while guaranteeing good performance thanks to its standard convergence properties. The last problem addressed again aims to formalize a widespread practice, which consists in defining the acceptance system not as a single score but rather as a tree of scores, each branch of the tree being relative to a particular population segment. To overcome the suboptimality of the classical methods used in companies, we propose a global approach that optimizes the acceptance system as a whole. The resulting empirical results are particularly promising, illustrating the flexibility of a mix of parametric and non-parametric modelling. Finally, we anticipate the future obstacles that will arise in credit scoring, many of which are linked to high dimensionality (in terms of predictors). Indeed, the financial industry is currently investing in the storage of massive, unstructured data, whose forthcoming use in prediction rules will have to rely on a minimum of theoretical guarantees in order to meet the expectations of predictive performance that motivated this collection.
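    To make the interaction-screening step concrete, below is a minimal, hypothetical sketch of a Metropolis-Hastings search over pairwise interaction terms of a logistic regression scorecard, scored by BIC. It is not the thesis implementation; the flip proposal, the near-unpenalized fit via a large C, the exp(-ΔBIC/2) acceptance rule, and all settings are illustrative assumptions.

```python
# Hypothetical sketch, not the thesis code: Metropolis-Hastings over pairwise
# interaction terms of a logistic regression scorecard, scored by BIC.
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def bic(X, y):
    """BIC of a (nearly) unpenalized logistic regression on the design X; y is 0/1."""
    model = LogisticRegression(C=1e6, max_iter=2000).fit(X, y)  # large C ~ no penalty
    p = np.clip(model.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -2 * loglik + (X.shape[1] + 1) * np.log(len(y))

def design(X, pairs, mask):
    """Base features plus the interaction columns switched on by `mask`."""
    extra = [(X[:, i] * X[:, j])[:, None] for on, (i, j) in zip(mask, pairs) if on]
    return np.hstack([X] + extra)

def mh_interaction_search(X, y, n_iter=200):
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    mask = np.zeros(len(pairs), dtype=bool)
    score = bic(design(X, pairs, mask), y)
    best = (mask.copy(), score)
    for _ in range(n_iter):
        prop = mask.copy()
        prop[rng.integers(len(pairs))] ^= True               # add or drop one interaction
        prop_score = bic(design(X, pairs, prop), y)
        if np.log(rng.random()) < (score - prop_score) / 2:  # lower BIC is better
            mask, score = prop, prop_score
            if score < best[1]:
                best = (mask.copy(), score)
    return [p for on, p in zip(best[0], pairs) if on], best[1]
```

    Using exp(-BIC/2) as a stand-in for the model posterior is one common way to turn a model-selection criterion into a Metropolis-Hastings target; the chain then spends most of its time on interaction sets that the criterion actually rewards.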

    Multiple Particle Positron Emission Particle Tracking and its Application to Flows in Porous Media

    Positron emission particle tracking (PEPT) is a method for flow interrogation capable of measurement in opaque systems. In this work a novel method for PEPT is introduced that allows for simultaneous tracking of multiple tracers. This method (M-PEPT) is adapted from optical particle tracking techniques and is designed to track an arbitrary number of positron-emitting tracer particles entering and leaving the field of view of a detector array. M-PEPT is described, and its applicability is demonstrated for a number of measurements ranging from turbulent shear flow interrogation to cell migration. It is found that this method can locate over 80 particles simultaneously with a spatial resolution of order 0.2 mm at a tracking frequency of 10 Hz and, at lower particle number densities, can achieve similar spatial resolution at a tracking frequency of 1000 Hz. The method is limited in its ability to resolve particles that approach close to one another, and suggestions for future improvements are made. M-PEPT is used to study flow in porous media constructed from packings of glass beads of different diameters. Anomalous (i.e. non-Fickian) dispersion of tracers is studied in these systems under the continuous time random walk (CTRW) paradigm. Pore-length transition time distributions are measured, and it is found that in all cases these distributions indicate the presence of long waiting times between transitions, confirming the central assumption of the CTRW model. All systems demonstrate non-Fickian spreading of tracers at early and intermediate times with a late-time recovery of Fickian dispersion, but a clear link between transition time distributions and tracer spreading is not made. Velocity increment statistics are examined, and it is found that temporal velocity increments in the mean-flow direction show a universal scaling. Spatial velocity increments also appear to collapse to a similar form, but there is insufficient data to determine the presence of universal scaling.
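    As a toy illustration of the CTRW picture invoked above, the sketch below simulates a 1-D walk with heavy-tailed (Pareto) waiting times and Gaussian jump lengths; for a tail exponent alpha < 1 the plume variance grows like t**alpha rather than linearly in t, i.e. non-Fickian spreading. The walker count, time horizon, and distributions are illustrative assumptions, not the measured pore-scale statistics.

```python
# Illustrative 1-D continuous time random walk with heavy-tailed waiting times.
import numpy as np

rng = np.random.default_rng(1)

def ctrw(n_walkers=5000, n_steps=400, alpha=0.7, t_max=300.0):
    """Positions at time t_max of walkers with Pareto(alpha) waiting times
    (minimum wait 1) and unit Gaussian jump lengths."""
    positions = np.zeros(n_walkers)
    times = np.zeros(n_walkers)
    active = np.ones(n_walkers, dtype=bool)
    for _ in range(n_steps):
        wait = rng.random(n_walkers) ** (-1.0 / alpha)   # heavy-tailed waiting times
        jump = rng.normal(size=n_walkers)                # ordinary (Fickian) jump lengths
        times += wait
        active &= times < t_max        # a walker whose next event is past t_max freezes
        positions += np.where(active, jump, 0.0)
    return positions

x = ctrw()
# For alpha < 1 the plume variance grows like t**alpha instead of t (sub-diffusion).
print("plume variance at t_max:", x.var())
```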

    Tree-based Density Estimation: Algorithms and Applications

    Data mining can be seen as an extension of statistics. It comprises the preparation of data and the process of gathering new knowledge from it. The extraction of new knowledge is supported by various machine learning methods, many of which are based on probabilistic principles or use density estimation in their computations. Density estimation has been practised in the field of statistics for several centuries. In the simplest case, a histogram estimator, like the simple equal-width histogram, can be used for this task and has been shown to be a practical tool to represent the distribution of data visually and for computation. Like other nonparametric approaches, it can provide a flexible solution. However, flexibility in existing approaches is generally restricted because the bins are fixed in size: either the width of the bins or the number of values in them is held constant. Attempts have been made to generate histograms with a variable bin width and a variable number of values per interval, but the computational approaches in these methods have proven too difficult and too slow even with modern computer technology. In this thesis new flexible histogram estimation methods are developed and tested as part of various machine learning tasks, namely discretization, naive Bayes classification, clustering and multiple-instance learning. Not only are the new density estimation methods applied to machine learning tasks, they also borrow design principles from algorithms that are ubiquitous in artificial intelligence: divide-and-conquer methods are a well-known way to tackle large problems by dividing them into small subproblems. Decision trees, used for machine learning classification, successfully apply this approach. This thesis presents algorithms that build density estimators using a binary split tree to cut a range of values into subranges of varying length. No class values are required for this splitting process, making it an unsupervised method. The result is a histogram estimator that adapts well even to complex density functions: a novel density estimation method with flexible density estimation ability and good computational behaviour. Algorithms are presented for both univariate and multivariate data. The univariate histogram estimator is applied to discretization for density estimation and is also used as the density estimator inside a naive Bayes classifier. The multivariate histogram, used as the basis for a clustering method, is applied to improve the runtime behaviour of a well-known algorithm for multiple-instance classification. Performance in these applications is evaluated by comparing the new approaches with existing methods.
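    The following is a minimal sketch of the general idea rather than the thesis algorithms: an unsupervised binary split tree over a 1-D range whose leaves become variable-width histogram bins, with the density of a leaf estimated as (leaf count / n) / leaf width. The median split rule and the stopping threshold below are illustrative choices, not the ones used in the thesis.

```python
# Sketch of an unsupervised binary split tree used as a variable-width histogram.
import numpy as np

def build_tree(x, lo, hi, n_total, min_points=30):
    """Recursively split [lo, hi) at the median while enough points remain."""
    cut = float(np.median(x)) if len(x) else lo
    if len(x) < min_points or cut <= lo or cut >= hi:
        return {"lo": lo, "hi": hi, "density": len(x) / (n_total * (hi - lo))}
    return {"cut": cut,
            "left": build_tree(x[x < cut], lo, cut, n_total, min_points),
            "right": build_tree(x[x >= cut], cut, hi, n_total, min_points)}

def density(tree, v):
    """Descend to the leaf containing v and return its estimated density."""
    while "cut" in tree:
        tree = tree["left"] if v < tree["cut"] else tree["right"]
    return tree["density"]

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 2000), rng.normal(6, 0.3, 500)])
tree = build_tree(x, x.min(), x.max() + 1e-9, len(x))
print(density(tree, 0.0), density(tree, 6.0))   # high near the modes, near zero between
```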

    LWA 2013. Lernen, Wissen & Adaptivität; Workshop Proceedings, Bamberg, 7.–9. October 2013

    LWA Workshop Proceedings: LWA stands for "Lernen, Wissen, Adaption" (Learning, Knowledge, Adaptation). It is the joint forum of four special interest groups of the German Computer Science Society (GI). Following the tradition of previous years, LWA provides a joint forum for experienced and young researchers to present insights into recent trends, technologies and applications, and to promote interaction among the SIGs.

    Numerical and Experimental Study on Inertial Impactors

    One of the most important physical properties defining the behavior of an aerosol particle is its size, which determines to a great extent how particles behave in physical and chemical processes. Applying experimental and numerical methods, this thesis studies the fundamentals of the operation of impactors, the instruments that are used to measure the size of aerosol particles. The first part of the thesis develops a CFD simulation approach suitable for low-pressure impactors, together with its verification. The CFD model is then used to study the parameters that affect the shape of a low-pressure impactor's collection efficiency curve. The second part focuses on applications of these findings by introducing two new impactors: a variable nozzle area impactor (VNAI), designed for detailed study of particle behavior in collisions, and a high-resolution low-pressure cascade impactor (HRLPI), used in combination with electrical detection to measure nanoparticle size distributions. Simulations showed that the steepness of the collection efficiency curve depends on the uniformity of the impaction conditions in the impactor jet. Conditions were defined in terms of static pressure, velocity, and particle stopping distance profiles in the cross section of the jet. Uniform impaction conditions and a steep cut-curve were achieved with a short-throat, low-pressure impactor stage. In the devised VNAI impactor, particles showed very uniform impaction velocities, a fact that was used to examine the critical rebound velocity of spherical silver particles. The critical velocities were several orders of magnitude lower than those for micron-sized particles, which may be explained by the different material pair used in the experiments compared with previous studies. The HRLPI was designed based on instrument response simulations to gain maximum information on the aerodynamic size distribution and to guarantee robust inversion characteristics in real-time measurement. This was achieved with roughly ten stages per size decade and with slit-type, short-throat nozzles. This thesis sheds light on some still unanswered questions in impactor theory and successfully puts the theory into practice by introducing new high-resolution impactors for nanoparticle research.
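    For context on how a cut diameter relates to jet conditions, here is a hedged worked example using the textbook Stokes number relation for a round-jet impactor stage (not the thesis' low-pressure CFD model). The slip correction uses a standard empirical fit at roughly atmospheric pressure, and all geometry, flow, and particle values are made-up assumptions.

```python
# Textbook impactor relations: Stokes number of a round-jet stage for one particle size.
import math

def cunningham(d_p, mfp=68e-9):
    """Cunningham slip correction, standard empirical fit for air near 1 atm."""
    kn = 2 * mfp / d_p
    return 1 + kn * (1.257 + 0.4 * math.exp(-1.1 / kn))

def stokes_number(d_p, rho_p, U, W, mu=1.81e-5):
    """Stk = rho_p * d_p**2 * Cc * U / (9 * mu * W), with W the nozzle diameter."""
    return rho_p * d_p**2 * cunningham(d_p) * U / (9 * mu * W)

# Illustrative stage: 0.5 mm round nozzle, 50 m/s jet, unit-density 300 nm sphere.
W, U, rho_p, d = 0.5e-3, 50.0, 1000.0, 300e-9
print("Stk =", round(stokes_number(d, rho_p, U, W), 3))
# Round-jet stages collect particles above roughly Stk50 ~ 0.24 (sqrt(Stk50) ~ 0.49);
# a steeper collection efficiency curve means a sharper 0-to-100 % transition around d50.
```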

    Detecting and Evaluating Therapy Induced Changes in Radiomics Features Measured from Non-Small Cell Lung Cancer to Predict Patient Outcomes

    The purpose of this study was to investigate whether radiomics features measured from weekly 4-dimensional computed tomography (4DCT) images of non-small cell lung cancer (NSCLC) change during treatment and whether those changes are prognostic for patient outcomes or dependent on treatment modality. Radiomics features are quantitative metrics designed to evaluate tumor heterogeneity from routine medical imaging. Features that are prognostic for patient outcome could be used to monitor tumor response and identify high-risk patients for adaptive treatment. This would be especially valuable for NSCLC due to the high prevalence and mortality of this disease. A novel process was designed to select feature-specific image preprocessing and to remove features that were not robust to differences in CT model or tumor volume. These features were then measured from weekly 4DCT images and evaluated to determine at which point in treatment they first begin to change and whether those changes differed for patients treated with protons versus photons. A subset of features demonstrated significant changes by the second or third week of treatment; however, changes were never significantly different between the patient groups. Delta-radiomics features were defined as relative net changes, linear regression slopes, and end-of-treatment feature values. These features were then evaluated in univariate and multivariate models for overall survival, distant metastases, and local-regional recurrence. In general, the delta-radiomics features were not more prognostic than models built using clinical factors or features at pre-treatment. However, one shape descriptor measured at pre-treatment significantly improved model fit and performance for overall survival and distant metastases, and for local-regional recurrence the only significant covariate was texture strength measured at the end of treatment. A separate study characterized radiomics feature variability in cone-beam CT images with respect to increased scatter, increased motion, and different scanners. Features were affected by all three parameters, and specifically by motion amplitudes greater than 1 cm. This study provides strong evidence that a set of robust radiomics features changes significantly during treatment. While these changes were not prognostic or dependent on treatment modality, future studies may benefit from the methodologies described here to explore delta-radiomics in other tumor sites or imaging modalities.
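    As a small illustration of the delta-radiomics definitions used above (relative net change, linear-regression slope over the weekly measurements, and end-of-treatment value), the sketch below computes all three from one feature's weekly series; the data layout and example numbers are assumptions, not values from the study.

```python
# Three delta-radiomics summaries of one feature's weekly series, as defined above.
import numpy as np

def delta_features(weekly_values):
    """weekly_values: 1-D array of one radiomics feature over weeks 0..N of treatment."""
    v = np.asarray(weekly_values, dtype=float)
    weeks = np.arange(len(v))
    rel_net_change = (v[-1] - v[0]) / v[0]   # relative net change from baseline
    slope = np.polyfit(weeks, v, 1)[0]       # linear regression slope over the weeks
    end_value = v[-1]                        # end-of-treatment feature value
    return {"rel_net_change": rel_net_change, "slope": slope, "end": end_value}

print(delta_features([12.4, 11.9, 11.1, 10.2, 9.8, 9.5]))  # made-up weekly values
```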

    On The Fluid Dynamics of Virtual Impaction and The Design of a Slit Aerosol Sampler

    It has long been established that Reynolds number effects can lead to flow instabilities and/or transition from laminar to turbulent flow regimes. The nature of free shear jets is well understood and heavily covered in the fluid mechanics literature. On the other hand, the study of confined nozzles presents some challenges and is still a developing area of research. In this work, we focus on quasi-impinging jets, such as the ones feeding into a virtual impactor. Virtual impactors are popular, inexpensive aerosol collection devices capable of separating airborne solid particles. Recently, they have found increased application in areas that require concentration of dilute aerosols, such as biological-laden flows. In essence, this research is motivated by the need to fundamentally understand the fluid-particle interaction mechanisms involved in virtual impaction. To this end, we rely on theoretical insight gained by numerical analysis of the classical equations within a one-way coupled Lagrangian framework. In the first part of this investigation we perform a direct transient simulation of the two-dimensional incompressible Navier-Stokes equations for air as the carrier phase. The momentum and continuity equations are solved with FLUENT. The solutions of three separate computations, with jet Reynolds numbers equal to 350, 2100, and 3500, are analyzed. The 2-D time-mean results established the nature of the jet potential core, and clarifications about the role of the Reynolds number were proposed. Transient analysis deciphered the characteristics of the mirrored Kelvin-Helmholtz instability, along with particle-eddy interaction mechanisms. In the second part we perform a large eddy simulation (LES) on a domain of a real-life sampler. The Lagrangian dynamic residual stress model is implemented and validated for two canonical turbulent flows. The newly developed code is then applied to the study of a prototype device. A three-dimensional growth mechanism is proposed for the jet mixing layers. The Lagrangian dynamic model LES exhibited significant regions of high subgrid turbulent viscosity compared to the dynamic Lilly model simulation, and we were able to identify the origin and learn the dynamics of five key coherent structures dominant during transition. Comparison with preliminary experimental data for the aerosol separation efficiency showed fairly good agreement.
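    Below is a minimal sketch of the one-way coupled Lagrangian point-particle framework mentioned above, assuming Stokes drag only and a made-up steady carrier velocity field standing in for the CFD/LES solution; the particle properties, field, and time step are illustrative, and this is not the thesis code.

```python
# One-way coupled Lagrangian point-particle tracking with Stokes drag.
import numpy as np

def fluid_velocity(x, y):
    """Placeholder carrier-phase field (a simple planar jet-like profile), in m/s."""
    return np.array([0.0, -20.0 * np.exp(-(x / 1e-3) ** 2)])   # downward jet near x = 0

def track_particle(x0, v0, d_p=2e-6, rho_p=1000.0, mu=1.81e-5, dt=1e-6, n=5000):
    """Integrate dx/dt = v, dv/dt = (u_f - v) / tau_p with tau_p = rho_p d_p^2 / (18 mu)."""
    tau_p = rho_p * d_p**2 / (18 * mu)     # particle relaxation time
    x, v = np.array(x0, float), np.array(v0, float)
    path = [x.copy()]
    for _ in range(n):
        u = fluid_velocity(*x)
        v += (u - v) / tau_p * dt          # one-way coupling: the fluid is unaffected
        x += v * dt
        path.append(x.copy())
    return np.array(path)

path = track_particle(x0=(0.2e-3, 5e-3), v0=(0.0, 0.0))
print("final position after 5 ms:", path[-1])
```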

    From clinics to methods and back: a tale of amyloid-PET quantification

    The in-vivo assessment of cerebral amyloid load is taking a leading role in the early differential diagnosis of neurodegenerative diseases. With the introduction of disease-modifying drugs hopefully near, we expect a paradigm shift in the current diagnostic pathway, with an unprecedented surge in requests for exams and detailed analyses.