77 research outputs found

    Detection of concealed cars in complex cargo X-ray imagery using Deep Learning

    BACKGROUND: Non-intrusive inspection systems based on X-ray radiography are routinely used at transport hubs to verify that cargo contents conform to the supplied shipping manifest. As trade volumes increase and regulations become more stringent, manual inspection by trained operators is increasingly impractical due to low throughput. Machine vision techniques can assist operators by automating parts of the inspection workflow. Since cars are routinely involved in trafficking, export fraud, and tax evasion schemes, they are an attractive target for automated detection and flagging for subsequent inspection by operators. OBJECTIVE: To develop and evaluate a novel method for the automated detection of cars in complex X-ray cargo imagery. METHODS: X-ray cargo images from a stream-of-commerce dataset were classified using a window-based scheme. The limited number of car images was addressed with an oversampling scheme. Different Convolutional Neural Network (CNN) architectures were compared with well-established bag-of-words approaches. In addition, robustness to concealment was evaluated by projecting objects into car images. RESULTS: CNN approaches outperformed all other methods evaluated, achieving a 100% car image classification rate at a false positive rate of 1-in-454. Cars that were partially or completely obscured by other goods, a modus operandi frequently adopted by criminals, were correctly detected. CONCLUSIONS: We believe that this level of performance suggests the method is suitable for deployment in the field. We expect that the generic object detection workflow described here can be extended to other object classes, given the availability of suitable training data.
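
    As a rough illustration of the window-based scheme described in METHODS, the following Python sketch scores overlapping windows with a trained CNN and flags an image on the maximum window score. All names here (cnn_score, WINDOW, STRIDE, THRESHOLD) are hypothetical, not taken from the paper; this is a minimal sketch of the general technique, not the authors' implementation.

```python
# Hypothetical sketch of window-based image classification: divide an
# X-ray image into overlapping windows, score each window with a trained
# CNN, and flag the image if any window score exceeds a threshold.
import numpy as np

WINDOW, STRIDE, THRESHOLD = 256, 128, 0.9  # illustrative values only

def classify_image(image, cnn_score):
    """Flag an image as containing a car if any window score is high.

    `cnn_score` is any callable mapping a 2-D patch to P(car | window).
    """
    h, w = image.shape
    scores = []
    for y in range(0, h - WINDOW + 1, STRIDE):
        for x in range(0, w - WINDOW + 1, STRIDE):
            patch = image[y:y + WINDOW, x:x + WINDOW]
            scores.append(cnn_score(patch))  # CNN probability for 'car'
    return len(scores) > 0 and max(scores) > THRESHOLD
```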

    Rail-volution: building livable communities with transit

    Report: Rail-volution 2005, September 8-10, 2005, Salt Lake City, Utah. Workshop summary.

    Collaborative Targeted Maximum Likelihood Estimation

    Collaborative double robust targeted maximum likelihood estimators represent a further fundamental advance over standard targeted maximum likelihood estimators of causal inference and variable importance parameters. The targeted maximum likelihood approach involves fluctuating an initial density estimate, Q, in order to make a bias/variance tradeoff targeted towards a specific parameter in a semi-parametric model. The fluctuation involves estimation of a nuisance parameter portion of the likelihood, g. TMLE and other double robust estimators have been shown to be consistent and asymptotically normally distributed (CAN) under regularity conditions when either one of these two factors of the likelihood of the data is correctly specified. In this article we provide a template for applying collaborative targeted maximum likelihood estimation (C-TMLE) to the estimation of pathwise differentiable parameters in semi-parametric models. The procedure creates a sequence of candidate targeted maximum likelihood estimators based on an initial estimate for Q coupled with a succession of increasingly non-parametric estimates for g. In a departure from current state-of-the-art nuisance parameter estimation, C-TMLE estimates of g are constructed based on a loss function for the relevant factor Q_0, instead of a loss function for the nuisance parameter itself. Likelihood-based cross-validation is used to select the best estimator among all candidate TMLE estimators in this sequence. A penalized-likelihood loss function for Q_0 is suggested when the parameter of interest is borderline-identifiable. We present theoretical results for collaborative double robustness, demonstrating that the collaborative targeted maximum likelihood estimator is CAN even when Q and g are both mis-specified, provided that g solves a specified score equation implied by the difference between Q and the true Q_0. This marks an improvement over the current definition of double robustness in the estimating equation literature. We also establish an asymptotic linearity theorem for the C-DR-TMLE of the target parameter, showing that the C-DR-TMLE is more adaptive to the truth and, as a consequence, can even be super efficient if the first-stage density estimator itself does an excellent job with respect to the target parameter. This research provides a template for targeted, efficient, and robust loss-based learning of a particular target feature of the probability distribution of the data within large (infinite dimensional) semi-parametric models, while still providing statistical inference in terms of confidence intervals and p-values. It also breaks with a taboo (e.g., in the propensity score literature in the field of causal inference) on using the relevant part of the likelihood to fine-tune the fitting of the nuisance parameter/censoring mechanism/treatment mechanism.
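
    The fluctuation-and-selection scheme described above can be made concrete with the canonical average-treatment-effect case. The display below is an illustrative sketch under assumed notation (Q_n, g_k, H_{g_k}), not a formula quoted from the article:

```latex
% Illustrative sketch only: fluctuate the initial outcome regression Q_n
% along a "clever covariate" built from the k-th treatment-mechanism
% estimate g_k (average-treatment-effect example, notation assumed).
\[
  \mathrm{logit}\, Q_{n,k}(\epsilon)(A,W)
    = \mathrm{logit}\, Q_n(A,W) + \epsilon \, H_{g_k}(A,W),
  \qquad
  H_{g_k}(A,W) = \frac{A}{g_k(1 \mid W)} - \frac{1 - A}{g_k(0 \mid W)}.
\]
% Each g_k in an increasingly non-parametric sequence g_1, g_2, ... yields
% a candidate TMLE; the final candidate is selected by (penalized)
% likelihood-based cross-validation on a loss for Q_0, not a loss for g.
```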

    Targeted Maximum Likelihood Learning

    Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one has available an estimate of the density of the data generating distribution, such as a maximum likelihood estimator according to a given or data-adaptively selected model. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter at the density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and might therefore result in a poor estimator of a particular smooth functional of the density. In this article we propose a one-step (and, by iteration, k-th step) targeted maximum likelihood density estimator which involves 1) creating a hardest parametric submodel with parameter epsilon through the given density estimator, with score equal to the efficient influence curve of the pathwise differentiable parameter at the density estimator, 2) estimating this parameter epsilon with the maximum likelihood estimator, and 3) defining a new density estimator as the corresponding update of the original density estimator. We show that iteration of this algorithm results in a targeted maximum likelihood density estimator which solves the efficient influence curve estimating equation and thereby yields an efficient or locally efficient estimator of the parameter of interest under regularity conditions. We also show that, if the parameter is linear and the model is convex, then the targeted maximum likelihood estimator is often achieved in the first step, and it results in a locally efficient estimator at an arbitrary (e.g., heavily misspecified) starting density. This tool provides us with a new class of targeted likelihood-based estimators of pathwise differentiable parameters. We also show that the targeted maximum likelihood estimators are now in full agreement with the locally efficient estimating function methodology as presented in Robins and Rotnitzky (1992) and van der Laan and Robins (2003), creating, in particular, an algebraic equivalence between targeted maximum likelihood estimators and the double robust locally efficient estimators that use targeted maximum likelihood estimators for their nuisance parameters. In addition, it is argued that the targeted MLE has various advantages relative to the current estimating-function-based approach. We proceed by providing data-driven methodologies to select the initial density estimator for the targeted MLE, thereby providing a data-adaptive targeted maximum likelihood estimation methodology. Finally, we show that targeted maximum likelihood estimation can be generalized to estimate any kind of parameter, such as infinite dimensional non-pathwise differentiable parameters, by restricting the likelihood and cross-validated log-likelihood to targeted candidate density estimators only. We illustrate the method with various worked-out examples.
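
    As a minimal sketch of steps 1)-3) above, one common choice of hardest submodel and its update can be written as follows; here D^*(p) denotes the efficient influence curve of the target parameter at p, O_1, ..., O_n are the observations, and the display is illustrative rather than quoted from the article:

```latex
% Illustrative sketch only: one common "hardest" submodel through the
% density estimate p_n, its MLE update, and the updated density.
\[
  p_n(\epsilon) = \bigl(1 + \epsilon \, D^{*}(p_n)\bigr)\, p_n,
  \qquad
  \epsilon_n = \arg\max_{\epsilon} \sum_{i=1}^{n} \log p_n(\epsilon)(O_i),
  \qquad
  p_n^{1} = p_n(\epsilon_n).
\]
% By construction the submodel has score D^*(p_n) at epsilon = 0.
% Iterating the update until epsilon_n is (approximately) zero gives a
% p_n^* solving \sum_i D^*(p_n^*)(O_i) = 0, and the parameter estimate
% is the plug-in \Psi(p_n^*).
```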

    New algorithms and optimizations for interactive model development

    In this thesis, we propose a human-in-the-loop framework for efficient model development over large data sets. In this framework, we aim to apply active learning algorithms to select a small sequence of data instances for the user to label and derive an accurate model while, at the same time, offering interactive performance in presenting the next data instance for review. However, existing active learning techniques often fail to provide satisfactory performance when built over large data sets. Not only do such models often require hundreds of labeled data instances to reach high accuracy, but retrieving the next instance to label can be time-consuming, making it incompatible with the interactive nature of the human exploration process. To address these issues, we propose the following contributions:
    1) A new theoretical framework that allows for an efficient implementation of the Generalized Binary Search strategy over kernel classifiers. Compared to previous work, our framework offers both strong theoretical guarantees on performance and an implementation that is efficient in time and space.
    2) An optimized version space (VS) algorithm called OptVS that uses the hit-and-run algorithm to sample the version space (see the sketch after this list). We also develop a range of sampling optimizations to improve both sample quality and running time. In practice, we observe that OptVS achieves similar or better performance than state-of-the-art version space algorithms while always running under 1 second per iteration.
    3) A new algorithm that leverages the factorization structure provided by the user to create subspaces and factorizes the version space accordingly to perform active learning in the subspaces. We also provide theoretical results on the optimality of our factorized VS algorithm and optimizations for dealing with categorical variables. Our evaluation shows that, for all user patterns considered, our factorized VS algorithm outperforms non-factorized active learners as well as DSM, another factorization-aware algorithm, often by a wide margin, while maintaining interactive speed.
    4) Following the intuitive reasoning behind the user's decision-making process, we develop a new human-inspired classification algorithm, called the Factorized Linear Model (FLM), that decomposes the user interest into a combination of low-dimensional convex objects, resulting in an accurate, efficient, and interpretable classifier. In practice, we observe that the FLM classifier achieves comparable or better performance than SVM and another interpretable model, VIPR, over the majority of user interest patterns, while taking only a few minutes to train over a large data set of nearly one million points.
    5) A novel automatically factorized active learning strategy called the Swapping Algorithm. This technique initially employs OptVS to escape the slow convergence of early iterations and then swaps to an FLM-based strategy to take advantage of its higher classification accuracy. Our evaluation shows that the Swapping Algorithm achieves similar or better performance than non-factorized active learners while approximating the explicitly factorized methods.
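
    The hit-and-run sampler at the core of OptVS can be sketched as follows for a simplified linear version space {w : ||w|| <= 1, y_i <x_i, w> >= 0}. The thesis works with kernel classifiers, so this linear form, and every name in the sketch, is an assumption for illustration only, not the thesis implementation:

```python
# Hypothetical sketch of hit-and-run sampling inside a linear version
# space: from the current point, pick a random direction, compute the
# feasible chord along that direction, and jump to a uniform point on it.
import numpy as np

def hit_and_run(w, X, y, n_steps=100, seed=0):
    """Random walk inside {w : ||w|| <= 1, y_i <x_i, w> >= 0}.

    Assumes the starting point `w` is strictly feasible; X holds labeled
    instances row-wise and y their +1/-1 labels.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        d = rng.standard_normal(w.shape)
        d /= np.linalg.norm(d)                   # random direction on the sphere
        # Chord of the unit ball along w + t*d: t^2 + 2*b*t + c <= 0.
        b, c = w @ d, w @ w - 1.0
        disc = np.sqrt(b * b - c)                # c <= 0 since ||w|| <= 1
        lo, hi = -b - disc, -b + disc
        # Intersect with each label constraint y_i <x_i, w + t*d> >= 0.
        for xi, yi in zip(X, y):
            a, s = yi * (xi @ d), yi * (xi @ w)  # constraint: a*t + s >= 0
            if a > 0:
                lo = max(lo, -s / a)
            elif a < 0:
                hi = min(hi, -s / a)
        w = w + rng.uniform(lo, hi) * d          # uniform point on the chord
    return w
```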

    Structural characterization of meningococcal vaccine antigen NadA and of its transcriptional regulator NadR in ligand-bound and free forms.

    Serogroup B Neisseria meningitidis (MenB) is a major cause of invasive meningococcal disease (IMD). Bexsero is the first genome-derived vaccine against MenB. Neisserial adhesin A (NadA) is one of the three protein antigens included in Bexsero. The main aim of this work was to obtain detailed insights into the structure of the vaccine antigen NadA variant 3 (NadAv3) and into the molecular mechanisms governing its transcriptional regulation by NadR (Neisseria adhesin A Regulator). A deep understanding of nadA expression is important for understanding the contribution of NadA to vaccine-induced protection against meningococcal disease. NadA expression is regulated by the ligand-responsive transcriptional repressor NadR. The functional, biochemical, and high-resolution structural characterization of NadR is presented in the first part of the thesis (Part One). These studies provide detailed insights into how small-molecule ligands, such as hydroxyphenylacetate derivatives found in relevant host niches, modulate the structure and activity of NadR by 'conformational selection' of inactive forms. In the second part of the thesis (Part Two), strategies involving both protein engineering and crystal manipulation to increase the likelihood of solving the crystal structure of NadAv3 are described. The first approach was the rational design of new constructs of NadAv3, based on the recently solved crystal structure of a close sequence variant (NadAv5). A comprehensive set of biochemical, biophysical, and structural techniques was then applied to investigate all the generated NadAv3 constructs. The well-characterized trimeric NadAv3 constructs represent a set of high-quality reagents, validated as probes for functional studies and as a platform for continued attempts at protein crystallization. Mutagenesis studies and screening to identify a new crystal form of NadAv3 were performed to improve crystal quality; structure determination is ongoing. The atomic-resolution structure of NadAv3 will help clarify its biological role as both an adhesin and a vaccine antigen.

    Readings in Targeted Maximum Likelihood Estimation

    This is a compilation of current and past work on targeted maximum likelihood estimation. It features the original targeted maximum likelihood learning paper as well as chapters on super (machine) learning using cross-validation, randomized controlled trials, realistic individualized treatment rules in observational studies, biomarker discovery, case-control studies, and time-to-event outcomes with censored data, among others. We hope this collection is helpful to the interested reader and stimulates additional research in this important area.

    Medical University of South Carolina and Medical University Hospital Authority Board of Trustees annual President’s report

    These regularly published reports contain information related to education, faculty, research, health, and enterprise at the Medical University of South Carolina.