
    Estimating Anthropometric Marker Locations from 3-D LADAR Point Clouds

    An area of interest for improving the identification portion of the system is in extracting anthropometric markers from a Laser Detection and Ranging (LADAR) point cloud. Analyzing anthropometric markers is a common means of studying how a human moves and has been shown to provide good results in determining certain demographic information about the subject. This research examines a marker extraction method utilizing principal component analysis (PCA), self-organizing maps (SOM), alpha hulls, and basic anthropometric knowledge. The performance of the extraction algorithm is tested by performing gender classification with the calculated markers.
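A typical first step in such a pipeline is aligning the point cloud to its principal axes. The sketch below, which assumes nothing beyond standard PCA (the function name `principal_axes` and the toy cloud are illustrative, not taken from the abstract), shows the idea:

```python
import numpy as np

def principal_axes(points):
    """Align a 3-D point cloud to its principal axes via PCA.

    points: (n, 3) array of LADAR returns.  Returns the centred cloud
    expressed in the principal-axis frame, plus the axes themselves.
    """
    centred = points - points.mean(axis=0)
    # Eigen-decomposition of the 3x3 covariance; columns of `axes`
    # are the principal directions, sorted by decreasing variance.
    cov = np.cov(centred.T)
    eigvals, axes = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    axes = axes[:, order]
    return centred @ axes, axes

# Toy cloud elongated along z: the first principal axis should be ~z,
# as a standing subject's long axis would be.
rng = np.random.default_rng(0)
cloud = rng.normal(scale=[0.1, 0.2, 1.0], size=(500, 3))
aligned, axes = principal_axes(cloud)
print(np.abs(axes[2, 0]))  # |z-component of the first axis|, close to 1
```

With the cloud in a canonical frame, anatomical priors (e.g. expected height fractions of joints) can be applied along the first axis.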

    Probabilistic performance estimators for computational chemistry methods: Systematic Improvement Probability and Ranking Probability Matrix. I. Theory

    The comparison of benchmark error sets is an essential tool for the evaluation of theories in computational chemistry. The standard ranking of methods by their Mean Unsigned Error is unsatisfactory for several reasons linked to the non-normality of the error distributions and the presence of underlying trends. Complementary statistics have recently been proposed to palliate such deficiencies, such as quantiles of the absolute error distribution or the mean prediction uncertainty. We introduce here a new score, the systematic improvement probability (SIP), based on the direct system-wise comparison of absolute errors. Independently of the chosen scoring rule, the uncertainty of the statistics due to the incompleteness of the benchmark data sets is also generally overlooked. However, this uncertainty is essential to appreciate the robustness of rankings. In the present article, we develop two indicators based on robust statistics to address this problem: P_inv, the inversion probability between two values of a statistic, and P_r, the ranking probability matrix. We also demonstrate the essential contribution of the correlations between error sets to these score comparisons.
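The system-wise idea behind SIP can be sketched in a few lines. This is a minimal reading of the abstract (fraction of systems for which method B beats method A in absolute error); the paper's exact definition, including its treatment of ties and uncertainty, may differ:

```python
import numpy as np

def sip(errors_a, errors_b):
    """Systematic improvement probability (sketch): the fraction of
    benchmark systems for which method B has a smaller absolute error
    than method A.  A paired, system-wise comparison, unlike a ranking
    by mean unsigned error."""
    ea = np.abs(np.asarray(errors_a, dtype=float))
    eb = np.abs(np.asarray(errors_b, dtype=float))
    return float(np.mean(eb < ea))

# Toy benchmark: B improves on A for 3 of the 4 systems.
a = [2.0, -1.5, 0.8, 3.0]
b = [1.0, -0.5, 1.2, 0.4]
print(sip(a, b))  # 0.75
```

Note that B can have a worse mean unsigned error yet a high SIP (or vice versa), which is exactly why the paired comparison carries complementary information.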

    Genz and Mendell-Elston Estimation of the High-Dimensional Multivariate Normal Distribution

    Statistical analysis of multinomial data in complex datasets often requires estimation of the multivariate normal (MVN) distribution for models in which the dimensionality can easily reach 10–1000 and higher. Few algorithms for estimating the MVN distribution can offer robust and efficient performance over such a range of dimensions. We report a simulation-based comparison of two algorithms for the MVN that are widely used in statistical genetic applications. The venerable Mendell-Elston approximation is fast, but execution time increases rapidly with the number of dimensions, estimates are generally biased, and an error bound is lacking. The correlation between variables significantly affects absolute error but not overall execution time. The Monte Carlo-based approach described by Genz returns unbiased and error-bounded estimates, but execution time is more sensitive to the correlation between variables. For ultra-high-dimensional problems, however, the Genz algorithm exhibits better scale characteristics and greater time-weighted efficiency of estimation.
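The quantity both algorithms target is a rectangle probability of a correlated MVN. A plain Monte Carlo estimator with a standard-error bound, shown below, illustrates what "unbiased and error-bounded" means; Genz's actual algorithm transforms the integrand to be far more efficient than this naive sampler:

```python
import numpy as np

def mvn_prob_mc(upper, corr, n_draws=200_000, seed=1):
    """Naive Monte Carlo estimate of P(X <= upper) for X ~ N(0, corr),
    with a Monte Carlo standard error.  Illustrates the error-bounded
    sampling idea; Genz's method uses a transformed integrand instead."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)          # correlate the standard normals
    draws = rng.standard_normal((n_draws, len(upper))) @ L.T
    hits = np.all(draws <= upper, axis=1)
    p = hits.mean()
    se = hits.std(ddof=1) / np.sqrt(n_draws)
    return p, se

corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
p, se = mvn_prob_mc(np.array([0.0, 0.0]), corr)
print(p, se)  # ~0.333 for rho = 0.5 (exact: 1/4 + arcsin(0.5)/(2*pi))
```

The bivariate orthant probability has a closed form, which makes it a convenient sanity check; in the 10–1000-dimensional regime of the abstract, naive acceptance sampling like this collapses, which is the point of the specialized algorithms.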

    Pareto Smoothed Importance Sampling

    Importance weighting is a general way to adjust Monte Carlo integration to account for draws from the wrong distribution, but the resulting estimate can be noisy when the importance ratios have a heavy right tail. This routinely occurs when there are aspects of the target distribution that are not well captured by the approximating distribution, in which case more stable estimates can be obtained by modifying extreme importance ratios. We present a new method for stabilizing importance weights using a generalized Pareto distribution fit to the upper tail of the distribution of the simulated importance ratios. The method, which empirically performs better than existing methods for stabilizing importance sampling estimates, includes stabilized effective sample size estimates, Monte Carlo error estimates and convergence diagnostics.
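The core smoothing step can be sketched as follows. This is a simplified reading of the abstract (fit a generalized Pareto distribution to the largest ratios and replace them with order-statistic quantiles of the fit); the published PSIS algorithm uses a specific tail-size rule, a dedicated GPD estimator, and truncation, so treat this as illustrative:

```python
import numpy as np
from scipy.stats import genpareto

def smooth_tail(ratios, tail_frac=0.2):
    """Sketch of Pareto smoothing of importance ratios: fit a generalized
    Pareto distribution (GPD) to the exceedances over a tail cutoff and
    replace the raw tail ratios with quantiles of the fitted GPD."""
    r = np.sort(np.asarray(ratios, dtype=float))
    m = max(int(tail_frac * len(r)), 5)
    cutoff = r[-m - 1]
    # MLE fit of the GPD to the tail exceedances (location fixed at 0).
    shape, _, scale = genpareto.fit(r[-m:] - cutoff, floc=0)
    # Replace the m largest ratios with smoothed order-statistic values.
    probs = (np.arange(1, m + 1) - 0.5) / m
    r[-m:] = cutoff + genpareto.ppf(probs, shape, loc=0, scale=scale)
    return r, shape

rng = np.random.default_rng(4)
raw = rng.lognormal(0.0, 2.0, size=2000)   # heavy right tail
smoothed, k_hat = smooth_tail(raw)
print(k_hat)  # in PSIS, k_hat > 0.7 signals unreliable estimates
```

The fitted shape parameter doubles as a diagnostic: a large estimated tail index means the raw importance sampling estimate cannot be trusted, smoothed or not.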

    Sensitivity analysis and related analysis: A survey of statistical techniques

    This paper reviews the state of the art in five related types of analysis, namely (i) sensitivity or what-if analysis, (ii) uncertainty or risk analysis, (iii) screening, (iv) validation, and (v) optimization. The main questions are: when should which type of analysis be applied, and which statistical techniques may then be used? This paper distinguishes the following five stages in the analysis of a simulation model. 1) Validation: the availability of data on the real system determines which type of statistical technique to use for validation. 2) Screening: in the simulation's pilot phase the really important inputs can be identified through a novel technique, called sequential bifurcation, which uses aggregation and sequential experimentation. 3) Sensitivity analysis: the really important inputs should be […] This approach with its five stages implies that sensitivity analysis should precede uncertainty analysis. This paper briefly discusses several case studies for each phase.
    Keywords: Experimental Design; Statistical Methods; Regression Analysis; Risk Analysis; Least Squares; Sensitivity Analysis; Optimization; Perturbation; Statistics
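The screening stage's sequential bifurcation can be sketched as a recursive halving: measure the aggregated effect of a group of inputs, discard the group if it is negligible, and otherwise split it. The sketch below assumes, as the technique requires, that effects have known sign (here: non-negative); the function names and the toy model are illustrative:

```python
def sequential_bifurcation(effect_of_group, indices, threshold):
    """Sketch of sequential bifurcation screening.  `effect_of_group(ids)`
    returns the aggregated effect of the inputs in `ids` (in practice the
    difference between a pair of simulation runs); groups whose effect
    stays below `threshold` are discarded without further runs."""
    if effect_of_group(indices) <= threshold:
        return []                       # whole group unimportant: discard
    if len(indices) == 1:
        return list(indices)            # single important input isolated
    mid = len(indices) // 2
    return (sequential_bifurcation(effect_of_group, indices[:mid], threshold)
            + sequential_bifurcation(effect_of_group, indices[mid:], threshold))

# Toy model with 64 inputs of which only inputs 5 and 40 matter:
true_effects = [0.0] * 64
true_effects[5] = 3.0
true_effects[40] = 1.5
group_effect = lambda ids: sum(true_effects[i] for i in ids)
print(sequential_bifurcation(group_effect, list(range(64)), 0.5))  # [5, 40]
```

With 2 important inputs among 64, the bifurcation isolates them in a number of group evaluations that grows roughly logarithmically in the number of inputs, which is why it suits a simulation's pilot phase.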

    Wrapper algorithms and their performance assessment on high-dimensional molecular data

    Prediction problems on high-dimensional molecular data, e.g. the classification of microarray samples into normal and cancer tissues, are complex and ill-posed since the number of variables usually exceeds the number of observations by orders of magnitude. Recent research in the area has produced a variety of new statistical models in order to handle these new biological datasets. In practice, however, these models are always applied in combination with preprocessing and variable selection methods as well as model selection, which is mostly performed by cross-validation. Varma and Simon (2006) have used the term 'wrapper algorithm' for this integration of preprocessing and model selection into the construction of statistical models. Additionally, they have proposed the method of nested cross-validation (NCV) as a way of estimating their prediction error, which has since become the gold standard. In the first part, this thesis provides further theoretical and empirical justification for the usage of NCV in the context of wrapper algorithms. Moreover, a computationally less intensive alternative to NCV is proposed which can be motivated in a decision-theoretic framework. The new method can be interpreted as a smoothed variant of NCV and, in contrast to NCV, guarantees intuitive bounds for the estimation of the prediction error. The second part focuses on the ranking of wrapper algorithms. Cross-study validation is proposed as an alternative concept to the repetition of separate within-study validations if several similar prediction problems are available. The concept is demonstrated using six different wrapper algorithms for survival prediction on censored data on a selection of eight breast cancer datasets. Additionally, a parametric bootstrap approach for simulating realistic data from such related prediction problems is described and subsequently applied to illustrate the concept of cross-study validation for the ranking of wrapper algorithms.
Finally, the last part addresses computational aspects of the analyses and simulations performed in the thesis. The preprocessing before the analysis as well as the evaluation of the prediction models require the use of large computing resources. Parallel computing approaches are illustrated on cluster, cloud and high performance computing resources using the R programming language. Usage of heterogeneous hardware and processing of large datasets are covered, as well as the implementation of the R-package survHD for the analysis and evaluation of high-dimensional wrapper algorithms for survival prediction from censored data.
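The nested cross-validation scheme at the heart of the first part can be sketched generically: the inner loop selects the tuning parameter, the outer loop estimates the prediction error of the whole wrapper, selection included. The sketch below uses ridge regression as a stand-in wrapper (the thesis works with survival models in R); all names and the toy data are illustrative:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression coefficients.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def nested_cv_error(X, y, lambdas, outer_k=5, inner_k=4, seed=0):
    """Sketch of nested cross-validation (NCV): tuning-parameter
    selection happens inside each outer training fold, so the outer
    error estimate covers the full wrapper, not just the final model."""
    rng = np.random.default_rng(seed)
    outer_folds = np.array_split(rng.permutation(len(y)), outer_k)
    errors = []
    for i in range(outer_k):
        test = outer_folds[i]
        train = np.concatenate([outer_folds[j] for j in range(outer_k) if j != i])
        # Inner CV on the training part only: pick the best lambda.
        inner_folds = np.array_split(train, inner_k)
        inner_err = []
        for lam in lambdas:
            errs = []
            for k in range(inner_k):
                val = inner_folds[k]
                fit = np.concatenate([inner_folds[j] for j in range(inner_k) if j != k])
                beta = ridge_fit(X[fit], y[fit], lam)
                errs.append(np.mean((X[val] @ beta - y[val]) ** 2))
            inner_err.append(np.mean(errs))
        best = lambdas[int(np.argmin(inner_err))]
        beta = ridge_fit(X[train], y[train], best)
        errors.append(np.mean((X[test] @ beta - y[test]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
X = rng.standard_normal((120, 10))
y = X[:, 0] + 0.1 * rng.standard_normal(120)
err = nested_cv_error(X, y, [0.01, 1.0, 100.0])
print(err)
```

Selecting the parameter on the full data and then cross-validating the chosen model, by contrast, leaks information and biases the error estimate downward; NCV's double loop is exactly what prevents that.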

    Information theoretic novelty detection

    We present a novel approach to online change detection problems when the training sample size is small. The proposed approach is based on estimating the expected information content of a new data point and allows accurate control of the false positive rate even for small data sets. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests. We then propose an approximation scheme to extend our approach to the case of a mixture of Gaussians. We evaluate our approach extensively on synthetic data and on three real benchmark data sets. The experimental validation shows that our method maintains a good overall accuracy, but significantly improves control over the false positive rate.

    Investigation of the Effects of Image Signal-to-Noise Ratio on TSPO PET Quantification of Neuroinflammation

    Neuroinflammation may be imaged using positron emission tomography (PET) and the tracer [11C]-PK11195. Accurate and precise quantification of 18 kilodalton Translocator Protein (TSPO) binding parameters in the brain has proven difficult with this tracer, due to an unfavourable combination of low target concentration in tissue, low brain uptake of the tracer and relatively high non-specific binding, all of which leads to higher levels of relative image noise. To address these limitations, research into new radioligands for the TSPO, with higher brain uptake and lower non-specific binding relative to [11C]-PK11195, is being conducted world-wide. However, factors other than radioligand properties are known to influence signal-to-noise ratio in quantitative PET studies, including the scanner sensitivity, image reconstruction algorithms and data analysis methodology. The aim of this thesis was to investigate and validate computational tools for predicting image noise in dynamic TSPO PET studies, and to employ those tools to investigate the factors that affect image SNR and reliability of TSPO quantification in the human brain. The feasibility of performing multiple (n≥40) independent Monte Carlo simulations for each dynamic [11C]-PK11195 frame, with realistic modelling of the radioactivity source, attenuation and PET tomograph geometries, was investigated. A Beowulf-type high performance computer cluster, constructed from commodity components, was found to be well suited to this task. Timing tests on a single desktop computer system indicated that a computer cluster capable of simulating an hour-long dynamic [11C]-PK11195 PET scan, with 40 independent repeats, and with a total simulation time of less than 6 weeks, could be constructed for less than 10,000 Australian dollars.
A computer cluster containing 44 computing cores was therefore assembled, and a peak simulation rate of 2.84×10^5 photon pairs per second was achieved using the GEANT4 Application for Tomographic Emission (GATE) Monte Carlo simulation software. A simulated PET tomograph was developed in GATE that closely modelled the performance characteristics of several real-world clinical PET systems in terms of spatial resolution, sensitivity, scatter fraction and counting rate performance. The simulated PET system was validated using adaptations of the National Electrical Manufacturers Association (NEMA) quality assurance procedures within GATE. Image noise in dynamic TSPO PET scans was estimated by performing n=40 independent Monte Carlo simulations of an hour-long [11C]-PK11195 scan, and of an hour-long dynamic scan for a hypothetical TSPO ligand with double the brain activity concentration of [11C]-PK11195. From these data an analytical noise model was developed that allowed image noise to be predicted for any combination of brain tissue activity concentration and scan duration. The noise model was validated for the purpose of determining the precision of kinetic parameter estimates for TSPO PET. An investigation was made into the effects of activity concentration in tissue, radionuclide half-life, injected dose and compartmental model complexity on the reproducibility of kinetic parameters. Injecting 555 MBq of carbon-11 labelled TSPO tracer produced similar binding parameter precision to 185 MBq of fluorine-18, and a moderate (20%) reduction in precision was observed for the reduced carbon-11 dose of 370 MBq. Results indicated that a factor of 2 increase in frame count level (relative to [11C]-PK11195, and due for example to higher ligand uptake, injected dose or absolute scanner sensitivity) is required to obtain reliable binding parameter estimates for small regions of interest when fitting a two-tissue compartment, four-parameter compartmental model.
However, compartmental model complexity had a similarly large effect, with the reduction of model complexity from the two-tissue compartment, four-parameter model to a one-tissue compartment, two-parameter model producing a 78% reduction in the coefficient of variation of the binding parameter estimates at each tissue activity level and region size studied. In summary, this thesis describes the development and validation of Monte Carlo methods for estimating image noise in dynamic TSPO PET scans, and analytical methods for predicting relative image noise for a wide range of tissue activity concentrations and acquisition durations. The findings of this research suggest that a broader consideration of the kinetic properties of novel TSPO radioligands, with a view to selecting ligands that are potentially amenable to analysis with a simple one-tissue compartment model, is at least as important as efforts directed towards reducing image noise, such as higher brain uptake, in the search for the next generation of TSPO PET tracers.
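The kind of analytical noise model described above can be illustrated with the standard counting-statistics relation: relative image noise scales as 1/sqrt(detected counts), with counts proportional to the decay-weighted activity integrated over a frame. The toy predictor below assumes only that relation; the sensitivity constant and function names are hypothetical, not the thesis's fitted model:

```python
import math

def predicted_cov(activity_kbq_ml, frame_start_s, frame_end_s,
                  half_life_s=1224.0, sens=1e-4):
    """Toy count-based noise predictor: relative noise (coefficient of
    variation) taken as 1/sqrt(counts), where counts follow the activity
    integrated over the frame with radioactive decay (default half-life
    ~20.4 min for carbon-11).  `sens` is an arbitrary scanner constant."""
    lam = math.log(2) / half_life_s
    # Integral of exp(-lam * t) over the frame, in seconds.
    integral = (math.exp(-lam * frame_start_s)
                - math.exp(-lam * frame_end_s)) / lam
    counts = sens * activity_kbq_ml * 1000.0 * integral
    return 1.0 / math.sqrt(counts)

# Doubling the tissue activity cuts predicted noise by sqrt(2), the
# scaling behind the "factor of 2 increase in frame count level" finding.
a = predicted_cov(10.0, 0, 60)
b = predicted_cov(20.0, 0, 60)
print(round(a / b, 3))  # 1.414
```

A fitted model of this form lets noise be predicted for any activity/duration combination without re-running the Monte Carlo simulations, which is what makes the kinetic-parameter precision study above tractable.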