Estimating Anthropometric Marker Locations from 3-D LADAR Point Clouds
An area of interest for improving the identification portion of the system is in extracting anthropometric markers from a Laser Detection and Ranging (LADAR) point cloud. Analyzing anthropometric markers is a common means of studying how a human moves and has been shown to provide good results in determining certain demographic information about the subject. This research examines a marker extraction method utilizing principal component analysis (PCA), self-organizing maps (SOM), alpha hulls, and basic anthropometric knowledge. The performance of the extraction algorithm is tested by performing gender classification with the calculated markers
Probabilistic performance estimators for computational chemistry methods: Systematic Improvement Probability and Ranking Probability Matrix. I. Theory
The comparison of benchmark error sets is an essential tool for the
evaluation of theories in computational chemistry. The standard ranking of
methods by their Mean Unsigned Error is unsatisfactory for several reasons
linked to the non-normality of the error distributions and the presence of
underlying trends. Complementary statistics have recently been proposed to
palliate such deficiencies, such as quantiles of the absolute errors
distribution or the mean prediction uncertainty. We introduce here a new score,
the systematic improvement probability (SIP), based on the direct system-wise
comparison of absolute errors. Independently of the chosen scoring rule, the
uncertainty of the statistics due to the incompleteness of the benchmark data
sets is also generally overlooked. However, this uncertainty is essential to
appreciate the robustness of rankings. In the present article, we develop two
indicators based on robust statistics to address this problem: P_{inv}, the
inversion probability between two values of a statistic, and \mathbf{P}_{r},
the ranking probability matrix. We also demonstrate the essential contribution
of the correlations between error sets in these score comparisons
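The system-wise comparison behind the SIP can be illustrated with a small sketch. It assumes that the SIP is the fraction of benchmark systems for which the second method has a strictly smaller absolute error than the first. The inversion probability shown here is a plain paired bootstrap on the Mean Unsigned Error, used as a stand-in for the robust estimator developed in the article. The function names and the example error values are hypothetical.

```python
import random

def mue(errors):
    """Mean unsigned error of one error set."""
    return sum(abs(e) for e in errors) / len(errors)

def sip(errors_a, errors_b):
    """Fraction of systems where method B has a strictly smaller absolute error than A."""
    pairs = list(zip(errors_a, errors_b))
    return sum(abs(b) < abs(a) for a, b in pairs) / len(pairs)

def inversion_probability(errors_a, errors_b, n_boot=2000, seed=0):
    """Bootstrap estimate of how often the MUE ranking of A and B flips.

    Systems are resampled jointly (same indices for both methods), which
    preserves the correlation between the two error sets. This is an
    illustrative assumption, not the article's definition of P_inv.
    """
    rng = random.Random(seed)
    n = len(errors_a)
    observed_sign = mue(errors_a) - mue(errors_b) > 0
    flips = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = mue([errors_a[i] for i in idx]) - mue([errors_b[i] for i in idx])
        if (diff > 0) != observed_sign:
            flips += 1
    return flips / n_boot

# Hypothetical signed errors of two methods on the same five systems.
errors_a = [0.5, -1.2, 0.8, 2.0, -0.3]
errors_b = [0.4, -1.0, 0.9, 1.5, -0.1]
print("SIP of B over A:", sip(errors_a, errors_b))
print("Bootstrap inversion probability:", inversion_probability(errors_a, errors_b))
```

Resampling both error sets with the same indices is the design choice that matters here: drawing the two sets independently would discard the correlation that the abstract identifies as essential.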
Genz and Mendell-Elston Estimation of the High-Dimensional Multivariate Normal Distribution
Statistical analysis of multinomial data in complex datasets often requires estimation of the multivariate normal (MVN) distribution for models in which the dimensionality can easily reach 10–1000 and higher. Few algorithms for estimating the MVN distribution can offer robust and efficient performance over such a range of dimensions. We report a simulation-based comparison of two algorithms for the MVN that are widely used in statistical genetic applications. The venerable Mendell-Elston approximation is fast but execution time increases rapidly with the number of dimensions, estimates are generally biased, and an error bound is lacking. The correlation between variables significantly affects absolute error but not overall execution time. The Monte Carlo-based approach described by Genz returns unbiased and error-bounded estimates, but execution time is more sensitive to the correlation between variables. For ultra-high-dimensional problems, however, the Genz algorithm exhibits better scale characteristics and greater time-weighted efficiency of estimation
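As a rough illustration of the Monte Carlo approach, the sketch below uses the sequential conditioning transformation commonly associated with Genz: the covariance matrix is Cholesky-factored, each coordinate is drawn from a standard normal truncated to its conditional limits, and the product of the conditional interval probabilities is averaged. It uses plain pseudo-random sampling and omits the error bound mentioned above and any variance-reduction refinements. The function names and defaults are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def mvn_probability(lower_limits, upper_limits, covariance, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(lower < X < upper) for X ~ N(0, covariance)."""
    rng = random.Random(seed)
    chol = cholesky(covariance)
    dim = len(covariance)
    total = 0.0
    for _ in range(n_samples):
        y = []
        weight = 1.0
        for i in range(dim):
            # Contribution of the coordinates already sampled.
            shift = sum(chol[i][j] * y[j] for j in range(i))
            d = STD_NORMAL.cdf((lower_limits[i] - shift) / chol[i][i])
            e = STD_NORMAL.cdf((upper_limits[i] - shift) / chol[i][i])
            weight *= e - d
            if weight == 0.0:
                break
            # Draw y_i from the standard normal truncated to the conditional interval.
            p = d + rng.random() * (e - d)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
            y.append(STD_NORMAL.inv_cdf(p))
        total += weight
    return total / n_samples

# Orthant probability of a bivariate normal with correlation 0.5.
inf = float("inf")
print(mvn_probability([-inf, -inf], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]))
```

For a bivariate normal with correlation ρ the orthant probability has the closed form 1/4 + arcsin(ρ)/(2π), which equals 1/3 for ρ = 0.5 and gives a convenient check on the estimate.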
Pareto Smoothed Importance Sampling
Importance weighting is a general way to adjust Monte Carlo integration to
account for draws from the wrong distribution, but the resulting estimate can
be noisy when the importance ratios have a heavy right tail. This routinely
occurs when there are aspects of the target distribution that are not well
captured by the approximating distribution, in which case more stable estimates
can be obtained by modifying extreme importance ratios. We present a new method
for stabilizing importance weights using a generalized Pareto distribution fit
to the upper tail of the distribution of the simulated importance ratios. The
method, which empirically performs better than existing methods for stabilizing
importance sampling estimates, includes stabilized effective sample size
estimates, Monte Carlo error estimates and convergence diagnostics.
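The instability that motivates the method can be seen in a minimal sketch of plain self-normalized importance sampling, together with the usual effective sample size (sum of weights squared divided by the sum of squared weights). This is not Pareto smoothing itself: no generalized Pareto distribution is fitted and no weights are modified. The target, the proposals, and all function names are assumptions chosen for illustration.

```python
import math
import random

def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def importance_estimate(values, log_ratios):
    """Self-normalized importance sampling estimate and effective sample size."""
    shift = max(log_ratios)  # subtract the maximum for numerical stability
    weights = [math.exp(lr - shift) for lr in log_ratios]
    total = sum(weights)
    estimate = sum(w * v for w, v in zip(weights, values)) / total
    ess = total ** 2 / sum(w * w for w in weights)
    return estimate, ess

def demo(proposal_sd, n=2000, seed=0):
    """Estimate E[x^2] under a N(0, 1) target using draws from N(0, proposal_sd)."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, proposal_sd) for _ in range(n)]
    log_ratios = [log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, 0.0, proposal_sd)
                  for x in draws]
    return importance_estimate([x * x for x in draws], log_ratios)

# A proposal narrower than the target produces a heavy right tail of ratios;
# a wider proposal keeps the ratios bounded.
print("narrow proposal (sd=0.5):", demo(0.5))
print("wide proposal   (sd=2.0):", demo(2.0))
```

With the narrow proposal a handful of draws carry most of the weight, which is the situation the abstract describes as a heavy right tail of the importance ratios.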
Sensitivity analysis and related analysis: A survey of statistical techniques
This paper reviews the state of the art in five related types of analysis, namely (i) sensitivity or what-if analysis, (ii) uncertainty or risk analysis, (iii) screening, (iv) validation, and (v) optimization. The main question is: when should which type of analysis be applied; which statistical techniques may then be used? This paper distinguishes the following five stages in the analysis of a simulation model. 1) Validation: the availability of data on the real system determines which type of statistical technique to use for validation. 2) Screening: in the simulation's pilot phase the really important inputs can be identified through a novel technique, called sequential bifurcation, which uses aggregation and sequential experimentation. 3) Sensitivity analysis: the really important inputs should be [...] This approach with its five stages implies that sensitivity analysis should precede uncertainty analysis. This paper briefly discusses several case studies for each phase.
Keywords: Experimental Design; Statistical Methods; Regression Analysis; Risk Analysis; Least Squares; Sensitivity Analysis; Optimization; Perturbation; statistics
Wrapper algorithms and their performance assessment on high-dimensional molecular data
Prediction problems on high-dimensional molecular data, e.g. the classification of microarray
samples into normal and cancer tissues, are complex and ill-posed since the number
of variables usually exceeds the number of observations by orders of magnitude. Recent
research in the area has propagated a variety of new statistical models in order to handle
these new biological datasets. In practice, however, these models are always applied in
combination with preprocessing and variable selection methods as well as model selection
which is mostly performed by cross-validation. Varma and Simon (2006) have used the
term “wrapper-algorithm” for this integration of preprocessing and model selection into the
construction of statistical models. Additionally, they have proposed the method of nested
cross-validation (NCV) as a way of estimating their prediction error, which has since
become the gold standard.
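The structure of nested cross-validation can be sketched as follows. The wrapper here is deliberately small: a k-nearest-neighbour classifier whose k is tuned by an inner cross-validation, with an outer cross-validation estimating the error of the whole tune-then-fit procedure. The classifier, the fold scheme, the candidate values, and the toy data are all assumptions made for illustration. The thesis itself works in R with survival models.

```python
def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def error_rate(train, test, k):
    """Misclassification rate on `test` of a k-NN classifier fitted on `train`."""
    return sum(knn_predict(train, x, k) != y for x, y in test) / len(test)

def folds(data, n_folds):
    """Yield (train, test) splits; fold i takes every n_folds-th row starting at i."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [row for j, row in enumerate(data) if j % n_folds != i]
        yield train, test

def select_k(data, candidates, n_folds):
    """Inner loop: choose k by cross-validation on the given data only."""
    def cv_error(k):
        return sum(error_rate(tr, te, k) for tr, te in folds(data, n_folds)) / n_folds
    return min(candidates, key=cv_error)

def nested_cv_error(data, candidates=(1, 3, 5), outer_folds=5, inner_folds=4):
    """Outer loop: estimate the error of the whole 'tune k, then fit' procedure."""
    errors = []
    for train, test in folds(data, outer_folds):
        best_k = select_k(train, candidates, inner_folds)  # tuning never sees `test`
        errors.append(error_rate(train, test, best_k))
    return sum(errors) / len(errors)

# Toy data: rows are (feature_vector, label); the two classes are far apart.
data = [([float(i % 2) * 10 + i * 0.01], i % 2) for i in range(40)]
print("nested CV error estimate:", nested_cv_error(data))
```

The key point is that tuning happens inside each outer training set, so the outer test fold never influences the choice of k. Reporting the best inner cross-validation error instead would generally be optimistically biased.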
In the first part, this thesis provides further theoretical and empirical justification for
the usage of NCV in the context of wrapper-algorithms. Moreover, a computationally less
intensive alternative to NCV is proposed which can be motivated in a decision theoretic
framework. The new method can be interpreted as a smoothed variant of NCV and, in
contrast to NCV, guarantees intuitive bounds for the estimation of the prediction error.
The second part focuses on the ranking of wrapper algorithms. Cross-study-validation is
proposed as an alternative concept to the repetition of separated within-study-validations
if several similar prediction problems are available. The concept is demonstrated using
six different wrapper algorithms for survival prediction on censored data on a selection of
eight breast cancer datasets. Additionally, a parametric bootstrap approach for simulating
realistic data from such related prediction problems is described and subsequently applied
to illustrate the concept of cross-study-validation for the ranking of wrapper algorithms.
Finally, the last part addresses computational aspects of the analyses and simulations
performed in the thesis. The preprocessing before the analysis as well as the evaluation
of the prediction models require the usage of large computing resources. Parallel computing
approaches are illustrated on cluster, cloud and high performance computing resources
using the R programming language. Usage of heterogeneous hardware and processing of
large datasets are covered as well as the implementation of the R-package survHD for
the analysis and evaluation of high-dimensional wrapper algorithms for survival prediction
from censored data.
Prediction problems for high-dimensional genetic data, e.g. the classification of samples
into normal and cancer tissue, are complex and underdetermined, since the number of
variables exceeds the number of observations many times over. Research in this area has
produced a large number of new statistical methods in recent years. In practice, however,
these algorithms are always applied in combination with preprocessing and variable
selection as well as model selection procedures, the latter being carried out mainly by
cross-validation. Varma and Simon (2006) used the term “wrapper algorithm” for such an
embedding of preprocessing and model selection into the construction of a statistical
method. They also introduced nested cross-validation (NCV) as a method for estimating its
error rate, which has since become the gold standard. The first part of this doctoral
thesis presents a deeper theoretical foundation and an empirical justification for applying
NCV to such wrapper algorithms. In addition, an alternative, less computationally intensive
method is proposed, which is motivated within decision theory. This new method can be
interpreted as a smoothed variant of NCV and, unlike NCV, respects intuitive bounds when
estimating the error rate.
The second part deals with the comparison of different wrapper algorithms, that is, with
estimating their ranking according to a given quality criterion. As an alternative to
repeatedly performing cross-validation on individual datasets, the concept of cross-study
validation is proposed. The concept is illustrated with six different wrapper algorithms
for predicting survival times in eight breast cancer studies. In addition, a bootstrap
procedure is described that can generate several realistic datasets from a set of such
related prediction problems.
Finally, the last part examines computational methods that played a key role in carrying
out the analyses in this dissertation. The preprocessing steps and the evaluation of the
prediction models require extensive use of computing resources. Approaches to parallel
computing on cluster, cloud and high-performance computing resources using the R
programming language are described. The use of heterogeneous hardware architectures, the
processing of large datasets, and the development of the R package survHD for the analysis
and evaluation of wrapper algorithms for survival analysis are also covered
Information theoretic novelty detection
We present a novel approach to online change detection problems when the training sample size is small. The proposed approach is based on estimating the expected information content of a new data point and allows an accurate control of the false positive rate even for small data sets. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests. We then propose an approximation scheme to extend our approach to the case of the mixture of Gaussians. We extensively evaluate our approach on synthetic data and on three real benchmark data sets. The experimental validation shows that our method maintains a good overall accuracy, but significantly improves the control over the false positive rate
Investigation of the Effects of Image Signal-to-Noise Ratio on TSPO PET Quantification of Neuroinflammation
Neuroinflammation may be imaged using positron emission tomography (PET) and the tracer [11C]-PK11195. Accurate and precise quantification of 18 kilodalton Translocator Protein (TSPO) binding parameters in the brain has proven difficult with this tracer, due to an unfavourable combination of low target concentration in tissue, low brain uptake of the tracer and relatively high non-specific binding, all of which lead to higher levels of relative image noise. To address these limitations, research into new radioligands for the TSPO, with higher brain uptake and lower non-specific binding relative to [11C]-PK11195, is being conducted world-wide. However, factors other than radioligand properties are known to influence signal-to-noise ratio in quantitative PET studies, including the scanner sensitivity, image reconstruction algorithms and data analysis methodology. The aim of this thesis was to investigate and validate computational tools for predicting image noise in dynamic TSPO PET studies, and to employ those tools to investigate the factors that affect image SNR and reliability of TSPO quantification in the human brain. The feasibility of performing multiple (n≥40) independent Monte Carlo simulations for each dynamic [11C]-PK11195 frame, with realistic modelling of the radioactivity source, attenuation and PET tomograph geometries, was investigated. A Beowulf-type high performance computer cluster, constructed from commodity components, was found to be well suited to this task. Timing tests on a single desktop computer system indicated that a computer cluster capable of simulating an hour-long dynamic [11C]-PK11195 PET scan, with 40 independent repeats, and with a total simulation time of less than 6 weeks, could be constructed for less than 10,000 Australian dollars.
A computer cluster containing 44 computing cores was therefore assembled, and a peak simulation rate of 2.84×10^5 photon pairs per second was achieved using the GEANT4 Application for Tomographic Emission (GATE) Monte Carlo simulation software. A simulated PET tomograph was developed in GATE that closely modelled the performance characteristics of several real-world clinical PET systems in terms of spatial resolution, sensitivity, scatter fraction and counting rate performance. The simulated PET system was validated using adaptations of the National Electrical Manufacturers Association (NEMA) quality assurance procedures within GATE. Image noise in dynamic TSPO PET scans was estimated by performing n=40 independent Monte Carlo simulations of an hour-long [11C]-PK11195 scan, and of an hour-long dynamic scan for a hypothetical TSPO ligand with double the brain activity concentration of [11C]-PK11195. From these data an analytical noise model was developed that allowed image noise to be predicted for any combination of brain tissue activity concentration and scan duration. The noise model was validated for the purpose of determining the precision of kinetic parameter estimates for TSPO PET. An investigation was made into the effects of activity concentration in tissue, radionuclide half-life, injected dose and compartmental model complexity on the reproducibility of kinetic parameters. Injecting 555 MBq of carbon-11 labelled TSPO tracer produced similar binding parameter precision to 185 MBq of fluorine-18, and a moderate (20%) reduction in precision was observed for the reduced carbon-11 dose of 370 MBq. Results indicated that a factor of 2 increase in frame count level (relative to [11C]-PK11195, and due for example to higher ligand uptake, injected dose or absolute scanner sensitivity) is required to obtain reliable binding parameter estimates for small regions of interest when fitting a two-tissue compartment, four-parameter compartmental model.
However, compartmental model complexity had a similarly large effect, with the reduction of model complexity from the two-tissue compartment, four-parameter model to a one-tissue compartment, two-parameter model producing a 78% reduction in coefficient of variation of the binding parameter estimates at each tissue activity level and region size studied. In summary, this thesis describes the development and validation of Monte Carlo methods for estimating image noise in dynamic TSPO PET scans, and analytical methods for predicting relative image noise for a wide range of tissue activity concentrations and acquisition durations. The findings of this research suggest that a broader consideration of the kinetic properties of novel TSPO radioligands, with a view to selection of ligands that are potentially amenable to analysis with a simple one-tissue compartment model, is at least as important as efforts directed towards reducing image noise, such as higher brain uptake, in the search for the next generation of TSPO PET tracers