2 research outputs found

    Multilevel mixed-type data analysis for validating partitions of scrapie isolates

    Get PDF
    The dissertation arises from a joint study with the Department of Food Safety and Veterinary Public Health of the Istituto Superiore di Sanità. The aim is to investigate and validate the existence of distinct strains of the scrapie disease taking into account the availability of a priori benchmark partition formulated by researchers. Scrapie of small ruminants is caused by prions, which are unconventional infectious agents of proteinaceous nature a ecting humans and animals. Due to the absence of nucleic acids, which precludes direct analysis of strain variation by molecular methods, the presence of di erent sheep scrapie strains is usually investigated by bioassay in laboratory rodents. Data are collected by an experimental study on scrapie conducted at the Istituto Superiore di Sanità by experimental transmission of scrapie isolates to bank voles. We aim to discuss the validation of a given partition in a statistical classification framework using a multi-step procedure. Firstly, we use unsupervised classification to see how alternative clustering results match researchers’ understanding of the heterogeneity of the isolates. We discuss whether and how clustering results can be eventually exploited to extend the preliminary partition elicited by researchers. Then we motivate the subsequent partition validation based on the predictive performance of several supervised classifiers. Our data-driven approach contains two main methodological original contributions. We advocate the use of partition validation measures to investigate a given benchmark partition: firstly we discuss the issue of how the data can be used to evaluate a preliminary benchmark partition and eventually modify it with statistical results to find a conclusive partition that could be used as a “gold standard” in future studies. Moreover, collected data have a multilevel structure and for each lower-level unit, mixed-type data are available. Each step in the procedure is then adapted to deal with multilevel mixed-type data. We extend distance-based clustering algorithms to deal with multilevel mixed-type data. Whereas in supervised classification we propose a two-step approach to classify the higher-level units starting from the lower-level observations. In this framework, we also need to define an ad-hoc cross validation algorithm

    Hierarchical Models for Screening of Iron Deficiency Anemia

    No full text
    We investigate the problem of classifying individuals based on estimated density functions for each individual. Given labelled histograms characterizing red blood cells (RBCs) for different individuals, the learning problem is to build a classifier which can classify new unlabelled histograms into normal and iron deficient classes. Thus, the problem is similar to conventional classification in that there is labelled training data, but different in that the underlying measurements are not feature vectors but histograms or density estimates. We describe a general framework based on probabilistic hierarchical models for modelling such data and illustrate how the model lends itself to classification. We contrast this approach with two other alternatives: (1) directly defining distance between densities using a cross-entropy distance measure, and (2) using parameters of the estimated densities as feature vectors for a standard discriminative classification framework. We evaluate all three m..
    corecore