88 research outputs found

    Strategies for non-parametric smoothing of the location model in mixed-variable discriminant analysis

    The non-parametric smoothing of the location model proposed by Asparoukhov and Krzanowski (2000) for allocating objects with mixtures of variables into two groups is studied. The strategy for selecting the smoothing parameter by maximising the pseudo-likelihood function is reviewed. Problems with previous methods are highlighted, and two alternative strategies are proposed. Some investigations into other possible smoothing procedures for estimating cell probabilities are discussed. A leave-one-out method is proposed for constructing the allocation rule and evaluating its performance by estimating the true error rate. Results of a numerical study on simulated data highlight the feasibility of the proposed allocation rule as well as its advantages over previous methods, and an example using real data is presented.
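    The leave-one-out idea mentioned in this abstract can be illustrated with a minimal sketch. It does not implement the smoothed location model; the nearest-mean rule, the function names, and the toy data below are all assumptions chosen only to show how each object is held out in turn, the rule is rebuilt without it, and the share of misallocated held-out objects estimates the error rate.

```python
from statistics import mean


def nearest_mean_rule(train):
    """Toy allocation rule (an assumption, not the location model):
    assign x to the group whose training mean is closest."""
    groups = sorted({g for _, g in train})
    means = {g: mean(x for x, gg in train if gg == g) for g in groups}
    return lambda x: min(groups, key=lambda g: abs(x - means[g]))


def leave_one_out_error(data, build_rule):
    """Estimate the error rate by holding out each object in turn,
    rebuilding the rule on the rest, and allocating the held-out object."""
    errors = 0
    for i, (x, g) in enumerate(data):
        rule = build_rule(data[:i] + data[i + 1:])
        if rule(x) != g:
            errors += 1
    return errors / len(data)


# Hypothetical one-variable data: (measurement, true group).
# Each group keeps at least two members when one object is held out.
data = [(0.9, "A"), (1.1, "A"), (1.4, "A"),
        (2.6, "B"), (3.0, "B"), (1.6, "B")]

loo_error = leave_one_out_error(data, nearest_mean_rule)
print(f"leave-one-out error rate: {loo_error:.3f}")
```

    Because the rule is rebuilt without the held-out object each time, the estimate avoids the optimism of scoring a rule on the same objects used to construct it.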

    Data visualization in yield component analysis: an expert study

    Even though data visualization is a common analytical tool in numerous disciplines, it has rarely been used in agricultural sciences, particularly in agronomy. In this paper, we discuss a study on employing data visualization to analyze a multiplicative model. This model is often used by agronomists, for example in the so-called yield component analysis. The multiplicative model in agronomy is normally analyzed by statistical or related methods. In practice, unfortunately, the usefulness of these methods is limited, since they help to answer only a few questions and do not allow for a comprehensive view of the phenomena studied. We believe that data visualization could be used for such complex analysis and presentation of the multiplicative model. To that end, we conducted an expert survey. It showed that visualization methods could indeed be useful for analysis and presentation of the multiplicative model.

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful for revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

    Imputação múltipla livre de distribuição em tabelas incompletas de dupla entrada (Distribution-free multiple imputation in incomplete two-way tables)

    The objective of this work was to propose a new distribution-free multiple imputation algorithm, obtained by modifying the single imputation method recently developed by Yan to circumvent the problem of unbalanced experiments. The method uses the singular value decomposition of a matrix and was tested through simulations based on two complete real data matrices, from trials with eucalyptus and sugarcane, with values removed at random in different percentages. The quality of the imputations was assessed by an overall accuracy measure that combines the between-imputation variance and the mean squared bias of the imputations relative to the removed values. The best alternative for multiple imputation is a multiplicative model that includes weights close to 1 for the eigenvalues computed with the decomposition. The proposed methodology does not depend on distributional or structural assumptions and has no restrictions regarding the pattern or mechanism of missing data.

    Statistical strategies for avoiding false discoveries in metabolomics and related experiments


    Discrimination and classification using both binary and continuous variables

    Available from the British Library Document Supply Centre (BLDSC), DSC:DX73760/87. Source: SIGLE. Country: United Kingdom (GB).