15 research outputs found

    Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (Zea mays L)

    The objective of this study was to evaluate whether different similarity coefficients used with dominant markers can influence the results of cluster analysis, using eighteen inbred lines of maize from two different populations, BR-105 and BR-106. The lines were analyzed with AFLP and RAPD markers, and eight similarity coefficients were calculated: Jaccard, Sorensen-Dice, Anderberg, Ochiai, Simple Matching, Rogers and Tanimoto, Ochiai II, and Russel and Rao. The similarity matrices obtained were compared by the Spearman correlation, cluster analysis with dendrograms (UPGMA, WPGMA, Single Linkage, Complete Linkage and Neighbour-Joining methods), the consensus fork index between all pairs of dendrograms, the groups obtained through the Tocher optimization procedure, and projection efficiency in a two-dimensional space. For almost all methodologies and marker systems, the Jaccard, Sorensen-Dice, Anderberg and Ochiai coefficients gave close results, because all of them exclude negative co-occurrences. The Simple Matching, Rogers and Tanimoto, and Ochiai II coefficients likewise showed no significant differences among themselves, probably because they all include negative co-occurrences. The Russel and Rao coefficient gave very different results from the others in almost all the cases studied and should not be used, because it excludes negative co-occurrences from the numerator and includes them in the denominator of its expression. Because negative co-occurrences do not necessarily mean that the regions of the DNA are identical, the use of coefficients that do not include negative co-occurrences was suggested. Pages 83-91. Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
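The abstract above groups the eight coefficients by how they treat negative co-occurrences. As a minimal sketch, assuming the usual textbook definitions of the coefficients (the listing itself does not print the formulas), they can be computed from the 2x2 counts of two binary band profiles: `a` (band present in both), `b` and `c` (present in only one), and `d` (absent in both, the negative co-occurrence). The function name and the example profiles are hypothetical.

```python
from math import sqrt


def similarity_coefficients(x, y):
    """Eight similarity coefficients for two binary (0/1) band profiles.

    Assumes textbook definitions and non-degenerate profiles: each profile
    has at least one band present and at least one band absent, so no
    denominator is zero.
    """
    if len(x) != len(y):
        raise ValueError("both profiles must score the same markers")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # present in both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only in y
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # absent in both
    n = a + b + c + d
    return {
        # d does not appear at all
        "jaccard": a / (a + b + c),
        "sorensen_dice": 2 * a / (2 * a + b + c),
        "anderberg": a / (a + 2 * (b + c)),
        "ochiai": a / sqrt((a + b) * (a + c)),
        # d appears in numerator and denominator
        "simple_matching": (a + d) / n,
        "rogers_tanimoto": (a + d) / (a + 2 * (b + c) + d),
        "ochiai_ii": a * d / sqrt((a + b) * (a + c) * (d + b) * (d + c)),
        # d appears in the denominator only
        "russel_rao": a / n,
    }


# Hypothetical profiles of two lines scored for six markers.
line_1 = [1, 1, 0, 0, 1, 0]
line_2 = [1, 0, 0, 0, 1, 1]
coefficients = similarity_coefficients(line_1, line_2)
for name, value in coefficients.items():
    print(f"{name:16s} {value:.4f}")
```

In this sketch, adding markers that are absent in both lines changes only the last four values, which is the property the abstract uses to separate the groups.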

    Comparison of Similarity Coefficients used for Cluster Analysis with Amplified Fragment Length Polymorphism Markers in the Silkworm, Bombyx mori

    Establishing accurate genetic similarity and dissimilarity between individuals is an essential and decisive step in clustering and in analyzing inter- and intra-population diversity, because different similarity and dissimilarity indices may yield contradictory outcomes. We assessed the variation caused by three commonly used similarity coefficients, Jaccard, Sorensen-Dice and Simple Matching, in the clustering and ordination of seven Iranian native silkworm, Bombyx mori L. (Lepidoptera: Bombycidae), strains analyzed with amplified fragment length polymorphism markers. The similarity coefficients were compared using Spearman correlation analysis, dendrogram evaluation (visual inspection and the consensus fork index, CIC), projection efficiency in a two-dimensional space, and the groups formed by the Tocher optimization procedure. For almost all methodologies, the Jaccard and Sorensen-Dice coefficients gave extremely close results, because both exclude negative co-occurrences. Because there is no guarantee that DNA regions with negative co-occurrences between two strains are indeed identical, the use of coefficients such as Jaccard and Sorensen-Dice, which do not include negative co-occurrences, was imperative for closely related organisms.
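One way to see why Jaccard and Sorensen-Dice agree so closely under a rank-based comparison: with J = a/(a+b+c) and D = 2a/(2a+b+c), D = 2J/(1+J), a strictly increasing function of J, so both rank every pair of strains identically. The sketch below illustrates this with simulated data only; the 7 x 40 random band matrix, the seed and all function names are assumptions, not data or code from the study.

```python
import random
from math import sqrt


def counts(x, y):
    """Return (a, b, c, d) for two binary profiles."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    return a, b, c, len(x) - a - b - c


def jaccard(x, y):
    a, b, c, _ = counts(x, y)
    return a / (a + b + c)


def dice(x, y):
    a, b, c, _ = counts(x, y)
    return 2 * a / (2 * a + b + c)


def simple_matching(x, y):
    a, _, _, d = counts(x, y)
    return (a + d) / len(x)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = sqrt(sum((p - mu) ** 2 for p in u))
    sv = sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


def spearman(u, v):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(u), average_ranks(v))


# Simulated data (assumption): 7 strains scored for 40 dominant markers.
rng = random.Random(0)
profiles = [[rng.randint(0, 1) for _ in range(40)] for _ in range(7)]
pairs = [(i, j) for i in range(7) for j in range(i + 1, 7)]

jac = [jaccard(profiles[i], profiles[j]) for i, j in pairs]
dic = [dice(profiles[i], profiles[j]) for i, j in pairs]
smc = [simple_matching(profiles[i], profiles[j]) for i, j in pairs]

rho_jaccard_dice = spearman(jac, dic)
rho_jaccard_sm = spearman(jac, smc)
print(f"Jaccard vs Sorensen-Dice:   {rho_jaccard_dice:.4f}")
print(f"Jaccard vs Simple Matching: {rho_jaccard_sm:.4f}")
```

The Jaccard versus Sorensen-Dice correlation is 1 up to rounding for any data set, whereas the correlation with Simple Matching depends on how many negative co-occurrences each pair shares.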

    A Bayesian approach for mapping QTL in experimental populations

    Many traits in plants and animals are quantitative in nature and influenced by multiple genes. With the advent of new molecular techniques, it has become possible to map the loci that control quantitative traits, called QTL (Quantitative Trait Loci). Mapping a QTL means identifying its position in the genome and estimating its genetic effects. The main difficulty in QTL mapping is that the number of QTL is unknown. Bayesian methods combined with Markov Chain Monte Carlo (MCMC) have been used to infer jointly the number of QTL, their positions in the genome and their genetic effects. The challenge is to sample from the joint posterior distribution of these parameters, since the number of QTL is treated as unknown and the dimension of the parameter space therefore changes with the number of QTL in the model. In this study, a Bayesian approach to QTL mapping was implemented in the statistical program R, with multiple QTL and epistatic effects included in the model. Models with increasing numbers of QTL were fitted, and the Bayes factor was used to select the most suitable model and, consequently, to estimate the number of QTL controlling the phenotypes of interest. To assess the efficiency of the implemented methodology, a simulation study was carried out for two experimental populations, backcross and F2, in each case with models with and without epistasis. The approach proved very efficient: in every scenario considered, the selected model was the one containing the true number of QTL used to simulate the data. In addition, QTL were mapped for three traits of tropical maize (plant height, ear height and grain yield) using the implemented methodology, and the results were compared with those obtained by the CIM method.
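For readers unfamiliar with the model-selection step, the Bayes factor comparing a model with k+1 QTL against one with k QTL can be written in its general textbook form as below. The symbols (y for the phenotype and marker data, M_k for the model with k QTL, theta_k for its parameters) are notation assumed here; the abstract does not state how the marginal likelihoods were estimated or which threshold was applied.

```latex
B_{k+1,\,k} \;=\; \frac{p(y \mid M_{k+1})}{p(y \mid M_k)},
\qquad
p(y \mid M_k) \;=\; \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k .
```

Under this form, values of B_{k+1,k} well above 1 favour adding another QTL, and values near or below 1 favour keeping the model with k QTL.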

    Comparison of similarity coefficients used in cluster analysis with dominant markers data

    With the recent advent of molecular markers, studies of genetic divergence and phylogenetic relationships among plant species of agronomic importance have received increasing attention. In these studies, the aim is to group similar individuals so that the largest differences occur between the groups formed. Statistical methods such as cluster analysis, factor analysis and principal component analysis support this kind of study. Before any of these methods is applied, however, a similarity matrix between genotypes must be obtained, and several coefficients have been proposed in the literature for this purpose. The aim of this study was to evaluate whether different similarity coefficients influence the results of cluster analyses based on dominant molecular marker data. Data from 18 inbred lines of maize from two different populations, BR-105 and BR-106, analyzed with AFLP and RAPD markers, were used, and similarity matrices were obtained for eight coefficients (Jaccard, Sorensen-Dice, Anderberg, Ochiai, Simple Matching, Rogers and Tanimoto, Ochiai II, and Russel and Rao). The similarity matrices were compared using Pearson's and Spearman's correlations; cluster analysis with dendrograms; correlations, distortion and stress between the similarity and cophenetic matrices; the consensus fork index between all pairs of dendrograms; the groups obtained with Tocher's optimization procedure; and the projection of the similarity matrices onto a two-dimensional space. For almost all the methodologies and for both marker types, the Jaccard, Sorensen-Dice, Anderberg and Ochiai coefficients gave similar results, which was attributed to their shared property of disregarding negative co-occurrences (the joint absence of bands). The Simple Matching, Rogers and Tanimoto, and Ochiai II coefficients likewise showed no large differences among themselves, possibly because all of them take negative co-occurrences into account. The Russel and Rao coefficient gave results very different from the others, because it excludes negative co-occurrences from the numerator and includes them in the denominator of its expression; its use is not recommended. Because negative co-occurrences do not necessarily mean that the regions of the DNA are identical, choosing among the coefficients that disregard them is suggested.
