39 research outputs found
Discriminant analysis for the prediction and classification of tick-borne infections in some dairy cattle herds at Dakahlia Governorate, Egypt
This study used the variable loadings in linear discriminant analysis (LDA) to determine the most important predictors for discriminating tick-borne diseases (TBDs), particularly babesiosis and anaplasmosis, and to predict group membership from those predictors. In total, 163 cattle from different localities in Dakahlia Governorate, Egypt, were investigated in 2012 and 2013 for the presence of TBDs. All cattle were clinically examined and a clinical index score was determined for each cow. Blood samples were also collected from each animal for microscopy and diagnostic laboratory methods. Of the examined cattle, 83 animals were acutely ill (Babesia bovis and Anaplasma marginale were identified in 11 and 10 animals, respectively), while 80 cows were apparently healthy but had a history of previous attacks of blood parasites (23 of these harbored Anaplasma marginale as asymptomatic carriers). The remaining 119 animals were negative for TBDs. Fourteen animals did not survive and 149 survived. In the first LDA, which discriminated babesiosis, anaplasmosis and TBD-negative animals, 89.0% of animals were correctly classified: 78.8% (26/33) for Anaplasma infections, 100% (11/11) for Babesia infections and 90.8% (108/119) for TBD-negative animals. The important predictors for the discrimination were oculonasal discharge, bloody feces, hemoglobinuria and respiratory rate. The second LDA discriminated survivors from non-survivors with an overall classification accuracy of 87.1%: 89.9% (134/149) for survivors and 57.1% (8/14) for non-survivors. The important predictors included oculonasal discharge, recumbent posture and nervous signs
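The overall accuracies quoted in this abstract can be reproduced from the per-group counts it reports. The short sketch below does only that arithmetic; the function and variable names are illustrative and do not come from the study.

```python
# (correctly classified, group size) pairs as quoted in the abstract.
first_lda = {"anaplasmosis": (26, 33), "babesiosis": (11, 11), "negative": (108, 119)}
second_lda = {"survivors": (134, 149), "non-survivors": (8, 14)}


def accuracy_report(groups):
    """Return per-group and overall percentages of correctly classified animals."""
    per_group = {name: 100.0 * c / n for name, (c, n) in groups.items()}
    correct = sum(c for c, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return per_group, 100.0 * correct / total


first_per_group, first_overall = accuracy_report(first_lda)
second_per_group, second_overall = accuracy_report(second_lda)

print(round(first_overall, 1), round(second_overall, 1))  # 89.0 87.1
```

Because survivors outnumber non-survivors by roughly ten to one, the 87.1% overall figure for the second LDA is dominated by the survivor group, and the 57.1% accuracy for non-survivors is the more informative number for that class.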
Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data
Motivation: We address the problem of finding three-way gene interactions, i.e. pairs of genes whose expression interaction depends on the genotype of a third gene, given a dataset in which expression and genotype are measured together for each individual. Such an interaction can act as a general switching mechanism in the expression of two genes, controlled by the genotype categories of another gene, and finding this type of interaction can be a key to elucidating complex biological systems. The most suitable method for this problem is a likelihood ratio test using logistic regressions, which we call the interaction test, but a serious problem with this test is its computational intractability at a genome-wide level
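The abstract does not spell out the form of the interaction test. One plausible formulation, given here only as an assumed sketch, regresses a binary genotype indicator on the two expression levels and tests the product term with a likelihood ratio. The symbols g, x_1, x_2 and the beta coefficients are notation introduced for this illustration.

```latex
% Assumed notation: g is a binary genotype indicator of the third gene,
% x_1 and x_2 are the expression levels of the two candidate genes.
\begin{align}
\text{full model:}\quad
  \operatorname{logit} P(g = 1 \mid x_1, x_2)
  &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{null model:}\quad
  \operatorname{logit} P(g = 1 \mid x_1, x_2)
  &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\Lambda
  &= 2\,\bigl(\ell_{\text{full}} - \ell_{\text{null}}\bigr)
  \;\sim\; \chi^2_1
  \quad \text{asymptotically under } H_0 : \beta_3 = 0
\end{align}
```

Under this reading, each candidate triple requires two iterative logistic fits, and the number of (gene pair, genotype gene) triples grows on the order of p^3 for p genes, which is consistent with the intractability the abstract describes.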
ROS-DET: robust detector of switching mechanisms in gene expression.
A switching mechanism in gene expression, where two genes are positively correlated in one condition and negatively correlated in the other, is a key to elucidating complex biological systems. Methods already exist for detecting switching mechanisms from microarrays. However, current approaches have problems in three real-world cases: outliers, expression values with a very small range, and a small number of examples. ROS-DET overcomes these three problems while keeping the computational complexity of current approaches. We demonstrated that ROS-DET outperformed existing methods when all three situations are considered. Furthermore, for each of the top 10 pairs ranked by ROS-DET, we attempted to identify a pathway, i.e. consecutive biological phenomena, related to the corresponding two genes by checking the biological literature. In 8 out of the 10 pairs, we found two parallel pathways, with one of the two genes in each pathway and the two pathways converging on (or starting from) the same gene. This indicates that two parallel pathways would be cooperatively used under one experimental condition, corresponding to the positive correlation, and the two pathways might be alternatively used under the other condition, corresponding to the negative correlation. ROS-DET is available from http://www.bic.kyoto-u.ac.jp/pathway/kayano/ros-det.htm
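To make the definition of a switching mechanism concrete, the sketch below computes a plain Pearson correlation for a gene pair within each of two conditions and reports the difference. This is a naive baseline written for illustration, not ROS-DET itself: Pearson correlation is sensitive to exactly the outliers and small samples the abstract mentions. The function names and toy data are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def switching_score(gene_a, gene_b, condition):
    """Return (r in condition 0, r in condition 1, difference r0 - r1)."""
    r = []
    for c in (0, 1):
        xa = [a for a, k in zip(gene_a, condition) if k == c]
        xb = [b for b, k in zip(gene_b, condition) if k == c]
        r.append(pearson(xa, xb))
    return r[0], r[1], r[0] - r[1]


# Toy expression values: the pair rises together in condition 0
# and moves in opposite directions in condition 1.
gene_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [1.1, 1.9, 3.2, 3.9, 4.0, 3.1, 2.2, 0.9]
condition = [0, 0, 0, 0, 1, 1, 1, 1]

r0, r1, score = switching_score(gene_a, gene_b, condition)
print(r0 > 0, r1 < 0)  # True True
```

A difference close to 2 marks a pair whose correlation flips sign between conditions; a robust detector would replace the Pearson step with an estimator that tolerates outliers and narrow value ranges.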
Data from: Construction of a virtual Mycobacterium tuberculosis consensus genome and its application to data from a next generation sequencer
Background: Mycobacterium tuberculosis isolates consist of several different lineages, yet epidemiological analyses are usually assessed relative to one particular reference genome, M. tuberculosis H37Rv, which might introduce biased results. Those analyses are essentially based on genome sequence information of M. tuberculosis and could in theory be performed in silico, using whole genome sequence (WGS) data available in databases and obtained by next generation sequencers (NGSs). As an approach to establishing higher-resolution methods for such analyses, whole genome sequences of M. tuberculosis complex (MTBC) strains available in databases were aligned to construct a virtual reference genome sequence, called the consensus sequence (CS), and its feasibility in in silico epidemiological analyses was evaluated. Results: The CS was successfully constructed and used to perform phylogenetic analysis, evaluation of read mapping efficacy, which is crucial for detecting single nucleotide polymorphisms (SNPs), and various virtual MTBC typing methods, including spoligotyping, VNTR, long sequence polymorphism and Beijing typing. SNPs detected based on the CS, in comparison with H37Rv, were used in concatemer-based phylogenetic analysis to determine their reliability relative to a phylogenetic tree based on whole genome alignment as the gold standard. Statistical comparison of the phylogenetic trees based on the CS with those based on H37Rv indicated that the former always gave better results than the latter. SNP detection and concatenation with the CS was advantageous because the frequency of crucial SNPs distinguishing among strain lineages was higher than with H37Rv. The number of SNPs detected was lower with the CS than with the H37Rv sequence, resulting in a significant reduction in computational time. The performance of each virtual typing method was satisfactory and agreed with published results where those were available.
Conclusions: These results indicate that a virtual CS constructed from genome sequence data is an ideal reference for MTBC studies
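The abstract does not describe how the consensus sequence was derived from the alignment. A common approach, assumed here purely for illustration, is a majority vote at each alignment column. The function name, the gap handling and the toy sequences below are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns whose most common symbol is the gap character are dropped.
    Ties are resolved in favour of the symbol seen first in the column.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Toy alignment of three strains (not real MTBC data).
aligned = ["ACGT-A", "ACGTTA", "ATGT-A"]
consensus = majority_consensus(aligned)
print(consensus)  # ACGTA
```

A majority-rule reference sits closer to every lineage than any single strain does, which is one intuitive reason fewer SNPs would be called against it than against H37Rv.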
Functional Cluster Analysis via Orthonormal Gaussian Basis Expansions and Its Application
Kyushu University 21st Century COE Program "Development of Dynamic Mathematics with High Functionality". This paper introduces functional cluster analysis (FCA) for multidimensional functional data sets, utilizing orthonormal Gaussian basis functions. An essential point in FCA is the use of orthonormal bases, for which the matrix of integrals of the products of any two bases (the cross product matrix) is the identity matrix. We construct orthonormal Gaussian basis functions using the Cholesky decomposition and derive their properties in relation to Gram-Schmidt orthonormalization. The advantages of the functional clustering approach are that it can be applied to data observed at possibly different time points for each subject, and that the functional structure behind the data can be captured by removing measurement errors. The proposed method is applied to three-dimensional (3D) protein structural data that determine the 3D arrangement of amino acids in individual proteins. In addition, numerical experiments are conducted to investigate the effectiveness of our method with the orthonormal Gaussian bases in comparison with conventional cluster analysis. The numerical results show that our methodology is superior to the conventional method for noisy data sets with outliers
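The role of the Cholesky decomposition can be sketched numerically. The sketch assumes Gaussian bases phi_j(t) = exp(-(t - c_j)^2 / (2 w^2)) with a common width w and integration over the whole real line; the paper's exact parametrization is not given in the abstract. The Gram matrix G then has a closed form, and if G = L L^T, the transformed basis psi = L^{-1} phi has Gram matrix L^{-1} G L^{-T}, which should be the identity. All function names are illustrative.

```python
from math import exp, pi, sqrt


def gaussian_gram(centers, width):
    """G[j][k] = integral over R of phi_j * phi_k for equal-width Gaussian bases."""
    return [[sqrt(pi) * width * exp(-((a - b) ** 2) / (4.0 * width ** 2))
             for b in centers] for a in centers]


def cholesky(g):
    """Lower-triangular L with L L^T = g (g symmetric positive definite)."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(g[i][i] - s)
            else:
                low[i][j] = (g[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ x = b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def orthonormal_gram(centers, width):
    """Gram matrix of psi = L^{-1} phi, computed as L^{-1} G L^{-T}."""
    g = gaussian_gram(centers, width)
    low = cholesky(g)
    n = len(g)
    # Columns of A = L^{-1} G.
    a_cols = [forward_solve(low, [g[i][j] for i in range(n)]) for j in range(n)]
    # Row i of A, then solve L x = row_i(A) to get row i of A L^{-T}.
    a_rows = [[a_cols[j][i] for j in range(n)] for i in range(n)]
    return [forward_solve(low, row) for row in a_rows]


m = orthonormal_gram([0.0, 1.0, 2.0, 3.0], 1.0)
print(all(abs(m[i][j] - (1.0 if i == j else 0.0)) < 1e-9
          for i in range(4) for j in range(4)))  # True
```

Taking the Cholesky factor of the Gram matrix yields the same orthonormal functions as applying Gram-Schmidt to the bases in their given order, which is the standard link between the two constructions.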
Functional Principal Component Analysis via Regularized Gaussian Basis Expansions and Its Application to Unbalanced Data
This paper introduces regularized functional principal component analysis for multidimensional functional data sets, utilizing Gaussian basis functions. An essential point in a functional approach via basis expansions is the evaluation of the matrix of integrals of the products of any two bases (the cross product matrix). The advantages of Gaussian basis functions in the functional approach are that their cross product matrix can be easily calculated, and that they provide a much more flexible instrument for transforming each individual's observations into functional form. The proposed method is applied to the analysis of three-dimensional (3D) protein structural data, which can be regarded as unbalanced data. It is shown that our method extracts useful information from unbalanced data. Numerical experiments are conducted to investigate the effectiveness of our method via Gaussian basis functions in comparison with the method based on B-splines. For regularized functional principal component analysis with B-splines, we also derive the exact form of the cross product matrix. The numerical results show that our methodology is superior to that based on B-splines for unbalanced data. Kyushu University 21st Century COE Program "Development of Dynamic Mathematics with High Functionality"
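The claim that the Gaussian cross product matrix is easy to calculate can be illustrated with the standard Gaussian integral. The parametrization below, with centers mu_j and widths sigma_j and integration over the whole real line, is an assumption made for this sketch and may differ from the one used in the paper.

```latex
% Assumed parametrization: \phi_j(t) = \exp\{-(t-\mu_j)^2 / (2\sigma_j^2)\}
\begin{align}
W_{jk}
  = \int_{-\infty}^{\infty} \phi_j(t)\,\phi_k(t)\,dt
  = \sqrt{2\pi}\,\frac{\sigma_j \sigma_k}{\sqrt{\sigma_j^2 + \sigma_k^2}}
    \exp\!\left\{-\frac{(\mu_j - \mu_k)^2}{2(\sigma_j^2 + \sigma_k^2)}\right\}
\end{align}
```

With a common width sigma this reduces to W_{jk} = sqrt(pi) * sigma * exp{-(mu_j - mu_k)^2 / (4 sigma^2)}, so every entry is available in closed form without numerical quadrature. Over a finite interval the same integral involves the error function instead.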
Sparse functional principal component analysis via regularized basis expansions and its application
MI: Global COE Program "Education-and-Research Hub for Mathematics-for-Industry". This paper introduces principal component analysis for multidimensional sparse functional data sets, utilizing Gaussian basis functions. Our multidimensional model is estimated by maximizing a penalized log-likelihood function, whereas previous mixed-type models were estimated by maximum likelihood methods for one-dimensional sparse functional data sets. The penalized estimation performs well for our multidimensional model, whereas maximum likelihood methods yield unstable parameter estimates, some of which are often infinite. Numerical experiments are conducted to investigate the effectiveness of our method via the Gaussian bases for several types of missing data. The proposed method is applied to handwriting data, which consist of the XY coordinate values of handwriting