2,195 research outputs found

    Statistical analysis of genomic data : a new model for class prediction and inference

    Genomics is a major scientific revolution in this century. High-throughput genomic data provides an opportunity for identifying genes and SNPs (single-nucleotide polymorphisms) that are related to various clinical phenotypes. Dealing with the sheer volume of genetic data being produced requires advanced methodological development in biostatistics, which is lagging behind the technical capability to generate genomic data. SNPs have great importance in biomedical research for comparing regions of the genome between cohorts (such as in case-control studies). Within a population, SNPs can be assigned a minor allele frequency, the lowest allele frequency at a locus that is observed in a particular population, and be recoded to binary datasets. Therefore, it is important to develop suitable statistical methods for SNP analysis of genome alteration, with the goal of contributing to the understanding of complex human diseases or traits such as mental health. In this thesis, we develop new statistical methodologies for the analysis of schizophrenia genomic data from the WA Genetic Epidemiology Resource (WAGER). The work is motivated by schizophrenia class prediction (i.e., the prediction of individuals' disease status through their genotype and quantitative traits). In general, an individual's disease status is a nominal variable, while genotypes can be converted into ordinal variables but are of high dimension. Note that the usual nonparametric regression, which is developed for continuous variables, cannot be applied here. Some methodologies are available in the literature, such as the tree-based logistic Non-parametric Pathway-based Regression (NPR) model proposed by Wei and Li (2007). However, it is found that this model does not adapt well to the data set that we are analyzing. It performs even worse than the (generalized) linear logistic regression model.
Using the logistic discrimination rule together with quantitative traits, some important results have been obtained. However, some shortcomings remain. First, the generalized linear logistic model has a high type I error rate for schizophrenia classification. Second, the quantitative traits required for schizophrenia class prediction are performance assessments that demand several hours of on-site participation by both assessor and assessee. These traits are generally quite difficult to obtain even for a medium-sized sample. Meanwhile, although the laboratory analysis cost is high, a person's genotype can be obtained by merely collecting a drop of blood. Thus, two kinds of nonlinear models are proposed to capture the nonlinear effects in SNP datasets, which are categorical. The main contributions of this thesis are summarized as follows:
• Two kinds of nonlinear threshold index logistic regression models are proposed to capture the nonlinear effects by applying the idea of threshold models (Tong, 1983, 1990), which are parametric and therefore applicable to categorical data. One of the proposed models, called the partially linear threshold index logistic regression (PL-TILoR) model, is given by
$\log\left(\frac{P(Y_i = 1 \mid X_i)}{1 - P(Y_i = 1 \mid X_i)}\right) = \alpha^T X_i + g(\beta^T X_i)$, (0.1)
where $Y_i$ is the disease status of the $i$th person under the case-control study, taking the value 1 (case) or 0 (control), $X_i$ is the $p$-dimensional vector of genotype variables, and the superscript $T$ stands for the transpose of a vector or matrix. Here, $\alpha$ and $\beta$ are $p$-dimensional unknown parameters, with $\beta$ being an index vector used for dimension reduction, satisfying $\|\beta\| = 1$ and $\alpha^T \beta = 0$ for model identifiability. Consequently, $g$ is a one-dimensional nonlinear function, which is modelled as a stepwise linear function through a threshold effect (Tong, 1990):
$g(z) = (b_1 z + b_2) I_{\{z \le c\}} + (b_3 z + b_4) I_{\{z > c\}}$, (0.2)
where the $b_i$'s and $c$ are unknown parameters to be estimated and $I_A$ is the indicator function of the set $A$. In practice, the first component in model (0.1) could also be nonlinear. In this case, model (0.1) becomes
$\log\left(\frac{P(Y_i = 1 \mid X_i)}{1 - P(Y_i = 1 \mid X_i)}\right) = g_1(\alpha^T X_i) + g_2(\beta^T X_i)$, (0.3)
where $\|\alpha\| = 1$, $\|\beta\| = 1$ and $\alpha^T \beta = 0$ for model identifiability, and $g_1$ and $g_2$ are two one-dimensional nonlinear functions, which are modelled by stepwise linear functions through threshold effects as follows:
$g_k(z) = (b_{k1} z + b_{k2}) I_{\{z \le c_k\}} + (b_{k3} z + b_{k4}) I_{\{z > c_k\}}, \quad k = 1, 2$, (0.4)
where the $b_{ki}$'s and $c_k$'s are unknown parameters to be estimated. Thus, (0.3) and (0.4) form an additive threshold index logistic regression (A-TILoR) model.
• A maximum likelihood methodology is developed to estimate the unknown parameters in the PL-TILoR and A-TILoR models. Simulation studies have found that the proposed methodology works well for finite samples.
• Empirical studies of the proposed models applied to the analysis of schizophrenia genomic data from the WA Genetic Epidemiology Resource (WAGER) have shown that the A-TILoR model is very successful in reducing the type I error rate in schizophrenia classification, even without using quantitative traits. It outperforms the generalized linear logistic model that is widely used in the literature.
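
As a rough illustration of how model (0.1) with the threshold function (0.2) could be evaluated, the following minimal Python sketch computes the case probability and the Bernoulli negative log-likelihood that a maximum likelihood fit would minimise. The function names and the toy parameter values are hypothetical and are not taken from the thesis; the sketch does not enforce the identifiability constraints or perform any optimisation.

```python
import math

def threshold_g(z, b1, b2, b3, b4, c):
    """Two-regime linear function of (0.2): one line for z <= c, another for z > c."""
    return b1 * z + b2 if z <= c else b3 * z + b4

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def pl_tilor_probability(x, alpha, beta, b, c):
    """P(Y = 1 | X = x) under (0.1): log-odds = alpha^T x + g(beta^T x)."""
    log_odds = dot(alpha, x) + threshold_g(dot(beta, x), *b, c)
    return 1.0 / (1.0 + math.exp(-log_odds))

def negative_log_likelihood(ys, xs, alpha, beta, b, c):
    """Bernoulli negative log-likelihood over all observations."""
    total = 0.0
    for y, x in zip(ys, xs):
        p = pl_tilor_probability(x, alpha, beta, b, c)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical toy values (assumptions for illustration only).
alpha = [1.0, 0.0]          # linear part
beta = [0.0, 1.0]           # index vector: norm 1 and orthogonal to alpha
b = (1.0, 0.0, 0.0, 1.0)    # (b1, b2, b3, b4)
c = 0.5                     # threshold
xs = [[0.0, 0.0], [1.0, 2.0]]
ys = [0, 1]

probs = [pl_tilor_probability(x, alpha, beta, b, c) for x in xs]
nll = negative_log_likelihood(ys, xs, alpha, beta, b, c)
```

In an actual fit, the negative log-likelihood would be minimised over all parameters, typically with a search over candidate thresholds, since the objective is not smooth in `c`.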

    Doduo: Learning Dense Visual Correspondence from Unsupervised Semantic-Aware Flow

    Dense visual correspondence plays a vital role in robotic perception. This work focuses on establishing the dense correspondence between a pair of images that captures dynamic scenes undergoing substantial transformations. We introduce Doduo to learn general dense visual correspondence from in-the-wild images and videos without ground truth supervision. Given a pair of images, it estimates the dense flow field encoding the displacement of each pixel in one image to its corresponding pixel in the other image. Doduo uses flow-based warping to acquire supervisory signals for the training. Incorporating semantic priors with self-supervised flow training, Doduo produces accurate dense correspondence robust to the dynamic changes of the scenes. Trained on an in-the-wild video dataset, Doduo illustrates superior performance on point-level correspondence estimation over existing self-supervised correspondence learning baselines. We also apply Doduo to articulation estimation and zero-shot goal-conditioned manipulation, underlining its practical applications in robotics. Code and additional visualizations are available at https://ut-austin-rpl.github.io/Doduo. Comment: Project website: https://ut-austin-rpl.github.io/Dodu
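
To make the idea of flow-based warping concrete, here is a minimal, generic Python sketch. It is not Doduo's implementation: the abstract does not specify the loss or the sampling scheme, so the nearest-neighbour lookup, the function names, and the toy images below are all assumptions chosen for illustration.

```python
def warp_with_flow(target, flow):
    """Warp `target` back onto the source pixel grid using a dense flow field.

    target: H x W nested list of pixel intensities.
    flow:   H x W nested list of (dx, dy); source pixel (x, y) is assumed to
            correspond to target pixel (x + dx, y + dy).
    Uses nearest-neighbour sampling; lookups outside the image yield None.
    """
    height, width = len(target), len(target[0])
    warped = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            tx, ty = round(x + dx), round(y + dy)
            if 0 <= tx < width and 0 <= ty < height:
                warped[y][x] = target[ty][tx]
    return warped

def photometric_error(source, warped):
    """Mean absolute difference over pixels that received a valid warped value."""
    diffs = [
        abs(s - w)
        for s_row, w_row in zip(source, warped)
        for s, w in zip(s_row, w_row)
        if w is not None
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0

# Toy example: the target is the source shifted one pixel to the right.
source = [[1, 2, 3], [4, 5, 6]]
target = [[0, 1, 2], [0, 4, 5]]
flow = [[(1, 0)] * 3 for _ in range(2)]  # every pixel moves by dx = 1

warped = warp_with_flow(target, flow)
error = photometric_error(source, warped)
```

With the correct flow, the warped target matches the source wherever the lookup stays inside the image, so the error vanishes; a wrong flow gives a larger error, which is the kind of signal a self-supervised method can train on without ground-truth correspondence.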

    Few-View Object Reconstruction with Unknown Categories and Camera Poses

    While object reconstruction has made great strides in recent years, current methods typically require densely captured images and/or known camera poses, and generalize poorly to novel object categories. To step toward object reconstruction in the wild, this work explores reconstructing general real-world objects from a few images without known camera poses or object categories. The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation -- in a unified approach. Our approach captures the synergies of these two problems: reliable camera pose estimation gives rise to accurate shape reconstruction, and the accurate reconstruction, in turn, induces robust correspondence between different views and facilitates pose estimation. Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence for estimating relative camera poses. The 3D features are then transformed by the estimated poses into a shared space and are fused into a neural radiance field. The reconstruction results are rendered by volume rendering techniques, enabling us to train the model without 3D shape ground-truth. Our experiments show that FORGE reliably reconstructs objects from five views. Our pose estimation method outperforms existing ones by a large margin. The reconstruction results under predicted poses are comparable to the ones using ground-truth poses. The performance on novel testing categories matches the results on categories seen during training. Project page: https://ut-austin-rpl.github.io/FORGE

    Determination of fundamental properties of an M31 globular cluster from main-sequence photometry

    M31 globular cluster B379 is the first extragalactic cluster whose age was determined by main-sequence photometry. In this method, the age of a cluster is obtained by fitting its color-magnitude diagram (CMD) with stellar evolutionary models. However, different stellar evolutionary models adopt different ingredients of stellar evolution, such as the range of stellar masses, opacities, equations of state, and other recipes. It is therefore interesting to check whether different stellar evolutionary models give consistent results for the same cluster. Brown et al. (2004a) constrained the age of B379 by comparing its CMD with isochrones of the 2006 VandenBerg models. Using SSP models of BC03 and its multi-band photometry, Ma et al. (2007) independently determined the age of B379, which is in good agreement with the determination of Brown et al. (2004a). The BC03 models are calculated based on the Padova evolutionary tracks. It is therefore necessary to check whether an age for B379 determined directly from the Padova evolutionary tracks agrees with the determination of Brown et al. (2004a). In this paper, we re-determine its age using isochrones of the Padova stellar evolutionary models. In addition, the metal abundance, the distance modulus, and the reddening value for B379 are also determined. The results obtained in this paper are consistent with the previous determinations, including the age obtained by Brown et al. (2004a). This paper thus confirms the consistency of the age scale of B379 between the Padova isochrones and the 2006 VandenBerg isochrones, i.e., the comparison of results between Brown et al. (2004a) and Ma et al. (2007) is meaningful. The results obtained in this paper are: the metallicity [M/H] = -0.325, the age $\tau=11.0\pm1.5$ Gyr, the reddening value E(B-V) = 0.08, and the distance modulus $(m-M)_{0}=24.44\pm0.10$. Comment: Accepted for publication in PASP, 7 pages, 1 figure and 1 table
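
For readers who want the quoted distance modulus as a physical distance, the short Python sketch below applies the standard definition $(m-M)_0 = 5\log_{10}(d / 10\,\mathrm{pc})$ with first-order error propagation. The function names are hypothetical, and the conversion is a textbook relation rather than a calculation reported in the abstract.

```python
import math

def distance_from_modulus(mu):
    """Distance in parsecs from a true distance modulus mu = 5 * log10(d / 10 pc)."""
    return 10.0 ** (mu / 5.0 + 1.0)

def distance_uncertainty(mu, sigma_mu):
    """First-order propagated uncertainty: sigma_d = d * ln(10) / 5 * sigma_mu."""
    return distance_from_modulus(mu) * math.log(10.0) / 5.0 * sigma_mu

mu, sigma_mu = 24.44, 0.10  # distance modulus and its error, as quoted in the abstract
d_kpc = distance_from_modulus(mu) / 1e3
sigma_kpc = distance_uncertainty(mu, sigma_mu) / 1e3
print(f"d = {d_kpc:.0f} +/- {sigma_kpc:.0f} kpc")
```

Because the modulus already carries the subscript 0, it is taken here as reddening-corrected, so no extinction term is applied.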