13 research outputs found
Identifying regions of interest in mammogram images
Screening mammography is the primary preventive strategy for early detection of breast cancer and an essential input to breast cancer risk prediction and the application of prevention/risk management guidelines. Identifying regions of interest within mammogram images that are associated with 5- or 10-year breast cancer risk is therefore clinically meaningful. The problem is complicated by the irregular boundary posed by the semi-circular domain of the breast area within mammograms. Accommodating the irregular domain is especially crucial when identifying regions of interest, as the true signal comes only from the semi-circular breast region, while the area outside contributes only noise. We address these challenges by introducing a proportional hazards model with imaging predictors characterized by bivariate splines over triangulation. Model sparsity is enforced with the group lasso penalty function. We apply the proposed method to the motivating Joanne Knight Breast Health Cohort to illustrate important risk patterns and show that the proposed method achieves higher discriminatory performance.
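The group lasso penalty mentioned above enforces sparsity at the level of predefined groups of coefficients (here, spline coefficients tied to regions of the triangulation). As a minimal sketch, its proximal operator (group soft-thresholding) can be written as follows; the grouping and values are purely hypothetical, not taken from the paper:

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the group lasso penalty: each group of
    coefficients is shrunk toward zero jointly, and zeroed out
    entirely when its Euclidean norm falls below lam."""
    out = np.zeros_like(beta, dtype=float)
    for idx in groups:
        g = beta[idx]
        norm = np.linalg.norm(g)
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * g
    return out

# toy example: two groups of two coefficients each
beta = np.array([0.1, -0.2, 3.0, 4.0])
groups = [np.array([0, 1]), np.array([2, 3])]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
# the small-norm first group is removed as a whole; the second is shrunk jointly
```

This all-or-nothing behavior at the group level is what lets the penalty switch entire image regions in or out of the risk model.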
Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains
Existing approaches for multivariate functional principal component analysis
are restricted to data on the same one-dimensional interval. The presented
approach focuses on multivariate functional data on different domains that may
differ in dimension, e.g. functions and images. The theoretical basis for
multivariate functional principal component analysis is given in terms of a
Karhunen-Lo\`eve Theorem. For the practically relevant case of a finite
Karhunen-Lo\`eve representation, a relationship between univariate and
multivariate functional principal component analysis is established. This
offers an estimation strategy to calculate multivariate functional principal
components and scores based on their univariate counterparts. For the resulting
estimators, asymptotic results are derived. The approach can be extended to
finite univariate expansions in general, not necessarily orthonormal, bases. It
is also applicable to sparse functional data or data with measurement error. A
flexible R-implementation is available on CRAN. The new method is shown to be
competitive to existing approaches for data observed on a common
one-dimensional domain. The motivating application is a neuroimaging study,
where the goal is to explore how longitudinal trajectories of a
neuropsychological test score covary with FDG-PET brain scans at baseline.
Supplementary material, including detailed proofs, additional simulation
results and software are available online.
Comment: Revised Version. R code for the online appendix is available in the .zip file associated with this article in subdirectory "/Software". The software associated with this article is available on CRAN (packages funData and MFPCA).
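The estimation strategy described above, multivariate components obtained from univariate counterparts, can be sketched on simulated data. The code below is only an illustration: it uses plain PCA on discretized functions as the univariate step and purely simulated data, not the MFPCA package itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def univariate_fpca(X, n_comp):
    """Plain PCA on one discretized functional element (n_subjects x n_grid)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]   # subject-level scores
    components = Vt[:n_comp]              # discretized eigenfunctions
    return scores, components

# two functional elements observed on grids of different dimension,
# e.g. curves (1D) and flattened images (2D) -- simulated placeholders
X1 = rng.standard_normal((50, 30))
X2 = rng.standard_normal((50, 16 * 16))

s1, _ = univariate_fpca(X1, 3)
s2, _ = univariate_fpca(X2, 3)

# multivariate step: eigendecompose the covariance of the stacked
# univariate scores; multivariate scores are linear combinations of them
Z = np.hstack([s1, s2])
C = np.cov(Z, rowvar=False)
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
mv_scores = Z @ evecs[:, order]
```

The key point of the strategy survives even in this toy version: the expensive functional step is done element by element, and the multivariate step reduces to a small eigenproblem on the score covariance.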
Multivariate Analysis for Multiple Network Data via Semi-Symmetric Tensor PCA
Network data are commonly collected in a variety of applications,
representing either directly measured or statistically inferred connections
between features of interest. In an increasing number of domains, these
networks are collected over time, such as interactions between users of a
social media platform on different days, or across multiple subjects, such as
in multi-subject studies of brain connectivity. When analyzing multiple large
networks, dimensionality reduction techniques are often used to embed networks
in a more tractable low-dimensional space. To this end, we develop a framework
for principal components analysis (PCA) on collections of networks via a
specialized tensor decomposition we term Semi-Symmetric Tensor PCA or SS-TPCA.
We derive computationally efficient algorithms for computing our proposed
SS-TPCA decomposition and establish statistical efficiency of our approach
under a standard low-rank signal plus noise model. Remarkably, we show that
SS-TPCA achieves the same estimation accuracy as classical matrix PCA, with
error proportional to the square root of the number of vertices in the network
and not the number of edges as might be expected. Our framework inherits many
of the strengths of classical PCA and is suitable for a wide range of
unsupervised learning tasks, including identifying principal networks,
isolating meaningful changepoints or outlying observations, and for
characterizing the "variability network" of the most varying edges. Finally, we
demonstrate the effectiveness of our proposal on simulated data and on an
example from empirical legal studies. The techniques used to establish our main
consistency results are surprisingly straightforward and may find use in a
variety of other network analysis problems
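A rank-one semi-symmetric fit of the kind described above, a shared "principal network" plus one loading per observed network, can be sketched with alternating updates. The updates below are our own illustration of the decomposition on simulated data, not necessarily the algorithm of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated collection of N symmetric networks on p vertices:
# A_k = u_k * v v^T + noise, with a shared principal network vv^T
p, N = 20, 30
v_true = rng.standard_normal(p)
v_true /= np.linalg.norm(v_true)
u_true = rng.standard_normal(N)
A = np.stack([u * np.outer(v_true, v_true) for u in u_true])
A += 0.1 * rng.standard_normal(A.shape)
A = (A + A.transpose(0, 2, 1)) / 2    # keep each slice symmetric

# alternating updates for the rank-one semi-symmetric fit
v = rng.standard_normal(p)
v /= np.linalg.norm(v)
for _ in range(50):
    u = np.einsum('kij,i,j->k', A, v, v)   # loading of each network on vv^T
    M = np.einsum('k,kij->ij', u, A)       # u-weighted average network
    w, V = np.linalg.eigh(M)
    v = V[:, np.argmax(np.abs(w))]         # leading eigenvector
u = np.einsum('kij,i,j->k', A, v, v)       # final per-network loadings
```

Note that the per-slice symmetry is what makes a single vertex factor v sufficient, which is the "semi-symmetric" structure the abstract refers to.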
On the use of the Gram matrix for multivariate functional principal components analysis
Dimension reduction is crucial in functional data analysis (FDA). The key
tool to reduce the dimension of the data is functional principal component
analysis. Existing approaches for functional principal component analysis
usually involve the diagonalization of the covariance operator. With the
increasing size and complexity of functional datasets, estimating the
covariance operator has become more challenging. Therefore, there is a growing
need for efficient methodologies to estimate the eigencomponents. Using the
duality of the space of observations and the space of functional features, we
propose to use the inner-product between the curves to estimate the
eigenelements of multivariate and multidimensional functional datasets. The
relationship between the eigenelements of the covariance operator and those of
the inner-product matrix is established. We explore the application of these
methodologies in several FDA settings and provide general guidance on their
usability.
Comment: 23 pages, 12 figures.
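The duality argument can be made concrete on a dense grid: with few curves observed on a fine grid, the n x n inner-product (Gram) matrix is much smaller than the p x p discretized covariance, and its eigenvectors recover the eigenfunctions. The following is a minimal sketch with simulated low-rank data and a uniform quadrature weight, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 500                # few curves, fine grid: duality pays off here
X = rng.standard_normal((n, 5)) @ rng.standard_normal((5, p))  # rank-5 curves
Xc = X - X.mean(axis=0)
w = 1.0 / p                   # quadrature weight of a uniform grid on [0, 1]

# n x n Gram (inner-product) matrix instead of the p x p covariance
G = Xc @ Xc.T * w / n
evals, U = np.linalg.eigh(G)
order = np.argsort(evals)[::-1][:5]
evals, U = evals[order], U[:, order]

# eigenfunctions of the covariance operator recovered from Gram eigenvectors,
# scaled so that each is orthonormal under the quadrature weight w
phi = (Xc.T @ U) / np.sqrt(n * evals)
```

Both matrices share the same nonzero eigenvalues; only the eigenvector-to-eigenfunction rescaling differs, which is the relationship the abstract establishes in general.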
Multi-Rank Sparse and Functional PCA: Manifold Optimization and Iterative Deflation Techniques
We consider the problem of estimating multiple principal components using the
recently-proposed Sparse and Functional Principal Components Analysis (SFPCA)
estimator. We first propose an extension of SFPCA which estimates several
principal components simultaneously using manifold optimization techniques to
enforce orthogonality constraints. While effective, this approach is
computationally burdensome so we also consider iterative deflation approaches
which take advantage of existing fast algorithms for rank-one SFPCA. We show
that alternative deflation schemes can more efficiently extract signal from the
data, in turn improving estimation of subsequent components. Finally, we
compare the performance of our manifold optimization and deflation techniques
in a scenario where orthogonality does not hold and find that they still lead
to significantly improved performance.
Comment: To appear in IEEE CAMSAP 201
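Projection deflation, one scheme in this family, can be sketched with an exact rank-one SVD standing in for the fast rank-one SFPCA solver (an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def rank_one_fit(X):
    """Stand-in for a rank-one SFPCA solver: the leading singular triple."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, 0], s[0], Vt[0]

def projection_deflation(X, n_comp):
    """Extract components one at a time, projecting out each estimated
    pair so that later rank-one fits cannot reuse that signal."""
    comps = []
    Xd = X.copy()
    for _ in range(n_comp):
        u, d, v = rank_one_fit(Xd)
        comps.append((u, d, v))
        Xd = Xd - np.outer(u, u @ Xd)    # left projection: (I - uu^T) Xd
        Xd = Xd - np.outer(Xd @ v, v)    # right projection: Xd (I - vv^T)
    return comps

X = rng.standard_normal((30, 40))
comps = projection_deflation(X, 3)
```

With a sparse or smoothed rank-one solver in place of the SVD, the estimated vectors are no longer exact singular vectors, and the choice of deflation scheme then determines how much residual signal is left for subsequent components, which is the effect the abstract compares.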
Object-Oriented Software for Functional Data
This paper introduces the funData R package as an object-oriented implementation of functional data. It implements a unified framework for dense univariate and multivariate functional data on one- and higher-dimensional domains as well as for irregular functional data. The aim of this package is to provide a user-friendly, self-contained core toolbox for functional data, including important functionalities for creating, accessing and modifying functional data objects, that can serve as a basis for other packages. The package further contains a full simulation toolbox, which is a useful feature when implementing and testing new methodological developments. Based on the theory of object-oriented data analysis, it is shown why it is natural to implement functional data in an object-oriented manner. The classes and methods provided by funData are illustrated in many examples using two freely available datasets. The MFPCA package, which implements multivariate functional principal component analysis, is presented as an example of an advanced methodological package that uses the funData package as a basis, including a case study with real data. Both packages are publicly available on GitHub and the Comprehensive R Archive Network.
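The object-oriented idea, a functional datum as a grid of argument values bundled with observed values and methods that operate on the pair, can be sketched outside R as well. The class below is a minimal Python analogue with hypothetical names, not a port of the funData API:

```python
import numpy as np

class FunData:
    """Minimal analogue of a dense univariate functional data object:
    a common grid (argvals) plus one row of values per observed function."""

    def __init__(self, argvals, values):
        argvals = np.asarray(argvals, dtype=float)
        values = np.asarray(values, dtype=float)
        if values.shape[1] != argvals.size:
            raise ValueError("values must have one column per grid point")
        self.argvals = argvals
        self.values = values

    def n_obs(self):
        """Number of observed functions."""
        return self.values.shape[0]

    def mean(self):
        """Pointwise mean function, returned as a new FunData object."""
        return FunData(self.argvals, self.values.mean(axis=0, keepdims=True))

    def __add__(self, other):
        """Pointwise sum; only defined for objects on the same grid."""
        if not np.array_equal(self.argvals, other.argvals):
            raise ValueError("objects must share the same grid")
        return FunData(self.argvals, self.values + other.values)

t = np.linspace(0, 1, 11)
fd = FunData(t, np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]))
```

Keeping grid and values together, and validating them in one place, is exactly what lets downstream methods (means, sums, principal components) be written against the object rather than against raw matrices.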