353 research outputs found
Factor modeling for high-dimensional time series: Inference for the number of factors
This paper deals with the factor modeling for high-dimensional time series
based on a dimension-reduction viewpoint. Under stationary settings, the
inference is simple in the sense that both the number of factors and the factor
loadings are estimated in terms of an eigenanalysis for a nonnegative definite
matrix, and is therefore applicable when the dimension of time series is on the
order of a few thousands. Asymptotic properties of the proposed method are
investigated under two settings: (i) the sample size goes to infinity while the
dimension of time series is fixed; and (ii) both the sample size and the
dimension of time series go to infinity together. In particular, our estimators
for zero-eigenvalues enjoy faster convergence (or slower divergence) rates,
hence making the estimation for the number of factors easier. In particular,
when the sample size and the dimension of time series go to infinity together,
the estimators for the eigenvalues are no longer consistent. However, our
estimator for the number of the factors, which is based on the ratios of the
estimated eigenvalues, still works fine. Furthermore, this estimation shows the
so-called "blessing of dimensionality" property in the sense that the
performance of the estimation may improve when the dimension of time series
increases. A two-step procedure is investigated when the factors are of
different degrees of strength. Numerical illustration with both simulated and
real data is also reported.
Comment: Published at http://dx.doi.org/10.1214/12-AOS970 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
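The abstract above describes estimating the number of factors from ratios of eigenvalues of a nonnegative definite matrix. A minimal sketch of that idea follows. The abstract does not say how the matrix is built, so the sketch assumes it is the sum over lags 1..k0 of the sample lag-k autocovariance matrix times its transpose. The function names, the default k0=1, the search bound p // 2 and the simulated AR(1) factor design are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_num_factors(y, k0=1, max_r=None):
    """Ratio-based estimate of the number of factors (illustrative sketch).

    y     : (n, p) array, one row per time point.
    k0    : number of lags pooled into the matrix (assumed default).
    max_r : largest candidate number of factors (assumed default p // 2).
    """
    n, p = y.shape
    yc = y - y.mean(axis=0)
    m = np.zeros((p, p))
    for k in range(1, k0 + 1):
        # Sample lag-k autocovariance matrix, shape (p, p).
        sigma_k = yc[k:].T @ yc[:-k] / (n - k)
        # sigma_k @ sigma_k.T is nonnegative definite by construction.
        m += sigma_k @ sigma_k.T
    eigvals = np.linalg.eigvalsh(m)[::-1]  # sorted in descending order
    if max_r is None:
        max_r = p // 2
    # Ratio of each eigenvalue to the one before it; the estimate is the
    # position where this ratio is smallest.
    ratios = eigvals[1:max_r + 1] / eigvals[:max_r]
    r_hat = int(np.argmin(ratios)) + 1
    return r_hat, eigvals


def simulate_factor_series(n=500, p=50, r=3, phi=0.7, seed=0):
    """Toy data: r strong AR(1) factors plus white noise (assumed design)."""
    rng = np.random.default_rng(seed)
    loadings = rng.standard_normal((p, r))
    factors = np.zeros((n, r))
    for t in range(1, n):
        factors[t] = phi * factors[t - 1] + rng.standard_normal(r)
    noise = 0.5 * rng.standard_normal((n, p))
    return factors @ loadings.T + noise
```

Calling estimate_num_factors(simulate_factor_series()) returns the estimated number of factors together with the eigenvalues, so the ratio profile can be inspected directly. Capping the search at p // 2 is a practical guard against the very small trailing eigenvalues, whose ratios are unstable.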
Nonparametric eigenvalue-regularized precision or covariance matrix estimator
We introduce nonparametric regularization of the eigenvalues of a sample covariance matrix through splitting of the data (NERCOME), and prove that NERCOME enjoys asymptotically optimal nonlinear shrinkage of eigenvalues with respect to the Frobenius norm. One advantage of NERCOME is its computational speed when the dimension is not too large. We prove that NERCOME is positive definite almost surely, as long as the true covariance matrix is so, even when the dimension is larger than the sample size. With respect to Stein's loss function, the inverse of our estimator is asymptotically the optimal precision matrix estimator. Asymptotic efficiency loss is defined through comparison with an ideal estimator, which assumes knowledge of the true covariance matrix. We show that the asymptotic efficiency loss of NERCOME is almost surely 0 with a suitable split location of the data. We also show that all the aforementioned optimality results hold for data with a factor structure. Our method avoids the need to first estimate any unknowns from a factor model, and directly gives the covariance or precision matrix estimator, which can be useful when factor analysis is not the ultimate goal. We compare the performance of our estimators with other methods through extensive simulations and real data analysis.
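The data-splitting idea in the abstract above can be sketched as follows. This is one plausible reading, not necessarily the paper's exact procedure. It assumes zero-mean data, takes eigenvectors from the sample covariance of the first part of the split, and replaces each eigenvalue with the quadratic form of that eigenvector with the sample covariance of the second part. The function name and the split location n1 are hypothetical, and the sketch does not address how to choose a suitable split or whether to average over several splits.

```python
import numpy as np


def split_regularized_cov(x, n1):
    """Covariance estimate with eigenvalues regularized via a data split.

    x  : (n, p) array of zero-mean observations, one row per observation.
    n1 : number of rows assigned to the first part (assumed split location).
    """
    n, p = x.shape
    if not 0 < n1 < n:
        raise ValueError("n1 must satisfy 0 < n1 < n")
    x1, x2 = x[:n1], x[n1:]
    s1 = x1.T @ x1 / n1          # sample covariance of the first part
    s2 = x2.T @ x2 / (n - n1)    # sample covariance of the second part
    # Eigenvectors come from the first part only.
    _, vecs = np.linalg.eigh(s1)
    # Regularized eigenvalues: v_i' S2 v_i for each eigenvector v_i.
    d = np.sum(vecs * (s2 @ vecs), axis=0)
    # Recombine: sum_i d_i v_i v_i'.
    return (vecs * d) @ vecs.T
```

Each regularized eigenvalue equals a squared norm of the second part projected on an eigenvector, scaled by the size of that part. This is why the result can remain positive definite even when p exceeds n, in line with the claim in the abstract.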
Identification of integrin drug targets for 17 solid tumor types.
Integrins are contributors to remodeling of the extracellular matrix and cell migration. Integrins participate in the assembly of the actin cytoskeleton, regulate growth factor signaling pathways and cell proliferation, and control cell motility. In solid tumors, integrins are involved in promoting metastasis to distant sites and angiogenesis. Integrins are a key target in cancer therapy and imaging. Integrin antagonists have proven successful in halting invasion and migration of tumors. Overexpressed integrins are prime anti-cancer drug targets. To streamline the development of specific integrin cancer therapeutics, we curated data to predict which integrin heterodimers are plausible therapeutic targets against 17 different solid tumors. Computational analysis of The Cancer Genome Atlas (TCGA) gene expression data revealed a set of integrin targets that are differentially expressed in tumors. Filtered by FPKM (Fragments Per Kilobase of transcript per Million mapped reads) expression level, overexpressed subunits were paired into heterodimeric protein targets. By comparing the RNA-seq differential expression results with immunohistochemistry (IHC) data, overexpressed integrin subunits were validated. Biologics and small molecule drug compounds against these identified overexpressed subunits and heterodimeric receptors are potential therapeutics against these cancers. In addition, high-affinity and high-specificity ligands against these integrins can serve as efficient vehicles for delivery of cancer drugs, nanotherapeutics, or imaging probes against cancer.
Rank and factor loadings estimation in time series tensor factor model by pre-averaging
The idiosyncratic components of a tensor time series factor model can exhibit serial correlations (e.g., in finance or economic data), ruling out many state-of-the-art methods that assume white/independent idiosyncratic components. While the traditional higher order orthogonal iteration (HOOI) is proved to converge to a set of factor loading matrices, their closeness to the true underlying factor loading matrices is in general not established, or is established only under i.i.d. Gaussian noise. In the presence of serial and cross-correlations in the idiosyncratic components and time series variables with only bounded fourth-order moments, for tensor time series data with tensor order two or above, we propose a pre-averaging procedure that can be considered a random projection method. The estimated directions corresponding to the strongest factors are then used for projecting the data for a potentially improved re-estimation of the factor loading spaces themselves, with theoretical guarantees and rates of convergence spelt out when not all factors are pervasive. We also propose a new rank estimation method, which utilizes correlation information from the projected data. Extensive simulations are performed and compared to other state-of-the-art or traditional alternatives. A set of tensor-valued NYC taxi data is also analyzed.
A nonparametric eigenvalue-regularized integrated covariance matrix estimator for asset return data
In high-frequency data analysis, the extreme eigenvalues of a realized covariance matrix are biased when its dimension p is large relative to the sample size n. Furthermore, with non-synchronous trading and contamination of microstructure noise, we propose a nonparametrically eigenvalue-regularized integrated covariance matrix estimator (NERIVE) which does not assume specific structures for the underlying integrated covariance matrix. We show that NERIVE is positive definite in probability, with extreme eigenvalues shrunk nonlinearly under the high dimensional framework p/n → c > 0. We also prove that in portfolio allocation, the minimum variance optimal weight vector constructed using NERIVE has maximum exposure and actual risk upper bounds of order p. Incidentally, the same maximum exposure bound is also satisfied by the theoretical minimum variance portfolio weights. All these results hold true also under a jump-diffusion model for the log-price processes with jumps removed using the wavelet method proposed in Fan and Wang (2007). They are further extended to accommodate the existence of pervasive factors such as a market factor under the setting p^{3/2}/n → c > 0. The practical performance of NERIVE is illustrated by comparing to the usual two-scale realized covariance matrix as well as some other nonparametric alternatives using different simulation settings and a real data set.
Daan Go: An Entrepreneurial Case Study
Daan Go is a brand of Euro-Asian food services, now consisting of only its specialty bakery Daan Go Cake Lab. Having various retail locations across the Greater Toronto Area with current distribution channels that serve both Ontario and Quebec, Daan Go is currently focusing on expanding its bakery operations across Canada. To fulfill its vision of becoming a world-wide desserts brand, Daan Go finds itself facing unique challenges. Based on a case analysis, this report examines how Daan Go can leverage its current key capabilities, while maintaining its competitive position within the niche market it serves.
This report provides an overview of Daan Go, including its retail locations, ownership, management and workforce, products and services, and its current marketing strategy. Next, the report analyzes the competitive landscape by comparing Daan Go to local speciality bakeries, including direct competitor bakeries that also specialize in Euro-Asian fusion desserts, and general competitor bakeries. The report's situational analysis examines Daan Go's operating environment through the use of a PESTEL analysis, Porter's 5 Forces, and a SWOT analysis. These analyses inform the report's recommendations to Daan Go as to its next steps as the bakery continues to expand throughout the province.
Potential solutions to issues identified by Daan Go are evaluated through the use of multi-criteria decision matrices. An implementation plan and a marketing plan follow, including a 5-year financial projection. The report's recommendations are based upon an analysis of Daan Go using recognized business theories and frameworks.
Factor Strength Estimation in Vector and Matrix Time Series Factor Models
Most factor modelling research in vector or matrix-valued time series assumes
all factors are pervasive/strong and leave weaker factors and their
corresponding series to the noise. Weaker factors can in fact be important to a
group of observed variables, for instance a sector factor in a large portfolio
of stocks may only affect particular sectors, but can be important both in
interpretations and predictions for those stocks. While more recent factor
modelling research does consider "local" factors, which are weak factors with
sparse corresponding factor loadings, there are real data examples in the
literature where factors are weak because of weak influence on most/all
observed variables, so that the corresponding factor loadings are not sparse
(non-local). As a first in the literature, we propose estimators of factor
strengths for both local and non-local weak factors, and prove their
consistency with rates of convergence spelt out for both vector and
matrix-valued time series factor models. Factor strength has important
implications for which estimation procedure to follow for a factor model, as
well as for the estimation accuracy of various estimators (Chen and Lam, 2024). Simulation
results show that our estimators have good performance in recovering the true
factor strengths, and an analysis on the NYC taxi traffic data indicates the
existence of weak factors in the data which may not be localized
Detection and estimation of block structure in spatial weight matrix
In many economic applications, it is often of interest to categorize, classify or label individuals by groups based on similarity of observed behavior. We propose a method that captures group affiliation or, equivalently, estimates the block structure of a neighboring matrix embedded in a spatial econometric model. The main result for the LASSO estimator shows that off-diagonal block elements are estimated as zeros with high probability, a property defined as "zero-block consistency". Furthermore, we present and prove zero-block consistency for the estimated spatial weight matrix even under a thin margin of interaction between groups. The tool developed in this paper can be used as a verification of block structure by applied researchers, or as an exploration tool for estimating unknown block structures. We analyzed the US Senate voting data and correctly identified blocks based on party affiliations. Simulations also show that the method performs well.
Estimation and selection of spatial weight matrix in a spatial lag model
Spatial econometric models allow for interactions among variables through the specification of a spatial weight matrix. Practitioners often face the risk of misspecification of such a matrix. In many problems a number of potential specifications exist, such as geographic distances, or various economic quantities among variables. We propose estimating the best linear combination of these specifications, plus a potentially sparse adjustment matrix. The coefficients in the linear combination, together with the sparse adjustment matrix, are subjected to variable selection through the adaptive Least Absolute Shrinkage and Selection Operator (LASSO). As a special case, if no spatial weight matrices are specified, the sparse adjustment matrix becomes a sparse spatial weight matrix estimator of our model. Our method can therefore be seen as a unified framework for the estimation and selection of a spatial weight matrix. The rates of convergence of all proposed estimators are determined when the number of time series variables can grow faster than the number of time points for data, while oracle properties for all penalized estimators are presented. Simulations and an application to stocks data confirm the good performance of our procedure.