The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Large Covariance Estimation by Thresholding Principal Orthogonal Complements
This paper deals with the estimation of a high-dimensional covariance with a
conditional sparsity structure and fast-diverging eigenvalues. By assuming a
sparse error covariance matrix in an approximate factor model, we allow for the
presence of some cross-sectional correlation even after taking out common but
unobservable factors. We introduce the Principal Orthogonal complEment
Thresholding (POET) method to explore such an approximate factor structure with
sparsity. The POET estimator includes the sample covariance matrix, the
factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding
estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator
(Cai and Liu, 2011) as specific examples. We provide mathematical insights into
when factor analysis is approximately the same as principal component
analysis for high-dimensional data. The rates of convergence of the sparse
residual covariance matrix and the conditional sparse covariance matrix are
studied under various norms. It is shown that the impact of estimating the
unknown factors vanishes as the dimensionality increases. The uniform rates of
convergence for the unobserved factors and their factor loadings are derived.
The asymptotic results are also verified by extensive simulation studies.
Finally, a real data application on portfolio allocation is presented.
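The abstract's description of POET (keep the leading principal components of the sample covariance, then threshold what remains) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `poet_sketch`, the assumption that the number of factors `K` is known, and the use of a single constant soft-threshold `tau` on the off-diagonal residual entries are all simplifications introduced here for illustration.

```python
import numpy as np


def poet_sketch(X, K, tau):
    """Illustrative POET-style covariance estimate (hypothetical helper).

    X   : (n, p) array, n observations of p variables.
    K   : number of principal components to keep (assumed known here).
    tau : constant soft-threshold for off-diagonal residual entries
          (an assumption; other thresholding rules are possible).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n  # sample covariance matrix

    # Spectral decomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    top = np.argsort(eigvals)[::-1][:K]
    lam, V = eigvals[top], eigvecs[:, top]

    # Low-rank part from the first K principal components.
    low_rank = (V * lam) @ V.T

    # Principal orthogonal complement: what the K components do not explain.
    R = S - low_rank

    # Soft-threshold the off-diagonal entries; keep the diagonal untouched.
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))

    return low_rank + R_thr
```

In this sketch, `tau = 0` returns the sample covariance matrix, a very large `tau` leaves a low-rank part plus a diagonal residual, and `K = 0` reduces to plain thresholding of the sample covariance, which mirrors the special cases listed in the abstract.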
Computer Analysis of Architecture Using Automatic Image Understanding
In the past few years, computer vision and pattern recognition systems have
become increasingly powerful, expanding the range of tasks that machine
vision can automate. Here we show that computer analysis of building images
can quantify similarities between the architectural styles of different
cities.
Images of buildings from 18 cities and three countries were acquired using
Google StreetView, and were used to train a machine vision system to
automatically identify the location of the imaged building based on the image
visual content. Experimental results show that the computer analysis
can automatically identify the geographical location of the StreetView image.
More importantly, the algorithm was able to group the cities and countries and
provide a phylogeny of the similarities between architectural styles as
captured by StreetView images. These results demonstrate that computer vision
and pattern recognition algorithms can perform the complex cognitive task of
analyzing images of buildings, and can be used to measure and quantify visual
similarities and differences between architectural styles. This
experiment provides a new paradigm for studying architecture, based on a
quantitative approach that can enhance the traditional manual observation and
analysis. The source code used for the analysis is open and publicly available
- …