The Shape of Art History in the Eyes of the Machine
How does the machine classify styles in art? And how does it relate to art
historians' methods for analyzing style? Several studies have shown the ability
of the machine to learn and predict style categories, such as Renaissance,
Baroque, Impressionism, etc., from images of paintings. This implies that the
machine can learn an internal representation encoding discriminative features
through its visual analysis. However, such a representation is not necessarily
interpretable. We conducted a comprehensive study of several of the
state-of-the-art convolutional neural networks applied to the task of style
classification on 77K images of paintings, and analyzed the learned
representation through correlation analysis with concepts derived from art
history. Surprisingly, the networks could place the works of art in a smooth
temporal arrangement mainly based on learning style labels, without any a
priori knowledge of time of creation, the historical time and context of
styles, or relations between styles. The learned representations showed that
there are few underlying factors that explain the visual variations of style in
art. Some of these factors were found to correlate with style patterns
suggested by Heinrich Wölfflin (1846-1945). The learned representations also
consistently highlighted certain artists as the extreme distinctive
representatives of their styles, which quantitatively confirms art historians'
observations.
On orthogonal projections for dimension reduction and applications in augmented target loss functions for learning problems
The use of orthogonal projections on high-dimensional input and target data
in learning frameworks is studied. First, we investigate the relations between
two standard objectives in dimension reduction, preservation of variance and of
pairwise relative distances. Investigations of their asymptotic correlation as
well as numerical experiments show that a projection usually does not satisfy
both objectives at once. In a standard classification problem we determine
projections on the input data that balance the objectives and compare
subsequent results. Next, we extend our application of orthogonal projections
to deep learning tasks and introduce a general framework of augmented target
loss functions. These loss functions integrate additional information via
transformations and projections of the target data. In two supervised learning
problems, clinical image segmentation and music information classification, the
application of our proposed augmented target loss functions increases the
accuracy.
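The tension between the two objectives named in this abstract can be illustrated with a minimal sketch. It assumes NumPy, and every function name below is hypothetical rather than taken from the paper. It compares a variance-maximizing (PCA) projection with a random orthogonal projection on two measures: the fraction of variance kept, and how unevenly pairwise distances are rescaled.

```python
import numpy as np

def random_orthogonal_projection(d, k, rng):
    # Reduced QR of a Gaussian matrix yields a d x k matrix with orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q

def pca_projection(x, k):
    # The top-k right singular vectors of the centered data maximize projected variance.
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return vt[:k].T

def variance_ratio(x, q):
    # Fraction of the total variance that survives the projection (between 0 and 1).
    xc = x - x.mean(axis=0)
    return float(np.sum((xc @ q) ** 2) / np.sum(xc ** 2))

def distance_ratio_spread(x, q):
    # Coefficient of variation of projected/original pairwise distances.
    # Zero means every distance is scaled by the same factor, so relative
    # distances are preserved exactly.
    iu = np.triu_indices(x.shape[0], k=1)

    def pairwise(a):
        diff = a[:, None, :] - a[None, :, :]
        return np.sqrt(np.sum(diff ** 2, axis=-1))[iu]

    ratio = pairwise(x @ q) / pairwise(x)
    return float(np.std(ratio) / np.mean(ratio))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic anisotropic data; the scales are an arbitrary illustrative choice.
    x = rng.standard_normal((200, 30)) * np.linspace(3.0, 0.5, 30)
    for name, q in [("pca", pca_projection(x, 5)),
                    ("random", random_orthogonal_projection(30, 5, rng))]:
        print(name, variance_ratio(x, q), distance_ratio_spread(x, q))
```

Printing both numbers side by side for each projection shows how the two criteria can rank projections differently on a given data set; the sketch does not reproduce the paper's balancing procedure.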
Language Identification Using Visual Features
Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to identify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments), or impossible (no audio signal is available). Research in this field is also beneficial in the related field of automatic lip-reading. This paper introduces several methods for visual language identification (VLID). They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discriminating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin-tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we can obtain an error-rate of < 10% in discriminating between Arabic and English on 19 speakers and using about 30s of visual speech.
Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series
The goal of this work is to learn a parsimonious and informative representation for high-dimensional time series. Conceptually, this comprises two distinct yet tightly coupled tasks: learning a low-dimensional manifold and modeling the dynamical process. These two tasks have a complementary relationship as the temporal constraints provide valuable neighborhood information for dimensionality reduction and conversely, the low-dimensional space allows dynamics to be learnt efficiently. Solving these two tasks simultaneously allows important information to be exchanged mutually. If nonlinear models are required to capture the rich complexity of time series, then the learning problem becomes harder as the nonlinearities in both tasks are coupled. The proposed solution approximates the nonlinear manifold and dynamics using piecewise linear models. The interactions among the linear models are captured in a graphical model. By exploiting the model structure, efficient inference and learning algorithms are obtained without oversimplifying the model of the underlying dynamical process. Evaluation of the proposed framework against competing approaches is conducted in three sets of experiments: dimensionality reduction and reconstruction using synthetic time series, video synthesis using a dynamic texture database, and human motion synthesis, classification and tracking on a benchmark data set. In all experiments, the proposed approach provides superior performance. National Science Foundation (IIS 0308213, IIS 0329009, CNS 0202067
Sparse Bilinear Logistic Regression
In this paper, we introduce the concept of sparse bilinear logistic
regression for decision problems involving explanatory variables that are
two-dimensional matrices. Such problems are common in computer vision,
brain-computer interfaces, style/content factorization, and parallel factor
analysis. The underlying optimization problem is bi-convex; we study its
solution and develop an efficient algorithm based on block coordinate descent.
We provide a theoretical guarantee for global convergence and estimate the
asymptotic convergence rate using the Kurdyka-Łojasiewicz inequality. A
range of experiments with simulated and real data demonstrate that sparse
bilinear logistic regression outperforms current techniques in several
important applications. Comment: 27 pages, 5 figures
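To make the bi-convex structure mentioned in this abstract concrete, here is a minimal sketch, not the authors' algorithm. It assumes a rank-one model in which the score for a matrix input X is u^T X v + b, labels in {0, 1}, an l1 penalty, and alternating proximal-gradient steps on each block. All names, the step size, and the penalty weight are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1; it sets small entries exactly to zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def logistic_loss(xs, y, u, v, b):
    # Mean logistic loss for scores u^T X_n v + b, labels y in {0, 1}.
    z = np.einsum("i,nij,j->n", u, xs, v) + b
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def fit_bilinear(xs, y, lam=0.01, step=0.1, iters=300, seed=0):
    # xs has shape (n, p, q): n samples, each a p x q matrix.
    rng = np.random.default_rng(seed)
    n, p, q = xs.shape
    u, v, b = rng.standard_normal(p), rng.standard_normal(q), 0.0
    for _ in range(iters):
        # u-block: with v fixed this is ordinary logistic regression on X v.
        f = xs @ v                              # shape (n, p)
        r = sigmoid(f @ u + b) - y
        u = soft_threshold(u - step * (f.T @ r) / n, step * lam)
        # v-block: with u fixed the features are X^T u.
        g = np.einsum("i,nij->nj", u, xs)       # shape (n, q)
        r = sigmoid(g @ v + b) - y
        v = soft_threshold(v - step * (g.T @ r) / n, step * lam)
        # Unpenalized bias update.
        b -= step * float(np.mean(sigmoid(g @ v + b) - y))
    return u, v, b
```

Each block update is a convex subproblem because the score is linear in u when v is fixed and vice versa, which is what makes a block coordinate scheme natural here. Calling `fit_bilinear` with `iters=0` returns the random initialization, which is convenient for comparing the loss before and after fitting.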
Nonlinear denoising of transient signals with application to event related potentials
We present a new wavelet-based method for the denoising of event-related
potentials (ERPs), employing techniques recently developed for the paradigm of
deterministic chaotic systems. The denoising scheme has been constructed to be
appropriate for short and transient time sequences using circular state space
embedding. Its effectiveness was successfully tested on simulated signals as
well as on ERPs recorded from within a human brain. The method enables the
study of individual ERPs against strong ongoing brain electrical activity. Comment: 16 pages, Postscript, 6 figures, Physica D in press
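The abstract does not define circular state space embedding. One plausible reading, offered here only as an assumption, is a delay embedding whose indices wrap around the end of the sequence, so that a short signal of N samples still yields N delay vectors. The function name and parameters below are hypothetical.

```python
import numpy as np

def circular_delay_embedding(signal, dim, delay):
    # Row i is (s[i], s[(i + delay) % n], ..., s[(i + (dim - 1) * delay) % n]).
    # Wrapping the indices is the assumed meaning of "circular"; it avoids
    # discarding the last (dim - 1) * delay samples of a short sequence.
    s = np.asarray(signal, dtype=float)
    n = s.size
    idx = (np.arange(n)[:, None] + delay * np.arange(dim)[None, :]) % n
    return s[idx]
```

For a five-sample signal with `dim=3` and `delay=2`, the result has five rows instead of the single row a non-wrapping embedding would give.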