Comprehensive evaluation of matrix factorization methods for the analysis of DNA microarray gene expression data

Author: A Hubert
A Hyvarinen
AL Edwards
BS Everitt
D Dueck
DD Lee
DL Davies
EL Lehmann
HC Romesburg
HJ Chung
HJ Chung
Hwa Jeong Seo
J Bezdek
J Dunn
Je-Gun Joung
JP Brunet
Ju Han Kim
KY Yeung
M Halkidi
Mi Hyeon Kim
N Jardine
P Paatero
P Pauca
PJ Rousseeuw
PO Hoyer
PO Hoyer
Q Qi
R Fisher
R Schachtner
R Sharan
R Tibshirani
RR Sokal
S Bicciato
S Jaccard
S Ma
SL Pomeroy
SZ Li
TR Golub
VR Iyer
W Xu
WM Rand
Y Gao
Y Tan
Y Wang
Y Xu
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Clustering-based methods for gene-expression analysis have been shown to be useful in biomedical applications such as cancer subtype discovery. Among them, matrix factorization (MF) is advantageous for clustering gene expression patterns from DNA microarray experiments, as it efficiently reduces the dimension of gene expression data. Although several MF methods have been proposed for clustering gene expression patterns, a systematic evaluation has not yet been reported. Results: Here we evaluated the clustering performance of orthogonal and non-orthogonal MFs using a total of nine performance measurements on four gene expression datasets and one well-known clustering dataset. Specifically, we employed a non-orthogonal MF algorithm, BSNMF (Bi-directional Sparse Non-negative Matrix Factorization), that applies bi-directional sparseness constraints superimposed on non-negative constraints, comprising a few dominantly co-expressed genes and samples together. Non-orthogonal MFs tended to show better clustering-quality and prediction-accuracy indices than orthogonal MFs as well as a traditional method, K-means. Moreover, BSNMF showed improved performance in these measurements. Non-orthogonal MFs including BSNMF also showed good performance in the functional enrichment test using Gene Ontology terms and biological pathways. Conclusions: The clustering performance of orthogonal and non-orthogonal MFs for microarray data was evaluated by comprehensive measurements. This study showed that non-orthogonal MFs perform better than orthogonal MFs and K-means for clustering microarray data.
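
As a rough illustration of how a non-negative factorization can be used to cluster samples, the sketch below runs plain NMF with the standard Lee-Seung multiplicative updates on a toy genes-by-samples matrix and assigns each sample to its dominant factor. This is not BSNMF: no sparseness constraints are applied, and the function names, the toy matrix `toy_v`, the rank and the iteration count are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Plain NMF (V ~ W H) with multiplicative updates; not the BSNMF variant."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_genes)]
    return w, h


def cluster_samples(h):
    """Assign each sample (column of H) to the factor with the largest coefficient."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


# Hypothetical toy data: 4 genes x 6 samples with two co-expression blocks.
toy_v = [
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 3.0, 3.0],
    [0.0, 0.0, 0.0, 6.0, 6.0, 6.0],
]
w, h = nmf(toy_v, rank=2)
labels = cluster_samples(h)
print(labels)
```

Reading cluster labels off the largest entry in each column of H is one common convention; an evaluation like the one described in the abstract would compare such labels against known classes with several quality indices.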

Springer - Publisher Connector

Exploring matrix factorization techniques for significant genes identification of Alzheimer’s disease microarray gene expression data

Author: A Frigyesi
A Hyvärinen
A Pascual-Montano
AE Teschendorff
AM Martoglio
CY Tsai
DD Lee
EA Fernandez
EM Blalock
EM Blalock
G Hori
H Turner
JC Patra
K Stadlthanner
L Zhu
PO Hoyer
Q Gu
RE Suri
RM Suresh
S Seal
SA Saidi
W Liebermeister
W Liu
Wei Kong
Xiaohua Hu
Xiaoyang Mou
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard to identify because the data are complex, noisy, and high-dimensional, and analyses are often hindered by low statistical power. The main challenge now is to extract valuable biological information from the colossal amount of data to gain insight into biological processes and the mechanisms of human disease. Overcoming this challenge requires mathematical and computational methods that are versatile enough to capture the underlying biological features and simple enough to be applied efficiently to large datasets. Methods: Unsupervised machine learning approaches provide new and efficient analysis of gene expression profiles. In our study, two unsupervised knowledge-based matrix factorization methods, independent component analysis (ICA) and nonnegative matrix factorization (NMF), are integrated to identify significant genes and related pathways in a microarray gene expression dataset of Alzheimer’s disease. The advantage of these two approaches is that they can be performed as a biclustering method by which genes and conditions can be clustered simultaneously. Furthermore, they can group genes into different categories for identifying related diagnostic pathways and regulatory networks. The difference between the two methods is that ICA assumes statistical independence of the expression modes, while NMF needs positivity constraints to generate localized gene expression profiles. Results: In our work, we applied FastICA and non-smooth NMF to DNA microarray gene expression data of Alzheimer’s disease. The simulation results show that both methods can clearly separate severe AD samples from control samples, and the biological analysis of the identified significant genes and their related pathways demonstrated that these genes play a prominent role in AD and relate the activation patterns to AD phenotypes. The combination of these two methods is validated as efficient. Conclusions: Unsupervised matrix factorization methods provide efficient tools for analyzing high-throughput microarray datasets. Because different unsupervised approaches explore correlations in the high-dimensional data space and identify relevant subspaces based on different hypotheses, integrating these methods to explore the underlying biological information in microarray datasets is an efficient approach. By combining the significant genes identified by both ICA and NMF, the biological analysis is highly efficient for elucidating the molecular taxonomy of Alzheimer’s disease and enables better experimental design to further identify potential pathways and therapeutic targets of AD.

Springer - Publisher Connector

1/f² Characteristics and Isotropy in the Fourier Power Spectra of Visual Art, Cartoons, Comics, Mangas, and Different Categories of Photographs

Author: A Hyvärinen
A Torralba
A van der Schaaf
AB Lee
AC Danto
AM Martinez
AS Georghiades
B Spehar
BA Olshausen
BA Olshausen
BC Hansen
C Redies
C Redies
C Redies
Christoph Redies
D Fernandez
DJ Field
DJ Graham
DJ Graham
DJ Graham
DJ Tolhurst
DL Ruderman
DL Ruderman
E Burke
G Paul
GJ Burton
GT Fechner
I Kant
J Alvarez-Ramirez
JH van Hateren
Joachim Denzler
JR Mureika
K Pearson
M Turk
Mark W. Greenlee
Michael Koch
MW Beauvois
N Goodman
PC Mahalanobis
PO Hoyer
RF Voss
RG Bosworth
RP Taylor
S Zeki
W Kandinsky
WE Vinje
WS Geisler
Y Joye
Y Yu
Publication venue: Public Library of Science
Publication date: 01/08/2010

Art images and natural scenes have in common that their radially averaged (1D) Fourier spectral power falls according to a power-law with increasing spatial frequency (1/f² characteristics), which implies that the power spectra have scale-invariant properties. In the present study, we show that other categories of man-made images, cartoons and graphic novels (comics and mangas), have similar properties. Further on, we extend our investigations to 2D power spectra. In order to determine whether the Fourier power spectra of man-made images differed from those of other categories of images (photographs of natural scenes, objects, faces and plants and scientific illustrations), we analyzed their 2D power spectra by principal component analysis. Results indicated that the first fifteen principal components allowed a partial separation of the different image categories. The differences between the image categories were studied in more detail by analyzing whether the mean power and the slope of the power gradients from low to high spatial frequencies varied across orientations in the power spectra. Mean power was generally higher in cardinal orientations both in real-world photographs and artworks, with no systematic difference between the two types of images. However, the slope of the power gradients showed a lower degree of mean variability across spectral orientations (i.e., more isotropy) in art images, cartoons and graphic novels than in photographs of comparable subject matters. Taken together, these results indicate that art images, cartoons and graphic novels possess relatively uniform 1/f² characteristics across all orientations. In conclusion, the man-made stimuli studied, which were presumably produced to evoke pleasant and/or enjoyable visual perception in human observers, form a subset of all images and share statistical properties in their Fourier power spectra. Whether these properties are necessary or sufficient to induce aesthetic perception remains to be investigated.
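
The power-law claim can be made concrete with a small sketch: if spectral power falls as 1/f², a straight-line fit of log power against log frequency has a slope of about -2. The code below fits that slope by ordinary least squares on a synthetic, noise-free spectrum. The function name and the synthetic data are assumptions for illustration; the study's own radial averaging over 2D image spectra is not reproduced here.

```python
import math


def power_law_slope(frequencies, power):
    """Least-squares slope of log10(power) against log10(frequency)."""
    xs = [math.log10(f) for f in frequencies]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic spectrum (hypothetical): power proportional to 1 / f^2.
frequencies = [float(f) for f in range(1, 129)]
power = [1.0 / f ** 2 for f in frequencies]
slope = power_law_slope(frequencies, power)
print(slope)
```

Multiplying every power value by a constant shifts the fitted line without changing its slope, which is one way to see the scale invariance mentioned in the abstract.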

Bayesian group sparse learning for music source separation

Author: A Cichocki
A Lefevre
C Fevotte
CM Bishop
G Saon
H Lee
H Lee
H-L Hsieh
Hsin-Lung Hsieh
J Yoo
J-T Chien
J-T Chien
J-T Chien
J-T Chien
J-T Chien
Jen-Tzung Chien
M Kim
M Kim
M Marlin
M Zhong
MD Hoffman
MD Plumbley
ME Tipping
MN Schmidt
MN Schmidt
PJ Garrigues
PO Hoyer
R Jenatton
R Kompass
R Salakhutdinov
S Bengio
S Chib
S Moussaoui
SD Babacan
Z Duan
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Bayesian Integration and Non-Linear Feedback Control in a Full-Body Motor Task

Author: A Pouget
AD Kuo
AJ Nagengast
B Den Brinker
B Vereijken
CD Fiorillo
CM Harris
D Kersten
D Knill
DA Winter
DB Lockhart
E Todorov
H Sveistrup
H Tassinari
H van der Kooij
Hugo L. Fernandes
Ian H. Stevenson
Iris Vilares
J Burge
J Diedrichsen
J Diedrichsen
J Izawa
J Trommershäuser
JI Gold
JI Gold
K Kording
K Preuschoff
Karl J. Friston
Konrad P. Körding
KP Körding
KP Körding
Kunlin Wei
M Rushworth
M Woollacott
MC Dault
MO Ernst
MO Ernst
P Cisek
PO Hoyer
R Kiani
R Shadmehr
RE Kalman
RJ Peterka
RS Zemel
S Deneve
S Schaal
SH Scott
T Behrens
T Kiemel
WJ Ma
Publication venue: Public Library of Science
Publication date: 01/12/2009

A large number of experiments have asked to what degree human reaching movements can be understood as being close to optimal in a statistical sense. However, little is known about whether these principles are relevant for other classes of movements. Here we analyzed movement in a task that is similar to surfing or snowboarding. Human subjects stand on a force plate that measures their center of pressure. This center of pressure affects the acceleration of a cursor that is displayed in a noisy fashion (as a cloud of dots) on a projection screen while the subject is incentivized to keep the cursor close to a fixed position. We find that salient aspects of observed behavior are well-described by optimal control models where a Bayesian estimation model (Kalman filter) is combined with an optimal controller (either a Linear-Quadratic-Regulator or Bang-bang controller). We find evidence that subjects integrate information over time taking into account uncertainty. However, behavior in this continuous steering task appears to be a highly non-linear function of the visual feedback. While the nervous system appears to implement Bayes-like mechanisms for a full-body, dynamic task, it may additionally take into account the specific costs and constraints of the task
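
To show what integrating information over time while accounting for uncertainty means in the simplest case, here is a scalar Kalman filter sketch. It assumes a random-walk state observed with Gaussian noise, which is far simpler than the cursor dynamics in the study, and it includes no controller. The function name and all parameter values are hypothetical.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state (illustrative assumption).

    Returns a list of (estimate, variance, gain) tuples, one per measurement.
    """
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: the state is assumed to drift, so uncertainty grows.
        p = p + process_var
        # Update: the gain weights the new observation by its reliability.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        history.append((x, p, gain))
    return history


# Hypothetical observations of a cursor held at position 1.0.
observations = [1.0] * 20
reliable = kalman_1d(observations, process_var=0.01, measurement_var=0.5)
noisy = kalman_1d(observations, process_var=0.01, measurement_var=5.0)
print(reliable[0][2], noisy[0][2])
```

The gain is lower when the display is noisier, so each new observation shifts the estimate less. That is the qualitative signature of uncertainty-weighted integration.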

Quantitative historical analysis uncovers a single dimension of complexity that structures global variation in human social organization.

Author: Baines John
Baker David
Bidmead Julye
Bogaard Amy
Bol Peter
Brandl Eva
Bridges Elizabeth
Ceccarelli Alessandro
Cesaretti Rudolf
Christian David
Cioni Enrico
Collins Christina
Cook Connie
Covey Alan
Currie Thomas E
Dupeyron Agathe
Feeney Kevin
Feinman Gary
Figliulo-Rosswurm Joe
François Pieter
Grohmann Stephanie
Hoyer Daniel
Jordan Greine
Júlíusson Árni Daníel
Korotayev Andrey
Kradin Nikolay
Kristinsson Axel
Krueger Marta
Levine Jill
Lockhart Bruce
Mair Victor
Manning Joseph
Marciniak Arkadiusz
Mendel-Gleason Gavin
Miksic John
Mostern Ruth
Mullins Daniel
Palmisano Alessio
Peregrine Peter
Petrie Cameron
Preiser-Kapeller Johannes
Reddish Jenny
Rudiak-Gould Peter
Savage Patrick
Spencer Charles
Ter Haar Barend
Tuan Po-Ju
Turchin Peter
Turner Edward
Wallace Vesna
Whitehouse Harvey
Williams Alice
Xie Liye
Publication venue: Proc Natl Acad Sci U S A
Publication date: 16/11/2017

Do human societies from around the world exhibit similarities in the way that they are structured, and show commonalities in the ways that they have evolved? These are long-standing questions that have proven difficult to answer. To test between competing hypotheses, we constructed a massive repository of historical and archaeological information known as "Seshat: Global History Databank." We systematically coded data on 414 societies from 30 regions around the world spanning the last 10,000 years. We were able to capture information on 51 variables reflecting nine characteristics of human societies, such as social scale, economy, features of governance, and information systems. Our analyses revealed that these different characteristics show strong relationships with each other and that a single principal component captures around three-quarters of the observed variation. Furthermore, we found that different characteristics of social complexity are highly predictable across different world regions. These results suggest that key aspects of social organization are functionally related and do indeed coevolve in predictable ways. Our findings highlight the power of the sciences and humanities working together to rigorously test hypotheses about general rules that may have shaped human history
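
The statement that a single principal component captures most of the variation can be sketched as the ratio of the leading eigenvalue of the covariance matrix to its trace. The example below estimates that ratio with power iteration on toy data. The function names and data are assumptions for illustration, no Seshat variables are used, and the variables are left unstandardized for brevity.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of observations stored as rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    d = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the (unit-length) converged vector.
    mv = [sum(matrix[a][b] * vec[b] for b in range(d)) for a in range(d)]
    return sum(v * m for v, m in zip(vec, mv))


def first_component_share(rows):
    """Fraction of total variance captured by the first principal component."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return leading_eigenvalue(cov) / total


# Hypothetical toy data: three strongly co-varying characteristics.
toy_rows = [
    [1.0, 2.0, 1.5],
    [2.0, 4.1, 2.9],
    [3.0, 5.9, 4.6],
    [4.0, 8.2, 6.1],
    [5.0, 9.8, 7.4],
]
share = first_component_share(toy_rows)
print(share)
```

With real data of mixed units, the variables would usually be standardized first so that no single variable dominates the total variance.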

Harvard University - DASH

Opin visindi

Edinburgh Research Explorer

eScholarship - University of California

Apollo (Cambridge)

Chapman University Digital Commons

Emergence of Visual Saliency from Natural Scenes via Context-Mediated Probability Distributions Coding

Author: A Hyvarinen
A Hyvarinen
A Olmos
A Torralba
AJ Bell
AM Treisman
B Julesz
BA Olshausen
BA Olshausen
BW Tatler
C Kayser
C Koch
D Field
D Gao
D Gao
D Gao
EP Simoncelli
EP Simoncelli
F Attneave
G Felsen
GC DeAngelis
H Barlow
HJ Seo
JH van Hateren
JH van Hateren
Jinhua Xu
JJ Atick
JM Wolf
Joe Z. Tsien
L Itti
L Itti
L Itti
L Itti
L Zhang
L Zhang
L Zhaoping
M Carandini
Matjaz Perc
MS Caywood
NC Rust
ND Bruce
NDB Bruce
O Le Meur
PO Hoyer
RP Rao
RPN Rao
T Wachtler
TD Albright
WE Vinje
WJ Ma
WS Geisler
X Chen
Y Karklin
Z Li
Zhiyong Yang
Publication venue: Public Library of Science
Publication date: 29/12/2010

Visual saliency is the perceptual quality that makes some items in visual scenes stand out from their immediate contexts. Visual saliency plays important roles in natural vision in that saliency can direct eye movements, deploy attention, and facilitate tasks like object detection and scene understanding. A central unsolved issue is: What features should be encoded in the early visual cortex for detecting salient features in natural scenes? To explore this important issue, we propose a hypothesis that visual saliency is based on efficient encoding of the probability distributions (PDs) of visual variables in specific contexts in natural scenes, referred to as context-mediated PDs in natural scenes. In this concept, computational units in the model of the early visual system do not act as feature detectors but rather as estimators of the context-mediated PDs of a full range of visual variables in natural scenes, which directly give rise to a measure of visual saliency of any input stimulus. To test this hypothesis, we developed a model of the context-mediated PDs in natural scenes using a modified algorithm for independent component analysis (ICA) and derived a measure of visual saliency based on these PDs estimated from a set of natural scenes. We demonstrated that visual saliency based on the context-mediated PDs in natural scenes effectively predicts human gaze in free-viewing of both static and dynamic natural scenes. This study suggests that the computation based on the context-mediated PDs of visual variables in natural scenes may underlie the neural mechanism in the early visual cortex for detecting salient features in natural scenes

Chapman University Digital Commons

Quantitative Historical Analysis Uncovers a Single Dimension of Complexity that Structures Global Variation in Human Social Organization

Author: Baines John
Baker David
Bidmead Julye
Bogaard Amy
Bol Peter
Brandl Eva
Bridges Elizabeth
Ceccarelli Alessandro
Cesaretti Rudolf
Christian David
Cioni Enrico
Collins Christina
Cook Connie
Covey Alan
Currie Thomas E.
Dupeyron Agathe
Feeney Kevin
Feinman Gary
Figliulo-Rosswurm Joe
François Pieter
Grohmann Stephanie
Hoyer Daniel
Jordan Greine
Júlíusson Árni Daníel
Korotayev Andrey
Kradin Nikolay
Kristinsson Axel
Krueger Marta
Levine Jill
Lockhart Bruce
Mair Victor
Manning Joseph
Marciniak Arkadiusz
Mendel-Gleason Gavin
Miksic John
Mostern Ruth
Mullins Daniel
Palmisano Alessio
Peregrine Peter
Petrie Cameron
Preiser-Kapeller Johannes
Reddish Jenny
Rudiak-Gould Peter
Savage Patrick
Spencer Charles
ter Haar Barend
Tuan Po-Ju
Turchin Peter
Turner Edward
Wallace Vesna
Whitehouse Harvey
Williams Alice
Xie Liye
Publication venue: Chapman University Digital Commons
Publication date: 16/11/2017

Do human societies from around the world exhibit similarities in the way that they are structured, and show commonalities in the ways that they have evolved? These are long-standing questions that have proven difficult to answer. To test between competing hypotheses, we constructed a massive repository of historical and archaeological information known as “Seshat: Global History Databank.” We systematically coded data on 414 societies from 30 regions around the world spanning the last 10,000 years. We were able to capture information on 51 variables reflecting nine characteristics of human societies, such as social scale, economy, features of governance, and information systems. Our analyses revealed that these different characteristics show strong relationships with each other and that a single principal component captures around three-quarters of the observed variation. Furthermore, we found that different characteristics of social complexity are highly predictable across different world regions. These results suggest that key aspects of social organization are functionally related and do indeed coevolve in predictable ways. Our findings highlight the power of the sciences and humanities working together to rigorously test hypotheses about general rules that may have shaped human history

An Efficient Coding Hypothesis Links Sparsity and Selectivity of Neural Responses

Author: A Hyvarinen
A Hyvarinen
A Hyvärinen
AJ Bell
AS Andalman
AW Moreau
BA Olshausen
BA Olshausen
D Aronov
D Griffin
D Margoliash
D Margoliash
DD Lee
DM Green
E Tumer
EC Smith
EE Bauer
F Nottebohm
FE Theunissen
FE Theunissen
Florian Blättler
G Bi
G Greene
G Laurent
G Vates
HB Barlow
IG Davison
IR Fiete
J Perez-Orive
J Perez-Orive
JA Grace
JE Heiss
JF Prather
JK Jun
K Nagel
K Sen
M Plumbley
M Weliky
MJ Coleman
MJ Rosen
MS Brainard
MS Lewicki
N Amin
N Amin
N Amin
P D'Souza
PO Hoyer
R Mooney
RE Crist
Richard H. R. Hahnloser
RQ Quiroga
S Hochstein
S Waydo
SM Woolley
SM Woolley
Stefan J. Kiebel
T Hromádka
T Nick
T Sharpee
TQ Gentner
Publication venue: Public Library of Science
Publication date: 01/10/2011

To what extent are sensory responses in the brain compatible with first-order principles? The efficient coding hypothesis projects that neurons use as few spikes as possible to faithfully represent natural stimuli. However, many sparsely firing neurons in higher brain areas seem to violate this hypothesis in that they respond more to familiar stimuli than to nonfamiliar stimuli. We reconcile this discrepancy by showing that efficient sensory responses give rise to stimulus selectivity that depends on the stimulus-independent firing threshold and the balance between excitatory and inhibitory inputs. We construct a cost function that enforces minimal firing rates in model neurons by linearly punishing suprathreshold synaptic currents. By contrast, subthreshold currents are punished quadratically, which allows us to optimally reconstruct sensory inputs from elicited responses. We train synaptic currents on many renditions of a particular bird's own song (BOS) and few renditions of conspecific birds' songs (CONs). During training, model neurons develop a response selectivity with complex dependence on the firing threshold. At low thresholds, they fire densely and prefer CON and the reverse BOS (REV) over BOS. However, at high thresholds or when hyperpolarized, they fire sparsely and prefer BOS over REV and over CON. Based on this selectivity reversal, our model suggests that preference for a highly familiar stimulus corresponds to a high-threshold or strong-inhibition regime of an efficient coding strategy. Our findings apply to songbird mirror neurons, and in general, they suggest that the brain may be endowed with simple mechanisms to rapidly change selectivity of neural responses to focus sensory processing on either familiar or nonfamiliar stimuli. In summary, we find support for the efficient coding hypothesis and provide new insights into the interplay between the sparsity and selectivity of neural responses

Repository for Publications and Research Data