692 research outputs found

Representing complex data using localized principal components with application to astronomical data

Author: A Gersho
A Gorban
AH Monaghan
AR Webb
B Chalmond
B Kégl
C Allende Prieto
CAL Bailer-Jones
CAL Bailer-Jones
DJ Marchette
E Diday
E Oja
EC Malthouse
EM Braverman
FL Hall
H Hotelling
H Späth
H Wold
IT Jolliffe
J Einbeck
J Einbeck
JH Friedman
JH Friedman
JH Friedman
JJ Verbeek
JM Chambers
K Fukunaga
K Hornik
L Breiman
MAC Perryman
MG Kendall
N Kambhatla
P Delicado
P Delicado
PG Willemsen
R Tibshirani
RJ Bolton
S de Jong
T Aluja-Banet
T Duchamps
T Hastie
T Hastie
WS Cleveland
Z-Y Liu
Publication date: 01/01/2007

Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms "nonlinear", "branched", "disconnected", "bent", "curved", "heterogeneous", or, more generally, "complex". In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper gives a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems, it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia.

Comment: 25 pages. In "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180--204, http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-

arXiv.org e-Print Archive

Durham Research Online

Crossref

Enlighten

Explore Bristol Research

Survival associated pathway identification with group Lp penalized global AUC maximization

Author: A Kaban
F Bach
H Van Houwelingen
H Zou
H Zou
I Sohn
J Fan
J Gui
K Elenitoba-Johnson
L Kanehisa
L Meier
L Tian
Laurence S Magder
Li Mao
M Jordan
M Park
M Pepe
M Pepe
M Segal
P Heagerty
R Tibshirani
R Tibshirani
S Dave
S Ma
S Ma
T Hastie
Terry Hyslop
Z Liu
Z Liu
Z Liu
Z Liu
Z Wei
Zhenqiu Liu
Publication venue: BioMed Central
Publication date: 01/01/2010

It has been demonstrated that genes in a cell do not act independently. They interact with one another to complete certain biological processes or to implement certain molecular functions. How to incorporate biological pathways or functional groups into the model and identify survival associated gene pathways is still a challenging problem. In this paper, we propose a novel iterative gradient based method for survival analysis with group Lp penalized global AUC summary maximization. Unlike LASSO, Lp (p < 1) (with its special implementation entitled adaptive LASSO) is asymptotically unbiased and has oracle properties [1]. We first extend Lp for individual gene identification to a group Lp penalty for pathway selection, and then develop a novel iterative gradient algorithm for penalized global AUC summary maximization (IGGAUCS). This method incorporates the genetic pathways into global AUC summary maximization and identifies survival associated pathways instead of individual genes. The tuning parameters are determined using 10-fold cross validation with training data only. The prediction performance is evaluated using test data. We apply the proposed method to survival outcome analysis with gene expression profiles and identify multiple pathways simultaneously. Experimental results with simulation and gene expression data demonstrate that the proposed procedures can be used for identifying important biological pathways that are related to the survival phenotype and for building a parsimonious model for predicting survival times.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Jefferson Digital Commons

Indexes to Find the Optimal Number of Clusters in a Hierarchical Clustering

Author: A Fahad
A Sharma
AK Patnaik
David L. Davies
H Guo
HM Krumholz
I Uchiyama
J Dean
JC Dunn
JM Luna-Romera
PJ Rousseeuw
R Pérez-Chacón
T Hastie
Y Loewenstein
Z Su
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Clustering analysis is one of the most commonly used techniques for uncovering patterns in data mining. Most clustering methods require establishing the number of clusters beforehand. However, due to the size of the data currently used, predicting that value is a computationally expensive task in most cases. In this article, we present a clustering technique that avoids this requirement, using hierarchical clustering. There are many examples of this procedure in the literature, most of them focusing on the dissociative or descending subtype, while in this article we cover the agglomerative or ascending subtype. Although it is more expensive in computation and time, it nevertheless allows us to obtain very valuable information regarding the membership of elements in clusters and their groupings, that is to say, their dendrogram. Finally, several data sets of varying dimensionality have been used. For each of them, we provide the calculations of internal validation indexes to test the algorithm developed, studying which of them provides better results to obtain the best possible clustering.

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Multi-Target Prediction: A Unifying View on Problems and Methods

Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse types. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

V3 Loop Sequence Space Analysis Suggests Different Evolutionary Patterns of CCR5- and CXCR4-Tropic HIV

Author: A Altmann
AJ Low
Alexander Thielen
B Chesebro
CC Bleul
Derya Unutmaz
DH Huson
EA Berger
F Miedema
H Bandelt
H Deng
H Schuitemaker
J Shepherd
JA Nelson
Katarzyna Bozek
L Milich
L Waters
L Zhang
MA Jensen
NG Hoffman
PJ Rousseeuw
R Shankarappa
RA Fouchier
Rolf Kaiser
RP van Rij
S Henikoff
S Pillai
Saleta Sierra
T Hastie
T McNearney
T Sing
T Sing
Thomas Lengauer
TW Chun
W Resch
Z Yang
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2009

The V3 loop of human immunodeficiency virus type 1 (HIV-1) is critical for coreceptor binding and is the main determinant of which of the cellular coreceptors, CCR5 or CXCR4, the virus uses for cell entry. The aim of this study is to provide a large-scale data driven analysis of HIV-1 coreceptor usage with respect to the V3 loop evolution and to characterize CCR5- and CXCR4-tropic viral phenotypes previously studied in small- and medium-scale settings. We use different sequence similarity measures, phylogenetic and clustering methods in order to analyze the distribution in sequence space of roughly 1000 V3 loop sequences and their tropism phenotypes. This analysis affords a means of characterizing those sequences that are misclassified by several sequence-based coreceptor prediction methods, as well as predicting the coreceptor using the location of the sequence in sequence space and of relating this location to the CD4+ T-cell count of the patient. We support previous findings that the usage of CCR5 is correlated with relatively high sequence conservation whereas CXCR4-tropic viruses spread over larger regions in sequence space. The incorrectly predicted sequences are mostly located in regions in which their phenotype represents the minority or in close vicinity of regions dominated by the opposite phenotype. Nevertheless, the location of the sequence in sequence space can be used to improve the accuracy of the prediction of the coreceptor usage. Sequences from patients with high CD4+ T-cell counts are relatively highly conserved as compared to those of immunosuppressed patients. Our study thus supports hypotheses of an association of immune system depletion with an increase in V3 loop sequence variability and with the escape of the viral sequence to distant parts of the sequence space

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Weighted Fisher Discriminant Analysis in the Input and Feature Spaces

Author: BN Parlett
H Mohammadzade
J Ah-Pine
J Friedman
J Nocedal
JL Alperin
JS Hamid
M Loog
P Jain
PN Belhumeur
R Lotlikar
RA Fisher
S Boyd
T Hastie
V Perlibakas
XY Zhang
YQ Wang
Z Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 04/04/2020

Fisher Discriminant Analysis (FDA) is a subspace learning method which minimizes and maximizes the intra- and inter-class scatters of data, respectively. Although, in FDA, all the pairs of classes are treated the same way, some classes are closer than the others. Weighted FDA assigns weights to the pairs of classes to address this shortcoming of FDA. In this paper, we propose a cosine-weighted FDA as well as an automatically weighted FDA in which weights are found automatically. We also propose a weighted FDA in the feature space to establish a weighted kernel FDA for both existing and newly proposed weights. Our experiments on the ORL face recognition dataset show the effectiveness of the proposed weighting schemes.

Comment: Accepted (to appear) in International Conference on Image Analysis and Recognition (ICIAR) 2020, Springe
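For orientation, the trade-off between intra- and inter-class scatter that the abstract describes is usually written as a ratio of quadratic forms. The block below is a sketch of the standard textbook formulation, not necessarily the exact notation or weighting used in the paper. The symbols are assumptions introduced here: $w$ for the projection direction, $\mu_i$ and $n_i$ for the class means and class sizes, $n$ for the total sample size, and $\alpha_{ij}$ for the pairwise class weights.

```latex
% Fisher criterion: maximize between-class scatter relative to within-class scatter
J(w) = \frac{w^\top S_B\, w}{w^\top S_W\, w},
\qquad
S_W = \sum_{i=1}^{c} \sum_{x \in \mathcal{C}_i} (x - \mu_i)(x - \mu_i)^\top

% Pairwise (weighted) form of the between-class scatter
S_B = \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \alpha_{ij}\, (\mu_i - \mu_j)(\mu_i - \mu_j)^\top

% Choosing \alpha_{ij} = n_i n_j / n recovers the usual unweighted scatter
% \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^\top, where \mu is the overall mean
```

In this form every pair of classes contributes through its own weight, so a weighting scheme can, for example, emphasize pairs whose means lie close together. How the cosine-based and automatically learned weights are defined is specific to the paper and is not reproduced here.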

arXiv.org e-Print Archive

Crossref

Dating Phylogenies with Hybrid Local Molecular Clocks

Author: A Rambaut
AB Smith
AD Yoder
AJ Drummond
AW Edwards
B Prud'homme
B Rannala
E Zuckerkandl
ER Seiffert
F Ronquist
H Amrine-Madsen
H Kishino
H Kishino
H Won
J Felsenstein
JA Hartigan
JJ Welch
JL Thorne
JP Huelsenbeck
KS Pollard
L Bromham
L Kaufman
M Hasegawa
M Pagel
M van der Laan
MJ Sanderson
MJ Sanderson
MJ Sanderson
N Goldman
Oliver Pybus
R Tibshirani
S Aris-Brosou
S Aris-Brosou
S Aris-Brosou
S Tavare
Stéphane Aris-Brosou
SY Ho
T Hastie
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Z Yang
Publication venue: Public Library of Science
Publication date: 12/09/2007

BACKGROUND: Because rates of evolution and species divergence times cannot be estimated directly from molecular data, all current dating methods require that specific assumptions be made before inferring any divergence time. These assumptions typically bear either on rates of molecular evolution (molecular clock hypothesis, local clocks models) or on both rates and times (penalized likelihood, Bayesian methods). However, most of these assumptions can affect estimated dates, oftentimes because they underestimate large amounts of rate change. PRINCIPAL FINDINGS: A significant modification to a recently proposed ad hoc rate-smoothing algorithm is described, in which local molecular clocks are automatically placed on a phylogeny. This modification makes use of hybrid approaches that borrow from recent theoretical developments in microarray data analysis. An ad hoc integration of phylogenetic uncertainty under these local clock models is also described. The performance and accuracy of the new methods are evaluated by reanalyzing three published data sets. CONCLUSIONS: It is shown that the new maximum likelihood hybrid methods can perform better than penalized likelihood and almost as well as uncorrelated Bayesian models. However, the new methods still tend to underestimate the actual amount of rate change. This work demonstrates the difficulty of estimating divergence times using local molecular clocks

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Vertical-external-cavity surface-emitting lasers and quantum dot lasers

Author: A. A. Lagatsy
A. Garnache
A. J. Kemp
A. Lenz
A. Mutig
A. R. Albrecht
A. Rantamäki
A. Schliwa
A. Schliwa
A. Strittmatter
A. Strittmatter
B. Lita
B. S. Song
C. B. Murray
C. Ribbat
D. Bimberg
D. Guimard
D. Leonard
D. Lorenser
D. Ouyang
E. J. Saarinen
E. Kapon
F. Heinrichsdorff
F. Heinrichsdorff
G. Bester
G. C. Shan
G. C. Shan
G. Shan
H. Jiang
H. Kumano
H. Li
I. Kaminow
I. P. Marko
I. R. Sellers
J. Chilla
J. E. Hastie
J. E. Hastie
J. I. Cirac
J. Konttinen
J. M. Gerard
J. P. Reithmaier
J. Rautiainen
K. Hennessy
K. J. Vahala
L. A. Coldren
L. Fan
L. Fan
L. Fan
M. Asada
M. Fallahi
M. Kuznetsov
M. Nomura
M. Pelton
M. Pelton
M. V. Maximov
M. Yamaguchi
N. Kirstaedter
N. Kirstaedter
O. B. Shchekin
O. G. Okhotnikov
P. J. Yao
P. Lodahl
P. Michler
R. Diehl
R. Haring
R. L. Sellin
R. L. Sellin
S. A. Blokhin
S. Fathpour
S. Giet
S. Giet
S. Hilbich
S. Lutgen
S. S. Mikhrin
S. Strauf
S. Strauf
T. D. Germann
T. D. Germann
T. D. Germann
T. D. Germann
T. Vallaitis
T. Vallaitis
T. Yoshie
U. Keller
V. A. Fonoberov
W. J. Alford
X. H. Zhao
Y. He
Z. Xu
Publication venue: Springer Science and Business Media LLC
Publication date: 13/04/2012

The use of cavities to manipulate the photon emission of quantum dots (QDs) has opened unprecedented opportunities for realizing quantum functional nanophotonic devices and quantum information devices. In particular, in the field of semiconductor lasers, QDs were introduced as a superior alternative to quantum wells to suppress the temperature dependence of the threshold current in vertical-external-cavity surface-emitting lasers (VECSELs). In this work, a review of the properties and development of semiconductor VECSEL devices and QD laser devices is given. Based on the features of VECSEL devices, the main emphasis is put on recent developments in technological approaches to semiconductor QD VECSELs. Then, from the viewpoint of both the single-QD nanolaser and cavity quantum electrodynamics (QED), a single-QD-cavity system resulting from the strong coupling of QD and cavity is presented. This review differs from existing works on semiconductor VECSEL devices in that it covers both the fundamental aspects and the technological approaches of QD VECSEL devices. Lastly, the review offers insight and useful guidelines for the development of QD VECSEL technology, future quantum functional nanophotonic devices, and monolithic photonic integrated circuits (MPhICs).

Comment: 21 pages, 4 figures. arXiv admin note: text overlap with arXiv:0904.369

arXiv.org e-Print Archive

Crossref

Peak intensity prediction in MALDI-TOF mass spectrometry: A machine learning study to support quantitative proteomics

Author: A Savitzky
A Scherbart
AHP America
Alexandra Scherbart
B Schölkopf
BM Mayr
C Ji
CJ Burges
D Buhrman
D Radulovic
DJ Pappin
E Dimitriadou
E Mirgorodskaya
F Meyer
G Khanarian
H Naderi-Manesh
H Neubert
H Ritter
H Tang
J Listgarten
JL Fauchére
L Breiman
L Breiman
M Anderle
M Bantscheff
M Vásquez
MCJ Wilce
N Hansmeier
Oliver Kohlbacher
P Lu
P Mallick
PJ Millington
PL Ross
R Development Core Team
S Gay
S Kawashima
SA Gerber
SE Ong
Sebastian Böcker
SP Gygi
T Hastie
T Hastie
T Kohonen
Tim W Nattkemper
VN Vapnik
Wiebke Timm
WS Cleveland
X Yao
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2008

Timm W, Scherbart A, Boecker S, Kohlbacher O, Nattkemper TW. Peak intensity prediction in MALDI-TOF mass spectrometry: A machine learning study to support quantitative proteomics. BMC Bioinformatics. 2008;9(1):443.

Background: Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. One key problem with the mass spectrometric analysis of peptides and proteins, however, is the fact that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g. labeling techniques), reliable prediction of the peak intensities from peptide sequences could provide a peptide-specific correction factor. Thus, it would be a valuable tool towards label-free absolute quantification. Results: In this work we present machine learning techniques for peak intensity prediction for MALDI mass spectra. Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem (Pearson correlation of 0.68 in a ten-fold cross validation). Conclusion: The techniques presented here are a useful first step going beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions will in turn be beneficial for mass spectrometry-based quantitative proteomics.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

Multiple Frequencies Sequential Coding for SSVEP-Based Brain-Computer Interface

Author: C Jia
CS Herrmann
D Regan
D Yao
Dezhong Yao
DR Brillinger
DR Hardoon
E Donchin
F Teng
FB Vialatte
G Bin
G Bin
G Pfurtscheller
G Pfurtscheller
H Cecotti
H Cecotti
J Ding
JR Wolpaw
Jun Hu
KK Shyu
M Cheng
M Middendorf
O Friman
O Friman
P Xu
Pedro Antonio Valdes-Sosa
Peng Xu
PL Lee
R Panicker
Rui Zhang
T Hastie
T Melzer
Tiejun Liu
TM Srihari Mukesh
Y Wang
Y Wang
Yangsong Zhang
YT Wang
Z Lin
Z Wu
Z Wu
Z Yan
Publication venue: Public Library of Science
Publication date: 06/03/2012

BACKGROUND: Steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI) has become one of the most promising modalities for a practical noninvasive BCI system. Owing to both the limited refresh rate of liquid crystal display (LCD) or cathode ray tube (CRT) monitors, and the specific physiological response property that only a very small number of stimuli at certain frequencies can evoke strong SSVEPs, the available frequencies for SSVEP stimuli are limited. Therefore, they may not be enough to code multiple targets with the traditional frequency coding protocols, which poses a big challenge for the design of a practical SSVEP-based BCI. This study aimed to provide an innovative coding method to tackle this problem. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we present a novel protocol termed multiple frequencies sequential coding (MFSC) for SSVEP-based BCI. In MFSC, multiple frequencies are sequentially used in each cycle to code the targets. To fulfill the sequential coding, each cycle is divided into several coding epochs, and during each epoch, a certain frequency is used. Different frequencies or the same frequency can be presented in the coding epochs, and each distinct epoch sequence corresponds to a different target. To show the feasibility of MFSC, we used two frequencies to realize four targets and carried out an offline experiment. The current study shows that: 1) MFSC is feasible and efficient; 2) the performance of SSVEP-based BCI based on MFSC can be comparable to some existing systems. CONCLUSIONS/SIGNIFICANCE: The proposed protocol could potentially implement many more targets with the limited available frequencies compared with the traditional frequency coding protocol. The efficiency of the new protocol was confirmed by an experiment with real data. We propose that the SSVEP-based BCI under MFSC might be a promising choice in the future.
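The coding idea in this abstract can be illustrated with a short sketch: if each cycle is split into a fixed number of epochs and each epoch shows one of the available frequencies, every distinct sequence of frequencies can label one target. The code below is only an illustration under stated assumptions, not the authors' implementation. The frequency values (8.0 Hz and 10.0 Hz), the choice of two epochs per cycle, and the function names `mfsc_codebook` and `decode_target` are all hypothetical.

```python
from itertools import product


def mfsc_codebook(frequencies, n_epochs):
    """Enumerate every epoch sequence; each sequence codes one target.

    With k frequencies and n epochs per cycle there are k**n sequences.
    """
    return list(product(frequencies, repeat=n_epochs))


def decode_target(codebook, detected_sequence):
    """Map the frequencies detected in successive epochs back to a target index."""
    return codebook.index(tuple(detected_sequence))


# Hypothetical stimulus frequencies in Hz (not taken from the paper).
frequencies = (8.0, 10.0)

# Assumption: two coding epochs per cycle, so 2**2 = 4 targets.
codebook = mfsc_codebook(frequencies, n_epochs=2)

for target, sequence in enumerate(codebook):
    print(f"target {target}: " + " -> ".join(f"{f} Hz" for f in sequence))

# A cycle in which 10 Hz is detected first and 8 Hz second.
print(decode_target(codebook, [10.0, 8.0]))
```

Under these assumptions two frequencies yield four targets, which matches the count reported in the abstract. Adding a third epoch would give eight targets from the same two frequencies, at the cost of a longer cycle.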

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central