115 research outputs found

    Penalized Partial Least Squares Based on B-Splines Transformations

    Get PDF
    We propose a novel method to model nonlinear regression problems by adapting the principle of penalization to Partial Least Squares (PLS). Starting with a generalized additive model, we expand the additive component of each variable in terms of a generous amount of B-Splines basis functions. In order to prevent overfitting and to obtain smooth functions, we estimate the regression model by applying a penalized version of PLS. Although our motivation for penalized PLS stems from its use for B-Splines transformed data, the proposed approach is very general and can be applied to other penalty terms or to other dimension reduction techniques. It turns out that penalized PLS can be computed virtually as fast as PLS. We prove a close connection of penalized PLS to the solutions of preconditioned linear systems. In the case of high-dimensional data, the new method is shown to be an attractive competitor to other techniques for estimating generalized additive models. If the number of predictor variables is high compared to the number of examples, traditional techniques often suffer from overfitting. We illustrate that penalized PLS performs well in these situations

    Multi-score Learning for Affect Recognition: the Case of Body Postures

    Get PDF
    An important challenge in building automatic affective state recognition systems is establishing the ground truth. When the groundtruth is not available, observers are often used to label training and testing sets. Unfortunately, inter-rater reliability between observers tends to vary from fair to moderate when dealing with naturalistic expressions. Nevertheless, the most common approach used is to label each expression with the most frequent label assigned by the observers to that expression. In this paper, we propose a general pattern recognition framework that takes into account the variability between observers for automatic affect recognition. This leads to what we term a multi-score learning problem in which a single expression is associated with multiple values representing the scores of each available emotion label. We also propose several performance measurements and pattern recognition methods for this framework, and report the experimental results obtained when testing and comparing these methods on two affective posture datasets

    Multi-modal Synthesis of ASL-MRI Features with KPLS Regression on Heterogeneous Data

    Get PDF
    Machine learning classifiers are frequently trained on heterogeneous multi-modal imaging data, where some patients have missing modalities. We address the problem of synthesising arterial spin labelling magnetic resonance imaging (ASL-MRI) - derived cerebral blood flow (CBF) - features in a heterogeneous data set. We synthesise ASL-MRI features using T1-weighted structural MRI (sMRI) and carotid ultrasound flow features. To deal with heterogeneous data, we extend the kernel partial least squares regression (kPLSR) - method to the case where both input and output data have partial coverage. The utility of the synthetic CBF features is tested on a binary classification problem of mild cognitive impairment patients vs. controls. Classifiers based on sMRI and synthetic ASL-MRI features are combined using a maximum probability rule, achieving a balanced accuracy of 92% (sensitivity 100 %, specificity 80 %) in a separate validation set. Comparison is made against support vector machine-classifiers from literature

    A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations

    Get PDF
    Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks

    SlimPLS: A Method for Feature Selection in Gene Expression-Based Disease Classification

    Get PDF
    A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier by using a labeled training set containing labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality if the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique

    Hierarchical Anatomical Brain Networks for MCI Prediction: Revisiting Volumetric Measures

    Get PDF
    Owning to its clinical accessibility, T1-weighted MRI (Magnetic Resonance Imaging) has been extensively studied in the past decades for prediction of Alzheimer's disease (AD) and mild cognitive impairment (MCI). The volumes of gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) are the most commonly used measurements, resulting in many successful applications. It has been widely observed that disease-induced structural changes may not occur at isolated spots, but in several inter-related regions. Therefore, for better characterization of brain pathology, we propose in this paper a means to extract inter-regional correlation based features from local volumetric measurements. Specifically, our approach involves constructing an anatomical brain network for each subject, with each node representing a Region of Interest (ROI) and each edge representing Pearson correlation of tissue volumetric measurements between ROI pairs. As second order volumetric measurements, network features are more descriptive but also more sensitive to noise. To overcome this limitation, a hierarchy of ROIs is used to suppress noise at different scales. Pairwise interactions are considered not only for ROIs with the same scale in the same layer of the hierarchy, but also for ROIs across different scales in different layers. To address the high dimensionality problem resulting from the large number of network features, a supervised dimensionality reduction method is further employed to embed a selected subset of features into a low dimensional feature space, while at the same time preserving discriminative information. We demonstrate with experimental results the efficacy of this embedding strategy in comparison with some other commonly used approaches. In addition, although the proposed method can be easily generalized to incorporate other metrics of regional similarities, the benefits of using Pearson correlation in our application are reinforced by the experimental results. Without requiring new sources of information, our proposed approach improves the accuracy of MCI prediction from (of conventional volumetric features) to (of hierarchical network features), evaluated using data sets randomly drawn from the ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset

    Unravelling the effects of age, period and cohort on metabolic syndrome components in a Taiwanese population using partial least squares regression

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>We investigate whether the changing environment caused by rapid economic growth yielded differential effects for successive Taiwanese generations on 8 components of metabolic syndrome (MetS): body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting plasma glucose (FPG), triglycerides (TG), high-density lipoprotein (HDL), Low-density lipoproteins (LDL) and uric acid (UA).</p> <p>Methods</p> <p>To assess the impact of age, birth year and year of examination on MetS components, we used partial least squares regression to analyze data collected by Mei-Jaw clinics in Taiwan in years 1996 and 2006. Confounders, such as the number of years in formal education, alcohol intake, smoking history status, and betel-nut chewing were adjusted for.</p> <p>Results</p> <p>As the age of individuals increased, the values of components generally increased except for UA. Men born after 1970 had lower FPG, lower BMI, lower DBP, lower TG, Lower LDL and greater HDL; women born after 1970 had lower BMI, lower DBP, lower TG, Lower LDL and greater HDL and UA. There is a similar pattern between the trend in levels of metabolic syndrome components against birth year of birth and economic growth in Taiwan.</p> <p>Conclusions</p> <p>We found cohort effects in some MetS components, suggesting associations between the changing environment and health outcomes in later life. This ecological association is worthy of further investigation.</p
    corecore