582 research outputs found

Outlier Resistant PCA Ensembles

Authors: B. Gabrys; D. Ruta; D. Sanger; E. Oja; H. Hotelling; K. Pearson; L. Breiman; L. Kuncheva; R.D. Cook; R.E. Schapire; W.J. Dixon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Statistical re-sampling techniques have been used extensively and successfully in machine learning to generate classifier and predictor ensembles. It has frequently been shown that combining so-called unstable predictors has a stabilizing effect on, and improves the performance of, the resulting prediction system. In this paper we use re-sampling techniques in the context of Principal Component Analysis (PCA). We show that the proposed PCA ensembles are much more robust in the presence of outliers, which can seriously degrade the performance of an individual PCA algorithm. The performance and characteristics of the proposed approaches are illustrated in a number of experimental studies in which an individual PCA is compared with the introduced PCA ensemble.
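
The abstract does not state which re-sampling scheme or combination rule the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions: each ensemble member fits PCA to a random subsample drawn without replacement, and the members are combined by averaging their rank-one projection matrices. The function names `first_pc` and `ensemble_first_pc`, the subsample fraction, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def first_pc(X):
    """Return the unit-length first principal direction of the rows of X."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

def ensemble_first_pc(X, n_members=101, frac=0.3, seed=0):
    """Combine first principal directions fitted on random subsamples.

    Assumption (not taken from the paper): members are combined by averaging
    their projection matrices v v^T, which does not depend on the sign of v.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(2, int(frac * n))
    P = np.zeros((d, d))
    for _ in range(n_members):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        v = first_pc(X[idx])
        P += np.outer(v, v)
    P /= n_members
    # eigh returns eigenvalues in ascending order; the last column is dominant.
    _, eigvecs = np.linalg.eigh(P)
    return eigvecs[:, -1]

# Toy data: most variance lies along the x-axis, plus one gross outlier on the y-axis.
rng = np.random.default_rng(42)
clean = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
X = np.vstack([clean, [[0.0, 100.0]]])

single = first_pc(X)             # fitted on all points, including the outlier
ensemble = ensemble_first_pc(X)  # most subsamples do not contain the outlier

print("single PCA direction:  ", np.round(single, 3))
print("ensemble PCA direction:", np.round(ensemble, 3))
```

In this toy setting the single PCA is pulled toward the outlier, while most subsamples never see it, so the averaged projection stays close to the x-axis. With many outliers, or with larger subsamples, the majority of members would be contaminated and this simple combination rule would no longer help.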

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Gestion del Repositorio Documental de la Universidad de Salamanca

Data-driven Soft Sensors in the Process Industry

Authors: Abdi; Alhoniemi; Angelov; Arazo-Bravo; Atkeson; Bastin; Bauer; Bishop; Bogdan Gabrys; Bonne; Breiman; Bro; Casali; Chen; Choi; Chruy; Davies; Dayal; De Wolf; Desai; Devogelaere; Ding; Dong; Dote; Doyle; Dunia; Eriksson; Fellner; Fortuna; Frank; Freund; Funahashi; Gabrielsson; Gabrys; Gama; Geladi; Gomez; Gonzalez; Goodwin; Gosset; Guyon; Han; Hastie; He; Hodge; Hotelling; Jackson; James; Jang; Jiang; Jolliffe; Jordaan; Jos de Assis; Kadlec; Kalos; Kampjarvi; Kittler; Kohavi; Kohonen; Kordon; Kourti; Krogh; Kuncheva; Lee; Li; Lin; Luo; Macias; Mandic; Marjanovic; Meleiro; Menold; Nauck; Neogi; Nomikos; Opitz; Park; Pearson; Petr Kadlec; Poggio; Prasad; Principe; Qin; Radhakrishnan; Rnnar; Rong; Rotem; Ruta; Schafer; Scheffer; Serneels; Sibylle Strandt; Stanimirova; Su; Tzanakou; van Sprang; Vapnik; Venkatasubramanian; Vilalta; Walczak; Wang; Warne; Weiss; Widmer; Wold; Wolpert; Yan; Yang; Zadeh; Zamprogna; Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/04/2009

Over the last two decades, Soft Sensors have established themselves as a valuable alternative to the traditional means of acquiring critical process variables, monitoring processes, and performing other tasks related to process control. This paper discusses the characteristics of process industry data that are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical, bioprocess and steel industries. The work focuses on data-driven Soft Sensors because of their growing popularity, demonstrated usefulness and large, though not yet fully realised, potential. Its main contributions are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with possible solutions.

Crossref

Bournemouth University Research Online

Unfolding simulations reveal the mechanism of extreme unfolding cooperativity in the kinetically stable alpha-lytic protease.

Authors: Agard David A; Ho Bosco; Salimi Neema L
Publication venue: eScholarship, University of California
Publication date: 01/01/2010

Kinetically stable proteins, those whose stability is derived from their slow unfolding kinetics and not thermodynamics, are examples of evolution's best attempts at suppressing unfolding. Especially in highly proteolytic environments, both partially and fully unfolded proteins face potential inactivation through degradation and/or aggregation; hence, slowing unfolding can greatly extend a protein's functional lifetime. The prokaryotic serine protease alpha-lytic protease (alphaLP) has done just that, as its unfolding is both very slow (t(1/2) approximately 1 year) and so cooperative that partial unfolding is negligible, providing a functional advantage over its thermodynamically stable homologs, such as trypsin. Previous studies have identified regions of the domain interface as critical to alphaLP unfolding, though a complete description of the unfolding pathway is missing. To identify the alphaLP unfolding pathway and the mechanism for its extreme cooperativity, we performed high-temperature molecular dynamics unfolding simulations of both alphaLP and trypsin. The simulated alphaLP unfolding pathway produces a robust transition state ensemble consistent with prior biochemical experiments and clearly shows that unfolding proceeds through a preferential disruption of the domain interface. Through a novel method of calculating unfolding cooperativity, we show that alphaLP unfolds extremely cooperatively while trypsin unfolds gradually. Finally, by examining the behavior of both domain interfaces, we propose a model for the differential unfolding cooperativity of alphaLP and trypsin involving three key regions that differ between the kinetically stable and thermodynamically stable classes of serine proteases.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Cross validation of bi-modal health-related stress assessment

Authors: A Marty; A Tawari; B Arnrich; B Kedem; B Schuller; B Schölkopf; D Morrison; D Ververidis; DA Craig; DF Tolin; DM Hilty; DR Ladd; DW Aha; EB Baum; Egon L. van den Broek; EL Broek van den; EN Khalil; F Pallavicini; Frans van der Sluis; IR Murray; J Blascovich; J Krumm; J Sánchez-Meca; J Wolpe; JA Healey; K Domschke; K Nieuwenhuijsen; KR Scherer; LK Hansen; LM Blainlow; M El Ayadi; M Hall; MD Zwaag van der; MG Newman; N Rüscha; P Rani; PL Bartlett; R Banse; R Cowie; R Likert; RB Fillingim; RC Kessler; RG Lyons; RW Picard; S Wu; T Shimamura; TM Cover; Ton Dijkstra; TR Kosten
Publication venue: Springer Verlag
Publication date: 01/01/2011

This study explores the feasibility of objective and ubiquitous stress assessment. Twenty-five post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy session, and each consisted of a "happy" and a "stress triggering" part. Two instruments were chosen to assess the stress level of the patients at various points in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator, and (ii) the subjective unit of distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power, and pitch. To model the emotional state of the patients, 28 parameters were selected from this set by means of a linear regression model and subsequently compressed into 11 principal components. The SUD and speech model were cross-validated using 3 machine learning algorithms. Between 90% (2 SUD levels) and 39% (10 SUD levels) correct classification was achieved. The two sessions could be discriminated in 89% (for ST) and 77% (for RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care.

Crossref

Springer - Publisher Connector

Copenhagen University Research Information System

Radboud Repository

University of Twente Research Information

Molecular Mechanics Study of Protein Folding and Protein-Ligand Binding

Author: Han Tingting
Publication venue: Clemson University Libraries
Publication date: 01/08/2016

In this dissertation, molecular dynamics (MD) simulations were applied to study the effect of single point mutations on protein folding free energy, and protein-ligand binding in the bifunctional protein dihydrofolate reductase-thymidylate synthase (DHFR-TS) of Plasmodium falciparum (pf). The main goal of these computational studies is a deeper understanding of the factors related to protein folding stability and protein-ligand binding. Chapter two seeks to improve the accuracy of predicting changes in folding free energy upon single point mutations in proteins. While the importance of conformational sampling was addressed, the diverse dielectric properties of proteins were also taken into consideration. By developing a three-dielectric-constant model and broadening conformational sampling, the chapter describes a method for predicting the effect of point mutations on protein folding free energy and addresses the factors affecting prediction accuracy. The following two chapters focus on the binding process and domain-domain interactions in the bifunctional protein pfDHFR-TS. This protein is a common target of antimalarial drugs, but drug resistance arising in it has caused serious problems. Chapter three investigates the mechanism by which drug resistance develops. The study indicates that the accumulation of mutations in pfDHFR caused marked changes in the conformation of, and interactions among, residues in the binding pocket, which in turn weakened the binding affinity between pfDHFR and the inhibitor drug. Furthermore, the pfDHFR quadruple mutant exhibited high rigidity and significantly weakened communication among key residues in the binding pocket. The rigid binding site was associated with the failure of conformational reorganization upon binding of pyrimethamine in the quadruple mutant.
Chapter four investigates the effect of the N-terminus of pfDHFR-TS on enzyme activity and domain-domain communication. This is the first computational study to focus on the full-length pfDHFR-TS dimer. It provides computational evidence that remote mutations can disturb the interactions and conformations of the binding site by disrupting dynamic motions in pfDHFR-TS.

Clemson University: TigerPrints

A Principled Methodology: A Dozen Principles of Software Effort Estimation

Author: Kocaguneli Ekrem
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2012

Software effort estimation (SEE) is the activity of estimating the total effort required to complete a software project. Correctly estimating this effort is of vital importance for the competitiveness of organizations. Both under- and over-estimation lead to undesirable consequences. Under-estimation may result in budget and schedule overruns, which in turn may cause projects to be cancelled, thereby wasting the entire effort spent up to that point. Over-estimation may cause promising projects not to be funded, hence harming organizational competitiveness. Due to the significant role of SEE for software organizations, considerable research effort has been invested in SEE. Thanks to the accumulation of decades of prior research, today we are able to identify the core issues and search for the right principles to tackle pressing questions. For example, despite decades of work, we still lack concrete answers to important questions such as: What is the best SEE method? The introduced estimation methods make use of local data; however, not all companies have their own data, so: How can we handle the lack of local data? Common SEE methods take size attributes for granted, yet size attributes are costly and practitioners place very little trust in them. Hence, we ask: How can we avoid the use of size attributes? Collection of data, particularly dependent variable information (i.e. effort values), is costly: How can we find an essential subset of the SEE data sets? Finally, studies make use of sampling methods to justify a new method's performance on SEE data sets. Yet the trade-off among different variants is ignored: How should we choose sampling methods for SEE experiments? This thesis is a rigorous investigation aimed at identifying and tackling the pressing issues in SEE.
Our findings rely on extensive experimentation performed with a large corpus of estimation techniques on a large set of public and proprietary data sets. We summarize our findings and industrial experience in the form of 12 principles: 1) Know your domain; 2) Let the Experts Talk; 3) Suspect your data; 4) Data Collection is Cyclic; 5) Use a Ranking Stability Indicator; 6) Assemble Superior Methods; 7) Weighting Analogies is Over-elaboration; 8) Use Easy-path Design; 9) Use Relevancy Filtering; 10) Use Outlier Pruning; 11) Combine Outlier and Synonym Pruning; 12) Be Aware of Sampling Method Trade-off.

The Research Repository @ WVU (West Virginia University)