
    Outlier Resistant PCA Ensembles

    Statistical re-sampling techniques have been used extensively and successfully in machine learning to generate classifier and predictor ensembles. It has frequently been shown that combining so-called unstable predictors stabilizes and improves the performance of the resulting prediction system. In this paper, we apply re-sampling techniques to Principal Component Analysis (PCA). We show that the proposed PCA ensembles behave much more robustly in the presence of outliers, which can seriously degrade the performance of an individual PCA algorithm. The performance and characteristics of the proposed approaches are illustrated in a number of experimental studies that compare an individual PCA with the proposed PCA ensemble.
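The general idea of a re-sampling PCA ensemble can be sketched as follows. This is an illustrative sketch, not the paper's method: the subsampling rule (random subsamples drawn without replacement), the sign alignment to a "medoid" member, and the element-wise median used to combine member directions are all assumptions, and the function names `leading_pc` and `pca_ensemble_direction` are hypothetical. Only the first principal direction is estimated.

```python
import numpy as np


def leading_pc(X):
    """Unit-norm first principal direction of X (rows are samples)."""
    Xc = X - X.mean(axis=0)
    # The first right singular vector of the centred data is the first PC.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def pca_ensemble_direction(X, n_members=101, frac=0.1, seed=0):
    """Hypothetical re-sampling PCA ensemble for the first principal direction.

    Each member runs PCA on a random subsample drawn without replacement.
    Member directions are sign-aligned to a medoid member and combined with
    an element-wise median (illustrative assumptions, not the paper's scheme).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(2, int(frac * n))
    members = np.array(
        [leading_pc(X[rng.choice(n, size=m, replace=False)]) for _ in range(n_members)]
    )
    # PC signs are arbitrary, so compare members by |cosine similarity|.
    sim = np.abs(members @ members.T)
    # Medoid: the member that agrees best with the typical member.
    ref = members[np.argmax(np.median(sim, axis=1))]
    signs = np.where(members @ ref >= 0, 1.0, -1.0)
    aligned = members * signs[:, None]
    combined = np.median(aligned, axis=0)
    return combined / np.linalg.norm(combined)


# Usage: data spread along the x-axis plus three outliers far out on the y-axis.
rng = np.random.default_rng(1)
clean = np.column_stack([rng.normal(0, 3, 200), rng.normal(0, 0.3, 200)])
X = np.vstack([clean, np.array([[0.0, 50.0]] * 3)])
print(abs(leading_pc(X)[0]))              # single PCA on all data
print(abs(pca_ensemble_direction(X)[0]))  # ensemble estimate
```

With small subsamples, most members never see an outlier, so the median over members follows the clean structure while a single PCA on all the data is pulled towards the outliers; how well this holds depends on the (assumed) subsample fraction and the number of outliers.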

    Data-driven Soft Sensors in the Process Industry

    Over the last two decades, Soft Sensors have established themselves as a valuable alternative to the traditional means of acquiring critical process variables, monitoring processes and performing other tasks related to process control. This paper discusses the characteristics of process industry data that are critical for the development of data-driven Soft Sensors. These characteristics are common to many process industry fields, such as the chemical, bioprocess and steel industries. This work focuses on data-driven Soft Sensors because of their growing popularity, demonstrated usefulness and huge, though not yet fully realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.

    Unfolding simulations reveal the mechanism of extreme unfolding cooperativity in the kinetically stable alpha-lytic protease.

    Kinetically stable proteins, whose stability derives from their slow unfolding kinetics rather than from thermodynamics, are examples of evolution's best attempts at suppressing unfolding. Especially in highly proteolytic environments, both partially and fully unfolded proteins face potential inactivation through degradation and/or aggregation; hence, slowing unfolding can greatly extend a protein's functional lifetime. The prokaryotic serine protease alpha-lytic protease (alphaLP) has done just that: its unfolding is both very slow (unfolding half-life of approximately 1 year) and so cooperative that partial unfolding is negligible, providing a functional advantage over its thermodynamically stable homologs, such as trypsin. Previous studies have identified regions of the domain interface as critical to alphaLP unfolding, though a complete description of the unfolding pathway is still missing. To identify the alphaLP unfolding pathway and the mechanism of its extreme cooperativity, we performed high-temperature molecular dynamics unfolding simulations of both alphaLP and trypsin. The simulated alphaLP unfolding pathway produces a robust transition state ensemble consistent with prior biochemical experiments and clearly shows that unfolding proceeds through preferential disruption of the domain interface. Using a novel method for calculating unfolding cooperativity, we show that alphaLP unfolds extremely cooperatively, whereas trypsin unfolds gradually. Finally, by examining the behavior of both domain interfaces, we propose a model for the differential unfolding cooperativity of alphaLP and trypsin involving three key regions that differ between the kinetically stable and thermodynamically stable classes of serine proteases.

    Cross validation of bi-modal health-related stress assessment

    This study explores the feasibility of objective and ubiquitous stress assessment. Twenty-five patients with post-traumatic stress disorder participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were intended to represent an early and a late therapy session, and each consisted of a "happy" part and a "stress-triggering" part. Two instruments were chosen to assess the patients' stress level at various points in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator, and (ii) the subjective unit of distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power and pitch. To model the patients' emotional state, 28 parameters were selected from this set by means of a linear regression model and subsequently compressed into 11 principal components. The SUD and speech models were cross-validated using 3 machine learning algorithms. Correct classification rates ranged from 90% (2 SUD levels) to 39% (10 SUD levels). The two sessions could be discriminated in 89% (ST) and 77% (RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care.
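Deriving statistical parameters from frame-level speech features might look like the sketch below. The abstract does not define how each feature is computed or which 13 statistics were used, so everything here is an assumption: the frame and hop lengths, the 1000 Hz cutoff for "high-frequency power" (computed as a fraction of spectral energy), the five statistics in the hypothetical `STATS` table, and all function names. Pitch is omitted because a pitch tracker would exceed the scope of a short sketch.

```python
import numpy as np

# Hypothetical subset of statistics; the paper's 13 parameters are not listed.
STATS = {"mean": np.mean, "std": np.std, "min": np.min, "max": np.max, "median": np.median}


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])


def frame_features(frames, fs, hf_cutoff=1000.0):
    """Per-frame amplitude, zero-crossings, power and high-frequency power ratio."""
    amplitude = np.max(np.abs(frames), axis=1)
    # Count sign changes; signbit treats +0.0 as non-negative.
    signs = np.signbit(frames)
    zero_crossings = np.count_nonzero(signs[:, 1:] != signs[:, :-1], axis=1)
    power = np.mean(frames ** 2, axis=1)
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)
    total = spectrum.sum(axis=1)
    hf = spectrum[:, freqs >= hf_cutoff].sum(axis=1)
    hf_power = hf / np.maximum(total, 1e-12)  # fraction of energy above cutoff
    return {
        "amplitude": amplitude,
        "zero_crossings": zero_crossings,
        "power": power,
        "hf_power": hf_power,
    }


def summarize(features):
    """Apply every statistic to every feature track -> flat parameter dict."""
    return {
        f"{name}_{stat}": float(fn(values))
        for name, values in features.items()
        for stat, fn in STATS.items()
    }


# Usage on a synthetic 100 Hz tone (1 s at 8 kHz, 50 ms frames, 25 ms hop).
fs = 8000
t = np.arange(fs) / fs
params = summarize(frame_features(frame_signal(np.sin(2 * np.pi * 100 * t), 400, 200), fs))
print(len(params))
```

The resulting parameter vectors (one per recording) are the kind of input that a regression-based selection step and a PCA compression, as described above, would then operate on.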

    Molecular Mechanics Study of Protein Folding and Protein-Ligand Binding

    In this dissertation, molecular dynamics (MD) simulations were applied to study the effect of single point mutations on protein folding free energy, as well as protein-ligand binding in the bifunctional protein dihydrofolate reductase-thymidylate synthase (DHFR-TS) of Plasmodium falciparum (pf). The main goal of these computational studies is a deeper understanding of the factors that govern protein folding stability and protein-ligand binding. Chapter two seeks to improve the accuracy of predicting changes in folding free energy upon single point mutations in proteins. In addition to addressing the importance of conformational sampling, this chapter takes the diverse dielectric properties of proteins into account. By developing a three-dielectric-constant model and broadening conformational sampling, it describes a method for predicting the effect of point mutations on protein folding free energy and discusses the factors that affect prediction accuracy. The following two chapters focus on the binding process and domain-domain interactions in the bifunctional protein pfDHFR-TS. This protein is a common target of antimalarial drugs, but drug resistance in this protein has caused serious problems. Chapter three investigates the mechanism by which drug resistance develops. This study indicated that the accumulation of mutations in pfDHFR caused marked changes in conformation and in the interactions among residues in the binding pocket, which in turn weakened the binding affinity between pfDHFR and the inhibitor drug. Furthermore, the pfDHFR quadruple mutant exhibited high rigidity and significantly weakened communication among key residues in the binding pocket. This rigid binding site was associated with the failure of conformational reorganization upon pyrimethamine binding in the quadruple mutant. Chapter four investigates the effect of the N-terminus of pfDHFR-TS on enzyme activity and domain-domain communication. This is the first computational study to focus on the full-length pfDHFR-TS dimer. It provides computational evidence that remote mutations can disturb the interactions and conformations of the binding site by disrupting dynamic motions in pfDHFR-TS.

    A Principled Methodology: A Dozen Principles of Software Effort Estimation

    Software effort estimation (SEE) is the activity of estimating the total effort required to complete a software project. Estimating this effort correctly is vital to an organization's competitiveness, because both under- and over-estimation have undesirable consequences. Under-estimation may result in budget and schedule overruns, which in turn may lead to the cancellation of projects, wasting all the effort spent up to that point. Over-estimation may prevent promising projects from being funded, thereby harming the organization's competitiveness.
    Because SEE plays such a significant role for software organizations, considerable research effort has been invested in it. Thanks to decades of accumulated prior research, we are now able to identify the core issues and search for the right principles to tackle pressing questions. Despite decades of work, however, we still lack concrete answers to important questions such as: What is the best SEE method? Existing estimation methods make use of local data, but not all companies have their own data: How can we handle the lack of local data? Common SEE methods take size attributes for granted, yet size attributes are costly to collect and practitioners place very little trust in them: How can we avoid the use of size attributes? Collecting data, particularly dependent-variable information (i.e., effort values), is costly: How can we find an essential subset of SEE data sets? Finally, studies use sampling methods to justify a new method's performance on SEE data sets, yet the trade-offs among different sampling variants are ignored: How should we choose sampling methods for SEE experiments?
    This thesis is a rigorous investigation towards identifying and tackling the pressing issues in SEE. Our findings rely on extensive experimentation with a large corpus of estimation techniques on a large set of public and proprietary data sets. We summarize our findings and industrial experience in the form of 12 principles: 1) Know your domain; 2) Let the Experts Talk; 3) Suspect your data; 4) Data Collection is Cyclic; 5) Use a Ranking Stability Indicator; 6) Assemble Superior Methods; 7) Weighting Analogies is Over-elaboration; 8) Use Easy-path Design; 9) Use Relevancy Filtering; 10) Use Outlier Pruning; 11) Combine Outlier and Synonym Pruning; 12) Be Aware of Sampling Method Trade-off.
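Several of the principles refer to analogy-based estimation, so a minimal sketch of that family of methods may help. It assumes min-max normalized features, Euclidean distance and a fixed k; in line with principle 7, the k nearest analogies are combined with a plain, unweighted mean. Leave-one-out evaluation with the magnitude of relative error (MRE = |actual - predicted| / actual) is shown as one common sampling choice. The function names and defaults are hypothetical and are not taken from the thesis.

```python
import numpy as np


def normalize(train, query):
    """Min-max scale features using the training data's range."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (train - lo) / span, (query - lo) / span


def abe_estimate(train_X, train_effort, query_x, k=3):
    """Analogy-based estimate: unweighted mean effort of the k nearest projects."""
    Xn, qn = normalize(train_X, query_x[None, :])
    dist = np.sqrt(((Xn - qn) ** 2).sum(axis=1))
    nearest = np.argsort(dist, kind="stable")[:k]
    return float(np.mean(train_effort[nearest]))


def loo_mre(X, effort, k=3):
    """Leave-one-out MRE for every project in the data set."""
    errors = []
    for i in range(len(effort)):
        mask = np.arange(len(effort)) != i
        pred = abe_estimate(X[mask], effort[mask], X[i], k)
        errors.append(abs(effort[i] - pred) / effort[i])
    return np.array(errors)


# Usage on a toy data set with one size-like feature.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
effort = np.array([10.0, 20.0, 30.0, 100.0, 110.0, 120.0])
print(abe_estimate(X, effort, np.array([2.1])))  # mean of the 3 closest projects
print(loo_mre(X, effort))
```

Swapping `loo_mre` for a k-fold or bootstrap loop changes how training and test projects are drawn, which is exactly the kind of sampling trade-off principle 12 warns about.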