20 research outputs found

    Using Peer Comparison Approaches to Measure Software Stability

    Software systems must change to adapt to new functional and nonfunctional requirements. This is called software revision. However, not all the modules within a system need to be changed during each revision. In this paper, we study how frequently each module is modified by comparing the stability of peer software modules. The study is performed on six open-source Java projects: Ant, Flow4j, Jena, Lucene, Struts, and Xalan, in which classes are identified as the basic software modules. Our study shows that (1) about half of all classes never changed; (2) frequent changes occur to a small number of classes; and (3) the number of classes changed between the current release and the next release has no significant relation to the time elapsed between the two releases. Keywords: software evolution; software revision; software stability; class stability; open-source project; Java clas

    Fast learning optimized prediction methodology for protein secondary structure prediction, relative solvent accessibility prediction and phosphorylation prediction

    Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computed structures for these sequences and identify their functions. Developing methods that satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, including drug discovery and the discovery of biomarkers, and for a better basic understanding of aberrant states of stress and disease. Several aspects of secondary structure prediction and other protein structure-related predictions are investigated using different types of information, such as knowledge-based potentials derived from amino acids in protein sequences, physicochemical properties of amino acids, and propensities of amino acids to appear at the ends of secondary structures. Investigating the performance of these secondary structure predictions by type of amino acid highlights some interesting aspects of the influence of individual amino acid types on the formation of secondary structures and points toward ways to make further gains. Other research areas include Relative Solvent Accessibility (RSA) prediction and the prediction of phosphorylation sites, which are one type of Post-Translational Modification (PTM) site in proteins. Protein secondary structures and other features of proteins are predicted efficiently, reliably, less expensively and more accurately. 
A novel method called the Fast Learning Optimized PREDiction (FLOPRED) Methodology is proposed for predicting protein secondary structures and other features, using knowledge-based potentials, a Neural Network based Extreme Learning Machine (ELM) and advanced Particle Swarm Optimization (PSO) techniques that yield better and faster convergence to produce more accurate results. These techniques yield superior classification of secondary structures, with a training accuracy of 93.33% and a testing accuracy of 92.24% with a standard deviation of 0.48%, obtained for a small group of 84 proteins. Matthew's correlation coefficient ranges between 80.58% and 84.30% for these secondary structures. Accuracies for individual amino acids range between 83% and 92%, with an average standard deviation between 0.3% and 2.9% for the 20 amino acids. On a larger set of 415 proteins, we obtain a testing accuracy of 86.5% with a standard deviation of 1.38%. These results are significantly higher than those found in the literature. Prediction of protein secondary structure based on amino acid sequence is a common technique used to predict a protein's 3-D structure. Additional information such as the biophysical properties of the amino acids can help improve the results of secondary structure prediction. A database of protein physicochemical properties is used as features to encode protein sequences, and these data are used for secondary structure prediction with FLOPRED. Preliminary studies using a Genetic Algorithm (GA) for feature selection, Principal Component Analysis (PCA) for feature reduction and FLOPRED for classification give promising results. Some amino acids appear more often at the ends of secondary structures than others. A preliminary study has indicated that secondary structure accuracy can be improved by as much as 6% by including these effects for the residues present at the ends of alpha-helices, beta-strands and coils. 
A study on RSA prediction using ELM shows large gains in processing speed compared to using support vector machines for classification. This indicates that ELM yields a distinct advantage in terms of processing speed and performance for RSA. Additional gains in accuracy are possible when the more advanced FLOPRED algorithm and PSO optimization are implemented. Phosphorylation is a post-translational modification of proteins that often controls and regulates their activities. It is an important mechanism for regulation. Phosphorylated sites are known to be present often in intrinsically disordered regions of proteins lacking unique tertiary structures, and thus less information is available about the structures of phosphorylated sites. It is important to be able to computationally predict phosphorylation sites in protein sequences obtained from mass-scale sequencing of genomes. Phosphorylation sites may aid in determining the functions of a protein and in better understanding the mechanisms of protein function in healthy and diseased states. FLOPRED is used to model and predict experimentally determined phosphorylation sites in protein sequences. Our new PSO optimization included in FLOPRED enables the prediction of phosphorylation sites with higher accuracy and with better generalization. Our preliminary studies on 984 sequences demonstrate that this model can predict phosphorylation sites with a training accuracy of 92.53%, a testing accuracy of 91.42% and a Matthew's correlation coefficient of 83.9%. In summary, secondary structure prediction, Relative Solvent Accessibility prediction and phosphorylation site prediction have been carried out on multiple sets of data, encoded with a variety of information drawn from proteins and the physicochemical properties of their constituent amino acids. 
Improved and efficient algorithms called S-ELM and FLOPRED, which are based on Neural Networks and Particle Swarm Optimization, are used for classifying and predicting protein sequences. Analysis of the results of these studies provides new and interesting insights into the influence of amino acids on secondary structure prediction. S-ELM and FLOPRED have also proven to be robust and efficient for predicting relative solvent accessibility of proteins and phosphorylation sites. These studies show that our method is robust and resilient and can be applied for a variety of purposes. It can be expected to yield higher classification accuracy and better generalization performance compared to previous methods
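The abstract above reports both accuracy and Matthew's correlation coefficient. As a minimal sketch of how the two measures relate, the Python below computes them from a binary confusion matrix, which is what a single class (for example, helix versus not-helix) reduces to when treated one-vs-rest. The function names and the counts are hypothetical illustrations and are not taken from the thesis.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not data from the thesis).
tp, tn, fp, fn = 45, 45, 5, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2%}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2%}")
```

With these made-up counts the coefficient comes out lower than the accuracy, which is the same pattern as in the figures quoted in the abstract: the coefficient penalises false positives and false negatives against both classes rather than only counting correct calls.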

    Distribution and phylogeny of the bacterial translational GTPases and the Mqsr/YgiT regulatory system

    The electronic version of the dissertation does not contain the publications. Proteins are vital for the cell: they serve as building blocks and as catalysts for many different reactions. Bioinformatics has equipped us with powerful sequence analysis tools. According to sequence similarity, proteins can be grouped into families. A protein family is composed of homologous sequences, i.e. sequences that share a common ancestor. Proteins that belong to the same family often perform the same or closely related functions. Our knowledge about the functional properties of proteins originates from experimental work performed with a limited number of model organisms. Scientists are often interested in how universal or how specific a described protein or function is. How and when does gene duplication followed by innovation give rise to genes/proteins with a new function? How often have such events taken place on the evolutionary timescale? In my dissertation I have analyzed gene and protein sequences of the translational GTPases (trGTPases) and the mqsR/ygiT toxin-antitoxin (TA) system of bacteria. The common denominator for both is their connection to the cell's protein synthesis machinery: both are associated with the ribosome and, through it, with the cell's ability to produce proteins as needed. Two types of questions can be asked in this context: those related to (a) the representation of a protein family, and (b) the evolution and functional innovation of a protein family. In the case of bacterial trGTPases, nine different protein families, i.e. nine different sets of functions, can be distinguished. Analyses of complete bacterial genome sequences revealed that four of the nine families are conserved: IF2, EF-Tu, EFG, and LepA (EF4). Despite the fact that RF3 has a canonical role at the final step of translation in the classical model of protein synthesis, the RF3 gene was missing from approximately 40% of the analyzed bacterial genomes. Surprisingly, LepA, whose function is still not well understood, appears to be a bacteria-specific trGTPase. 
The analysis also revealed a wide distribution of EFG paralogs: many bacterial genomes contained two to three relatively diverged EFG genes. The phylogenetic tree of EFG revealed four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. The EFG I subfamily is experimentally well characterized. The spdEFGs have also been studied: spdEFG1 was found to act as a translocase and spdEFG2 helps recycle ribosomes, indicating a functional split between co-occurring paralogs. However, little research has been done on the widely distributed EFG II subfamily. Our phylogenetic analyses allow us to propose a hypothesis of an ancient origin for the four EFG subfamilies: they appeared before, or at the same time as, the emergence of eukaryotic life forms. Functional innovation common to the whole subfamily is carried by 12 EFG II-specific positions. Against a background of overall high divergence, these conserved positions stand out in the GTPase domain and in domains II and IV. Conserved changes in the GTPase domain, some of which are located in the GTP-binding G1 motif, indicate changed conditions for GTP binding and hydrolysis. Increased charge at the tip of the protruding loop of domain IV, which in E. coli EFG inserts into the A-site, allows us to speculate about changes in the local conditions of the A-site during translocation. Conserved changes in domain II indicate changed interactions among EFG domains I, II, and III and the ribosome. Phylogenetic and sequence analysis of the EFG II subfamily reveals phylum/class-specific sub-subgroups. These sub-subgroups differ from each other in the conservation pattern of the G2 motif and in the insertion/deletion pattern detected in the multiple sequence alignment. This second level characterizes EFG II as a phylum/class-specific factor. What role EFG II actually performs, and how and under what conditions it complements EFG I, remains to be answered. The current study can serve as a framework for future experiments

    Architectural stability of self-adaptive software systems

    This thesis studies the notion of stability in software engineering with the aim of understanding its dimensions, facets and aspects, as well as characterising it. The thesis further investigates behavioural stability at the architectural level: a property concerned with the architecture's capability to maintain the expected quality of service and accommodate runtime changes, so as to delay the architectural drift and phase-out that result from continuously failing to meet quality requirements. The research aims to provide systematic and methodological support for analysing, modelling, designing and evaluating architectural stability. The novelty of this research is the consideration of stability during runtime operation, by focusing on the stable provision of quality of service without violations. As the runtime dimension is associated with adaptations, the research investigates stability in the context of self-adaptive software architectures, where runtime stability is challenged by the quality of adaptation, which in turn affects the quality of service. The research evaluation focuses on the effectiveness, scale and accuracy in handling runtime dynamics, using the self-adaptive cloud architectures

    Land use, national development and global welfare: the economics of biodiversity's conservation and sustainable use

    The material prosperity of countries depends on the use of their endowment of natural resources. Land management decisions, in particular, also affect the conservation of biological diversity, which is an asset not only for the host country, but also for the rest of the world. There is a growing recognition that the contribution of biological resources both to sustainable national development and to the well-being of the international community has been underestimated in the past. Based on both theoretical analysis and case study material from Mexico, this dissertation discusses the land-use related factors giving rise to the loss of biodiversity, as well as policy options and management practices that may allow sustainable land use and biodiversity conservation. The introductory chapter summarises the scientific and economic debate, including disagreements about the definition of biodiversity management objectives. Chapter 2 analyses the sequence of land use changes typically observed in a number of tropical countries, and discusses interventions which could alter the incentives for land conversion. The Convention on Biological Diversity stipulates that developing countries should be reimbursed for the 'incremental cost' of activities that help conserve biodiversity. Chapter 3 proposes a model which addresses the allocative and incentive implications of the incremental cost mechanism. The empirical part of the dissertation first discusses the social and economic factors that have been responsible over the last few decades for land use change and depletion of biological resources in the study area in Mexico (chapters 4 and 5). A linear programming economic model is then proposed for simulating, at the farm level, further impacts over the next decade (chapter 5). 
Based on a model of aggregation over space and time of farm-level decisions, chapter 6 analyzes the appropriate mix of conservation and sustainable use management options in the study area, providing estimates of their cost implications and discussing possible funding sources. Chapter 7 concludes with policy implications and options for future research

    Proceedings, MSVSCC 2014

    Proceedings of the 8th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 17, 2014 at VMASC in Suffolk, Virginia

    Deep Model for Improved Operator Function State Assessment

    A deep learning framework is presented for engagement assessment using EEG signals. Deep learning is a recently developed machine learning technique that has been applied in many areas. In this paper, we propose a deep learning strategy for operator function state (OFS) assessment. Fifteen pilots participated in a flight simulation from Seattle to Chicago. During the four-hour simulation, EEG signals were recorded for each pilot. We labeled 20-minute segments of data as engaged or disengaged to fine-tune the deep network, and used the remaining vast amount of unlabeled data to initialize the network. The trained deep network was then used to assess whether a pilot was engaged during the four-hour simulation