4,203 research outputs found

    Feature subset selection using support-vector machines by averaging over probabilistic genotype data

    Despite the grand promises of the postgenomic era, such as personalized prevention, diagnosis, drugs, and treatments, the landscape of biomedicine looks more and more complex. The fulfillment of these promises for diseases significant in public health requires new approaches to induction for statistical and causal inferences from observations and interventions. Within the biomedical world, an important response to this challenge is the mapping and relatively cheap measuring of genetic variations, such as single nucleotide polymorphisms (SNPs). The recent mapping of genetic variations has opened a new dimension in postgenomic research at all phenotypic levels, such as genomic, proteomic, and clinical, and it has sparked a series of Genetic Association Studies (GAS) based on the application of machine learning and data mining techniques. To overcome such problems, different strategies are being investigated within the research community. The aim of this thesis is to contribute to progress in this field by taking a step toward a solution. I have investigated the machine learning and data mining algorithms suitable for this task and the state of the art of their currently available implementations intended for biomedical research applications. As a result, I have proposed a solution strategy, and chosen and extended the Java-ML library, an open source machine learning library written in Java, implementing some missing algorithms and functionality that are necessary for the proposed approach. This thesis is structured into three main blocks. Section 3, “An approach to the use of machine learning techniques with genotype data”, addresses the problem faced and the proposed solution. 
It begins with the definition of some introductory GAS concepts and the description of the solution strategy, and elaborates in subsequent subsections on the theoretical underpinnings of the algorithms that make up the solution. Specifically, the first subsection, “The feature selection problem in the bioinformatics domain”, justifies the need to reduce the dimensionality of data sets in order to allow for acceptable performance when applying machine learning techniques in the broader field of bioinformatics, and establishes a comparative taxonomy of the currently available techniques. The second subsection, “Feature selection using support-vector machines”, defines the idea behind support-vector machine classifiers and their application to feature subset selection, while the third subsection, “Ranking fusion as averaging technique: Markov chain based algorithms”, describes the ranking fusion algorithms whose implementation has been chosen for combining the feature subsets obtained from different data sets. Section 4, “Analysis of available tools for experimental design”, analyses the available tools suitable for experimental design in GAS based on machine learning techniques. Its first subsection, “Advantages of high level languages for machine learning algorithms”, discusses the convenience of using high level languages for the kind of applications we are working on. The second subsection, “Machine learning algorithms implementations in Java”, justifies the choice of the Java language and then analyses the currently available implementations of machine learning algorithms in this language that are worth considering for our purposes, namely WEKA, RapidMiner and Java-ML. 
Section 5, “Implemented extensions to the Java-ML library”, describes the functionality that has been added to provide a framework suitable for designing GAS experiments that test the proposed approach. The “Missing values imputation: the dataset.tools package” subsection focuses on data set handling functionality, while the “Averaging through ranking fusion: rankingfusion and rankingfusion.scoring package” subsection details the implementations of the ranking fusion algorithms. Finally, the “How to use the code” subsection is a tutorial on how to use both the library and its extension for the development of applications. In addition to these main blocks, a final section called “Future Work” reflects on how the developed work can be used by GAS domain experts to evaluate the usefulness of the proposed technique. Telecommunication Engineering
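
The abstract above names Markov chain based ranking fusion without saying which variant the thesis implements. As an illustration only, the following Python sketch uses the MC4 heuristic of Dwork et al. as an assumed stand-in: from the current item, a candidate is drawn uniformly, and the chain moves to it when a majority of the input rankings place the candidate higher. The function name `mc4_fuse`, the `damping` parameter (a small uniform restart that keeps the chain ergodic) and the requirement that every ranking contains every item are assumptions of this sketch, not details from the thesis or from Java-ML.

```python
def mc4_fuse(rankings, damping=0.95, n_iters=200):
    """Fuse full rankings (best item first) with an MC4-style Markov chain."""
    items = sorted(set().union(*rankings))
    n = len(items)
    # Position of every item within each input ranking (0 = best).
    positions = [{item: rank for rank, item in enumerate(r)} for r in rankings]

    def majority_prefers(q, p):
        """True when more than half of the rankings place q above p."""
        votes = sum(1 for pos in positions if pos[q] < pos[p])
        return votes > len(positions) / 2

    # Row-stochastic transition matrix: move p -> q with probability 1/n
    # when q is preferred by a majority, otherwise stay at p.
    P = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(items):
        for j, q in enumerate(items):
            if i != j and majority_prefers(q, p):
                P[i][j] = 1.0 / n
        P[i][i] = 1.0 - sum(P[i])

    # Power iteration with a uniform restart (assumed, for ergodicity).
    pi = [1.0 / n] * n
    for _ in range(n_iters):
        pi = [(1.0 - damping) / n
              + damping * sum(pi[i] * P[i][j] for i in range(n))
              for j in range(n)]

    # Items with more stationary mass are ranked higher.
    order = sorted(range(n), key=lambda k: -pi[k])
    return [items[k] for k in order]
```

In the setting described by the abstract, each input ranking would be a feature ordering obtained from one data set, and the fused list would serve as the averaged feature ranking.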

    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and, finally, the influential papers of the field. Comment: version 5.0 (updated on September 2018) - Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version). History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

    Analysis of advanced optimization methods in mechanical design

    Advanced optimization methods are widely applied to mechanical design, mainly for their ability to solve complex problems that traditional optimization techniques, such as gradient-based methods, cannot handle. With their increasing popularity, the number of algorithms found in the literature is vast. In this work three algorithms are implemented, namely Particle Swarm Optimization (PSO), Differential Evolution (DE) and Teaching-Learning-Based Optimization (TLBO). First, the application of these algorithms is analyzed for a composition function benchmark and three mechanical design minimization problems (the weight of a speed reducer, the volume of a three-bar truss and the area of a square plate with a cut-out hole). Furthermore, as the range of available algorithms increases, the choice of programming tools to implement them is also vast, and is generally made on subjective criteria or because of difficulties in using enhancement strategies such as parallel processing. An analysis of programming tools applied to metaheuristic algorithms is therefore carried out using four programming languages with distinct characteristics: Python, MATLAB, Java and C++. The selected algorithms and problems are coded in each programming language, and the languages are initially compared in a sequential processing implementation. Additionally, in order to analyze potential gains in performance, parallel processing procedures are implemented using features of each programming language. The application of the algorithms to the mechanical design problems demonstrates good results in the achieved solutions. 
Regarding computational time, the sequential and parallel processing results show considerable differences between programming languages, while the implementation of parallel processing procedures demonstrates significant benefits for complex problems. Master's in Mechanical Engineering
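
Of the three metaheuristics named in this abstract, Particle Swarm Optimization is the simplest to outline. The Python sketch below is a minimal global-best PSO under assumed settings: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the `sphere` test function are illustrative choices, not the parameters or benchmarks used in the thesis.

```python
import random


def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Illustrative test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)
```

Calling `pso(sphere, dim=2, bounds=(-5.0, 5.0))` returns the best position found and its objective value. The evaluations within one iteration are independent of each other if the global best is only updated between iterations, which is the kind of structure that parallel processing can exploit.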

    Automatically Fixing Syntax Errors Using the Levenshtein Distance

    Abstract: To ensure high quality software, much emphasis is laid on software testing. While a number of techniques and tools already exist to identify and locate syntax errors, it is still the duty of programmers to manually fix each of these uncovered syntax errors. In this paper we propose an approach to automate the task of fixing syntax errors by using existing compilers and the Levenshtein distance between the identified bug and the possible fixes. The Levenshtein distance is a measure of the similarity between two strings. A prototype, called ASBF, has also been built, and a number of tests carried out show that the technique works well in most cases. ASBF is able to automatically fix syntax errors in any erroneous source file and can also process several erroneous files in a source folder. The tests also show that the technique can be applied to multiple programming languages. Currently ASBF can automatically fix software bugs in the Java and Python programming languages. The tool also has auto-learning capabilities, whereby it can automatically learn from corrections made manually by a user. It can thereafter couple this learning process with the Levenshtein distance to improve its bug-correction capabilities. Keywords: Automatically fixing syntax errors, bug fixing, auto-learn, Levenshtein distance, Java, Python (Article history: Received 16 September 2016 and accepted 9 December 2016)
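
To make the central measure concrete, here is a small Python sketch of the Levenshtein distance (the minimum number of single-character insertions, deletions and substitutions that turn one string into another), together with a helper that picks the nearest candidate. The helper `closest_fix` and the candidate list are hypothetical illustrations of the general idea; the abstract does not describe how ASBF actually generates or ranks its fixes.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_fix(token: str, candidates: list[str]) -> str:
    """Hypothetical helper: return the candidate nearest to the bad token."""
    return min(candidates, key=lambda c: levenshtein(token, c))
```

For example, `closest_fix("pritn", ["print", "input", "int"])` selects `"print"`, which is two edits away from the misspelled token.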

    SusTrainable: Promoting Sustainability as a Fundamental Driver in Software Development Training and Education. 2nd Teacher Training, January 23-27, 2023, Pula, Croatia. Revised lecture notes

    This volume presents the revised lecture notes of the 2nd teacher training organized as part of the project Promoting Sustainability as a Fundamental Driver in Software Development Training and Education, held at the Juraj Dobrila University of Pula, Croatia, in the week of January 23-27, 2023. It is the Erasmus+ project No. 2020-1-PT01-KA203-078646 - Sustrainable. More details can be found at the project web site (https://sustrainable.github.io/). Among the most important contributions of the project are two summer schools. The 2nd SusTrainable Summer School (SusTrainable - 23) will be organized at the University of Coimbra, Portugal, in the week of July 10-14, 2023. The summer school will consist of lectures and practical work for master's and PhD students in computing science and closely related fields. There will be contributions from Babeș-Bolyai University, Eötvös Loránd University, Juraj Dobrila University of Pula, Radboud University Nijmegen, Roskilde University, Technical University of Košice, University of Amsterdam, University of Coimbra, University of Minho, University of Plovdiv, University of Porto, and University of Rijeka. To prepare and streamline the summer school, the consortium organized a teacher training in Pula, Croatia. This was an event of five full days, organized by Tihana Galinac Grbac and Neven Grbac. The Juraj Dobrila University of Pula is very concerned with sustainability issues. Its education, research and management are conducted with sustainability goals in mind. The contributions in the proceedings were reviewed and provide a good overview of the range of topics that will be covered at the summer school. The papers in the proceedings, as well as the very constructive and cooperative teacher training, guarantee a high-quality and beneficial summer school for all participants. Comment: 85 pages, 8 figures, 3 code listings and 1 table; editors: Tihana Galinac Grbac, Csaba Szabó, João Paulo Fernande

    Discovering Complex Relationships between Drugs and Diseases

    Finding the complex semantic relations between existing drugs and new diseases can open a new route for drug development. Most of the drugs that have found new uses were discovered through serendipity. The prediction of drug uses for more than one disease should therefore be done systematically, by studying the semantic relations between the drugs and diseases and also the other entities involved in those relations. To study these complex semantic relations, an application was developed that integrates heterogeneous data in different formats from different public databases available online. A high level ontology was also developed to integrate the data, and only the fields required for the current study were used. The data was collected from different data sources such as DrugBank, UniProt/SwissProt, GeneCards and OMIM. Most of these are standard data sources and have been used by the National Center for Biotechnology Information of the National Institutes of Health. The data was parsed and integrated, and complex semantic relations were discovered. This is a simple and novel effort which may find uses in the development of new drug targets and in polypharmacology.