
    Improving the hierarchical classification of protein functions with swarm intelligence

    This thesis investigates methods to improve the performance of hierarchical classification. In this thesis, hierarchical classification is a form of supervised learning in which the classes in a dataset are arranged in a tree structure. As a basis for our new methods we use the TDDC (top-down divide-and-conquer) approach to hierarchical classification, in which each classifier is built only to discriminate between sibling classes. First, we propose a swarm intelligence technique that varies the types of classifiers used at each divide within the TDDC tree. Our technique, PSO/ACO-CS (Particle Swarm Optimisation/Ant Colony Optimisation Classifier Selection), uses the global search ability of PSO/ACO to find combinations of classifiers to be used in the TDDC tree. Second, we propose a technique that attempts to mitigate a major drawback of the TDDC approach: if an example is misclassified at any point in the TDDC tree, it can never be correctly classified further down the tree. Our approach, PSO/ACO-RO (PSO/ACO-Recovery Optimisation), again uses the global search ability of PSO/ACO to decide whether to redirect examples at a given classifier node. Third, we propose an ensemble-based technique, HEHRS (Hierarchical Ensembles of Hierarchical Rule Sets), which attempts to boost the accuracy at each classifier node in the TDDC tree by using information from the classifiers (rule sets) in the rest of that tree. We use Particle Swarm Optimisation to weight the individual rules within each ensemble. We evaluate these three new methods on hierarchical bioinformatics datasets that we created for this research; these datasets represent the real-world problem of protein function prediction. Extensive experimentation shows that all three proposed methods improve upon the baseline TDDC method, to varying degrees.
Overall, the HEHRS and PSO/ACO-CS-RO approaches are the most effective, although they are associated with a higher computational cost.
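To make the TDDC drawback concrete, here is a minimal Python sketch of top-down prediction, in which each node's local classifier chooses only among that node's children. The `Node` class, the `predict_top_down` function and the threshold-style classifiers are hypothetical illustrations, not the thesis's implementation; they only show why an early misrouting cannot be recovered further down the tree.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Node:
    """A class in the hierarchy; `choose` is its local classifier (hypothetical)."""
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Maps a feature vector to the name of one child: only siblings are compared.
    choose: Optional[Callable[[Sequence[float]], str]] = None


def predict_top_down(root: Node, x: Sequence[float]) -> List[str]:
    """Return the class names chosen on the way from the root down to a leaf."""
    path: List[str] = []
    node = root
    while node.children:
        if node.choose is None:
            raise ValueError(f"internal node {node.name!r} has no classifier")
        child_name = node.choose(x)        # local decision among siblings only
        node = node.children[child_name]   # commit to that subtree for good
        path.append(node.name)
    return path


# Hypothetical two-level hierarchy: root -> {A, B}, A -> {A1, A2}, B -> {B1, B2}.
a = Node("A", {"A1": Node("A1"), "A2": Node("A2")},
         choose=lambda x: "A1" if x[1] < 0.5 else "A2")
b = Node("B", {"B1": Node("B1"), "B2": Node("B2")},
         choose=lambda x: "B1" if x[1] < 0.5 else "B2")
root = Node("root", {"A": a, "B": b},
            choose=lambda x: "A" if x[0] < 0.5 else "B")

print(predict_top_down(root, [0.2, 0.7]))  # ['A', 'A2']
# If the root sends an example whose true class is A2 into B, only B1 and B2
# remain reachable, so the error can never be corrected lower in the tree.
print(predict_top_down(root, [0.6, 0.7]))  # ['B', 'B2']
```

Roughly speaking, PSO/ACO-CS searches over which kind of classifier fills each node's `choose` slot, while PSO/ACO-RO targets the commit-and-never-return step in the loop.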

    Cooperative Metaheuristics for Exploring Proteomic Data

    Most combinatorial optimization problems cannot be solved exactly. A class of methods called metaheuristics has proved efficient at giving good approximate solutions in reasonable time. Cooperative metaheuristics are a subset of metaheuristics that involve a parallel exploration of the search space by several entities that exchange information with one another. The importance of information exchange in the optimization process is related to the building block hypothesis of evolutionary algorithms, which rests on two questions: what is the pertinent information in a given potential solution, and how can this information be shared? A classification of cooperative metaheuristic methods according to the nature of the cooperation involved is presented, and the specific properties of each class, as well as a way to combine them, are discussed. Several improvements in the field of metaheuristics are also given. In particular, a method is proposed to regulate the use of classical genetic operators and to define new, more pertinent ones, taking advantage of a building-block-structured representation of the explored space. Finally, a hierarchical approach resting on multiple levels of cooperative metaheuristics is presented, leading to the definition of a complete concerted cooperation strategy. Several applications of these concepts to difficult proteomics problems are presented, including automatic protein identification, biological motif inference and multiple sequence alignment.
For each application, an innovative method based on the cooperation concept is given and compared with classical approaches. In the protein identification problem, a first level of cooperation using swarm intelligence is applied to the comparison of mass spectrometric data with a biological sequence database, followed by a genetic programming method to discover an optimal scoring function. The multiple sequence alignment problem is decomposed into three steps involving several evolutionary processes that infer different kinds of biological motifs, and a concerted cooperation strategy then builds the sequence alignment according to their motif content.

    Iterated local search using an add and delete hyper-heuristic for university course timetabling

    Hyper-heuristics are (meta-)heuristics that operate at a higher level to choose or generate a set of low-level (meta-)heuristics in an attempt to solve difficult optimization problems. Iterated local search (ILS) is a well-known approach for discrete optimization that combines perturbation and hill-climbing within an iterative framework. In this study, we introduce an ILS approach strengthened by a hyper-heuristic that generates heuristics based on a fixed number of add and delete operations. The performance of the proposed hyper-heuristic is tested across two different problem domains using real-world benchmark course timetabling instances from Tracks 2 and 3 of the second International Timetabling Competition. The results show that mixing add and delete operations within an ILS framework yields an effective hyper-heuristic approach.
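The ILS loop described above can be sketched in Python on a toy event-to-timeslot problem. Everything here is an assumption made for illustration: the `cost` function (clashes plus a penalty for unscheduled events), the move set used by `hill_climb`, and the "accept if not worse" criterion are hypothetical stand-ins, not the competition model or the authors' code. The perturbation, though, follows the idea in the abstract: a generated heuristic is a fixed-length sequence of add and delete operations.

```python
import random
from typing import Dict, List, Optional, Tuple

# Toy model (hypothetical): event -> timeslot index, or None when unscheduled.
Timetable = Dict[int, Optional[int]]
Conflict = Tuple[int, int]


def cost(tt: Timetable, conflicts: List[Conflict], unassigned_weight: int = 2) -> int:
    """Clashing conflict pairs plus a penalty for every unscheduled event."""
    clashes = sum(1 for a, b in conflicts if tt[a] is not None and tt[a] == tt[b])
    missing = sum(1 for slot in tt.values() if slot is None)
    return clashes + unassigned_weight * missing


def generate_heuristic(rng: random.Random, n_ops: int) -> List[str]:
    """A generated low-level heuristic: a fixed-length sequence of add/delete moves."""
    return [rng.choice(("add", "delete")) for _ in range(n_ops)]


def apply_heuristic(tt: Timetable, ops: List[str], n_slots: int,
                    rng: random.Random) -> Timetable:
    """Perturbation: 'delete' unschedules a random event, 'add' schedules one."""
    new = dict(tt)
    for op in ops:
        assigned = [e for e, s in new.items() if s is not None]
        unassigned = [e for e, s in new.items() if s is None]
        if op == "delete" and assigned:
            new[rng.choice(assigned)] = None
        elif op == "add" and unassigned:
            new[rng.choice(unassigned)] = rng.randrange(n_slots)
    return new


def hill_climb(tt: Timetable, conflicts: List[Conflict], n_slots: int) -> Timetable:
    """Move single events to any slot that lowers the cost until no move does."""
    best = dict(tt)
    best_cost = cost(best, conflicts)
    improved = True
    while improved:
        improved = False
        for e in list(best):
            for s in range(n_slots):
                if best[e] == s:
                    continue
                cand = dict(best)
                cand[e] = s
                c = cost(cand, conflicts)
                if c < best_cost:
                    best, best_cost, improved = cand, c, True
    return best


def iterated_local_search(n_events: int, n_slots: int, conflicts: List[Conflict],
                          iterations: int = 100, n_ops: int = 4,
                          seed: int = 0) -> Timetable:
    rng = random.Random(seed)
    current = hill_climb({e: None for e in range(n_events)}, conflicts, n_slots)
    best = current
    for _ in range(iterations):
        ops = generate_heuristic(rng, n_ops)
        candidate = hill_climb(apply_heuristic(current, ops, n_slots, rng),
                               conflicts, n_slots)
        # Acceptance criterion (an assumption): keep the candidate unless it is worse.
        if cost(candidate, conflicts) <= cost(current, conflicts):
            current = candidate
        if cost(current, conflicts) < cost(best, conflicts):
            best = current
    return best


conflicts = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]
best = iterated_local_search(n_events=6, n_slots=3, conflicts=conflicts)
print(cost(best, conflicts))  # 0
```

On this small instance, hill-climbing alone already reaches a clash-free timetable; on real timetabling instances, the add/delete perturbation is what lets the search escape local optima.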

    Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

    Many bioinformatics studies begin with a multiple sequence alignment as the foundation for their research, because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence-structure relationships. In this paper, we propose a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences and solve each part individually using a guide-tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied to the solutions of the initial generation and of each child generation within VDGA. We use two mechanisms to generate the initial population: the first generates guide trees from randomly selected sequences, and the second shuffles the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we compared it with well-known existing methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN and PILEUP8, as well as with other methods based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. The experimental results showed that VDGA with three vertical divisions was the most successful variant for most of the test cases, compared with the other numbers of divisions considered. The experimental results also confirmed that VDGA outperformed the other methods considered in this research.
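The split-solve-combine idea behind vertical decomposition can be sketched as follows. This is only an illustration under stated assumptions: cutting each sequence at proportional positions is a guess at how the vertical divisions are made, and `pad_aligner` is a hypothetical placeholder that merely pads each block with gaps, standing in for the guide-tree alignment that VDGA actually applies to each part (the GA layer is not shown).

```python
from typing import Callable, List

Aligner = Callable[[List[str]], List[str]]


def split_vertically(seqs: List[str], k: int) -> List[List[str]]:
    """Cut every sequence into k consecutive pieces at proportional positions."""
    blocks: List[List[str]] = [[] for _ in range(k)]
    for s in seqs:
        n = len(s)
        cuts = [i * n // k for i in range(k + 1)]  # cuts[0] == 0, cuts[k] == n
        for i in range(k):
            blocks[i].append(s[cuts[i]:cuts[i + 1]])
    return blocks


def pad_aligner(block: List[str]) -> List[str]:
    """Placeholder aligner: right-pads with gaps (stands in for a real aligner)."""
    width = max((len(piece) for piece in block), default=0)
    return [piece.ljust(width, "-") for piece in block]


def vertical_decomposition_align(seqs: List[str], k: int,
                                 align_block: Aligner = pad_aligner) -> List[str]:
    blocks = split_vertically(seqs, k)
    aligned_blocks = [align_block(block) for block in blocks]
    # Concatenate the r-th row of every aligned block to rebuild alignment row r.
    return ["".join(block[r] for block in aligned_blocks) for r in range(len(seqs))]


alignment = vertical_decomposition_align(["ACGTAC", "ACTAC", "AGTTACG"], k=3)
print(alignment)  # ['ACGTAC-', 'A-CTAC-', 'AGTTACG']
```

Because each block is aligned independently, a real per-block aligner works on shorter subsequences, which is the kind of saving a vertical decomposition is meant to exploit.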

    Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations

    Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we design a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates the stereo-chemical properties of sequence residues and their substitution probabilities in a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we design an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of these alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion; the dynamic weighted tree allows errors made in the early alignment stages to be corrected in subsequent stages. The other two algorithms use sequence knowledge bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.
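The abstract does not give the details of the tree-structured scoring scheme, so as a point of reference, here is a sketch of the common sum-of-pairs (SP) score that MSA scoring schemes are often compared against. The match, mismatch and gap values are hypothetical simplifications; practical SP scores use a substitution matrix and affine gap penalties.

```python
from itertools import combinations
from typing import List


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of characters in a column (illustrative constants)."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment: List[str]) -> int:
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


print(sum_of_pairs(["ACGT", "ACGT", "AC-T"]))  # 6
```

Each fully conserved column contributes +3 here (three row pairs), and the column with one gap contributes 1 - 2 - 2 = -3.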

    Filtered Distance Matrix For Constructing High-Throughput Multiple Sequence Alignment On Protein Data

    Multiple sequence alignment (MSA) is a significant process in computational biology and bioinformatics. Optimal MSA is an NP-hard problem, while building an optimal alignment using dynamic programming is an NP-complete problem. Although numerous algorithms have been proposed for MSA, producing an efficient MSA with high accuracy remains a major challenge.

    Multiple sequence alignment using particle swarm optimization

    The recent advent of bioinformatics has given rise to the central and recurrent problem of optimally aligning biological sequences. Many techniques have been proposed in an attempt to solve this complex problem, with varying degrees of success. This thesis investigates the application of a computational intelligence technique known as particle swarm optimization (PSO) to the multiple sequence alignment (MSA) problem. First, the performance of the standard PSO (S-PSO) and its characteristics are analyzed in detail. Second, a scalability study is conducted that aims to extend the S-PSO's application to complex MSAs, and the behaviour of three other kinds of PSO on the same problems is studied. Experimental results show that PSO is efficient in solving the MSA problem and compares favourably with the well-known CLUSTAL X and T-COFFEE. Dissertation (MSc), University of Pretoria, 2009 (Computer Science).
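For readers unfamiliar with PSO, here is a sketch of the standard global-best velocity and position update. The thesis's S-PSO works on an MSA encoding that the abstract does not describe, so this example is only an illustration under assumptions: it minimizes a toy continuous function (`sphere`), and the inertia `w = 0.7298` and acceleration `c1 = c2 = 1.49618` are commonly used settings, not necessarily the ones used in the thesis.

```python
import random
from typing import Callable, List, Tuple


def pso(f: Callable[[List[float]], float], dim: int, n_particles: int = 20,
        iters: int = 200, w: float = 0.7298, c1: float = 1.49618,
        c2: float = 1.49618, bounds: Tuple[float, float] = (-5.0, 5.0),
        seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # asynchronous global-best update
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=3)
print(best_val)
```

Applying PSO to MSA additionally requires a mapping between particle positions and alignments (for example, gap positions), which is where PSO variants for MSA differ.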

    Methodology for modified whale optimization algorithm for solving appliances scheduling problem

    The Whale Optimization Algorithm (WOA) is one of the newest metaheuristic algorithms used to solve NP-hard problems. WOA is known to converge slowly, and its computational cost also grows exponentially with multiple objectives and large numbers of requests from n users. These limitations constrain the quality of solutions for Demand Side Management (DSM) problems, such as optimizing the energy consumption associated with indoor comfort index parameters, which comprise thermal comfort, air quality, humidity and visual comfort. To address these issues, this work will first justify and validate the constraints related to the appliance scheduling problem, and then propose a new model, a Cluster-based Multi-Objective WOA with a multiple-restart strategy. To achieve these objectives, different initialization strategies and cluster-based approaches will be used to tune the main parameter of WOA under different MapReduce applications, which helps to control exploration and exploitation. The proposed model will be tested on a set of well-known test functions and finally applied to a real case project, namely the appliance scheduling problem. It is anticipated that the approach can accelerate the convergence of the metaheuristic technique while producing quality solutions.
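As background for the proposed variant, here is a sketch of the standard single-objective WOA update: encircling the best solution, exploring around a random whale, and the logarithmic spiral move. In standard WOA, the parameter `a` shrinks from 2 to 0 and governs the balance between exploration (|A| >= 1) and exploitation (|A| < 1). Several details are assumptions: the sphere test function, the spiral constant `b = 1`, scalar `A` and `C` per whale (a common simplification), and clamping to the bounds. The cluster-based, multi-objective, multiple-restart and MapReduce elements described above are not shown.

```python
import math
import random
from typing import Callable, List, Tuple


def woa(f: Callable[[List[float]], float], dim: int, n_whales: int = 20,
        iters: int = 200, bounds: Tuple[float, float] = (-5.0, 5.0),
        b: float = 1.0, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    lo, hi = bounds
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=f)[:]
    best_val = f(best)
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 towards 0
        for x in whales:
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise move relative to a random one.
                ref = best if abs(A) < 1 else rng.choice(whales)
                for d in range(dim):
                    D = abs(C * ref[d] - x[d])
                    x[d] = ref[d] - A * D
            else:
                # Spiral (bubble-net) move around the best whale.
                for d in range(dim):
                    D = abs(best[d] - x[d])
                    x[d] = D * math.exp(b * l) * math.cos(2 * math.pi * l) + best[d]
            for d in range(dim):
                x[d] = min(max(x[d], lo), hi)  # keep the whale inside the bounds
            val = f(x)
            if val < best_val:
                best, best_val = x[:], val
    return best, best_val


def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)


best, best_val = woa(sphere, dim=3)
print(best_val)
```

Tuning how `a` decays, or restarting whales that cluster too tightly, are natural levers for the convergence issues raised in the abstract.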

    A GRASP Algorithm Based on New Randomized Heuristic for Vehicle Routing Problem

    This paper presents a novel GRASP algorithm based on a new randomized heuristic for solving the capacitated vehicle routing problem, which is characterized by a fleet of vehicles of homogeneous capacity that start from a single depot to serve a number of customers whose demands are each less than the vehicle capacity. The proposed method combines a new constructive heuristic with a simulated annealing procedure as an improvement phase. The new constructive heuristic uses four steps to generate feasible initial solutions, and simulated annealing then improves these solutions with the aim of reaching the optimal one. We tested our algorithm on two sets of benchmark instances, and the results obtained are very encouraging.
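The GRASP pattern (randomized greedy construction followed by an improvement phase, repeated and keeping the best result) can be sketched for the capacitated VRP as below. This is a generic illustration, not the paper's method: the construction is an ordinary nearest-neighbour rule with a restricted candidate list controlled by a hypothetical `alpha`, it does not reproduce the paper's four-step heuristic, and intra-route 2-opt stands in for the simulated annealing improvement phase. Customers are assumed to be Euclidean points, with index 0 as the depot.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def route_length(route: List[int], pts: List[Point]) -> float:
    """Length of depot -> customers -> depot; index 0 is the depot."""
    path = [0] + route + [0]
    return sum(dist(pts[a], pts[b]) for a, b in zip(path, path[1:]))


def construct(pts: List[Point], demand: List[int], capacity: int,
              alpha: float, rng: random.Random) -> List[List[int]]:
    """Randomized nearest-neighbour construction with a restricted candidate list."""
    unserved = set(range(1, len(pts)))
    routes: List[List[int]] = []
    while unserved:
        route: List[int] = []
        load, here = 0, 0
        while True:
            feasible = [c for c in unserved if load + demand[c] <= capacity]
            if not feasible:
                break  # vehicle is full: close this route
            d = {c: dist(pts[here], pts[c]) for c in feasible}
            dmin, dmax = min(d.values()), max(d.values())
            rcl = [c for c in feasible if d[c] <= dmin + alpha * (dmax - dmin)]
            c = rng.choice(rcl)
            route.append(c)
            load += demand[c]
            here = c
            unserved.remove(c)
        routes.append(route)
    return routes


def two_opt(route: List[int], pts: List[Point]) -> List[int]:
    """Reverse segments while doing so shortens the route (stand-in improvement)."""
    best = route[:]
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(cand, pts) < route_length(best, pts) - 1e-12:
                    best, improved = cand, True
    return best


def grasp(pts: List[Point], demand: List[int], capacity: int, iters: int = 50,
          alpha: float = 0.3, seed: int = 0) -> Tuple[Optional[List[List[int]]], float]:
    if any(demand[c] > capacity for c in range(1, len(pts))):
        raise ValueError("every customer demand must fit in one vehicle")
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    for _ in range(iters):
        routes = [two_opt(r, pts) for r in construct(pts, demand, capacity, alpha, rng)]
        total = sum(route_length(r, pts) for r in routes)
        if total < best_cost:
            best, best_cost = routes, total
    return best, best_cost


pts = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (-1, 0), (-2, 0)]
demand = [0, 4, 4, 4, 4, 4, 4]
routes, total = grasp(pts, demand, capacity=8)
print(routes, total)
```

Setting `alpha = 0` makes the construction purely greedy, while `alpha = 1` makes it purely random; GRASP relies on values in between to diversify the starting solutions.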