    A Serendipitous Software Framework for Facilitating Collaboration in Computational Intelligence

    A major flaw in the academic system, particularly pertaining to computer science, is that it rewards specialisation. The highly competitive quest for new scientific developments, or rather the quest for a better reputation and more funding, forces researchers to specialise in their own fields, leaving them little time to properly explore what others are doing, sometimes even within their own field of interest. Even the peer review process, which should provide the necessary balance, fails to achieve much diversity, since reviews are typically performed by persons who are again specialists in the particular field of the work. Further, software implementations are rarely reviewed, with the consequence that untenable results are published. Unfortunately, these factors contribute to an environment which is not conducive to collaboration, a cornerstone of academia: building on the work of others. This work takes a step back and examines the general landscape of computational intelligence from a broad perspective, drawing on multiple disciplines to formulate a collaborative software platform which is flexible enough to support the needs of this diverse research community. Interestingly, this project did not set out with these goals in mind; rather, it evolved over time from something more specialised into the general framework described in this dissertation. Design patterns are studied as a means to manage the complexity of the computational intelligence paradigm in a flexible software implementation. Further, this dissertation demonstrates that releasing research software under an open source license eliminates some of the deficiencies of the academic process, while preserving, and even improving, the ability to build a reputation and pursue funding. 
Two software packages have been produced as products of this research: i) CILib, an open source library of computational intelligence algorithms; and ii) CiClops, which is a virtual laboratory for performing experiments that scale over multiple workstations. Together, these software packages are intended to improve the quality of research output and facilitate collaboration by sharing a repository of simulation data, statistical analysis tools and a single software implementation. Dissertation (MSc), University of Pretoria, 2006. Computer Science.

    Optimisation of new product development in the pharmaceutical industry using a multicriteria genetic algorithm

    New product development is a strategic priority for the pharmaceutical industry, owing to the presence of uncertainty, the size of the investments at stake, the interdependence between projects, the limited availability of resources, the very large number of decisions involved because of the length of the process (on the order of ten years) and the combinatorial nature of the problem. Formally, the problem is stated as follows: select R&D projects from among candidate projects so as to satisfy several criteria (economic profitability, time to market) while taking their uncertain nature into account. More precisely, the recurring key issues are determining which projects to develop once the target molecules have been identified, their processing order and the level of resources to allocate. In this context, an approach is proposed that couples a stochastic discrete-event simulator (Monte Carlo approach), which represents the dynamics of the system, with a multicriteria optimisation algorithm (of NSGA II type), which selects the products. An object model previously developed for the design and scheduling of batch plants, easily reusable in both its structure and its operating logic, was extended to cover the management of new products. Two case studies illustrate and validate the approach. The simulation results highlighted the value of three performance evaluation criteria for decision support: the discounted profit of a sequence, the associated risk and the time to market. These were used in the multiobjective formulation of the optimisation problem. In this context, genetic algorithms are particularly attractive because of their ability to lead directly to the Pareto front and to handle the combinatorial aspect. 
The NSGA II variant was adapted to the problem to take into account both the number of products in a sequence and their launch order. Based on a bicriteria analysis carried out for a representative case study on different pairs of criteria, for bi- and tricriteria optimisation, the optimisation strategy proves efficient and particularly elitist in detecting the sequences to be considered by the decision maker. Only a few sequences are detected. Among them, portfolios with a large number of products cause waiting and launch delays; they are eliminated by the bicriteria optimisation strategy. Small portfolios, which reduce queues and time to launch, are therefore preferred. Time proves to be an important criterion to optimise simultaneously, which highlights the value of tricriteria optimisation. Finally, the launch order of the products is a major variable, as in shop scheduling problems. ABSTRACT: New Product Development (NPD) constitutes a challenging problem in the pharmaceutical industry, due to the characteristics of the development pipeline, namely the presence of uncertainty, the high level of the capital costs involved, the interdependency between projects, the limited availability of resources, the overwhelming number of decisions due to the length of the time horizon (about 10 years) and the combinatorial nature of a portfolio. Formally, the NPD problem can be stated as follows: select a set of R&D projects from a pool of candidate projects in order to satisfy several criteria (economic profitability, time to market) while coping with the uncertain nature of the projects. More precisely, the recurrent key issues are to determine the projects to develop once target molecules have been identified, their order and the level of resources to assign. 
In this context, the proposed approach combines discrete event stochastic simulation (Monte Carlo approach) with multiobjective genetic algorithms (NSGA II type, Non-dominated Sorting Genetic Algorithm II) to optimize the highly combinatorial portfolio management problem. An object-oriented model previously developed for batch plant scheduling and design, which lends itself to reuse of both structure and logic, is extended to embed the case of new product management. Two case studies illustrate and validate the approach. From this simulation study, three performance evaluation criteria must be considered for decision making: the Net Present Value (NPV) of a sequence, its associated risk, defined as the number of positive occurrences of NPV among the samples, and the time to market. They have been used in the multiobjective optimization formulation of the problem. In that context, Genetic Algorithms (GAs) are particularly attractive for treating this kind of problem, due to their ability to lead directly to the so-called Pareto front and to account for the combinatorial aspect. NSGA II has been adapted to the treated case to take into account both the number of products in a sequence and the drug release order. From an analysis performed for a representative case study on the different pairs of criteria, for both bi- and tricriteria optimization, the optimization strategy turns out to be efficient and particularly elitist in detecting the sequences which can be considered by the decision makers. Only a few sequences are detected. Among these sequences, large portfolios cause resource queues and delay the time to launch, and are eliminated by the bicriteria optimization strategy. Small portfolios, which reduce queuing and time to launch, appear as good candidates. The optimization strategy is thus useful for detecting candidate sequences. Time is an important criterion to consider simultaneously with the NPV and risk criteria. 
The order in which drugs are released in the pipeline is of great importance, as with scheduling problems.
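
The three criteria named in the abstract above (NPV of a sequence, risk counted as the number of samples with positive NPV, and time to market) can be illustrated with a minimal Monte Carlo sketch. This is not the thesis's discrete-event simulator: the product fields (`cost`, `revenue`, `p_success`, `duration`), the discount rate, the strictly sequential development and the definition of time to market as the completion time of the last product are all assumptions made for illustration.

```python
import random
from statistics import mean


def simulate_once(sequence, rng, discount_rate=0.1):
    """Draw one Monte Carlo sample for a launch sequence.

    Each product is a hypothetical dict with 'cost', 'revenue',
    'p_success' and 'duration' (a (low, high) range in years).
    Assumption: products are developed strictly one after another.
    Returns (npv, time_to_market).
    """
    clock = 0.0
    npv = 0.0
    for product in sequence:
        # Development cost is paid when the project starts.
        npv -= product["cost"] / (1.0 + discount_rate) ** clock
        low, high = product["duration"]
        clock += rng.uniform(low, high)
        # Revenue is earned at launch, only if development succeeds.
        if rng.random() < product["p_success"]:
            npv += product["revenue"] / (1.0 + discount_rate) ** clock
    return npv, clock


def evaluate(sequence, n_samples=1000, seed=0):
    """Estimate the three criteria for one sequence from n_samples draws."""
    rng = random.Random(seed)
    samples = [simulate_once(sequence, rng) for _ in range(n_samples)]
    npvs = [npv for npv, _ in samples]
    times = [t for _, t in samples]
    return {
        "mean_npv": mean(npvs),
        # Risk as worded in the abstract: count of samples with positive NPV.
        "risk_positive_count": sum(1 for v in npvs if v > 0),
        "mean_time_to_market": mean(times),
    }


if __name__ == "__main__":
    # Purely illustrative numbers, not data from the thesis.
    portfolio = [
        {"cost": 50.0, "revenue": 200.0, "p_success": 0.6, "duration": (2.0, 4.0)},
        {"cost": 30.0, "revenue": 90.0, "p_success": 0.8, "duration": (1.0, 3.0)},
    ]
    print(evaluate(portfolio))
```

An optimiser such as NSGA II would call a function like `evaluate` once per candidate sequence and compare the resulting criteria vectors.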

    An Object-Oriented Programming Environment for Parallel Genetic Algorithms

    This thesis investigates an object-oriented programming environment for building parallel applications based on genetic algorithms (GAs). It describes the design of the Genetic Algorithms Manipulation Environment (GAME), which focuses on three major software development requirements: flexibility, expandability and portability. Flexibility is provided by GAME through a set of libraries containing pre-defined and parameterised components such as genetic operators and algorithms. Expandability is offered by GAME's object-oriented design, which allows applications, algorithms and genetic operators to be easily modified and adapted to satisfy the requirements of diverse problems. Lastly, portability is achieved through the use of the standard C++ language, and by isolating machine and operating system dependencies into low-level modules, which are hidden from the application developer by GAME's application programming interfaces. The development of GAME is central to the Programming Environment for Applications of PArallel GENetic Algorithms project (PAPAGENA), the principal European Community (ESPRIT III) funded parallel genetic algorithms project. It has two main goals: to provide a general-purpose tool kit supporting the development and analysis of large-scale parallel genetic algorithm (PGA) applications, and to demonstrate the potential of applying evolutionary computing in diverse problem domains. The research reported in this thesis is divided into two parts: i) the analysis of GA models and the study of existing GA programming environments from an application developer's perspective; ii) the description of a general-purpose programming environment designed to help with the development of GA- and PGA-based computer programs. The studies carried out in the first part provide the necessary understanding of the structure and operation of GAs to outline the requirements for the development of complex computer programs. 
The second part presents GAME as the result of combining development requirements, relevant features of existing environments and innovative ideas into a powerful programming environment. The system is described in terms of its abstract data structures and sub-systems, which allow problems to be represented independently of any particular GA model. GAME's programming model is also presented as a general-purpose object-oriented framework for programming coarse-grained parallel applications. GAME has a modular architecture comprising five modules: the Virtual Machine, the Parallel Execution Module, the Genetic Libraries, the Monitoring Control Module, and the Graphic User Interface. GAME's genetic-oriented abstract data structures and the Virtual Machine isolate genetic operators and algorithms from low-level operations such as memory management and exception handling. The Parallel Execution Module supports GAME's object-oriented parallel programming model. It defines an application programming interface and a runtime library that allow the same parallel application, created within the environment, to run on different hardware and operating system platforms. The Genetic Libraries outline a hierarchy of components implemented as parameterised versions of standard and custom genetic operators, algorithms and applications. The Monitoring Control Module supports dynamic control and monitoring of simulations, whereas the Graphic User Interface defines a basic framework and graphic 'widgets' for displaying and entering data. This thesis describes the design philosophy and rationale behind these modules, covering in more detail the Virtual Machine, the Parallel Execution Module and the Genetic Libraries. The assessment discusses the system's ability to satisfy the main requirements of GA and PGA software development, as well as the features that distinguish GAME from other programming environments.
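
The idea of parameterised, interchangeable genetic operators described above can be sketched in a few lines. GAME itself is written in C++ and its actual interfaces are not given in the abstract, so every class and function name below is hypothetical; Python is used only for brevity, and the sketch is sequential rather than parallel.

```python
import random
from dataclasses import dataclass


@dataclass
class BitFlipMutation:
    """Hypothetical parameterised operator: flips each bit with probability `rate`."""
    rate: float

    def __call__(self, genome, rng):
        return [1 - g if rng.random() < self.rate else g for g in genome]


@dataclass
class OnePointCrossover:
    """Hypothetical operator: child takes a prefix of `a` and a suffix of `b`."""

    def __call__(self, a, b, rng):
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:]


@dataclass
class TournamentSelection:
    """Hypothetical operator: best of `size` randomly drawn individuals."""
    size: int

    def __call__(self, population, fitness, rng):
        return max(rng.sample(population, self.size), key=fitness)


def run_ga(fitness, select, crossover, mutate,
           genome_length=20, pop_size=30, generations=40, seed=0):
    """Generational GA that only sees operators through their call signatures."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            parent_a = select(population, fitness, rng)
            parent_b = select(population, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    # OneMax (count of 1 bits) as a toy fitness function.
    best = run_ga(sum, TournamentSelection(3), OnePointCrossover(),
                  BitFlipMutation(0.02))
    print(best, sum(best))
```

Because `run_ga` depends only on the call signatures, a different selection or mutation component can be swapped in without touching the algorithm, which is the kind of flexibility the abstract attributes to the Genetic Libraries.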

    Multiobjective optimization of New Product Development in the pharmaceutical industry

    End to end Multi-Objective Optimisation of H.264 and HEVC Codecs

    All multimedia devices now incorporate video CODECs that comply with international video coding standards such as H.264 / MPEG4-AVC and the new High Efficiency Video Coding Standard (HEVC), otherwise known as H.265. Although the standard CODECs have been designed to include algorithms with optimal efficiency, a large number of coding parameters can be used to fine-tune their operation within known constraints, e.g. available computational power, bandwidth and consumer QoS requirements. With so many parameters involved, determining which of them play a significant role in providing optimal quality of service within given constraints is a further challenge. How to select the values of the significant parameters so that the CODEC performs optimally under the given constraints is another important question. This thesis proposes a framework that uses machine learning algorithms to model the performance of a video CODEC based on the significant coding parameters. Means of modelling both Encoder and Decoder performance are proposed. We define objective functions that can be used to model the performance-related properties of a CODEC, i.e. video quality, bit-rate and CPU time. We show that these objective functions can be used in practice in video Encoder/Decoder designs, in particular for performance optimisation within given operational and practical constraints. A Multi-objective Optimisation framework based on Genetic Algorithms is thus proposed to optimise the performance of a video codec. The framework is designed to jointly minimise CPU time and bit-rate and to maximise the quality of the compressed video stream. The thesis presents the use of this framework in the performance modelling and multi-objective optimisation of the most widely used video coding standard at present, H.264, and the latest video coding standard, H.265/HEVC. 
When a communication network is used to transmit video, performance-related parameters of the communication channel will impact the end-to-end performance of the video CODEC. Network delays and packet loss will impact the quality of the video received at the decoder via the communication channel, i.e. even if a video CODEC is optimally configured, network conditions will make the experience sub-optimal. Given the above, the thesis proposes the design, integration and testing of a novel approach to simulating a wired network and the use of the UDP protocol for the transmission of video data. This network is subsequently used to simulate the impact of packet loss and network delays on optimally coded video, based on the framework previously proposed for the modelling and optimisation of video CODECs. The quality of received video under different levels of packet loss and network delay is simulated, and conclusions are drawn on the impact on transmitted videos based on their content and features.
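
The joint objective stated above (minimise CPU time and bit-rate, maximise quality) implies comparing configurations by Pareto dominance rather than by a single score. The sketch below shows only that comparison step, under stated assumptions: the `Measurement` record, its field names and the sample values are hypothetical and are not measurements from the thesis.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    """Hypothetical record of one encoder configuration's measured objectives."""
    label: str
    cpu_time: float   # seconds, to be minimised
    bit_rate: float   # kbit/s, to be minimised
    quality: float    # e.g. a quality score in dB, to be maximised


def dominates(a: Measurement, b: Measurement) -> bool:
    """True if `a` is no worse than `b` on every objective and better on one."""
    no_worse = (a.cpu_time <= b.cpu_time
                and a.bit_rate <= b.bit_rate
                and a.quality >= b.quality)
    strictly_better = (a.cpu_time < b.cpu_time
                       or a.bit_rate < b.bit_rate
                       or a.quality > b.quality)
    return no_worse and strictly_better


def pareto_front(points: List[Measurement]) -> List[Measurement]:
    """Keep only the configurations that no other configuration dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Illustrative values only.
    configs = [
        Measurement("fast", 1.0, 500.0, 35.0),
        Measurement("balanced", 2.0, 400.0, 36.0),
        Measurement("poor", 2.5, 520.0, 34.0),
    ]
    for m in pareto_front(configs):
        print(m.label)
```

A genetic algorithm would apply this kind of filter to each generation of candidate parameter settings; the remaining non-dominated settings are the trade-offs offered to the designer.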

    Substructural Analysis Using Evolutionary Computing Techniques

    Substructural analysis (SSA) was one of the very first machine learning techniques to be applied to chemoinformatics in the area of virtual screening. Given a set of compounds, typically defined by their fragment occurrence data (such as 2D fingerprints), SSA computes a weight for each fragment that reflects its contribution to the activity (or inactivity) of compounds containing that fragment. The overall probability of activity for a compound is then computed by summing or otherwise combining the weights of the fragments present in the compound. A variety of weighting schemes based on specific relationship-bound equations are available for this purpose. This thesis seeks to improve the effectiveness of SSA using two evolutionary computation methods based on genetic traits, namely the genetic algorithm (GA) and genetic programming (GP). Building on previous studies, ten published SSA weighting schemes were analysed and compared in a simulated virtual screening experiment. The analysis showed the most effective weighting scheme to be the R4 equation, which belongs to the document-based weighting schemes. A second experiment investigated a GA-based weighting scheme for SSA in comparison with the R4 weighting scheme. The GA is simple in concept, focusing purely on generating suitable weights, and effective in operation. The findings show that the GA-based SSA is superior to the R4-based SSA, both in terms of active compound retrieval rate and predictive performance. A third experiment investigated a GP-based SSA. Rigorous experimental results showed the GP to be superior to the existing SSA weighting schemes. In general, however, the GP-based SSA was found to be less effective than the GA-based SSA. 
A final experiment is described in this thesis which sought to explore the feasibility of data fusion on both the GA and GP. Data fusion is a method that produces a final ranking list from multiple ranking lists, based on several fusion rules. The results indicate that data fusion is a good method to boost GA- and GP-based SSA searching. The RKP rule was considered the most effective fusion rule.
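
The SSA procedure described above (one weight per fragment, then a compound score obtained by summing the weights of its fragments) can be sketched as follows. The weighting used here, the fraction of training compounds containing a fragment that are active, is a deliberately simple stand-in chosen for illustration; it is not the R4 equation or the GA- or GP-derived weights studied in the thesis, and the fragment names and data are hypothetical.

```python
from collections import Counter


def fragment_weights(training_set):
    """Compute one weight per fragment from labelled training compounds.

    `training_set` is a list of (fragments, is_active) pairs, where
    `fragments` is a set of fragment identifiers.
    Assumed weight: share of compounds containing the fragment that are active.
    """
    active = Counter()
    total = Counter()
    for fragments, is_active in training_set:
        for fragment in fragments:
            total[fragment] += 1
            if is_active:
                active[fragment] += 1
    return {fragment: active[fragment] / total[fragment] for fragment in total}


def score(fragments, weights):
    """Sum the weights of the fragments present; unseen fragments count as 0."""
    return sum(weights.get(fragment, 0.0) for fragment in fragments)


def rank(candidates, weights):
    """Order candidate compound names from highest to lowest score."""
    return sorted(candidates,
                  key=lambda name: score(candidates[name], weights),
                  reverse=True)


if __name__ == "__main__":
    # Toy data with made-up fragment labels.
    training = [({"a", "b"}, True), ({"a"}, True),
                ({"b", "c"}, False), ({"c"}, False)]
    weights = fragment_weights(training)
    candidates = {"x": {"a", "b"}, "y": {"c"}, "z": {"b"}}
    print(rank(candidates, weights))
```

In a GA-based variant, the `weights` dictionary would be the individual being evolved, with fitness measured by how well the resulting ranking retrieves known actives.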

    Applications of Computational Intelligence to Power Systems

    In power system operation and control, the basic goal is to provide users with quality electric power in an economically rational manner, and to ensure the stability and reliability of the power system. However, the increased interconnection and loading of the power system, along with deregulation and environmental concerns, have brought new challenges for electric power system operation, control, and automation. In the liberalised electricity market, the operation and control of a power system has become a complex process because of the complexity in modelling and uncertainties. Computational intelligence (CI) is a family of modern tools for solving complex problems that are difficult to solve using conventional techniques, since conventional methods rely on several assumptions that may not always hold. Developing solutions with these “learning-based” tools offers the following two major advantages: the development time is much shorter than when using more traditional approaches, and the systems are very robust, being relatively insensitive to noisy and/or missing data or information, collectively known as uncertainty.

    A novel computer aided engineering method for comparative evaluation of nonlinear structures in the conceptual design phase

    Selection of the preferred design concept during design represents a major challenge to design engineers, as the level of information and rigour required to achieve an objective evaluation at an early stage of design is typically not available. This is particularly evident during evaluation of design concepts of complex load-bearing mechanical structures. Engineering design concepts in the concept design phase typically lack the detail and specific performance indicators needed to enable accurate evaluation. Hence, in such cases, the prevailing evaluation approach is based primarily on qualitative scores inferred through the personal intuition and historical experience of the design team or individual experts. The principal motivation behind this research is to improve the ability and confidence to select a superior design concept early in the design process. The conventional approach is sensitive to individual expertise and the availability of experienced designers. Therefore, in order to make more informed decisions, especially in the case of complex engineering designs, concept evaluation methods require more detailed and accurate information. This research is concerned with the development of a novel method for comparative evaluation of engineering design concepts that exhibit nonlinear structural behaviour under load. The approach is based on two key concepts: i) an expansion of the conventional substructuring technique into the nonlinear domain to enable FEA to be more applicable, effective and computationally affordable in early stages of the conceptual design phase; and ii) a restructuring of the traditional process by incorporating an optimisation search to provide orderly rule-guided evolution of design concepts in order to produce objective development metrics, which alleviate the dependence on the personal intuition and historical experience of the engineering designers. 
A series of experiments and validation case studies conducted in this research provide conclusive evidence that demonstrates the applicability and the significance of the developed method in terms of reduced time for evaluation and amount of recurrent knowledge generated compared to the more traditional approaches based on the application of FEA in the conceptual design phase. Furthermore, a Confidence Index is developed in this research as a performance measure to describe the quality of the obtained solutions. The derived Confidence Index is a novel contribution to the fields of metaheuristic measurements and engineering concept validation methodology.
