    Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures

    The protein folding problem has been studied extensively over the last fifty years. Understanding the dynamics of a protein's global shape and its influence on biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Researchers have developed a range of computational approaches to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of the problem makes it necessary to search for new models, novel algorithmic strategies and hardware platforms that deliver solutions in a reasonable time frame. In this review we present past and recent trends in protein folding simulation from both the hardware and the software perspective. Of particular interest are the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used to run these Soft Computing techniques.

    Parallel evolution strategy for protein threading.

    A protein sequence folds into a specific shape in order to function in an aqueous environment. Given the primary sequence of a protein, what is its three-dimensional structure? This is a long-standing problem in molecular biology with major implications for drug design and the treatment of disease. Among the approaches proposed, protein threading is one of the most promising. The protein threading problem (PTP) is the problem of determining the three-dimensional structure of a given but arbitrary protein sequence from a set of known structures of other proteins. The problem is known to be NP-hard, and current computational approaches to threading are time-consuming and data-intensive. In this thesis, we propose an evolution strategy (ES) based approach to protein threading (EST). We also develop two parallel approaches to the PTP, both of which parallelize our novel EST. The first method, SQST-PEST (Single Query Single Template Parallel EST), threads a single query against a single template: ES is used to find the best alignment between the query and the template, and the ES itself is parallelized. The second method, SQMT-PEST (Single Query Multiple Templates Parallel EST), threads a single query against multiple templates within a reasonable time. We obtained better results than current comparable approaches, as well as a significant reduction in execution time.
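
As a rough illustration of the evolution strategy idea mentioned in this abstract, the sketch below implements a generic (mu + lambda) ES in Python. It is not the thesis's EST: the real-valued encoding, the `sphere` objective and all parameter names and defaults are assumptions chosen only to keep the example small. In a threading setting the individual would encode a query-to-template alignment and the fitness would be a threading score.

```python
import random


def evolution_strategy(fitness, dim, mu=5, lam=20, sigma=0.5,
                       generations=100, seed=0):
    """Minimise `fitness` with a simple (mu + lambda) evolution strategy.

    All parameters are illustrative assumptions, not values from the thesis.
    """
    rng = random.Random(seed)
    # Random initial parent population, sorted best-first.
    parents = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(mu)]
    parents.sort(key=fitness)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(parents)
            # Gaussian mutation with a fixed step size sigma.
            offspring.append([x + rng.gauss(0.0, sigma) for x in parent])
        # Plus-selection: the best mu of parents and offspring survive,
        # so the best fitness never gets worse between generations.
        parents = sorted(parents + offspring, key=fitness)[:mu]
    best = parents[0]
    return best, fitness(best)


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, score = evolution_strategy(sphere, dim=3)
    print(best, score)
```

The offspring in each generation are evaluated independently of one another, which is the property a parallel ES can exploit by distributing fitness evaluations across workers.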

    Optimización de algoritmos bioinspirados en sistemas heterogéneos CPU-GPU.

    The scientific challenges of the 21st century require processing and analyzing an enormous amount of information in what is known as the Big Data era. Future advances in sectors such as medicine, engineering or efficient energy production, to name only a few examples, depend on continued growth in the computational power of modern computers. However, this growth, traditionally guided by the well-known “Moore's Law”, has been compromised in recent decades, mainly because of the physical limitations of silicon. Computer architects have made numerous contributions (multicore, manycore, heterogeneity, dark silicon, etc.) to mitigate this slowdown, while relegating other factors that are fundamental to problem solving, such as programmability, reliability and precision. Software development, by contrast, has followed the opposite path: ease of programming through abstraction models, automatic debugging to avoid unwanted effects, and deployment to production are key to the economic viability and efficiency of the digital business sector. This path often compromises the performance of the applications themselves, a consequence that is unacceptable in a scientific context. The starting hypothesis of this doctoral thesis is that narrowing the gap between the hardware and software fields will contribute to solving the scientific challenges of the 21st century. Hardware development is marked by the consolidation of processors oriented toward massive data parallelism, mainly GPUs (Graphics Processing Units) and vector processors, which are combined to build heterogeneous processors or computers (HSA).
Specifically, we focus on using GPUs to accelerate scientific applications. GPUs have become one of the most promising platforms for implementing algorithms that simulate complex scientific problems. Since their inception, the trajectory of graphics cards has been shaped by the world of video games, reaching very high levels of popularity as greater realism was achieved in that area. An important milestone came in 2006, when NVIDIA (a leading manufacturer of graphics cards) gained a foothold in high-performance computing and in research with the development of CUDA (Compute Unified Device Architecture). This architecture makes it possible to use the GPU for developing scientific applications in a versatile way. Despite the importance of the GPU, further improvement can be obtained by using it together with the CPU, which leads us to the heterogeneous systems named in the title of this work. It is in heterogeneous CPU-GPU environments that performance peaks, since the GPUs do not carry the researchers' scientific computation alone: combining different types of processors in one system is what yields the highest performance. In this setting the processors do not compete; on the contrary, each architecture specializes in the part where it can best exploit its capabilities. The highest performance is reached in heterogeneous clusters, where multiple nodes are interconnected and the nodes may differ not only in architecture (CPU or GPU) but also in computational capability within each architecture.
With these scenarios in mind, new challenges arise in making the chosen candidate software run as efficiently as possible while obtaining the best possible results. These new platforms require software to be redesigned to take full advantage of the available computational resources. Existing algorithms must therefore be redesigned and optimized so that contributions in this field are relevant, and algorithms must be found that, by their very nature, are good candidates for optimal execution on such high-performance platforms. Here we find a family of algorithms known as bio-inspired algorithms, which use collective intelligence as the core of problem solving. It is precisely this collective intelligence that makes them ideal candidates for implementation on these platforms under the new parallel computing paradigm, since solutions can be built from individuals that, through some form of communication, jointly construct a common solution. This thesis focuses in particular on one of these bio-inspired algorithms, which falls under the term metaheuristics within the Soft Computing paradigm: Ant Colony Optimization (ACO). The algorithm is placed in context, studied and analyzed. Its most critical parts are identified and redesigned for optimization and parallelization, maintaining or improving the quality of its solutions. The possible alternatives are then implemented and tested on several high-performance platforms. The knowledge gained from this theoretical and practical study is then carried over to a real case, specifically protein folding.
In this work, we bring together new high-performance hardware platforms and the software redesign and implementation of a bio-inspired algorithm applied to a highly complex scientific problem, namely protein folding. When implementing a solution to a real problem, a preliminary study is needed to understand the problem in depth, since a newcomer to the field will encounter new terminology and issues; in this case, we discuss amino acids, molecules and simulation models that are unfamiliar to readers without a biomedical background.
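
The collective construction of solutions that this abstract attributes to Ant Colony Optimization can be sketched in a few lines of Python. This is a minimal, sequential Ant System on a toy travelling salesman instance, which is an assumption made for brevity: the thesis applies ACO to protein folding on CPU-GPU systems, and none of the names or parameter values below (`aco_tsp`, `alpha`, `beta`, `rho`) come from it.

```python
import math
import random


def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Basic Ant System for a symmetric TSP given a distance matrix.

    alpha weights the pheromone, beta weights the heuristic (1 / distance),
    rho is the evaporation rate. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on every edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            # Each ant builds a tour city by city.
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((length, tour))
            if length < best_len:
                best_len, best_tour = length, tour
        # Evaporation, then deposit proportional to tour quality.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for length, tour in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    print(aco_tsp(dist))
```

Within one iteration the ants build their tours independently, so tour construction is the natural candidate for data-parallel execution, while the pheromone update is the shared step that has to be synchronized.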

    Computational Methods in Science and Engineering : Proceedings of the Workshop SimLabs@KIT, November 29 - 30, 2010, Karlsruhe, Germany

    In this proceedings volume we provide a compilation of contributed articles that cover, in equal measure, applications from different research fields and range from capacity to capability computing. Besides classical computing aspects such as parallelization, the focus of these proceedings is on multi-scale approaches and on methods for tackling algorithm and data complexity. Practical aspects of using the HPC infrastructure and the tools and software available at the SCC are also presented.

    Parallel and Distributed Computing

    The 14 chapters presented in this book cover a wide variety of representative works, ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute an excellent reference for engineers and researchers with particular interests in any of these topics in parallel and distributed computing.

    Theoretical analysis and simulations applied to rational design strategies of nanostructured materials

    Advisor: Douglas Soares Galvão. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Física Gleb Wataghin.
Abstract: This document presents a collection of works within the broad subject of nanostructured materials, focusing on analytical theoretical descriptions and computational simulations of several new materials of this class. A new superelastic conducting fiber with improved properties and functionalities is reported. These highly stretchable (up to 1320%) conducting fibers are created by wrapping carbon nanotube sheets around stretched rubber fiber cores. The resulting material exhibits an interesting hierarchically buckled structure on its surface, which gives it useful electrical properties such as keeping its resistance constant while stretched. By adding more rubber and carbon nanotube layers, we created strain and motion sensors, and electrically or thermally powered tensile and torsional muscles/actuators that operate reversibly through a coupled tension-to-torsion actuation mechanism.
We explain the fiber's electrical conduction properties and quantitatively explain the compounded physical effects involved in each of these applications. We also developed a new method for the rational design of molecularly imprinted polymers, using molecular dynamics to simulate the imprinting process and subsequent analysis with simulated chromatography experiments. We obtained the first theoretical evidence of actual imprinting occurring in unconstrained simulations, showing affinity and selectivity for the target substance 17-beta-estradiol. We designed and simulated a new graphene kirigami pyramid structure, composed of a graphene sheet cut in a specific pattern so that it forms a pyramid under stress normal to the plane. We calculated the response of this structure to static loading, under which it acts as a nano-sized spring. Using molecular dynamics simulations of ballistic collisions, we also found that its resistance to impact is greater than that of a pure graphene sheet, while being lighter. A new method of strengthening carbon nanotube yarns, called ITAP, which consists of annealing at high temperature in vacuum, is reported. The method is shown to increase the mechanical strength of the yarn by up to 1.5 times and to make it much more resistant to acid corrosion than pristine, untreated yarns. We used molecular dynamics simulations to test the hypothesis that this treatment is sufficient to create covalent bonds between the outer walls of different nanotubes, which would account for the material's properties. We applied a modified genetic algorithm to the protein folding problem in a 3D HP lattice model, using a set of test sequences that have been in use in the literature for the last 20 years, and improved the best known solution for one of them. We also demonstrated the usefulness of non-canonical operators that prevent premature convergence of the algorithm, namely the Sharing and Maternal Effect operators.
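
For readers unfamiliar with the 3D HP lattice model named in this abstract, the following Python sketch shows the standard energy function that a search method such as a genetic algorithm would minimize: each pair of hydrophobic (H) residues that are lattice neighbours but not consecutive in the chain contributes -1. The function name `hp_energy` and the coordinate-list encoding are assumptions for illustration; the thesis's own encoding and operators are not described here.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on the 3D cubic lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   one integer (x, y, z) tuple per residue, in chain order.
    Returns minus the number of H-H contacts between residues that are
    lattice neighbours but not adjacent in the sequence.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    index_at = {tuple(c): i for i, c in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the three positive directions so that each
        # neighbouring pair is counted exactly once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(hp_energy("HPPH", square))
```

In the usage example the four-residue chain is folded into a square, which brings its two terminal H residues into contact.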

    Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement

    Volume measurement plays an important role in the production and processing of food products. Various methods based on 3D reconstruction have been proposed to measure the volume of irregularly shaped food products. However, 3D reconstruction has a high computational cost, and some of the volume measurement methods based on it have low accuracy. An alternative is the Monte Carlo method, which measures volume using random points. It only requires knowing whether each random point falls inside or outside the object and does not require a 3D reconstruction. This paper proposes a computer vision system that measures the volume of irregularly shaped food products without 3D reconstruction, based on the Monte Carlo method with heuristic adjustment. Five images of a food product were captured using five cameras and processed to produce binary images. Monte Carlo integration with heuristic adjustment was then performed to measure the volume from the information extracted from the binary images. The experimental results show that the proposed method achieves high accuracy and precision when compared with the water displacement method. In addition, the proposed method is more accurate and faster than the space carving method.
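
The core Monte Carlo idea in this abstract, estimating volume from whether random points fall inside or outside the object, can be sketched as follows in Python. This is plain Monte Carlo integration only: the paper's heuristic adjustment and its camera-based inside test are not reproduced, and the analytic unit sphere used as the object, along with the names `monte_carlo_volume` and `in_unit_sphere`, are assumptions for illustration.

```python
import math
import random


def monte_carlo_volume(is_inside, bounds, n_points=100_000, seed=0):
    """Estimate the volume of an object enclosed by an axis-aligned box.

    is_inside: function taking an (x, y, z) point and returning True/False.
    bounds:    ((x0, x1), (y0, y1), (z0, z1)) bounding box of the object.
    The estimate is box_volume * (fraction of random points inside).
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1), (z0, z1) = bounds
    box_volume = (x1 - x0) * (y1 - y0) * (z1 - z0)
    hits = 0
    for _ in range(n_points):
        point = (rng.uniform(x0, x1), rng.uniform(y0, y1), rng.uniform(z0, z1))
        if is_inside(point):
            hits += 1
    return box_volume * hits / n_points


def in_unit_sphere(point):
    """Stand-in inside test (assumption): a sphere of radius 1 at the origin."""
    x, y, z = point
    return x * x + y * y + z * z <= 1.0


if __name__ == "__main__":
    estimate = monte_carlo_volume(in_unit_sphere, ((-1, 1), (-1, 1), (-1, 1)))
    print(estimate, 4.0 * math.pi / 3.0)
```

In a vision-based setup the `is_inside` test would instead be answered from the binary images, for example by checking whether the projection of the point lands on the object's silhouette in every view.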

    Studies on distributed approaches for large scale multi-criteria protein structure comparison and analysis

    Protein Structure Comparison (PSC) is at the core of many important structural biology problems. PSC is used to infer the evolutionary history of distantly related proteins; it can help identify the biological function of a new protein by comparing it with proteins whose function has already been annotated; and it is a key step in protein structure prediction, because one needs to reliably and efficiently compare tens or hundreds of thousands of decoys (predicted structures) when evaluating 'native-like' candidates (e.g. in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment). Each of these applications, as well as many others in which molecular comparison plays an important role, requires a different notion of similarity, which naturally leads to the Multi-Criteria Protein Structure Comparison (MC-PSC) problem. ProCKSI (www.procksi.org) was the first publicly available server to provide algorithmic solutions to the MC-PSC problem, by means of an enhanced structural comparison that relies on the principled application of information fusion to similarity assessments derived from multiple comparison methods (e.g. USM, FAST, MaxCMO, DaliLite, CE and TMAlign). Current MC-PSC works well for moderately sized data sets, but it is time-consuming because it provides a public service to multiple users. Many of the structural bioinformatics applications mentioned above would benefit from the ability to perform, for a dedicated user, thousands or tens of thousands of comparisons through multiple methods in real time, a capacity beyond our current technology. This research investigates Grid-style distributed computing strategies for addressing the enormous computational challenge inherent in MC-PSC.
To this end, a novel distributed algorithm has been designed, implemented and evaluated with different load-balancing strategies, and with the selection and configuration of a variety of software tools, services and technologies at different levels of infrastructure, ranging from local testbeds to production-level eScience infrastructures such as the National Grid Service (NGS). Empirical results from experiments on the scalability, speedup and efficiency of the overall system are presented and discussed, along with the software engineering aspects of implementing a distributed solution to the MC-PSC problem on a local computer cluster as well as on a Grid. The results lead us to conclude that combining better and faster parallel and distributed algorithms with more similarity comparison methods provides an unprecedented advance in protein structure comparison and analysis technology. These advances might facilitate both directed and fortuitous discovery of protein similarities, families, super-families, domains, etc., and also help pave the way to faster and better protein function inference, annotation, and protein structure prediction and assessment, thus empowering structural biologists to do science they could not have done otherwise.