10 research outputs found

    Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies

    Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. This makes Grid application management and deployment a complex undertaking. Grid middleware provides users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems, most of them the results of academic research projects, have been developed all over the world. This chapter focuses on four of these middleware systems: UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE, as this functionality was not previously supported. A comparison of these systems on the basis of architecture, implementation model and several other features is included. (Comment: 19 pages, 10 figures)

    Grid Information Technology as a New Technological Tool for e-Science, Healthcare and Life Science

    Nowadays, scientific projects require collaborative environments and powerful computing resources capable of handling huge quantities of data, which gives rise to e-Science. These requirements are evident in the need to optimise time and effort in health-related activities. When e-Science focuses on the collaborative handling of all the information generated in clinical medicine and health, e-Health is the result. Scientists are taking increasing interest in an emerging technology – Grid Information Technology – that may offer a solution to their current needs. The current work aims to survey how e-Science is using this technology around the world. We also argue that the technology may provide an ideal solution for the new challenges facing e-Health and Life Science.

    Development and application of distributed computing tools for virtual screening of large compound libraries

    In the current drug discovery process, the identification of new target proteins and potential ligands is very tedious, expensive and time-consuming.
Thus, the use of in silico techniques is of utmost importance and has proved to be a valuable strategy for detecting complex structural and bioactivity relationships. The increasing demand for computational power in scientific fields, and the timely analysis of the large volumes of data generated, require innovative strategies for the efficient utilization of distributed computing resources in the form of computational grids. Such grids add a new aspect to the emerging information technology paradigm by providing and coordinating heterogeneous resources: various organizations, people, computing, storage and networking facilities, as well as data, knowledge, software and workflows. The aim of this study was to develop a university-wide applicable grid infrastructure, UVieCo (University of Vienna Condor pool), which can be used for the implementation of standard structure- and ligand-based drug discovery applications using freely available academic software. Firewall and security issues were resolved with a virtual private network setup, whereas virtualization of computer hardware was done using the CoLinux concept, allowing Linux-executable jobs to run on Windows machines. The effectiveness of the grid was assessed by performance measurement experiments using sequential and parallel tasks. Subsequently, the association of expression/sensitivity profiles of ABC transporters with activity profiles of anticancer compounds was analyzed by mining the NCI (National Cancer Institute) dataset. The datasets generated in this analysis were utilized with ligand-based computational methods such as shape similarity and classification algorithms to identify P-gp substrates and separate them from non-substrates. While developing predictive classification models, the problem of imbalanced class distribution was addressed using the cost-sensitive bagging approach.
Applicability domain experiments revealed that our model not only predicts NCI compounds well, but can also be applied to drug-like molecules. The developed models were relatively simple yet precise enough to be applicable to virtual screening of large chemical libraries for the early identification of P-gp substrates, which can be useful for removing compounds with poor ADMET properties in an early phase of drug discovery. Additionally, shape-similarity and self-organizing map techniques were used to screen an in-house as well as a large vendor database for the identification of novel selective serotonin reuptake inhibitor (SSRI)-like compounds that can induce apoptosis. The retrieved hits possess novel chemical scaffolds and can be considered as starting points for lead optimization studies. The work described in this thesis will be useful for creating a distributed computing environment using available resources within an organization, and can be applied to various problems such as the efficient handling of imbalanced data classification or multistep virtual screening approaches.
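The cost-sensitive bagging idea mentioned above can be sketched in a few lines: train each base model on a balanced bootstrap (all minority-class examples plus an equal-sized sample of the majority class) and aggregate by majority vote. The sketch below is a minimal illustration with a made-up one-dimensional threshold learner, not the actual classifiers or descriptors used in the thesis.

```python
import random

def cost_sensitive_bagging(train, n_models=25, seed=0):
    """Ensemble trained on balanced bootstrap samples.

    train: list of (feature_value, label) pairs with label in {0, 1},
    where class 1 (e.g. P-gp substrates) is the rare class.
    Undersampling the majority class in every bootstrap counters the
    imbalanced class distribution; each base learner is a simple
    threshold placed midway between the two class means.
    """
    rng = random.Random(seed)
    minority = [ex for ex in train if ex[1] == 1]
    majority = [ex for ex in train if ex[1] == 0]
    thresholds = []
    for _ in range(n_models):
        # balanced bootstrap: every minority example + equal-sized majority sample
        sample = minority + rng.sample(majority, len(minority))
        mean1 = sum(x for x, y in sample if y == 1) / len(minority)
        mean0 = sum(x for x, y in sample if y == 0) / len(minority)
        thresholds.append((mean0 + mean1) / 2)

    def predict(x):
        # majority vote over the ensemble of thresholds
        votes = sum(1 for t in thresholds if x > t)
        return 1 if votes > n_models / 2 else 0

    return predict
```

With a 100:5 imbalanced toy set whose rare class sits at higher feature values, the ensemble classifies both classes correctly even though each base model alone saw only ten training points.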

    Computational Methods in Science and Engineering : Proceedings of the Workshop SimLabs@KIT, November 29 - 30, 2010, Karlsruhe, Germany

    In this proceedings volume we provide a compilation of contributed articles covering applications from different research fields, ranging from capacity up to capability computing. Besides classical computing aspects such as parallelization, the focus of these proceedings is on multi-scale approaches and methods for tackling algorithm and data complexity. Practical aspects regarding the usage of the HPC infrastructure and the tools and software available at the SCC are also presented.

    A study in grid simulation and scheduling

    Grid computing is emerging as an essential tool for large-scale analysis and problem solving in scientific and business domains. Whilst the idea of stealing unused processor cycles is as old as the Internet, we are still far from a position where many distributed resources can be seamlessly utilised on demand. One major issue standing in the way of this vision is deciding how to effectively manage remote resources and how to schedule tasks amongst them. This thesis describes an investigation into Grid computing, specifically the problem of Grid scheduling. This complex problem has many unique features that make it particularly difficult to solve, and as a result many current Grid systems employ simplistic, inefficient solutions. This work describes the development of a simulation tool, G-Sim, which can be used to test the effectiveness of potential Grid scheduling algorithms under realistic operating conditions. This tool is used to analyse the effectiveness of a simple, novel scheduling technique in numerous scenarios. The results are positive and show that it could be applied to current procedures to enhance performance and decrease the negative effect of resource failure. Finally, a conversion between the Grid scheduling problem and the classic computational problem SAT (Boolean satisfiability) is provided. Such a conversion opens up the possibility of applying sophisticated SAT-solving procedures to Grid scheduling, potentially yielding effective solutions.
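A scheduling-to-SAT conversion of the kind described can be illustrated with a toy encoding: one Boolean variable per (task, resource) pair, clauses forcing exactly one resource per task, and clauses forbidding conflicting tasks (e.g. two tasks that would together exceed a resource's capacity) from sharing a resource. The encoding and the brute-force checker below are illustrative assumptions, not the thesis's actual formulation.

```python
from itertools import product

def schedule_to_sat(n_tasks, n_res, conflicts):
    """Encode a toy Grid-scheduling instance as CNF clauses.

    Variable v(t, r) is true iff task t runs on resource r; variables
    are numbered from 1, following the DIMACS convention used by SAT
    solvers. conflicts is a list of task pairs that must not share a
    resource (a hypothetical stand-in for capacity constraints).
    """
    v = lambda t, r: t * n_res + r + 1
    clauses = []
    for t in range(n_tasks):
        clauses.append([v(t, r) for r in range(n_res)])       # at least one resource
        for r1 in range(n_res):
            for r2 in range(r1 + 1, n_res):
                clauses.append([-v(t, r1), -v(t, r2)])        # at most one resource
    for (t1, t2) in conflicts:
        for r in range(n_res):
            clauses.append([-v(t1, r), -v(t2, r)])            # conflicting tasks kept apart
    return clauses

def brute_force_sat(clauses, n_vars):
    """Tiny stand-in for a real SAT solver; only for toy instances."""
    for bits in product([False, True], repeat=n_vars):
        assign = lambda lit: bits[abs(lit) - 1] ^ (lit < 0)
        if all(any(assign(l) for l in c) for c in clauses):
            return bits
    return None
```

Three tasks on two resources with one conflicting pair is satisfiable; making all three tasks mutually conflicting is not (a pigeonhole argument), and the solver correctly reports it.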

    A service-oriented Grid environment with on-demand QoS support

    Grid computing emerged as a vision for a new computing infrastructure that aims to make computing resources available as easily as electric power through the power grid. Enabling seamless access to globally distributed IT resources allows dispersed users to tackle large-scale problems in science and engineering in unprecedented ways. The rapid development of Grid computing also encouraged standardization, which led to the adoption of a service-oriented paradigm and an increasing use of commercial Web services technologies. Along these lines, service-level agreements and Quality of Service are essential characteristics of the Grid and specifically mandatory for Grid-enabling complex applications from certain domains such as the health sector. This PhD thesis aims to contribute to the development of Grid technologies by proposing a Grid environment with support for Quality of Service. The proposed environment comprises a secure service-oriented Grid infrastructure based on standard Web services technologies which enables the on-demand provision of native HPC applications as Grid services in an automated way and subject to user-defined QoS constraints.
The Grid environment adopts a business-oriented approach and supports a client-driven dynamic negotiation of service-level agreements on a case-by-case basis. Although the design of the QoS support is generic, the implementation emphasizes the specific requirements of compute-intensive and time-critical parallel applications, which necessitate on-demand QoS guarantees such as execution time limits and price constraints. Therefore, the QoS infrastructure relies on advance resource reservation, application-specific resource capacity estimation, and resource pricing. An experimental evaluation demonstrates the capabilities and rational behavior of the QoS infrastructure. The presented Grid infrastructure, and in particular the QoS support, has been successfully applied and demonstrated in EU projects for various applications from the medical and bio-medical domains. The EU projects GEMSS and Aneurist are concerned with advanced e-health applications and globally distributed data sources, which are virtualized by Grid services. Using Grid technology as an enabling technology in the health domain allows medical practitioners and researchers to utilize Grid services in their clinical environment, which ultimately results in improved healthcare.
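A client-driven SLA negotiation of the kind described reduces, in caricature, to filtering provider offers against the user's execution-time and price constraints and accepting the cheapest feasible one. The `Offer` type and its field names below are hypothetical illustrations, not taken from the GEMSS or Aneurist implementations.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str         # hypothetical provider identifier
    est_runtime_s: float  # provider's application-specific runtime estimate
    price: float          # price quoted for the advance reservation

def negotiate_sla(offers, deadline_s, budget):
    """Client-driven SLA selection: among offers satisfying the user's
    execution-time limit and price constraint, accept the cheapest.
    Returns the accepted Offer, or None if negotiation fails."""
    feasible = [o for o in offers
                if o.est_runtime_s <= deadline_s and o.price <= budget]
    return min(feasible, key=lambda o: o.price, default=None)
```

Returning `None` on failure mirrors the case-by-case nature of the negotiation: the client can then relax its deadline or budget and re-negotiate.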

    Machine learning methods for quantitative structure-property relationship modeling

    Doctoral thesis, Informática (Bioinformática), Universidade de Lisboa, Faculdade de Ciências, 2014. Due to the high rate of new compounds discovered each day and the slowness and cost of experimental measurements, there will always be a significant gap between the number of known chemical compounds and the number of compounds for which experimental properties are available. This research work is motivated by the fact that developing new methods for predicting properties, organizing huge collections of molecules to reveal chemical categories/patterns, and selecting diverse/representative samples for exploratory experiments are becoming essential. This work aims to increase the capability to predict physical, chemical and biological properties, using data mining methods applied to complex non-homogeneous data (chemical structures) in large information repositories. In the first phase of this work, current methodologies in quantitative structure-property modelling were studied. These methodologies attempt to relate a set of selected structure-derived features of a compound to its property using model-based learning. This work focused on solving major issues identified when predicting properties of chemical compounds, and on the solutions explored using different molecular representations, feature selection techniques and data mining approaches. In this context, an innovative hybrid approach was proposed in order to improve the prediction power and comprehensibility of QSPR/QSAR models, using Random Forests for feature selection. It is acknowledged that, in general, similar molecules tend to have similar properties; therefore, in the second phase of this work, an instance-based machine learning methodology for predicting properties of compounds using the similarity-based molecular space was developed.
However, this type of methodology requires the quantification of structural similarity between molecules, which is often subjective, ambiguous and reliant upon comparative judgements; consequently, there is currently no absolute standard of molecular similarity. In this context, a new similarity method was developed, non-contiguous atom matching (NAMS), based on optimal atom alignment using pairwise matching algorithms that take into account both topological profiles and atom/bond characteristics. NAMS can then be used for property inference over the molecular metric space using ordinary kriging, yielding robust and interpretable predictive results and providing a better understanding of the underlying structure-property relationship. (Funded by Fundação para a Ciência e a Tecnologia, FCT.)
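Property inference over a similarity-based molecular space can be approximated with distance-weighted averaging: treat 1 - similarity as a distance and weight each training molecule's measured property by inverse distance. This is a deliberately simplified stand-in for the ordinary kriging used with NAMS in the thesis, with hypothetical inputs rather than real NAMS scores.

```python
def infer_property(query_sims, train_props, power=2, eps=1e-9):
    """Predict a property from structural similarity to training molecules.

    query_sims: similarity scores in [0, 1] (e.g. NAMS-style values) of the
    query molecule to each training molecule; train_props: the training
    molecules' measured property values. Distance is 1 - similarity, and
    the prediction is an inverse-distance-weighted average; eps guards
    against division by zero when a training molecule is identical.
    """
    weights = [1.0 / ((1.0 - s) ** power + eps) for s in query_sims]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, train_props)) / total
```

A query identical to one training molecule (similarity 1.0) essentially inherits its property value, while equal similarities to two molecules return the mean of their properties, matching the interpolation intuition behind kriging.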

    OpenMolGRID, a GRID Based System for Solving Large-Scale Drug Design Problems

    Nowadays, pharmaceutical companies are screening millions of structures in silico. These processes require fast and accurate predictive QSAR models. Unfortunately, at the moment these models do not include information-rich quantum-chemical descriptors because of their time-consuming calculation procedures. These challenges make the usage of large-scale QSAR calculations on GRID systems indispensable. These “high-throughput” informatics systems provide the facility to develop QSAR models on a vast number of model compounds in a short time, as well as the fast application of the novel method to an unprecedentedly high number of molecules. OpenMolGRID (Open Computing GRID for Molecular Science and Engineering) is going to be one of the first realizations of GRID technology in drug design. Based on security considerations and its easy plug-in technology, UNICORE was selected as the Grid middleware for the system (for additional information, see the contribution of Mathilde Romberg et al., titled ‘Support for Classes of Applications on the Grid’). The OpenMolGRID system is designed to build forward- and reverse-QSAR models based on thousands of different types of descriptors, many of them requiring computation-intensive 3D calculations. For real-life testing purposes, 30,000 novel and diverse structures have been synthesized, with IC50 values for in vitro human fibroblast cytotoxicity (i.e. the concentration of compound that kills 50% of the cells).