Search CORE

3 research outputs found

Distributed computing and farm management with application to the search for heavy gauge bosons using the ATLAS experiment at the LHC (CERN)

Author: Lopez-Perez Juan Antonio
Publication venue: University of Valencia
Publication date: 01/01/2008
Field of study

The Standard Model of particle physics describes the strong, weak, and electromagnetic forces between the fundamental particles of ordinary matter. However, it presents several problems and some questions remain unanswered so it cannot be considered a complete theory of fundamental interactions. Many extensions have been proposed in order to address these problems. Some important recent extensions are the Extra Dimensions theories. In the context of some models with Extra Dimensions of size about

1 TeV^{-}1

, in particular in the ADD model with only fermions confined to a D-brane, heavy Kaluza-Klein excitations are expected, with the same properties as SM gauge bosons but more massive. In this work, three hadronic decay modes of some of such massive gauge bosons, Z* and W*, are investigated using the ATLAS experiment at the Large Hadron Collider (LHC), presently under construction at CERN. These hadronic modes are more difficult to detect than the leptonic ones, but they should allow a measurement of the couplings between heavy gauge bosons and quarks. The events were generated using the ATLAS fast simulation and reconstruction MC program Atlfast coupled to the Monte Carlo generator PYTHIA. We found that for an integrated luminosity of

3 × 10^{5} pb^{-}1

and a heavy gauge boson mass of 2 TeV, the channels Z*->bb and Z*->tt would be difficult to detect because the signal would be very small compared with the expected backgrou nd, although the significance in the case of Z*->tt is larger. In the channel W*->tb , the decay might yield a signal separable from the background and a significance larger than 5 so we conclude that it would be possible to detect this particular mode at the LHC. The analysis was also performed for masses of 1 TeV and we conclude that the observability decreases with the mass. In particular, a significance higher than 5 may be achieved below approximately 1.4, 1.9 and 2.2 TeV for Z*->bb , Z*->tt and W*->tb respectively. The LHC will start to operate in 2008 and collect data in 2009. It will produce roughly 15 Petabytes of data per year. Access to this experimental data has to be provided for some 5,000 scientists working in 500 research institutes and universities. In addition, all data need to be available over the estimated 15-year lifetime of the LHC. The analysis of the data, including comparison with theoretical simulations, requires an enormous computing power. The computing challenges that scientists have to face are the huge amount of data, calculations to perform and collaborators. The Grid has been proposed as a solution for those challenges. The LHC Computing Grid project (LCG) is the Grid used by ATLAS and the other LHC experiments and it is analised in depth with the aim of studying the possible complementary use of it with another Grid project. That is the Berkeley Open Infrastructure for Network C omputing middle-ware (BOINC) developed for the SETI@home project, a Grid specialised in high CPU requirements and in using volunteer computing resources. Several important packages of physics software used by ATLAS and other LHC experiments have been successfully adapted/ported to be used with this platform with the aim of integrating them into the LHC@home project at CERN: Atlfast, PYTHIA, Geant4 and Garfield. The events used in our physics analysis with Atlfast were reproduced using BOINC obtaining exactly the same results. The LCG software, in particular SEAL, ROOT and the external software, was ported to the Solaris/sparc platform to study it's portability in general as well. A testbed was performed including a big number of heterogeneous hardware and software that involves a farm of 100 computers at CERN's computing center (lxboinc) together with 30 PCs from CIEMAT and 45 from schools from Extremadura (Spain). That required a preliminary study, development and creation of components of the Quattor software and configuration management tool to install and manage the lxboinc farm and it also involved the set up of a collaboration between the Spanish research centers and government and CERN. The testbed was successful and 26,597 Grid jobs were delivered, executed and received successfully. We conclude that BOINC and LCG are complementary and useful kinds of Grid that can be used by ATLAS and the other LHC experiments. LCG has very good data distribution, management and storage capabilities that BOINC does not have. In the other hand, BOINC does not need high bandwidth or Internet speed and it also can provide a huge and inexpensive amount of computing power coming from volunteers. In addition, it is possible to send jobs from LCG to BOINC and vice versa. So, possible complementary cases are to use volunteer BOINC nodes when the LCG nodes have too many jobs to do or to use BOINC for high CPU tasks like event generators or reconstructions while concentrating LCG for data analysis

CERN Document Server

Development and application of distributed computing tools for virtual screening of large compound libraries

Author: Aher Yogesh
Publication venue
Publication date: 01/01/2012
Field of study

Im derzeitigen Drug Discovery Prozess ist die Identifikation eines neuen Targetproteins und dessen potenziellen Liganden langwierig, teuer und zeitintensiv. Die Verwendung von in silico Methoden gewinnt hier zunehmend an Bedeutung und hat sich als wertvolle Strategie zur Erkennung komplexer Zusammenhänge sowohl im Bereich der Struktur von Proteinen wie auch bei Bioaktivitäten erwiesen. Die zunehmende Nachfrage nach Rechenleistung im wissenschaftlichen Bereich sowie eine detaillierte Analyse der generierten Datenmengen benötigen innovative Strategien für die effiziente Verwendung von verteilten Computerressourcen, wie z.B. Computergrids. Diese Grids ergänzen bestehende Technologien um einen neuen Aspekt, indem sie heterogene Ressourcen zur Verfügung stellen und koordinieren. Diese Ressourcen beinhalten verschiedene Organisationen, Personen, Datenverarbeitung, Speicherungs- und Netzwerkeinrichtungen, sowie Daten, Wissen, Software und Arbeitsabläufe. Das Ziel dieser Arbeit war die Entwicklung einer universitätsweit anwendbaren Grid-Infrastruktur - UVieCo (University of Vienna Condor pool) -, welche für die Implementierung von akademisch frei verfügbaren struktur- und ligandenbasierten Drug Discovery Anwendungen verwendet werden kann. Firewall- und Sicherheitsprobleme wurden mittels eines virtuellen privaten Netzwerkes gelöst, wohingegen die Virtualisierung der Computerhardware über das CoLinux Konzept ermöglicht wurde. Dieses ermöglicht, dass unter Linux auszuführende Aufträge auf Windows Maschinen laufen können. Die Effektivität des Grids wurde durch Leistungsmessungen anhand sequenzieller und paralleler Aufgaben ermittelt. Als Anwendungsbeispiel wurde die Assoziation der Expression bzw. der Sensitivitätsprofile von ABC-Transportern mit den Aktivitätsprofilen von Antikrebswirkstoffen durch Data-Mining des NCI (National Cancer Institute) Datensatzes analysiert. Die dabei generierten Datensätze wurden für liganden-basierte Computermethoden wie Shape-Similarity und Klassifikationsalgorithmen mit dem Ziel verwendet, P-glycoprotein (P-gp) Substrate zu identifizieren und sie von Nichtsubstraten zu trennen. Beim Erstellen vorhersagekräftiger Klassifikationsmodelle konnte das Problem der extrem unausgeglichenen Klassenverteilung durch Verwendung der „Cost-Sensitive Bagging“ Methode gelöst werden. Applicability Domain Studien ergaben, dass unser Modell nicht nur die NCI Substanzen gut vorhersagen kann, sondern auch für wirkstoffähnliche Moleküle verwendet werden kann. Die entwickelten Modelle waren relativ einfach, aber doch präzise genug um für virtuelles Screening einer großen chemischen Bibliothek verwendet werden zu können. Dadurch könnten P-gp Substrate schon frühzeitig erkannt werden, was möglicherweise nützlich sein kann zur Entfernung von Substanzen mit schlechten ADMET-Eigenschaften bereits in einer frühen Phase der Arzneistoffentwicklung. Zusätzlich wurden Shape-Similarity und Self-organizing Map Techniken verwendet um neue Substanzen in einer hauseigenen sowie einer großen kommerziellen Datenbank zu identifizieren, die ähnlich zu selektiven Serotonin-Reuptake-Inhibitoren (SSRI) sind und Apoptose induzieren können. Die erhaltenen Treffer besitzen neue chemische Grundkörper und können als Startpunkte für Leitstruktur-Optimierung in Betracht gezogen werden. Die in dieser Arbeit beschriebenen Studien werden nützlich sein um eine verteilte Computerumgebung zu kreieren die vorhandene Ressourcen in einer Organisation nutzt, und die für verschiedene Anwendungen geeignet ist, wie etwa die effiziente Handhabung der Klassifizierung von unausgeglichenen Datensätzen, oder mehrstufiges virtuelles Screening.In the current drug discovery process, the identification of new target proteins and potential ligands is very tedious, expensive and time-consuming. Thus, use of in silico techniques is of utmost importance and proved to be a valuable strategy in detecting complex structural and bioactivity relationships. Increased demands of computational power for tremendous calculations in scientific fields and timely analysis of generated piles of data require innovative strategies for efficient utilization of distributed computing resources in the form of computational grids. Such grids add a new aspect to the emerging information technology paradigm by providing and coordinating the heterogeneous resources such as various organizations, people, computing, storage and networking facilities as well as data, knowledge, software and workflows. The aim of this study was to develop a university-wide applicable grid infrastructure, UVieCo (University of Vienna Condor pool) which can be used for implementation of standard structure- and ligand-based drug discovery applications using freely available academic software. Firewall and security issues were resolved with a virtual private network setup whereas virtualization of computer hardware was done using the CoLinux concept in a way to run Linux-executable jobs inside Windows machines. The effectiveness of the grid was assessed by performance measurement experiments using sequential and parallel tasks. Subsequently, the association of expression/sensitivity profiles of ABC transporters with activity profiles of anticancer compounds was analyzed by mining the data from NCI (National Cancer Institute). The datasets generated in this analysis were utilized with ligand-based computational methods such as shape similarity and classification algorithms to identify and separate P-gp substrates from non-substrates. While developing predictive classification models, the problem of imbalanced class distribution was proficiently addressed using the cost-sensitive bagging approach. Applicability domain experiment revealed that our model not only predicts NCI compounds well, but it can also be applied to drug-like molecules. The developed models were relatively simple but precise enough to be applicable for virtual screening of large chemical libraries for the early identification of P-gp substrates which can potentially be useful to remove compounds of poor ADMET properties in an early phase of drug discovery. Additionally, shape-similarity and self-organizing maps techniques were used to screen in-house as well as a large vendor database for identification of novel selective serotonin reuptake inhibitor (SSRI) like compounds to induce apoptosis. The retrieved hits possess novel chemical scaffolds and can be considered as a starting point for lead optimization studies. The work described in this thesis will be useful to create distributed computing environment using available resources within an organization and can be applied to various applications such as efficient handling of imbalanced data classification problems or multistep virtual screening approach

OTHES

Applications Development for the Computational Grid

Author: Abramson David
Publication venue: Monash University, InformationTechnology
Publication date: 01/01/2011
Field of study

University of Queensland eSpace