Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
motivated largely by the need to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also using the
millions of genomic sequences available from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
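As an illustration of aspect 1, the sketch below computes a simple composition-bias score for a protein sequence against a background amino-acid distribution; this is only a toy of the general idea (P. falciparum proteins are famously asparagine-rich), not the paper's method, and the sequence and background are made up.

```python
# Illustrative only: a composition-bias score of the kind needed before
# comparing compositionally atypical P. falciparum sequences against a
# background proteome.
from collections import Counter
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> dict:
    """Relative amino-acid frequencies of a protein sequence."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS if counts[aa]}

def composition_bias(seq: str, background: dict) -> float:
    """Kullback-Leibler divergence of the sequence composition from a
    background distribution; higher values flag atypical sequences."""
    comp = composition(seq)
    return sum(p * log(p / background.get(aa, 1e-6)) for aa, p in comp.items())

# Toy usage with a uniform background; real analyses would use
# proteome-wide frequencies.
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
print(composition_bias("MNNNNNNNNKKNNDE", uniform))  # strongly biased toy sequence
```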
A posteriori metadata from automated provenance tracking: Integration of AiiDA and TCOD
In order to make results of computational scientific research findable,
accessible, interoperable and re-usable, it is necessary to decorate them with
standardised metadata. However, there are a number of technical and practical
challenges that make this process difficult to achieve in practice. Here the
implementation of a protocol is presented to tag crystal structures with their
computed properties, without the need for human intervention to curate the data.
This protocol leverages the capabilities of AiiDA, an open-source platform to
manage and automate scientific computational workflows, and TCOD, an
open-access database storing computed materials properties using a well-defined
and exhaustive ontology. Based on these, the complete procedure to deposit
computed data in the TCOD database is automated. All relevant metadata are
extracted from the full provenance information that AiiDA tracks and stores
automatically while managing the calculations. Such a protocol also enables
reproducibility of scientific data in the field of computational materials
science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170
theoretical structures together with their computed properties and their full
provenance graphs, consisting of over 4600 AiiDA nodes.
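The core idea, deriving deposition metadata a posteriori by walking the stored provenance graph, can be sketched in a library-agnostic way; the node and edge layout below is a hypothetical stand-in, not AiiDA's actual data model or API.

```python
# A minimal sketch of deriving metadata a posteriori from a stored
# provenance graph, without human curation. The graph layout is a
# hypothetical stand-in for AiiDA's real node/link model.
from collections import deque

# node id -> (kind, attributes); parents: child -> upstream node ids
nodes = {
    "calc-1": ("calculation", {"code": "quantum-espresso", "version": "6.x"}),
    "struct-1": ("structure", {"formula": "Si2"}),
    "prop-1": ("property", {"name": "total_energy", "value": -310.2, "unit": "eV"}),
}
parents = {"prop-1": ["calc-1"], "calc-1": ["struct-1"], "struct-1": []}

def collect_metadata(result_id: str) -> dict:
    """Walk the provenance upstream of a result and merge the metadata
    needed for a deposition record."""
    record, queue, seen = {}, deque([result_id]), set()
    while queue:
        nid = queue.popleft()
        if nid in seen:
            continue
        seen.add(nid)
        kind, attrs = nodes[nid]
        record.setdefault(kind, {}).update(attrs)
        queue.extend(parents.get(nid, []))
    return record

print(collect_metadata("prop-1"))
```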
Towards a Generic Ontology for Solar Irradiance Forecasting
The growth of solar energy resources in recent years has led to increased calls for accurate forecasts of solar irradiance for the reliable and sustainable integration of solar into the national grid. A growing body of academic research has developed models for forecasting solar irradiance, identified metrics for comparing solar forecasts, and described applications and end users of solar forecasts.
In recent years, many disciplines have been developing ontologies to facilitate better communication, improve interoperability and support knowledge reuse by experts and users of the domain. Ontologies are explicit, formal vocabularies of terms and their relationships. This report describes a step towards using ontologies to describe the knowledge, concepts and relationships in the domain of solar irradiance forecasting, in order to develop a shared understanding for the diverse stakeholders that interact with the domain. A preliminary ontology on solar irradiance forecasting was created and validated on three use cases.
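To make the notion of a formal vocabulary concrete, the snippet below encodes a few plausible forecasting concepts as RDF triples with rdflib; the namespace and class names are illustrative assumptions, not the ontology proposed in the report.

```python
# A minimal sketch of an explicit, formal vocabulary for the domain,
# using rdflib. All names under the example namespace are assumptions.
from rdflib import Graph, Namespace, RDF, RDFS, Literal

SIF = Namespace("http://example.org/solar-irradiance-forecasting#")
g = Graph()
g.bind("sif", SIF)

# Concepts and a relationship: a forecast model produces a forecast,
# which is evaluated against a metric.
g.add((SIF.ForecastModel, RDF.type, RDFS.Class))
g.add((SIF.IrradianceForecast, RDF.type, RDFS.Class))
g.add((SIF.EvaluationMetric, RDF.type, RDFS.Class))
g.add((SIF.produces, RDFS.domain, SIF.ForecastModel))
g.add((SIF.produces, RDFS.range, SIF.IrradianceForecast))
g.add((SIF.RMSE, RDF.type, SIF.EvaluationMetric))
g.add((SIF.RMSE, RDFS.label, Literal("root mean square error")))

print(g.serialize(format="turtle"))
```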
Automated Bidding in Computing Service Markets: Strategies, Architectures, Protocols
This dissertation contributes to research on Computational Mechanism Design by providing novel theoretical and software models: a bidding strategy called Q-Strategy, which automates bidding processes in imperfect-information markets; a software framework for realizing agents and bidding strategies, called BidGenerator; and a communication protocol called MX/CS for expressing and exchanging economic and technical information in a market-based scheduling system.
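A toy stateless Q-learning bidder conveys the flavor of such automation; it is not the dissertation's Q-Strategy, and the bid levels, payoff model and parameters below are all assumptions for illustration.

```python
# A toy Q-learning bidder: learn action values for discrete bid levels
# from observed payoffs in a market with hidden, noisy clearing prices.
import random

BIDS = [10, 20, 30, 40]           # discretised bid levels (assumption)
q = {b: 0.0 for b in BIDS}        # action-value estimates
alpha, epsilon = 0.1, 0.2         # learning rate, exploration rate

def market_payoff(bid: float) -> float:
    """Stand-in for the unknown market: the bid pays off only if it
    clears a hidden, noisy threshold."""
    clearing = random.gauss(25, 5)
    return (50 - bid) if bid >= clearing else 0.0

for _ in range(5000):
    # epsilon-greedy action selection over bid levels
    bid = random.choice(BIDS) if random.random() < epsilon else max(q, key=q.get)
    reward = market_payoff(bid)
    q[bid] += alpha * (reward - q[bid])   # stateless Q-update

print({b: round(v, 1) for b, v in q.items()})
```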
MapReduce-based RDF-assisted distributed SVM for high-throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.
Electronic mail has become deeply embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability and its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, a M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
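The divide-and-conquer idea behind the distributed training step can be sketched on a single machine with scikit-learn; this is illustrative only, not MRSMO itself: each "mapper" trains on a shard, and only the pooled support vectors are retrained.

```python
# A single-machine sketch of distributed SVM training by partitioning:
# train SVMs on data shards, pool their support vectors, retrain on the
# pool. Dataset and parameters are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
partitions = np.array_split(np.arange(len(X)), 4)  # four "mapper" shards

sv_idx = []
for part in partitions:
    clf = SVC(kernel="linear").fit(X[part], y[part])  # map step
    sv_idx.extend(part[clf.support_])                 # keep support vectors

# Reduce step: a final SVM over the pooled support vectors only.
final = SVC(kernel="linear").fit(X[sv_idx], y[sv_idx])
print(f"{len(sv_idx)} pooled support vectors, "
      f"training accuracy {final.score(X, y):.3f}")
```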
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure.
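The matching idea can be conveyed with a toy heterogeneity-aware scorer; the fitness function and resource attributes below are hypothetical placeholders, not gSched's actual scheme.

```python
# A toy heterogeneity-aware task-to-node matcher: rank nodes per task
# by how well their capabilities fit its demands. Scoring is a
# hypothetical placeholder.
tasks = [{"id": "t1", "cpu": 4, "io": 0.8}, {"id": "t2", "cpu": 1, "io": 0.1}]
nodes = [{"id": "n1", "cpu": 8, "io": 0.9}, {"id": "n2", "cpu": 2, "io": 0.3}]

def fit(task, node):
    """Rule out nodes that cannot meet the CPU demand; otherwise prefer
    the tightest resource fit to avoid wasting capable nodes."""
    if node["cpu"] < task["cpu"]:
        return float("-inf")
    return -(node["cpu"] - task["cpu"]) - abs(node["io"] - task["io"])

for task in tasks:
    best = max(nodes, key=lambda n: fit(task, n))
    print(task["id"], "->", best["id"])
```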
The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.
Automating Security Risk and Requirements Management for Cyber-Physical Systems
Cyber-Physical Systems enable various modern use cases and business models such as connected vehicles, the Smart (power) Grid, or the Industrial Internet of Things.
Their key characteristics of complexity, heterogeneity, and longevity make the long-term protection of these systems a demanding but indispensable task.
In the physical world, the laws of physics provide a constant scope for risks and their treatment.
In cyberspace, on the other hand, there is no such constant to counteract the erosion of security features.
As a result, existing security risks can constantly change and new ones can arise.
To prevent damage caused by malicious acts, it is necessary to identify high and unknown risks early and counter them appropriately.
Considering the numerous dynamic security-relevant factors requires a new level of automation in the management of security risks and requirements, which goes beyond the current state of the art.
Only in this way can an appropriate, comprehensive, and consistent level of security be achieved in the long term.
This work addresses the pressing need for an automation methodology for security risk assessment as well as for the generation and management of security requirements for Cyber-Physical Systems.
The presented framework accordingly comprises three components: (1) a model-based security risk assessment methodology, (2) methods to unify, deduce and manage security requirements, and (3) a set of tools and procedures to detect and respond to security-relevant situations.
The need for protection and the appropriate rigor are determined and evaluated by the security risk assessment, using graphs and security-specific modeling. Based on the model and the assessed risks, well-founded security requirements for protecting the overall system and its functionality are systematically derived and formulated in a uniform, machine-readable structure.
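A minimal sketch of graph-based risk assessment follows; the system model, likelihoods and impact values are hypothetical, and the path-risk formula (product of step likelihoods times asset impact) is one common convention, not necessarily the thesis's methodology.

```python
# Graph-based security-risk sketch: score each attack path as the
# product of step likelihoods times the impact of the reached asset.
import networkx as nx

g = nx.DiGraph()
g.add_edge("internet", "telematics", p=0.6)   # p: step likelihood
g.add_edge("telematics", "can_bus", p=0.3)
g.add_edge("ota_server", "telematics", p=0.2)
impact = {"can_bus": 10}                      # asset damage potential

def path_risk(path):
    likelihood = 1.0
    for u, v in zip(path, path[1:]):
        likelihood *= g[u][v]["p"]
    return likelihood * impact[path[-1]]

for src in ("internet", "ota_server"):
    for path in nx.all_simple_paths(g, src, "can_bus"):
        print(" -> ".join(path), "risk =", round(path_risk(path), 2))
```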
This machine-readable structure makes it possible to propagate security requirements automatically along the supply chain.
Furthermore, it enables the efficient reconciliation of present capabilities with external security requirements from regulations, processes, and business partners.
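What such a machine-readable structure might look like, and how it supports automatic reconciliation, can be sketched as follows; the record schema and the asset/property/level fields are illustrative assumptions, not the thesis's actual format.

```python
# Requirements as uniform, machine-readable records, reconciled against
# a component's declared capabilities. Schema is an assumption.
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirement:
    asset: str               # what must be protected
    security_property: str   # confidentiality / integrity / availability
    level: int               # required rigor, higher is stricter

capabilities = {("can_bus", "integrity"): 3, ("can_bus", "availability"): 2}
external = [Requirement("can_bus", "integrity", 2),
            Requirement("can_bus", "availability", 3)]

for req in external:
    met = capabilities.get((req.asset, req.security_property), 0) >= req.level
    print(req, "->", "satisfied" if met else "gap: escalate along supply chain")
```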
Despite all measures taken, there always remains a residual risk of compromise, which requires an appropriate response.
This residual risk is addressed by tools and processes that improve the local and large-scale detection, classification, and correlation of incidents.
Integrating the findings from such incidents into the model often leads to updated assessments, new requirements, and improves further analyses.
Finally, the presented framework is demonstrated by a recent application example from the automotive domain.
Living ontologies: collaborative knowledge structuring on the Internet
This thesis discusses the issues involving the support of Living Ontologies: collaborating in the construction and maintenance of ontologies using the Internet.
Ontologies define the concepts used in describing a domain: they are used by knowledge engineers as reusable components of knowledge-based systems. Knowledge engineers create ontologies by eliciting information from domain experts. However, experts often have different conceptualisations of a domain and knowledge engineers often have different ways of formalising their conceptualisations.
Taking a constructivist perspective, constructing ontologies from multiple conflicting conceptualisations can be seen as a design activity, in which knowledge engineers make choices according to the context in which the representation will be used. Based on this theory, a methodology for collaboratively constructing ontologies might involve comparing differing conceptualisations and using these comparisons to initiate discussion, changes to the conceptualisations and the development of criteria against which they can be evaluated.
APECKS (Adaptive Presentation Environment for Collaborative Knowledge Structuring) is designed to support this methodology. APECKS aims not only to support the collaborative construction of ontologies but also to use ontologies to present information to its users adaptively within a virtual environment. It demonstrates a number of innovations over conventional ontology servers, such as prompted knowledge elicitation from domain experts, automated comparisons between ontologies, the creation of design rationales and change tracking.
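How an automated comparison between conceptualisations can seed discussion is sketched below; ontologies are reduced to sets of (concept, relation, concept) triples, a deliberate simplification of what APECKS actually does, with made-up example triples.

```python
# A minimal sketch of automated comparison between two experts'
# conceptualisations, represented as (concept, relation, concept)
# triples; real ontology diffing is considerably richer.
ont_a = {("dog", "is_a", "mammal"), ("dog", "has", "tail"),
         ("whale", "is_a", "fish")}
ont_b = {("dog", "is_a", "mammal"), ("whale", "is_a", "mammal")}

print("agreed:", ont_a & ont_b)
print("only A:", ont_a - ont_b)
print("only B:", ont_b - ont_a)

# Statements about the same subject and relation but with different
# objects ("whale is_a fish" vs "whale is_a mammal") are prime seeds
# for discussion between the experts.
sym = ont_a ^ ont_b
disputed = {t for t in sym if any(s != t and s[:2] == t[:2] for s in sym)}
print("conflicting:", disputed)
```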
A small evaluation of APECKS has shown that it is usable by domain experts and that automated comparisons between ontologies can be used to initiate alterations and investigations of others' conceptualisations, and to serve as a basis for discussion. Possible future development of APECKS includes tighter integration with a virtual environment and with other networked knowledge-based tools. Further research is also needed to develop the methodology on which APECKS is based, by investigating ways of comparing, combining and discussing ontologies.
New Fundamental Technologies in Data Mining
The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond treating each topic in depth, the two books offer useful hints and strategies for solving the problems discussed in their chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.