2,557 research outputs found

    A framework for a multi-agent planning support system: principles and illustrations


    AVOIDIT IRS: An Issue Resolution System To Resolve Cyber Attacks

    Cyber attacks have greatly increased over the years, and attackers have progressively improved in devising attacks against specific targets. Cyber attacks are considered malicious activities launched against networks to gain unauthorized access, causing modification, destruction, or even deletion of data. This dissertation highlights the need to assist defenders with identifying and defending against cyber attacks. In this dissertation, an attack issue resolution system called AVOIDIT IRS (AIRS) is developed. AVOIDIT IRS is based on the attack taxonomy AVOIDIT (Attack Vector, Operational Impact, Defense, Information Impact, and Target). Attacks are collected by AIRS and classified into their respective categories using AVOIDIT. Accordingly, an organizational cyber attack ontology was developed using feedback from security professionals to improve communication and reusability amongst cyber security stakeholders. AIRS is developed as a semi-autonomous application that extracts unstructured external and internal attack data to classify attacks in sequential form. In doing so, we designed and implemented a frequent pattern and sequential classification algorithm associated with the five classifications in AVOIDIT. The issue resolution approach uses inference to educate the defender on plausible cyber attacks. The AIRS can work in conjunction with an intrusion detection system (IDS) to provide a heuristic for cyber security breaches within an organization. AVOIDIT provides a framework for classifying appropriate attack information, which is fundamental in devising defense strategies against such cyber attacks. The AIRS is further used as a knowledge base in a game-inspired defense architecture to promote game model selection upon attack identification. Future work will incorporate honeypot attack information to improve attack identification, classification, and defense propagation. In this dissertation, 1,025 common vulnerabilities and exposures (CVEs) and over 5,000 log file instances were captured in the AIRS for analysis. Security experts were consulted to create rules to extract pertinent information and algorithms to correlate identified data for notification. The AIRS was developed using the Codeigniter [74] framework to provide a seamless visualization tool for data mining regarding potential cyber attacks relative to web applications. Testing of the AVOIDIT IRS revealed a recall of 88%, a precision of 93%, and a correlation metric of 66%.
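
    A minimal sketch of how the retrieval-style metrics reported above are defined. The counts below are invented purely to reproduce figures of the same order as those reported (precision 93%, recall 88%); they are not taken from the dissertation's evaluation data.

```python
# Hypothetical counts of correctly and incorrectly classified attacks,
# chosen only to illustrate the precision/recall definitions.
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

tp, fp, fn = 186, 14, 25  # placeholder values, not the dissertation's data
p, r = precision_recall(tp, fp, fn)
print(f"precision = {p:.0%}, recall = {r:.0%}")
```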

    SVR, General Noise Functions and Deep Learning. General Noise Deep Models

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingenieria Informática. Defense date: 20-01-2023. Machine learning (ML) is a branch of artificial intelligence that makes it possible to build systems that learn to solve a task automatically from data, in the sense that they do not need to be explicitly programmed with the rules or the method for doing so. ML covers different types of problems; one of them, regression, involves predicting a numerical outcome and is the focus of this thesis. Among the ML models used for regression, Support Vector Machines (SVM) are one of the main algorithms of choice, usually called Support Vector Regression (SVR) when applied to regression tasks. These models generally employ the ϵ-insensitive loss function, which implies assuming a specific distribution for the noise present in the data, but general noise cost functions for SVR have recently been proposed. These cost functions should be more effective when applied to regression problems whose underlying noise distribution matches the one assumed for that particular cost function. However, the use of these general functions, with the disparity in mathematical properties such as differentiability that it entails, means that the standard optimization method used in SVR, Sequential Minimal Optimization (SMO), is no longer an option. Moreover, possibly the main drawback of SVR models is that they can suffer from scalability problems when working with large datasets, a common situation in the big data era. Deep Learning (DL) models, on the other hand, can handle large datasets more easily, which is one of the fundamental reasons for their recent popularity. Finally, although SVR models have been studied thoroughly, the construction of error intervals for them seems to have received less attention and remains an unresolved problem. This is a significant disadvantage, since in many applications that involve solving a regression problem not only is an accurate prediction useful, but a confidence interval associated with this prediction can also be extremely valuable. Taking all these factors into account, this thesis has four main objectives: first, to propose a framework for training general noise SVR models using the Naive Online R Minimization Algorithm (NORMA) as the optimization method; second, to provide a method for building general noise DL models that combine the highly non-linear feature processing of DL models with the predictive potential of using general noise loss functions, of which the ϵ-insensitive loss used in SVR is just one particular example; third, to describe a direct approach for constructing error intervals for SVR or other regression models, based on the assumption that the residuals follow a specific distribution function; and finally, to unify the three previous objectives in a single modelling framework that allows building general noise deep models for prediction in regression problems, with the possibility of obtaining associated confidence or error intervals.
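
    A minimal sketch of the general idea behind distribution-based error intervals for a regression model: assume the residuals follow a concrete distribution, fit it on held-out residuals, and take interval half-widths from its quantiles. The data, the Laplace choice and the scikit-learn SVR settings are placeholders for illustration, not the thesis's actual experimental setup.

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVR

# Synthetic regression data with (assumed) Laplace noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.laplace(scale=0.2, size=500)

X_train, X_val = X[:400], X[400:]
y_train, y_val = y[:400], y[400:]

model = SVR(kernel="rbf", epsilon=0.1).fit(X_train, y_train)

# Fit the assumed residual distribution on validation residuals.
residuals = y_val - model.predict(X_val)
loc, scale = stats.laplace.fit(residuals)

# Two-sided 90% error interval around any new prediction.
alpha = 0.10
lo_q, hi_q = stats.laplace.ppf([alpha / 2, 1 - alpha / 2], loc=loc, scale=scale)

x_new = np.array([[1.5]])
y_hat = model.predict(x_new)[0]
print(f"prediction: {y_hat:.3f}, 90% interval: [{y_hat + lo_q:.3f}, {y_hat + hi_q:.3f}]")
```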

    Privacy in characterizing and recruiting patients for IoHT-aided digital clinical trials

    Nowadays there is a tremendous number of smart and connected devices that produce data. The so-called IoT is so pervasive that its devices (in particular the ones we carry with us throughout the day: wearables, smartphones...) often provide third parties with insights into our lives. People habitually exchange some of their private data in order to obtain services, discounts and advantages. Sharing personal data is commonly accepted in contexts like social networks, but individuals suddenly become more than concerned if a third party is interested in accessing personal health data. Healthcare systems worldwide, however, have begun to take advantage of the data produced by eHealth solutions. While on one hand technology has proved to be a great ally of modern medicine and can lead to notable benefits, on the other hand these processes pose serious threats to our privacy. The process of testing, validating and putting on the market a new drug or medical treatment is called a clinical trial. These trials are deeply impacted by technological advancements and greatly benefit from the use of eHealth solutions. Clinical research institutes are the entities in charge of leading the trials and need to access as much of the patients' health data as possible. However, at any phase of a clinical trial, the personal information of the participants should be preserved and kept private as long as possible. In this thesis, we introduce an architecture that protects the privacy of personal data during the first phases of digital clinical trials (namely the characterization phase and the recruiting phase), allowing potential participants to freely join trials without disclosing their personal health information unless a proper reward and/or prior agreement is in place. We illustrate the trusted environment, which is the most common approach in eHealth, and later dig into the untrusted environment, where privacy is more challenging to protect while maintaining the usability of the data. Our architecture keeps individuals in full control over the flow of their personal health data. Moreover, the architecture allows clinical research institutes to characterize the population of potential users without direct access to their personal data. We validated our architecture with a proof of concept that includes all the involved entities, from the low-level hardware up to the end application. We designed and realized hardware capable of sensing, processing and transmitting personal health data in a privacy-preserving fashion that requires little to no maintenance.
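
    A toy illustration (not the thesis's actual protocol) of the general idea of characterizing a population without direct access to personal health data: each device evaluates the trial's eligibility criteria locally, and only an anonymous yes/no answer leaves the device, so the institute sees aggregate counts rather than raw readings. The record fields and criteria are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class LocalHealthRecord:
    # Hypothetical readings held only on the participant's device.
    age: int
    resting_heart_rate: float
    steps_per_day: int

def is_eligible(record: LocalHealthRecord) -> bool:
    # Evaluated locally on the device; raw values never leave it.
    return 40 <= record.age <= 70 and record.resting_heart_rate < 90

def characterize(devices: list[LocalHealthRecord]) -> dict[str, int]:
    # The research institute only receives the aggregated outcome of local checks.
    answers = [is_eligible(r) for r in devices]
    return {"eligible": sum(answers), "contacted": len(answers)}

print(characterize([
    LocalHealthRecord(55, 72.0, 8000),
    LocalHealthRecord(33, 65.0, 12000),
    LocalHealthRecord(64, 95.0, 4000),
]))
```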

    A review and critique of UK housing stock energy models, modelling approaches and data sources

    The UK housing stock is responsible for some 27% of national energy demand and associated carbon dioxide emissions. 80% of this energy demand is due to heating (60%) and domestic hot water (20%), the former reflecting the poor average thermal integrity of the envelope of the homes comprising this stock. To support the formulation of policies and strategies to decarbonise the UK housing stock, a large number of increasingly sophisticated Housing Stock Energy Models (HSEMs) have been developed over the past 25 years. After describing the sources of data and the spatio-temporal granularity with which these data are available to represent this stock, as well as the physical and social phenomena that are modelled and the range of strategies employed to do so, this paper evaluates the 29 HSEMs that have been developed and deployed in the UK. In this, we consider the models' predictive accuracy, predictive sensitivity to design parameters, versatility, computational efficiency, reproducibility of predictions and software usability, as well as the models' transparency (how open they are) and modularity. We also discuss their comprehensiveness. From this evaluation, we conclude that current HSEMs are lacking in transparency and modularity; they are limited in their scope and employ simplistic models that limit their utility, in particular relating to the modelling of heat flow and of household behaviours concerning investment decisions and energy-using practices. There is a need for an open-source and modular dynamic housing stock energy modelling platform that addresses current limitations, can be readily updated as new (e.g. housing survey) calibration data are released and can be readily extended by the modelling community at large, improving upon the utilisation of scarce developmental resources. This would represent a considerable step forward in the formulation of housing stock decarbonisation policy that is informed by sound evidence.
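
    A quick arithmetic check of the headline figures quoted above (the housing stock accounts for roughly 27% of national energy demand, of which 60% is space heating and 20% is domestic hot water), expressed as shares of national demand.

```python
# Shares taken directly from the abstract above.
stock_share = 0.27
heating_within_stock = 0.60
dhw_within_stock = 0.20

print(f"space heating:      {stock_share * heating_within_stock:.1%} of national demand")
print(f"domestic hot water: {stock_share * dhw_within_stock:.1%} of national demand")
print(f"heating + DHW:      {stock_share * (heating_within_stock + dhw_within_stock):.1%} of national demand")
```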

    A survey of the application of soft computing to investment and financial trading


    Data-driven and production-oriented tendering design using artificial intelligence

    Construction projects are facing an increase in requirements since the projects are getting larger, more technology is integrated into the buildings, and new sustainability and CO2-equivalent emissions requirements are introduced. As a result, requirement management quickly gets overwhelming, and instead of having systematic requirement management, the construction industry tends to trust craftsmanship. One method for a more systematic requirement management approach, successful in other industries, is the systems engineering approach, focusing on requirement decomposition and linking proper verifications and validations. This research project explores whether a systems engineering approach, supported by natural language processing techniques, can enable more systematic requirement management in construction projects and facilitate knowledge transfer from completed projects to new tendering projects. The first part of the project explores how project requirements can be extracted, digitised, and analysed in an automated way and how this can benefit tendering specialists. The study is conducted by first developing a work support tool targeting tendering specialists and then evaluating the challenges and benefits of such a tool through a workshop and surveys. The second part of the project explores inspection data generated in production software as a requirement and quality verification method. First, a dataset containing over 95,000 production issues is examined to understand the data quality and level of standardisation. Second, a survey addressing production specialists evaluates the current benefits of digital inspection reporting. Third, future benefits of using inspection data for knowledge transfer are explored by applying the Knowledge Discovery in Databases method and clustering techniques. The results show that applying natural language processing techniques can be a helpful tool for analysing construction project requirements, facilitating the identification of essential requirements, and enabling benchmarking between projects. The results from the clustering process suggested in this thesis show that inspection data can be used as a knowledge base for future projects and quality improvement within a project-based organisation. However, higher data quality and standardisation would benefit the knowledge-generation process. This research project provides insights into how artificial intelligence can facilitate knowledge transfer, enable data-informed design choices in tendering projects, and automate requirements analysis in construction projects as a possible step towards more systematic requirements management.
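
    A hedged sketch of the kind of clustering step described above: grouping free-text inspection issues so that recurring problem types can feed knowledge transfer to new tendering projects. The TF-IDF/k-means choice and the toy issue texts are assumptions for illustration, not the thesis's exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented free-text production issues standing in for real inspection records.
issues = [
    "water leakage at window frame on floor 3",
    "leakage around window sealing, apartment 12",
    "missing fire sealing in shaft",
    "fire stopping not installed at cable penetration",
    "concrete surface cracks in parking deck",
    "cracks in slab near column C4",
]

# Vectorise the texts and group them into a small number of clusters.
X = TfidfVectorizer(stop_words="english").fit_transform(issues)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

for cluster in sorted(set(labels)):
    print(f"cluster {cluster}:")
    for text, lab in zip(issues, labels):
        if lab == cluster:
            print(f"  - {text}")
```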

    I2ECR: Integrated and Intelligent Environment for Clinical Research

    Clinical trials are designed to produce new knowledge about a certain disease, drug or treatment. During these studies, a huge amount of data is collected about participants, therapies, clinical procedures, outcomes, adverse events and so on. A multicenter, randomized, phase III clinical trial in Hematology enrolls up to hundreds of subjects and evaluates post-treatment outcomes on stratified subgroups of subjects for a period of many years. Therefore, data collection in clinical trials is becoming complex, with a huge number of clinical and biological variables. Outside the medical field, data warehouses (DWs) are widely employed. A data warehouse is a "collection of integrated, subject-oriented databases designed to support the decision-making process". To verify whether DWs might be useful for data quality and association analysis, a team of biomedical engineers, clinicians, biologists and statisticians developed the "I2ECR" project. I2ECR is an Integrated and Intelligent Environment for Clinical Research where clinical and omics data stand together for clinical use (reporting) and for the generation of new clinical knowledge. I2ECR has been built from the "MCL0208" phase III, prospective, clinical trial, sponsored by the Fondazione Italiana Linfomi (FIL); this is actually a translational study, accounting for many clinical data, along with several clinical prognostic indexes (e.g. MIPI - Mantle Cell Lymphoma International Prognostic Index), pathological information, treatment and outcome data, biological assessments of disease (MRD - Minimal Residual Disease), as well as many biological, ancillary studies, such as Mutational Analysis, Gene Expression Profiling (GEP) and Pharmacogenomics. In this trial, forty-eight Italian medical centers were actively involved, for a total of 300 enrolled subjects. Therefore, the main objectives of I2ECR are:
    • to propose an integration project on clinical and molecular data quality concepts, applying clear raw-data analysis and clinical trial monitoring strategies to implement a digital platform where clinical, biological and "omics" data are imported from different sources and well integrated in a data warehouse;
    • to be a dynamic repository of data congruency quality rules. I2ECR makes it possible to monitor, in a semi-automatic manner, the quality of the data, in relation to the clinical data imported from eCRFs (electronic Case Report Forms) and to the biological and mutational datasets internally edited by local laboratories. Therefore, I2ECR will be able to detect missing data and mistakes derived from non-conventional data-entry activities by centers;
    • to provide clinical stakeholders with a platform from which they can easily design statistical and data mining analyses. The term Data Mining (DM) identifies a set of tools for searching for hidden patterns of interest in large and multivariate datasets. Applications of DM techniques in the medical field range from outcome prediction and patient classification to genomic medicine and molecular biology. I2ECR allows clinical stakeholders to propose innovative methods of supervised and unsupervised feature extraction, data classification and statistical analysis on heterogeneous datasets associated with the MCL0208 clinical trial.
    Although the MCL0208 study is the first example of data population of I2ECR, the environment will be able to import data from clinical studies designed for other onco-hematologic diseases, too.
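
    A hedged sketch of the kind of semi-automatic data congruency check described in the second objective: simple rules flag missing values and implausible entries in records imported from eCRFs. The field names, example records and thresholds are hypothetical, not the actual I2ECR rule set.

```python
from datetime import date

# Invented eCRF-like records for illustration.
records = [
    {"patient_id": "P001", "enrollment": date(2012, 5, 3), "mipi_score": 6.1, "mrd_status": "negative"},
    {"patient_id": "P002", "enrollment": date(2013, 1, 9), "mipi_score": None, "mrd_status": "positive"},
    {"patient_id": "P003", "enrollment": date(2011, 11, 20), "mipi_score": 42.0, "mrd_status": "unknown"},
]

def check_record(rec: dict) -> list[str]:
    issues = []
    if rec["mipi_score"] is None:
        issues.append("missing MIPI score")
    elif not (0 <= rec["mipi_score"] <= 12):  # hypothetical plausibility range
        issues.append(f"implausible MIPI score: {rec['mipi_score']}")
    if rec["mrd_status"] not in {"negative", "positive"}:
        issues.append(f"unexpected MRD status: {rec['mrd_status']}")
    return issues

for rec in records:
    for issue in check_record(rec):
        print(f"{rec['patient_id']}: {issue}")
```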

    Using Georeferenced Data in Social Science Survey Research: The Method of Spatial Linking and Its Application with the German General Social Survey and the GESIS Panel

    This book demonstrates the use of georeferenced data for social science survey research, building upon survey data enriched with geo-coordinates. It reviews the prerequisites and challenges of applying these data to different social science research questions, highlighting the different branches of an interdisciplinary effort. At the center of this presentation is the method of spatial linking: the combination of georeferenced survey data with information from auxiliary geospatial data sources. A collection of spatial linking methods is applied in this book's empirical applications, which underline these methods' flexibility in different social science sub-disciplines, such as health and family, political attitudes, and environmental inequalities. For this purpose, georeferenced survey data from the German General Social Survey (GGSS) 2014 and the GESIS Panel are used. These empirical applications are part of an emerging field of research for social scientists, requiring new analytic skills from diverse and unfamiliar disciplines, such as ecology and engineering. Navigating the organizational and technical requirements for the analysis of georeferenced survey data enables researchers to answer new and innovative research questions.
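
    A hedged sketch of one simple form of spatial linking: attaching to each georeferenced survey respondent the attribute of the nearest feature from an auxiliary geospatial source, via a naive great-circle nearest-neighbour search. The coordinates and attribute values are made up for illustration; real applications would use proper GIS tooling and the book's own linking methods.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two WGS84 points, in kilometres.
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical respondent coordinates and auxiliary air-quality stations.
respondents = [{"id": 1, "lat": 50.94, "lon": 6.96}, {"id": 2, "lat": 52.52, "lon": 13.40}]
stations = [
    {"lat": 50.93, "lon": 6.95, "no2": 38.0},
    {"lat": 52.50, "lon": 13.39, "no2": 29.5},
]

# Link each respondent to the NO2 value of the nearest station.
for r in respondents:
    nearest = min(stations, key=lambda s: haversine_km(r["lat"], r["lon"], s["lat"], s["lon"]))
    r["linked_no2"] = nearest["no2"]
    print(r)
```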

    Usercentric Operational Decision Making in Distributed Information Retrieval

    Information specialists in enterprises regularly use distributed information retrieval (DIR) systems that query a large number of information retrieval (IR) systems, merge the retrieved results, and display them to users. There can be considerable heterogeneity in the quality of results returned by different IR servers. Further, because different servers handle collections of different sizes and have different processing and bandwidth capacities, there can be considerable heterogeneity in their response times. The broker in the DIR system has to decide which servers to query, how long to wait for responses, and which retrieved results to display, based on the benefits and costs imposed on users. The benefit of querying more servers and waiting longer is the ability to retrieve more documents. The costs may be in the form of access fees charged by IR servers or the user's cost associated with waiting for the servers to respond. We formulate the broker's decision problem as a stochastic mixed-integer program and present analytical solutions for the problem. Using data gathered from FedStats, a system that queries IR engines of several U.S. federal agencies, we demonstrate that the technique can significantly increase the utility from DIR systems. Finally, simulations suggest that the technique can be applied to solve the broker's decision problem under more complex decision environments.
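
    A simplified, deterministic illustration of the broker's trade-off described above (not the paper's stochastic mixed-integer program): choose a subset of IR servers that maximises the expected benefit from retrieved documents minus access fees and the user's waiting cost. Server parameters and cost weights are invented for the example.

```python
from itertools import combinations

# Hypothetical server characteristics: expected relevant documents, access fee,
# and response time in seconds.
servers = {
    "A": {"expected_docs": 40, "fee": 2.0, "response_time": 1.0},
    "B": {"expected_docs": 12, "fee": 0.5, "response_time": 6.0},
    "C": {"expected_docs": 10, "fee": 0.0, "response_time": 0.5},
}
DOC_VALUE = 0.2   # benefit per expected relevant document
WAIT_COST = 1.5   # cost per second the user waits

best_utility, best_subset = float("-inf"), None
for k in range(1, len(servers) + 1):
    for subset in combinations(servers, k):
        deadline = max(servers[s]["response_time"] for s in subset)  # wait for the slowest queried server
        benefit = DOC_VALUE * sum(servers[s]["expected_docs"] for s in subset)
        cost = sum(servers[s]["fee"] for s in subset) + WAIT_COST * deadline
        if benefit - cost > best_utility:
            best_utility, best_subset = benefit - cost, subset

print(f"best net utility {best_utility:.2f} by querying servers {best_subset}")
```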