
    A Polyhedral Study of Mixed 0-1 Set

    We consider a variant of the well-known single-node fixed-charge network flow set with constant capacities. This set arises from the relaxation of more general mixed-integer sets, such as lot-sizing problems with multiple suppliers. We provide a complete polyhedral characterization of the convex hull of the given set.
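
    For orientation, the following is a minimal LaTeX sketch of the kind of set in question, assuming the standard single-node fixed-charge flow structure with a common arc capacity c; the paper's exact index sets, flow directions and bounds may differ:

        % Hedged sketch of a single-node fixed-charge flow set with a constant
        % capacity c; the paper's exact formulation may differ, e.g. in the
        % presence of separate inflow and outflow arc sets.
        \[
          X = \Bigl\{ (x, y) \in \mathbb{R}_{+}^{n} \times \{0,1\}^{n} \;:\;
                \sum_{j=1}^{n} x_j \le b,\;
                x_j \le c\, y_j,\; j = 1, \dots, n \Bigr\}
        \]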

    Temporal Information in Data Science: An Integrated Framework and its Applications

    Data science is a well-known buzzword that is in fact composed of two distinct keywords, i.e., data and science. Data itself is of great importance: each analysis task begins from a set of examples. Based on this consideration, the present work starts with the analysis of a real case scenario, namely the development of a data warehouse-based decision support system for an Italian contact center company. Then, relying on the information collected in the developed system, a set of machine learning-based analysis tasks was developed to answer specific business questions, such as employee work anomaly detection and automatic call classification. Although these initial applications rely on already available algorithms, as we shall see, some clever analysis workflows also had to be developed.

    Afterwards, continuously driven by real data and real-world applications, we turned to the question of how to handle temporal information within classical decision tree models. Our research led to the development of J48SS, a decision tree induction algorithm based on Quinlan's C4.5 learner, which is capable of dealing with temporal (e.g., sequential and time series) as well as atemporal (such as numerical and categorical) data within the same execution cycle. The decision tree has been applied to several real-world analysis tasks, proving its worth. A key characteristic of J48SS is its interpretability, an aspect that we specifically addressed through the study of an evolutionary-based decision tree pruning technique.

    Next, since a lot of work concerning the management of temporal information has already been done in the automated reasoning and formal verification fields, a natural direction in which to proceed was to investigate how such solutions may be combined with machine learning, following two main tracks. First, we show, through the development of an enriched decision tree capable of encoding temporal information by means of interval temporal logic formulas, how a machine learning algorithm can successfully exploit temporal logic to perform data analysis. Then, we focus on the opposite direction, i.e., employing machine learning techniques to generate temporal logic formulas, considering a natural language processing scenario. Finally, as a conclusive development, the architecture of a system is proposed in which formal methods and machine learning techniques are seamlessly combined to perform anomaly detection and predictive maintenance tasks. Such an integration represents an original, thrilling research direction that may open up new ways of dealing with complex, real-world problems.
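
    To make concrete how a single tree can handle temporal and atemporal attributes within one execution cycle, here is a minimal, hypothetical Python sketch of the split-dispatch idea. The names and the shapelet-style distance test are illustrative assumptions, not the actual J48SS implementation:

        # Hedged sketch of a decision node that dispatches on attribute type,
        # in the spirit of a C4.5-style learner extended with temporal tests.
        # All names are illustrative; this is not the actual J48SS code.
        from dataclasses import dataclass
        from typing import Callable, Sequence

        @dataclass
        class Split:
            description: str
            goes_left: Callable[[dict], bool]

        def numeric_split(attr: str, threshold: float) -> Split:
            # Classical C4.5-style binary split on a numeric attribute.
            return Split(f"{attr} <= {threshold}",
                         lambda row: row[attr] <= threshold)

        def categorical_split(attr: str, value: str) -> Split:
            return Split(f"{attr} == {value}",
                         lambda row: row[attr] == value)

        def series_split(attr: str, shapelet: Sequence[float],
                         max_dist: float) -> Split:
            # Time-series test: does the series contain a subsequence close
            # to the candidate shapelet? (Brute-force Euclidean distance.)
            def dist(row: dict) -> float:
                s, m = row[attr], len(shapelet)
                return min(
                    sum((s[i + j] - shapelet[j]) ** 2 for j in range(m)) ** 0.5
                    for i in range(len(s) - m + 1))
            return Split(f"dist({attr}, shapelet) <= {max_dist}",
                         lambda row: dist(row) <= max_dist)

        # At induction time, candidate splits of every kind would be scored
        # with the same impurity measure (e.g., gain ratio), so temporal and
        # atemporal attributes compete directly at each node of the tree.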

    A framework for planning of offshore wind energy projects based on multi-objective optimisation and multi-criteria decision analysis

    The wind industry is determined to lower the cost of producing energy in all phases of an offshore wind project. During 2015–2016, projects achieved a levelized cost of energy (LCOE) of £97/MWh, and more recently it was announced that Ørsted guaranteed £57.5/MWh. Significant price increases on structural materials directly impact larger-scale wind projects and the overall cost of turbines; further levers include establishing effective supply chains, improving the consent procedures for new developments, governmental mechanisms and support, improving grid connections, and reducing overall uncertainty and costs. The most important decisions at the planning stage of a new investment are the selection of a profitable, cost-effective, suitable offshore location and of a support structure type, both of which greatly impact the overall Life Cycle Costs (LCC). This research aims to introduce and apply a scalable framework to reveal and select the optimal offshore deployment location and support structure in Round 3 zones in the UK by considering the interplay of LCC aspects at the planning stage of development. The research produced a portfolio of five studies while developing this framework.

    First, a comparative Political, Economic, Social, Technological, Legal, Environmental (PESTLE) analysis of wind energy was performed. The analysis focused on Europe, Germany, the UK and Greece; the UK was selected for this research as the world leader in offshore wind energy.

    Second, three state-of-the-art Multi-Objective Optimisation (MOO) algorithms were employed to discover optimum locations for an offshore wind farm. The 7-objective optimisation problem comprises some of the most important techno-economic LCC factors that are directly linked to the physical aspects of each site. The results of the Non-dominated Sorting Genetic Algorithm (NSGA II), NSGA III and SPEA 2 follow a similar trend, with NSGA III demonstrating its suitability by revealing more uniform and clearer optimum non-dominated solutions, also known as the Pareto Front (PF), owing to its design compared to the other optimisers. Based on their frequency of appearance in the PF solutions, Seagreen Alpha, Seagreen Bravo, Teesside C, Teesside D, and the Celtic Array South West Potential Development Area were found to be the most appropriate. Since the PF includes solutions from all regions, the developer has the flexibility to assign costs to the different development phases as required, and to choose whether to invest the available budget in the installation or the maintenance stage of the project.

    Third, in order to reveal optimum locations across the UK Round 3 offshore zones as a whole and within each zone individually, three different wind farm layouts and four types of turbines were considered in an 8-objective formulation, where five LCC factors are directly linked to the physical aspects and restrictions of each location. NSGA II discovered Moray Firth Eastern Development Area 1, Seagreen Alpha, Hornsea Project One, East Anglia One and Norfolk Boreas in the PF solutions. Although layouts 1 and 2 were mainly selected as optimum solutions, the extreme case (layout 3) also appeared in the PF a few times. All this demonstrates the scalability and effectiveness of the framework.

    Fourth, the effectiveness of coupling MOO with Multi-Criteria Decision Making (MCDM) methods is demonstrated in selecting the optimum Round 3 wind farm location, so as to help stakeholders with investment decisions. A process for criteria selection is also introduced, and seven conflicting criteria are considered using two variations of the Technique for Order of Preference by Similarity to the Ideal Solution (TOPSIS) in order to rank the optimum locations discovered by NSGA II. From the prioritisation list, Seagreen Alpha was found to be the best option, three times more preferable than Moray Firth Eastern Development Area 1.

    Fifth, experts' opinions were employed in an MCDM process to select the support structure type for an offshore wind farm. For comparison, six deterministic MCDM methods (WSM, WPM, TOPSIS, AHP, ELECTRE I and PROMETHEE I) and their stochastic expansions were employed, in order to account for uncertainties systematically. It was shown that the methods relate well to each other and deliver similar results. The jacket and monopile support structures were ranked first in most of the deterministic and stochastic approaches.

    Overall, the effectiveness of the introduced research framework in meeting the aim of the research is demonstrated. The framework combines a) a prototype techno-economic model for offshore wind farm deployment using LCC and geospatial analysis, b) MOO using NSGA II, and c) survey data from real-world experts within MCDM, using deterministic and stochastic versions of TOPSIS. (EngD in Renewable Energy Marine Structures, REMS)
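
    For readers unfamiliar with the ranking step, here is a compact, hypothetical Python sketch of deterministic TOPSIS (vector normalisation, benefit and cost criteria); the weights, criteria and data below are placeholders, not the thesis' actual inputs:

        # Minimal deterministic TOPSIS sketch: rank alternatives by closeness
        # to the ideal solution. Illustrative only; weights and criteria are
        # placeholders, not the thesis' actual data.
        import numpy as np

        def topsis(matrix: np.ndarray, weights: np.ndarray,
                   benefit: np.ndarray) -> np.ndarray:
            # matrix: alternatives x criteria; benefit[j] True if higher is better.
            norm = matrix / np.linalg.norm(matrix, axis=0)   # vector normalisation
            v = norm * weights                               # weighted matrix
            ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
            anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
            d_pos = np.linalg.norm(v - ideal, axis=1)
            d_neg = np.linalg.norm(v - anti, axis=1)
            return d_neg / (d_pos + d_neg)                   # closeness in [0, 1]

        # Example: 3 hypothetical locations, 3 criteria (cost, yield, depth).
        scores = topsis(np.array([[4.0, 9.0, 30.0],
                                  [5.0, 8.5, 25.0],
                                  [3.5, 7.0, 40.0]]),
                        weights=np.array([0.5, 0.3, 0.2]),
                        benefit=np.array([False, True, False]))
        print(scores.argsort()[::-1])  # indices ranked best to worst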

    Report from the Tri-Agency Cosmological Simulation Task Force

    The Tri-Agency Cosmological Simulations (TACS) Task Force was formed when Program Managers from the Department of Energy (DOE), the National Aeronautics and Space Administration (NASA), and the National Science Foundation (NSF) expressed an interest in receiving input into the cosmological simulations landscape related to the upcoming DOE/NSF Vera Rubin Observatory (Rubin), NASA/ESA's Euclid, and NASA's Wide Field Infrared Survey Telescope (WFIRST). The Co-Chairs of TACS, Katrin Heitmann and Alina Kiessling, invited community scientists from the USA and Europe who are each subject matter experts and are also members of one or more of the surveys to contribute. The following report represents the input from TACS that was delivered to the Agencies in December 2018.
    Comment: 36 pages, 3 figures. Delivered to NASA, NSF, and DOE in Dec 2018.

    Artificial intelligence for porous organic cages

    Porous organic cages are a novel class of molecules with many promising applications, including separation, sensing, catalysis and gas storage. Despite this great promise, the discovery of these materials is hampered by a lack of computational tools for exploring their chemical space and by the significant expense associated with predicting their properties. As a result, significant synthetic effort is directed toward molecules that do not have the targeted properties. This thesis presents multiple computational tools that can aid the discovery and design of these materials by increasing the number of synthetic candidates likely to exhibit the desired, targeted properties. Firstly, a broadly applicable methodology for the construction of computational models of materials is presented. This facilitates the automated modelling and screening of materials that would otherwise have to be carried out in a more labour-intensive way. Secondly, an evolutionary algorithm is implemented and applied to the design of porous organic cages. The algorithm is capable of producing cages closely matching user-defined design criteria, and its implementation is designed to allow future applications in other fields of material design. Finally, machine learning is used to accurately predict properties of porous organic cages, orders of magnitude faster than has been possible with traditional, simulation-based approaches.
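
    As a rough illustration of the evolutionary loop described above, here is a generic, hypothetical Python sketch; the bit-string representation, operators and fitness are simple stand-ins, since the actual algorithm operates on cage molecules with domain-specific crossover, mutation and fitness evaluation:

        # Generic evolutionary-algorithm skeleton of the kind described above.
        # Representation, operators, and fitness are illustrative placeholders;
        # the real algorithm works on cage molecules, not bit strings.
        import random

        def evolve(fitness, n_genes=32, pop_size=50, generations=100, p_mut=0.02):
            pop = [[random.randint(0, 1) for _ in range(n_genes)]
                   for _ in range(pop_size)]
            for _ in range(generations):
                scored = sorted(pop, key=fitness, reverse=True)
                parents = scored[: pop_size // 2]          # truncation selection
                children = []
                while len(children) < pop_size - len(parents):
                    a, b = random.sample(parents, 2)
                    cut = random.randrange(1, n_genes)      # one-point crossover
                    child = a[:cut] + b[cut:]
                    child = [g ^ 1 if random.random() < p_mut else g
                             for g in child]                # bit-flip mutation
                    children.append(child)
                pop = parents + children
            return max(pop, key=fitness)

        # Toy fitness: match a target "design criterion" (here, all ones).
        best = evolve(fitness=lambda ind: sum(ind))
        print(sum(best), "of 32 genes match the target")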

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving the problems discussed in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    Algorithms to Explore the Structure and Evolution of Biological Networks

    High-throughput experimental protocols have revealed thousands of relationships amongst genes and proteins under various conditions. These putative associations are being aggressively mined to decipher the structural and functional architecture of the cell. One useful tool for exploring this data has been computational network analysis. In this thesis, we propose a collection of novel algorithms to explore the structure and evolution of large, noisy, and sparsely annotated biological networks. We first introduce two information-theoretic algorithms to extract interesting patterns and modules embedded in large graphs. The first, graph summarization, uses the minimum description length principle to find compressible parts of the graph. The second, VI-Cut, uses the variation of information to non-parametrically find groups of topologically cohesive and similarly annotated nodes in the network. We show that both algorithms find structure in biological data that is consistent with known biological processes, protein complexes, genetic diseases, and operational taxonomic units. We also propose several algorithms to systematically generate an ensemble of near-optimal network clusterings and show how these multiple views can be used together to identify clustering dynamics that any single-solution approach would miss. To facilitate the study of ancient networks, we introduce a framework called "network archaeology" for reconstructing the node-by-node and edge-by-edge arrival history of a network. Starting with a present-day network, we apply a probabilistic growth model backwards in time to find high-likelihood previous states of the graph. This allows us to explore how interactions and modules may have evolved over time. In experiments with real-world social and biological networks, we find that our algorithms can recover significant features of ancestral networks that have long since disappeared. Our work is motivated by the need to understand large and complex biological systems that are being revealed to us by imperfect data. As data continues to pour in, we believe that computational network analysis will continue to be an essential tool toward this end.
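
    The variation of information that underlies VI-Cut has a simple closed form; the following is a hedged Python sketch for comparing two clusterings of the same nodes (not the VI-Cut algorithm itself, which additionally searches over cuts of a clustering hierarchy):

        # Variation of information VI(X;Y) = H(X) + H(Y) - 2 I(X;Y) between
        # two clusterings of the same n nodes. Sketch only; VI-Cut further
        # optimises this measure over cuts of a cluster tree.
        from collections import Counter
        from math import log

        def variation_of_information(x: list, y: list) -> float:
            n = len(x)
            px = Counter(x)            # cluster sizes in clustering x
            py = Counter(y)            # cluster sizes in clustering y
            pxy = Counter(zip(x, y))   # joint co-occurrence counts
            hx = -sum(c / n * log(c / n) for c in px.values())
            hy = -sum(c / n * log(c / n) for c in py.values())
            mi = sum(c / n * log((c / n) / ((px[a] / n) * (py[b] / n)))
                     for (a, b), c in pxy.items())
            return hx + hy - 2 * mi

        # Identical clusterings give VI = 0; more disagreement gives larger VI.
        print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0
        print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))  # > 0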

    Swarm intelligence techniques for optimization and management tasks in sensor networks

    The main contributions of this thesis are located in the domain of wireless sensor networks. In more detail, we introduce energy-aware algorithms and protocols in the context of the following topics: self-synchronized duty-cycling in networks with energy harvesting capabilities, distributed graph coloring, and minimum energy broadcasting with realistic antennas. In the following, we review the research conducted in each case.

    We propose a self-synchronized duty-cycling mechanism for sensor networks. This mechanism is based on the working and resting phases of natural ant colonies, which show self-synchronized activity phases. The main goal of duty-cycling methods is to save energy by efficiently alternating between different states. In the case at hand, we considered two different states: the sleep state, where communications are not possible and energy consumption is low; and the active state, where communications result in higher energy consumption. In order to test the model, we conducted extensive experimentation with synchronous simulations on mobile and static networks, also considering asynchronous networks. Later, we extended this work by assuming a broader point of view and including a comprehensive study of the parameters. In addition, thanks to a collaboration with the Technical University of Braunschweig, we were able to test our algorithm in the realistic sensor network simulator Shawn (http://shawn.sf.net).

    The second part of this thesis is devoted to the desynchronization of wireless sensor nodes and its application to the distributed graph coloring problem. In particular, our research is inspired by the calling behavior of Japanese tree frogs, whose males use their calls to attract females. Interestingly, as female frogs are only able to correctly localize the male frogs when their calls are not too close in time, groups of males located near each other desynchronize their calls. Based on a model of this behavior from the literature, we propose a novel algorithm with applications to the field of sensor networks. In more detail, we analyzed the ability of the algorithm to desynchronize neighboring nodes. Furthermore, we considered extensions of the original model, thereby improving its desynchronization capabilities. To illustrate the potential benefits of desynchronized networks, we then focused on distributed graph coloring. Later, we analyzed the algorithm more extensively and showed its performance on a larger set of benchmark instances.

    The classical minimum energy broadcast (MEB) problem in wireless ad hoc networks, which is well studied in the scientific literature, considers an antenna model that allows the adjustment of the transmission power to any desired real value from zero up to the maximum transmission power level. However, when specifically considering sensor networks, a look at the currently available hardware shows that this antenna model is not very realistic. In this work we re-formulate the MEB problem for an antenna model that is realistic for sensor networks, in which transmission power levels are chosen from a finite set of possible ones. A further contribution concerns the adaptation of an ant colony optimization algorithm, currently the state of the art for the classical MEB problem, to the more realistic problem version, the so-called minimum energy broadcast problem with realistic antennas (MEBRA). The obtained results show that the advantage of ant colony optimization over classical heuristics even grows when the number of possible transmission power levels decreases. Finally, we built a distributed version of the algorithm, which also compares quite favorably against centralized heuristics from the literature.
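
    As a rough, hypothetical illustration of the desynchronization idea, here is a Python sketch of repulsively coupled phase oscillators, loosely in the spirit of the frog-inspired model; the update rule and parameters are illustrative, and the thesis' actual algorithm and its extensions are not reproduced:

        # Hedged sketch of desynchronization among coupled phase oscillators,
        # loosely inspired by the Japanese tree frog model. The update rule
        # and parameters are illustrative, not the thesis' exact algorithm.
        import math
        import random

        def desynchronize(n=5, alpha=0.1, steps=2000, dt=0.01):
            # Phases live on the unit circle, represented in [0, 1).
            phase = [random.random() for _ in range(n)]
            for _ in range(steps):
                # Repulsive coupling: each node is pushed away from the
                # phases of all other nodes, spreading firings in time.
                push = [sum(math.sin(2 * math.pi * (phase[i] - phase[j]))
                            for j in range(n) if j != i)
                        for i in range(n)]
                phase = [(phase[i] + dt + alpha * dt * push[i]) % 1.0
                         for i in range(n)]
            return sorted(phase)

        # The repulsive coupling drives the phases apart so that firings no
        # longer coincide; in the ideal case they end up spread evenly,
        # roughly 1/n apart, so neighboring transmissions avoid collisions.
        print(desynchronize())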