    Artificial intelligence and chemical kinetics enabled property-oriented fuel design for internal combustion engine

    The Fuel Genome Project aims to address the forward problem of fuel property prediction and the inverse problems of molecule design, retrosynthesis and reaction condition prediction. This work primarily addresses the forward problem by integrating feature engineering theory, artificial intelligence (AI) technologies and gas-phase chemical kinetics. The group contribution method (GCM) is used to establish the GCM-UOB (University of Birmingham) 1.0 system with 22 molecular descriptors, and surrogate fuels are formulated by minimizing the difference in functional group fragments between the target fuel and the surrogate. The improved QSPR (quantitative structure–property relationship)-UOB 2.0 system, with 32 molecular features, is coupled with machine learning (ML) algorithms to establish regression models for predicting fuel ignition quality. The QSPR-UOB 3.0 scheme expands to 42 molecular descriptors to improve the molecular resolution of aromatics and specific fuel types. Combined with ML algorithms, the resulting structural features enable 15 physicochemical properties to be predicted with high fidelity and efficiency. In addition to the ML-QSPR route, a second route based on deep learning with convolutional neural networks (DL-CNN) is proposed for property prediction, with the yield sooting index (YSI) as a case study. In its current state, the DL-CNN model is less accurate than the ML-QSPR model, but its automated feature extraction and the rapid progress of CNNs on classification problems make it a promising solution for regression problems. A high-throughput fuel screening is performed to identify molecules with desired properties for both spark ignition (SI) and compression ignition (CI) engines; it consists of Tier 1 physicochemical property screening (based on the ML-QSPR models) and Tier 2 chemical kinetic screening (based on detailed chemical mechanisms).
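The surrogate formulation step (minimizing the difference in functional group fragments between target fuel and surrogate) can be illustrated with a minimal sketch. It assumes the objective is a plain sum of squared fragment-count differences per mole of mixture; the six-fragment basis (CH3, CH2, CH, C, aromatic CH, aromatic C) and the three palette components (n-heptane, iso-octane, toluene) are hypothetical choices for illustration, not the GCM-UOB 1.0 descriptor set, and the solver (projected gradient descent onto the mole-fraction simplex) is a generic swapped-in technique rather than the authors' optimizer.

```python
import numpy as np

# Hypothetical fragment basis (rows) for illustration only.
# Columns are candidate surrogate components: n-heptane, iso-octane, toluene.
FRAGMENTS = np.array([
    [2, 5, 1],  # CH3
    [5, 1, 0],  # CH2
    [0, 1, 0],  # CH
    [0, 1, 0],  # quaternary C
    [0, 0, 5],  # aromatic CH
    [0, 0, 1],  # aromatic C
], dtype=float)


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - cssv / k > 0)[0][-1]
    theta = cssv[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def formulate_surrogate(fragments, target, iters=5000):
    """Mole fractions x minimising 0.5 * ||fragments @ x - target||^2 over the simplex."""
    A = np.asarray(fragments, dtype=float)
    b = np.asarray(target, dtype=float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    x = np.full(A.shape[1], 1.0 / A.shape[1])  # start from an equimolar blend
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = project_to_simplex(x - step * grad)
    return x


if __name__ == "__main__":
    # Hypothetical target: the fragment profile of a 50/30/20 mol% blend.
    target = FRAGMENTS @ np.array([0.5, 0.3, 0.2])
    print(np.round(formulate_surrogate(FRAGMENTS, target), 3))
```

Because the target here is built from the palette itself, the sketch should recover mole fractions close to 0.5, 0.3 and 0.2; for a real fuel the residual fragment mismatch indicates how well the chosen palette can represent it.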
Polyoxymethylene dimethyl ether 3 (PODE3) and diethoxymethane (DEM) are promising carbon-neutral fuels for CI engines and are recommended by the virtual screening results. Their ignition delay times, laminar flame speeds and dominant reactions are examined using chemical kinetics, and a new DEM mechanism including both low- and high-temperature reactions is constructed. Concluding remarks and research prospects are summarized in the final section.
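The ML-QSPR regression and the Tier 1 property screen can be sketched together as follows. This is an illustration under stated assumptions: ridge regression stands in for the unspecified ML algorithms, the three-element descriptor vectors are hypothetical placeholders rather than QSPR-UOB features, and the names `fit_ridge`, `predict` and `tier1_screen`, as well as the property window, are invented for the example.

```python
import numpy as np


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalised intercept (features are centred)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    """Predicted property for each descriptor vector (row) in X."""
    return np.asarray(X, dtype=float) @ w + b


def tier1_screen(candidates, w, b, low, high):
    """Hypothetical Tier 1 filter: keep candidates whose predicted property lies in [low, high]."""
    names = list(candidates)
    preds = predict([candidates[n] for n in names], w, b)
    return [n for n, p in zip(names, preds) if low <= p <= high]
```

In a two-tier workflow of the kind described above, only the molecules that survive this cheap statistical filter would be passed on to the more expensive Tier 2 chemical kinetic simulations.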

    Advanced Knowledge Application in Practice

    The integration and interdependency of the world economy are leading to the creation of a global market that offers more opportunities but is also more complex and competitive than ever before. Widespread research activity is therefore necessary to remain successful in the market. This book is the result of research and development activities by a number of researchers worldwide, covering concrete fields of research.

    Regime identification and sequence clustering for financial market forecasting

    Abstract: Regime-switching analysis is widely advocated for capturing the complex behaviours underlying financial time series for market prediction. The literature raises two main disadvantages of current approaches to regime identification: 1) the lack of a mechanism for identifying regimes dynamically, which restricts them to switching among a fixed set of regimes with a static transition probability matrix; 2) the failure to exploit cross-sectional regime dependencies among time series, since not all time series are synchronized to the same regime. Because numerical time series can be symbolized into categorical sequences, a third issue arises: 3) the lack of a meaningful and effective measure of similarity between chronologically dependent categorical values, which is needed to identify sequence clusters that could serve as regimes for market forecasting. In this thesis, to address the first issue, we propose a dynamic regime identification model that identifies regimes dynamically with a time-varying transition probability. For the second issue, we propose a cluster-based regime identification model that accounts for the cross-sectional regime dependencies underlying financial time series for market forecasting. For the last issue, we develop a dynamic-order Markov model that exploits the information in frequent consecutive patterns and sparse patterns to identify clusters that could serve as regimes in categorized financial time series. Experiments on synthetic and real-world datasets show that both regime models perform well in regime identification and forecasting, and that the dynamic-order Markov clustering model also performs well at identifying clusters in categorical sequences.
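The difference between a static transition probability matrix and a time-varying one can be made concrete with a minimal sketch. It does not reproduce the thesis' dynamic regime model or its dynamic-order Markov model; it assumes a first-order Markov chain over symbolized returns, re-estimated on a trailing window. The three-symbol alphabet, the 1% threshold and the function names are hypothetical.

```python
import numpy as np


def symbolize(returns, threshold=0.01):
    """Map numeric returns to categories: 'U' (up), 'D' (down) or 'F' (flat)."""
    return ["U" if r > threshold else "D" if r < -threshold else "F" for r in returns]


def transition_matrix(seq, states):
    """First-order maximum-likelihood transition matrix; unvisited states get a uniform row."""
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[idx[a], idx[b]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    uniform = np.full_like(counts, 1.0 / len(states))
    # Divide only rows with observed transitions; other rows keep the uniform fallback.
    return np.divide(counts, totals, out=uniform, where=totals > 0)


def rolling_transition_matrices(seq, states, window):
    """Time-varying estimate: one matrix per trailing window of `window` symbols."""
    return [transition_matrix(seq[t - window:t], states) for t in range(window, len(seq) + 1)]
```

A single call to `transition_matrix` on the whole sequence corresponds to the static setting criticized in the abstract, while the rolling version lets the estimated probabilities drift over time.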

    Processing hidden Markov models using recurrent neural networks for biological applications

    Philosophiae Doctor - PhD. In this thesis, we present a novel hybrid architecture that combines two of the most popular sequence recognition models, Recurrent Neural Networks (RNNs) and Hidden Markov Models (HMMs). Although sequence recognition problems can potentially be modelled with well-trained HMMs, HMMs do not provide a satisfactory solution to complicated recognition problems. In contrast, RNNs are known to perform exceptionally well on complex sequence recognition problems. Methods for mapping HMMs into RNNs have previously been developed by other researchers; however, to the best of our knowledge, no algorithm for processing HMMs through learning has been reported. Taking advantage of the structural similarities between the architectural dynamics of RNNs and HMMs, we analyse the combination of these two systems into a hybrid architecture. The main objective of this study is to improve sequence recognition/classification performance by applying a hybrid neural/symbolic approach. In particular, trained HMMs are used as the initial symbolic domain theory and are encoded directly into an appropriate RNN architecture, so that the prior knowledge is processed through the training of the RNNs. The proposed algorithm is then implemented on sample test beds and on other real-time biological applications.
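The structural similarity between HMMs and RNNs mentioned above can be seen in the standard HMM forward algorithm, which updates a hidden vector step by step much like a recurrent cell. The sketch below is a textbook illustration, not the thesis' HMM-to-RNN encoding algorithm; the two-state model parameters used to exercise it are hypothetical, and the brute-force function exists only to check the recursion on short sequences.

```python
import itertools

import numpy as np


def hmm_forward(pi, A, B, obs):
    """Sequence likelihood P(obs) via the forward recursion.

    Each step maps the previous hidden vector alpha to a new one:
    alpha_t = (alpha_{t-1} @ A) * B[:, o_t], i.e. a linear recurrent transition
    followed by a multiplicative, input-dependent factor.
    """
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()


def hmm_brute_force(pi, A, B, obs):
    """Reference check: sum over every hidden-state path (exponential cost)."""
    n = len(pi)
    total = 0.0
    for path in itertools.product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```

For long sequences, the unscaled recursion underflows; practical implementations normalize alpha at each step or work in log space.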

    A methodology for nominal data classification based on the KDD process

    Abstract: Pattern classification is a supervised learning problem in the scientific field known as Pattern Recognition (PR), in which the goal is to discriminate data instances into different classes. The solution to this problem is obtained by means of algorithms (classifiers) that search for relationship patterns between classes in known cases (training) and use these relationships to classify unknown cases (testing). The performance, in terms of predictive accuracy, of the algorithms designed for this task depends heavily on the quality and types of the data contained in the databases. To improve data quality and to treat the data types appropriately, this work uses the Knowledge Discovery in Databases (KDD) process, in which classification is one of the tasks of the stage known as Data Mining (DM). The stages applied here before classification are wrapper attribute selection and an attribute transformation process based on Geometric Data Analysis (GDA). For attribute selection, a new technique based on an Estimation of Distribution Algorithm (EDA) and on Cultural Algorithms (CA) is proposed, named Belief-Based Incremental Learning (BBIL). For attribute transformation, this work proposes an alternative to classical Principal Component Analysis (PCA) designed specifically for nominal data: Multiple Correspondence Analysis (MCA). In the DM stage proper, two traditional PR classifiers are applied: Naïve Bayes and Fisher's Linear Discriminant Function (Linear Discriminant Analysis; LDA).
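One of the two classifiers named above, Naïve Bayes, applies directly to nominal attributes by counting category frequencies per class. The following is a minimal sketch of a categorical Naïve Bayes with Laplace (additive) smoothing; the class name `CategoricalNB`, the smoothing parameter `alpha` and the toy weather-style data are assumptions for illustration, not the implementation used in the work.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes for nominal attributes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # Observed levels of each attribute, used in the smoothing denominator.
        self.levels = [set(row[j] for row in X) for j in range(n_features)]
        self.counts = defaultdict(Counter)  # (class, attribute index) -> value counts
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(label, j)][v] += 1
        return self

    def _log_posterior(self, row, c):
        """Unnormalised log P(c | row) = log P(c) + sum_j log P(x_j | c)."""
        lp = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            num = self.counts[(c, j)][v] + self.alpha
            den = self.class_counts[c] + self.alpha * len(self.levels[j])
            lp += math.log(num / den)
        return lp

    def predict(self, X):
        return [max(self.classes, key=lambda c: self._log_posterior(row, c)) for row in X]
```

Working in log space avoids underflow when there are many attributes; the smoothing keeps an attribute value never seen with a class from zeroing out that class's posterior.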
Supported by theoretical arguments and by empirical tests on nine different nominal datasets, this work aims to evaluate the ability of MCA and BBIL to improve classifier performance in terms of mean predictive accuracy. To benefit simultaneously from the advantages of both data treatments, two combinations of these techniques are evaluated: the first applies the GDA transformation to previously selected attributes, and the second selects MCA factor scores using BBIL (the proposed methodology). The experimental results confirm the improvement in classification performance provided by these treatments and attest to the superiority of the proposed methodology in most of the situations examined.
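The MCA transformation that produces the factor scores can be sketched with the textbook formulation: build the complete disjunctive (indicator) table, then take the singular value decomposition of its matrix of standardized residuals. This is a generic sketch; the function names and toy rows are hypothetical, and the BBIL selection that the proposed methodology applies to the resulting factor scores is not reproduced here.

```python
import numpy as np


def indicator_table(rows):
    """Complete disjunctive (one-hot) table for a list of nominal records."""
    n_vars = len(rows[0])
    levels = [sorted({r[j] for r in rows}) for j in range(n_vars)]
    cols = [[1.0 if r[j] == level else 0.0 for r in rows]
            for j, lv in enumerate(levels) for level in lv]
    return np.array(cols).T, n_vars


def mca_factor_scores(rows, n_components=2):
    """Row principal coordinates (factor scores) from correspondence analysis of the indicator table."""
    Z, q = indicator_table(rows)
    P = Z / Z.sum()                      # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * sigma) / np.sqrt(r)[:, None]  # F = D_r^{-1/2} U Sigma
    eigenvalues = sigma ** 2               # principal inertias
    return F[:, :n_components], eigenvalues, q, Z.shape[1]
```

For an indicator table with Q variables and J categories in total, the principal inertias sum to J/Q - 1, which gives a quick sanity check on the decomposition.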


    Volume 3 describes how resource-aware machine learning methods and techniques are used to solve real-world problems successfully. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis and treatment selection for diseases; in electronics, steel production and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and in mobile communications.


    Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries

    As new technologies for energy-intensive industries continue to develop, existing plants gradually fall behind in efficiency and productivity. Fierce market competition and environmental legislation are forcing these traditional plants to cease operation and shut down. Process improvement and retrofit projects are essential for maintaining the operating performance of these plants. Current approaches to process improvement are mainly process integration, process optimisation and process intensification. In general, these areas rely on mathematical optimisation, the practitioner's experience and operational heuristics. These approaches serve as the basis for process improvement; however, their performance can be further improved using modern computational intelligence. The purpose of this work is therefore to apply advanced artificial intelligence and machine learning techniques to process improvement in energy-intensive industrial processes. The work addresses this problem by simulating industrial systems and makes the following contributions: (i) application of machine learning techniques, including one-shot learning and neuro-evolution, for the data-driven modelling and optimisation of individual units; (ii) application of dimensionality reduction (e.g., principal component analysis, autoencoders) for the multi-objective optimisation of multi-unit processes; (iii) design of a new tool for analysing the problematic parts of a system in order to eliminate them (bottleneck tree analysis, BOTA), together with an extension of the tool that allows multidimensional problems to be solved using a data-driven approach; (iv) demonstration of the effectiveness of Monte Carlo simulation, neural networks and decision trees for decision-making when integrating a new process technology into existing processes; (v) comparison of the HTM (Hierarchical Temporal Memory) technique and dual optimisation with several predictive tools for supporting real-time plant management; (vi) implementation of an artificial neural network within the interface of the conventional process graph (P-graph); (vii) highlighting the future of artificial intelligence and process engineering in biosystems through the commercially based multi-omics paradigm.
Keywords: industrial process improvement, data-driven model, process optimisation, machine learning, industrial systems, energy-intensive industries, artificial intelligence.
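Contribution (ii), dimensionality reduction ahead of multi-objective optimisation, can be illustrated with a generic principal component analysis that keeps the fewest components reaching a variance threshold. This is an assumption-laden sketch: the 95% threshold, the function name `pca_reduce` and the idea of applying it to a matrix of sampled process variables or objectives are illustrative choices, not the setup used in the thesis.

```python
import numpy as np


def pca_reduce(Y, var_threshold=0.95):
    """Project samples Y (n_samples x n_variables) onto the fewest principal
    components whose cumulative explained variance reaches var_threshold."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                       # PCA operates on centred data
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)           # variance ratio per component
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))                            # guard against rounding near 1.0
    return Yc @ Vt[:k].T, explained, k
```

When two variables are strongly correlated (for example, two objectives that always move together), PCA collapses them into one component, so the downstream optimisation works in a smaller space; an autoencoder plays the same role for nonlinear relationships.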