
20 research outputs found

Trading inference effort versus size in CNF Knowledge Compilation

Authors: Gwynne Matthew, Kullmann Oliver
Publication date: 07/11/2013

Knowledge Compilation (KC) studies the compilation of boolean functions f into some formalism F that allows all queries of a certain kind to be answered in polynomial time. Due to its relevance for SAT solving, we concentrate on the query type "clausal entailment" (CE), i.e., whether a clause C follows from f or not, and we consider subclasses of CNF, i.e., clause-sets F with special properties. In this report we do not allow auxiliary variables (except in the Outlook), and thus F needs to be equivalent to f. We consider the hierarchies UC_k <= WC_k, which were introduced by the authors in 2012. Each level allows CE queries. The first two levels are well-known classes for KC. Namely, UC_0 = WC_0 is the same as PI as studied in KC, that is, f is represented by the set of all prime implicates, while UC_1 = WC_1 is the same as UC, the class of unit-refutation complete clause-sets introduced by del Val in 1994. We show that for each k there are (sequences of) boolean functions with polysize representations in UC_{k+1}, but with an exponential lower bound on representations in WC_k. Such a separation was previously only known for k=0. We also consider PC < UC, the class of propagation-complete clause-sets. We show that there are (sequences of) boolean functions with polysize representations in UC, while there is an exponential lower bound for representations in PC. These separations are steps towards a general conjecture determining the representation power of the hierarchies PC_k < UC_k <= WC_k. The strong form of this conjecture also allows auxiliary variables, as discussed in depth in the Outlook.
Comment: 43 pages, second version with literature updates. Proceeds with the separation results from the discontinued arXiv:1302.442
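The clausal-entailment query for the class UC described in this abstract can be illustrated with a small sketch. For a unit-refutation complete clause-set F, a clause C follows from F exactly when unit propagation on F, started from the negation of C, reaches a conflict. The code below is only an illustration under assumptions: the integer literal encoding (DIMACS style) and the function names `unit_propagate` and `entails_by_unit_refutation` are hypothetical choices, not taken from the report.

```python
from typing import FrozenSet, Iterable, Optional, Set

Clause = FrozenSet[int]  # assumed encoding: 3 means x3, -3 means NOT x3


def unit_propagate(clauses: Iterable[Clause], assignment: Set[int]) -> Optional[Set[int]]:
    """Extend the set of true literals by unit propagation; None signals a conflict."""
    clause_list = list(clauses)
    assigned = set(assignment)
    # A starting assignment containing both x and NOT x is already contradictory.
    if any(-lit in assigned for lit in assigned):
        return None
    changed = True
    while changed:
        changed = False
        for clause in clause_list:
            if any(lit in assigned for lit in clause):
                continue  # clause is already satisfied
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return None  # every literal is falsified: conflict
            if len(open_lits) == 1:
                assigned.add(open_lits[0])  # unit clause forces this literal
                changed = True
    return assigned


def entails_by_unit_refutation(clauses: Iterable[Clause], query: Clause) -> bool:
    """Sound for any CNF; complete only if the clause-set is in UC (assumption)."""
    negated_query = {-lit for lit in query}
    return unit_propagate(clauses, negated_query) is None


# Toy clause-set that forces x1 but is not unit-refutation complete.
F = [frozenset(c) for c in ([1, 2, 3], [1, 2, -3], [1, -2, 3], [1, -2, -3])]
# Adding the prime implicate {x1} gives an equivalent representation in UC.
F_uc = F + [frozenset([1])]
```

On this toy input, `entails_by_unit_refutation(F, frozenset([1]))` is false although F logically implies x1, whereas the same query on `F_uc` succeeds. This is the effect that motivates studying which representations make unit propagation sufficient.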

arXiv.org e-Print Archive

CiteSeerX

Combining Boolean Networks and Ordinary Differential Equations for Analysis and Comparison of Gene Regulatory Networks

Author: Schwieger Robert
Publication date: 01/01/2019

This thesis is concerned with different groups of qualitative models of gene regulatory networks. Four types of models will be considered: interaction graphs, Boolean networks, models based on differential equations, and discrete abstractions of differential equations. We will investigate the relations between these modeling frameworks and how they can be used in the analysis of individual models. The focus lies on the mathematical analysis of these models. This thesis makes several contributions in relating these different modeling frameworks. The first approach concerns individual Boolean models and parametrized families of ordinary differential equations (ODEs). To construct ODE models systematically from Boolean models, several automatic conversion algorithms have been proposed. In Chapter 2 several such closely related algorithms will be considered. It will be proven that certain invariant sets are preserved during the conversion from a Boolean network to a model based on ODEs. In the second approach, the idea of abstracting the dynamics of individual models to relate structure and dynamics will be introduced. This approach will be applied to Boolean models and models based on differential equations. This allows groups of models in these modeling frameworks that have the same structure to be compared. We demonstrate that this constitutes an approach to link the interaction graph to the dynamics of certain sets of Boolean networks and models based on differential equations. The abstracted dynamics, or more precisely the restrictions on the abstracted behavior, of such sets of Boolean networks or models based on differential equations will themselves be represented as Boolean state transition graphs. We will show that these state transition graphs can be considered as asynchronous Boolean networks. Despite the rather theoretical question this thesis tries to answer, there are many potential applications of the results.
The results in Chapter 2 can be applied to network reduction of ODE models based on Hill kinetics. The results of the second approach in Chapter 4 can be applied to network inference and analysis of Boolean model sets. Furthermore, in the last chapter of this thesis several ideas for applications with respect to experiment design will be considered. This leads to the question of how different asynchronous Boolean networks, or different behaviours of a single asynchronous Boolean network, can be distinguished.
This thesis deals with different types of qualitative models of gene regulatory networks. Four types of models are considered: interaction graphs, Boolean networks, models based on differential equations, and discrete abstractions of differential equations. We will learn more about the relations between these model groups and how these relations can be used to analyze individual models. The focus is on the mathematical analysis of these model groups. In this respect the thesis makes several contributions. First, we consider Boolean networks and parametrized families of ordinary differential equations (ODEs). To derive such ODE models systematically from Boolean models, various automatic conversion algorithms have been proposed in the past. Chapter 2 examines some of these algorithms more closely. We will prove that certain invariant sets are preserved when a Boolean model is converted into an ODE model. The second approach pursued in this thesis concerns discrete abstractions of the dynamics of models. With these abstractions it is possible to relate the structure (the interaction graph) to the dynamics of the associated models. This method is applied to both Boolean models and ODE models.
At the same time, this approach makes it possible to compare sets of models in different model groups that have the same structure. The abstracted dynamics (more precisely, the restrictions on the abstracted dynamics) of the Boolean model sets or ODE model sets can be represented as Boolean state transition graphs. We will show that these state transition graphs can in turn be regarded as (asynchronous) Boolean networks. Despite the theoretical starting question, this thesis points out numerous applications. The results of Chapter 2 can be used for model reduction by considering the dynamics of the ODE models on the "trap spaces" belonging to the Boolean networks. The results of Chapter 4 can be used for network inference or for the analysis of model sets. Furthermore, the last chapter of this thesis introduces some application ideas with respect to experiment design. This leads to the question of how different asynchronous Boolean networks, or different dynamics compatible with a single model, can be distinguished.
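The notions of an asynchronous Boolean network, its state transition graph, and an invariant (trap) set mentioned in this abstract can be made concrete with a small sketch. Everything in it is an illustrative assumption: the two-gene mutual-inhibition network and the function names `async_successors`, `state_transition_graph` and `is_trap_set` are hypothetical and do not come from the thesis.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Set, Tuple

State = Tuple[int, ...]
UpdateFn = Callable[[State], int]


def async_successors(state: State, update_fns: Sequence[UpdateFn]) -> List[State]:
    """Asynchronous update: flip exactly one component that differs from its target value."""
    successors = []
    for i, fn in enumerate(update_fns):
        target = fn(state)
        if target != state[i]:
            successors.append(state[:i] + (target,) + state[i + 1:])
    return successors


def state_transition_graph(update_fns: Sequence[UpdateFn]) -> Dict[State, List[State]]:
    """Enumerate all 2^n states and their asynchronous successors."""
    n = len(update_fns)
    return {s: async_successors(s, update_fns) for s in product((0, 1), repeat=n)}


def is_trap_set(states: Set[State], graph: Dict[State, List[State]]) -> bool:
    """A set is a trap set if no transition leaves it."""
    return all(t in states for s in states for t in graph[s])


# Hypothetical toy network: two genes that inhibit each other.
toggle = [
    lambda s: 1 - s[1],  # x0 is active iff x1 is inactive
    lambda s: 1 - s[0],  # x1 is active iff x0 is inactive
]
graph = state_transition_graph(toggle)
fixed_points = [s for s, succ in graph.items() if not succ]
```

In this toy network the states (0, 1) and (1, 0) are fixed points, so each forms a trap set on its own, while the subspace with x0 = 0 is not a trap set because (0, 0) can move to (1, 0).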

Institutional Repository of the Freie Universität Berlin

Analytics of Sequential Time Data from Physical Assets

Author: Elsheikh Ahmed
Publication date: 01/04/2018

RÉSUMÉ: With advances in sensor technologies and artificial intelligence, data analysis has become a source of information and knowledge that supports decision-making in industry. Making these decisions on the basis of human expertise alone is no longer sufficient or desirable, and is sometimes even infeasible for new industries. Analysis of data collected from physical assets strengthens decision-making with practical knowledge grounded in real data. These data are used to accomplish two main tasks: diagnosis and prognosis. Both tasks are challenging, mainly because of the provenance of the data and their suitability for use, and also because of the difficulty of choosing the type of analysis. The latter requires an analyst with expertise in the different data analysis techniques as well as in the application domain. Data problems are due to the many unknown sources of variation interacting with the collected data, which may sometimes be due to human error. Choosing the type of modelling is another challenge, since each model has its own assumptions, parameters and limitations. This thesis proposes four new types of time-series analysis, two of which are supervised and the other two unsupervised. These analysis techniques are tested and applied on different industrial problems. They aim to minimize the burden of choice placed on the analyst. For supervised time-series analysis, the failure time of a physical asset is predicted by a technique called ‘Logical Analysis of Survival Curves (LASC)’. This technique is used to adaptively stratify survival curves throughout an inspection process.
This allows more accurate modelling than using a single augmented model for all the data. The other supervised prognostic technique is a new bidirectional ‘Long Short-Term Memory (LSTM)’ neural network called ‘Bidirectional Handshaking LSTM (BHLSTM)’. This model makes better use of short sequences by making a round pass through the data. In addition, the network is trained with a new safety-oriented objective function that forces the network to make safer predictions. Finally, since LSTM is a supervised technique, a new approach for generating the remaining useful life (RUL) is proposed. This technique requires fewer assumptions than previous approaches. For unsupervised diagnostic purposes, a new interpretable clustering technique is proposed, named ‘Interpretable Clustering for Rule Extraction and Anomaly Detection (IC-READ)’. Interpretability means that the resulting clusters are formulated using simple conditional logic. This is convenient when providing results to non-specialists, and it eases any hardware implementation if required. The proposed technique is also non-parametric, meaning that no tuning is required. It could also be used in a ‘one class classification’ setting to build an anomaly detector. The other proposed unsupervised technique is an approach for clustering multivariate time series of variable length using a modified ‘Dynamic Time Warping (DTW)’ distance. The modified DTW gives higher matches for time series that have similar trends and magnitudes, rather than focusing only on one or the other of these properties.
This technique is also non-parametric and uses hierarchical clustering to group time series in an unsupervised manner. This is particularly useful for deciding maintenance scheduling. It is also shown that it can be used with ‘Kernel Principal Components Analysis (KPCA)’ to visualize sequences of variable length in two-dimensional plots.
ABSTRACT: Data analysis has become a necessity for industry. Working with inherited expertise alone has become insufficient, expensive, not easily transferable, and mostly unavailable for new industries and facilities. Data analysis can provide decision-makers with more insight into how to manage their production, maintenance and personnel. Data collection requires acquisition and storage of observational information about the state of the different production assets. Data collection usually takes place over time, which results in time series of observations. Depending on the type of data records available, the type of possible analyses will differ. Data labeled with previous human experience in terms of identifiable faults or fatigues can be used to build models that perform the expert’s task in the future by means of supervised learning. Otherwise, if no human labeling is available, data analysis can provide insights about similar observations or visualize these similarities through unsupervised learning. Both are challenging types of analyses. The challenges are two-fold: the first originates from the data and its adequacy, and the other is selecting the type of analysis, which is a decision made by the analyst. Data challenges are due to the substantial number of unknown sources of variation inherent in the collected data, which may sometimes include human errors. Deciding upon the type of modelling is another issue, as each model has its own assumptions, parameters to tune, and limitations.
This thesis proposes four new types of time-series analysis, two of which are supervised and require data labelled with certain events such as failures, while the other two are unsupervised and require no such labelling. These analysis techniques are tested and applied on various industrial applications, namely road maintenance, bearing outer race failure detection, cutting tool failure prediction, and turbo engine failure prediction. These techniques aim to minimize the burden of choice laid on the analyst working with industrial data by providing reliable analysis tools that require fewer choices to be made by the analyst. This in turn allows different industries to easily make use of their data without requiring much expertise. For prognostic purposes, a proposed modification to the binary Logical Analysis of Data (LAD) classifier is used to adaptively stratify survival curves into long survivors and short life sets. This model requires no parameters to choose and relies completely on empirical estimations. The proposed Logical Analysis of Survival Curves shows a 27% improvement in prediction accuracy, in terms of the mean absolute error, over the results obtained by well-known machine learning techniques. The other prognostic model is a new bidirectional Long Short-Term Memory (LSTM) neural network termed the Bidirectional Handshaking LSTM (BHLSTM). This model makes better use of short sequences by making a round pass through the given data. Moreover, the network is trained using a new safety-oriented objective function which forces the network to make safer predictions. Finally, since LSTM is a supervised technique, a novel approach for generating the target Remaining Useful Life (RUL) is proposed, requiring fewer assumptions than previous approaches. This proposed network architecture shows an average 18.75% decrease in the mean absolute error of predictions on the NASA turbo engine dataset.
For unsupervised diagnostic purposes, a new technique for providing interpretable clustering is proposed, named Interpretable Clustering for Rule Extraction and Anomaly Detection (IC-READ). Interpretation means that the resulting clusters are formulated using simple conditional logic. This is very important when providing the results to non-specialists, especially those in management, and it eases any hardware implementation if required. The proposed technique is also non-parametric, which means there is no tuning required, and it shows an average 20% improvement in cluster purity over other clustering techniques applied on 11 benchmark datasets. This technique can also use the resulting clusters to build an anomaly detector. The last proposed technique is a whole multivariate variable-length time-series clustering approach using a modified Dynamic Time Warping (DTW) distance. The modified DTW gives higher matches for time series that have similar trends and magnitudes, rather than just focusing on either property alone. This technique is also non-parametric and uses hierarchical clustering to group time series in an unsupervised fashion. This can be specifically useful for management when deciding maintenance scheduling. It is also shown that it can be used along with Kernel Principal Components Analysis (KPCA) for visualizing variable-length sequences in two-dimensional plots. The unsupervised techniques can help, in some cases where there is a lot of variation within certain classes, to ease the supervised learning task by breaking it into smaller problems of the same nature.
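For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the standard, unmodified DTW distance between two univariate sequences of possibly different lengths. It is not the modified multivariate distance proposed in the thesis, whose details the abstract does not give; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW with |x - y| as the local cost (assumed)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Sequences of different lengths with the same shape align at zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A constant offset cannot be warped away.
offset = dtw_distance([0, 0], [1, 1])
```

A pairwise matrix of such distances is the kind of input a hierarchical clustering routine would take, which is how the abstract describes grouping variable-length series.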

PolyPublie

Part I:

Author: Daniel Panario

CiteSeerX

Customization of Treatment for Cancer Patients: An Engineering Approach

Author: Mishra Bibhu Prasad
Publication date: 18/01/2019

Cancer is a disease associated with uncontrolled cell proliferation or reduced cell death, either of which can lead to tumorigenesis. A possible route through which cancer can develop is by breakdowns in the signaling cascade of proteins at the cellular level. Since there are many ways in which such breakdowns can occur, anti-cancer chemotherapeutic drugs show varying degrees of efficacy in different patients. Thus, there is an urgent need to personalize the drug treatment regimen for better response to treatment while trying to reduce the side effects of these drugs. One way to meet this need would be to try every possible drug combination on cell lines extracted from a patient and find the combination with the least number of drugs in the mix that provides the best possible output. Although this method may work, it is tedious and time consuming, as the number of combinations increases exponentially with every new drug that is introduced into the repertoire. First, we consider the problem where the tumor is homogeneous in nature but the mutations within the mutated cells are unknown. We use Boolean network models with monotonicity properties to reduce the number of test cases, while still getting the best possible combination with the least number of drugs in the mix. This approach is efficient both in terms of time required and the costs involved. This method has also been applied to both simulated and real-world data collected from fibroblasts using qPCR to demonstrate the usefulness of the method. Another important area of study in cancer research concerns the heterogeneous nature of tumors. The clonal evolution of tumors is the driving force leading to heterogeneity in cancer tissues. Thus, in order to customize the treatment of cancer we need to be able to better model the heterogeneous subpopulations in the tumor.
This can be done by estimating the impact of the various subpopulations and by modeling the interplay of the various subpopulations within the heterogeneous tumor. Prior works in the literature have already addressed the problems of estimating the proportion of the subpopulations within a tumor and of modeling the interaction between the various subpopulations. In this work, we present a way to improve the accuracy of the Bayesian hierarchical model which helps in estimating the proportional breakup of the tumor population. Additionally, this work looks at ways to use the knowledge of the proportional breakup of tumor subpopulations and the interplay between the various subpopulations to help customize the treatment for the patient by making use of evolutionary game theory. We demonstrate the improvement of the presented methods as compared to the existing Bayesian hierarchical model by applying these techniques to qPCR and fluorescent data. Finally, the problem becomes more challenging when the nature and the number of the subpopulations are variable and difficult to estimate. In this work, we present a feasible way to find the best possible drug combination for such a scenario by training two neural network models on synthetic and real-world cancer data. We then test each model to verify its effectiveness and to demonstrate its usefulness in choosing the appropriate combination therapy. The models were evaluated on synthetic qPCR data and fluorescent data obtained from experiments. The results obtained from these methods take us a step closer to the realization of customized treatment for cancer patients. This will not only make the treatment more effective but also help reduce the side effects of the drug treatment.
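The remark in this abstract that monotonicity reduces the number of drug combinations to test can be illustrated generically. The sketch below is not the thesis's algorithm. It assumes a hypothetical monotone oracle `is_effective` (if a drug set works, every superset also works) and shows how that single assumption lets a search for the smallest effective combinations skip many candidates. All names and the toy oracle are invented for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple


def minimal_effective_combinations(
    drugs: Sequence[str],
    is_effective: Callable[[FrozenSet[str]], bool],
) -> Tuple[List[FrozenSet[str]], int]:
    """Return all minimal effective drug sets and the number of oracle calls used."""
    minimal: List[FrozenSet[str]] = []
    calls = 0
    for size in range(len(drugs) + 1):  # smallest combinations first
        for combo in combinations(drugs, size):
            candidate = frozenset(combo)
            # Monotonicity assumption: a superset of a known effective set is
            # effective too, but cannot be minimal, so it needs no test.
            if any(known <= candidate for known in minimal):
                continue
            calls += 1
            if is_effective(candidate):
                minimal.append(candidate)
    return minimal, calls


# Hypothetical oracle: the treatment works if it contains C, or both A and B.
def toy_oracle(combo: FrozenSet[str]) -> bool:
    return "C" in combo or {"A", "B"} <= combo


found, oracle_calls = minimal_effective_combinations(["A", "B", "C", "D"], toy_oracle)
```

With four drugs there are 16 possible combinations, and on this toy oracle the pruned search evaluates only 8 of them while still finding both minimal effective sets.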

Texas A&M Repository

Hierarchies for efficient clausal entailment checking: With applications to satisfiability and knowledge compilation.

Author: Matthew Gwynne
Publication date: 01/01/2014

Cronfa at Swansea University