86 research outputs found

    Integrating automated literature searches and text mining in biomarker discovery

    Epigenetics, and more specifically DNA methylation, is a fast-evolving research area. In almost every cancer type, new publications each month confirm the differential regulation of specific genes due to methylation and report the discovery of novel methylation markers. Over the last decade, high-throughput methodologies have frequently been used to discover such methylation biomarkers. Examples of such analyses are re-expression experiments (using the demethylating agent 5-Aza-2′-Deoxycytidine, followed by expression microarray analysis); CpG microarrays such as the Illumina HumanMethylation27 BeadChip; and large-scale bisulfite sequencing.

    Fetal sex determination in twin pregnancies using non-invasive prenatal testing

    Non-invasive prenatal testing (NIPT) is accurate for fetal sex determination in singleton pregnancies, but its accuracy is not well established in twin pregnancies. Here, we present an accurate sex prediction model to discriminate fetal sex in both dichorionic diamniotic (DCDA) and monochorionic diamniotic/monochorionic monoamniotic (MCDA/MCMA) twin pregnancies. A retrospective analysis was performed using a total of 198 twin pregnancies with documented sex. The prediction was based on a multinomial logistic regression using the normalized frequency of X and Y chromosomes, and fetal fraction estimation. A second-step regression analysis was applied when one or both twins were predicted to be male. The model determines fetal sex with 100% sensitivity and specificity when both twins are female, and with 98% sensitivity and 95% specificity when a male is present. Since sex determination can be clinically important, implementing fetal sex determination in twins will improve the overall management of twin pregnancies.

    Maximum Entropy Modeling with Clausal Constraints

    We present the learning system Maccent, which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. The maximum entropy method is a Bayesian method based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. Maccent incorporates clausal constraints that are based on the evaluation of Prolog clauses in examples represented as Prolog programs. We build on an existing maximum-likelihood approach to maximum entropy modeling, which we upgrade along two dimensions: (1) Maccent can handle larger search spaces, due to a partial ordering defined on the space of clausal constraints, and (2) it uses a richer first-order logic format. In comparison with other inductive logic programming systems, Maccent seems to be the first that explicitly constructs a conditional probability distribution p(C|I) based on an empirical distribution ~p(C|I) (where p(C|I) (~p(C|I)) gives the induced (observed) probability of ..
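For orientation, conditional maximum entropy models of the kind this abstract describes are conventionally written in the exponential form below. The feature functions $f_j$, weights $\lambda_j$, and normalizer $Z_\lambda$ are standard notation assumed here, not symbols taken from the abstract; in Maccent's setting each $f_j$ would presumably indicate whether a clausal constraint holds for instance $I$ and class $C$.

```latex
% Assumed standard notation: f_j are feature functions, \lambda_j their weights.
p_\lambda(C \mid I) = \frac{1}{Z_\lambda(I)} \exp\Big( \sum_j \lambda_j f_j(I, C) \Big),
\qquad
Z_\lambda(I) = \sum_{C'} \exp\Big( \sum_j \lambda_j f_j(I, C') \Big)

% Constraint for each feature: the model expectation matches the empirical one.
\sum_{I} \tilde{p}(I) \sum_{C} p_\lambda(C \mid I)\, f_j(I, C)
  = \sum_{I, C} \tilde{p}(I, C)\, f_j(I, C)
```

Among all distributions that satisfy these expectation constraints, the maximum entropy solution is the most uniform one, which matches the principle stated in the abstract.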

    Frequent pattern discovery in first-order logic

    status: published

    From promising to profitable applications of ILP: A case study in drug discovery

    status: published

    Mining a Natural Language Corpus for Multi-Relational Association Rules

    Association rules are generally recognized as a highly valuable type of regularities and various algorithms have been presented for efficiently mining them in large databases. To the best of our knowledge, the application of these algorithms is so far restricted to databases that consist of a single relation composed of a set of binary attributes. We describe how these restrictions can be overcome through the combination of the available algorithms with standard techniques from the field of inductive logic programming. We present the algorithm AprioriRel, which extends Apriori [Agrawal et al., 1996] to mine association rules in multiple relations. Whereas in Apriori each example is described by means of a single tuple, in AprioriRel each example is viewed as a separate database with a selection, from multiple relations, of all tuples related to the example. Accordingly, the association rules discovered by AprioriRel may combine information from various relations to statements of th..
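The single-relation setting that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration of level-wise Apriori frequent-itemset mining over one relation of binary attributes, not AprioriRel itself; the function name `apriori` and the `min_support` parameter (an absolute count) are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    Each transaction is one example described by a single tuple of binary
    attributes, represented here as the set of attributes that are true.
    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its subsets of size
        # k-1 were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

A call such as `apriori([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}], min_support=2)` returns the itemsets contained in at least two transactions. The multi-relational extension described in the abstract would replace the subset test `c <= t` with a query evaluated against the small database attached to each example.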

    Parallel Inductive Logic Programming

    The generic task of Inductive Logic Programming (ILP) is to search a predefined subspace of first-order logic for hypotheses that in some respect explain examples and background knowledge. In this paper we consider the development of parallel implementations of ILP systems. A first part discusses the division of the ILP-task into subtasks that can be handled concurrently by multiple processes executing a common sequential ILP algorithm. We define the notion of a valid partition of an ILP-task, and test this definition against two problem specifications that have been employed within ILP. The second part of the paper focuses on the algorithmic description, prototypical implementation, and comparative evaluation of a parallel version of the clausal discovery system Claudien.
    Keywords: inductive logic programming, machine learning, concurrency, first order logic, knowledge discovery.
    1 Introduction
    Inductive Logic Programming (ILP) [12, 13, 14] by now has become an established subfield ..

    Mining Association Rules in Multiple Relations

    The application of algorithms for efficiently generating association rules is so far restricted to cases where information is put together in a single relation. We describe how this restriction can be overcome through the combination of the available algorithms with standard techniques from the field of inductive logic programming. We present the system Warmr, which extends Apriori [2] to mine association rules in multiple relations. We apply Warmr to the natural language processing task of mining part-of-speech tagging rules in a large corpus of English.
    Keywords: association rules, inductive logic programming
    1 Introduction
    Association rules are generally recognized as a highly valuable type of regularities and various algorithms have been presented for efficiently mining them in large databases (cf. [1, 7, 2]). To the best of our knowledge, the application of these algorithms is so far restricted to cases where information is put together in a single relation. We describe how th..

    DLAB: A Declarative Language Bias Formalism

    We describe the principles and functionalities of Dlab (Declarative LAnguage Bias). Dlab can be used in inductive learning systems to define syntactically and traverse efficiently finite subspaces of first-order clausal logic, be it a set of propositional formulae, association rules, Horn clauses, or full clauses. A Prolog implementation of Dlab is available by ftp access.
    Keywords: declarative language bias, concept learning, knowledge discovery
    1 Introduction
    The notion bias, generally circumscribed as "a tendency to show prejudice against one group and favouritism towards another" (Collins Cobuild, 1987), has been adapted to the field of computational inductive reasoning to become a generic term for "any basis for choosing one generalization over another, other than strict consistency with the instances" (Mitchell [14]). We borrow a more fine-tuned definition of inductive bias from Utgoff [20]. Definition 1 (inductive bias). Except for the presented examples and counterexamples ..

    Clausal Discovery

    The clausal discovery engine Claudien is presented. Claudien is an inductive logic programming engine that fits in the descriptive data mining paradigm. Claudien addresses characteristic induction from interpretations, a task which is related to existing formalisations of induction in logic. In characteristic induction from interpretations, the regularities are represented by clausal theories, and the data are represented by Herbrand interpretations. Because Claudien uses clausal logic to represent hypotheses, the regularities induced typically involve multiple relations or predicates. Claudien also employs a novel declarative bias mechanism to define the set of clauses that may appear in a hypothesis.
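The core test behind characteristic induction from interpretations can be illustrated with a deliberately simplified sketch. It assumes ground (variable-free) clauses and interpretations given as sets of true atoms; Claudien itself works with first-order clauses and a bias-defined search space, which this example does not attempt to reproduce. The function names `clause_holds` and `valid_clauses` are hypothetical.

```python
def clause_holds(body, head, interpretation):
    """Check a ground clause  h1 v ... v hn <- b1 & ... & bm  in one interpretation.

    The clause is false only when every body atom is true and no head atom is.
    All three arguments are sets of atom names (simplifying assumption).
    """
    return not (body <= interpretation) or bool(head & interpretation)


def valid_clauses(candidates, interpretations):
    """Keep the candidate (body, head) pairs that hold in every interpretation."""
    return [
        (body, head)
        for body, head in candidates
        if all(clause_holds(body, head, i) for i in interpretations)
    ]
```

Given interpretations such as `{"bird", "flies"}` and `{"fish"}`, the candidate `({"bird"}, {"flies"})` is kept because no interpretation makes the body true and the head false, while `({"fish"}, {"flies"})` is rejected. The clauses that survive form a clausal theory describing regularities in the data, in the sense the abstract outlines.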