Textual Membership Queries
Human labeling of data can be very time-consuming and expensive; yet in many
cases it is critical for the success of the learning process. In order to
minimize human labeling efforts, we propose a novel active learning solution
that does not rely on existing sources of unlabeled data. It uses a small
amount of labeled data as the core set for the synthesis of useful membership
queries (MQs) - unlabeled instances generated by an algorithm for human
labeling. Our solution uses modification operators, functions that modify
instances to some extent. We apply the operators on a small set of instances
(core set), creating a set of new membership queries. Using this framework, we
look at the instance space as a search space and apply search algorithms in
order to generate new examples highly relevant to the learner. We implement
this framework in the textual domain, test it on several text classification
tasks, and show improved classifier performance as more MQs are labeled and
incorporated into the training set. To the best of our knowledge, this is the
first work on membership queries in the textual domain.

Comment: Accepted to IJCAI 2020. Code is available at
github.com/jonzarecki/textual-mqs . Additional material is available at
tinyurl.com/sup-textualmqs . SOLE copyright holder is IJCAI (International
Joint Conferences on Artificial Intelligence), all rights reserved.
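The operator-based query synthesis described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation (their code is at github.com/jonzarecki/textual-mqs): the two modification operators and the core-set texts below are hypothetical stand-ins.

```python
import random

# Hypothetical modification operators: each maps one text instance to a variant.
def drop_word(tokens, i):
    """Remove the i-th token."""
    return tokens[:i] + tokens[i + 1:]

def swap_adjacent(tokens, i):
    """Swap tokens i and i+1."""
    t = list(tokens)
    t[i], t[i + 1] = t[i + 1], t[i]
    return t

def synthesize_queries(core_set, rng):
    """Apply modification operators to core-set instances, producing new,
    unlabeled membership queries to hand to a human annotator."""
    queries = []
    for text in core_set:
        tokens = text.split()
        queries.append(" ".join(drop_word(tokens, rng.randrange(len(tokens)))))
        if len(tokens) > 1:
            queries.append(" ".join(swap_adjacent(tokens, rng.randrange(len(tokens) - 1))))
    return queries

core = ["the movie was great", "terrible acting ruined it"]
for q in synthesize_queries(core, random.Random(0)):
    print(q)
```

In the paper's framing, such operators turn the instance space into a search space; a search algorithm would then keep only the synthesized instances most informative to the learner.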
Redescription Mining and Applications in Bioinformatics
Our ability to interrogate the cell and computationally assimilate its answers is improving at a dramatic pace. For instance, the study of even a focused aspect of cellular activity, such as gene action, now benefits from multiple high-throughput data acquisition technologies such as microarrays, genome-wide deletion screens, and RNAi assays. A critical need is the development of algorithms that can bridge, relate, and unify diverse categories of data descriptors. Redescription mining is such an approach. Given a set of biological objects (e.g., genes, proteins) and a collection of descriptors defined over this set, the goal of redescription mining is to use the given descriptors as a vocabulary and find subsets of data that afford multiple definitions. The premise of redescription mining is that subsets that afford multiple definitions are likely to exhibit concerted behavior and are, hence, interesting. We present algorithms for redescription mining based on formal concept analysis and applications of redescription mining to multiple biological datasets. We demonstrate how redescriptions identify conceptual clusters of data using mutually reinforcing features, without explicit training information.
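The core premise, that subsets affording multiple definitions are interesting, can be illustrated with a deliberately simplified sketch. The paper's algorithms are based on formal concept analysis; this toy version merely compares descriptor extents by Jaccard similarity, and all descriptor names and gene identifiers are made up.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def mine_redescriptions(descriptors, threshold=0.8):
    """Find pairs of descriptors whose extents (the object subsets they cover)
    nearly coincide. Each such pair is a candidate redescription: two different
    vocabularies defining (almost) the same subset of objects."""
    hits = []
    for (n1, s1), (n2, s2) in combinations(descriptors.items(), 2):
        sim = jaccard(s1, s2)
        if sim >= threshold:
            hits.append((n1, n2, sim))
    return hits

# Hypothetical descriptors over a set of genes g1..g5.
genes = {
    "upregulated_in_heat_shock": {"g1", "g2", "g3", "g4"},
    "has_TF_binding_site_X":     {"g1", "g2", "g3", "g4"},
    "essential_under_stress":    {"g1", "g5"},
}
print(mine_redescriptions(genes))
```

Here the first two descriptors cover exactly the same genes, so they redescribe one another; a subset definable both by an expression pattern and by a sequence feature is the kind of concerted behavior the abstract calls interesting.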
On the Complexity of Mining Itemsets from the Crowd Using Taxonomies
We study the problem of frequent itemset mining in domains where data is not
recorded in a conventional database but only exists in human knowledge. We
provide examples of such scenarios, and present a crowdsourcing model for them.
The model uses the crowd as an oracle to find out whether an itemset is
frequent or not, and relies on a known taxonomy of the item domain to guide the
search for frequent itemsets. In the spirit of data mining with oracles, we
analyze the complexity of this problem in terms of (i) crowd complexity, that
measures the number of crowd questions required to identify the frequent
itemsets; and (ii) computational complexity, that measures the computational
effort required to choose the questions. We provide lower and upper complexity
bounds in terms of the size and structure of the input taxonomy, as well as the
size of a concise description of the output itemsets. We also provide
constructive algorithms that achieve the upper bounds, and consider more
efficient variants for practical situations.

Comment: 18 pages, 2 figures. To be published in ICDT'13. Added missing
acknowledgement.
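The taxonomy-guided, oracle-driven search can be illustrated with a toy sketch. The pruning rule below (an infrequent category has no frequent descendants) is the standard monotonicity property the search relies on; the taxonomy, the in-memory stand-in for the crowd oracle, and all item names are hypothetical.

```python
from collections import deque

def find_frequent(taxonomy, root, is_frequent):
    """Top-down search over a taxonomy, using the crowd as an oracle.

    `taxonomy` maps each node to its children; `is_frequent` stands in for a
    crowd question ("do you often buy <item>?"). Monotonicity lets us prune:
    if a category is infrequent, so is everything beneath it, so no further
    questions are asked in that subtree. Returns (frequent nodes, #questions),
    the second component being the crowd complexity of this run.
    """
    frequent, questions = [], 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        questions += 1                     # one crowd question per node visited
        if is_frequent(node):
            frequent.append(node)
            queue.extend(taxonomy.get(node, ()))
    return frequent, questions

taxonomy = {"food": ["dairy", "produce"], "dairy": ["milk", "cheese"],
            "produce": ["kale"]}
freq = {"food", "dairy", "milk"}           # ground truth the "crowd" knows
print(find_frequent(taxonomy, "food", freq.__contains__))
```

Note that "kale" is never asked about: once "produce" is reported infrequent, its whole subtree is pruned, which is exactly how the taxonomy reduces crowd complexity.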
Predicate Generation for Learning-Based Quantifier-Free Loop Invariant Inference
PETITION FOR ORIGINAL WRIT OF MANDAMUS DIRECTED TO THE HONORABLE DAVID L. MOWER DISTRICT JUDGE OF SEVIER COUNTY, STATE OF UTAH
Conjunctions of Unate DNF Formulas: Learning and Structure
A central topic in query learning is to determine which classes of Boolean formulas are efficiently learnable with membership and equivalence queries. We consider the class R_k consisting of conjunctions of k unate DNF formulas. This class generalizes the class of k-clause CNF formulas and the class of unate DNF formulas, both of which are known to be learnable in polynomial time with membership and equivalence queries. We prove that R_2 can be properly learned with a polynomial number of polynomial-size membership and equivalence queries, but can be properly learned in polynomial time with such queries if and only if P = NP. Thus the barrier to properly learning R_2 with membership and equivalence queries is computational rather than informational. Few results of this type are known. In our proofs, we use recent results of Hellerstein et al. (1997, J. Assoc. Comput. Mach. 43(5), 840-862) characterizing the classes that are polynomial-query learnable, together with work of Bshouty on the monotone dimension of Boolean functions. We extend some of our results to R_k and pose open questions on learning DNF formulas of small monotone dimension. We also prove structural results for R_k. We construct, for any fixed k >= 2, a class of functions f that cannot be represented by any formula in R_k, but which cannot be "easily" shown to have this property. More precisely, for any function f on n variables in the class, the value of f on any polynomial-size set of points in its domain is not a witness that f cannot be represented by a formula in R_k. Our construction is based on BCH codes.
Efficiently Learning Monotone Decision Trees with ID3
Since the Probably Approximately Correct learning model was introduced in 1984, there has been much effort in designing computationally efficient algorithms for learning Boolean functions from random examples drawn from a uniform distribution. In this paper, I take the ID3 information-gain-first classification algorithm and apply it to the task of learning monotone Boolean functions from examples that are uniformly distributed over {0,1}^n. I limit my scope to the class of monotone Boolean functions that can be represented as read-2 width-2 disjunctive normal form expressions. I model these functions as graphs and examine each type of connected component contained in these models, i.e., path graphs and cycle graphs. I determine the influence of the variables in the pieces of these graph models in order to understand how ID3 behaves when learning these functions. My findings show that ID3 will produce an optimal decision tree for this class of Boolean functions.
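ID3's information-gain-first split choice, which the abstract analyzes on monotone read-2 width-2 DNF, can be sketched by exhaustively computing gains under the uniform distribution on {0,1}^n. The target function below is one small read-2 width-2 DNF chosen purely for illustration; it is not taken from the paper.

```python
from itertools import product
from math import log2

def entropy(labels):
    """Binary entropy of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def best_split(f, n):
    """ID3's choice of root: the variable with highest information gain
    over the uniform distribution on {0,1}^n (enumerated exhaustively here,
    whereas ID3 proper estimates gains from a random sample)."""
    points = list(product([0, 1], repeat=n))
    labels = [f(x) for x in points]
    base = entropy(labels)
    gains = []
    for i in range(n):
        side0 = [l for x, l in zip(points, labels) if x[i] == 0]
        side1 = [l for x, l in zip(points, labels) if x[i] == 1]
        # Each side holds exactly half the points under the uniform distribution.
        gains.append(base - 0.5 * entropy(side0) - 0.5 * entropy(side1))
    return max(range(n), key=gains.__getitem__)

# Read-2 width-2 monotone DNF: f = x0 x1 + x1 x2, a length-3 "path" in the
# graph view (each variable appears at most twice, each term has width 2).
f = lambda x: int((x[0] and x[1]) or (x[1] and x[2]))
print(best_split(f, 3))   # x1 appears in both terms and has the most influence
```

The shared variable x1 has the highest information gain, so ID3 splits on it first, which is the kind of influence argument the abstract uses to show the resulting tree is optimal.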
Exact Learning Boolean Functions via the Monotone Theory
We study the learnability of Boolean functions from membership and equivalence queries. We develop the Monotone Theory, which proves that (1) any Boolean function is learnable in time polynomial in its minimal DNF size, its minimal CNF size, and the number of variables n; in particular, (2) decision trees are learnable. Our algorithms are in the model of exact learning with membership queries and unrestricted equivalence queries. The hypotheses for the equivalence queries and the output hypotheses are depth-3 formulas.
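The query model can be illustrated with a much-simplified sketch: exact learning of a monotone DNF from membership and equivalence queries, where each counterexample is walked down to a minterm of the target. This is a textbook monotone special case, not Bshouty's full Monotone Theory (CDNF) algorithm, and the equivalence oracle here is simulated by exhaustive search over {0,1}^n.

```python
from itertools import product

def walk_down(f, x):
    """Flip 1-bits to 0 while f stays true, reaching a minimal true point.
    Each trial flip is one membership query to f."""
    x = list(x)
    for i in range(len(x)):
        if x[i]:
            x[i] = 0
            if not f(x):          # membership query
                x[i] = 1
    return tuple(x)

def learn_monotone_dnf(f, n, equivalence_query):
    """Exactly learn a monotone target f as a monotone DNF.

    Each counterexample from the equivalence oracle is walked down to a
    minterm of f, which becomes one term of the hypothesis. Terminates only
    if f really is monotone."""
    terms = []                                  # each term: set of variable indices
    h = lambda x: any(all(x[i] for i in t) for t in terms)
    while True:
        cex = equivalence_query(h)
        if cex is None:
            return terms
        minterm = walk_down(f, cex)
        terms.append({i for i, b in enumerate(minterm) if b})

def make_eq(f, n):
    """Simulated equivalence oracle: return a counterexample, or None if h == f."""
    def eq(h):
        for x in product([0, 1], repeat=n):
            if f(x) != h(x):
                return x
        return None
    return eq

f = lambda x: int((x[0] and x[1]) or x[2])      # monotone target, for illustration
print(sorted(map(sorted, learn_monotone_dnf(f, 3, make_eq(f, 3)))))
```

The hypothesis built here is itself a monotone DNF; Bshouty's general algorithm instead maintains a conjunction of monotone DNFs (a depth-3 formula), one per "basis" point, which is what lets it handle arbitrary Boolean targets.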