5 research outputs found

    Regression, Model Misspecification and Causation, with Pedagogical Demonstration

    This paper shows, by a proposition and a numerical example, how a classic simple or multiple normal regression can achieve, with probability 0.99, a near-perfect fit to a random sample of any size while, owing to the omission of an independent variable, the signs of the estimated coefficients are all wrong, thus distinguishing prediction from causation.
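The sign reversal described in this abstract can be illustrated with a small simulation. The sketch below is a generic omitted-variable example, not the paper's proposition or its numerical example: the coefficients, noise levels, and the names `fit_simple_regression` and `simulate` are all assumptions chosen for illustration. The true model gives `x1` a positive coefficient, but regressing `y` on `x1` alone yields a negative slope with a high R-squared, because the omitted `x2` is strongly negatively correlated with `x1`.

```python
import random


def fit_simple_regression(xs, ys):
    """Ordinary least squares of y on a single x; returns (slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


def simulate(n=1000, seed=0):
    """Hypothetical data: y = 1*x1 + 3*x2 + noise, with x2 close to -x1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = -x1 + 0.1 * rng.gauss(0, 1)   # omitted variable, tied to x1
        y = 1.0 * x1 + 3.0 * x2 + 0.1 * rng.gauss(0, 1)
        xs.append(x1)                      # x2 is deliberately left out
        ys.append(y)
    return fit_simple_regression(xs, ys)


if __name__ == "__main__":
    slope, r_squared = simulate()
    print(f"slope on x1: {slope:.3f}, R^2: {r_squared:.3f}")
```

Under these assumed values the fitted slope is close to -2 while the true coefficient on `x1` is +1, so the model predicts well yet misstates the direction of the effect.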

    Data mining and knowledge discovery: a guided approach based on monotone boolean functions

    This dissertation deals with an important problem in Data Mining and Knowledge Discovery (DM & KD), and Information Technology (IT) in general. It addresses the problem of efficiently learning monotone Boolean functions via membership queries to oracles. The monotone Boolean function can be thought of as a phenomenon, such as breast cancer or a computer crash, together with a set of predictor variables. The oracle can be thought of as an entity that knows the underlying monotone Boolean function, and provides a Boolean response to each query. In practice, it may take the shape of a human expert, or it may be the outcome of performing tasks such as running experiments or searching large databases. Monotone Boolean functions have a general knowledge representation power and are inherently frequent in applications. A key goal of this dissertation is to demonstrate the wide spectrum of important real-life applications that can be analyzed by using the new proposed computational approaches. The applications of breast cancer diagnosis, computer crashing, college acceptance policies, and record linkage in databases are here used to demonstrate this point and illustrate the algorithmic details. Monotone Boolean functions have the added benefit of being intuitive. This property is perhaps the most important in learning environments, especially when human interaction is involved, since people tend to make better use of knowledge they can easily interpret, understand, validate, and remember. The main goal of this dissertation is to design new algorithms that can minimize the average number of queries used to completely reconstruct monotone Boolean functions defined on a finite set of vectors V = {0,1}^n. The optimal query selections are found via a recursive algorithm in exponential time (in the size of V). The optimality conditions are then summarized in the simple form of evaluative criteria, which are near optimal and only take polynomial time to compute. 
Extensive unbiased empirical results show that the evaluative criterion approach is far superior to any of the existing methods. In fact, the reduction in average number of queries increases exponentially with the number of variables n, and faster than exponentially with the oracle's error rate.
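The query setting described in this abstract can be sketched as follows. This is a minimal illustration assuming an error-free oracle, and the selection rule is a naive balancing heuristic chosen for the example, not the dissertation's evaluative criteria; the names `leq` and `reconstruct` are hypothetical. Each answer is propagated by monotonicity: a 1 at `v` implies 1 for every vector above `v`, and a 0 at `v` implies 0 for every vector below it.

```python
from itertools import product


def leq(a, b):
    """Componentwise order on {0,1}^n: a <= b in every coordinate."""
    return all(x <= y for x, y in zip(a, b))


def reconstruct(oracle, n):
    """Recover a monotone Boolean function on {0,1}^n via membership queries.

    Returns (values, queries): a dict mapping every vector to its value,
    and the number of oracle calls made.
    """
    values = {}
    unknown = set(product((0, 1), repeat=n))
    queries = 0
    while unknown:
        # Naive heuristic (an assumption): prefer the vector whose two
        # possible answers would settle similar numbers of unknown vectors.
        def imbalance(v):
            above = sum(1 for w in unknown if leq(v, w))
            below = sum(1 for w in unknown if leq(w, v))
            return abs(above - below)

        v = min(sorted(unknown), key=imbalance)
        answer = oracle(v)
        queries += 1
        if answer:
            implied = {w for w in unknown if leq(v, w)}   # everything above v
        else:
            implied = {w for w in unknown if leq(w, v)}   # everything below v
        for w in implied:
            values[w] = answer
        unknown -= implied
    return values, queries


if __name__ == "__main__":
    # Hypothetical monotone target: (x0 AND x1) OR x2.
    target = lambda v: int((v[0] and v[1]) or v[2])
    values, queries = reconstruct(target, 3)
    print(f"reconstructed 8 vectors with {queries} queries")
```

Because every query settles at least the queried vector, the loop never needs more than 2^n oracle calls, and monotone inference usually settles several vectors per call.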

    Predicting Cause-Effect Relationships from Incomplete Discrete Observations

    This paper addresses a prediction problem occurring frequently in practice. The problem consists in predicting the value of a function on the basis of discrete observational data that are incomplete in two senses. Only certain arguments of the function are observed, and the function value is observed only for certain combinations of values of these arguments. The problem is considered under a monotonicity condition that is natural in many applications. Applications to tax auditing, medicine, and real estate valuation are discussed. In particular, a special class of problems is identified for which the best monotone prediction can be found in polynomial time.

    Predicting Cause-Effect Relationships from Incomplete Discrete Observations

    We address a prediction problem that frequently occurs in practice. We wish to predict the value of a function on the basis of discrete observational data that are incomplete in two senses. Only certain arguments of the function are observed, and the function value is observed only for certain combinations of values of these arguments. We solve the problem under a monotonicity condition that is natural in many applications, and we discuss applications to tax auditing, medicine, and real estate valuation. In particular, we display a special class of problems for which the best monotone prediction can be found in polynomial time. 1 Introduction The problem of establishing cause-effect relationships based on incomplete observations was studied in [4]. In this paper we address the problem of finding a good approximation of an unknown discrete function on the basis of a set of observations, which is incomplete in two senses. We observe the values of only ... GSIA Working Paper 1991.