
    Decision tree design from a communication theory standpoint

    A communication theory approach to decision tree design based on a top-down mutual information algorithm is presented. It is shown that this algorithm is equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived. The bounds are used in conjunction with a rate-distortion interpretation of tree design to explain several phenomena previously observed in practical decision-tree design. A termination rule for the algorithm, called the delta-entropy rule, is proposed that improves its robustness in the presence of noise. Simulation results are presented, showing that the tree classifiers derived by the algorithm compare favourably to the single nearest neighbour classifier.
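The core of a top-down mutual information algorithm can be sketched as a greedy split search: at each node, choose the test whose outcome shares the most information with the class label. The following is a minimal illustration, not the paper's actual procedure; the feature representation (boolean tests keyed by name) is an assumption made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label multiset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(samples, labels, features):
    """Greedily pick the boolean feature test that maximizes mutual
    information between test outcome and class label (equivalently,
    minimizes the expected entropy of the child nodes)."""
    base = entropy(labels)
    best = None
    for f in features:
        left = [y for x, y in zip(samples, labels) if x[f]]
        right = [y for x, y in zip(samples, labels) if not x[f]]
        if not left or not right:
            continue  # degenerate split carries no information
        n = len(labels)
        cond = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - cond  # mutual information I(test; class)
        if best is None or gain > best[1]:
            best = (f, gain)
    return best
```

Recursing on each child until a stopping criterion fires (the paper's delta-entropy rule would terminate when the entropy reduction falls below a threshold) yields the full tree.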

    Query Learning with Exponential Query Costs

    In query learning, the goal is to identify an unknown object while minimizing the number of "yes" or "no" questions (queries) posed about that object. A well-studied algorithm for query learning is known as generalized binary search (GBS). We show that GBS is a greedy algorithm to optimize the expected number of queries needed to identify the unknown object. We also generalize GBS in two ways. First, we consider the case where the cost of querying grows exponentially in the number of queries and the goal is to minimize the expected exponential cost. Then, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. We derive algorithms to address these issues in a common, information-theoretic framework. In particular, we present an exact formula for the objective function in each case involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. Our algorithms are demonstrated on two applications of query learning, active learning and emergency response.
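The greedy step in generalized binary search can be sketched as follows: among the available queries, pick the one whose yes/no answer splits the remaining probability mass as evenly as possible. This is an illustrative sketch of the standard GBS selection rule, not the paper's generalized (exponential-cost or group-identification) variants.

```python
def gbs_query(hypotheses, queries):
    """Greedy GBS step: choose the query whose answer most evenly
    bisects the prior probability mass over the remaining hypotheses.

    hypotheses: dict mapping candidate object -> prior probability
    queries: list of predicates, each mapping object -> bool
    """
    total = sum(hypotheses.values())

    def imbalance(q):
        yes = sum(p for h, p in hypotheses.items() if q(h))
        return abs(2 * yes - total)  # 0 means a perfect halving

    return min(queries, key=imbalance)
```

Repeatedly asking the selected query, discarding inconsistent hypotheses, and reselecting drives the expected number of queries toward the entropy lower bound.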

    Speeding up rendering of hybrid surface and volume models

    Hybrid rendering of volume and polygonal models is an interesting feature of visualization systems, since it helps users to better understand the relationships between internal structures of the volume and fitted surfaces as well as external surfaces. Most of the existing bibliography focuses on the problem of correctly integrating both types of information in depth. The rendering method proposed in this paper is built on these previous results. It is aimed at solving a different problem: how to efficiently access selected information of a hybrid model. We propose to construct a decision tree (the Rendering Decision Tree), which together with an auxiliary run-length representation of the model avoids visiting unselected surfaces and internal regions during a traversal of the model.
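The benefit of the auxiliary run-length representation can be illustrated with a small sketch: consecutive unselected cells collapse into a single run that the traversal skips in one step instead of cell by cell. The data layout below (length, selected-flag, payload triples) is an assumption made for the example, not the paper's actual structure.

```python
def traverse_selected(runs, visit):
    """Walk a run-length encoded model, visiting only selected runs.

    runs: list of (length, selected, payload) triples covering the model
    visit: callback receiving (offset, length, payload) for selected runs

    Unselected runs advance the offset in O(1) rather than being
    visited element by element.
    """
    offset = 0
    for length, selected, payload in runs:
        if selected:
            visit(offset, length, payload)
        offset += length
```

In the paper's setting, the Rendering Decision Tree would play the role of deciding which regions are selected before such a skip-traversal is performed.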

    Search Through Systematic Set Enumeration

    In many problem domains, solutions take the form of unordered sets. We present the Set-Enumeration (SE) tree, a vehicle for representing sets and/or enumerating them in a best-first fashion. We demonstrate its usefulness as the basis for a unifying search-based framework for domains where minimal (maximal) elements of a power set are targeted, where minimal (maximal) partial instantiations of a set of variables are sought, or where a composite decision is not dependent on the order in which its primitive component-decisions are taken. Particular instantiations of SE-tree-based algorithms for some AI problem domains are used to demonstrate the general features of the approach. These algorithms are compared theoretically and empirically with current algorithms.
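The key property of an SE-tree is that each node extends its set only with items that come after the node's last element in a fixed total order, so every subset appears exactly once. A minimal depth-first sketch of that enumeration scheme (best-first variants would order the expansion queue by a heuristic instead):

```python
def se_tree(items):
    """Enumerate all subsets of an ordered item list via the SE-tree.

    Each node is extended only with items of strictly larger index than
    its last element, so the tree contains every subset exactly once and
    no duplicate work is done."""
    def expand(node, start):
        yield node
        for i in range(start, len(items)):
            yield from expand(node + [items[i]], i + 1)
    yield from expand([], 0)
```

The same skeleton underlies many set-oriented searches (e.g. frequent-itemset mining), where pruning a node cuts off its entire subtree of supersets.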

    On the Qualitative Behavior of Impurity-Based Splitting Rules I: The Minima-Free Property

    We show that all strictly concave (convex-∩) impurity measures lead to splits at boundary points, and furthermore show that certain rational splitting rules, notably the information gain ratio, also have this property. A slightly weaker result is shown to hold for impurity measures that are only concave, such as Inaccuracy.
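The distinction the abstract draws can be illustrated numerically: the Gini index is strictly concave in the class proportion, while the misclassification rate (Inaccuracy) is only concave, being piecewise linear. This is a small sketch of that property, not the paper's boundary-point argument itself.

```python
def gini(p):
    """Gini impurity for binary class proportion p: strictly concave."""
    return 2 * p * (1 - p)

def misclassification(p):
    """Inaccuracy / misclassification rate: concave but not strictly,
    since it is linear on each half of [0, 1]."""
    return min(p, 1 - p)

# Strict concavity: the midpoint value strictly exceeds the chord.
strict = gini(0.3) > (gini(0.2) + gini(0.4)) / 2
# Mere concavity: on a linear segment, midpoint and chord coincide.
flat = misclassification(0.3) == 0.3
```

It is this strict curvature that forces impurity-minimizing splits onto boundary points between class regions.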

    Projecting land use changes using parcel-level data: model development and application to Hunterdon County, New Jersey

    Get PDF
    This dissertation develops a parcel-based spatial land use change prediction model by coupling machine learning and interpretation algorithms, namely cellular automata (CA) and decision trees (DT). A CA is a collection of cells that evolves through a number of discrete time steps according to a set of transition rules based on the state of each cell and the characteristics of its neighboring cells. A DT is a data mining and machine learning tool that extracts the patterns of a decision process from observed cell behaviors and the factors affecting them. In this dissertation, CA is used to predict the future land use status of cadastral parcels based on a set of transition rules derived, using DT, from a set of identified land use change driving factors. Although CA and DT have been applied separately in various land use change models in the literature, no previous studies have attempted to integrate them. The DT-based CA model developed in this dissertation represents the first such integration in land use change modeling. The coupled model can handle a large set of driving factors and also avoids subjective bias when deriving the transition rules. It uses the cadastral parcel as the unit of analysis, which has practical policy implications because the response of land use change to policy usually takes place at the parcel level. Since parcels vary in size and shape, their use as a unit of analysis makes it difficult to apply CA, which was originally designed to handle regular grid cells. This dissertation improves the treatment of irregular cells in CA-based land use change models by defining a cell's neighborhood as a fixed-distance buffer along the parcel boundary. The DT-based CA model was developed and validated in Hunterdon County, New Jersey.
The data on historical land uses and various land use change driving factors for Hunterdon County were collected and processed using a Geographic Information System (GIS). Specifically, the county land uses in 1986, 1995, and 2002 were overlaid with a parcel map to create parcel-based land use maps. The single land use assigned to each parcel is based on a classification scheme developed through literature review and empirical testing in the study area. The possible land use status considered for each parcel is agriculture, barren land, forest, urban, water, or wetlands, following the land use/land cover classification of the New Jersey Department of Environmental Protection. The identified driving factors for the future status of a parcel include the present land use type, the number of soil restrictions to urban development, the size of the parcel, the amount of wetlands within the parcel, the distribution of land uses in the neighborhood of the parcel, and the distances to the nearest streams, urban centers, and major roads. A set of transition rules describing the land use change processes during the period 1986-1995 was developed using the DT software J48 classifier. The derived transition rules were applied to the 1995 land use data in the CA model Agent Analyst/RePast (Recursive Porous Agent Simulation Toolkit) to predict the spatial land use pattern in 2004, which was then validated against the actual land use map of 2002. The DT-based CA model had an overall accuracy of 84.46 percent in terms of the number of parcels and 80.92 percent in terms of the total acreage in predicting land use changes. The model shows much higher capacity in predicting quantitative changes than locational changes in land use. The validated model was applied to simulate the 2011 land use patterns in Hunterdon County based on its actual land uses in 2002 under both business-as-usual and policy scenarios.
The simulation results show that successfully implementing current land use policies such as down zoning, open space, and farmland preservation would prevent a total of 7,053 acres (741 acres of wetlands, 3,034 acres of agricultural lands, 250 acres of barren land, and 3,028 acres of forest) from future urban development in Hunterdon County during the period 2002-2011. The neighborhood of a parcel was defined by a 475-foot buffer along the parcel boundary in the study. The results of sensitivity analyses using two additional neighborhoods (237- and 712-foot buffers) indicate that the neighborhood size has an insignificant impact on the model outputs in this application.
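The coupling the dissertation describes can be sketched in miniature: a synchronous CA update where each parcel's next land use is produced by a transition rule applied to its current state and the land-use mix in its neighborhood buffer. The toy rule below is a hypothetical stand-in for the J48-derived rules, and the graph-style neighborhood is an assumption made for the example.

```python
def ca_step(parcels, neighbors, rule):
    """One synchronous CA update over irregular parcels.

    parcels: dict parcel_id -> current land use label
    neighbors: dict parcel_id -> list of parcel ids inside its buffer
    rule: function (state, neighborhood_mix) -> next state,
          standing in for the decision-tree-derived transition rules
    """
    nxt = {}
    for p, state in parcels.items():
        mix = {}
        for q in neighbors[p]:
            mix[parcels[q]] = mix.get(parcels[q], 0) + 1
        nxt[p] = rule(state, mix)
    return nxt

# Hypothetical toy rule: agriculture urbanizes when a majority of the
# parcels in its neighborhood buffer are already urban.
def toy_rule(state, mix):
    if state == "agriculture" and mix.get("urban", 0) > sum(mix.values()) / 2:
        return "urban"
    return state
```

Iterating `ca_step` over successive time steps, with rules learned from one historical period, is the essence of projecting land use into a later period.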