21,720 research outputs found

    Robust Classification for Imprecise Environments

    In real-world environments it usually is difficult to specify target operating conditions precisely, for example, target misclassification costs. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. In some cases, the performance of the hybrid actually can surpass that of the best known classifier. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. The hybrid also is efficient to build, to store, and to update. The hybrid is based on a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull (ROCCH) method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. Finally, we point to empirical evidence that a robust hybrid classifier indeed is needed for many real-world problems.
    Comment: 24 pages, 12 figures. To be published in Machine Learning Journal. For related papers, see http://www.hpl.hp.com/personal/Tom_Fawcett/ROCCH
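The ROC convex hull idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `roc_convex_hull` and `best_operating_point`, the example points, and the expected-cost formula used to pick an operating point are assumptions made for illustration. Each classifier is represented by one (false positive rate, true positive rate) point, and only the points on the upper hull can be optimal for some combination of class distribution and misclassification costs.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b; negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of ROC points, running from (0, 0) to (1, 1)."""
    # The trivial "always negative" and "always positive" classifiers
    # are always available, so they are added to the candidate set.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # joining its predecessor to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def best_operating_point(hull: List[Point], p_pos: float,
                         cost_fn: float, cost_fp: float) -> Point:
    """Hull point with the lowest expected cost (assumed cost model)."""
    def expected_cost(pt: Point) -> float:
        fpr, tpr = pt
        return p_pos * (1.0 - tpr) * cost_fn + (1.0 - p_pos) * fpr * cost_fp
    return min(hull, key=expected_cost)


# Hypothetical classifiers; (0.3, 0.5) is dominated and falls off the hull.
classifiers = [(0.1, 0.6), (0.3, 0.5), (0.4, 0.9)]
hull = roc_convex_hull(classifiers)
```

With these made-up points, a costly false negative favors the more liberal classifier at (0.4, 0.9), while a costly false positive favors the conservative one at (0.1, 0.6); the dominated point is never selected under this cost model.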

    Characterizing urban landscapes using fuzzy sets

    Characterizing urban landscapes is important given the present and future projections of global population that favor urban growth. The definition of “urban” on a thematic map has proven to be problematic since urban areas are heterogeneous in terms of land use and land cover. Further, certain urban classes are inherently imprecise due to the difficulty in integrating various social and environmental inputs into a precise definition. Social components often include demographic patterns, transportation, building type and density while ecological components include soils, elevation, hydrology, climate, vegetation and tree cover. In this paper, we adopt a coupled human and natural system (CHANS) integrated scientific framework for characterizing urban landscapes. We implement the framework by adopting a fuzzy sets concept of “urban characterization” since fuzzy sets relate to classes of objects with imprecise boundaries in which membership is a matter of degree. For dynamic mapping applications, user-defined classification schemes involving rules combining different social and ecological inputs can lead to a degree of quantification in class labeling varying from “highly urban” to “least urban”. A socio-economic perspective of urban may include threshold values for population and road network density while a more ecological perspective of urban may utilize the ratio of natural versus built area and percent forest cover. Threshold values are defined to derive the fuzzy rules of membership, in each case, and various combinations of rules offer a greater flexibility to characterize the many facets of the urban landscape. We illustrate the flexibility and utility of this fuzzy inference approach, called the Fuzzy Urban Index, for the Boston Metro region with five inputs and eighteen rules. The resulting classification map shows levels of fuzzy membership ranging from highly urban to least urban or rural in the Boston study region. We validate our approach using two experts assessing the accuracy of the resulting fuzzy urban map. We discuss how our approach can be applied in other urban contexts with newly emerging descriptors of urban sustainability, urban ecology and urban metabolism.
    This research was partially supported by "Boston University Initiative on Cities Early Stage Urban Research Awards 2015-16" (Gopal & Phillips) and the Frederick S. Pardee Center for the Study of the Longer-Range Future at Boston University. We thank the anonymous reviewers for their careful reading of our manuscript and their many insightful comments and suggestions.
    https://doi.org/10.1016/j.compenvurbsys.2016.02.002 (Published version)

    Statistical Modelling in Surveys without Neglecting "The Undecided": Multinomial Logistic Regression Models and Imprecise Classification Trees under Ontic Data Imprecision - extended version

    In surveys, and most notably in election polls, undecided participants frequently constitute subgroups of their own with specific individual characteristics. While traditional survey methods and corresponding statistical models are inherently damned to neglect this valuable information, an ontic random set view provides us with the full power of the whole statistical modelling framework. We elaborate this idea for a multinomial logistic regression model (which can be derived as a discrete choice model for voting behaviour) and an imprecise classification tree, and apply them as a prototypic illustration to the German Longitudinal Election Study 2013. Our results corroborate the importance of sophisticated, random set-based modelling. Furthermore, by reinterpreting the undecided respondents' answers as disjunctive random sets, general forecasts based on interval-valued point estimators are calculated.

    Statistical modelling under epistemic data imprecision : some results on estimating multinomial distributions and logistic regression for coarse categorical data

    Paper presented at the 9th International Symposium on Imprecise Probability: Theories and Applications, Pescara, Italy, 2015. Abstract: The paper deals with parameter estimation for categorical data under epistemic data imprecision, where for a part of the data only coarse(ned) versions of the true values are observable. For different observation models formalizing the information available on the coarsening process, we derive the (typically set-valued) maximum likelihood estimators of the underlying distributions. We discuss the homogeneous case of independent and identically distributed variables as well as logistic regression under a categorical covariate. We start with the imprecise point estimator under an observation model describing the coarsening process without any further assumptions. Then we determine several sensitivity parameters that allow the refinement of the estimators in the presence of auxiliary information.
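As a rough illustration of what an imprecise point estimate can look like when nothing is assumed about the coarsening process, the sketch below computes per-category bounds for i.i.d. categorical data. It is an assumption-laden example and not the paper's estimator: the function name `probability_bounds` and the survey data are hypothetical. The lower bound counts only observations that are precisely the category, and the upper bound counts every observation whose set still contains it.

```python
from fractions import Fraction
from typing import Dict, Iterable, Set, Tuple


def probability_bounds(
    observations: Iterable[Set[str]],
) -> Dict[str, Tuple[Fraction, Fraction]]:
    """Lower and upper relative frequencies per category under coarsening.

    Each observation is the set of categories the true value may belong to:
    a singleton for a precise answer, a larger set for a coarse one.
    """
    obs = [frozenset(o) for o in observations]
    n = len(obs)
    categories = set().union(*obs)
    bounds: Dict[str, Tuple[Fraction, Fraction]] = {}
    for c in sorted(categories):
        # Lower bound: every coarse observation is resolved against c.
        lower = sum(1 for o in obs if o == {c})
        # Upper bound: every coarse observation containing c is resolved to c.
        upper = sum(1 for o in obs if c in o)
        bounds[c] = (Fraction(lower, n), Fraction(upper, n))
    return bounds


# Hypothetical data: 4 precise "A", 3 precise "B", 3 coarse "A or B" answers.
sample = [{"A"}] * 4 + [{"B"}] * 3 + [{"A", "B"}] * 3
bounds = probability_bounds(sample)
```

For this made-up sample the interval for "A" runs from 2/5 to 7/10. Auxiliary information about the coarsening process, as the abstract mentions, would narrow such intervals.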