3,199 research outputs found

    Decision by sampling

    Get PDF
    We present a theory of decision by sampling (DbS) in which, in contrast with traditional models, there are no underlying psychoeconomic scales. Instead, we assume that an attribute’s subjective value is constructed from a series of binary, ordinal comparisons to a sample of attribute values drawn from memory and is its rank within the sample. We assume that the sample reflects both the immediate distribution of attribute values from the current decision’s context and also the background, real-world distribution of attribute values. DbS accounts for concave utility functions; losses looming larger than gains; hyperbolic temporal discounting; and the overestimation of small probabilities and the underestimation of large probabilities

    Decision by sampling

    Get PDF
    We present a theory of decision by sampling (DbS) in which, in contrast with traditional models, there are no underlying psychoeconomic scales. Instead, we assume that an attribute's subjective value is constructed from a series of binary, ordinal comparisons to a sample of attribute values drawn from memory and is its rank within the sample. We assume that the sample reflects both the immediate distribution of attribute values from the current decision's context and also the background, real-world distribution of attribute values. DbS accounts for concave utility functions; losses looming larger than gains; hyperbolic temporal discounting; and the overestimation of small probabilities and the underestimation of large probabilities

    Learning, Social Intelligence and the Turing Test - why an "out-of-the-box" Turing Machine will not pass the Turing Test

    Get PDF
    The Turing Test (TT) checks for human intelligence, rather than any putative general intelligence. It involves repeated interaction requiring learning in the form of adaption to the human conversation partner. It is a macro-level post-hoc test in contrast to the definition of a Turing Machine (TM), which is a prior micro-level definition. This raises the question of whether learning is just another computational process, i.e. can be implemented as a TM. Here we argue that learning or adaption is fundamentally different from computation, though it does involve processes that can be seen as computations. To illustrate this difference we compare (a) designing a TM and (b) learning a TM, defining them for the purpose of the argument. We show that there is a well-defined sequence of problems which are not effectively designable but are learnable, in the form of the bounded halting problem. Some characteristics of human intelligence are reviewed including it's: interactive nature, learning abilities, imitative tendencies, linguistic ability and context-dependency. A story that explains some of these is the Social Intelligence Hypothesis. If this is broadly correct, this points to the necessity of a considerable period of acculturation (social learning in context) if an artificial intelligence is to pass the TT. Whilst it is always possible to 'compile' the results of learning into a TM, this would not be a designed TM and would not be able to continually adapt (pass future TTs). We conclude three things, namely that: a purely "designed" TM will never pass the TT; that there is no such thing as a general intelligence since it necessary involves learning; and that learning/adaption and computation should be clearly distinguished.Comment: 10 pages, invited talk at Turing Centenary Conference CiE 2012, special session on "The Turing Test and Thinking Machines

    Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

    Full text link
    Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of sphere in a higher-dimensional feature space as mapped by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. Moreover, the algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multiplier αi\alpha_i's are automatically selected and determined in each iteration. It can be seen that the complexity of our algorithm in each iteration is only O(k2)O(k^2), where kk is the number of support vectors. Experimental results on some real data sets indicate that FISVDD demonstrates significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value.Comment: 18 pages, 1 table, 4 figure

    Encrypted statistical machine learning: new privacy preserving methods

    Full text link
    We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and na\"{i}ve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods.Comment: 39 page

    Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality

    Full text link
    The final publication is available at Springer via http://dx.doi.org/DOI 10.1007/s10618-014-0378-6. Published online.Knowledge discovery on biomedical data can be based on on-line, data-stream analyses, or using retrospective, timestamped, off-line datasets. In both cases, changes in the processes that generate data or in their quality features through time may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of data temporal stability. This work establishes the temporal stability as a data quality dimension and proposes new methods for its assessment based on a probabilistic framework. Concretely, methods are proposed for (1) monitoring changes, and (2) characterizing changes, trends and detecting temporal subgroups. First, a probabilistic change detection algorithm is proposed based on the Statistical Process Control of the posterior Beta distribution of the Jensen–Shannon distance, with a memoryless forgetting mechanism. This algorithm (PDF-SPC) classifies the degree of current change in three states: In-Control, Warning, and Out-of-Control. Second, a novel method is proposed to visualize and characterize the temporal changes of data based on the projection of a non-parametric information-geometric statistical manifold of time windows. This projection facilitates the exploration of temporal trends using the proposed IGT-plot and, by means of unsupervised learning methods, discovering conceptually-related temporal subgroups. Methods are evaluated using real and simulated data based on the National Hospital Discharge Survey (NHDS) dataset.The work by C Saez has been supported by an Erasmus Lifelong Learning Programme 2013 Grant. This work has been supported by own IBIME funds. The authors thank Dr. Gregor Stiglic, from the Univeristy of Maribor, Slovenia, for his support on the NHDS data.Sáez Silvestre, C.; Pereira Rodrigues, P.; Gama, J.; Robles Viejo, M.; García Gómez, JM. (2014). Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality. Data Mining and Knowledge Discovery. 28:1-1. doi:10.1007/s10618-014-0378-6S1128Aggarwal C (2003) A framework for diagnosing changes in evolving data streams. In Proceedings of the International Conference on Management of Data ACM SIGMOD, pp 575–586Amari SI, Nagaoka H (2007) Methods of information geometry. American Mathematical Society, Providence, RIArias E (2014) United states life tables, 2009. Natl Vital Statist Rep 62(7): 1–63Aspden P, Corrigan JM, Wolcott J, Erickson SM (2004) Patient safety: achieving a new standard for care. Committee on data standards for patient safety. The National Academies Press, Washington, DCBasseville M, Nikiforov IV (1993) Detection of abrupt changes: theory and application. Prentice-Hall Inc, Upper Saddle River, NJBorg I, Groenen PJF (2010) Modern multidimensional scaling: theory and applications. Springer, BerlinBowman AW, Azzalini A (1997) Applied smoothing techniques for data analysis: the Kernel approach with S-plus illustrations (Oxford statistical science series). Oxford University Press, OxfordBrandes U, Pich C (2007) Eigensolver methods for progressive multidimensional scaling of large data. In: Kaufmann M, Wagner D (eds) Graph drawing. Lecture notes in computer science, vol 4372. Springer, Berlin, pp 42–53Brockwell P, Davis R (2009) Time series: theory and methods., Springer series in statisticsSpringer, BerlinCesario SK (2002) The “Christmas Effect” and other biometeorologic influences on childbearing and the health of women. J Obstet Gynecol Neonatal Nurs 31(5):526–535Chakrabarti K, Garofalakis M, Rastogi R, Shim K (2001) Approximate query processing using wavelets. VLDB J 10(2–3):199–223Cruz-Correia RJ, Pereira Rodrigues P, Freitas A, Canario Almeida F, Chen R, Costa-Pereira A (2010) Data quality and integration issues in electronic health records. Information discovery on electronic health records, pp 55–96Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Studia Sci Math Hungar 2:299–318Dasu T, Krishnan S, Lin D, Venkatasubramanian S, Yi K (2009) Change (detection) you can believe. In: Finding distributional shifts in data streams. In: Proceedings of the 8th international symposium on intelligent data analysis: advances in intelligent data analysis VIII, IDA ’09. Springer, Berlin, pp 21–34Endres D, Schindelin J (2003) A new metric for probability distributions. IEEE Trans Inform Theory 49(7):1858–1860Gama J, Gaber MM (2007) Learning from data streams: processing techniques in sensor networks. Springer, BerlinGama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Bazzan A, Labidi S (eds) Advances in artificial intelligence—SBIA 2004., Lecture notes in computer scienceSpringer, Berlin, pp 286–295Gama J (2010) Knowledge discovery from data streams, 1st edn. Chapman & Hall, LondonGehrke J, Korn F, Srivastava D (2001) On computing correlated aggregates over continual data streams. SIGMOD Rec 30(2):13–24Guha S, Shim K, Woo J (2004) Rehist: relative error histogram construction algorithms. In: Proceedings of the thirtieth international conference on very large data bases VLDB, pp 300–311Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques. Morgan Kaufmann, Elsevier, Burlington, MAHowden LM, Meyer JA, (2011) Age and sex composition. 2010 Census Briefs US Department of Commerce. Economics and Statistics Administration, US Census BureauHrovat G, Stiglic G, Kokol P, Ojstersek M (2014) Contrasting temporal trend discovery for large healthcare databases. Comput Methods Program Biomed 113(1):251–257Keim DA (2000) Designing pixel-oriented visualization techniques: theory and applications. IEEE Trans Vis Comput Graph 6(1):59–78Kifer D, Ben-David S, Gehrke J (2004) Detecting change in data streams. In: Proceedings of the thirtieth international conference on Very large data bases, VLDB Endowment, VLDB ’04, vol 30, pp 180–191Klinkenberg R, Renz I (1998) Adaptive information filtering: Learning in the presence of concept drifts. In: Workshop notes of the ICML/AAAI-98 workshop learning for text categorization. AAAI Press, Menlo Park, pp 33–40Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biolog Cybern 43(1):59–69Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inform Theory 37:145–151Mitchell TM, Caruana R, Freitag D, McDermott J, Zabowski D (1994) Experience with a learning personal assistant. Commun ACM 37(7):80–91Mouss H, Mouss D, Mouss N, Sefouhi L (2004) Test of page-hinckley, an approach for fault detection in an agro-alimentary production system. In: Proceedings of the 5th Asian Control Conference, vol 2, pp 815–818National Research Council (2011) Explaining different levels of longevity in high-income countries. The National Academies Press, Washington, DCNHDS (2010) United states department of health and human services. Centers for disease control and prevention. National center for health statistics. National hospital discharge survey codebookNHDS (2014) National Center for Health Statistics, National Hospital Discharge Survey (NHDS) data, US Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics, Hyattsville, Maryland. http://www.cdc.gov/nchs/nhds.htmPapadimitriou S, Sun J, Faloutsos C (2005) Streaming pattern discovery in multiple time-series. In: Proceedings of the 31st international conference on very large data bases, VLDB endowment, VLDB ’05, pp 697–708Parzen E (1962) On estimation of a probability density function and mode. Ann Math Statist 33(3):1065–1076Ramsay JO, Silverman BW (2005) Functional data analysis. Springer, New YorkRodrigues P, Correia R (2013) Streaming virtual patient records. In: Krempl G, Zliobaite I, Wang Y, Forman G (eds) Real-world challenges for data stream mining. University Magdeburg, Otto-von-Guericke, pp 34–37Rodrigues P, Gama J, Pedroso J (2008) Hierarchical clustering of time-series data streams. IEEE Trans Knowl Data Eng 20(5):615–627Rodrigues PP, Gama Ja (2010) A simple dense pixel visualization for mobile sensor data mining. In: Proceedings of the second international conference on knowledge discovery from sensor data, sensor-KDD’08. Springer, Berlin, pp 175–189Rodrigues PP, Gama J, Sebastiã o R (2010) Memoryless fading windows in ubiquitous settings. In Proceedings of ubiquitous data mining (UDM) workshop in conjunction with the 19th european conference on artificial intelligence—ECAI 2010, pp 27–32Rodrigues PP, Sebastiã o R, Santos CC (2011) Improving cardiotocography monitoring: a memory-less stream learning approach. In: Proceedings of the learning from medical data streams workshop. Bled, SloveniaRubner Y, Tomasi C, Guibas L (2000) The earth mover’s distance as a metric for image retrieval. Int J Comput Vision 40(2):99–121Sebastião R, Gama J (2009) A study on change detection methods. In: 4th Portuguese conference on artificial intelligenceSebastião R, Gama J, Rodrigues P, Bernardes J (2010) Monitoring incremental histogram distribution for change detection in data streams. In: Gaber M, Vatsavai R, Omitaomu O, Gama J, Chawla N, Ganguly A (eds) Knowledge discovery from sensor data, vol 5840., Lecture notes in computer science. Springer, Berlin, pp 25–42Sebastião R, Silva M, Rabiço R, Gama J, Mendonça T (2013) Real-time algorithm for changes detection in depth of anesthesia signals. Evol Syst 4(1):3–12Sáez C, Martínez-Miranda J, Robles M, García-Gómez JM (2012) O rganizing data quality assessment of shifting biomedical data. Stud Health Technol Inform 180:721–725Sáez C, Robles M, García-Gómez JM (2013) Comparative study of probability distribution distances to define a metric for the stability of multi-source biomedical research data. In: Engineering in medicine and biology society (EMBC), 2013 35th annual international conference of the IEEE, pp 3226–3229Sáez C, Robles M, García-Gómez JM (2014) Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances. Statist Method Med Res (forthcoming)Shewhart WA, Deming WE (1939) Statistical method from the viewpoint of quality control. Graduate School of the Department of Agriculture, Washington, DCShimazaki H, Shinomoto S (2010) Kernel bandwidth optimization in spike rate estimation. J Comput Neurosci 29(1–2):171–182Solberg LI, Engebretson KI, Sperl-Hillen JM, Hroscikoski MC, O’Connor PJ (2006) Are claims data accurate enough to identify patients for performance measures or quality improvement? the case of diabetes, heart disease, and depression. Am J Med Qual 21(4):238–245Spiliopoulou M, Ntoutsi I, Theodoridis Y, Schult R (2006) monic: modeling and monitoring cluster transitions. In: Proceedings of the 12th ACm SIGKDD international conference on knowledge discovery and data mining, KDD ’06. ACm, New York, NY, pp 706–711Stiglic G, Kokol P (2011) Interpretability of sudden concept drift in medical informatics domain. In Proceedings of the 2010 IEEE international conference on data mining workshops, pp 609–613Torgerson W (1952) Multidimensional scaling: I theory and method. Psychometrika 17(4):401–419Wang RY, Strong DM (1996) Beyond accuracy: what data quality means to data consumers. J Manage Inform Syst 12(4):5–33Weiskopf NG, Weng C (2013) M ethods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 20(1):144–151Wellings K, Macdowall W, Catchpole M, Goodrich J (1999) Seasonal variations in sexual activity and their implications for sexual health promotion. J R Soc Med 92(2):60–64Westgard JO, Barry PL (2010) Basic QC practices: training in statistical quality control for medical laboratories. Westgard Quality Corporation, Madison, WIWidmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–10

    Gaussian Artmap: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps

    Full text link
    A new neural network architecture for incremental supervised learning of analalog multidimensional maps is introduced. The architecture, called Gaussian ARTMAP, is a synthesis of a Gaussian classifier and an Adaptive Resonance Theory (ART) neural network, achieved by defining the ART choice function as the discriminant function of a Gaussian classifer with separable distributions, and the ART match function as the same, but with the a priori probabilities of the distributions discounted. While Gaussian ARTMAP retains the attractive parallel computing and fast learning properties of fuzzy ARTMAP, it learns a more efficient internal representation of a mapping while being more resistant to noise than fuzzy ARTMAP on a number of benchmark databases. Several simulations are presented which demonstrate that Gaussian ARTMAP consistently obtains a better trade-off of classification rate to number of categories than fuzzy ARTMAP. Results on a vowel classiflcation problem are also presented which demonstrate that Gaussian ARTMAP outperforms many other classifiers.National Science Foundation (IRI 90-00530); Office of Naval Research (N00014-92-J-4015, 40014-91-J-4100
    corecore