
    Reconciling modern machine learning practice and the bias-variance trade-off

    Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in the data, simple enough to avoid fitting spurious patterns. However, in modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This "double descent" curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses and has implications for both the theory and practice of machine learning.
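    The interpolation regime this abstract discusses can be illustrated with a minimal sketch (not the paper's own experiments): minimum-norm least-squares regression on random Fourier features, where the model exactly fits the training data once the number of features reaches the number of samples. All names, the toy target function, and the capacity sweep below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: noisy samples of a smooth target function.
def target(x):
    return np.sin(2 * np.pi * x)

n_train = 20
x_train = rng.uniform(0, 1, n_train)
y_train = target(x_train) + 0.1 * rng.normal(size=n_train)
x_test = np.linspace(0, 1, 200)
y_test = target(x_test)

def random_features(x, w, b):
    # Random Fourier features: one simple knob for varying model capacity.
    return np.cos(np.outer(x, w) + b)

def fit_and_eval(n_features):
    w = rng.normal(scale=8.0, size=n_features)
    b = rng.uniform(0, 2 * np.pi, n_features)
    phi_tr = random_features(x_train, w, b)
    # lstsq returns the minimum-norm solution; once n_features >= n_train
    # the model interpolates the training data (train error ~ 0).
    coef, *_ = np.linalg.lstsq(phi_tr, y_train, rcond=None)
    train_mse = np.mean((phi_tr @ coef - y_train) ** 2)
    test_mse = np.mean((random_features(x_test, w, b) @ coef - y_test) ** 2)
    return train_mse, test_mse

# Sweep capacity from under-parameterized past the interpolation threshold.
for d in (5, 10, 20, 40, 200):
    tr, te = fit_and_eval(d)
    print(f"features={d:4d}  train_mse={tr:.2e}  test_mse={te:.2e}")
```

    Tracking the test error across such a sweep is how the paper's "double descent" curve is typically plotted: test error first follows the classical U-shape, then can fall again beyond the interpolation threshold.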

    Review on Machine Learning Algorithms for Weather Forecasting Issues

    Machine learning is a field of recent research that formally focuses on the theory, performance, and properties of learning systems and algorithms. It is a highly cross-disciplinary field, building upon ideas from many areas such as artificial intelligence, optimization theory, information theory, statistics, cognitive science, optimal control, and many other disciplines of science, engineering, and mathematics. Through its implementation in a wide range of applications, machine learning has reached almost every scientific domain and has had a great impact on science and society. Machine learning techniques have been used on a variety of problems, including recommendation engines, recognition systems, informatics and data mining, and autonomous control systems. This research paper compares different machine learning algorithms for classification. Classification is used when the desired output is a discrete label.
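    The classification setting described above (discrete labels as output) can be sketched with one of the simplest classifiers, a nearest-centroid rule, on a synthetic two-class dataset. The data, class labels, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-class dataset: two Gaussian blobs (a stand-in for, e.g.,
# weather observations labelled "rain" / "no rain").
n = 100
X0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(n, 2))
X1 = rng.normal(loc=[+2.0, 0.0], scale=1.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

def nearest_centroid_predict(X_train, y_train, X_new):
    # Predict the label of the closest class mean: a discrete output.
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

pred = nearest_centroid_predict(X, y, X)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

    Comparing algorithms, as the paper does, then amounts to evaluating several such classifiers on the same held-out data and reporting their accuracies side by side.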

    Improved decision making with similarity based machine learning

    Despite their fundamental importance for science and society at large, experimental design decisions are often plagued by extreme data scarcity, which severely hampers the use of modern ready-made machine learning models, as they rely heavily on the paradigm 'the bigger the data, the better'. Presenting similarity-based machine learning, we show how to reduce these data needs such that decision making can be objectively improved in certain problem classes. After introducing similarity machine learning for the harmonic oscillator and the Rosenbrock function, we describe real-world applications to very scarce data scenarios, which include (i) quantum mechanics based molecular design, (ii) organic synthesis planning, and (iii) real estate investment decisions in the city of Berlin, Germany.
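    The abstract does not specify the authors' similarity model, but the general idea of similarity-based prediction can be sketched with a k-nearest-neighbour regressor on the Rosenbrock function mentioned above, using a deliberately small training set to mimic data scarcity. The sample sizes, k, and names below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def rosenbrock(x, y):
    # The Rosenbrock test function referenced in the abstract.
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

# Deliberately small training set: similarity methods aim to work here.
X_train = rng.uniform(-2, 2, size=(50, 2))
y_train = rosenbrock(X_train[:, 0], X_train[:, 1])

def knn_predict(x_query, k=3):
    # Similarity-based prediction: average the targets of the k most
    # similar (here: Euclidean-nearest) training points.
    d = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(d)[:k]
    return y_train[idx].mean()

query = np.array([0.5, 0.5])
print("kNN estimate:", knn_predict(query))
print("true value  :", rosenbrock(*query))
```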

    Capturing human category representations by sampling in deep feature spaces

    Understanding how people represent categories is a core problem in cognitive science. Decades of research have yielded a variety of formal theories of categories, but validating them with naturalistic stimuli is difficult. The challenge is that human category representations cannot be directly observed, and running informative experiments with naturalistic stimuli such as images requires a workable representation of these stimuli. Deep neural networks have recently been successful in solving a range of computer vision tasks and provide a way to compactly represent image features. Here, we introduce a method to estimate the structure of human categories that combines ideas from cognitive science and machine learning, blending human-based algorithms with state-of-the-art deep image generators. We provide qualitative and quantitative results as a proof of concept for the method's feasibility. Samples drawn from human distributions rival those from state-of-the-art generative models in quality and outperform alternative methods for estimating the structure of human categories. Comment: 6 pages, 5 figures, 1 table. Accepted as a paper to the 40th Annual Meeting of the Cognitive Science Society (CogSci 2018).

    Teaching data science in school: Digital learning material on predictive text systems

    Data science and especially machine learning issues are currently the subject of lively discussions in society. Many research areas now use machine learning methods, which, especially in combination with increased computing power, has led to major advances in recent years. One example is natural language processing. A large number of technologies and applications that we use every day are based on methods from this area. For example, students encounter these technologies in everyday life through the use of Siri and Alexa, but also when chatting with friends, where they are supported by assistance systems such as predictive text systems that suggest the next word. This proximity to everyday life is used to give students a motivating approach to data science concepts. In this paper we show how the mathematical modeling of data science problems can be addressed with students from tenth grade or higher using digital learning material on predictive text systems.
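    A predictive text system of the kind described here can be modeled at school level with a simple bigram model: count which word follows which, then suggest the most frequent continuations. The tiny corpus and function names below are illustrative, not the paper's learning material.

```python
from collections import Counter, defaultdict

# Tiny corpus; in class, students could use chat messages or a book excerpt.
corpus = (
    "the cat sat on the mat . the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def suggest(word, n=3):
    # Suggest the n most frequent continuations, like a phone keyboard.
    return [w for w, _ in following[word].most_common(n)]

print(suggest("the"))  # the most likely words after "the"
```

    This already captures the core modeling idea students can discuss: the system's suggestions are determined entirely by frequencies in the training text.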

    Virtual Astronomy, Information Technology, and the New Scientific Methodology

    All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promise to transform scientific practice, but also pose a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in highly hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on modern society, commerce, the information economy, security, etc. There is a powerful emerging synergy between computationally enabled science and science-driven computing, which will drive progress in science, scholarship, and many other venues in the 21st century.

    Report on the Second Working Group Meeting of the “AG Marketing”

    In this article, we report on the second working group meeting of the “AG Marketing” within the GfKl Data Science Society. The meeting was held online on August 17 and 18, 2020. The presented topics reflect ongoing trends in the use of innovative methods and models for preference measurement, as well as new data sources and machine learning approaches in quantitative marketing.