Reconciling modern machine learning practice and the bias-variance trade-off
Breakthroughs in machine learning are rapidly changing science and society,
yet our fundamental understanding of this technology has lagged far behind.
Indeed, one of the central tenets of the field, the bias-variance trade-off,
appears to be at odds with the observed behavior of methods used in modern
machine learning practice. The bias-variance trade-off implies that a model
should balance under-fitting and over-fitting: rich enough to express
underlying structure in data, simple enough to avoid fitting spurious patterns.
However, in modern practice, very rich models such as neural networks are
trained to exactly fit (i.e., interpolate) the data. Classically, such models
would be considered over-fit, and yet they often obtain high accuracy on test
data. This apparent contradiction has raised questions about the mathematical
foundations of machine learning and their relevance to practitioners.
In this paper, we reconcile the classical understanding and the modern
practice within a unified performance curve. This "double descent" curve
subsumes the textbook U-shaped bias-variance trade-off curve by showing how
increasing model capacity beyond the point of interpolation results in improved
performance. We provide evidence for the existence and ubiquity of double
descent for a wide spectrum of models and datasets, and we posit a mechanism
for its emergence. This connection between the performance and the structure of
machine learning models delineates the limits of classical analyses, and has
implications for both the theory and practice of machine learning.
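The double descent behavior described above can be illustrated with a minimal sketch (not the paper's own experiments): minimum-norm least squares on random Fourier features, where `np.linalg.pinv` selects the smallest-norm interpolant once the feature count exceeds the number of training points. All data and parameter choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 noisy samples of a sine wave.
n_train = 20
x_train = rng.uniform(-1.0, 1.0, n_train)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(n_train)
x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)

def fit_random_features(n_features, seed=1):
    """Least-squares fit on random Fourier features. np.linalg.pinv
    returns the minimum-norm solution, so once n_features >= n_train
    the model interpolates the training data exactly."""
    frng = np.random.default_rng(seed)
    w = 3.0 * frng.standard_normal(n_features)       # random frequencies
    b = frng.uniform(0.0, 2.0 * np.pi, n_features)   # random phases
    phi = lambda x: np.cos(np.outer(x, w) + b)
    coef = np.linalg.pinv(phi(x_train)) @ y_train
    train_mse = np.mean((phi(x_train) @ coef - y_train) ** 2)
    test_mse = np.mean((phi(x_test) @ coef - y_test) ** 2)
    return train_mse, test_mse

# Sweep capacity through the interpolation point (20 features = 20 points).
for m in [5, 10, 20, 40, 200, 1000]:
    train_mse, test_mse = fit_random_features(m)
    print(f"{m:5d} features: train MSE {train_mse:.2e}, test MSE {test_mse:.3f}")
```

In setups like this, test error typically rises toward the interpolation threshold and falls again in the heavily over-parameterized regime, tracing the double descent curve.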
Review on Machine Learning Algorithms for Weather Forecasting Issues
Machine learning is a field of recent research that formally focuses on the theory, performance, and properties of learning systems and algorithms. It is a highly cross-disciplinary field, building upon ideas from many different areas such as artificial intelligence, optimization theory, information theory, statistics, cognitive science, optimal control, and many other disciplines of science, engineering, and mathematics. Through implementation in a wide range of applications, machine learning has reached almost every scientific domain and has had a great impact on science and society. Machine learning techniques have been used on a variety of problems, including recommendation engines, recognition systems, informatics and data mining, and autonomous control systems. This research paper compares different machine learning algorithms for classification. Classification is used when the desired output is a discrete label.
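As a minimal sketch of what "classification with a discrete label" means (the abstract does not specify which algorithms the paper compares), the following compares two simple classifiers, k-nearest-neighbour and nearest centroid, on synthetic weather-like data; the features, class parameters, and labels are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data with weather-flavoured features
# (temperature, humidity); labels 0 = "no rain", 1 = "rain".
n = 200
X0 = rng.normal([25.0, 40.0], [3.0, 8.0], size=(n, 2))  # dry days
X1 = rng.normal([18.0, 75.0], [3.0, 8.0], size=(n, 2))  # rainy days
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

# Shuffle and split into train/test.
idx = rng.permutation(2 * n)
X, y = X[idx], y[idx]
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

def knn_predict(X_train, y_train, X_test, k=5):
    """Plain k-nearest-neighbour majority vote."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def centroid_predict(X_train, y_train, X_test):
    """Assign each point to the nearest class centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)

acc_knn = (knn_predict(X_train, y_train, X_test) == y_test).mean()
acc_centroid = (centroid_predict(X_train, y_train, X_test) == y_test).mean()
print(f"k-NN accuracy: {acc_knn:.2f}, nearest-centroid accuracy: {acc_centroid:.2f}")
```

Comparing held-out accuracy like this is the basic pattern behind any such algorithm comparison, whatever the actual classifiers under study.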
Improved decision making with similarity based machine learning
Despite their fundamental importance for science and society at large,
experimental design decisions are often plagued by extreme data scarcity which
severely hampers the use of modern ready-made machine learning models as they
rely heavily on the paradigm, 'the bigger the data the better'. Presenting
similarity based machine learning we show how to reduce these data needs such
that decision making can be objectively improved in certain problem classes.
After introducing similarity machine learning for the harmonic oscillator and
the Rosenbrock function, we describe real-world applications to very scarce
data scenarios which include (i) quantum mechanics based molecular design, (ii)
organic synthesis planning, and (iii) real estate investment decisions in the
city of Berlin, Germany.
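The harmonic oscillator mentioned above lends itself to a small sketch of one similarity-based approach, kernel ridge regression, where predictions are similarity-weighted combinations of a handful of training labels. This is an illustration of the general idea, not the paper's specific method; the kernel width and data are assumptions.

```python
import numpy as np

# Scarce training data: 8 samples of the 1-D harmonic oscillator
# potential E(x) = 0.5 * k * x^2 (k = 1 here).
x_train = np.linspace(-2.0, 2.0, 8)
y_train = 0.5 * x_train ** 2

def gaussian_kernel(a, b, sigma=0.5):
    """Similarity between inputs: Gaussian of their distance."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

# Kernel ridge regression: solve (K + lambda*I) alpha = y, then
# predict as similarity-weighted sums of the training labels.
lam = 1e-8  # tiny regulariser for numerical stability
K = gaussian_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

x_test = np.linspace(-2.0, 2.0, 101)
y_pred = gaussian_kernel(x_test, x_train) @ alpha
mae = np.mean(np.abs(y_pred - 0.5 * x_test ** 2))
print(f"mean absolute error on [-2, 2]: {mae:.4f}")
```

The point of the sketch is the data efficiency: eight samples suffice to recover the potential closely on the whole interval, because the model reasons by similarity to known examples rather than requiring large data.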
Capturing human category representations by sampling in deep feature spaces
Understanding how people represent categories is a core problem in cognitive
science. Decades of research have yielded a variety of formal theories of
categories, but validating them with naturalistic stimuli is difficult. The
challenge is that human category representations cannot be directly observed
and running informative experiments with naturalistic stimuli such as images
requires a workable representation of these stimuli. Deep neural networks have
recently been successful in solving a range of computer vision tasks and
provide a way to compactly represent image features. Here, we introduce a
method to estimate the structure of human categories that combines ideas from
cognitive science and machine learning, blending human-based algorithms with
state-of-the-art deep image generators. We provide qualitative and quantitative
results as a proof-of-concept for the method's feasibility. Samples drawn from
human distributions rival those from state-of-the-art generative models in
quality and outperform alternative methods for estimating the structure of
human categories. Comment: 6 pages, 5 figures, 1 table. Accepted as a paper to the 40th Annual Meeting of the Cognitive Science Society (CogSci 2018).
Teaching data science in school: Digital learning material on predictive text systems
Data science and especially machine learning are currently the subject of lively discussion in society. Many research areas now use machine learning methods, which, especially in combination with increased computing power, have led to major advances in recent years. One example is natural language processing. A large number of technologies and applications that we use every day are based on methods from this area. For example, students encounter these technologies in everyday life through the use of Siri and Alexa, but also when chatting with friends, where they are supported by assistance systems such as predictive text systems that suggest the next word. This proximity to everyday life offers students a motivating approach to data science concepts. In this paper we show how mathematical modeling of data science problems can be addressed with students from tenth grade or higher using digital learning material on predictive text systems.
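A predictive text system of the kind described can be sketched at tenth-grade level with a bigram model: count which word follows which, then suggest the most frequent continuations. The toy corpus below is an assumption; real keyboards train on vastly more text.

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus; a real predictive text system is trained
# on far more data.
corpus = (
    "i am going to school today . i am going home now . "
    "we are going to the park . i am happy today ."
).split()

# Count bigrams: how often each word follows each other word.
follows = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    follows[prev][word] += 1

def suggest(prev_word, n=3):
    """Suggest the n most frequent continuations, like the suggestion
    bar on a phone keyboard."""
    return [w for w, _ in follows[prev_word].most_common(n)]

print(suggest("am"))     # most frequent words observed after "am"
print(suggest("going"))  # most frequent words observed after "going"
```

This counting model is exactly the kind of mathematical modeling the material targets: students can build the frequency table by hand before seeing it in code.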
Virtual Astronomy, Information Technology, and the New Scientific Methodology
All sciences, including astronomy, are now entering the era of information abundance. The exponentially increasing volume and complexity of modern data sets promise to transform the scientific practice, but also pose a number of common technological challenges. The Virtual Observatory concept is the astronomical community's response to these challenges: it aims to harness the progress in information technology in the service of astronomy, and at the same time provide a valuable testbed for information technology and applied computer science. Challenges broadly fall into two categories: data handling (or "data farming"), including issues such as archives, intelligent storage, databases, interoperability, fast networks, etc., and data mining, data understanding, and knowledge discovery, which include issues such as automated clustering and classification, multivariate correlation searches, pattern recognition, visualization in highly hyperdimensional parameter spaces, etc., as well as various applications of machine learning in these contexts. Such techniques are forming a methodological foundation for science with massive and complex data sets in general, and are likely to have a much broader impact on modern society, commerce, information economy, security, etc. There is a powerful emerging synergy between computationally enabled science and science-driven computing, which will drive the progress in science, scholarship, and many other venues in the 21st century.
Report on the Second Working Group Meeting of the “AG Marketing”
In this article, we report on the second working group meeting of the “AG Marketing” within the GfKl Data Science Society. The meeting was held online on August 17 and 18, 2020. The presented topics reflect ongoing trends of using innovative methods and models for preference measurement as well as new data sources and machine learning approaches in quantitative marketing.