236,025 research outputs found
An extensive experimental survey of regression methods
Regression is a very relevant problem in machine learning, with many different available approaches. The current work presents a comparison of a large collection of 77 popular regression models belonging to 19 families: linear and generalized linear models, generalized additive models, least squares, projection methods, LASSO and ridge regression, Bayesian models, Gaussian processes, quantile regression, nearest neighbors, regression trees and rules, random forests, bagging and boosting, neural networks, deep learning and support vector regression. These methods are evaluated using all the regression datasets of the UCI machine learning repository (83 datasets), with some exceptions due to technical reasons. The experimental work identifies several outstanding regression models: the M5 rule-based model with corrections based on nearest neighbors (cubist), the gradient boosted machine (gbm), the boosting ensemble of regression trees (bstTree) and the M5 regression tree. Cubist achieves the best squared correlation (R2) in 15.7% of datasets and is otherwise very near to it, with a difference below 0.2 for 89.1% of datasets; the median of these differences over the dataset collection is very low (0.0192), compared e.g. to classical linear regression (0.150). However, cubist is slow and fails on several large datasets, while other similar regression models such as M5 never fail, and M5's difference to the best R2 is below 0.2 for 92.8% of datasets. Other well-performing regression models are the committee of neural networks (avNNet), extremely randomized regression trees (extraTrees, which achieves the best R2 in 33.7% of datasets), random forest (rf) and ε-support vector regression (svr), but they are slower and fail on several datasets. The fastest regression model is least angle regression (lars), which is 70 and 2,115 times faster than M5 and cubist, respectively.
The model which requires the least memory is non-negative least squares (nnls), about 2 GB, similar to cubist, while M5 requires about 8 GB. For 97.6% of datasets there is a regression model among the 10 best which is very near (difference below 0.1) to the best R2, which increases to 100% allowing differences of 0.2. Therefore, provided that our dataset and model collection are representative enough, the main conclusion of this study is that, for a new regression problem, some model in our top-10 should achieve R2 near to the best attainable for that problem. This work has received financial support from the Erasmus Mundus Euphrates programme [project number 2013-2540/001-001-EMA2], from the Xunta de Galicia (Centro singular de investigación de Galicia, accreditation 2016-2019) and the European Union (European Regional Development Fund - ERDF), Project MTM2016-76969-P (Spanish State Research Agency, AEI) co-funded by the European Regional Development Fund (ERDF) and IAP network from Belgian Science Policy.
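The comparison described in this abstract rests on two quantities: the squared correlation (R2) of each model on each dataset, and the gap between a model's R2 and the best R2 on that dataset. The following is a minimal sketch of both, not the authors' code. The function names, the handling of constant vectors and the toy scores are all assumptions made for illustration.

```python
def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between true and predicted outputs."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        # Correlation is undefined for a constant vector; returning 0.0
        # here is an assumption of this sketch.
        return 0.0
    return cov * cov / (var_t * var_p)


def fraction_near_best(r2_by_model, model, tol):
    """Fraction of datasets where `model` is within `tol` of the best R2.

    r2_by_model maps a model name to a list with one R2 value per dataset.
    """
    n_datasets = len(r2_by_model[model])
    near = 0
    for d in range(n_datasets):
        best = max(scores[d] for scores in r2_by_model.values())
        if best - r2_by_model[model][d] < tol:
            near += 1
    return near / n_datasets


# Hypothetical scores for two models on four datasets (not from the paper).
toy_scores = {
    "cubist": [0.90, 0.70, 0.50, 0.95],
    "lm": [0.60, 0.72, 0.20, 0.90],
}
share_cubist = fraction_near_best(toy_scores, "cubist", 0.2)
share_lm = fraction_near_best(toy_scores, "lm", 0.2)
```

With real results, `fraction_near_best(scores, "cubist", 0.2)` would correspond to the kind of figure the abstract reports as "difference below 0.2 for 89.1% of datasets".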
To Give or Not to Give, That Is the Question: How Methodology Is Destiny in Dutch Giving Data
In research on giving, methodology is destiny. The volume of donations estimated from sample surveys strongly depends on the length of the questionnaire used to measure giving. By comparing two giving surveys from the Netherlands, the authors show that a short questionnaire on giving not only underestimates the volume of giving but also biases the effects of predictors of giving. Specifically, they find that a very short module leads to an underestimation of the effects of predictors of giving on the amount donated but an overestimation of their effects on the probability of charitable giving. Short survey modules may lead researchers to falsely reject or accept hypotheses on determinants of giving due to underreporting of donations.
Factors Affecting Abundance of Adult Karner Blues (Lycaeides melissa samuelis) (Lepidoptera: Lycaenidae) in Wisconsin Surveys 1987-95
At 141 pine-oak barrens in central and northwestern Wisconsin, 3,702 Karner blues (Lycaeides melissa samuelis Nabokov) were found in 81.1 hr of transect surveys during spring and 6,094 individuals in 116.6 hr during summer. Adults of five other closely related lycaenids occurred with Karner blues. The percentage of Karner blue males (of sexed individuals) correlated negatively with advancing date within brood, exceeded 50% on peak date within brood, but showed wide variability on a given date. Karner blues occasionally occurred up to 800 m from the nearest larval host, or in tiny, isolated host stands. However, all individuals were within 3-5 km of other larger Karner blue populations. Karner blue abundance significantly increased with decreasing latitude, increasing temperature, nearness to midpoint within brood, decreasing site canopy, increasing larval host abundance, and in summer compared to spring. Long-term monitoring sites showed dramatic but relatively similar fluctuations among broods (median of 2.8-fold change among ten brood pairs) that apparently varied by individual brood rather than season or year. Extensive dense host patches and dense Karner blues were in sites representing a diversity of management histories.
Weather, climate, and hydrologic forecasting for the US Southwest: A survey
As part of a regional integrated assessment of climate vulnerability, a survey was conducted from June 1998 to May 2000 of weather, climate, and hydrologic forecasts with coverage of the US Southwest and an emphasis on the Colorado River Basin. The survey addresses the types of forecasts that were issued, the organizations that provided them, and techniques used in their generation. It reflects discussions with key personnel from organizations involved in producing or issuing forecasts, providing data for making forecasts, or serving as a link for communicating forecasts. During the survey period, users faced a complex and constantly changing mix of forecast products available from a variety of sources. The abundance of forecasts was not matched in the provision of corresponding interpretive materials, documentation about how the forecasts were generated, or reviews of past performance. Potential existed for confusing experimental and research products with others that had undergone a thorough review process, including official products issued by the National Weather Service. Contrasts between the state of meteorologic and hydrologic forecasting were notable, especially in the former's greater operational flexibility and more rapid incorporation of new observations and research products. Greater attention should be given to forecast content and communication, including visualization, expression of probabilistic forecasts and presentation of ancillary information. Regional climate models and use of climate forecasts in water supply forecasting offer rapid improvements in predictive capabilities for the Southwest. Forecasts and production details should be archived, and publicly available forecasts should be accompanied by performance evaluations that are relevant to users
Hypnosis and memory: two hundred years of adventures and still going!
One of the most persistent beliefs about hypnosis is its ability to transcend mnemonic abilities. This belief has paved the way to the use of hypnosis in the clinical and legal arenas. The authors review the phenomena of hypnotic hypermnesia, pseudo-memories, and amnesia in light of current knowledge of hypnosis and memory. The investigation of the relation between hypnosis and memory processes has played an important role in our understanding of memory in action. Hypnosis provides a fertile field to explore the social, neuropsychological, and cognitive variables at play when individuals are asked to remember or to forget their past. We suggest promising avenues of research that may further our knowledge of the building blocks of memories and the mechanisms that lead to forgetfulness.
Evolutionary-based sparse regression for the experimental identification of Duffing oscillator
In this paper, an evolutionary-based sparse regression algorithm is proposed and applied to experimental data collected from a Duffing oscillator setup and to numerical simulation data. Our purpose is to identify the Coulomb friction terms as part of the ordinary differential equation of the system. Correct identification of this nonlinear system using sparse identification depends heavily on selecting the correct form of nonlinearity included in the function library. Consequently, in this work, the evolutionary-based sparse identification replaces the need for user knowledge when constructing the library in sparse identification. Constructing the library based on the data-driven evolutionary approach is an effective way to extend the space of nonlinear functions, allowing the sparse regression to be applied on an extensive space of functions. The results show that the method provides an effective algorithm for the purpose of unveiling the physical nature of the Duffing oscillator. In addition, the robustness of the identification algorithm is investigated for various levels of noise in simulation. The proposed method has possible applications to other nonlinear dynamic systems in mechatronics, robotics, and electronics.
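To make the sparse regression step concrete, here is a minimal sketch that recovers the terms of a Duffing-type equation with Coulomb friction from noise-free synthetic samples. It is not the authors' method: the library of candidate functions is fixed by hand here, whereas the paper builds it with an evolutionary search, and sequentially thresholded least squares is swapped in as a common sparse regression routine. The coefficient values, the threshold and all function names are assumptions made for illustration.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(rows, targets, active):
    """Least squares on the active library columns via normal equations."""
    ata = [[sum(r[i] * r[j] for r in rows) for j in active] for i in active]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in active]
    return solve_linear(ata, atb)


def stlsq(rows, targets, threshold, iterations=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    n_terms = len(rows[0])
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(iterations):
        solution = least_squares(rows, targets, active)
        coeffs = [0.0] * n_terms
        for i, c in zip(active, solution):
            if abs(c) >= threshold:
                coeffs[i] = c
        kept = [i for i in active if coeffs[i] != 0.0]
        if kept == active or not kept:
            break
        active = kept
    return coeffs


def sign(v):
    return (v > 0) - (v < 0)


def library(x, v):
    """Hand-picked candidate terms: x, v, x^3, sign(v), x^2, v|v|."""
    return [x, v, x ** 3, sign(v), x ** 2, v * abs(v)]


# Assumed model: a = -k x - c v - k3 x^3 - mu sign(v), hypothetical values.
k, c, k3, mu = 1.0, 0.1, 0.5, 0.2
rng = random.Random(0)
states = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(200)]
rows = [library(x, v) for x, v in states]
accel = [-k * x - c * v - k3 * x ** 3 - mu * sign(v) for x, v in states]

coeffs = stlsq(rows, accel, threshold=0.05)
```

In this sketch the last two library terms do not appear in the generating equation, so the thresholding step should remove them and leave only the stiffness, damping, cubic and Coulomb friction terms. With measured data the acceleration would have to be estimated from the signal, and noise would make the choice of threshold much more delicate.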
Pose-Invariant 3D Face Alignment
Face alignment aims to estimate the locations of a set of landmarks for a
given image. This problem has received much attention as evidenced by the
recent advancement in both the methodology and performance. However, most of
the existing works neither explicitly handle face images with arbitrary poses,
nor perform large-scale experiments on non-frontal and profile face images. In
order to address these limitations, this paper proposes a novel face alignment
algorithm that estimates both 2D and 3D landmarks and their 2D visibilities for
a face image with an arbitrary pose. By integrating a 3D deformable model, a
cascaded coupled-regressor approach is designed to estimate both the camera
projection matrix and the 3D landmarks. Furthermore, the 3D model also allows
us to automatically estimate the 2D landmark visibilities via surface normals.
We gather a substantially larger collection of all-pose face images to evaluate
our algorithm and demonstrate superior performance over the state-of-the-art
methods.
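The two geometric ingredients named in this abstract, a camera projection matrix applied to 3D landmarks and visibility derived from surface normals, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the projection is taken to be a 2x4 weak-perspective matrix, the viewing direction is approximated by the cross product of the normalized first three entries of its two rows, and a landmark is treated as visible when its surface normal has a positive component along that direction. The function names and the sign convention are assumptions.

```python
from math import sqrt


def project(m, point):
    """Apply a 2x4 projection matrix to a 3D point (homogeneous form)."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)


def unit(v):
    """Scale a 3-vector to unit length."""
    norm = sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def is_visible(m, normal):
    """Visibility test: the normal must face the assumed viewing direction."""
    m1 = unit(m[0][:3])
    m2 = unit(m[1][:3])
    view = cross(m1, m2)
    return sum(n * v for n, v in zip(normal, view)) > 0


# Hypothetical projection: scale by 2, translate by (5, -1), no rotation.
M = [[2.0, 0.0, 0.0, 5.0], [0.0, 2.0, 0.0, -1.0]]
landmark_2d = project(M, (1.0, 2.0, 3.0))
front_facing = is_visible(M, (0.0, 0.0, 1.0))
back_facing = is_visible(M, (0.0, 0.0, -1.0))
```

Under these assumptions, a landmark on the far side of a profile face has a normal pointing away from the camera and is flagged as not visible, without any separate occlusion reasoning.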