    Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

    Missing data is a widespread problem that can affect the ability to use data to construct effective prediction systems. We investigate a common machine learning technique that can tolerate missing values, namely C4.5, to predict cost using six real-world software project databases. We analyze the predictive performance after using the k-NN missing data imputation technique to see whether it is better to tolerate missing data or to impute missing values and then apply the C4.5 algorithm. For the investigation, we simulated three missingness mechanisms, three missing data patterns, and five missing data percentages. We found that k-NN imputation can improve the prediction accuracy of C4.5. At the same time, both C4.5 and k-NN are little affected by the missingness mechanism, but the missing data pattern and the missing data percentage have a strong negative impact on prediction (or imputation) accuracy, particularly if the missing data percentage exceeds 40%.
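As a rough illustration of the k-NN imputation idea mentioned in this abstract, the sketch below fills each missing value with the mean of that feature over the k nearest rows. It is not the authors' procedure: the helper name `knn_impute`, the distance rescaling, and the tiny `projects` table are all assumptions made for the example.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest donor rows (hypothetical helper)."""
    def distance(a, b):
        # Euclidean distance over features observed in both rows,
        # rescaled so that rows sharing few features are not favoured.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        total = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(total * len(a) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

# Invented data: [team size, effort]; the third project has no recorded effort.
projects = [
    [10.0, 200.0],
    [12.0, 240.0],
    [11.0, None],
    [50.0, 900.0],
]
filled = knn_impute(projects, k=2)
```

Here the two nearest donors for the incomplete project are the first two rows, so the missing effort is filled with their mean. A learner such as C4.5 could then be trained on `filled` instead of relying on its own handling of missing values.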

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single and multiple imputation methods to recover the missing values, and then we ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, which combines multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.
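The general pattern described here (impute several times, fit a classifier per imputed copy, combine the predictions) can be sketched in a few lines. This is a toy version under stated assumptions, not the MIE method itself: missing values are drawn at random from the observed values of the same column, the base learner is a nearest-centroid classifier, and the copies are combined by majority vote. All function names and the data are hypothetical.

```python
import random
from collections import Counter

def impute_once(rows, rng):
    """One stochastic imputation: draw each missing value from that column's observed values."""
    columns = list(zip(*rows))
    observed = [[v for v in col if v is not None] for col in columns]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(row)] for row in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)))

def vote_over_imputations(rows, labels, x, n_imputations=5, seed=0):
    """Fit one classifier per imputed copy and return the majority-vote label."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_imputations):
        completed = impute_once(rows, rng)
        votes[predict(fit_centroids(completed, labels), x)] += 1
    return votes.most_common(1)[0][0]

# Invented two-class data with two missing entries.
X = [[1.0, 1.2], [0.8, None], [1.1, 0.9], [5.0, 5.2], [None, 4.8], [5.1, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]
label_low = vote_over_imputations(X, y, [0.9, 1.0])
label_high = vote_over_imputations(X, y, [5.0, 5.0])
```

Because each imputed copy differs, the vote averages over imputation uncertainty rather than committing to a single filled-in dataset. The bagging and stacking ensembles compared in the abstract are more elaborate ways of doing that combination.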

    Suit the action to the word, the word to the action: Hypothetical choices and real decisions in Medicare Part D

    In recent years, consumer choice has become an important element of public policy. One reason is that consumers differ in their tastes and needs, which they can express most easily through their own choices. Elements that strengthen consumer choice feature prominently in the design of public insurance markets, for instance in the United States in the recent introduction of prescription drug coverage for older individuals via Medicare Part D. For policy makers who design such a market, an important practical question in the design phase is how to deduce enrollment and plan selection preferences prior to the program's introduction. In this paper, we investigate whether hypothetical choice experiments can serve as a tool in this process. We combine data from hypothetical and real plan choices, elicited around the time of the introduction of Medicare Part D. We first analyze how well the hypothetical choice data predict willingness to pay and market shares at the aggregate level. We then analyze predictions at the individual level, in particular how insurance demand varies with observable characteristics. We also explore whether the extent of adverse selection can be predicted using hypothetical choice data alone.

    Cluster membership probabilities from proper motions and multiwavelength photometric catalogues: I. Method and application to the Pleiades cluster

    We present a new technique designed to take full advantage of the high dimensionality (photometric, astrometric, temporal) of the DANCe survey to derive self-consistent and robust membership probabilities of the Pleiades cluster. We aim to develop a methodology to infer membership probabilities for the Pleiades cluster from the DANCe multidimensional astro-photometric data set in a consistent way throughout the entire derivation. The determination of the membership probabilities has to be applicable to censored data and must incorporate the measurement uncertainties into the inference procedure. We use Bayes' theorem and a curvilinear forward model for the likelihood of the measurements of cluster members in the colour-magnitude space to infer posterior membership probabilities. The distribution of the cluster members' proper motions and the distribution of contaminants in the full multidimensional astro-photometric space are modelled with a mixture-of-Gaussians likelihood. We analyse several representation spaces composed of the proper motions plus a subset of the available magnitudes and colour indices. We select two prominent representation spaces composed of variables selected using feature relevance determination techniques based on Random Forests, and analyse the resulting samples of high probability candidates. We consistently find lists of high probability (p > 0.9975) candidates with ≈ 1000 sources, 4 to 5 times more than obtained in the most recent astro-photometric studies of the cluster. The methodology presented here is ready for application in data sets that include more dimensions, such as radial and/or rotational velocities, spectral indices and variability. Comment: 14 pages, 4 figures, accepted by A&
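The core Bayesian step can be illustrated with a deliberately simplified one-dimensional sketch: a single Gaussian for cluster members, a single Gaussian for field contaminants, and Bayes' theorem for the posterior membership probability. The paper's actual model is multidimensional and uses a curvilinear forward model; the function names, parameters, and numbers below are assumptions for illustration only. Measurement uncertainty is folded in by adding its variance to the population variance, which holds when both are Gaussian.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def membership_probability(x, prior, cluster, field, obs_error=0.0):
    """Posterior probability that a source with measurement x belongs to the cluster.

    prior     : assumed fraction of sources that are cluster members
    cluster   : (mean, sigma) of the cluster population (hypothetical values)
    field     : (mean, sigma) of the contaminant population (hypothetical values)
    obs_error : Gaussian measurement uncertainty on x
    """
    def broadened(params):
        mean, sigma = params
        # Convolving two Gaussians adds their variances.
        return mean, math.sqrt(sigma ** 2 + obs_error ** 2)

    like_cluster = prior * gaussian_pdf(x, *broadened(cluster))
    like_field = (1.0 - prior) * gaussian_pdf(x, *broadened(field))
    return like_cluster / (like_cluster + like_field)

# A source measured exactly at the cluster mean, with equal priors.
p_centre = membership_probability(0.0, 0.5, cluster=(0.0, 1.0), field=(2.0, 1.0))
# The same source with a large measurement error is less decisively a member.
p_noisy = membership_probability(0.0, 0.5, cluster=(0.0, 1.0), field=(2.0, 1.0), obs_error=3.0)
```

A threshold such as the p > 0.9975 quoted in the abstract would then be applied to these posterior values to select high probability candidates.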

    Identification and Estimation of Discrete Games of Complete Information

    We discuss the identification and estimation of discrete games of complete information. Following Bresnahan and Reiss (1990, 1991), a discrete game is a generalization of a standard discrete choice model where utility depends on the actions of other players. Using recent algorithms to compute all of the Nash equilibria to a game, we propose simulation-based estimators for static, discrete games. With appropriate exclusion restrictions about how covariates enter into payoffs and influence equilibrium selection, the model is identified with only weak parametric assumptions. Monte Carlo evidence demonstrates that the estimator can perform well in moderately-sized samples. As an application, we study the strategic decision of firms in spatially-separated markets to establish a presence on the Internet.
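To make the setting concrete, the sketch below enumerates the pure-strategy Nash equilibria of a two-firm entry game by checking unilateral deviations. It is only an illustration: the abstract refers to algorithms that compute all equilibria, including mixed ones, and the payoff form, function names, and parameter values here are assumptions.

```python
from itertools import product

def entry_game(monopoly_profit, competition_loss):
    """Payoffs for a hypothetical 2-firm entry game; action 1 = enter, 0 = stay out."""
    payoffs = {}
    for a1, a2 in product((0, 1), repeat=2):
        # A firm earns nothing if it stays out; entering pays the monopoly
        # profit, reduced by competition_loss if the rival also enters.
        u1 = a1 * (monopoly_profit - competition_loss * a2)
        u2 = a2 * (monopoly_profit - competition_loss * a1)
        payoffs[(a1, a2)] = (u1, u2)
    return payoffs

def pure_nash_equilibria(payoffs):
    """Return action profiles from which no player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product((0, 1), repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        best1 = all(u1 >= payoffs[(d, a2)][0] for d in (0, 1))
        best2 = all(u2 >= payoffs[(a1, d)][1] for d in (0, 1))
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria

game = entry_game(monopoly_profit=1.0, competition_loss=2.0)
equilibria = pure_nash_equilibria(game)
```

With these numbers either firm can profitably enter alone but not alongside its rival, so there are two pure-strategy equilibria. That multiplicity is why an equilibrium selection rule has to be part of the econometric model.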

    Debt Contracts, Collapse and Regulation as Competition Phenomena

    We study a credit market with adverse selection and moral hazard where sufficient sorting is impossible. The crucial novel feature is the competition between lenders in their choice of contracts offered. Qualities of investment projects are not observable by banks and investment decisions of entrepreneurs are not contractible, but output conditional on investment is. We explain the empirically observed prevalence of debt contracts as an equilibrium phenomenon with competing lenders. Equilibrium contracts must be immune to raisin-picking by competitors. Non-debt contracts allow competitors to offer sweet deals to particularly good debtors, who will self-select to choose such a deal, while bad debtors distribute themselves across all offered contracts. Competition among banks introduces three possibilities for a breakdown of credit markets that do not occur when a bank has a monopoly. First, average returns decrease since banks compete for good lenders, which may make lending altogether unprofitable. Second, banks can have an incentive to offer a debt contract and additional equity contracts to intermediate debtors. This combination, however, is in turn dominated by a simple debt contract that is only attractive for very good entrepreneurs. As a result, no equilibrium in pure strategies exists. Existence can be restored if the permissible types of contracts are limited by regulation resembling the separation of investment and commercial banking in the U.S. Third, allowing for random delivery on credit contracts leads to a breakdown since all banks want to avoid the contract with the highest chance of delivery: that contract attracts all bad entrepreneurs. Keywords: contract; debt contract; adverse selection; moral hazard; competition; financial collapse; regulation

    Identification and Estimation of Discrete Games of Complete Information

    We discuss the identification and estimation of discrete games with complete information. Following Bresnahan and Reiss, a discrete game is defined to be a generalization of a standard discrete choice model in which utility depends on the actions of other players. Using recent algorithms that compute the complete set of Nash equilibria, we propose simulation-based estimators for static, discrete games. With appropriate exclusion restrictions about how covariates enter into payoffs and influence equilibrium selection, the model is identified with only weak parametric assumptions. Monte Carlo evidence demonstrates that the estimator can perform well in moderately-sized samples. As an illustration, we study the strategic decisions of firms in spatially-separated markets in establishing a presence on the Internet. Keywords: Empirical Industrial Organization, Simulation Based Estimation, Homotopies