    Using Conservative Estimation for Conditional Probability instead of Ignoring Infrequent Case

    There are several estimators of conditional probability from observed frequencies of features. In this paper, we propose using the lower limit of the confidence interval on the posterior distribution determined by the observed frequencies to estimate conditional probability. In our experiments, this method outperformed other popular estimators. Comment: The 2016 International Conference on Advanced Informatics: Concepts, Theory and Application (ICAICTA2016)
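
    The idea of estimating a conditional probability by the lower limit of an interval on the posterior can be sketched as follows. This is a minimal illustration, not the paper's implementation: the uniform Beta(1, 1) prior, the one-sided 95% level, and the function names are all assumptions made here.

```python
from math import comb


def beta_cdf_uniform_prior(x: float, k: int, n: int) -> float:
    """CDF at x of the Beta(k + 1, n - k + 1) posterior.

    Assumes a uniform prior (an assumption of this sketch). For integer
    parameters the Beta CDF equals a binomial tail sum, so no external
    library is needed. Intended for small n; very large n may overflow.
    """
    m = n + 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(k + 1, m + 1))


def conservative_estimate(k: int, n: int, alpha: float = 0.05) -> float:
    """Lower alpha-quantile of the posterior after k hits in n trials."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if beta_cdf_uniform_prior(mid, k, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# One hit in one trial gives a raw frequency of 1.0, but the conservative
# estimate stays low because the evidence is thin.
rare = conservative_estimate(1, 1)
frequent = conservative_estimate(90, 100)
```

    Under these assumptions, a feature seen once with one hit gets an estimate of about 0.22, well below a feature with 90 hits in 100 observations, so infrequent cases are discounted rather than ignored.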

    Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

    Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which makes them brittle against test cases outside the training distribution. Recently, several proposed debiasing methods have been shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of a performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (e.g., 7pp gain on HANS dataset) while maintaining the original in-distribution accuracy. Comment: to appear at ACL 202
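
    The abstract does not spell out how confidence regularization works. One way such a regularizer could look is sketched below, under assumptions introduced here: a teacher model's soft predictions are flattened toward uniform in proportion to a per-example bias weight, and the student is trained on those soft targets. The names `scale_targets`, `soft_cross_entropy`, and `bias_weight` are hypothetical.

```python
from math import log


def scale_targets(teacher_probs: list[float], bias_weight: float) -> list[float]:
    """Flatten teacher probabilities as bias_weight goes from 0 to 1.

    bias_weight = 0 leaves the distribution unchanged; bias_weight = 1
    yields a uniform distribution. (Assumed form, not taken from the abstract.)
    """
    powered = [p ** (1.0 - bias_weight) for p in teacher_probs]
    total = sum(powered)
    return [p / total for p in powered]


def soft_cross_entropy(targets: list[float], student_probs: list[float]) -> float:
    """Cross-entropy of student predictions against soft targets."""
    return -sum(t * log(s) for t, s in zip(targets, student_probs) if t > 0)


# A heavily biased example still contributes to training, but with a
# less confident target, so the student is not rewarded for overconfidence.
targets = scale_targets([0.9, 0.1], bias_weight=0.5)
loss = soft_cross_entropy(targets, [0.8, 0.2])
```

    In this sketch every training example keeps a nonzero loss, which matches the stated goal of learning from all examples, while examples flagged as biased push the model toward lower confidence.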

    Identifying Key Success Factors of Vocational Rehabilitation Services Program for People with Disabilities: A Multi-Level Analysis Approach

    This study proposes a multi-level approach to identify both superficial and latent relationships among variables in the data set obtained from a vocational rehabilitation (VR) services program for people with significant disabilities. In our study, classification models are first used to extract the superficial relationships between dependent and independent variables at the first level, and association rule mining algorithms are employed to extract additional sets of interesting associative relationships among variables at the second level. Finally, nonlinear nonparametric canonical correlation analysis (NLCCA) along with a clustering algorithm is employed to identify latent nonlinear relationships. Experimental outputs validate the usefulness of the proposed approach.

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, and data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining, focusing mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology, where it is one of the most important methods for diagnosing erythemato-squamous diseases. Methods used in this area include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by finding similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.

    Pension fund sophistication and investment policy

    This paper assesses the sophistication of pension funds' investment policies using data on 748 Dutch pension funds during the 1999-2006 period. We develop three indicators of sophistication: gross rounding of investment choices, investments in alternative sophisticated asset classes and 'home bias'. We find that pension funds' strategic portfolio choices are often based on coarse and possibly less sophisticated approaches. First, most pension funds, particularly the medium-sized and smaller ones, round strategic asset allocations to the nearest multiple of 5%, similar to age heaping in demographic and historical studies. Second, many pension funds invest little or nothing in alternative asset classes besides equities and bonds, resulting in limited asset diversification. Third, medium-sized and smaller pension funds favor regional investments and as such do not fully employ the opportunities of international diversification. Finally, we show that pension funds using less sophisticated asset allocation rules tend to opt for investment strategies with a lower risk-return profile. Keywords: pension funds, investment policy, portfolio choice, gross rounding, heaping, diversification, home bias, alternative investments, behavioral finance.
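
    The gross-rounding indicator can be illustrated with a small sketch. It simply measures the share of a fund's strategic allocations that fall on a multiple of 5%. This is an assumed simplification, not the paper's actual indicator, and the function names and example allocations are made up for illustration.

```python
def is_multiple_of_five(pct: float, tol: float = 1e-9) -> bool:
    """True if a percentage allocation sits on a multiple of 5."""
    return abs(pct - 5 * round(pct / 5)) <= tol


def rounding_share(allocations: list[float]) -> float:
    """Fraction of strategic allocations (in percent) that are multiples of 5."""
    if not allocations:
        raise ValueError("need at least one allocation")
    return sum(is_multiple_of_five(a) for a in allocations) / len(allocations)


# Hypothetical funds: the first rounds every asset class to a multiple
# of 5%, the second does not round any of them.
coarse_fund = [40.0, 50.0, 10.0]
fine_fund = [37.5, 52.0, 10.5]
coarse_share = rounding_share(coarse_fund)  # 1.0
fine_share = rounding_share(fine_fund)      # 0.0
```

    A share close to 1 across many funds would be the heaping pattern the abstract describes, analogous to reported ages clustering on round numbers.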

    Liquidity Risk and Monetary Policy

    This paper provides a framework to analyse emergency liquidity assistance of central banks on financial markets in response to aggregate and idiosyncratic liquidity shocks. The model combines the microeconomic view of liquidity as the ability to sell assets quickly and at low costs and the macroeconomic view of liquidity as a medium of exchange that influences the aggregate price level of goods. The central bank faces a trade-off between limiting the negative output effects of dramatic asset price declines and accepting more inflation. Furthermore, the anticipation of central bank intervention causes a moral hazard effect among investors. This gives rise to the possibility of an optimal monetary policy under commitment.

    Interactive Data Exploration with Smart Drill-Down

    We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples. Each group of tuples is described by a rule. For instance, the rule (a, b, ⋆, 1000) tells us that there are a thousand tuples with value a in the first column and b in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are NP-Hard, and describe an algorithm for finding the approximately optimal list of rules to display when the user uses a smart drill-down, and a dynamic sampling scheme for efficiently interacting with large tables. Finally, we perform experiments on real datasets on our experimental prototype to demonstrate the usefulness of smart drill-down and study the performance of our algorithms.
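
    The rule notation can be made concrete with a short sketch that counts the tuples a rule covers, where a wildcard matches any value in its column. It illustrates only the rule semantics described in the abstract, not the drill-down operator or its optimization. `WILDCARD`, `matches`, and `rule_count` are hypothetical names, and the table is toy data.

```python
from typing import Sequence

# Sentinel standing in for the star in the rule notation; using a unique
# object avoids clashing with any real column value.
WILDCARD = object()


def matches(rule: Sequence[object], row: Sequence[object]) -> bool:
    """True if every non-wildcard position of the rule equals the row's value."""
    return len(rule) == len(row) and all(
        r is WILDCARD or r == v for r, v in zip(rule, row)
    )


def rule_count(rule: Sequence[object], table: Sequence[Sequence[object]]) -> int:
    """Number of tuples in the table covered by the rule."""
    return sum(matches(rule, row) for row in table)


# Toy three-column table (made up for illustration).
table = [("a", "b", "x"), ("a", "b", "y"), ("a", "c", "x")]
count = rule_count(("a", "b", WILDCARD), table)  # 2
```

    In the abstract's notation this toy rule would be written (a, b, ⋆, 2): two tuples have a in the first column and b in the second, with any value in the third.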