Using Conservative Estimation for Conditional Probability instead of Ignoring Infrequent Case
There are several estimators of conditional probability from observed
frequencies of features. In this paper, we propose using the lower limit of
the confidence interval on the posterior distribution determined by the observed
frequencies to estimate conditional probability. In our experiments, this
method outperformed other popular estimators. Comment: The 2016 International Conference on Advanced Informatics: Concepts,
Theory and Application (ICAICTA2016)
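A minimal sketch of the idea, assuming a binomial model for the observed counts (k occurrences out of n), a uniform Beta(1, 1) prior and a one-sided 95% interval. The abstract does not state the prior or confidence level the authors used, so these choices and the function names are hypothetical. With integer counts the posterior is Beta(k + 1, n - k + 1), whose CDF equals a binomial tail probability, so the lower limit can be found by bisection with only the standard library.

```python
import math


def posterior_cdf(x: float, k: int, n: int) -> float:
    """CDF at x of the Beta(k + 1, n - k + 1) posterior (uniform prior assumed).

    For integer shape parameters a, b: I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    here with a + b - 1 = n + 1 and a = k + 1.
    """
    m = n + 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(k + 1, m + 1))


def conservative_estimate(k: int, n: int, alpha: float = 0.05, tol: float = 1e-12) -> float:
    """Lower limit of the one-sided (1 - alpha) interval on the posterior."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    lo, hi = 0.0, 1.0
    # The CDF increases with x, so bisect for the point where it equals alpha.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_cdf(mid, k, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# A feature seen once in one trial vs. 90 times in 100 trials:
# the raw frequency prefers 1/1, the conservative estimate prefers 90/100.
print(conservative_estimate(1, 1))
print(conservative_estimate(90, 100))
```

The design choice this illustrates is that an infrequent case is not ignored but is discounted: its wide posterior pulls the lower limit down, so a well-supported frequency can outrank a perfect but tiny one.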
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance
Models for natural language understanding (NLU) tasks often rely on the
idiosyncratic biases of the dataset, which make them brittle against test cases
outside the training distribution. Recently, several proposed debiasing methods
have been shown to be very effective in improving out-of-distribution performance.
However, their improvements come at the expense of a performance drop when models
are evaluated on the in-distribution data, which contain examples with higher
diversity. This seemingly inevitable trade-off may not tell us much about the
changes in the reasoning and understanding capabilities of the resulting models
on broader types of examples beyond the small subset represented in the
out-of-distribution data. In this paper, we address this trade-off by
introducing a novel debiasing method, called confidence regularization, which
discourages models from exploiting biases while enabling them to receive enough
incentive to learn from all the training examples. We evaluate our method on
three NLU tasks and show that, in contrast to its predecessors, it improves the
performance on out-of-distribution datasets (e.g., a 7pp gain on the HANS dataset)
while maintaining the original in-distribution accuracy. Comment: to appear at ACL 202
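The abstract does not give the training objective. A minimal sketch of the idea, assuming that a teacher model's predicted distribution is softened by raising each probability to the power 1 - b, where b is a bias-only model's probability for the gold label, then renormalized and used as a soft target for the student. This exact form and the function names are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def scale_teacher(teacher_probs: Sequence[float], bias_conf_on_gold: float) -> list[float]:
    """Soften the teacher distribution more when the bias-only model is confident.

    Assumed form: bias_conf_on_gold = 0 keeps the teacher (up to renormalization);
    bias_conf_on_gold = 1 makes the target uniform.
    """
    exponent = 1.0 - bias_conf_on_gold
    powered = [p ** exponent for p in teacher_probs]
    total = sum(powered)
    return [p / total for p in powered]


def confidence_regularized_loss(
    student_probs: Sequence[float],
    teacher_probs: Sequence[float],
    bias_conf_on_gold: float,
) -> float:
    """Cross-entropy of the student against the scaled teacher targets.

    Student probabilities must be strictly positive (e.g. softmax outputs).
    """
    targets = scale_teacher(teacher_probs, bias_conf_on_gold)
    return -sum(t * math.log(s) for t, s in zip(targets, student_probs))


teacher = [0.9, 0.05, 0.05]
print(scale_teacher(teacher, 0.0))  # teacher unchanged
print(scale_teacher(teacher, 0.5))  # softened
print(scale_teacher(teacher, 1.0))  # uniform target
```

Under this assumption, an example the bias-only model finds easy gets a flatter target, so the student is not pushed toward overconfidence on it, yet every example still contributes a training signal.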
Identifying Key Success Factors of Vocational Rehabilitation Services Program for People with Disabilities: A Multi-Level Analysis Approach
This study proposes a multi-level approach to identify both superficial and latent relationships among variables in the data set obtained from a vocational rehabilitation (VR) services program for people with significant disabilities. In our study, classification models are first used to extract the superficial relationships between dependent and independent variables at the first level, and association rule mining algorithms are then employed to extract additional sets of interesting associative relationships among variables at the second level. Finally, nonlinear nonparametric canonical correlation analysis (NLCCA), along with a clustering algorithm, is employed to identify latent nonlinear relationships. Experimental outputs validate the usefulness of the proposed approach.
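The second-level step relies on association rule mining. As a generic illustration (not the authors' pipeline), the sketch below scores single-item rules {a} -> {b} by support and confidence over a few hypothetical client records; the attribute names, thresholds and function name are made up for the example, and a real analysis would use an algorithm such as Apriori to handle larger itemsets efficiently.

```python
from itertools import permutations


def single_item_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Brute-force rules {a} -> {b}; returns (a, b, support, confidence) tuples."""
    sets = [set(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in sets if a in t)
        n_ab = sum(1 for t in sets if a in t and b in t)
        if n_ab == 0:
            continue
        support = n_ab / n  # share of records containing both a and b
        confidence = n_ab / n_a  # estimate of P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical records: each set lists the attributes observed for one client.
records = [
    {"job_training", "employed"},
    {"job_training", "employed"},
    {"job_training"},
    {"counseling", "employed"},
]
for rule in single_item_rules(records):
    print(rule)
```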
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent technological advances, large volumes of medical data are obtained. These data contain valuable information, so data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining, with a main emphasis on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical data mining; in this paper, we summarize the different uses of classification in dermatology, where it is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods in this area include neural networks, genetic algorithms and fuzzy classification. Clustering is a useful method in medical image mining: clustering techniques aim to find structure in the given data by grouping records according to the similarity of their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.
Pension fund sophistication and investment policy
This paper assesses the sophistication of pension funds' investment policies using data on 748 Dutch pension funds during the 1999-2006 period. We develop three indicators of sophistication: gross rounding of investment choices, investments in alternative sophisticated asset classes and 'home bias'. We find that pension funds' strategic portfolio choices are often based on coarse and possibly less sophisticated approaches. First, most pension funds, particularly the medium-sized and smaller ones, round strategic asset allocations to the nearest multiple of 5%, similar to age heaping in demographic and historical studies. Second, many pension funds invest little or nothing in alternative asset classes besides equities and bonds, resulting in limited asset diversification. Third, medium-sized and smaller pension funds favor regional investments and as such do not fully exploit the opportunities of international diversification. Finally, we show that pension funds using less sophisticated asset allocation rules tend to opt for investment strategies with a lower risk-return profile.
Keywords: pension funds, investment policy, portfolio choice, gross rounding, heaping, diversification, home bias, alternative investments, behavioral finance.
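To make the age-heaping analogy concrete, the sketch below borrows the idea behind Whipple's index from demography and applies it to strategic allocations expressed in whole percentage points. This is an illustrative assumption, not the gross-rounding indicator defined in the paper: it presumes that, without heaping, last digits would be spread evenly, so about one allocation in five would land on a multiple of 5. The function name and the example allocations are hypothetical.

```python
def heaping_index(allocations_pct):
    """Whipple-style index for allocations given in whole percentage points.

    100 means multiples of 5 occur as often as expected without heaping
    (one in five); 500 means every allocation is a multiple of 5.
    """
    if not allocations_pct:
        raise ValueError("need at least one allocation")
    on_five = sum(1 for a in allocations_pct if a % 5 == 0)
    # on_five / (len / 5) * 100, written to avoid float rounding in the ratio
    return 500 * on_five / len(allocations_pct)


# Hypothetical allocations (equities, bonds, real estate, other), in %.
print(heaping_index([45, 40, 10, 5]))  # 500.0
print(heaping_index([43, 38, 12, 7]))  # 0.0
```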
Liquidity Risk and Monetary Policy
This paper provides a framework to analyse emergency liquidity assistance by central banks in financial markets in response to aggregate and idiosyncratic liquidity shocks. The model combines the microeconomic view of liquidity as the ability to sell assets quickly and at low cost with the macroeconomic view of liquidity as a medium of exchange that influences the aggregate price level of goods. The central bank faces a trade-off between limiting the negative output effects of dramatic asset price declines and accepting higher inflation. Furthermore, the anticipation of central bank intervention creates moral hazard among investors. This gives rise to the possibility of an optimal monetary policy under commitment.
Interactive Data Exploration with Smart Drill-Down
We present smart drill-down, an operator for interactively exploring a
relational table to discover and summarize "interesting" groups of tuples. Each
group of tuples is described by a rule. For instance, a rule may tell us that there are a thousand tuples with one particular value in the
first column and another particular value in the second column (and any value in the third column).
Smart drill-down presents an analyst with a list of rules that together
describe interesting aspects of the table. The analyst can tailor the
definition of interesting, and can interactively apply smart drill-down on an
existing rule to explore that part of the table. We demonstrate that the
underlying optimization problems are NP-hard, and describe an algorithm
for finding the approximately optimal list of rules to display when the user
uses a smart drill-down, and a dynamic sampling scheme for efficiently
interacting with large tables. Finally, we perform experiments on real datasets
on our experimental prototype to demonstrate the usefulness of smart drill-down
and study the performance of our algorithms
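The rule representation can be sketched as follows, assuming a rule is a tuple with one entry per column and None standing for "any value"; drilling down on a rule here means fixing one of its wildcard columns to a value seen in the data. The greedy marginal-coverage choice is a simple stand-in picked for illustration. It is not the paper's scoring function, approximation algorithm or sampling scheme, and all names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

STAR = None  # wildcard: "any value" in this column
Rule = Tuple[Optional[str], ...]
Row = Tuple[str, ...]


def matches(rule: Rule, row: Row) -> bool:
    """A row is covered by a rule if every non-wildcard column agrees."""
    return all(r is STAR or r == v for r, v in zip(rule, row))


def count(rule: Rule, rows: Sequence[Row]) -> int:
    return sum(1 for row in rows if matches(rule, row))


def drill_down(rule: Rule, table: Sequence[Row], k: int = 2) -> list:
    """Greedily pick up to k one-step refinements of `rule`.

    Hypothetical scoring: prefer the refinement covering the most tuples not
    yet covered by earlier picks, breaking ties by its total count.
    """
    rows = [row for row in table if matches(rule, row)]
    children = {
        rule[:i] + (row[i],) + rule[i + 1:]
        for row in rows
        for i, r in enumerate(rule)
        if r is STAR
    }
    uncovered = set(range(len(rows)))
    chosen = []
    for _ in range(k):
        best, best_key = None, (0, 0)
        for child in sorted(children, key=repr):  # deterministic order
            gain = sum(1 for i in uncovered if matches(child, rows[i]))
            key = (gain, count(child, rows))
            if gain > 0 and key > best_key:
                best, best_key = child, key
        if best is None:
            break
        chosen.append((best, best_key[1]))
        uncovered -= {i for i in uncovered if matches(best, rows[i])}
        children.discard(best)
    return chosen


table = [("a", "b", "x"), ("a", "b", "y"), ("a", "c", "x"), ("d", "b", "x")]
print(drill_down((STAR, STAR, STAR), table))
# [(('a', None, None), 3), ((None, 'b', None), 3)]
```

Picking by marginal coverage rather than raw count keeps the displayed rules from restating the same tuples, which is one simple way to make a short list of rules together describe different parts of the table.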