
    Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

    The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are only suboptimal regarding the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures.
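
    As a rough illustration of the idea rather than the authors' implementation, the sketch below uses a sigmoid-smoothed C-index both as the boosting objective and as the evaluation criterion, so derivation and evaluation share one optimization criterion. All function names and the settings sigma, nu and steps are placeholders chosen for readability, not values from the paper.

```python
import numpy as np

def smoothed_cindex(eta, time, event, sigma=0.1):
    """Sigmoid-smoothed concordance index of a linear predictor eta."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue                      # only observed events start a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                num += 1.0 / (1.0 + np.exp(-(eta[i] - eta[j]) / sigma))
                den += 1.0
    return num / den

def cindex_gradient(eta, time, event, sigma=0.1):
    """Gradient of the smoothed C-index with respect to eta."""
    grad = np.zeros(len(time))
    pairs = 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                s = 1.0 / (1.0 + np.exp(-(eta[i] - eta[j]) / sigma))
                g = s * (1.0 - s) / sigma
                grad[i] += g
                grad[j] -= g
                pairs += 1
    return grad / max(pairs, 1)

def componentwise_boost(X, time, event, steps=200, nu=0.1, sigma=0.1):
    """Component-wise gradient boosting of the smoothed C-index.

    Each step fits every single feature to the current gradient by least
    squares and updates only the best-fitting coefficient, which yields a
    sparse linear biomarker combination.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = cindex_gradient(X @ beta, time, event, sigma)
        coefs = (X.T @ grad) / np.sum(X ** 2, axis=0)        # per-feature least-squares fit
        errs = np.sum((grad[:, None] - X * coefs) ** 2, axis=0)
        k = int(np.argmin(errs))
        beta[k] += nu * coefs[k]                             # update the best feature only
    return beta
```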

    Finding kernel function for stock market prediction with support vector regression

    Stock market prediction is one of the fascinating issues in stock market research. Accurate stock prediction is a major challenge in the investment industry because the distribution of stock data changes over time. Time series forecasting, Neural Networks (NN) and Support Vector Machines (SVM) have all been used for stock price prediction. In this study, the data mining operation of time series forecasting is implemented. A large amount of stock data collected from the Kuala Lumpur Stock Exchange is used in the experiments to test the validity of SVM regression. SVM is a machine learning technique based on the principle of structural risk minimization, which gives it strong generalization ability and has proved successful in time series prediction. Two kernel functions, the Radial Basis Function and the polynomial kernel, are compared to find the most accurate predictions. In addition, a backpropagation neural network is used as a baseline for comparing prediction performance. Several experiments are conducted and the results analysed. The results show that SVM with a polynomial kernel provides a promising alternative tool for KLSE stock market prediction.
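
    The comparison described above can be sketched with scikit-learn. The snippet below is a hedged illustration on synthetic price data rather than the KLSE data set used in the study, and the window length and hyperparameters are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Illustrative task: predict the next closing price from the previous 5 (synthetic series).
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 500))
X = np.array([prices[i:i + 5] for i in range(len(prices) - 5)])
y = prices[5:]
split = int(0.8 * len(X))                      # chronological split, no shuffling
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

models = {
    "SVR (RBF kernel)":        make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "SVR (polynomial kernel)": make_pipeline(StandardScaler(), SVR(kernel="poly", degree=3, C=10.0)),
    "Backpropagation NN":      make_pipeline(StandardScaler(),
                                             MLPRegressor(hidden_layer_sizes=(20,),
                                                          max_iter=2000, random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test MSE = {mean_squared_error(y_te, model.predict(X_te)):.3f}")
```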

    Forecasting of commercial sales with large scale Gaussian Processes

    This paper argues that applications of Gaussian Processes in the fast-moving consumer goods industry have received too little discussion. Yet the technique can be valuable: it can, for example, provide automatic feature relevance determination, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial point-of-sale data. The study reviews approaches to Gaussian Process modelling for large data sets, evaluates their performance on commercial sales data, and shows the value of this type of model as a decision-making tool for management.
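
    One standard route to scaling Gaussian Process regression is a sparse approximation with a small set of inducing inputs. The sketch below implements the subset-of-regressors (Nyström) predictive mean in plain NumPy as a general illustration, not the specific models reviewed in the paper; the kernel settings, data and inducing inputs are invented for the example.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between row sets A and B."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sparse_gp_mean(X, y, X_star, Z, noise=0.1, **kern):
    """Subset-of-regressors (Nystrom) approximation to the GP predictive mean.

    Z holds m inducing inputs, so the cost is O(n m^2) rather than the
    O(n^3) of an exact GP, which is what makes large point-of-sale data
    sets tractable.
    """
    Kmm = rbf_kernel(Z, Z, **kern) + 1e-6 * np.eye(len(Z))   # jitter for stability
    Kmn = rbf_kernel(Z, X, **kern)
    Ksm = rbf_kernel(X_star, Z, **kern)
    A = noise ** 2 * Kmm + Kmn @ Kmn.T
    return Ksm @ np.linalg.solve(A, Kmn @ y)

# Toy usage: trend plus weekly seasonality, 10,000 points, 50 inducing inputs.
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 100, 10_000))[:, None]
y = 0.05 * X[:, 0] + np.sin(2 * np.pi * X[:, 0] / 7) + rng.normal(0, 0.3, 10_000)
Z = np.linspace(0, 100, 50)[:, None]
X_star = np.linspace(0, 100, 200)[:, None]
print(sparse_gp_mean(X, y, X_star, Z, noise=0.3, lengthscale=2.0)[:5])
```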

    Asset Pricing Theories, Models, and Tests

    An important but still partially unanswered question in the investment field is why different assets earn substantially different returns on average. Financial economists have typically addressed this question in the context of theoretically or empirically motivated asset pricing models. Since many of the proposed “risk” theories are plausible, a common practice in the literature is to take the models to the data and perform “horse races” among competing asset pricing specifications. A “good” asset pricing model should produce small pricing (expected return) errors on a set of test assets and should deliver reasonable estimates of the underlying market and economic risk premia. This chapter provides an up-to-date review of the statistical methods that are typically used to estimate, evaluate, and compare competing asset pricing models. The analysis also highlights several pitfalls in the current econometric practice and offers suggestions for improving empirical tests.
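
    As a concrete example of the kind of estimation the chapter reviews, the sketch below implements a plain two-pass procedure: time-series regressions deliver the betas, and a cross-sectional regression of average returns on those betas delivers risk premia estimates and pricing errors. It is a minimal illustration on simulated data, not the chapter's own econometric framework, and all names and parameter values are hypothetical.

```python
import numpy as np

def two_pass_regression(excess_returns, factors):
    """Two-pass estimation of a linear factor model.

    Pass 1: time-series OLS of each test asset's excess return on the
            factors gives the betas (factor exposures).
    Pass 2: cross-sectional OLS of average excess returns on the betas
            gives the risk premia; the residuals are the pricing errors
            a 'good' model should keep small.
    """
    T, N = excess_returns.shape
    F = np.column_stack([np.ones(T), factors])                       # add intercept
    betas = np.linalg.lstsq(F, excess_returns, rcond=None)[0][1:].T  # N x K exposures
    mean_ret = excess_returns.mean(axis=0)
    B = np.column_stack([np.ones(N), betas])                         # allow a zero-beta rate
    coef, *_ = np.linalg.lstsq(B, mean_ret, rcond=None)
    pricing_errors = mean_ret - B @ coef
    return coef[1:], pricing_errors                                  # risk premia, alphas

# Toy usage: 600 months, 25 test assets, one simulated market factor.
rng = np.random.default_rng(2)
T, N = 600, 25
market = rng.normal(0.005, 0.04, (T, 1))
true_beta = rng.uniform(0.5, 1.5, N)
R = market * true_beta + rng.normal(0, 0.02, (T, N))
premia, alphas = two_pass_regression(R, market)
print("estimated premium:", premia, "mean |pricing error|:", np.abs(alphas).mean())
```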

    Reducing regression test size by exclusion.

    Operational software is constantly evolving. Regression testing is used to identify the unintended consequences of evolutionary changes. As most changes affect only a small proportion of the system, the challenge is to ensure that the regression test set is both safe (all relevant tests are used) and inclusive (only relevant tests are used). Previous approaches to reducing test sets struggle to find safe and inclusive tests by looking only at the changed code. We use decomposition program slicing to safely reduce the size of regression test sets by identifying those parts of a system that could not have been affected by a change; this information will then direct the selection of regression tests by eliminating tests that are not relevant to the change. The technique properly accounts for additions and deletions of code. We extend and use Rothermel and Harrold’s framework for measuring the safety of regression test sets and introduce new safety and precision measures that do not require a priori knowledge of the exact number of modification-revealing tests. We then analytically evaluate and compare our techniques for producing reduced regression test sets.
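
    The exclusion step can be caricatured in a few lines: given per-test coverage and the set of program elements a change could have affected (for example, as identified by a slice of the change), every test that executes no affected element is irrelevant to the change and is dropped. The sketch below illustrates that selection rule only, not the paper's slicing technique or safety framework; the test names and element identifiers are hypothetical.

```python
def select_regression_tests(coverage, affected):
    """Exclude tests that cannot reveal the change.

    coverage : dict mapping test name -> set of program elements
               (e.g. procedures or statements) the test executes
    affected : set of elements that the change could have affected

    A test is kept only if it executes at least one affected element;
    every other test is excluded from the regression run.
    """
    kept = {t for t, elems in coverage.items() if elems & affected}
    return kept, set(coverage) - kept

# Illustrative usage with hypothetical tests and procedures.
coverage = {
    "test_login":   {"auth.check", "db.read"},
    "test_report":  {"report.render", "db.read"},
    "test_billing": {"billing.total", "db.write"},
}
affected = {"billing.total"}                  # elements reachable from the change
kept, excluded = select_regression_tests(coverage, affected)
print("run:", sorted(kept), "skip:", sorted(excluded))
```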
