    Semiparametric Sieve-Type GLS Inference in Regressions with Long-Range Dependence

    This paper considers the problem of statistical inference in linear regression models whose stochastic regressors and errors may exhibit long-range dependence. A time-domain sieve-type generalized least squares (GLS) procedure is proposed based on an autoregressive approximation to the generating mechanism of the errors. The asymptotic properties of the sieve-type GLS estimator are established. A Monte Carlo study examines the finite-sample properties of the method for testing regression hypotheses.
    Keywords: Autoregressive approximation, Generalized least squares, Linear regression, Long-range dependence, Spectral density
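The general idea of a GLS step built on an autoregressive approximation of the errors can be sketched as follows. This is a minimal illustration, not the paper's estimator: the helper names (`ols2`, `levinson_durbin`, `ar_filter`, `sieve_gls`), the single-regressor model, the Yule-Walker fit, the order rule `p = int(n ** (1 / 3))`, and the short-memory AR(1) errors in the demo are all assumptions made to keep the example small.

```python
import random

def ols2(z1, z2, y):
    """Least squares of y on the two columns z1 and z2; returns (b1, b2)."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    t1 = sum(a * c for a, c in zip(z1, y))
    t2 = sum(b * c for b, c in zip(z2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def autocov(u, lag):
    """Sample autocovariance of a mean-zero series at the given lag."""
    return sum(u[t] * u[t - lag] for t in range(lag, len(u))) / len(u)

def levinson_durbin(u, p):
    """Yule-Walker AR(p) coefficients phi_1..phi_p via the Levinson-Durbin recursion."""
    gamma = [autocov(u, k) for k in range(p + 1)]
    phi, err = [], gamma[0]
    for k in range(1, p + 1):
        kappa = (gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return phi

def ar_filter(v, phi):
    """Apply the filter v_t - sum_j phi_j * v_{t-j}, dropping the first p observations."""
    p = len(phi)
    return [v[t] - sum(phi[j] * v[t - 1 - j] for j in range(p)) for t in range(p, len(v))]

def sieve_gls(x, y, p):
    """OLS residuals -> AR(p) fit -> OLS on the AR-filtered intercept, regressor and response."""
    ones = [1.0] * len(x)
    a, b = ols2(ones, x, y)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    phi = levinson_durbin(resid, p)
    return ols2(ar_filter(ones, phi), ar_filter(x, phi), ar_filter(y, phi)), phi

# Demo on simulated data: y = 1 + 2x + u, with AR(1) errors (an assumption for illustration).
random.seed(0)
n, true_phi = 2000, 0.6
x = [random.gauss(0, 1) for _ in range(n)]
u, prev = [], 0.0
for _ in range(n):
    prev = true_phi * prev + random.gauss(0, 1)
    u.append(prev)
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
p = int(n ** (1 / 3))  # hypothetical rule letting the AR order grow with n
(intercept, slope), phi = sieve_gls(x, y, p)
print(round(slope, 3), [round(c, 2) for c in phi[:3]])
```

The "sieve" aspect is that the autoregressive order `p` is allowed to grow with the sample size, so the fitted filter can approximate a richer error process as more data arrive.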

    Prediction by Nonparametric Posterior Estimation in Virtual Screening

    The ability to rank molecules according to their effectiveness in some domain (e.g. as a pesticide or a drug) is important owing to the cost of synthesising and testing chemical compounds. Virtual screening seeks to do this computationally, with potential savings of millions of pounds and large profits associated with reduced time to market. Recently, binary kernel discrimination (BKD) has been introduced and is becoming popular in the chemoinformatics domain. It produces scores based on the estimated likelihood ratio of active to inactive compounds, which are then ranked. The likelihoods are estimated through a Parzen windows approach using the binomial distribution function (to accommodate binary descriptor or "fingerprint" vectors representing the presence, or not, of certain sub-structural arrangements of atoms) in place of the usual Gaussian choice. This research aims to compute the likelihood ratio via a direct estimate of the posterior probability, using a non-parametric generalisation of logistic regression known as "Kernel Logistic Regression". Furthermore, complexity is controlled by penalising the likelihood function with an Lq-norm. The compounds are then ranked in descending order of posterior probability. The 11 activity classes from the MDL Drug Data Report (MDDR) database are used. The results are found to be less accurate than a currently leading approach but are still comparable in a number of cases.
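The BKD scoring described above (a Parzen-window likelihood with a binomial kernel, then a ratio of active to inactive likelihoods) can be sketched as follows. This is a minimal sketch under assumptions: the function names, the toy 4-bit fingerprints, and the smoothing value `lam=0.9` are hypothetical, and published BKD variants may rescale the kernel differently.

```python
def hamming(a, b):
    """Number of bit positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))

def binomial_kernel(a, b, lam=0.9):
    """Binomial kernel lam^(n - d) * (1 - lam)^d, with d the Hamming distance over n bits."""
    n = len(a)
    d = hamming(a, b)
    return lam ** (n - d) * (1.0 - lam) ** d

def parzen_likelihood(query, references, lam=0.9):
    """Parzen-window estimate: average kernel value between the query and a reference set."""
    return sum(binomial_kernel(query, r, lam) for r in references) / len(references)

def bkd_score(query, actives, inactives, lam=0.9):
    """Estimated likelihood ratio of active to inactive for one query fingerprint."""
    return parzen_likelihood(query, actives, lam) / parzen_likelihood(query, inactives, lam)

# Toy training fingerprints (assumed data, for illustration only).
actives = [(1, 1, 0, 1), (1, 1, 1, 1)]
inactives = [(0, 0, 0, 1), (0, 0, 1, 0)]
queries = {"q1": (1, 1, 0, 1), "q2": (0, 0, 0, 0)}

# Rank the query compounds by descending score.
ranking = sorted(queries, key=lambda k: bkd_score(queries[k], actives, inactives), reverse=True)
print(ranking)
```

A query that is close in Hamming distance to the known actives receives a ratio above 1 and is ranked first; the smoothing parameter would normally be tuned on held-out data.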

    Testing the martingale difference hypothesis using integrated regression functions

    An omnibus test for a generalized version of the martingale difference hypothesis (MDH) is proposed. This generalized hypothesis includes the usual MDH, testing for constancy of conditional moments such as conditional homoscedasticity (ARCH effects), and testing for directional predictability. A unified approach for dealing with all of these testing problems is proposed. These hypotheses are long-standing problems in econometric time series analysis, and have typically been tested using the sample autocorrelations or, in the spectral domain, using the periodogram. Since these hypotheses also cover nonlinear predictability, tests based on those second-order statistics are inconsistent against uncorrelated processes in the alternative hypothesis. To circumvent this problem, pairwise integrated regression functions are introduced as measures of linear and nonlinear dependence. The proposed test does not require choosing a lag order depending on sample size, smoothing the data, or formulating a parametric alternative model. Moreover, the test is robust to higher-order dependence, in particular to conditional heteroskedasticity. Under general dependence the asymptotic null distribution depends on the data generating process, so a bootstrap procedure is considered and a Monte Carlo study examines its finite-sample performance. Then, the martingale and conditional heteroskedasticity properties of the Pound/Dollar exchange rate are investigated.
    Keywords: Nonlinear time series; Martingale difference hypothesis; Empirical processes; Exchange rates

    Towards the detection and analysis of performance regression introducing code changes

    In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering. It aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, given the growing volume of committed code and the number of development team members working simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges in locating and resolving performance regressions in practice still exist. Directing regression testing to the commit level helps locate the root cause, yet it hinders the development process. This thesis outlines motivations and solutions for locating the root causes of performance regressions. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches to develop models that can learn from historical commits. The first stochastic approach treats the research problem as a search-based optimization problem seeking the highest detection rate. We apply different multi-objective evolutionary algorithms and compare them. This thesis also investigates whether simplifying the search space by combining objectives achieves comparable results. The second stochastic approach addresses the severe class imbalance any system could have, since code changes introducing regressions are rare but costly. We formulate the identification of problematic commits that introduce performance regressions as a binary classification problem that handles class imbalance.
Further, the thesis provides an exploratory study of the challenges developers face in resolving performance regressions. The study is based on questions posted on a technical forum directed at performance regression. We collected around 2k questions discussing regressions in software execution time, and all were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights to help developers avoid causes of regression during software design and implementation.

    Predicting EuroQol (EQ-5D) scores from the patient-reported outcomes measurement information system (PROMIS) global items and domain item banks in a United States sample

    Preference-based health index scores provide a single summary score assessing overall health-related quality of life and are useful as an outcome measure in clinical studies, for estimating quality-adjusted life years for economic evaluations, and for monitoring the health of populations. We predicted EuroQol (EQ-5D) index scores from patient-reported outcomes measurement information system (PROMIS) global items and domain item banks. This was a secondary analysis of health outcome data collected in an internet survey as part of the PROMIS Wave 1 field testing. For this study, we included the 10 global items and the physical function, fatigue, pain impact, anxiety, and depression item banks. Linear regression analyses were used to predict EQ-5D index scores based on the global items and selected domain banks. The regression models using eight of the PROMIS global items (quality of life, physical activities, mental health, emotional problems, social activities, pain, and fatigue and either general health or physical health items) explained 65% of the variance in the EQ-5D. When the PROMIS domain scores were included in a regression model, 57% of the variance in EQ-5D scores was explained. Comparisons of predicted to actual EQ-5D scores by age and gender groups showed that they were similar. EQ-5D preference scores can be predicted accurately from either the PROMIS global items or selected domain banks. Application of the derived regression model allows the estimation of health preference scores from the PROMIS health measures for use in economic evaluations.

    Testing for seasonal unit roots by frequency domain regression

    This paper considers statistics based on spectral regression estimators for testing for seasonal unit roots in a time series. An advantage of the frequency domain approach is that it enables serial correlation to be treated nonparametrically, thereby facilitating an explicit focus on the frequencies at which unit roots are of interest. The limiting distributions of the proposed test statistics are derived and their size and power properties are explored in simulation experiments.

    Assigning Test Priority to Modules Using Code-Content and Bug History

    Regression testing is a process that is repeated after every change in the program. Prioritization of test cases is an important process during regression test execution. Several techniques now exist that decide which test cases run first according to their priority levels, increasing the probability of finding bugs earlier in the test life cycle. However, the algorithms used to select important test cases may sometimes stop searching at local minima, missing the rest of the tests that might be important for a given change. To address this limitation, we propose a domain-specific model that assigns testing priority to classes in applications based on developers' judgments of priority. Moreover, our technique, which takes into consideration applications' code content and bug history, relates these features to overall class priority for testing. Finally, we test the proposed approach on a new (unseen) dataset of 20 instances. Comparing the predicted results with the developers' priority scores shows that this metric correctly prioritizes 70% of the classes under test.

    Causally Regularized Learning with Agnostic Data Selection Bias

    Most previous machine learning algorithms are proposed based on the i.i.d. hypothesis. However, this ideal assumption is often violated in real applications, where selection bias may arise between the training and testing processes. Moreover, in many scenarios the testing data is not even available during training, which makes traditional methods like transfer learning infeasible because they need prior knowledge of the test distribution. Therefore, how to address agnostic selection bias for robust model learning is of paramount importance for both academic research and real applications. In this paper, under the assumption that causal relationships among variables are robust across domains, we incorporate causal techniques into predictive modeling and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm that jointly optimizes global confounder balancing and weighted logistic regression. Global confounder balancing helps to identify causal features, whose causal effects on the outcome are stable across domains; performing logistic regression on those causal features then constructs a predictive model that is robust against the agnostic bias. To validate the effectiveness of our CRLR algorithm, we conduct comprehensive experiments on both synthetic and real-world datasets. Experimental results clearly demonstrate that our CRLR algorithm outperforms the state-of-the-art methods, and the interpretability of our method can be fully depicted by feature visualization.
    Comment: Oral paper of 2018 ACM Multimedia Conference (MM'18)