Semiparametric Sieve-Type GLS Inference in Regressions with Long-Range Dependence
This paper considers the problem of statistical inference in linear regression models whose stochastic regressors and errors may exhibit long-range dependence. A time-domain sieve-type generalized least squares (GLS) procedure is proposed based on an autoregressive approximation to the generating mechanism of the errors. The asymptotic properties of the sieve-type GLS estimator are established. A Monte Carlo study examines the finite-sample properties of the method for testing regression hypotheses.
Keywords: Autoregressive approximation, Generalized least squares, Linear regression, Long-range dependence, Spectral density
Prediction by Nonparametric Posterior Estimation in Virtual Screening
The ability to rank molecules according to their effectiveness in some domain (e.g. as pesticides or drugs) is important owing to the cost of synthesising and testing chemical compounds. Virtual screening seeks to do this computationally, with potential savings of millions of pounds and the large profits associated with reduced time to market. Binary kernel discrimination (BKD) was recently introduced and is becoming popular in the chemoinformatics domain. It produces scores, based on the estimated likelihood ratio of active to inactive compounds, by which compounds are then ranked. The likelihoods are estimated through a Parzen windows approach using the binomial distribution function (to accommodate binary descriptor or "fingerprint" vectors representing the presence, or absence, of certain sub-structural arrangements of atoms) in place of the usual Gaussian choice. This research aims to compute the likelihood ratio via a direct estimate of the posterior probability, using a non-parametric generalisation of logistic regression, so-called kernel logistic regression. Model complexity is controlled by penalising the likelihood function with an Lq-norm. Compounds are then ranked in descending order of posterior probability. Eleven activity classes from the MDL Drug Data Report (MDDR) database are used. The results are found to be less accurate than a currently leading approach, but are still comparable in a number of cases.
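A minimal sketch of penalised kernel logistic regression, under simplifying assumptions: a Gaussian kernel and an L2 (q = 2) penalty stand in for the binomial kernel and general Lq-norm used in the study, and all names and settings below are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian kernel stand-in (the study uses a binomial kernel on fingerprints)
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def fit_klr(X, y, lam=1e-2, gamma=0.5, lr=0.1, iters=500):
    """L2-penalised kernel logistic regression by gradient descent.

    Minimises the negative log-likelihood plus lam * alpha' K alpha; the
    update uses the K^{-1}-preconditioned gradient (p - y) + 2*lam*alpha,
    which is a valid descent direction for a positive-definite kernel.
    """
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-K @ alpha))   # posterior P(active | x)
        alpha -= lr * ((p - y) + 2.0 * lam * alpha)
    return alpha

def predict_proba(X_train, X_new, alpha, gamma=0.5):
    # Compounds would then be ranked in descending order of these posteriors
    return 1.0 / (1.0 + np.exp(-rbf_kernel(X_new, X_train, gamma) @ alpha))
```

Ranking by the posterior directly, rather than by an estimated likelihood ratio as in BKD, is the paper's central change; the two orderings coincide when class priors are fixed.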
Testing the martingale difference hypothesis using integrated regression functions
An omnibus test for a generalized version of the martingale difference hypothesis (MDH) is proposed. This generalized hypothesis includes the usual MDH, testing for the constancy of conditional moments such as conditional homoscedasticity (ARCH effects), and testing for directional predictability. A unified approach for dealing with all of these testing problems is proposed. These hypotheses are long-standing problems in econometric time series analysis and have typically been tested using the sample autocorrelations or, in the spectral domain, using the periodogram. Since these hypotheses also cover nonlinear predictability, tests based on such second-order statistics are inconsistent against uncorrelated processes in the alternative hypothesis. To circumvent this problem, pairwise integrated regression functions are introduced as measures of linear and nonlinear dependence. The proposed test does not require choosing a lag order depending on the sample size, smoothing the data, or formulating a parametric alternative model. Moreover, the test is robust to higher-order dependence, in particular to conditional heteroskedasticity. Under general dependence the asymptotic null distribution depends on the data-generating process, so a bootstrap procedure is considered, and a Monte Carlo study examines its finite-sample performance. Finally, the martingale and conditional heteroskedasticity properties of the Pound/Dollar exchange rate are investigated.
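A hedged single-lag sketch of the idea (the paper's test is pairwise across lags; the Cramer-von Mises-type statistic, wild bootstrap, and names below are illustrative simplifications):

```python
import numpy as np

def mdh_cvm_test(y, lag=1, B=499, seed=0):
    """Cramer-von Mises-type MDH test built on the lag-`lag` integrated
    regression function D_n(x) = (1/n) * sum_t (y_t - ybar) 1{y_{t-lag} <= x},
    with a wild bootstrap for the p-value. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    yc = y[lag:] - y[lag:].mean()      # centred "innovations"
    x = y[:-lag]                        # conditioning variable
    ind = (x[:, None] <= x[None, :])    # indicator 1{y_{t-lag} <= x_j}
    n = len(yc)
    stat = np.sum((yc @ ind / n) ** 2)  # CvM statistic over sample points
    boot = np.empty(B)
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=n)          # Rademacher multipliers
        boot[b] = np.sum(((yc * w) @ ind / n) ** 2)  # bootstrap statistic
    pval = (1 + np.sum(boot >= stat)) / (B + 1)
    return stat, pval
```

Because the statistic depends on indicators of the past rather than sample autocorrelations, it retains power against purely nonlinear (uncorrelated) dependence; the Rademacher multipliers preserve conditional heteroskedasticity under the null, which is where the robustness claim comes from.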
Towards the detection and analysis of performance regression introducing code changes
In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering. Performance regression testing aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, considering the growing volume of committed code and the number of team members developing simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges remain in practice in locating and resolving performance regressions. Directing regression testing to the commit level helps locate the root cause, yet it hinders the development process. This thesis outlines motivations and solutions for locating the root causes of performance regressions. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches that develop models able to learn from historical commits. The first stochastic approach views the research problem as a search-based optimization problem seeking the highest detection rate. We apply different multi-objective evolutionary algorithms and compare them. This thesis also investigates whether simplifying the search space by combining objectives achieves comparable results. The second stochastic approach addresses the severe class imbalance inherent in any such system, since code changes introducing regressions are rare but costly. We formulate the identification of problematic commits that introduce performance regressions as a binary classification problem that handles class imbalance.
Further, the thesis provides an exploratory study of the challenges developers face in resolving performance regressions. The study is based on questions posted on a technical forum dedicated to performance regression. We collected around 2,000 questions discussing regressions in software execution time, all of which were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights that help developers avoid causes of regression during software design and implementation.
Predicting EuroQol (EQ-5D) scores from the patient-reported outcomes measurement information system (PROMIS) global items and domain item banks in a United States sample
Preference-based health index scores provide a single summary score assessing overall health-related quality of life and are useful as an outcome measure in clinical studies, for estimating quality-adjusted life years for economic evaluations, and for monitoring the health of populations. We predicted EuroQol (EQ-5D) index scores from patient-reported outcomes measurement information system (PROMIS) global items and domain item banks.
This was a secondary analysis of health outcome data collected in an internet survey as part of the PROMIS Wave 1 field testing. For this study, we included the 10 global items and the physical function, fatigue, pain impact, anxiety, and depression item banks. Linear regression analyses were used to predict EQ-5D index scores based on the global items and selected domain banks.
The regression models using eight of the PROMIS global items (quality of life, physical activities, mental health, emotional problems, social activities, pain, and fatigue and either general health or physical health items) explained 65% of the variance in the EQ-5D. When the PROMIS domain scores were included in a regression model, 57% of the variance was explained in EQ-5D scores. Comparisons of predicted to actual EQ-5D scores by age and gender groups showed that they were similar.
EQ-5D preference scores can be predicted accurately from either the PROMIS global items or selected domain banks. Application of the derived regression model allows the estimation of health preference scores from the PROMIS health measures for use in economic evaluations.
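The mapping exercise is, at its core, an ordinary least-squares prediction. A schematic with synthetic stand-ins for the PROMIS scores (all numbers, weights, and the eight-item design below are made up; only the mechanics mirror the study):

```python
import numpy as np

# Synthetic stand-in data: 8 hypothetical global-item T-scores per respondent,
# and an EQ-5D index generated as a noisy linear function of them.
rng = np.random.default_rng(1)
n = 500
promis = rng.normal(50.0, 10.0, size=(n, 8))           # item scores (synthetic)
w = np.full(8, 0.005)                                   # hypothetical item weights
eq5d = 0.1 + (promis - 50.0) @ w + rng.normal(0.0, 0.1, size=n)

# OLS mapping: regress EQ-5D on the items (with an intercept), as in the study
X = np.column_stack([np.ones(n), promis])
beta, *_ = np.linalg.lstsq(X, eq5d, rcond=None)
pred = X @ beta
r2 = 1.0 - np.sum((eq5d - pred) ** 2) / np.sum((eq5d - eq5d.mean()) ** 2)
print(f"R^2 = {r2:.2f}")
```

Applying the fitted `beta` to new PROMIS responses is what "estimating health preference scores for economic evaluations" amounts to in practice.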
Testing for seasonal unit roots by frequency domain regression
This paper considers statistics based on spectral regression estimators for testing for seasonal unit roots in a time series. An advantage of the frequency domain approach is that it enables serial correlation to be treated nonparametrically, thereby facilitating an explicit focus on the frequencies at which unit roots are of interest. The limiting distributions of the proposed test statistics are derived, and their size and power properties are explored in simulation experiments.
Assigning Test Priority to Modules Using Code-Content and Bug History
Regression testing is a process repeated after every change to a program. Prioritization of test cases is an important step during regression test execution. Several techniques now exist that decide which test cases should run first according to their priority levels, increasing the probability of finding bugs earlier in the test life cycle. However, the algorithms used to select important test cases may sometimes stop searching at a local minimum, missing other tests that might be important for a given change. To address this limitation, we propose a domain-specific model that assigns testing priority to classes in applications based on developers' judgments of priority. Our technique takes into consideration an application's code content and bug history, and relates these features to an overall class priority for testing. Finally, we test the proposed approach on a new (unseen) dataset of 20 instances. The predicted results are compared with developers' priority scores, and we found that the metric correctly prioritizes 70% of the classes under test.
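The scoring idea can be sketched as a weighted combination of code-content and bug-history features. Everything below (class names, feature values, weights) is hypothetical; the thesis learns the relation from developers' judgments rather than using hand-fixed weights:

```python
import numpy as np

# Hypothetical per-class features: code churn, complexity, and bug history
classes    = ["OrderService", "AuthFilter", "ReportUtil", "CacheLayer"]
churn      = np.array([120.0, 30.0, 10.0, 60.0])   # changed lines, last release
complexity = np.array([ 25.0, 40.0,  5.0, 15.0])   # cyclomatic complexity
bug_count  = np.array([  8.0,  2.0,  0.0,  5.0])   # bugs filed against class

def normalize(v):
    # min-max scaling so features are comparable
    return (v - v.min()) / (v.max() - v.min())

# Fixed illustrative weights; a learned model would estimate these from
# developers' priority labels.
score = 0.4 * normalize(churn) + 0.2 * normalize(complexity) + 0.4 * normalize(bug_count)
ranking = [classes[i] for i in np.argsort(-score)]  # highest priority first
print(ranking)
```

Classes with high churn and a rich bug history float to the top of the test queue, which is the intended behaviour of a priority-assignment model of this kind.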
Causally Regularized Learning with Agnostic Data Selection Bias
Most previous machine learning algorithms are based on the i.i.d. hypothesis. However, this ideal assumption is often violated in real applications, where selection bias may arise between the training and testing processes. Moreover, in many scenarios the testing data are not even available during training, which makes traditional methods such as transfer learning infeasible, owing to their need for prior knowledge of the test distribution. Therefore, addressing agnostic selection bias for robust model learning is of paramount importance for both academic research and real applications. In this paper, under the assumption that causal relationships among variables are robust across domains, we incorporate causal techniques into predictive modeling and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm that jointly optimizes global confounder balancing and weighted logistic regression. Global confounder balancing helps to identify causal features, whose causal effects on the outcome are stable across domains; performing logistic regression on those causal features then constructs a predictive model robust to the agnostic bias. To validate the effectiveness of the CRLR algorithm, we conduct comprehensive experiments on both synthetic and real-world datasets. Experimental results clearly demonstrate that CRLR outperforms state-of-the-art methods, and the interpretability of our method is fully depicted by feature visualization.
Comment: Oral paper at the 2018 ACM Multimedia Conference (MM'18).
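The weighted-logistic-regression half of CRLR can be sketched as follows; the confounder-balancing step that would produce the per-sample weights is out of scope here, so the demo simply passes uniform weights (the function name and all settings are illustrative):

```python
import numpy as np

def weighted_logreg(X, y, w, lam=1e-3, lr=0.1, iters=1000):
    """Weighted logistic regression, the second ingredient of CRLR.

    In CRLR the per-sample weights w come from global confounder balancing;
    here they are taken as given. Fits by gradient descent on the weighted
    negative log-likelihood with an L2 penalty.
    """
    n, d = X.shape
    Xb = np.column_stack([np.ones(n), X])   # add intercept column
    beta = np.zeros(d + 1)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (w * (p - y)) / n + lam * beta
        beta -= lr * grad
    return beta
```

Swapping the uniform weights for balancing weights that equalize feature distributions across "treated" and "control" values of each feature is what turns this ordinary classifier into the causally regularized variant the paper proposes.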