
    Two-Stage Bagging Pruning for Reducing the Ensemble Size and Improving the Classification Performance

    Ensemble methods, such as the traditional bagging algorithm, can usually improve the performance of a single classifier. However, they usually require large storage space as well as relatively time-consuming predictions. Many approaches have been developed to reduce the ensemble size and improve the classification performance by pruning the traditional bagging algorithm. In this article, we propose a two-stage strategy to prune the traditional bagging algorithm by combining two simple approaches: accuracy-based pruning (AP) and distance-based pruning (DP). These two methods, as well as their two combinations, “AP+DP” and “DP+AP”, as the two-stage pruning strategy, were all examined. Compared with the single pruning methods, we found that the two-stage pruning methods can further reduce the ensemble size and improve the classification performance. The “AP+DP” method generally performs better than the “DP+AP” method when using four base classifiers: decision tree, Gaussian naive Bayes, K-nearest neighbor, and logistic regression. Moreover, compared to the traditional bagging algorithm, the two-stage method “AP+DP” improved the classification accuracy by 0.88%, 4.06%, 1.26%, and 0.96%, respectively, averaged over 28 datasets under the four base classifiers. “AP+DP” also outperformed three other existing algorithms, Brag, Nice, and TB, assessed on 8 common datasets. In summary, the proposed two-stage pruning methods are simple and promising approaches that can both reduce the ensemble size and improve the classification accuracy.
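    The abstract does not spell out the AP and DP criteria, so the following is a minimal sketch under stated assumptions: AP keeps the ensemble members with the highest validation accuracy, and DP then greedily keeps members whose prediction vectors are farthest (in Hamming distance) from those already selected, encouraging diversity. Members are represented simply by their prediction vectors on a validation set; all names and the greedy DP scheme are illustrative, not the paper's exact method.

```python
import numpy as np

def accuracy_prune(preds, y_val, keep):
    """AP (assumed): keep the `keep` members with highest validation accuracy."""
    acc = (preds == y_val).mean(axis=1)          # per-member accuracy
    return np.argsort(acc)[::-1][:keep]          # indices, best first

def distance_prune(preds, idx, keep):
    """DP (assumed, greedy): seed with the most accurate member, then
    repeatedly add the member whose predictions differ most (min Hamming
    distance to the current selection) to promote diversity."""
    selected = [idx[0]]
    remaining = list(idx[1:])
    while len(selected) < keep and remaining:
        dists = [min((preds[r] != preds[s]).mean() for s in selected)
                 for r in remaining]
        best = remaining[int(np.argmax(dists))]
        selected.append(best)
        remaining.remove(best)
    return np.array(selected)

# toy setup: 20 bagged models' predictions on a 50-sample validation set
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=50)
preds = rng.integers(0, 2, size=(20, 50))

ap = accuracy_prune(preds, y_val, keep=10)       # stage 1: accuracy-based
final = distance_prune(preds, ap, keep=5)        # stage 2: distance-based
```

    The two-stage "AP+DP" order corresponds to calling `accuracy_prune` first and `distance_prune` on its output; "DP+AP" would simply reverse the composition.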

    Correlation Coefficients for a Study with Repeated Measures

    Repeated measures are increasingly collected in a study to investigate the trajectory of measures over time. One of the first research questions is to determine the correlation between two measures. The following five methods for correlation calculation are compared: (1) Pearson correlation; (2) correlation of subject means; (3) partial correlation for subject effect; (4) partial correlation for visit effect; and (5) a mixed-model approach. The Pearson correlation coefficient is traditionally used in a cross-sectional study. Pearson correlation is close to the correlations computed from mixed-effects models that account for the correlation structure, but it may not be theoretically appropriate in a repeated-measures study because it ignores the correlation among outcomes from multiple visits within the same subject. We compare these methods with regard to the average correlation and the mean squared error. In general, correlation under the mixed-effects model with the compound symmetric structure is recommended, as its correlation is close to the nominal level with a small mean squared error.
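    As a minimal simulation sketch of methods (1) and (2) above: each subject shares a latent effect, so outcomes from a subject's visits are correlated, and pooling all observations as if independent (plain Pearson) differs from correlating one summary per subject. The data-generating model here is an assumption for illustration; the mixed-model approach (5) would typically be fit with a dedicated package and is not shown.

```python
import numpy as np

# Simulated repeated measures: a shared subject-level effect induces
# within-subject correlation across visits.
rng = np.random.default_rng(1)
n_subj, n_visits = 30, 4
subj = rng.normal(0.0, 1.0, n_subj)                       # subject effect
x = subj[:, None] + rng.normal(0.0, 1.0, (n_subj, n_visits))
y = subj[:, None] + rng.normal(0.0, 1.0, (n_subj, n_visits))

# (1) Pearson correlation on all pooled observations -- ignores the
#     within-subject correlation the abstract warns about
r_pooled = np.corrcoef(x.ravel(), y.ravel())[0, 1]

# (2) correlation of subject means -- one summary point per subject
r_means = np.corrcoef(x.mean(axis=1), y.mean(axis=1))[0, 1]
```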

    A new method to determine the tropopause

    The tropopause has a complex structure, and some interference information may exist in high-resolution global positioning system (GPS)/low earth-orbiting (LEO) radio occultation (RO) data. The position of the tropopause cannot be accurately determined using traditional cold point tropopause (CPT) and lapse rate tropopause (LRT) algorithms. In this paper, an integrative algorithm is developed to determine tropopause parameters. The algorithm is applied to GPS/COSMIC RO data to obtain a global distribution of the height and temperature of the tropopause. This algorithm improves the utilization rate of GPS/LEO RO data by 30% compared with that from the traditional CPT method. The rationality and reliability of GPS/LEO RO data in probing the Earth’s atmosphere are verified by our study of the tropopause using COSMIC data.
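    For reference, the two traditional definitions the paper improves upon can be sketched on an idealized temperature profile: CPT is the altitude of minimum temperature, and LRT (in the WMO style) is the lowest level where the lapse rate drops to 2 K/km or less and the average lapse rate over the next 2 km stays at or below that threshold. This is a simplified illustration of the baseline methods, not the paper's integrative algorithm.

```python
import numpy as np

def cold_point_tropopause(z_km, T):
    """CPT: altitude of the minimum temperature in the profile."""
    return z_km[int(np.argmin(T))]

def lapse_rate_tropopause(z_km, T, threshold=2.0, depth=2.0):
    """LRT (WMO-style sketch): lowest level where the lapse rate -dT/dz
    falls to `threshold` K/km or less and the mean lapse rate over the
    next `depth` km does not exceed the threshold."""
    lapse = -np.diff(T) / np.diff(z_km)          # K/km between levels
    for i, g in enumerate(lapse):
        if g <= threshold:
            above = (z_km > z_km[i]) & (z_km <= z_km[i] + depth)
            if above.any():
                mean_lapse = (T[i] - T[above][-1]) / (z_km[above][-1] - z_km[i])
                if mean_lapse <= threshold:
                    return z_km[i]
    return np.nan

# idealized profile: 6.5 K/km decrease up to 16 km, isothermal above
z = np.arange(0.0, 30.5, 0.5)
T = np.where(z <= 16.0, 288.0 - 6.5 * z, 288.0 - 6.5 * 16.0)

cpt = cold_point_tropopause(z, T)
lrt = lapse_rate_tropopause(z, T)
```

    On this clean idealized profile both methods agree at 16 km; the paper's point is that on noisy real RO profiles they can disagree or fail, which motivates an integrative algorithm.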

    Identification and Characterization of OGG1 Mutations in Patients with Alzheimer's Disease

    Patients with Alzheimer's disease (AD) exhibit higher levels of 8-oxo-guanine (8-oxoG) DNA lesions in their brain, suggesting reduced or defective 8-oxoG repair. To test this hypothesis, this study investigated 14 AD patients and 10 age-matched controls for mutations of the major 8-oxoG removal gene OGG1. Whereas no alterations were detected in any control samples, four AD patients exhibited mutations in OGG1: two carried a common single-base (C796) deletion that alters the carboxyl-terminal sequence of OGG1, and the other two had nucleotide alterations leading to single amino acid substitutions. In vitro biochemical assays revealed that the protein encoded by the C796-deleted OGG1 completely lost its 8-oxoG glycosylase activity, and that the two single-residue-substituted OGG1 proteins showed a significant reduction in glycosylase activity. These results are consistent with the observation that nuclear extracts derived from a limited number of AD patients with OGG1 mutations exhibited greatly reduced 8-oxoG glycosylase activity compared with age-matched controls and AD patients without OGG1 alterations. Our findings suggest that defects in OGG1 may be important in the pathogenesis of AD in a significant fraction of AD patients and provide new insight into the molecular basis for the disease.

    Two-stage optimal designs with survival endpoint when the follow-up time is restricted

    Background: Survival endpoints are frequently used in early-phase clinical trials as the primary endpoint to assess the activity of a new treatment. Existing two-stage optimal designs with a survival endpoint either overestimate the sample size or compute power outside the alternative hypothesis space. Methods: We propose a new single-arm two-stage optimal design with a survival endpoint using the one-sample log-rank test based on exact variance estimates. This design is analogous to Simon’s two-stage design with a binary endpoint, having restricted follow-up. Results: We compare the proposed design with existing two-stage designs, including the two-stage design with a survival endpoint based on the nonparametric Nelson-Aalen estimate, and Simon’s two-stage designs with or without interim accrual. The new design always performs better than these competitors with regard to the expected total study length, and requires a smaller expected sample size than Simon’s design with interim accrual. Conclusions: The proposed two-stage minimax and optimal designs with a survival endpoint are recommended for use in practice to shorten the study length of clinical trials.
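    The building block of such designs, the one-sample log-rank test, can be sketched in its standard form: compare the observed number of events O against the expected number E under a null cumulative hazard. The exponential null, the sign convention, and the toy data below are illustrative assumptions; the paper's design uses exact variance estimates rather than this textbook approximation.

```python
import math

def one_sample_logrank(times, events, lam0):
    """One-sample log-rank statistic against an exponential null with
    hazard rate `lam0` (null cumulative hazard L0(t) = lam0 * t).
    O = observed events; E = sum of L0 at each subject's follow-up time.
    Convention here: Z = (E - O) / sqrt(E), so a large positive Z favors
    the alternative of longer-than-null survival."""
    O = sum(events)
    E = lam0 * sum(times)
    return (E - O) / math.sqrt(E)

# toy data: restricted follow-up times and event indicators (1 = event)
times = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5]
events = [1, 0, 1, 0, 1, 0]
z = one_sample_logrank(times, events, lam0=0.5)
```

    In a two-stage design of this kind, a statistic like `z` would be computed at the interim analysis and again at the final analysis, stopping early for futility if the interim value falls below a pre-specified boundary.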

    Comparison of Unweighted and Weighted Rank Based Tests for an Ordered Alternative in Randomized Complete Block Designs

    In randomized complete block designs, a monotonic relationship among treatment groups may already be established from prior information, e.g., a study with different dose levels of a drug. The test statistic developed by Page (1963) and another from Jonckheere (1954) and Terpstra (1952) are two unweighted rank-based tests used to detect ordered alternatives when the assumptions of the traditional two-way analysis of variance are not satisfied. We consider a new weighted rank-based test that utilizes a weight for each subject, based on the sample variance, in computing the test statistic. The new weighted rank-based test is compared with the two commonly used unweighted tests with regard to power under various conditions. The weighted test is generally more powerful than the two unweighted tests when the number of treatment groups is small to moderate.
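    A Page-type statistic for an ordered alternative, plus a per-block weighting, can be sketched as follows. Ranks are taken within each block and combined with ordered scores; the inverse-sample-variance weighting shown is an assumption suggested by the abstract's description, not necessarily the paper's exact scheme.

```python
import numpy as np

def page_L(data, weights=None):
    """Page-type statistic for an ordered alternative in a randomized
    complete block design. `data` is (blocks x treatments), treatments
    ordered by hypothesized effect. With `weights=None` this reduces to
    the classical unweighted L = sum_j j * R_j."""
    ranks = data.argsort(axis=1).argsort(axis=1) + 1   # within-block ranks
    if weights is None:
        weights = np.ones(data.shape[0])
    c = np.arange(1, data.shape[1] + 1)                # ordered scores 1..k
    return float((weights[:, None] * ranks * c).sum())

# toy data: 10 blocks, 4 ordered treatments with an increasing trend
rng = np.random.default_rng(2)
blocks, k = 10, 4
data = np.arange(k) * 0.8 + rng.normal(0.0, 1.0, (blocks, k))

w = 1.0 / data.var(axis=1, ddof=1)        # assumed inverse-variance weights
L_unweighted = page_L(data)
L_weighted = page_L(data, weights=w)
```

    With 10 blocks and 4 treatments the unweighted statistic is bounded between 200 (perfectly reversed ordering in every block) and 300 (perfect agreement with the hypothesized ordering); larger values favor the ordered alternative.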