    Adaptive Storey's null proportion estimator

    False discovery rate (FDR) is a commonly used criterion in multiple testing, and the Benjamini-Hochberg (BH) procedure is arguably the most popular approach with an FDR guarantee. To improve power, adaptive BH procedures have been proposed that incorporate various null proportion estimators, among which Storey's estimator has gained substantial popularity. The performance of Storey's estimator hinges on a critical hyper-parameter: a pre-fixed configuration lacks power, while existing data-driven hyper-parameters compromise FDR control. In this work, we propose a novel class of adaptive hyper-parameters and establish FDR control of the associated BH procedure using a martingale argument. Within this class of data-driven hyper-parameters, we present a specific configuration designed to maximize the number of rejections and characterize its convergence to the optimal hyper-parameter under a commonly used mixture model. We evaluate our adaptive Storey's null proportion estimator and the associated BH procedure on extensive simulated data and a motivating protein dataset. Our proposal exhibits significant power gains when there is a considerable proportion of weak non-nulls or a conservative null distribution. Comment: 17 pages, 4 figures, 1 table
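
    To make the baseline concrete, here is a minimal sketch of the classical adaptive BH procedure with Storey's estimator at a FIXED hyper-parameter lam (the pre-fixed configuration the abstract says lacks power). It does not reproduce the paper's adaptive, data-driven hyper-parameter. It assumes the common finite-sample form pi0_hat = (1 + #{p > lam}) / (m(1 - lam)), capped at 1, and the usual convention of never rejecting p-values above lam; the function names storey_pi0, bh_rejections and storey_bh are hypothetical.

```python
from typing import List, Sequence

def storey_pi0(pvals: Sequence[float], lam: float = 0.5) -> float:
    """Storey's null-proportion estimate with a FIXED hyper-parameter lam.

    Uses the finite-sample '+1' form (1 + #{p > lam}) / (m * (1 - lam)),
    capped at 1 so the adjusted level alpha / pi0 never drops below alpha.
    """
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, (1 + n_above) / (m * (1.0 - lam)))

def bh_rejections(pvals: Sequence[float], alpha: float,
                  pi0: float = 1.0, max_p: float = 1.0) -> List[int]:
    """Step-up BH at effective level alpha / pi0.

    Finds the largest rank k with p_(k) <= min(k * alpha / (m * pi0), max_p)
    and rejects the k smallest p-values. Returns their original indices, sorted.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= min(rank * alpha / (m * pi0), max_p):
            k = rank
    return sorted(order[:k])

def storey_bh(pvals: Sequence[float], alpha: float, lam: float = 0.5) -> List[int]:
    """Adaptive BH: plug Storey's pi0 estimate into BH and never reject p > lam."""
    pi0 = storey_pi0(pvals, lam)
    return bh_rejections(pvals, alpha, pi0=pi0, max_p=lam)

pvals = [0.005, 0.011, 0.014, 0.019, 0.055, 0.2, 0.4, 0.6, 0.8, 0.9]
print(storey_pi0(pvals))          # 0.8
print(bh_rejections(pvals, 0.1))  # [0, 1, 2, 3]
print(storey_bh(pvals, 0.1))      # [0, 1, 2, 3, 4]
```

    On this made-up input only three of ten p-values exceed lam = 0.5, so pi0_hat = 0.8 raises the BH level from 0.1 to 0.125 and the fifth p-value (0.055) is additionally rejected. The choice of lam drives this gain, which is exactly the sensitivity the abstract targets.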

    Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey

    Data Augmentation (DA) has emerged as an indispensable strategy in Time Series Classification (TSC), primarily due to its capacity to amplify training samples, thereby bolstering model robustness, diversifying datasets, and curtailing overfitting. However, the current landscape of DA in TSC is plagued by fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible, user-oriented tools. In light of these challenges, this study embarks on an exhaustive dissection of DA methodologies within the TSC realm. Our initial approach involved an extensive literature review spanning a decade, which revealed that contemporary surveys scarcely capture the breadth of advancements in DA for TSC; this prompted us to meticulously analyze over 100 scholarly articles and distill more than 60 unique DA techniques. This rigorous analysis led to a novel taxonomy, purpose-built for the intricacies of DA in TSC, that categorizes techniques into five principal echelons: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. Our taxonomy promises to serve as a robust navigational aid for scholars, offering clarity and direction in method selection. Addressing the conspicuous absence of holistic evaluations for prevalent DA techniques, we executed an all-encompassing empirical assessment in which upwards of 15 DA strategies were evaluated across 8 UCR time-series datasets, employing ResNet and a multi-faceted evaluation paradigm encompassing Accuracy, Method Ranking, and Residual Analysis, yielding a benchmark accuracy of 88.94 ± 11.83%. Our investigation underscored the inconsistent efficacies of DA techniques, with..
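
    As an illustration of the Transformation-Based category, the sketch below implements two widely used label-preserving transforms, jittering (additive Gaussian noise) and magnitude scaling (one random multiplicative factor per series), and uses them to enlarge a training set. It is not the survey's code or benchmark setup; the noise levels sigma = 0.03 and 0.1 and the helper names jitter, magnitude_scale and augment_dataset are hypothetical choices.

```python
import random
from typing import List, Optional, Sequence, Tuple

def jitter(series: Sequence[float], sigma: float = 0.03,
           rng: Optional[random.Random] = None) -> List[float]:
    """Jittering: add independent Gaussian noise N(0, sigma^2) at every time step."""
    rng = rng if rng is not None else random.Random()
    return [x + rng.gauss(0.0, sigma) for x in series]

def magnitude_scale(series: Sequence[float], sigma: float = 0.1,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Scaling: multiply the whole series by one factor drawn from N(1, sigma^2)."""
    rng = rng if rng is not None else random.Random()
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in series]

def augment_dataset(X: Sequence[Sequence[float]], y: Sequence[int],
                    n_copies: int = 2,
                    seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Keep the originals and append n_copies jittered-then-scaled versions of each series.

    Labels are copied unchanged because both transforms are assumed label-preserving.
    A fixed seed makes the augmented set reproducible.
    """
    rng = random.Random(seed)
    X_aug = [list(s) for s in X]
    y_aug = list(y)
    for series, label in zip(X, y):
        for _ in range(n_copies):
            X_aug.append(magnitude_scale(jitter(series, rng=rng), rng=rng))
            y_aug.append(label)
    return X_aug, y_aug
```

    Whether such transforms actually preserve the class depends on the dataset (for example, scaling can destroy amplitude-coded classes), which is one reason DA efficacy varies across datasets.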

    Identifying Strongly Lensed Gravitational Waves with the Third-generation Detectors

    Joint detection of gravitational-wave (GW) signals by a network of instruments will improve the ability to detect faint and distant GW signals at higher signal-to-noise ratios (SNRs), which could also improve the ability to detect lensed GWs, especially for third-generation (3G) detectors such as the Einstein Telescope (ET) and Cosmic Explorer (CE). However, identifying strongly lensed gravitational waves (SLGWs) remains challenging. In this article, we focus on the identification ability of 3G detectors. We predict and analyze the SNR distribution of SLGW signals and show that only 50.6% of SLGW pairs detected by ET alone can be identified by the Lens Bayes factor (LBF), currently a popular method for identifying SLGWs. For SLGW pairs detected by the CE&ET network, owing to its superior spatial resolution, this fraction rises to 87.3%. Moreover, we obtain an approximate analytical relation between SNR and LBF. We give clear SNR limits for identifying SLGWs and estimate the expected yearly detection rates of galaxy-scale lensed GWs that can be identified with a 3G detector network. Comment: 9 pages, 7 figures
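
    Two standard relations underlie the network-SNR argument above: with independent Gaussian noise in each detector, optimal single-detector SNRs add in quadrature, and strong lensing magnifies the strain amplitude of each image by sqrt(|mu|), so its SNR scales the same way. The sketch below encodes only these textbook relations; it does not reproduce the paper's SNR distribution, its SNR-LBF relation, or its SNR limits. The pair_passes helper and any threshold passed to it are hypothetical placeholders.

```python
import math
from typing import Sequence

def network_snr(detector_snrs: Sequence[float]) -> float:
    """Optimal network SNR: single-detector SNRs add in quadrature
    (assumes independent Gaussian noise in each detector)."""
    return math.sqrt(sum(r * r for r in detector_snrs))

def lensed_snr(unlensed_snr: float, magnification: float) -> float:
    """Lensing scales the strain amplitude by sqrt(|mu|), hence the SNR too.
    The absolute value handles negative-parity (saddle-point) images."""
    return math.sqrt(abs(magnification)) * unlensed_snr

def pair_passes(snr_1: float, snr_2: float, threshold: float) -> bool:
    """Hypothetical criterion: both images of a lensed pair must reach the threshold."""
    return min(snr_1, snr_2) >= threshold
```

    For example, network_snr([3.0, 4.0]) gives 5.0, so adding a second detector can lift a signal that is marginal in either instrument alone above a detection threshold, and lensed_snr(5.0, 4.0) gives 10.0 for an image magnified by |mu| = 4.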

    Long-tail Cross Modal Hashing

    Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced data, while imbalanced data with a long-tail distribution are more common in the real world. Several long-tail hashing methods have been proposed, but they cannot adapt to multi-modal data because of the complex interplay between labels and the individuality and commonality information of multi-modal data. Furthermore, CMH methods mostly mine the commonality of multi-modal data to learn hash codes, which may override tail labels encoded by the individuality of the respective modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle imbalanced multi-modal data. LtCMH first adopts auto-encoders to mine the individuality and commonality of different modalities by minimizing the dependency between the individuality of the respective modalities and by enhancing the commonality of these modalities. It then dynamically combines the individuality and commonality with direct features extracted from the respective modalities to create meta features that enrich the representation of tail labels, and binarizes the meta features to generate hash codes. LtCMH significantly outperforms state-of-the-art baselines on long-tail datasets and achieves better (or comparable) performance on datasets with balanced labels. Comment: Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)
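
    The final step described above, binarizing real-valued features into hash codes and then retrieving across modalities, can be sketched generically as sign binarization followed by Hamming-distance ranking. This is not LtCMH itself: the auto-encoders and meta-feature fusion are omitted, the feature vectors below are made up, and the names binarize, hamming and rank_by_hamming are hypothetical.

```python
from typing import List, Sequence

def binarize(features: Sequence[float]) -> List[int]:
    """Sign binarization: map a real-valued feature vector to a {-1, +1} code."""
    return [1 if v >= 0 else -1 for v in features]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Database indices ordered by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(database)), key=lambda i: (hamming(query, database[i]), i))

# Cross-modal retrieval: an image-side query against text-side database codes
# (made-up feature vectors standing in for learned meta features).
image_query = binarize([0.7, -0.2, 0.1, -0.9])
text_db = [binarize(v) for v in ([-0.5, 0.3, -0.1, 0.8],
                                 [0.6, -0.4, 0.2, -0.1],
                                 [0.9, 0.5, -0.3, -0.2])]
print(rank_by_hamming(image_query, text_db))  # [1, 2, 0]
```

    Because both modalities are mapped into the same binary space, retrieval reduces to cheap bit comparisons; the quality of the ranking depends entirely on how well the learned features (here, LtCMH's meta features) place same-label items on the same side of zero.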