
    Domain Adaptation for Dense Retrieval through Self-Supervision by Pseudo-Relevance Labeling

    Although neural information retrieval has witnessed great improvements, recent work has shown that the generalization ability of dense retrieval models on target domains with different distributions is limited, in contrast with the results obtained with interaction-based models. To address this issue, researchers have resorted to adversarial learning and query generation approaches; both nevertheless yielded limited improvements. In this paper, we propose a self-supervision approach in which pseudo-relevance labels are automatically generated on the target domain. To do so, we first use the standard BM25 model on the target domain to obtain a first ranking of documents, and then use the interaction-based model T5-3B to re-rank the top documents. We further combine this approach with knowledge distillation relying on an interaction-based teacher model trained on the source domain. Our experiments reveal that pseudo-relevance labeling using T5-3B and the MiniLM teacher performs on average better than other approaches and helps improve the state-of-the-art query generation approach GPL when it is fine-tuned on the pseudo-relevance labeled data. Comment: 16 pages.
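The labeling pipeline described above (first-stage BM25 ranking, then re-ranking the top candidates with a cross-encoder to produce pseudo-relevance labels) can be sketched as follows. This is a toy, in-memory sketch: the `overlap` scorer stands in for the expensive T5-3B cross-encoder, and all function names are illustrative, not taken from the paper's code.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Classic BM25 score of every document against the query (toy, in-memory)."""
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(t) for t in tokenized) / N
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores

def pseudo_labels(query, docs, reranker, depth=3, top_k=1):
    """Stage 1: BM25 ranking. Stage 2: rerank the top `depth` candidates with a
    cross-encoder-style scorer; the reranked top_k become pseudo-positives."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    reranked = sorted(order[:depth], key=lambda i: reranker(query, docs[i]),
                      reverse=True)
    return {i: int(rank < top_k) for rank, i in enumerate(reranked)}

# Toy corpus; `overlap` is a cheap stand-in for a cross-encoder scorer.
docs = ["dense retrieval models", "cats and dogs", "retrieval with bm25 dense"]
overlap = lambda q, d: len(set(q.lower().split()) & set(d.lower().split()))
labels = pseudo_labels("dense retrieval", docs, overlap)
```

The resulting 0/1 labels can then be used as training targets for the dense retriever, exactly as a human-annotated relevance file would be.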

    OpenSD: Unified Open-Vocabulary Segmentation and Detection

    Recently, a few open-vocabulary methods have been proposed that employ a unified architecture to tackle generic segmentation and detection tasks. However, their performance still lags behind that of task-specific models due to the conflict between different tasks, and their open-vocabulary capability is limited due to the inadequate use of CLIP. To address these challenges, we present a universal transformer-based framework, abbreviated as OpenSD, which utilizes the same architecture and network parameters to handle open-vocabulary segmentation and detection tasks. First, we introduce a decoder decoupled learning strategy to alleviate the semantic conflict between thing and stuff categories so that each individual task can be learned more effectively under the same framework. Second, to better leverage CLIP for end-to-end segmentation and detection, we propose dual classifiers to handle the in-vocabulary domain and the out-of-vocabulary domain, respectively. The text encoder is further trained to be region-aware for both thing and stuff categories through decoupled prompt learning, enabling the model to filter out duplicated and low-quality predictions, which is important for end-to-end segmentation and detection. Extensive experiments are conducted on multiple datasets under various circumstances. The results demonstrate that OpenSD outperforms state-of-the-art open-vocabulary segmentation and detection methods in both closed- and open-vocabulary settings. Code is available at https://github.com/strongwolf/OpenS
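The basic open-vocabulary classification mechanism that CLIP-style dual classifiers build on can be sketched in a few lines: a region embedding is assigned to whichever class name's text embedding it is most similar to. The toy two-dimensional embeddings and class names below are illustrative only, not the paper's actual CLIP features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def classify(region_emb, text_embs):
    """Assign a region embedding to the class whose text embedding is most
    similar -- the core open-vocabulary classification step."""
    scores = {name: cosine(region_emb, emb) for name, emb in text_embs.items()}
    return max(scores, key=scores.get), scores

# Toy 2-d "text embeddings" for two class names.
text_embs = {"cat": [1.0, 0.0], "grass": [0.0, 1.0]}
label, scores = classify([0.9, 0.1], text_embs)
```

Because classification is similarity against text embeddings rather than a fixed softmax head, new class names can be added at inference time simply by embedding their prompts.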

    Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval

    Pre-trained language models have been successful in many knowledge-intensive NLP tasks. However, recent work has shown that models such as BERT are not “structurally ready” to aggregate textual information into a [CLS] vector for dense passage retrieval (DPR). This “lack of readiness” results from the gap between language model pre-training and DPR fine-tuning. Previous solutions call for computationally expensive techniques such as hard negative mining, cross-encoder distillation, and further pre-training to learn a robust DPR model. In this work, we instead propose to fully exploit knowledge in a pre-trained language model for DPR by aggregating the contextualized token embeddings into a dense vector, which we call agg★. By concatenating vectors from the [CLS] token and agg★, our Aggretriever model substantially improves the effectiveness of dense retrieval models on both in-domain and zero-shot evaluations without introducing substantial training overhead. Code is available at https://github.com/castorini/dhr
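The aggregation idea can be illustrated with a minimal sketch: build a passage vector by concatenating the [CLS] embedding with a vector pooled from all token embeddings. Note that the paper's actual agg★ pools lexical representations derived from the MLM head; the elementwise max-pooling below is a simplified stand-in, and the toy embeddings are invented for illustration.

```python
def max_pool(token_embs):
    """Elementwise max over a list of same-length token embedding vectors
    (a simplified stand-in for the paper's agg* construction)."""
    return [max(dims) for dims in zip(*token_embs)]

def aggretriever_vector(cls_emb, token_embs):
    """Concatenate the [CLS] vector with the pooled token representation,
    so the final dense vector carries both semantic and token-level signal."""
    return list(cls_emb) + max_pool(token_embs)

# Toy 2-d embeddings for a 3-token passage.
cls = [0.1, 0.2]
tokens = [[0.5, -1.0], [0.3, 0.9], [-0.2, 0.4]]
vec = aggretriever_vector(cls, tokens)  # 4-d concatenated vector
```

Retrieval then scores query and passage vectors with an ordinary dot product, so the concatenation adds no extra machinery at search time.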

    Dynamical Masses and Ages of Sirius-like Systems

    We measure precise orbits and dynamical masses and derive age constraints for six confirmed and one candidate Sirius-like systems, including the Hyades member HD 27483. Our orbital analysis incorporates radial velocities, relative astrometry, and Hipparcos-Gaia astrometric accelerations. We constrain the main-sequence lifetime of a white dwarf's progenitor from the remnant's dynamical mass and semi-empirical initial-final mass relations, and infer the cooling age from the mass and effective temperature. We present new relative astrometry of HD 27483 B from Keck/NIRC2 observations and archival HST data, and obtain the first dynamical mass of $0.798_{-0.041}^{+0.10}\,M_{\odot}$ and an age of $450_{-180}^{+570}$ Myr, consistent with previous age estimates of the Hyades. We also measure precise dynamical masses for HD 114174 B ($0.591 \pm 0.011\,M_{\odot}$) and HD 169889 B ($0.526_{-0.037}^{+0.039}\,M_{\odot}$), but their age precisions are limited by their uncertain temperatures. For HD 27786 B, the unusually small mass of $0.443 \pm 0.012\,M_{\odot}$ suggests a history of rapid mass loss, possibly due to binary interaction during its progenitor's AGB phase. The orbits of HD 118475 and HD 136138 from our RV fitting are overall in good agreement with the Gaia DR3 astrometric two-body solutions, despite moderate differences in the eccentricity and period of HD 136138. The mass of $0.580_{-0.039}^{+0.052}\,M_{\odot}$ for HD 118475 B and a speckle-imaging non-detection confirm that the companion is a white dwarf. Our analysis demonstrates the rich set of precise WD dynamical mass measurements enabled by Gaia DR3 and later releases, which will improve empirical calibrations of the white dwarf initial-final mass relation. Comment: 21 pages, 7 figures. Submitted to MNRAS.
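The age machinery described above can be sketched in a few lines: invert an initial-final mass relation (IFMR) to get the progenitor's initial mass from the white dwarf's dynamical mass, convert that to a main-sequence lifetime, and add the cooling age. The linear IFMR coefficients and the power-law lifetime scaling below are illustrative placeholders, not the semi-empirical relations used in the paper.

```python
def progenitor_mass(m_wd, a=0.1, b=0.45):
    """Invert an assumed linear IFMR, M_f = a * M_i + b, to recover the
    progenitor's initial mass (coefficients illustrative only)."""
    return (m_wd - b) / a

def ms_lifetime_gyr(m_init):
    """Rough main-sequence lifetime scaling, t ~ 10 * M^-2.5 Gyr in solar
    units (a common back-of-the-envelope power law, illustrative)."""
    return 10.0 * m_init ** -2.5

def total_age_gyr(m_wd, cooling_age_gyr):
    """Total system age = progenitor MS lifetime + WD cooling age."""
    return ms_lifetime_gyr(progenitor_mass(m_wd)) + cooling_age_gyr

# Example: a ~0.8 Msun white dwarf with an assumed 0.1 Gyr cooling age.
age = total_age_gyr(0.8, 0.1)
```

Because the MS lifetime is steep in initial mass, the uncertainty on the dynamical mass propagates strongly into the age, which is why the precise masses from astrometric accelerations matter.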

    Pathology Steered Stratification Network for Subtype Identification in Alzheimer's Disease

    Alzheimer's disease (AD) is a heterogeneous, multifactorial neurodegenerative disorder characterized by beta-amyloid, pathologic tau, and neurodegeneration. There are no effective treatments for AD at a late stage, underscoring the need for early intervention. However, existing statistical inference approaches to AD subtype identification ignore pathological domain knowledge, which can lead to ill-posed results that are sometimes inconsistent with essential neurological principles. Integrating systems biology modeling with machine learning, we propose a novel pathology steered stratification network (PSSN) that incorporates established domain knowledge of AD pathology through a reaction-diffusion model, in which we consider non-linear interactions between major biomarkers and diffusion along the brain structural network. Trained on longitudinal multimodal neuroimaging data, the biological model predicts long-term trajectories that capture individual progression patterns, filling the gaps left by sparse imaging data. A deep predictive neural network is then built to exploit spatiotemporal dynamics, link neurological examinations with clinical profiles, and generate subtype assignment probabilities on an individual basis. We further identify an evolutionary disease graph to quantify subtype transition probabilities through extensive simulations. Our stratification achieves superior performance in both inter-cluster heterogeneity and intra-cluster homogeneity of various clinical scores. Applying our approach to enriched samples of aging populations, we identify six subtypes spanning the AD spectrum, each exhibiting a distinctive biomarker pattern consistent with its clinical outcome. PSSN provides insights into pre-symptomatic diagnosis and practical guidance on clinical treatments, and may be further generalized to other neurodegenerative diseases.
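The reaction-diffusion component can be illustrated with a minimal sketch: a pathology load diffuses along the graph Laplacian of the structural network while a local reaction term drives growth. The logistic reaction and all parameter values below are illustrative stand-ins for the paper's biomarker interaction model.

```python
def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def step(u, adj, beta=0.1, rho=0.5, dt=0.01):
    """One Euler step of du/dt = -beta * L u + rho * u * (1 - u):
    diffusion along the network plus a logistic reaction (a stand-in for
    the non-linear biomarker interaction terms)."""
    L = graph_laplacian(adj)
    return [u[i] + dt * (-beta * sum(L[i][j] * u[j] for j in range(len(u)))
                         + rho * u[i] * (1 - u[i]))
            for i in range(len(u))]

# Toy 3-node chain; pathology seeded at node 0.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
u = [1.0, 0.0, 0.0]
for _ in range(200):
    u = step(u, adj)
```

Integrating such trajectories forward between sparse imaging visits is what lets the biological model interpolate an individual's progression before the deep network stratifies it.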