    Generalization Bounds for Representative Domain Adaptation

    In this paper, we propose a novel framework to analyze the theoretical properties of the learning process for a representative type of domain adaptation, which combines data from multiple sources and one target (briefly, representative domain adaptation). In particular, we use the integral probability metric to measure the difference between the distributions of two domains and compare it with the H-divergence and the discrepancy distance. We develop Hoeffding-type, Bennett-type and McDiarmid-type deviation inequalities for multiple domains, and then present the symmetrization inequality for representative domain adaptation. Next, we use the derived inequalities to obtain Hoeffding-type and Bennett-type generalization bounds, both of which are based on the uniform entropy number. Moreover, we present generalization bounds based on the Rademacher complexity. Finally, we analyze the asymptotic convergence and the rate of convergence of the learning process for representative domain adaptation. We discuss the factors that affect the asymptotic behavior of the learning process, and numerical experiments support our theoretical findings. We also compare our results with existing domain adaptation results and with classical results under the same-distribution assumption.
    Comment: arXiv admin note: substantial text overlap with arXiv:1304.157
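
    For illustration only: the integral probability metric between two distributions P and Q over a function class F is the largest gap in expectations, sup over f in F of |E_P f - E_Q f|. The sketch below evaluates it on two empirical samples for a small, hypothetical class of threshold indicators f_t(x) = 1[x <= t]; with thresholds at every sample point, this particular IPM coincides with the Kolmogorov-Smirnov statistic of the two samples. The function names and data are our own assumptions, not the paper's construction, which works with general function classes and multiple source domains.

```python
from typing import Callable, List, Sequence


def empirical_ipm(xs: Sequence[float], ys: Sequence[float],
                  function_class: Sequence[Callable[[float], float]]) -> float:
    # sup over a finite function class of |mean of f on xs - mean of f on ys|.
    best = 0.0
    for f in function_class:
        mean_x = sum(f(x) for x in xs) / len(xs)
        mean_y = sum(f(y) for y in ys) / len(ys)
        best = max(best, abs(mean_x - mean_y))
    return best


def threshold_class(thresholds: Sequence[float]) -> List[Callable[[float], float]]:
    # Hypothetical class: f_t(x) = 1 if x <= t else 0 (t bound per lambda via default arg).
    return [lambda x, t=t: 1.0 if x <= t else 0.0 for t in thresholds]


source = [0.1, 0.4, 0.5, 0.9]   # toy source-domain sample
target = [0.3, 0.6, 0.7, 1.2]   # toy target-domain sample
F = threshold_class(sorted(source + target))
print(empirical_ipm(source, target, F))  # 0.5
```

    Swapping in a richer class F (for example, the 1-Lipschitz functions, which yields the Wasserstein-1 distance) changes the metric but keeps the same sup-of-mean-gaps structure.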

    Image-Level and Group-Level Models for Drosophila Gene Expression Pattern Annotation

    Background: Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. The spatiotemporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images would seriously impede the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison. Results: We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local image patches and is compared with the well-known bag-of-words (BoW) method. Three pooling functions, namely max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling, are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, undersampling is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach. Conclusion: In our experiment, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation.
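
    The three pooling functions named above take only a few lines each. The sketch below assumes nonnegative sparse codes stored as one vector per local patch (a common setting, but an assumption here) and pools each code dimension across patches. It then shows one plausible form of undersampling with majority vote: each of several models is trained on all minority-class (positive) examples plus an equal-sized random subset of the majority class, and the models vote. The nearest-centroid classifier and all helper names are hypothetical stand-ins, not the authors' classifiers.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def max_pool(codes: Sequence[Sequence[float]]) -> Vector:
    # Largest activation of each code dimension over all patches (codes assumed nonnegative).
    return [max(c[j] for c in codes) for j in range(len(codes[0]))]


def average_pool(codes: Sequence[Sequence[float]]) -> Vector:
    # Mean activation of each code dimension over all patches.
    return [sum(c[j] for c in codes) / len(codes) for j in range(len(codes[0]))]


def sqrt_pool(codes: Sequence[Sequence[float]]) -> Vector:
    # Square root of the mean squared activation of each code dimension.
    return [math.sqrt(sum(c[j] ** 2 for c in codes) / len(codes)) for j in range(len(codes[0]))]


def train_centroids(pos: Sequence[Vector], neg: Sequence[Vector]) -> Tuple[Vector, Vector]:
    # Hypothetical stand-in classifier: one centroid per class.
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]
    return mean(pos), mean(neg)


def predict_centroids(model: Tuple[Vector, Vector], x: Vector) -> int:
    # Vote 1 (keyword present) if x is closer to the positive centroid.
    pos_c, neg_c = model

    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if dist(pos_c) < dist(neg_c) else 0


def undersample_vote(pos, neg, x, n_models=5, seed=0) -> bool:
    # Each model sees all positives plus an equal-size random sample of negatives;
    # the final label is the majority vote over the models.
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        neg_subset = rng.sample(list(neg), min(len(pos), len(neg)))
        votes += predict_centroids(train_centroids(pos, neg_subset), x)
    return 2 * votes > n_models
```

    Using an odd number of models avoids ties in the vote; the pooled vectors from any of the three functions can serve as the feature vectors passed to `undersample_vote`.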

    Learning Sparse Representations for Fruit Fly Gene Expression Pattern Image Annotation and Retrieval

    Background: Fruit fly embryogenesis is one of the best-understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns could be made more accurate by using keywords. Results: In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations that integrate spatial information and sparse features, overcoming the limitations of prior schemes. Conclusions: We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that integrating spatial information with sparse features leads to consistent performance improvement in image annotation, while for retrieval, sparse features alone yield better results.
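
    The abstract does not say how spatial information is combined with the sparse features. One widely used option, shown here purely as an assumed stand-in, is spatial-pyramid pooling: pool the codes over the whole image and again within each cell of a coarse grid, then concatenate the results so that the feature records where in the image each code was active. The patch coordinates, grid size and function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Patch = Tuple[float, float, Sequence[float]]  # (x, y, sparse code) for one local patch


def spatial_pyramid_max(patches: Sequence[Patch], width: float, height: float,
                        grid: int = 2) -> List[float]:
    # Requires at least one patch; all codes share the same dimension.
    dim = len(patches[0][2])

    def pool(codes):
        # Max pooling per code dimension; an empty grid cell contributes zeros.
        return [max((c[j] for c in codes), default=0.0) for j in range(dim)]

    # Level 0: pool over the whole image.
    feature = pool([code for _, _, code in patches])
    # Level 1: pool within each grid cell, row by row, and append.
    for gy in range(grid):
        for gx in range(grid):
            cell = [code for x, y, code in patches
                    if min(int(x / width * grid), grid - 1) == gx
                    and min(int(y / height * grid), grid - 1) == gy]
            feature.extend(pool(cell))
    return feature
```

    With a 2 x 2 grid and d-dimensional codes, the resulting feature has 5d entries; finer grids trade longer vectors for more spatial detail.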

    Flexible online task assignment in real-time spatial data

    The popularity of Online To Offline (O2O) service platforms has spurred the need for online task assignment in real-time spatial data, where streams of spatially distributed tasks and workers are matched in real time such that the total number of assigned pairs is maximized. Existing online task assignment models assume that each worker, once she or he appears on the platform, is either assigned a task immediately or waits for a subsequent task at a fixed location. Yet in practice, a worker may actively move around rather than passively wait in place if no task is assigned. In this paper, we define a new problem, Flexible Two-sided Online task Assignment (FTOA), which aims to guide idle workers based on the prediction of tasks and workers so as to increase the total number of assigned worker-task pairs. Addressing FTOA raises two challenges: (i) how to generate guidance for idle workers based on the predicted spatiotemporal distribution of tasks and workers, and (ii) how to leverage the guidance of workers' movements to optimize online task assignment. To this end, we propose a novel two-step framework that integrates offline prediction and online task assignment. Specifically, we estimate the distributions of tasks and workers per time slot and per unit area, and design an online task assignment algorithm, Prediction-oriented Online task Assignment in Real-time spatial data (POLAR-OP). POLAR-OP achieves a competitive ratio of 0.47, nearly twice that of the state of the art, and reduces the time complexity of processing each newly arrived task or worker to O(1). We validate the effectiveness and efficiency of our methods via extensive experiments on both synthetic datasets and real-world datasets from a large-scale taxi-calling platform.
    ISSN:2150-809
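
    The abstract gives no pseudocode for POLAR-OP, so the following is not that algorithm. It is a minimal sketch, under our own simplifying assumptions (a fixed grid of cells, no task deadlines, matching only within a cell), of the two ingredients the abstract names: estimating arrivals per time slot and per unit area from history, and processing each new task or worker in constant expected time. The helper names, the neighbor list and the "largest predicted task surplus" guidance rule are hypothetical.

```python
from collections import Counter, defaultdict, deque
from collections.abc import Hashable
from typing import Dict, Iterable, Optional, Sequence, Tuple

Key = Tuple[int, Hashable]  # (time slot, grid cell)


def estimate_rates(history: Iterable[Key], num_days: int) -> Dict[Key, float]:
    # Average number of arrivals per (time slot, cell) per day in the historical records.
    counts = Counter(history)
    return {key: n / num_days for key, n in counts.items()}


def guide_idle_worker(cell: Hashable, neighbors: Sequence[Hashable], slot: int,
                      task_rate: Dict[Key, float], worker_rate: Dict[Key, float]) -> Hashable:
    # Stay, or move to the adjacent cell with the largest predicted surplus of tasks
    # over workers; on ties max() keeps the first candidate, i.e. the current cell.
    def surplus(c):
        return task_rate.get((slot, c), 0.0) - worker_rate.get((slot, c), 0.0)
    return max([cell, *neighbors], key=surplus)


class CellMatcher:
    """Match each arrival with the earliest waiting counterpart in the same cell."""

    def __init__(self) -> None:
        # Per-cell FIFO queues of unmatched tasks and unmatched workers.
        self.waiting = {"task": defaultdict(deque), "worker": defaultdict(deque)}

    def arrive(self, kind: str, obj_id: str, cell: Hashable) -> Optional[Tuple[str, str]]:
        # One dictionary lookup plus deque operations: expected O(1) per arrival.
        other = "worker" if kind == "task" else "task"
        queue = self.waiting[other][cell]
        if queue:
            partner = queue.popleft()
            return (obj_id, partner) if kind == "task" else (partner, obj_id)  # (task, worker)
        self.waiting[kind][cell].append(obj_id)
        return None
```

    Restricting matches to a single cell is what keeps each arrival constant-time here; a real system would also expire stale tasks and consider nearby cells, at some extra cost.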

    Assessment of variation in immunosuppressive pathway genes reveals TGFBR2 to be associated with prognosis of estrogen receptor-negative breast cancer after chemotherapy

    This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
    Introduction: Tumor lymphocyte infiltration is associated with clinical response to chemotherapy in estrogen receptor (ER)-negative breast cancer. To identify variants in immunosuppressive pathway genes associated with prognosis after adjuvant chemotherapy for ER-negative patients, we studied stage I-III invasive breast cancer patients of European ancestry, including 9,334 ER-positive patients (3,151 treated with chemotherapy) and 2,334 ER-negative patients (1,499 treated with chemotherapy). Methods: We pooled data from sixteen studies from the Breast Cancer Association Consortium (BCAC) and employed two independent studies for replication. Overall, 3,610 single nucleotide polymorphisms (SNPs) in 133 genes were genotyped as part of the Collaborative Oncological Gene-environment Study, in which phenotype and clinical data were collected and harmonized. Multivariable Cox proportional hazards regression was used to assess genetic associations with overall survival (OS) and breast cancer-specific survival (BCSS). Heterogeneity according to chemotherapy or ER status was evaluated with the log-likelihood ratio test. Results: Three independent SNPs in TGFBR2 and IL12B were associated with OS (P […] C) (per allele hazard ratio (HR) 1.54 (95% confidence interval (CI) 1.22 to 1.95), P = 3.08 × 10^-4) was not found in ER-negative patients without chemotherapy or ER-positive patients with chemotherapy (P for interaction […] A) with poorer OS (HR 1.50 (95% CI 1.21 to 1.86), P = 1.81 × 10^-4), and rs2853694 (A > C) with improved OS (HR 0.73 (95% CI 0.61 to 0.87), P = 3.67 × 10^-4). Similar associations were observed with BCSS. Association with TGFBR2 rs1367610, but not with IL12B variants, was replicated using BCAC Asian samples and the independent Prospective Study of Outcomes in Sporadic versus Hereditary Breast Cancer Study, yielding a combined HR of 1.57 (95% CI 1.28 to 1.94; P = 2.05 × 10^-5) without study heterogeneity. Conclusions: TGFBR2 variants may have prognostic and predictive value in ER-negative breast cancer patients treated with adjuvant chemotherapy. Our findings provide further insights into the development of immunotherapeutic targets for ER-negative breast cancer.
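
    The reported hazard ratios, confidence intervals and P values can be cross-checked with a short calculation. It assumes (as is standard for Cox models, though not stated in the abstract) Wald-type 95% intervals that are symmetric on the log scale: the standard error of log HR is (ln upper - ln lower) / (2 × 1.96), z = ln HR / SE, and the two-sided P value follows from the standard normal distribution. The function name is our own.

```python
import math
from typing import Tuple


def wald_from_ci(hr: float, lower: float, upper: float,
                 z_crit: float = 1.96) -> Tuple[float, float, float]:
    # Recover the standard error of log(HR) from a 95% CI symmetric on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided P value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


for label, hr, lo, hi in [("rs1367610, per allele", 1.54, 1.22, 1.95),
                          ("combined replication", 1.57, 1.28, 1.94)]:
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, P = {p:.2e}")

# Under per-allele (log-additive) coding, two copies of the allele imply HR squared.
print(f"two copies vs none: HR = {1.54 ** 2:.2f}")
```

    The recovered P values (roughly 3.1 × 10^-4 and 2.1 × 10^-5) land close to the reported 3.08 × 10^-4 and 2.05 × 10^-5; small differences are expected because the published intervals are rounded to two decimals.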