
    A Partially Linear Framework for Massive Heterogeneous Data

    We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring the heterogeneity of each sub-population. In particular, we propose an aggregation-type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and the asymptotic distribution it would have if there were no heterogeneity. This oracular result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed, and is shown to possess the asymptotic distribution it would have if the commonality information were available. We also test the heterogeneity among a large number of sub-populations. All of the above results require regularizing each sub-estimation as though it had the entire sample size. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data. A technical by-product of this paper is statistical inference for general kernel ridge regression. Thorough numerical results are also provided to back up our theory. Comment: 40 pages main text + 40 pages supplement, to appear in Annals of Statistics
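The aggregation and plug-in steps described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the model form `y = x @ beta_j + f(z) + noise` (with `f` as the commonality and `beta_j` as the heterogeneity of sub-population `j`), the Gaussian kernel and its bandwidth, the synthetic data, and the placeholder penalty `lam = 0.4 / N`. The only point taken from the abstract is that the penalty is set from the full sample size `N` rather than the sub-sample size `n`.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def fit_sub_population(x, z, y, lam):
    """Partially linear kernel ridge fit on one sub-population.

    Minimises (1/n)||y - x @ beta - K @ alpha||^2 + lam * alpha' K alpha.
    """
    n = y.size
    K = gaussian_kernel(z, z)
    A = np.linalg.inv(K + n * lam * np.eye(n))
    M = n * lam * A  # equals I - K (K + n*lam*I)^{-1}, the residual operator
    beta = np.linalg.solve(x.T @ M @ x, x.T @ M @ y)  # profiled linear part
    alpha = A @ (y - x @ beta)                        # kernel coefficients
    return beta, alpha

# Synthetic data: s sub-populations sharing f0 but with different beta_j.
s, n = 5, 80
N = s * n
f0 = lambda z: np.sin(2 * np.pi * z)
beta_true = np.array([[1.0 + j, -1.0] for j in range(s)])

data = []
for j in range(s):
    x = rng.normal(size=(n, 2))
    z = rng.uniform(size=n)
    y = x @ beta_true[j] + f0(z) + 0.1 * rng.normal(size=n)
    data.append((x, z, y))

# Penalty chosen from the total sample size N, not n (placeholder constant).
lam = 0.4 / N
fits = [fit_sub_population(x, z, y, lam) for x, z, y in data]

def f_bar(z_new):
    """Aggregation step: average the s estimates of the common function."""
    preds = [gaussian_kernel(z_new, z_j) @ alpha_j
             for (_, z_j, _), (_, alpha_j) in zip(data, fits)]
    return np.mean(preds, axis=0)

# Plug-in step: re-estimate each beta_j by least squares after removing f_bar.
beta_plug_in = np.array([
    np.linalg.lstsq(x, y - f_bar(z), rcond=None)[0] for x, z, y in data
])

grid = np.linspace(0.0, 1.0, 101)
f_bar_grid = f_bar(grid)
print("max |beta error|:", np.abs(beta_plug_in - beta_true).max())
print("mean |f error|  :", np.abs(f_bar_grid - f0(grid)).mean())
```

Averaging the per-sub-population function estimates is the simplest form of aggregation; the closed-form `beta` comes from profiling out the kernel coefficients, which is a standard identity for this penalised least-squares objective.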

    Self-desiccation and self-desiccation shrinkage of silica fume-cement pastes

    Self-desiccation is a common phenomenon in high-performance cementitious materials, which are characterized by a low water/binder (w/b) ratio and high mineral admixture content. As a consequence, a large self-desiccation shrinkage, a key factor influencing the cracking behavior of concrete, develops rapidly in the cement matrix due to the decrease in internal relative humidity (RH) and the capillary pressure induced by self-desiccation. The objective of this study is to evaluate self-desiccation and self-desiccation shrinkage in silica fume (SF) blended cement pastes with a low w/b ratio of 0.25. The self-desiccation process was followed by measuring the internal RH of the sealed cement pastes with a conventional hygrometer. The shrinkage of the sealed cement pastes was measured by the corrugated tube method, which permits measurements to start at early age. Experimental results revealed that SF blending leads to a higher internal RH, indicating a slower self-desiccation process, compared with pure cement paste. Consequently, less self-desiccation shrinkage was observed in SF blended cement pastes than in pure cement paste.

    The correlation between the mutation of protein kinase genes and the clinical characteristics of breast cancer progression

    It is accepted that breast cancer (BC) is a heterogeneous disease. In order to investigate BC as a group of disease sub-types, the varying clinical characteristics of BC patients must be considered. In this project a series of clinical, pathological, genetic and genomic data, retrieved from multiple data repositories, will be reviewed for selection in a large-scale meta-analysis and then categorised into 5 sub-groups (Luminal A, Luminal B, Basal, HER2 and Normal). The meta-analysis is primarily designed to ascertain whether a correlation exists between the mutation of protein kinase (PK) genes and BC progression. As PK genes play important roles in regulating most cellular processes (e.g. cell proliferation, differentiation and apoptosis), it is no surprise that deregulated PK activity is a frequent cause of disease, and that PK genes are often oncogenes. The meta-analysis objectives are two-fold: 1. To conduct an integrative meta-analysis of the differential gene expression of the PK gene family between clinical categories of BC progression (low vs high proliferation; luminal vs basal tissue; and grade 1 vs grade 3 tumours). Results from the meta-analysis will generate a ranked list of PK gene expression profiles observed in BC progression. 2. Through the use of bioinformatics tools and sequence analysis interfaces, the ranked PK list will be used to direct investigations into the correlations between: codon usage bias; aberrant epigenetic factors; somatic mutations; and observed structural/functional changes of deregulated PK genes in different BC progression categories. To address these objectives a series of in silico bioinformatics experiments has been designed. A software program (MYGEO) has been written specifically to: download multiple datasets; calculate p-values between BC progression groups; find Q-values to control the false discovery rate over multiple dataset comparisons; perform permutation testing on the ranked PK gene list; and provide 2D/3D sequence analysis functions for the analysis of structure/function relationships in significantly differentiated PK genes in BC progression. This project will benefit our understanding of the complex system of BC biology by identifying significantly deregulated PK genes in BC progression. The results will identify BC biomarkers and structural/functional locations within PK genes not yet elucidated, thus providing new directions for the development of PK inhibitors and improving the effectiveness of current BC treatment strategies.
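The statistical steps named in this abstract (a p-value between two progression groups, permutation testing, and false discovery rate control across many comparisons) can be sketched as follows. This is not MYGEO's code, which the abstract does not show: the function names are hypothetical, the expression values are synthetic, and the Benjamini-Hochberg adjustment is swapped in as one common way to obtain FDR-adjusted values; the abstract does not say which Q-value method is used.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        hits += diff >= observed
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Synthetic expression values for two hypothetical genes in two groups,
# e.g. grade 1 vs grade 3 tumours; only the first gene is shifted.
rng = np.random.default_rng(1)
grade1 = rng.normal(loc=0.0, scale=1.0, size=(2, 10))
grade3 = rng.normal(loc=[[3.0], [0.0]], scale=1.0, size=(2, 10))

p_values = [permutation_p_value(grade1[g], grade3[g]) for g in range(2)]
q_values = benjamini_hochberg(p_values)
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_values, q_values)
```

Genes would then be ranked by adjusted value, so that a fixed cutoff bounds the expected proportion of false discoveries among the genes called significant.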