    Enabling Multi-level Trust in Privacy Preserving Data Mining

    Privacy Preserving Data Mining (PPDM) addresses the problem of developing accurate models about aggregated data without access to precise information in individual data records. A widely studied perturbation-based PPDM approach introduces random perturbation to individual values to preserve privacy before data is published. Previous solutions following this approach are limited by their tacit assumption of single-level trust in data miners. In this work, we relax this assumption and expand the scope of perturbation-based PPDM to Multi-Level Trust (MLT-PPDM). In our setting, the more trusted a data miner is, the less perturbed the copy of the data it can access. Under this setting, a malicious data miner may obtain differently perturbed copies of the same data through various means and combine these diverse copies to jointly infer additional information about the original data that the data owner does not intend to release. Preventing such diversity attacks is the key challenge in providing MLT-PPDM services. We address this challenge by properly correlating the perturbation across copies at different trust levels. We prove that our solution is robust against diversity attacks with respect to our privacy goal. That is, for data miners who have access to an arbitrary collection of the perturbed copies, our solution prevents them from jointly reconstructing the original data more accurately than the best effort using any individual copy in the collection. Our solution allows a data owner to generate perturbed copies of its data for arbitrary trust levels on demand, which offers data owners maximum flexibility. Comment: 20 pages, 5 figures. Accepted for publication in IEEE Transactions on Knowledge and Data Engineering
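    The diversity attack and the idea of correlating perturbation across copies can be illustrated with a small sketch. It assumes additive zero-mean Gaussian noise and builds each more-perturbed copy by adding fresh independent noise on top of the previous copy's noise; this sequential construction and the function names (`perturb_sequential`, `perturb_independent`, `mse_of_average`) are illustrative assumptions, not the paper's exact algorithm. With independent noise, averaging two copies halves the error; with correlated noise, it does not help.

```python
import random


def perturb_sequential(data, variances, seed=0):
    """Correlated copies: noise at level k = noise at level k-1 + fresh noise.

    `variances` gives the total noise variance per trust level and must be
    non-decreasing (most trusted level first). Illustrative assumption only.
    """
    rng = random.Random(seed)
    noise = [0.0] * len(data)
    prev = 0.0
    copies = []
    for v in variances:
        if v < prev:
            raise ValueError("variances must be non-decreasing")
        step = (v - prev) ** 0.5  # std of the fresh independent increment
        noise = [z + rng.gauss(0.0, step) for z in noise]
        copies.append([x + z for x, z in zip(data, noise)])
        prev = v
    return copies


def perturb_independent(data, variances, seed=0):
    """Naive baseline: every copy receives its own independent noise."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, v ** 0.5) for x in data] for v in variances]


def mse_of_average(data, copies):
    """Mean squared error of the element-wise average of several copies."""
    n = len(copies)
    avg = [sum(vals) / n for vals in zip(*copies)]
    return sum((a - x) ** 2 for a, x in zip(avg, data)) / len(data)


if __name__ == "__main__":
    data = [0.0] * 20000
    levels = [1.0, 1.0]  # two copies released at the same trust level
    print(mse_of_average(data, perturb_independent(data, levels)))  # roughly 0.5
    print(mse_of_average(data, perturb_sequential(data, levels)))   # roughly 1.0
```

    Under this construction, the noise covariance between levels i and j is the smaller of the two variances, so a copy at a higher noise level adds nothing that the less perturbed copy does not already reveal.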

    The state of the market and the contrarian strategy: Evidence from China’s stock market

    This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2012 The Chinese Economic Association. Using the most comprehensive weekly dataset of ‘A’ shares listed on the Chinese stock market, this paper examines short-term contrarian strategies under different market states from 1995 to 2010. We find statistically significant profits from contrarian strategies, especially during the period after 2007, when China (along with other countries) experienced an economic downturn following the worldwide financial crisis. Our empirical evidence suggests that: (1) no significant profit is generated from either momentum or contrarian strategies in the intermediate horizon; (2) after adjusting for microstructure effects, contrarian strategies with holding periods of only four to eight weeks, based on the stocks' performance over the previous four to eight weeks, generate statistically significant profits of around 0.2% per week; (3) the contrarian strategy following a ‘down’ market generates higher profits than that following an ‘up’ market, suggesting that a contrarian strategy could serve as a shelter when the market is in decline. The profits following a ‘down’ market are robust after risk adjustment.
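    A generic loser-minus-winner portfolio conditioned on the market state can be sketched as follows. This is a minimal illustration assuming equal weights, a fixed fraction of stocks in each of the loser and winner groups, and a market state defined by the sign of the market return over the formation period; the function names are hypothetical, and the paper's microstructure and risk adjustments are omitted.

```python
def contrarian_profit(past, future, fraction=0.1):
    """Return of an equal-weighted loser-minus-winner portfolio.

    past:     stock -> cumulative return over the formation period
    future:   stock -> cumulative return over the holding period
    fraction: share of stocks placed in each of the loser and winner groups
    """
    ranked = sorted(past, key=past.get)  # worst past performers first
    k = max(1, int(len(ranked) * fraction))
    losers, winners = ranked[:k], ranked[-k:]

    def mean_future(names):
        return sum(future[s] for s in names) / len(names)

    # Buy past losers and sell past winners.
    return mean_future(losers) - mean_future(winners)


def profits_by_state(periods, fraction=0.1):
    """Average contrarian profit following 'up' versus 'down' markets.

    periods: list of (market_return_in_formation, past, future) tuples.
    A negative market return over the formation period counts as 'down'
    (an illustrative assumption).
    """
    buckets = {"up": [], "down": []}
    for market_ret, past, future in periods:
        state = "down" if market_ret < 0 else "up"
        buckets[state].append(contrarian_profit(past, future, fraction))
    return {s: (sum(v) / len(v) if v else None) for s, v in buckets.items()}
```

    With four- to eight-week formation and holding periods, `past` and `future` would hold cumulative weekly returns over those windows.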

    Estimation of Extreme Quantiles for Functions of Dependent Random Variables

    We propose a new method for estimating extreme quantiles of a function of several dependent random variables. In contrast to the conventional approach based on extreme value theory, we do not require the tail of the underlying distribution to admit an approximate parametric form; furthermore, our estimation makes use of the full observed data. The proposed method is semiparametric: no parametric form is assumed for any of the marginal distributions, but appropriate bivariate copulas are selected to model the joint dependence structure, taking advantage of recent developments in constructing large-dimensional vine copulas. The sample quantile of a large bootstrap sample drawn from the fitted joint distribution is then taken as the estimator of the extreme quantile. This estimator is proved to be consistent. Simulations further illustrate the reliable and robust performance of the proposed method. Comment: 18 pages, 2 figures
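    The pipeline (nonparametric marginals, a copula for dependence, then a sample quantile from a large draw of the fitted joint distribution) can be sketched for two variables. For brevity this sketch swaps the vine copula for a single bivariate Gaussian copula and uses plain empirical marginals, which cannot extrapolate beyond the observed range; both choices, and all function names, are assumptions rather than the paper's construction.

```python
import random
from statistics import NormalDist


def pseudo_obs(values):
    """Rank-based pseudo-observations rank / (n + 1), strictly inside (0, 1)."""
    n = len(values)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda j: values[j]), start=1):
        u[i] = rank / (n + 1)
    return u


def empirical_quantile(sorted_vals, p):
    """Inverse empirical CDF (no interpolation, no extrapolation past the data)."""
    idx = min(len(sorted_vals) - 1, int(p * len(sorted_vals)))
    return sorted_vals[idx]


def gaussian_copula_rho(x, y):
    """Correlation of the normal scores, used as the Gaussian-copula parameter."""
    nd = NormalDist()
    zx = [nd.inv_cdf(u) for u in pseudo_obs(x)]
    zy = [nd.inv_cdf(u) for u in pseudo_obs(y)]
    mx, my = sum(zx) / len(zx), sum(zy) / len(zy)
    sxy = sum((a - mx) * (b - my) for a, b in zip(zx, zy))
    sxx = sum((a - mx) ** 2 for a in zx)
    syy = sum((b - my) ** 2 for b in zy)
    return sxy / (sxx * syy) ** 0.5


def extreme_quantile(x, y, g, level, n_boot=100_000, seed=0):
    """Quantile of g(X, Y) from a large sample of the fitted joint distribution."""
    nd = NormalDist()
    rho = gaussian_copula_rho(x, y)
    resid_sd = max(0.0, 1.0 - rho ** 2) ** 0.5  # guard against rounding past 1
    xs, ys = sorted(x), sorted(y)
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + resid_sd * rng.gauss(0.0, 1.0)
        # Map the copula draw back through the empirical marginals.
        draws.append(g(empirical_quantile(xs, nd.cdf(z1)),
                       empirical_quantile(ys, nd.cdf(z2))))
    draws.sort()
    return empirical_quantile(draws, level)
```

    For example, `extreme_quantile(x, y, lambda a, b: a + b, 0.99)` estimates the 99% quantile of X + Y from paired observations `x` and `y`.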

    Nonlinear Regression Estimation Using Subset-Based Kernel Principal Components

    We study the estimation of conditional mean regression functions through the so-called subset-based kernel principal component analysis (KPCA). Instead of using one global kernel feature space, we project a target function onto different localized kernel feature spaces in different parts of the sample space. Each localized kernel feature space more parsimoniously reflects the relationship between the response and the covariates on its subset. When the observations are collected from a strictly stationary and weakly dependent process, the orthonormal eigenfunctions that span the kernel feature space are consistently estimated by an eigenanalysis of the subset-based kernel Gram matrix, and the estimated eigenfunctions are then used to construct the estimator of the mean regression function. Under some regularity conditions, the developed estimator is shown to be uniformly consistent over the subset, with a convergence rate faster than those of some well-known nonparametric estimation methods. We also discuss some generalizations of the KPCA approach and consider using the same subset-based KPCA approach to estimate the conditional distribution function. Numerical studies, including three simulated examples and two real data sets, illustrate the reliable performance of the proposed method. In particular, the improvement over the global KPCA method is evident.
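    The core steps (eigenanalysis of a subset Gram matrix, projection of the response onto the leading estimated eigenfunctions, and Nyström extension to new points) can be sketched for a scalar covariate. This is a minimal sketch assuming a Gaussian kernel, a single interval as the subset, and fixed bandwidth and number of components; these choices and the function names are illustrative, not the paper's tuning.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gram matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 h^2)) for 1-D inputs."""
    d = np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]
    return np.exp(-d ** 2 / (2.0 * bandwidth ** 2))


def subset_kpca_regression(x, y, lower, upper, n_components=10, bandwidth=0.1):
    """Estimate m(x) = E[Y | X = x] on the subset lower <= x < upper."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    mask = (x >= lower) & (x < upper)
    xs, ys = x[mask], y[mask]
    n = xs.size
    # Eigenanalysis of the subset Gram matrix scaled by 1/n.
    eigvals, eigvecs = np.linalg.eigh(gaussian_kernel(xs, xs, bandwidth) / n)
    top = np.argsort(eigvals)[::-1][:n_components]
    lam, vec = eigvals[top], eigvecs[:, top]
    # Eigenfunctions at the sample points, normalized so (1/n) sum phi^2 = 1.
    phi_train = np.sqrt(n) * vec
    # Projection coefficients beta_k = (1/n) sum_i y_i phi_k(x_i).
    beta = phi_train.T @ ys / n

    def predict(x_new):
        k_new = gaussian_kernel(np.atleast_1d(x_new), xs, bandwidth)
        # Nystrom extension: phi_k(x) = (1/(n lam_k)) sum_i K(x, x_i) phi_k(x_i).
        phi_new = k_new @ phi_train / (n * lam)
        return phi_new @ beta

    return predict
```

    Fitting one such estimator per subset and using whichever subset contains the query point gives the localized, piecewise version; a single global fit corresponds to taking the whole sample space as the subset.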

    Using Persistent Homology Topological Features to Characterize Medical Images: Case Studies on Lung and Brain Cancers

    Tumor shape is a key factor that affects tumor growth and metastasis. This paper proposes a topological feature computed by persistent homology to characterize tumor progression from digital pathology and radiology images, and examines its effect on time-to-event data. The proposed topological features are invariant to scale-preserving transformations and can summarize various tumor shape patterns. The topological features are represented in functional space and used as functional predictors in a functional Cox proportional hazards model. The proposed model enables interpretable inference about the association between topological shape features and survival risks. Two case studies are conducted using 143 and 77 consecutive lung cancer and brain tumor patients, respectively. The results of both studies show that the topological features predict survival prognosis after adjusting for clinical variables, and the predicted high-risk groups have significantly (at the 0.01 level) worse survival outcomes than the low-risk groups. Moreover, the topological shape features found to be positively associated with survival hazards correspond to irregular and heterogeneous shape patterns, which are known to be related to tumor progression.
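    One simple functional summary derived from persistent homology is the Betti-0 curve of a sublevel-set filtration: for each threshold t, it counts the connected components of the pixels whose intensity is at most t (equivalently, the number of 0-dimensional persistence bars alive at t). The sketch below computes it with a breadth-first search under 4-connectivity. The abstract does not specify the homology dimension, filtration, or preprocessing used, so this choice and the function names are assumptions, and the functional Cox model is not shown.

```python
from collections import deque


def count_components(image, threshold):
    """Number of 4-connected components of {pixels with value <= threshold}."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold or seen[r][c]:
                continue
            components += 1  # found an unvisited pixel: start a new component
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if (0 <= a < rows and 0 <= b < cols and not seen[a][b]
                            and image[a][b] <= threshold):
                        seen[a][b] = True
                        queue.append((a, b))
    return components


def betti0_curve(image, thresholds):
    """Betti-0 curve of the sublevel-set filtration, one count per threshold."""
    return [count_components(image, t) for t in thresholds]
```

    Evaluated on a fine threshold grid, the resulting vector is a discretized function of t, the kind of object that can enter a model as a functional predictor.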

    SGNet: Folding Symmetrical Protein Complex with Deep Learning

    Deep learning has made significant progress in protein structure prediction, advancing the development of computational biology. However, despite the high accuracy achieved in predicting single-chain structures, a significant number of large homo-oligomeric assemblies exhibit internal symmetry, posing a major challenge in structure determination. The performance of existing deep learning methods is limited because symmetrical protein assemblies usually have long sequences, making structure computation infeasible. In addition, the multiple identical subunits in a symmetrical protein complex cause supervision ambiguity in label assignment, which requires consistent structure modeling during training. To tackle these problems, we propose a protein folding framework called SGNet to model protein-protein interactions in symmetrical assemblies. SGNet extracts features from a single subunit and generates the whole assembly using our proposed symmetry module, which largely mitigates the computational problems caused by sequence length. Because symmetry is modeled consistently by design, we can model all global symmetry types in quaternary protein structure prediction. Extensive experimental results on a benchmark of symmetrical protein complexes further demonstrate the effectiveness of our method.
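    The two ideas in this abstract, building a whole assembly from one subunit and removing label ambiguity among identical chains, can be illustrated in the simplest case of cyclic C_n symmetry about a fixed z-axis. This sketch is an assumption for illustration only: SGNet's symmetry module is learned and covers all global symmetry types, and the function names here are hypothetical. The loss takes the minimum RMSD over cyclic relabelings of the target chains, so equivalent labelings are not penalized.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3-D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def assemble_cyclic(subunit, n):
    """Build a C_n assembly: copy k is the subunit rotated by 2*pi*k/n about z."""
    return [[rotate_z(p, 2 * math.pi * k / n) for p in subunit] for k in range(n)]


def rmsd(chains_a, chains_b):
    """Root-mean-square deviation over all atoms, chains matched by position."""
    sq, count = 0.0, 0
    for ca, cb in zip(chains_a, chains_b):
        for p, q in zip(ca, cb):
            sq += sum((u - v) ** 2 for u, v in zip(p, q))
            count += 1
    return math.sqrt(sq / count)


def symmetric_rmsd(pred, target):
    """Minimum RMSD over cyclic relabelings of the target's identical chains."""
    n = len(target)
    return min(rmsd(pred, target[k:] + target[:k]) for k in range(n))
```

    Only the single subunit's coordinates are needed to produce all n chains, which reflects why generating the assembly from one subunit avoids processing the full concatenated sequence.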