Enabling Multi-level Trust in Privacy Preserving Data Mining
Privacy Preserving Data Mining (PPDM) addresses the problem of developing
accurate models about aggregated data without access to precise information in
individual data records. A widely studied perturbation-based PPDM
approach introduces random perturbation to individual values to preserve
privacy before data is published. Previous solutions using this approach are
limited by their tacit assumption of single-level trust in data miners.
In this work, we relax this assumption and expand the scope of
perturbation-based PPDM to Multi-Level Trust (MLT-PPDM). In our setting, the
more trusted a data miner is, the less perturbed the copy of the data it can
access. Under this setting, a malicious data miner may have access to
differently perturbed copies of the same data through various means, and may
combine these diverse copies to jointly infer additional information about the
original data that the data owner does not intend to release. Preventing such
diversity attacks is the key challenge of providing MLT-PPDM services.
We address this challenge by properly correlating perturbation across copies at
different trust levels. We prove that our solution is robust against diversity
attacks with respect to our privacy goal. That is, for data miners who have
access to an arbitrary collection of the perturbed copies, our solution prevents
them from jointly reconstructing the original data more accurately than the
best effort using any individual copy in the collection. Our solution allows a
data owner to generate perturbed copies of its data for arbitrary trust levels
on-demand. This feature offers data owners maximum flexibility.
Comment: 20 pages, 5 figures. Accepted for publication in IEEE Transactions on
Knowledge and Data Engineering.
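To make the diversity-attack idea concrete, the following is a minimal sketch and not the paper's construction. It assumes additive zero-mean Gaussian noise and compares two hypothetical ways of producing copies for two trust levels: fresh independent noise per copy, and nested noise in which the more-perturbed copy is the less-perturbed copy plus extra noise. All function names and noise levels are illustrative assumptions.

```python
import random


def independent_copies(x, sigmas, rng):
    """One copy per trust level, each with fresh, independent Gaussian noise."""
    return [[v + rng.gauss(0.0, s) for v in x] for s in sigmas]


def nested_copies(x, sigmas, rng):
    """Correlated copies: each copy adds extra noise to the previous one.

    `sigmas` must be ascending; copy i ends up with total noise
    variance sigmas[i] ** 2.
    """
    copies, prev, prev_var = [], list(x), 0.0
    for s in sigmas:
        extra_sd = (s * s - prev_var) ** 0.5
        prev = [v + rng.gauss(0.0, extra_sd) for v in prev]
        prev_var = s * s
        copies.append(prev)
    return copies


def inverse_variance_average(copies, sigmas):
    """Attacker's combination: the best linear weights if noises were independent."""
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    size = len(copies[0])
    return [sum(w * c[k] for w, c in zip(weights, copies)) / total
            for k in range(size)]


def mse(x, estimate):
    """Mean squared reconstruction error against the original data."""
    return sum((a - b) ** 2 for a, b in zip(x, estimate)) / len(x)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.uniform(0.0, 100.0) for _ in range(20000)]
    sigmas = [1.0, 2.0]  # assumed noise levels: higher trust, lower trust
    for name, make in (("independent", independent_copies),
                       ("nested", nested_copies)):
        copies = make(data, sigmas, rng)
        combined = inverse_variance_average(copies, sigmas)
        print(name,
              "best single copy:", round(mse(data, copies[0]), 3),
              "combined:", round(mse(data, combined), 3))
```

With these assumed noise levels, the expected error of the combined estimate is 0.8 for independent copies (better than the best single copy, whose error is 1.0) and 1.12 for nested copies. In the nested case the second copy is just the first copy plus independent noise, so under this Gaussian model it carries no extra information about the original data.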
The state of the market and the contrarian strategy: Evidence from China’s stock market
This is the author's accepted manuscript. Copyright © 2012 The Chinese Economic Association.
Using the most comprehensive weekly dataset of ‘A’ shares listed on the Chinese stock market, this paper examines short-term contrarian strategies under different market states from 1995–2010. We find statistically significant profits from contrarian strategies, especially during the period after 2007, when China (along with other countries) experienced an economic downturn following the worldwide financial crisis. Our empirical evidence suggests that: (1) no significant profit is generated from either momentum or contrarian strategies in the intermediate horizon; (2) after microstructure effects are adjusted for, contrarian strategies with holding periods of only four to eight weeks, based on the stocks’ performance over the previous four to eight weeks, generate statistically significant profits of around 0.2% per week; (3) the contrarian strategy following a ‘down’ market generates higher profits than that following an ‘up’ market, suggesting that a contrarian strategy could be used as a shelter when the market is in decline. The profits following a ‘down’ market are robust after risk adjustment.
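The formation and holding periods described above can be sketched as follows. This is a minimal illustration, not the paper's procedure. The function names, the one-week skip between formation and holding (a common way to reduce bid-ask bounce), and the use of the bottom and top 20% of stocks as losers and winners are all assumptions; the abstract does not state how microstructure effects are adjusted for or how portfolios are formed.

```python
def cumulative_return(returns):
    """Compound a list of simple weekly returns into one period return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def contrarian_profit(weekly_returns, t, formation=4, holding=4,
                      skip=1, fraction=0.2):
    """Average weekly profit from buying past losers and shorting past winners.

    weekly_returns: dict mapping a stock name to its list of weekly returns.
    t: index of the first week after the formation window (t >= formation).
    skip: weeks left out between formation and holding (assumed adjustment).
    fraction: share of stocks placed in each of the loser and winner groups.
    """
    past = {s: cumulative_return(r[t - formation:t])
            for s, r in weekly_returns.items()}
    ranked = sorted(past, key=past.get)          # worst performers first
    n = max(1, int(len(ranked) * fraction))
    losers, winners = ranked[:n], ranked[-n:]
    start = t + skip

    def average_holding_return(group):
        held = [cumulative_return(weekly_returns[s][start:start + holding])
                for s in group]
        return sum(held) / len(held)

    spread = average_holding_return(losers) - average_holding_return(winners)
    return spread / holding                      # simple per-week average
```

A positive value means past losers outperformed past winners over the holding window, which is the pattern a contrarian strategy relies on.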
Estimation of Extreme Quantiles for Functions of Dependent Random Variables
We propose a new method for estimating the extreme quantiles for a function
of several dependent random variables. In contrast to the conventional approach
based on extreme value theory, we do not impose the condition that the tail of
the underlying distribution admits an approximate parametric form, and,
furthermore, our estimation makes use of the full observed data. The proposed
method is semiparametric, as no parametric forms are assumed for any of the
marginal distributions, but we select appropriate bivariate copulas to model
the joint dependence structure, taking advantage of recent developments in
constructing large-dimensional vine copulas. Consequently, a sample quantile
obtained from a large bootstrap sample drawn from the fitted joint distribution
is taken as the estimator for the extreme quantile. This estimator is proved to
be consistent. The reliable and robust performance of the proposed method is
further illustrated by simulation.
Comment: 18 pages, 2 figures.
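The overall recipe (nonparametric marginals, a fitted copula, then a sample quantile from a large simulated sample) can be sketched for two variables. Note the substitution: this sketch uses a single bivariate Gaussian copula in place of the vine copulas selected in the paper, purely to stay short and dependency-free. The function names, the normal-score correlation estimate, and the default bootstrap size are assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_scores(values):
    """Map each observation to the normal quantile of its rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def score_correlation(a, b):
    """Gaussian-copula correlation estimated from normal scores."""
    za, zb = normal_scores(a), normal_scores(b)
    num = sum(p * q for p, q in zip(za, zb))
    den = (sum(p * p for p in za) * sum(q * q for q in zb)) ** 0.5
    return num / den


def empirical_inverse(sorted_values, u):
    """Empirical quantile function of a marginal, evaluated at u in [0, 1]."""
    idx = min(len(sorted_values) - 1, int(u * len(sorted_values)))
    return sorted_values[idx]


def extreme_quantile(x1, x2, g, level, n_boot=20000, seed=0):
    """Estimate the `level` quantile of g(X1, X2) from a simulated sample."""
    rng = random.Random(seed)
    rho = score_correlation(x1, x2)
    s1, s2 = sorted(x1), sorted(x2)
    resid_sd = max(0.0, 1.0 - rho * rho) ** 0.5
    draws = []
    for _ in range(n_boot):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + resid_sd * rng.gauss(0.0, 1.0)
        v1 = empirical_inverse(s1, STD_NORMAL.cdf(z1))
        v2 = empirical_inverse(s2, STD_NORMAL.cdf(z2))
        draws.append(g(v1, v2))
    draws.sort()
    return draws[min(n_boot - 1, int(level * n_boot))]
```

For example, `extreme_quantile(x1, x2, lambda a, b: a + b, 0.999)` would estimate the 99.9% quantile of the sum. Because the marginals here are purely empirical, each simulated component stays within its observed range; only the combinations are new.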
Nonlinear Regression Estimation Using Subset-Based Kernel Principal Components
We study the estimation of conditional mean regression functions through the so-called subset-based kernel principal component analysis (KPCA). Instead of using one global kernel feature space, we project a target function into different localized kernel feature spaces at different parts of the sample space. Each localized kernel feature space reflects the relationship on a subset between the response and covariates more parsimoniously. When the observations are collected from a strictly stationary and weakly dependent process, the orthonormal eigenfunctions which span the kernel feature space are consistently estimated by implementing an eigenanalysis on the subset-based kernel Gram matrix, and the estimated eigenfunctions are then used to construct the estimator of the mean regression function. Under some regularity conditions, the developed estimator is shown to be uniformly consistent over the subset with a convergence rate faster than those of some well-known nonparametric estimation methods. In addition, we discuss some generalizations of the KPCA approach, and consider using the same subset-based KPCA approach to estimate the conditional distribution function. Numerical studies, including three simulated examples and two real data sets, illustrate the reliable performance of the proposed method. In particular, the improvement over the global KPCA method is evident.
Using Persistent Homology Topological Features to Characterize Medical Images: Case Studies on Lung and Brain Cancers
Tumor shape is a key factor that affects tumor growth and metastasis. This
paper proposes a topological feature computed by persistent homology to
characterize tumor progression from digital pathology and radiology images and
examines its effect on the time-to-event data. The proposed topological
features are invariant to scale-preserving transformations and can summarize
various tumor shape patterns. The topological features are represented in
functional space and used as functional predictors in a functional Cox
proportional hazards model. The proposed model enables interpretable inference
about the association between topological shape features and survival risks.
Two case studies are conducted using 143 consecutive lung cancer patients and
77 consecutive brain tumor patients. The results of both studies show that the
topological features predict survival prognosis after adjusting for clinical
variables, and the
predicted high-risk groups have significantly (at the level of 0.01) worse
survival outcomes than the low-risk groups. Also, the topological shape
features found to be positively associated with survival hazards are irregular
and heterogeneous shape patterns, which are known to be related to tumor
progression.
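As a simplified illustration of what a persistent homology feature records, the sketch below computes 0-dimensional sublevel-set persistence for a one-dimensional sequence of values. Each local minimum starts a connected component, and when two components meet, the one born later dies. The paper works with pathology and radiology images, so this one-dimensional setting, the function name, and the union-find implementation are assumptions chosen for brevity.

```python
def sublevel_persistence_0d(values):
    """Return sorted (birth, death) pairs for connected components of a 1D signal."""
    n = len(values)
    parent = [None] * n          # None means the index has not been added yet
    birth = {}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sweep the threshold upward, adding indices from lowest to highest value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this value.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    # The component born at the global minimum never dies.
    pairs.append((min(values), float("inf")))
    return sorted(pairs)


print(sublevel_persistence_0d([0, 2, 1, 3, 0.5]))
# prints [(0, inf), (0.5, 3), (1, 2)]
```

Long-lived pairs such as (0.5, 3) correspond to prominent features of the signal, while short-lived pairs such as (1, 2) correspond to smaller fluctuations.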
SGNet: Folding Symmetrical Protein Complex with Deep Learning
Deep learning has made significant progress in protein structure prediction,
advancing the development of computational biology. However, despite the high
accuracy achieved in predicting single-chain structures, a significant number
of large homo-oligomeric assemblies exhibit internal symmetry, posing a major
challenge in structure determination. The performance of existing deep
learning methods is limited because a symmetrical protein assembly usually has
a long sequence, making structural computation infeasible. In addition,
multiple identical subunits in a symmetrical protein complex cause
supervision ambiguity in label assignment, requiring consistent structure
modeling during training. To tackle these problems, we propose a protein
folding framework called SGNet to model protein-protein interactions in
symmetrical assemblies. SGNet conducts feature extraction on a single subunit
and generates the whole assembly using our proposed symmetry module, which
largely mitigates computational problems caused by sequence length. Because
the design models symmetry consistently, we can model all global
symmetry types in quaternary protein structure prediction. Extensive
experimental results on a benchmark of symmetrical protein complexes further
demonstrate the effectiveness of our method.
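The geometric idea of generating a whole assembly from one subunit can be sketched for the simplest case, cyclic symmetry about a fixed axis. This is only an illustration under stated assumptions: SGNet's symmetry module is a learned component whose details are not given in the abstract, and the function names, the choice of the z-axis, and the restriction to cyclic symmetry are hypothetical.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def build_cyclic_assembly(subunit, n_copies):
    """Copy one subunit n_copies times around the z-axis (cyclic symmetry).

    subunit: list of (x, y, z) coordinates for a single chain.
    Returns a list of n_copies coordinate lists, the first being the
    unrotated subunit.
    """
    step = 2.0 * math.pi / n_copies
    return [[rotate_z(p, step * k) for p in subunit] for k in range(n_copies)]
```

Only the single subunit needs to be predicted in this setup; the remaining chains follow from the symmetry operation, which is why the cost does not grow with the full assembly's sequence length.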