36 research outputs found
Near-optimal Linear Decision Trees for k-SUM and Related Problems
We construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant
k
, we construct linear decision trees that solve the
k
-SUM problem on
n
elements using
O
(
n
log
2
n
) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two
k
-subsets; when viewed as linear queries, comparison queries are 2
k
-sparse and have only { −1,0,1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms.
Our constructions are based on the notion of “inference dimension,” recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension
Do Prices Coordinate Markets?
Walrasian equilibrium prices can be said to coordinate markets: They support
a welfare optimal allocation in which each buyer is buying bundle of goods that
is individually most preferred. However, this clean story has two caveats.
First, the prices alone are not sufficient to coordinate the market, and buyers
may need to select among their most preferred bundles in a coordinated way to
find a feasible allocation. Second, we don't in practice expect to encounter
exact equilibrium prices tailored to the market, but instead only approximate
prices, somehow encoding "distributional" information about the market. How
well do prices work to coordinate markets when tie-breaking is not coordinated,
and they encode only distributional information?
We answer this question. First, we provide a genericity condition such that
for buyers with Matroid Based Valuations, overdemand with respect to
equilibrium prices is at most 1, independent of the supply of goods, even when
tie-breaking is done in an uncoordinated fashion. Second, we provide
learning-theoretic results that show that such prices are robust to changing
the buyers in the market, so long as all buyers are sampled from the same
(unknown) distribution
Identifying Human Kinase-Specific Protein Phosphorylation Sites by Integrating Heterogeneous Information from Various Sources
Phosphorylation is an important type of protein post-translational modification. Identification of possible phosphorylation sites of a protein is important for understanding its functions. Unbiased screening for phosphorylation sites by in vitro or in vivo experiments is time consuming and expensive; in silico prediction can provide functional candidates and help narrow down the experimental efforts. Most of the existing prediction algorithms take only the polypeptide sequence around the phosphorylation sites into consideration. However, protein phosphorylation is a very complex biological process in vivo. The polypeptide sequences around the potential sites are not sufficient to determine the phosphorylation status of those residues. In the current work, we integrated various data sources such as protein functional domains, protein subcellular location and protein-protein interactions, along with the polypeptide sequences to predict protein phosphorylation sites. The heterogeneous information significantly boosted the prediction accuracy for some kinase families. To demonstrate potential application of our method, we scanned a set of human proteins and predicted putative phosphorylation sites for Cyclin-dependent kinases, Casein kinase 2, Glycogen synthase kinase 3, Mitogen-activated protein kinases, protein kinase A, and protein kinase C families (avaiable at http://cmbi.bjmu.edu.cn/huphospho). The predicted phosphorylation sites can serve as candidates for further experimental validation. Our strategy may also be applicable for the in silico identification of other post-translational modification substrates