Distance functional dependencies in the presence of complex values
Distance functional dependencies (dFDs) have been introduced in the context of the relational data model as a generalisation of error-robust functional dependencies (erFDs). An erFD is a dependency that still holds if errors which cause the violation of an original functional dependency are introduced into a relation. A dFD with a distance d=2e+1 corresponds to an erFD with at most e errors in each tuple. Recently, an axiomatisation of dFDs has been obtained. Database theory, however, no longer deals only with flat relations. Modern data models such as the higher-order Entity-Relationship model (HERM), object-oriented data models (OODM), or the eXtensible Markup Language (XML) provide constructors for complex values such as finite sets, multisets and lists. In this article, dFDs with complex values are investigated. Based on a generalisation of the Hamming distance for tuples to complex values, which exploits a lattice structure on subattributes, the major achievement is a finite axiomatisation of the new class of dependencies.
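As a concrete illustration of the distance notions involved: the Hamming distance between two flat tuples counts the attribute positions on which they differ, and one natural way to extend it to set-valued attributes is the size of the symmetric difference (a hypothetical choice for illustration, not the paper's lattice-based construction on subattributes):

```python
def hamming(t1, t2):
    """Hamming distance between two flat tuples of equal arity:
    the number of attribute positions on which they differ."""
    assert len(t1) == len(t2)
    return sum(a != b for a, b in zip(t1, t2))

def set_distance(s1, s2):
    # One natural distance for finite-set values (illustrative choice):
    # the size of the symmetric difference of the two sets.
    return len(s1 ^ s2)

# Hypothetical tuples with two flat attributes and one set-valued one.
t1 = ("alice", "db", {"sql", "xml"})
t2 = ("alice", "ml", {"sql"})
flat = hamming(t1[:2], t2[:2])             # differ on one position -> 1
complex_part = set_distance(t1[2], t2[2])  # symmetric difference {"xml"} -> 1
print(flat + complex_part)                 # 2
```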
Genome-scale analysis identifies paralog lethality as a vulnerability of chromosome 1p loss in cancer.
Functional redundancy shared by paralog genes may afford protection against genetic perturbations, but it can also result in genetic vulnerabilities due to mutual interdependency [1-5]. Here, we surveyed genome-scale short hairpin RNA and CRISPR screening data on hundreds of cancer cell lines and identified MAGOH and MAGOHB, core members of the splicing-dependent exon junction complex, as top-ranked paralog dependencies [6-8]. MAGOHB is the top gene dependency in cells with hemizygous MAGOH deletion, a pervasive genetic event that frequently occurs due to chromosome 1p loss. Inhibition of MAGOHB in a MAGOH-deleted context compromises viability by globally perturbing alternative splicing and RNA surveillance. Dependency on IPO13, an importin-β receptor that mediates nuclear import of the MAGOH/B-Y14 heterodimer [9], is highly correlated with dependency on both MAGOH and MAGOHB. Both MAGOHB and IPO13 represent dependencies in murine xenografts with hemizygous MAGOH deletion. Our results identify MAGOH and MAGOHB as reciprocal paralog dependencies across cancer types and suggest a rationale for targeting the MAGOHB-IPO13 axis in cancers with chromosome 1p deletion.
The generalized shrinkage estimator for the analysis of functional connectivity of brain signals
We develop a new statistical method for estimating functional connectivity
between neurophysiological signals represented by a multivariate time series.
We use partial coherence as the measure of functional connectivity. Partial
coherence identifies the frequency bands that drive the direct linear
association between any pair of channels. To estimate partial coherence, one
would first need an estimate of the spectral density matrix of the multivariate
time series. Parametric estimators of the spectral density matrix provide good
frequency resolution but could be sensitive when the parametric model is
misspecified. Smoothing-based nonparametric estimators are robust to model
misspecification and are consistent but may have poor frequency resolution. In
this work, we develop the generalized shrinkage estimator, which is a weighted
average of a parametric estimator and a nonparametric estimator. The optimal
weights are frequency-specific and derived under the quadratic risk criterion
so that the estimator, either the parametric estimator or the nonparametric
estimator, that performs better at a particular frequency receives heavier
weight. We validate the proposed estimator in a simulation study and apply it
on electroencephalogram recordings from a visual-motor experiment.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS396 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
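A minimal sketch of the two computational ingredients described above: partial coherence obtained by inverting and normalising the spectral density matrix at a given frequency, and the shrinkage step combining the two estimators with frequency-specific weights (the toy weights below are made up; the paper derives them under the quadratic risk criterion):

```python
import numpy as np

def partial_coherence(S):
    """Partial coherence at one frequency from a spectral density
    matrix S: with G = S^{-1}, gamma_ij^2 = |G_ij|^2 / (G_ii * G_jj)."""
    G = np.linalg.inv(S)
    d = np.real(np.diag(G))
    return np.abs(G) ** 2 / np.outer(d, d)

def shrinkage_combine(S_par, S_np, w):
    """Frequency-wise convex combination of a parametric and a
    nonparametric spectral estimate; w is the per-frequency weight
    placed on the parametric estimator."""
    w = np.clip(w, 0.0, 1.0)
    return w * S_par + (1.0 - w) * S_np

# Bivariate example: for a 2x2 spectral matrix the partial coherence
# between the two channels equals the ordinary coherence.
S = np.array([[2.0, 1.0], [1.0, 2.0]])
print(round(partial_coherence(S)[0, 1], 6))  # 0.25
```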
Evaluation of an automatic f-structure annotation algorithm against the PARC 700 dependency bank
An automatic method for annotating the Penn-II Treebank (Marcus et al., 1994) with high-level Lexical Functional Grammar (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structure representations is described in (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O’Donovan et al., 2004). The annotation algorithm and the automatically-generated f-structures are the basis for the automatic acquisition of wide-coverage and robust probabilistic approximations of LFG grammars (Cahill et al., 2002; Cahill et al., 2004a) and for the induction of LFG semantic forms (O’Donovan et al., 2004). The quality of the annotation algorithm and the f-structures it generates is, therefore, extremely important. To date, annotation quality has been measured in terms of precision and recall against the DCU 105. The annotation algorithm currently achieves an f-score of 96.57% for complete f-structures and 94.3% for preds-only
f-structures. There are a number of problems with evaluating against a gold standard of this size, most
notably that of overfitting. There is a risk of assuming that the gold standard is a complete and balanced
representation of the linguistic phenomena in a language and basing design decisions on this. It is, therefore,
preferable to evaluate against a more extensive, external standard. Although the DCU 105 is publicly available,
a larger well-established external standard can provide a more widely-recognised benchmark against which the quality of the f-structure annotation algorithm can be evaluated. For these reasons, we present an evaluation of the f-structure annotation algorithm of (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; O’Donovan et al., 2004) against the PARC 700 Dependency Bank (King et al., 2003). Evaluation against an external gold standard is a non-trivial task as linguistic analyses may differ systematically between the gold standard and the output to be evaluated as regards feature geometry and nomenclature. We present conversion software to automatically account for many (but not all) of the systematic differences. Currently, we achieve an f-score of 87.31% for the f-structures generated from the original Penn-II trees and
an f-score of 81.79% for f-structures from parse trees produced by Charniak’s (2000) parser in our pipeline
parsing architecture against the PARC 700.
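The triples-based evaluation behind these f-scores can be sketched as follows: both gold and automatically generated f-structures are flattened into (relation, head, dependent) triples and scored with the harmonic mean of precision and recall (a simplified stand-in, not the actual DCU/PARC evaluation software):

```python
def f_score(gold, predicted):
    """Precision/recall/F1 over sets of dependency triples of the form
    (relation, head, dependent). Returns the harmonic-mean f-score."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical triples: the parser gets the tense feature wrong.
gold = {("subj", "see", "john"), ("obj", "see", "mary"), ("tense", "see", "past")}
pred = {("subj", "see", "john"), ("obj", "see", "mary"), ("tense", "see", "pres")}
print(round(f_score(gold, pred), 4))  # 0.6667
```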
Early Stop Criterion from the Bootstrap Ensemble
This paper addresses the problem of generalization error estimation in neural networks. A new early stop criterion based on a Bootstrap estimate of the generalization error is suggested. The estimate does not require the network to be trained to the minimum of the cost function, as required by other methods based on asymptotic theory. Moreover, in contrast to methods based on cross-validation, which require data to be left out for testing and thus bias the estimate, the Bootstrap technique does not have this disadvantage. The potential of the suggested technique is demonstrated on various time-series problems.

1. INTRODUCTION The goal of neural network learning in signal processing is to identify robust functional dependencies between input and output data (for an introduction see e.g., [3]). Such learning usually proceeds from a finite random sample of training data; hence, the functions implemented by neural networks are stochastic, depending on the particular available training set. T..
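The out-of-bootstrap idea can be sketched as follows: resample the training set with replacement, fit on each resample, and score each fitted model on the points the resample left out, averaging over replicates. The helper names `train_fn` and `loss_fn` are hypothetical, and the "model" here is just a sample mean rather than a neural network:

```python
import random

def bootstrap_generalization_error(data, train_fn, loss_fn, n_boot=20, seed=0):
    """Average out-of-bootstrap loss over n_boot resamples: each
    replicate trains on a with-replacement resample of `data` and
    evaluates on the examples not drawn into that resample."""
    rng = random.Random(seed)
    n = len(data)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [data[i] for i in range(n) if i not in in_bag]
        if not oob:  # rare: resample happened to cover every point
            continue
        model = train_fn([data[i] for i in idx])
        errors.append(sum(loss_fn(model, x) for x in oob) / len(oob))
    return sum(errors) / len(errors)

# Toy usage: "training" computes the resample mean; loss is squared error.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
est = bootstrap_generalization_error(
    data,
    train_fn=lambda d: sum(d) / len(d),
    loss_fn=lambda m, x: (x - m) ** 2,
)
print(est > 0)  # a positive squared-error estimate
```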
Quantifying dependencies for sensitivity analysis with multivariate input sample data
We present a novel method for quantifying dependencies in multivariate
datasets, based on estimating the Rényi entropy by minimum spanning trees
(MSTs). The length of the MSTs can be used to order pairs of variables from
strongly to weakly dependent, making it a useful tool for sensitivity analysis
with dependent input variables. It is well-suited for cases where the input
distribution is unknown and only a sample of the inputs is available. We
introduce an estimator to quantify dependency based on the MST length, and
investigate its properties with several numerical examples. To reduce the
computational cost of constructing the exact MST for large datasets, we explore
methods to compute approximations to the exact MST, and find the multilevel
approach introduced recently by Zhong et al. (2015) to be the most accurate. We
apply our proposed method to an artificial testcase based on the Ishigami
function, as well as to a real-world testcase involving sediment transport in
the North Sea. The results are consistent with prior knowledge and heuristic
understanding, as well as with variance-based analysis using Sobol indices in
the case where these indices can be computed.
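The core geometric idea — that samples of a strongly dependent pair of variables concentrate near a curve and so have a shorter minimum spanning tree than an independent scatter of the same size — can be sketched with a pure-Python Prim's algorithm. This is an O(n^2) illustration of MST length only; the paper's estimator additionally maps the length to a Rényi-entropy-based dependence measure:

```python
import math
import random

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree of a
    2-D point set, via Prim's algorithm in O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    dist = [math.inf] * n
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest point not yet in the tree and attach it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        ux, uy = points[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(points[v][0] - ux, points[v][1] - uy)
                if d < dist[v]:
                    dist[v] = d
    return total

# A near-functional pair yields a much shorter MST than an independent one.
rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
dependent = [(x, x + 0.01 * rng.gauss(0, 1)) for x in xs]
independent = [(rng.random(), rng.random()) for _ in range(200)]
print(mst_length(dependent) < mst_length(independent))  # True
```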