Top-K Queries on Uncertain Data: On Score Distribution and Typical Answers
Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along these score-probability dimensions. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in previous work in the area. Our work includes a systematic empirical study on both real and synthetic datasets. National Natural Science Foundation (Grant number IIS-0086057); National Natural Science Foundation (Grant number IIS-0325838); National Natural Science Foundation (Grant number IIS-0448124)
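To make the idea of a score distribution over top-k vectors concrete, here is a minimal brute-force sketch. It assumes a tuple-independent model in which each tuple carries a score and an existence probability; that model, the function name `topk_score_distribution`, and the example data are illustrative assumptions, not the paper's algorithms. It enumerates every possible world, which is exactly the exponential cost that motivates reporting a few typical vectors instead.

```python
from collections import defaultdict
from itertools import product


def topk_score_distribution(tuples, k):
    """Map each total top-k score to its probability.

    tuples: list of (score, existence_probability) pairs, assumed independent.
    Worlds containing fewer than k tuples have no top-k vector and are skipped.
    """
    dist = defaultdict(float)
    # Enumerate all 2^n possible worlds (feasible only for tiny inputs).
    for world in product([False, True], repeat=len(tuples)):
        prob = 1.0
        present = []
        for (score, p), exists in zip(tuples, world):
            prob *= p if exists else 1.0 - p
            if exists:
                present.append(score)
        if prob == 0.0 or len(present) < k:
            continue
        total = sum(sorted(present, reverse=True)[:k])
        dist[total] += prob
    return dict(dist)


# Hypothetical data: one uncertain high-scoring tuple, two certain ones.
example = topk_score_distribution([(10, 0.5), (8, 1.0), (5, 1.0)], k=2)
```

In this example the top-2 total is 18 when the uncertain tuple exists and 13 when it does not, each with probability 0.5, which shows the score-versus-probability tradeoff the abstract describes.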
Cryo-EM structure of the mature dengue virus at 3.5-Å resolution.
Regulated by pH, membrane-anchored proteins E and M function during dengue virus maturation and membrane fusion. Our atomic model of the whole virion from cryo-electron microscopy at 3.5-Å resolution reveals that in the mature virus at neutral extracellular pH, the N-terminal 20-amino-acid segment of M (involving three pH-sensing histidines) latches and thereby prevents spring-loaded E fusion protein from prematurely exposing its fusion peptide. This M latch is fastened at an earlier stage, during maturation at acidic pH in the trans-Golgi network. At a later stage, to initiate infection in response to acidic pH in the late endosome, M releases the latch and exposes the fusion peptide. Thus, M serves as a multistep chaperone of E to control the conformational changes accompanying maturation and infection. These pH-sensitive interactions could serve as targets for drug discovery
What Does the Gradient Tell When Attacking the Graph Structure
Recent research has revealed that Graph Neural Networks (GNNs) are
susceptible to adversarial attacks targeting the graph structure. A malicious
attacker can manipulate a limited number of edges, given the training labels,
to impair the victim model's performance. Previous empirical studies indicate
that gradient-based attackers tend to add edges rather than remove them. In
this paper, we present a theoretical demonstration revealing that attackers
tend to increase inter-class edges due to the message passing mechanism of
GNNs, which explains some previous empirical observations. By connecting
dissimilar nodes, attackers can more effectively corrupt node features, making
such attacks more advantageous. However, we demonstrate that the inherent
smoothness of GNN's message passing tends to blur node dissimilarity in the
feature space, leading to the loss of crucial information during the forward
process. To address this issue, we propose a novel surrogate model with
multi-level propagation that preserves the node dissimilarity information. This
model parallelizes the propagation of unaggregated raw features and multi-hop
aggregated features, while introducing batch normalization to enhance the
dissimilarity in node representations and counteract the smoothness resulting
from topological aggregation. Our experiments show significant improvement with
our approach. Furthermore, both theoretical and experimental evidence suggest
that adding inter-class edges constitutes an easily observable attack pattern.
We propose an innovative attack loss that balances attack effectiveness and
imperceptibility, sacrificing some attack effectiveness to attain greater
imperceptibility. We also provide experiments to validate the compromise
performance achieved through this attack loss
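The claim that inter-class edges corrupt node features more than intra-class edges can be illustrated with a toy example. The sketch below assumes plain mean aggregation over a node and its neighbours and scalar features of +1 and -1 for the two classes; these choices, and the helper names `mean_aggregate` and `with_edge`, are hypothetical and are not the surrogate model or attack loss proposed in the paper.

```python
def mean_aggregate(features, adjacency, node):
    """One round of mean message passing over a node and its neighbours."""
    group = [node] + sorted(adjacency[node])
    return sum(features[n] for n in group) / len(group)


def with_edge(adjacency, u, v):
    """Return a copy of the adjacency sets with undirected edge (u, v) added."""
    new = {n: set(nbrs) for n, nbrs in adjacency.items()}
    new[u].add(v)
    new[v].add(u)
    return new


# Toy graph: nodes 0, 1, 3 belong to class A (+1), node 2 to class B (-1).
features = {0: 1.0, 1: 1.0, 2: -1.0, 3: 1.0}
adjacency = {0: {1}, 1: {0}, 2: set(), 3: set()}

clean = mean_aggregate(features, adjacency, 0)                   # no attack
intra = mean_aggregate(features, with_edge(adjacency, 0, 3), 0)  # same-class edge
inter = mean_aggregate(features, with_edge(adjacency, 0, 2), 0)  # cross-class edge
```

Under these assumptions the intra-class edge leaves node 0's aggregated feature unchanged, while the inter-class edge pulls it toward the other class.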
Decoupled Mixup for Data-efficient Learning
Mixup is an efficient data augmentation approach that improves the
generalization of neural networks by smoothing the decision boundary with mixed
data. Recently, dynamic mixup methods have improved previous static policies
effectively (e.g., linear interpolation) by maximizing salient regions or
maintaining the target in mixed samples. The discrepancy is that the generated
mixed samples from dynamic policies are more instance discriminative than the
static ones, e.g., the foreground objects are decoupled from the background.
However, optimizing mixup policies with dynamic methods in input space is an
expensive computation compared to static ones. Hence, we are trying to transfer
the decoupling mechanism of dynamic methods from the data level to the
objective function level and propose the general decoupled mixup (DM) loss. The
primary effect is that DM can adaptively focus on discriminative features
without losing the original smoothness of the mixup while avoiding heavy
computational overhead. As a result, DM enables static mixup methods to match
or even exceed the performance of dynamic methods. This also leads
to an interesting objective design problem for mixup training that we need to
focus on both smoothing the decision boundaries and identifying discriminative
features. Extensive experiments on supervised and semi-supervised learning
benchmarks across seven classification datasets validate the effectiveness of
DM by equipping it with various mixup methods. Comment: The preprint revision, 15 pages, 6 figures. The source code is
available at https://github.com/Westlake-AI/openmixu
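For readers unfamiliar with the static policy mentioned in the abstract (linear interpolation), the following is a minimal sketch of standard input-space mixup for one pair of samples. It is not the decoupled mixup (DM) loss, which the abstract does not specify; the function name `mixup_pair` and the choice of drawing the mixing weight from a Beta(alpha, alpha) distribution follow the common mixup formulation and should be read as assumptions here.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Linearly interpolate two samples and their one-hot labels.

    lam is drawn from Beta(alpha, alpha); inputs and labels share the same lam.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


# Hypothetical two-feature inputs with one-hot labels for two classes.
mixed_x, mixed_y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

Because inputs and labels are mixed with the same weight, the mixed label remains a valid probability vector, which is what smooths the decision boundary between the two classes.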
Cryo-EM model of the bullet-shaped vesicular stomatitis virus.
Vesicular stomatitis virus (VSV) is a bullet-shaped rhabdovirus and a model system of negative-strand RNA viruses. Through direct visualization by means of cryo-electron microscopy, we show that each virion contains two nested, left-handed helices: an outer helix of matrix protein M and an inner helix of nucleoprotein N and RNA. M has a hub domain with four contact sites that link to neighboring M and N subunits, providing rigidity by clamping adjacent turns of the nucleocapsid. Side-by-side interactions between neighboring N subunits are critical for the nucleocapsid to form a bullet shape, and structure-based mutagenesis results support this description. Together, our data suggest a mechanism of VSV assembly in which the nucleocapsid spirals from the tip to become the helical trunk, both subsequently framed and rigidified by the M layer
Endothelial Stomatal and Fenestral Diaphragms in Normal Vessels and Angiogenesis
Vascular endothelium lines the entire cardiovascular system, where it performs a series of vital functions including the control of microvascular permeability, coagulation, inflammation, and vascular tone, as well as the formation of new vessels via vasculogenesis and angiogenesis in normal and disease states. Normal endothelium consists of heterogeneous populations of cells differentiated according to the vascular bed and segment of the vascular tree where they occur. One of the cardinal features is the expression of specific subcellular structures such as plasmalemmal vesicles or caveolae, transendothelial channels, vesiculo-vacuolar organelles, endothelial pockets and fenestrae, whose presence defines several endothelial morphological types. A less explored observation is the differential expression of such structures in diverse settings of angiogenesis. This review will focus on the latest developments on the components, structure and function of these specific endothelial structures in normal endothelium as well as in diverse settings of angiogenesis
Re and <sup>99m</sup>Tc complexes of BodP<sub>3</sub> – multi-modality imaging probes
A fluorescent tridentate phosphine, BodP(3) (2), forms rhenium complexes which effectively image cancer cells. Related technetium analogues are also readily prepared and have potential as dual SPECT/fluorescent biological probes
Random-phase approximation and its applications in computational chemistry and materials science
The random-phase approximation (RPA) as an approach for computing the
electronic correlation energy is reviewed. After a brief account of its basic
concept and historical development, the paper is devoted to the theoretical
formulations of RPA, and its applications to realistic systems. With several
illustrating applications, we discuss the implications of RPA for computational
chemistry and materials science. The computational cost of RPA is also
addressed which is critical for its widespread use in future applications. In
addition, current correction schemes going beyond RPA and directions of further
development will be discussed. Comment: 25 pages, 11 figures, published online in J. Mater. Sci. (2012)
Effective Rheology of Bubbles Moving in a Capillary Tube
We calculate the average volumetric flux versus pressure drop of bubbles
moving in a single capillary tube with varying diameter, finding a square-root
relation from mapping the flow equations onto that of a driven overdamped
pendulum. The calculation is based on a derivation of the equation of motion of
a bubble train from considering the capillary forces and the entropy production
associated with the viscous flow. We also calculate the configurational
probability of the positions of the bubbles. Comment: 4 pages, 1 figure
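The square-root relation can be checked numerically for the textbook driven overdamped pendulum. The sketch below assumes the dimensionless form dθ/dt = F − sin θ, for which the mean angular velocity above threshold is known to be sqrt(F² − 1). How F and the mean velocity map onto pressure drop and volumetric flux is specific to the paper and is not reproduced here; the function name `mean_velocity` is hypothetical.

```python
import math


def mean_velocity(force, steps=20000):
    """Mean of d(theta)/dt for d(theta)/dt = force - sin(theta)."""
    if force <= 1.0:
        return 0.0  # below threshold the pendulum settles at a fixed point
    h = 2.0 * math.pi / steps
    # Time for one full turn: T = integral over [0, 2*pi] of
    # d(theta) / (force - sin(theta)), evaluated with the midpoint rule.
    period = sum(h / (force - math.sin((i + 0.5) * h)) for i in range(steps))
    return 2.0 * math.pi / period


# Compare the numerical mean velocity with the closed form sqrt(F^2 - 1).
for f in (1.05, 1.5, 2.0):
    print(f, mean_velocity(f), math.sqrt(f * f - 1.0))
```

Close to the threshold F = 1 the closed form behaves like sqrt(2(F − 1)), which is the square-root onset that the mapping refers to.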
TRY plant trait database - enhanced coverage and open access
Plant traits (the morphological, anatomical, physiological, biochemical and phenological characteristics of plants) determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning from evolutionary biology, community and functional ecology, to biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait-based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. Best species coverage is achieved for categorical traits, with almost complete coverage for 'plant growth form'. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait-environmental relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We, therefore, conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives