One-point fluctuation analysis of the high-energy neutrino sky
We perform the first one-point fluctuation analysis of the high-energy
neutrino sky. This method proves especially well suited to
contemporary neutrino data, as it allows us to study the properties of the
astrophysical components of the high-energy flux detected by the IceCube
telescope, even with low statistics and in the absence of point source
detection. Besides the veto-passing atmospheric foregrounds, we adopt a simple
model of the high-energy neutrino background by assuming two main
extra-galactic components: star-forming galaxies and blazars. By leveraging
multi-wavelength data from Herschel and Fermi, we predict the spectral and
anisotropic probability distributions for their expected neutrino counts in
IceCube. We find that star-forming galaxies are likely to remain a diffuse
background due to the poor angular resolution of IceCube, and we determine an
upper limit on the number of shower events that can reasonably be associated with
blazars. We also find that upper limits on the contribution of blazars to the
measured flux are unfavourably affected by the skewness of the blazar flux
distribution. One-point event clustering and likelihood analyses of the IceCube
HESE data suggest that this method has the potential to dramatically improve
over more conventional model-based analyses, especially for the next generation
of neutrino telescopes.
Comment: 41 pages, 6 figures, 2 tables; different blazar model than v1, but the
same result.
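As a rough intuition for what a one-point fluctuation analysis measures, the sketch below simulates per-pixel event counts from a uniform diffuse component plus a heavily skewed, blazar-like flux distribution; all component names and numbers are hypothetical and not taken from the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix = 10_000  # toy sky pixelization

# Diffuse component (e.g., unresolved star-forming galaxies):
# the same expected count in every pixel.
mu_diffuse = 0.5

# Skewed, blazar-like component: a lognormal flux per pixel, so a few
# pixels are very bright while most are nearly empty.
flux_blazar = rng.lognormal(mean=-3.0, sigma=2.0, size=n_pix)

# Observed counts are Poisson draws around the summed expectation.
counts = rng.poisson(mu_diffuse + flux_blazar)

# The one-point statistic: the probability P(k) of observing k events
# in a pixel; the skewed component fattens the tail of P(k).
pk = np.bincount(counts) / n_pix
```

The one-point distribution `pk` is what the analysis fits, without ever resolving individual sources.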
SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation
We present SimLex-999, a gold standard resource for evaluating distributional
semantic models that improves on existing resources in several important ways.
First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly
quantifies similarity rather than association or relatedness, so that pairs of
entities that are associated but not actually similar [Freud, psychology] have
a low rating. We show that, via this focus on similarity, SimLex-999
incentivizes the development of models with a different, and arguably wider,
range of applications than those which reflect conceptual association. Second,
SimLex-999 contains a range of concrete and abstract adjective, noun and verb
pairs, together with an independent rating of concreteness and (free)
association strength for each pair. This diversity enables fine-grained
analyses of the performance of models on concepts of different types, and
consequently greater insight into how architectures can be improved. Further,
unlike existing gold standard evaluations, for which automatic approaches have
reached or surpassed the inter-annotator agreement ceiling, state-of-the-art
models perform well below this ceiling on SimLex-999. There is therefore plenty
of scope for SimLex-999 to quantify future improvements to distributional
semantic models, guiding the development of the next generation of
representation-learning architectures.
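To make the evaluation protocol concrete, the following sketch scores a model against SimLex-style gold ratings with a Spearman rank correlation; the word-pair ratings and model cosine similarities below are invented for illustration.

```python
import numpy as np

def spearman(a, b):
    # Rank-transform both arrays, then take the Pearson correlation.
    # This simple ranking via double argsort assumes no tied values.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Hypothetical gold similarity ratings (SimLex-style, 0-10 scale).
gold = np.array([9.2, 1.5, 7.8, 0.4, 5.0])
# Hypothetical model cosine similarities for the same word pairs.
model = np.array([0.85, 0.30, 0.70, 0.10, 0.55])

rho = spearman(gold, model)  # → 1.0 (model ranks the pairs exactly like the gold standard)
```

Only the ordering matters, which is why rank correlation is the standard score for such gold standards.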
Fast and Robust Rank Aggregation against Model Misspecification
In rank aggregation, preferences from different users are summarized into a
total order under the assumption of homogeneous data. When this assumption
fails, model misspecification arises, and rank aggregation methods compensate
by incorporating explicit noise models. However, such methods all rely on
particular noise-model assumptions and cannot handle the agnostic noise found
in the real world. In this paper, we propose CoarsenRank, which
rectifies the underlying data distribution directly and aligns it to the
homogeneous data assumption without involving any noise model. To this end, we
define a neighborhood of the data distribution over which Bayesian inference of
CoarsenRank is performed, and therefore the resultant posterior enjoys
robustness against model misspecification. Further, we derive a tractable
closed-form solution for CoarsenRank making it computationally efficient.
Experiments on real-world datasets show that CoarsenRank is fast and robust,
achieving consistent improvement over baseline methods.
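For context, the simplest rank aggregation baselines ignore noise modeling entirely. The sketch below implements plain Borda counting, a classical baseline of the kind CoarsenRank is compared against (not CoarsenRank itself), over hypothetical user preferences:

```python
from collections import defaultdict

def borda(rankings):
    """Aggregate total orders by Borda count: in a ranking of n items,
    the item in position p (0-based) earns n - p points."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    # Sort by descending score; break ties alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical preferences from three users over items a, b, c.
prefs = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
total_order = borda(prefs)  # → ['a', 'b', 'c']
```

A noise-model-free baseline like this is exactly the kind of method that breaks down when user preferences are not homogeneous.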
ChatAnything: Facetime Chat with LLM-Enhanced Personas
In this technical report, we target generating anthropomorphized personas for
LLM-based characters in an online manner, including visual appearance,
personality and tones, with only text descriptions. To achieve this, we first
leverage the in-context learning capability of LLMs for personality generation
by carefully designing a set of system prompts. We then propose two novel
concepts: the mixture of voices (MoV) and the mixture of diffusers (MoD) for
diverse voice and appearance generation. For MoV, we utilize the text-to-speech
(TTS) algorithms with a variety of pre-defined tones and select the most
matching one based on the user-provided text description automatically. For
MoD, we combine the recent popular text-to-image generation techniques and
talking head algorithms to streamline the process of generating talking
objects. We term the whole framework ChatAnything. With it, users can animate
anything with an anthropomorphic persona using just a few text inputs. However,
we have observed that the anthropomorphic objects produced by current
generative models are often undetectable by pre-trained face landmark
detectors, leading to failure of face motion generation even when these faces
possess human-like appearances, because such images are rarely seen during
training (i.e., they are OOD samples). To address this issue, we
incorporate pixel-level guidance to infuse human face landmarks during the
image generation phase. To benchmark this, we have built an evaluation
dataset. On it, we verify that the face landmark detection rate increases
significantly, from 57.0% to 92.5%, thus allowing automatic face animation
based on generated speech content. The code and more results can be
found at https://chatanything.github.io/
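The MoV selection step can be caricatured as matching a user description against a bank of predefined tones. The sketch below substitutes naive keyword overlap for the paper's LLM/TTS machinery, with an entirely hypothetical voice bank:

```python
def pick_voice(description, voices):
    """Toy MoV-style selection: score each predefined tone by how many of
    its tag keywords appear in the user's text description, and return the
    best match. (The actual system matches via TTS tones and an LLM.)"""
    desc_words = set(description.lower().split())

    def score(voice):
        return len(desc_words & voice["keywords"])

    return max(voices, key=score)["name"]

# Hypothetical bank of predefined tones.
voices = [
    {"name": "cheerful_child", "keywords": {"playful", "young", "happy"}},
    {"name": "deep_narrator", "keywords": {"serious", "deep", "calm"}},
]

chosen = pick_voice("a calm and serious storyteller with a deep voice", voices)
# → 'deep_narrator'
```

The same select-from-a-mixture pattern applies to MoD, with diffusion models in place of TTS tones.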
Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus
We propose a method for efficiently finding all parallel passages in a large
corpus, even if the passages are not quite identical due to rephrasing and
orthographic variation. The key ideas are the representation of each word in
the corpus by its two most infrequent letters, finding matched pairs of strings
of four or five words that differ by at most one word and then identifying
clusters of such matched pairs. Using this method, over 4600 parallel pairs of
passages were identified in the Babylonian Talmud, a Hebrew-Aramaic corpus of
over 1.8 million words, in just over 30 seconds. Empirical comparisons on
sample data indicate that the coverage obtained by our method is essentially
the same as that obtained using slow exhaustive methods.
Comment: Submission to the Journal of Data Mining and Digital Humanities
(Special Issue on Computer-Aided Processing of Intertextuality in Ancient
Languages)
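The matching pipeline described above (two rarest letters per word, n-word windows that may differ in one position, then candidate clustering) can be sketched as follows; the miniature corpus is invented for illustration:

```python
from collections import Counter, defaultdict
from itertools import combinations

def word_key(word, letter_freq):
    # The compression step: represent a word by its two rarest letters,
    # so minor orthographic variation often maps to the same key.
    return "".join(sorted(set(word), key=lambda c: (letter_freq[c], c))[:2])

def find_candidate_pairs(docs, n=4):
    letter_freq = Counter(c for text in docs.values() for c in text if c != " ")
    shingles = defaultdict(set)
    for doc_id, text in docs.items():
        keys = [word_key(w, letter_freq) for w in text.split()]
        for i in range(len(keys) - n + 1):
            window = keys[i:i + n]
            # Drop one position at a time, so n-word runs that differ in a
            # single word still hash to the same signature.
            for skip in range(n):
                sig = (skip, tuple(window[:skip] + window[skip + 1:]))
                shingles[sig].add(doc_id)
    # Any signature shared by two documents yields a candidate parallel pair.
    return {pair for ids in shingles.values() if len(ids) > 1
            for pair in combinations(sorted(ids), 2)}

docs = {  # hypothetical miniature corpus
    "a": "the rabbi said unto them go forth",
    "b": "the rabbi spoke unto them go forth",
    "c": "completely different words appear here instead",
}
pairs = find_candidate_pairs(docs)  # → {('a', 'b')}
```

Candidate pairs would then be clustered and verified; the cheap signatures are what make the exhaustive-quality coverage fast.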
Disentangling Adversarial Robustness and Generalization
Obtaining deep networks that are robust against adversarial examples and
generalize well is an open problem. A recent hypothesis even states that both
robust and accurate models are impossible, i.e., adversarial robustness and
generalization are conflicting goals. In an effort to clarify the relationship
between robustness and generalization, we assume an underlying, low-dimensional
data manifold and show that: 1. regular adversarial examples leave the
manifold; 2. adversarial examples constrained to the manifold, i.e.,
on-manifold adversarial examples, exist; 3. on-manifold adversarial examples
are generalization errors, and on-manifold adversarial training boosts
generalization; 4. regular robustness and generalization are not necessarily
contradicting goals. These assumptions imply that both robust and accurate
models are possible. However, different models (architectures, training
strategies etc.) can exhibit different robustness and generalization
characteristics. To confirm our claims, we present extensive experiments on
synthetic data (with known manifold) as well as on EMNIST, Fashion-MNIST and
CelebA.
Comment: Conference on Computer Vision and Pattern Recognition 201
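For contrast with the on-manifold examples studied in the paper, a regular (generally off-manifold) adversarial example can be produced with the standard fast gradient sign method (FGSM). Below is a minimal sketch on a hand-built logistic classifier with hypothetical weights, not the paper's networks:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny "network": sigmoid(w @ x), with hypothetical weights and input.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, -0.1, 0.4])
y = 1.0  # true label; the clean input is classified positive

p = sigmoid(w @ x)  # clean prediction, > 0.5

# Gradient of the binary cross-entropy loss w.r.t. the input is (p - y) * w;
# FGSM perturbs the input along the sign of that gradient.
grad_x = (p - y) * w
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv)  # adversarial prediction, < 0.5: the label flips
```

Such unconstrained perturbations are free to leave the data manifold, which is precisely the distinction the paper draws against on-manifold adversarial examples.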