Analyzing the Fine Structure of Distributions
One aim of data mining is the identification of interesting structures in
data. For better analytical results, the basic properties of an empirical
distribution, such as skewness and possible clipping, i.e. hard limits in value
ranges, need to be assessed. Of particular interest is the question of whether
the data originate from one process or contain subsets related to different
states of the data producing process. Data visualization tools should deliver a
clear picture of the univariate probability density distribution (PDF) for each
feature. Visualization tools for PDFs typically use kernel density estimates
and include both the classical histogram and modern tools such as
ridgeline plots, bean plots and violin plots. If density estimation parameters
remain in a default setting, conventional methods pose several problems when
visualizing the PDFs of uniform, multimodal, and skewed distributions and of
distributions with clipped data. For that reason, a new visualization tool
called the mirrored density plot (MD plot), which is specifically designed to
discover interesting structures in continuous features, is proposed. The MD
plot does not require adjusting any parameters of density estimation, which
may make it particularly compelling for non-experts. The
visualization tools in question are evaluated against statistical tests with
regard to typical challenges of explorative distribution analysis. The results
of the evaluation are presented using bimodal Gaussian, skewed distributions
and several features with already published PDFs. In an exploratory data
analysis of 12 features describing quarterly financial statements, where
statistical testing proves very difficult, only the MD plots can identify
the structure of their PDFs. In sum, the MD plot outperforms the
above-mentioned methods. Comment: 66 pages, 81 figures, accepted in PLOS ON
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations
Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and
reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading
paradigms for self-supervised learning of vision transformers, but they differ
substantially in their transfer performance. Here, we aim to explain these
differences by analyzing the impact of these objectives on the structure and
transferability of the learned representations. Our analysis reveals that
reconstruction-based learning features are significantly dissimilar to
joint-embedding based learning features and that models trained with similar
objectives learn similar features even across architectures. These differences
arise early in the network and are primarily driven by attention and
normalization layers. We find that joint-embedding features yield better linear
probe transfer for classification because the different objectives drive
different distributions of information and invariances in the learned
representation. These differences explain opposite trends in transfer
performance for downstream tasks that require spatial specificity in features.
Finally, we address how fine-tuning changes reconstructive representations to
enable better transfer, showing that fine-tuning re-organizes the information
to be more similar to pre-trained joint-embedding models.
Microscopic theory of nuclear-structure effects in atomic systems
In this thesis, nuclear-structure effects in atomic systems are investigated from the microscopic point of view. To this end, a detailed description of nuclear dynamics is incorporated into calculations of the finite-nuclear-size and nuclear-polarization corrections to atomic energy levels and the bound-electron g factor. Hydrogen-like highly charged ions as well as muonic atoms are considered. Nuclear ground-state charge distributions are obtained within the Hartree-Fock method, while complete nuclear excitation spectra are computed by means of the random-phase approximation. The interaction between nucleons is modelled by the effective Skyrme force. The effects of nuclear excitations on atomic properties are described in a field-theoretical framework, where the full Dirac spectrum of a bound electron or muon is taken into account with the help of finite basis-set methods. Special attention is given to analyzing the nuclear model dependence, and the uncertainties of the calculations are estimated. In addition, the suppression of nuclear-structure effects in various weighted differences is discussed. Finally, the developed methods and computational codes are applied to the long-standing problem of the fine-structure anomalies in heavy muonic atoms.
Mechanism of Deep-focus Earthquakes Anomalous Statistics
Analyzing the NEIC data, we have shown that the spatial distribution of
deep-focus earthquakes in the Earth's interior over 1993-2006 is characterized
by a clearly defined periodic fine discrete structure with period L=50 km,
which is generated solely by earthquakes with magnitude M 3.9 to 5.3 and only
on convergent plate boundaries. To describe the formation of this structure, we
used the model of complex systems by A. Volynskii and S. Bazhenov. The key
property of this model is the presence of a rigid coating on a soft
substratum. It is shown that in subduction processes the slab substance
(lithosphere) plays the role of the rigid coating and the upper mantle acts as
the soft substratum. Within the framework of this model, we have obtained
estimates of the average stress in the upper mantle and of Young's modulus
for the oceanic slab (lithosphere) and upper mantle. Comment: 9 pages, 7 figures
A Spatio-Temporal Point Process Model for Ambulance Demand
Ambulance demand estimation at fine time and location scales is critical for
fleet management and dynamic deployment. We are motivated by the problem of
estimating the spatial distribution of ambulance demand in Toronto, Canada, as
it changes over discrete 2-hour intervals. This large-scale dataset is sparse
at the desired temporal resolutions and exhibits location-specific serial
dependence, daily and weekly seasonality. We address these challenges by
introducing a novel characterization of time-varying Gaussian mixture models.
We fix the mixture component distributions across all time periods to overcome
data sparsity and accurately describe Toronto's spatial structure, while
representing the complex spatio-temporal dynamics through time-varying mixture
weights. We constrain the mixture weights to capture weekly seasonality, and
apply a conditionally autoregressive prior on the mixture weights of each
component to represent location-specific short-term serial dependence and daily
seasonality. While estimation may be performed using a fixed number of mixture
components, we also extend to estimate the number of components using
birth-and-death Markov chain Monte Carlo. The proposed model is shown to give
higher statistical predictive accuracy and to reduce the error in predicting
EMS operational performance by as much as two-thirds compared to a typical
industry practice.
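The core modelling idea in this abstract, mixture components held fixed across time with only the weights varying by period, can be illustrated with a minimal sketch. All numbers and function names below are hypothetical and chosen only for illustration; the sketch evaluates the density and does not reproduce the paper's estimation procedure (the conditionally autoregressive prior and birth-and-death MCMC are omitted).

```python
import numpy as np

def gaussian_pdf_2d(points, mean, cov):
    """Bivariate normal density evaluated at each row of `points`."""
    diff = points - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    expo = -0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)
    return norm * np.exp(expo)

def mixture_density(points, weights_t, means, covs):
    """Spatial density for one time period: shared components, period-specific weights."""
    comps = np.stack([gaussian_pdf_2d(points, m, c) for m, c in zip(means, covs)])
    return weights_t @ comps

# Hypothetical setup: two fixed spatial components, two time periods.
means = np.array([[0.0, 0.0], [3.0, 3.0]])
covs = np.array([np.eye(2), 0.5 * np.eye(2)])
weights = np.array([[0.8, 0.2],   # period 0: demand concentrated near (0, 0)
                    [0.3, 0.7]])  # period 1: demand shifts toward (3, 3)

points = np.array([[0.0, 0.0], [3.0, 3.0]])
density_t0 = mixture_density(points, weights[0], means, covs)
density_t1 = mixture_density(points, weights[1], means, covs)
```

Because the component means and covariances are shared, every period contributes data to estimating the spatial structure, while the weight rows alone carry the temporal dynamics.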
A Big Data Analyzer for Large Trace Logs
The current generation of Internet-based services is typically hosted on large
data centers that take the form of warehouse-size structures housing tens of
thousands of servers. Continued availability of a modern data center is the
result of a complex orchestration among many internal and external actors
including computing hardware, multiple layers of intricate software, networking
and storage devices, electrical power and cooling plants. During the course of
their operation, many of these components produce large amounts of data in the
form of event and error logs that are essential not only for identifying and
resolving problems but also for improving data center efficiency and
management. Most of these activities would benefit significantly from data
analytics techniques to exploit hidden statistical patterns and correlations
that may be present in the data. The sheer volume of data to be analyzed makes
uncovering these correlations and patterns a challenging task. This paper
presents BiDAl, a prototype Java tool for log-data analysis that incorporates
several Big Data technologies in order to simplify the task of extracting
information from data traces produced by large clusters and server farms. BiDAl
provides the user with several analysis languages (SQL, R and Hadoop MapReduce)
and storage backends (HDFS and SQLite) that can be freely mixed and matched so
that a custom tool for a specific task can be easily constructed. BiDAl has a
modular architecture so that it can be extended with other backends and
analysis languages in the future. In this paper we present the design of BiDAl
and describe our experience using it to analyze publicly-available traces from
Google data clusters, with the goal of building a realistic model of a complex
data center. Comment: 26 pages, 10 figures
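As a rough illustration of the kind of SQL-over-SQLite analysis the abstract describes, the sketch below counts failure events per machine in a toy trace table. It uses Python's standard `sqlite3` module rather than BiDAl's Java interface, and the table name `task_events`, its columns, and the rows are hypothetical, not taken from the Google traces.

```python
import sqlite3

# In-memory SQLite database standing in for a trace-log storage backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_events (machine_id INTEGER, event_type TEXT)")

# Hypothetical trace rows: (machine, event).
conn.executemany(
    "INSERT INTO task_events VALUES (?, ?)",
    [(1, "FAIL"), (1, "FINISH"), (2, "FAIL"), (1, "FAIL")],
)

# Count failure events per machine.
rows = conn.execute(
    "SELECT machine_id, COUNT(*) FROM task_events "
    "WHERE event_type = 'FAIL' GROUP BY machine_id ORDER BY machine_id"
).fetchall()
conn.close()
```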
Emulsion Chamber with Big Radiation Length for Detecting Neutrino Oscillations
A conceptual scheme of a hybrid-emulsion spectrometer for investigating
various channels of neutrino oscillations is proposed. The design emphasizes
detection of leptons by detached vertices, reliable identification of
electrons, and good spectrometry for all charged particles and photons. A
distributed target is formed by layers of low-Z material,
emulsion-plastic-emulsion sheets, and air gaps in which decays are
detected. The tracks of charged secondaries, including electrons, are
momentum-analyzed by curvature in a magnetic field using hits in successive thin
layers of emulsion. The leptons are efficiently detected in all major
decay channels, including \xedec. The performance of a model spectrometer that
contains 3 tons of nuclear emulsion and 20 tons of passive material is
estimated for different experimental environments. When irradiated by the
beam of a proton accelerator over a medium baseline of km/GeV, the spectrometer will efficiently detect either the ν_μ → ν_τ or
ν_μ → ν_e transitions in the mass-difference region of eV,
as suggested by the results of LSND. When exposed to the neutrino beam of a
muon storage ring over a long baseline of 10-20 km/GeV, the
model detector will efficiently probe the entire pattern of neutrino
oscillations in the region eV, as
suggested by the data on atmospheric neutrinos. Comment: 34 pages, 8 figures