
    Analyzing the Fine Structure of Distributions

    One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and possible clipping, i.e., hard limits in the value range, need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to different states of the data-producing process. Data visualization tools should deliver a clear picture of the univariate probability density function (PDF) of each feature. Visualization tools for PDFs typically use kernel density estimates and include the classical histogram as well as modern tools like ridgeline plots, bean plots and violin plots. If the density estimation parameters remain at their default settings, these conventional methods pose several problems when visualizing the PDFs of uniform, multimodal and skewed distributions, and of distributions with clipped data. For that reason, a new visualization tool called the mirrored density plot (MD plot), which is specifically designed to discover interesting structures in continuous features, is proposed. The MD plot does not require adjusting any density estimation parameters, which makes it compelling particularly for non-experts. The visualization tools in question are evaluated against statistical tests with regard to typical challenges of explorative distribution analysis. The results of the evaluation are presented using bimodal Gaussian and skewed distributions and several features with already published PDFs. In an exploratory analysis of 12 features describing quarterly financial statements, where statistical testing is difficult, only the MD plot can identify the structure of their PDFs. In sum, the MD plot outperforms the methods mentioned above. Comment: 66 pages, 81 figures, accepted in PLOS ONE
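
    To make the mirroring concrete, here is a minimal Python sketch of the idea: each feature's estimated density is drawn twice, reflected about a vertical axis. It substitutes scipy's Gaussian KDE for the Pareto density estimation the published MD plot uses, so it inherits exactly the parameter sensitivity the paper criticizes; all names and data are illustrative placeholders.

```python
# Minimal sketch of the mirrored-density idea: each feature's estimated
# density is drawn twice, mirrored around a vertical axis (violin-style).
# NOTE: Gaussian KDE is a stand-in; the published MD plot relies on
# Pareto density estimation and needs no parameter tuning.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def md_plot(features, labels=None, grid_points=200):
    fig, ax = plt.subplots()
    for i, x in enumerate(features):
        kde = gaussian_kde(x)                  # density estimate (stand-in)
        grid = np.linspace(x.min(), x.max(), grid_points)
        d = kde(grid)
        d = d / d.max() * 0.4                  # half-width of each mirrored shape
        ax.fill_betweenx(grid, i - d, i + d, alpha=0.6)  # mirror around x = i
    ax.set_xticks(range(len(features)))
    if labels:
        ax.set_xticklabels(labels)
    return ax

# Toy example: a bimodal Gaussian mixture next to a skewed sample.
rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])
skewed = rng.lognormal(0, 0.6, 1000)
md_plot([bimodal, skewed], labels=["bimodal", "skewed"])
plt.show()
```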

    Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations

    Joint-embedding learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of the learned representations. Our analysis reveals that reconstruction-based features are significantly dissimilar to joint-embedding features, and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network and are primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear-probe transfer for classification because the different objectives drive different distributions of information and invariances in the learned representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that fine-tuning re-organizes the information to be more similar to pre-trained joint-embedding models.
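
    As a concrete illustration of the linear-probe protocol the abstract refers to, here is a hedged sketch: the pretrained backbone stays frozen, and only a linear classifier is fit on its output features. The arrays and names below are placeholders standing in for real extracted features (e.g., DINO vs. MAE embeddings), not the paper's code.

```python
# Hedged sketch of linear-probe transfer evaluation on frozen features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(train_feats, train_y, test_feats, test_y):
    # Features come from a frozen encoder (e.g., a ViT's [CLS] token);
    # only this linear head is trained.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_feats, train_y)
    return clf.score(test_feats, test_y)

# Usage with placeholder arrays (768-dim features, 10 classes):
rng = np.random.default_rng(0)
feats_tr, feats_te = rng.normal(size=(1000, 768)), rng.normal(size=(200, 768))
y_tr, y_te = rng.integers(0, 10, 1000), rng.integers(0, 10, 200)
print(linear_probe_accuracy(feats_tr, y_tr, feats_te, y_te))
```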

    Microscopic theory of nuclear-structure effects in atomic systems

    In this thesis, nuclear-structure effects in atomic systems are investigated from the microscopic point of view. To this end, a detailed description of nuclear dynamics is incorporated into calculations of the finite-nuclear-size and nuclear-polarization corrections to atomic energy levels and the bound-electron g factor. Hydrogen-like highly charged ions as well as muonic atoms are considered. Nuclear ground-state charge distributions are obtained within the Hartree-Fock method, while complete nuclear excitation spectra are computed by means of the random-phase approximation. The interaction between nucleons is modelled by the effective Skyrme force. The effects of nuclear excitations on atomic properties are described in a field-theoretical framework, where the full Dirac spectrum of a bound electron or muon is taken into account with the help of finite basis-set methods. Special attention is given to analyzing the nuclear model dependence, and the uncertainties of the calculations are estimated. In addition, the suppression of nuclear-structure effects in various weighted differences is discussed. Finally, the developed methods and computational codes are applied to the long-standing problem of the fine-structure anomalies in heavy muonic atoms.
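
    For orientation, the leading-order nonrelativistic estimate of the finite-nuclear-size shift of an nS level (a textbook result, far simpler than the thesis's microscopic treatment) shows why the effect grows steeply with Z and with the orbiting lepton's mass:

```latex
\Delta E_{\mathrm{FNS}}
  = \frac{2\pi}{3}\, Z\alpha\, |\psi_{n}(0)|^{2}\, \langle r^{2} \rangle ,
\qquad
|\psi_{n}(0)|^{2} = \frac{(Z\alpha\, m_r)^{3}}{\pi n^{3}} ,
```

    so that ΔE_FNS ∝ (Zα)⁴ m_r³ ⟨r²⟩ / n³. Replacing the electron by a muon (m_μ ≈ 207 m_e) enhances the shift by roughly 207³ ≈ 10⁷, which is why nuclear-structure corrections, and their nuclear-model dependence, dominate the fine-structure anomalies in heavy muonic atoms.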

    Mechanism of Deep-focus Earthquakes Anomalous Statistics

    Analyzing the NEIC data, we show that the spatial distribution of deep-focus earthquakes in the Earth's interior over 1993-2006 is characterized by a clearly defined periodic fine discrete structure with period L = 50 km, generated solely by earthquakes of magnitude M 3.9 to 5.3 and only on convergent plate boundaries. To describe the formation of this structure we use the complex-systems model of A. Volynskii and S. Bazhenov, whose key property is the presence of a rigid coating on a soft substratum. We show that in subduction processes the slab material (lithosphere) plays the role of the rigid coating, while the upper mantle acts as the soft substratum. Within this framework we estimate the average stress in the upper mantle and Young's modulus for the oceanic slab (lithosphere) and the upper mantle. Comment: 9 pages, 7 figures
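
    The standard relation for this class of film-on-substrate wrinkling models links the structure period to the coating thickness and the stiffness contrast (the exact expression used by the authors may differ):

```latex
\lambda = 2\pi h \left( \frac{\bar{E}_f}{3\,\bar{E}_s} \right)^{1/3},
\qquad
\bar{E} = \frac{E}{1-\nu^{2}} ,
```

    where h is the thickness of the rigid coating (here the slab), and Ē_f, Ē_s are the plane-strain moduli of coating and substratum. Fixing λ = 50 km from the observed structure then constrains the modulus ratio, which is presumably how the stress and Young's-modulus estimates are extracted.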

    A Spatio-Temporal Point Process Model for Ambulance Demand

    Ambulance demand estimation at fine temporal and spatial scales is critical for fleet management and dynamic deployment. We are motivated by the problem of estimating the spatial distribution of ambulance demand in Toronto, Canada, as it changes over discrete 2-hour intervals. This large-scale dataset is sparse at the desired temporal resolution and exhibits location-specific serial dependence as well as daily and weekly seasonality. We address these challenges by introducing a novel characterization of time-varying Gaussian mixture models. We fix the mixture component distributions across all time periods to overcome data sparsity and accurately describe Toronto's spatial structure, while representing the complex spatio-temporal dynamics through time-varying mixture weights. We constrain the mixture weights to capture weekly seasonality, and apply a conditionally autoregressive prior on the mixture weights of each component to represent location-specific short-term serial dependence and daily seasonality. While estimation may be performed with a fixed number of mixture components, we also extend the model to estimate the number of components using birth-and-death Markov chain Monte Carlo. The proposed model is shown to give higher statistical predictive accuracy and to reduce the error in predicting EMS operational performance by as much as two-thirds compared with a typical industry practice.
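
    A minimal sketch of the model's core construction: Gaussian components shared across all periods, with only the mixture weights varying in time. The CAR prior on the weights and the birth-and-death MCMC for the number of components are omitted, and all numbers are toy placeholders, not the fitted Toronto model.

```python
# Time-varying Gaussian mixture: fixed components, per-period weights.
import numpy as np
from scipy.stats import multivariate_normal

class TimeVaryingGMM:
    def __init__(self, means, covs, weights_by_period):
        # Components are shared across all periods (overcomes sparsity);
        # only the weights change from one 2-hour period to the next.
        self.comps = [multivariate_normal(m, c) for m, c in zip(means, covs)]
        self.w = np.asarray(weights_by_period)  # (n_periods, n_components)

    def density(self, period, xy):
        # Spatial demand density for one time period.
        return sum(w * c.pdf(xy) for w, c in zip(self.w[period], self.comps))

# Two toy components; the weights shift mass between them over two periods,
# standing in for daily/weekly seasonality in the weight dynamics.
model = TimeVaryingGMM(
    means=[[0, 0], [5, 5]],
    covs=[np.eye(2), np.eye(2)],
    weights_by_period=[[0.8, 0.2], [0.3, 0.7]],
)
print(model.density(0, [0.5, 0.5]), model.density(1, [0.5, 0.5]))
```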

    A Big Data Analyzer for Large Trace Logs

    The current generation of Internet-based services is typically hosted in large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors, including computing hardware, multiple layers of intricate software, networking and storage devices, and electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques that exploit hidden statistical patterns and correlations in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. This paper presents BiDAl, a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched, so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture, so that it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly available traces from Google data clusters, with the goal of building a realistic model of a complex data center. Comment: 26 pages, 10 figures
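
    The abstract does not show BiDAl's actual interface, so the following is only an illustrative stand-in for one of its ingredients: running SQL aggregates over trace events stored in an SQLite backend. The `events` table and its contents are hypothetical.

```python
# Illustrative only: SQL over a toy trace-log table in SQLite,
# the kind of aggregate used when modelling data-center behaviour
# from traces such as Google's cluster logs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, machine TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "m1", "FAIL"), (2, "m1", "OK"), (3, "m2", "FAIL"), (4, "m2", "FAIL")],
)
# Failure counts per machine.
for row in conn.execute(
    "SELECT machine, COUNT(*) FROM events WHERE type = 'FAIL' GROUP BY machine"
):
    print(row)
```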

    Emulsion Chamber with Big Radiation Length for Detecting Neutrino Oscillations

    A conceptual scheme of a hybrid-emulsion spectrometer for investigating various channels of neutrino oscillations is proposed. The design emphasizes detection of τ leptons by detached vertices, reliable identification of electrons, and good spectrometry for all charged particles and photons. A distributed target is formed by layers of low-Z material, emulsion-plastic-emulsion sheets, and air gaps in which τ decays are detected. The tracks of charged secondaries, including electrons, are momentum-analyzed by their curvature in a magnetic field using hits in successive thin layers of emulsion. The τ leptons are efficiently detected in all major decay channels, including the τ → e decay. Performance of a model spectrometer containing 3 tons of nuclear emulsion and 20 tons of passive material is estimated for different experimental environments. When irradiated by the ν_μ beam of a proton accelerator over a medium baseline of ∼1 km/GeV, the spectrometer will efficiently detect the ν_μ → ν_τ and ν_μ → ν_e transitions in the mass-difference region of Δm² ∼ 1 eV², as suggested by the results of LSND. When exposed to the neutrino beam of a muon storage ring over a long baseline of ∼10-20 km/GeV, the model detector will efficiently probe the entire pattern of neutrino oscillations in the region Δm² ∼ 10⁻²-10⁻³ eV², as suggested by the data on atmospheric neutrinos. Comment: 34 pages, 8 figures
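
    The baselines quoted in km/GeV follow from the standard two-flavour oscillation probability, whose phase depends only on Δm² L/E:

```latex
P(\nu_\mu \to \nu_\tau) = \sin^{2}(2\theta)\,
\sin^{2}\!\left( 1.27\, \frac{\Delta m^{2}\,[\mathrm{eV}^{2}]\; L\,[\mathrm{km}]}
                            {E\,[\mathrm{GeV}]} \right),
```

    so L/E ∼ 1 km/GeV yields an order-one phase for Δm² ∼ 1 eV² (the LSND region), while L/E ∼ 10-20 km/GeV shifts the sensitivity down to the Δm² ∼ 10⁻²-10⁻³ eV² range favoured by the atmospheric-neutrino data.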