    Statistical Analysis of Spherical Data: Clustering, Feature Selection and Applications

    In light of interdisciplinary applications, the data to be studied and analyzed have grown in volume and changed in their intrinsic structure and type. In practice, the diversity of sources generating these objects has posed several challenges for decision makers who must identify informative data, in terms of time, model capability, scalability, and knowledge discovery. It is therefore highly desirable to be able to extract patterns of interest that support data-management decisions. Clustering, among other machine learning approaches, is an important data engineering technique that enables the automatic discovery of clusters of similar objects and the subsequent assignment of new, unseen objects to appropriate clusters. In this context, most current research does not fully address the true structure and nature of the data for the particular application at hand. In contrast, our work focuses on the modeling and classification of spherical data, which arise naturally in many data mining and knowledge discovery applications. To this end, this thesis proposes several estimation and feature selection frameworks based on the Langevin distribution, dedicated to spherical patterns in offline and online settings. We first formulate a unified probabilistic framework in which we build probabilistic kernels for the Support Vector Machine from finite Langevin mixtures, based on the Fisher score and on information divergences. Our motivation is that hybrid generative-discriminative approaches have proven successful by exploiting the distinct characteristics of each approach to build a complementary system that combines the best of both.
Because of the high demand for compact and accurate statistical models that adjust automatically to dynamic changes, we next propose probabilistic frameworks for modeling high-dimensional spherical data based on finite Langevin mixtures, which allow simultaneous clustering and feature selection in offline and online settings. To this end, we adopt finite mixture models, which have long relied heavily on deterministic learning approaches such as maximum likelihood estimation. Despite their successful use in a wide spectrum of areas, these approaches have several drawbacks, as discussed in this thesis. An alternative is Bayesian inference, which naturally accounts for data uncertainty while ensuring good generalization; we therefore also propose a Bayesian approach for finite Langevin mixture model estimation and selection. When data change dynamically and grow drastically, a finite mixture is not always a feasible solution. Unlike the previous approaches, which assume an unknown but finite number of mixture components, we finally propose a nonparametric Bayesian approach that assumes an infinite number of components. We further enhance this model by detecting informative features simultaneously during clustering. Through extensive experiments, we demonstrate the merits of the proposed learning frameworks on diverse high-dimensional datasets and challenging real-world applications.
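
The Langevin distribution on the sphere is also known as the von Mises-Fisher (vMF) distribution. As a minimal sketch of the finite-mixture fitting described above, the Python code below runs plain maximum-likelihood EM for a vMF mixture. It assumes 3-dimensional unit vectors, so the normalizing constant has the closed form κ/(4π sinh κ). It is not the thesis's Bayesian, nonparametric, or feature-selection method. The concentration update uses the approximation of Banerjee et al. (2005). The function names and the farthest-point initialization are illustrative choices.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)


def vmf_logpdf(x, mu, kappa):
    """Log density of a von Mises-Fisher (Langevin) distribution on the sphere S^2.

    Uses the closed-form 3-D normalizer C_3(kappa) = kappa / (4*pi*sinh(kappa)),
    with log sinh(k) written as k + log(1 - exp(-2k)) - log 2 to avoid overflow.
    """
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh + kappa * dot(mu, x)


def fit_vmf_mixture(data, k, iters=50, seed=0):
    """Maximum-likelihood EM for a k-component vMF mixture on unit 3-vectors."""
    rng = random.Random(seed)
    # Farthest-point initialization: each new mean is the point least similar to those chosen.
    mus = [data[rng.randrange(len(data))]]
    while len(mus) < k:
        mus.append(min(data, key=lambda x: max(dot(x, m) for m in mus)))
    kappas = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibilities, computed stably in log space.
        resp = []
        for x in data:
            logs = [math.log(w) + vmf_logpdf(x, m, kap)
                    for w, m, kap in zip(weights, mus, kappas)]
            top = max(logs)
            ps = [math.exp(l - top) for l in logs]
            total = sum(ps)
            resp.append([p / total for p in ps])
        # M-step: mixing weights, mean directions and concentrations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            s = [sum(r[j] * x[d] for r, x in zip(resp, data)) for d in range(3)]
            norm = math.sqrt(dot(s, s))
            if nj <= 1e-12 or norm <= 1e-12:
                continue  # empty or degenerate component: keep previous parameters
            weights[j] = nj / len(data)
            mus[j] = tuple(c / norm for c in s)
            rbar = min(norm / nj, 1.0 - 1e-9)  # mean resultant length, kept below 1
            # Banerjee et al. (2005) approximation to the ML concentration, with p = 3.
            kappas[j] = max(rbar * (3.0 - rbar ** 2) / (1.0 - rbar ** 2), 1e-6)
    return weights, mus, kappas
```

For two tight clusters of points around the north and south poles, the fit should return mean directions close to (0, 0, ±1), weights near 0.5, and large concentrations.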

    Periodische Hartree-Fock Theorie (Periodic Hartree-Fock Theory)

    The Hartree-Fock (HF) approximation is the most commonly used approximation method for many-fermion systems. We study the periodic HF model, which describes electrons in a crystal, in the space of square-integrable functions on a given torus. In particular, the following issues are addressed: the conditions under which the periodic HF minimizer is unique and the periodic HF energy equals the unrestricted HF energy; a generalization of Lieb's variational principle; an improved estimate of the gap between the (N+1)-st and N-th eigenvalues of the effective HF Hamiltonian, using a fiber-dependent lower bound on their difference; and, in the discrete one-dimensional case, a proof that a gap opens in the spectrum of the fibered HF effective Hamiltonian, i.e., the distance between its consecutive eigenvalues increases in the presence of a weak positive one-dimensional periodic potential.

    On email spam filtering using support vector machine

    Electronic mail has revolutionized traditional communication because it is convenient, economical, fast, and easy to use. A major bottleneck in electronic communication is the enormous dissemination of unwanted, harmful emails known as "spam". A major concern is therefore the development of suitable filters that can adequately capture these emails and achieve high performance. Machine learning (ML) researchers have developed many approaches to tackle this problem. Within ML, support vector machines (SVMs) have contributed substantially to the development of spam filtering, and various SVM-based schemes have been proposed through text classification (TC) approaches. A crucial problem when using SVMs is the choice of kernel, since it directly affects how emails are separated in the feature space. We investigate the use of several distance-based kernels to specify spam filtering behaviors with SVMs. However, most commonly used kernels are designed for continuous data and neglect the structure of the text. In contrast to these classical "blind" kernels, we propose the use of various string kernels for spam filtering and show how well they suit the spam filtering problem. Data preprocessing is also a vital part of TC, where the objective is to generate feature vectors usable by SVM kernels; we detail a feature mapping variant in TC that improves the performance of the standard SVM in the filtering task. Furthermore, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time, and show that the active online method using string kernels achieves higher precision and recall rates.
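
The abstract does not specify which string kernels were used. One widely known example is the p-spectrum kernel. It compares two strings by counting the contiguous substrings of length p that they share. The sketch below illustrates the idea under that assumption. The function names and the default p = 3 are hypothetical choices, not the authors' settings.

```python
import math
from collections import Counter


def spectrum(text, p):
    """Counts of every contiguous substring of length p (the 'p-spectrum' of text)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p=3):
    """Inner product of the p-spectra of s and t: shared length-p substrings, with multiplicity."""
    a, b = spectrum(s, p), spectrum(t, p)
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller spectrum
    # Counter returns 0 for substrings it has not seen.
    return sum(count * b[sub] for sub, count in a.items())


def normalized_kernel(s, t, p=3):
    """Cosine-normalized spectrum kernel, so k(s, s) == 1 for any non-empty spectrum."""
    kss, ktt = spectrum_kernel(s, s, p), spectrum_kernel(t, t, p)
    if kss == 0 or ktt == 0:
        return 0.0
    return spectrum_kernel(s, t, p) / math.sqrt(kss * ktt)


def gram_matrix(docs, p=3):
    """Symmetric matrix of pairwise normalized kernel values for a list of documents."""
    return [[normalized_kernel(x, y, p) for y in docs] for x in docs]
```

A Gram matrix built this way can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Cosine normalization prevents long emails from dominating the kernel values merely because they contain more substrings.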

    Estimating Occupancy in Residential Context Using Bayesian Networks for Energy Management

    A general approach is proposed to determine occupant behaviour (occupancy and activity) in residential buildings and to use these estimates for improved energy management. Occupant behaviour is modelled with a Bayesian network in an unsupervised manner. The algorithm uses domain knowledge gathered via questionnaires together with recorded sensor data for motion detection, power and hot water consumption, and indoor CO₂ concentration. Two case studies demonstrate the real-world applicability of estimating occupant behaviour in this way. Furthermore, experiments integrating occupancy estimation with hot water production control show that energy efficiency can be increased by roughly 5% over known optimal control techniques and by more than 25% over rule-based control, while maintaining the same occupant comfort standards. The efficiency gains are strongly correlated with occupant behaviour and with the accuracy of the occupancy estimates.
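
The abstract does not describe the network's structure or parameters, and the authors learn their model without supervision. Purely to illustrate Bayesian-network inference for occupancy, the sketch below assumes a hypothetical naive-Bayes structure in which a single binary "occupied" node is the only parent of three binary sensor readings. All probabilities are made-up values, not learned ones. The posterior is computed by exact enumeration.

```python
# Hypothetical prior probability that the home is occupied (illustrative value only).
PRIOR_OCCUPIED = 0.6

# Hypothetical conditional probability tables:
# sensor -> (P(reading on | occupied), P(reading on | unoccupied)).
CPT = {
    "motion": (0.7, 0.05),
    "co2_high": (0.8, 0.2),
    "power_high": (0.6, 0.1),
}


def posterior_occupied(evidence):
    """Return P(occupied | evidence) by exact enumeration over the occupancy node.

    evidence maps sensor names to booleans. Sensors absent from evidence are
    unobserved and sum out to 1, because each sensor depends only on occupancy.
    """
    joint_occupied = PRIOR_OCCUPIED
    joint_empty = 1.0 - PRIOR_OCCUPIED
    for sensor, is_on in evidence.items():
        p_on_occupied, p_on_empty = CPT[sensor]
        joint_occupied *= p_on_occupied if is_on else 1.0 - p_on_occupied
        joint_empty *= p_on_empty if is_on else 1.0 - p_on_empty
    return joint_occupied / (joint_occupied + joint_empty)
```

Under these made-up numbers, `posterior_occupied({"motion": True})` evaluates to 0.42 / 0.44 ≈ 0.95. With all three sensors off, the posterior drops to 0.05.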

    Uptake of actinides by calcium silicate hydrate (C-S-H) phases

    The sorption of actinides (Th and U–Am) was studied as a function of the solid-to-liquid (S/L) ratio (0.5–20.0 g/L) and the calcium-to-silicon (C:S) ratio. The C:S ratio was varied between 1.80 and 0.70 to simulate the changing composition of the C-S-H phases during cement degradation, from high to low C:S ratios. The decrease in the calcium content of the C-S-H phases over time is accompanied by a decrease in the pH of the corresponding suspensions from 12.6 to 10.2. X-ray photoelectron spectroscopy (XPS) of the C-S-H phases showed an increasing depletion of Ca at the surface with increasing C:S ratio, relative to the composition of the solid phase as a whole. The sorption experiments were performed with the redox-stable species Am(III), Th(IV) and U(VI), as well as the redox-sensitive Np(V) and Pu(III). The average distribution coefficients Rd for all investigated actinides are around 10⁵ L/kg. The oxidation state of Pu retained by the C-S-H phases was investigated with high-energy-resolution X-ray absorption near-edge structure (HR-XANES) spectroscopy. Samples with C:S ratios of 0.75 and 1.65 showed that the initially added Pu(III) was oxidized to Pu(IV) in the course of the experiment.
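
The batch-sorption distribution coefficient is conventionally defined as Rd = (C₀ − C_eq)/C_eq · V/m. Here C₀ and C_eq are the initial and equilibrium solution concentrations, V is the solution volume and m is the solid mass. The sketch below assumes this standard definition, which the abstract does not spell out. The function names are hypothetical.

```python
def distribution_coefficient(c0, c_eq, s_to_l_g_per_l):
    """Batch-sorption distribution coefficient Rd in L/kg.

    c0:    initial actinide concentration in solution
    c_eq:  concentration remaining in solution at equilibrium (same unit as c0)
    s_to_l_g_per_l: solid-to-liquid ratio m/V in g/L
    """
    if c_eq <= 0:
        raise ValueError("c_eq must be positive; Rd is undefined for complete uptake")
    sorbed_over_dissolved = (c0 - c_eq) / c_eq      # dimensionless
    s_to_l_kg_per_l = s_to_l_g_per_l / 1000.0       # g/L -> kg/L
    return sorbed_over_dissolved / s_to_l_kg_per_l  # 1 / (kg/L) = L/kg


def sorption_percent(c0, c_eq):
    """Percentage of the initial amount removed from solution."""
    return 100.0 * (c0 - c_eq) / c0
```

Once the S/L ratio is converted from g/L to kg/L, an Rd of 10⁵ L/kg at S/L = 0.5 g/L corresponds to (C₀ − C_eq)/C_eq = 50. In other words, about 98% of the actinide is sorbed.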

    Speciation of neptunium during sorption and diffusion in natural clay

    In argillaceous rocks, which are considered potential host rocks for nuclear waste repositories, sorption and diffusion processes govern the migration behaviour of actinides such as neptunium. For the safety analysis of such a repository, a molecular-level understanding of the transport and retardation phenomena of radioactive contaminants in the host rock is mandatory. The speciation of Np during sorption and diffusion in Opalinus Clay was studied at near-neutral pH using a combination of spatially resolved synchrotron radiation techniques. During the sorption and diffusion experiments, the interaction of 8 μM Np(V) solutions with the clay led to the formation of spots of increased Np concentration at the clay-water interface, as determined by μ-XRF. Several of these spots are correlated with areas of increased Fe concentration. Np L3-edge μ-XANES spectra revealed that up to 85% of the initial Np(V) was reduced to Np(IV). Pyrite was identified by μ-XRD as a redox-active mineral phase responsible for the formation of Np(IV). Analysis of the diffusion profile within the clay matrix after a two-month in-diffusion experiment showed that Np(V) is progressively reduced with diffusion distance: Np(IV) amounted to ≈12% and ≈26% at 30 μm and 525 μm, respectively.
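
The abstract does not say how the Np(IV) fractions were derived from the μ-XANES spectra. A common approach is linear combination fitting against reference spectra of the pure oxidation states. Assume two references, A and B (for example Np(IV) and Np(V)), sampled on the same energy grid as the sample spectrum s. The least-squares fraction f in s ≈ f·A + (1 − f)·B then has the closed form f = ⟨s − B, A − B⟩ / ‖A − B‖². The minimal sketch below implements only that step. Practical XANES fitting also handles normalization, energy shifts and uncertainties, and the function name is hypothetical.

```python
def two_component_fraction(sample, ref_a, ref_b):
    """Least-squares fraction f of reference A in sample ~ f*A + (1 - f)*B.

    All three spectra must be normalized and sampled on the same energy grid.
    The unconstrained optimum <s - B, A - B> / ||A - B||^2 is clipped to [0, 1];
    because the objective is a convex quadratic in f, this clipping gives the
    constrained least-squares solution.
    """
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    resid = [s - b for s, b in zip(sample, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference spectra are identical")
    f = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, f))
```

If `ref_a` is the Np(IV) reference and `ref_b` the Np(V) reference, the returned value is the estimated Np(IV) fraction of the sample.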

    Uptake of Pu(IV) by hardened cement paste in the presence of gluconate at high and low ionic strengths

    The uptake of Pu(IV) by hardened cement paste (HCP) at degradation state I was investigated in the absence and presence of gluconate (GLU). The influence of ionic strength was also examined in different background electrolytes: artificial cement pore water (ACW, pH = 13) was used for low ionic strength (I = 0.3 M), and cement pore water based on the diluted caprock solution (ACW-VGL, pH = 12.5) was used for high ionic strength (I = 2.5 M). Sorption experiments were performed under an Ar atmosphere in the HCP/GLU binary system ([GLU]₀ = 1 × 10⁻¹ to 1 × 10⁻⁸ M) and the HCP/Pu(IV)/GLU ternary system ([²³⁹Pu(IV)]₀ = 1 × 10⁻⁸ M, [GLU]₀ = 1 × 10⁻² M) at solid-to-liquid (S/L) ratios of 0.5–50 g L⁻¹ with a contact time of 72 h. GLU sorbs strongly on HCP; saturation of the HCP sorption sites with GLU was observed at [GLU] ≥ 1 × 10⁻⁴ M at S/L = 5 g L⁻¹. The effect of the order in which Pu(IV) and GLU were added on the sorption of Pu(IV) on HCP was also investigated. In the absence of GLU, quantitative uptake (S% ≥ 99%) of Pu(IV) by HCP was observed, independent of the ionic strength of the background electrolyte. In the presence of 1 × 10⁻² M GLU, the sorption of Pu(IV) on HCP was significantly lower. For X-ray absorption fine structure (XAFS) measurements, powder samples with Pu ([²³⁹Pu(III)]₀ = 5 × 10⁻⁶ M) sorbed on HCP (S/L = 2.5 g L⁻¹) were prepared at pH ≈ 13 in ACW and ACW-VGL, respectively. One additional sample was prepared in the presence of GLU ([GLU]₀ = 1 × 10⁻² M) with ACW-VGL as the electrolyte for comparison. Pu LIII-edge X-ray absorption near-edge structure (XANES) spectra show that Pu is in the tetravalent oxidation state after uptake by HCP. The structural parameters obtained from extended X-ray absorption fine structure (EXAFS) analysis, together with comparison with the literature, indicate incorporation of Pu(IV) into the calcium-silicate-hydrate (C-S-H) phases of HCP.
The different ionic strengths and the presence of GLU had no influence on the near-neighbor environment of Pu in HCP.
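
The sorption percentage S% and the distribution coefficient Rd are linked through the S/L ratio m/V. Assuming the standard batch-sorption definitions, which the abstract does not state, a short derivation shows what quantitative uptake implies:

$$
S\% = 100\,\frac{C_0 - C_{\mathrm{eq}}}{C_0}, \qquad
R_d = \frac{C_0 - C_{\mathrm{eq}}}{C_{\mathrm{eq}}}\,\frac{V}{m}
$$

$$
x \equiv R_d\,\frac{m}{V} = \frac{C_0}{C_{\mathrm{eq}}} - 1
\;\Rightarrow\;
S\% = 100\,\frac{x}{1 + x},
\qquad
S\% \ge 99 \iff R_d \ge 99\,\frac{V}{m}
$$

For example, if quantitative uptake held at S/L = 5 g L⁻¹ (m/V = 5 × 10⁻³ kg L⁻¹), it would imply Rd ≥ 99 / (5 × 10⁻³ kg L⁻¹) ≈ 2 × 10⁴ L kg⁻¹. This figure follows from the definitions alone and is not a value reported by the study.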