624 research outputs found

    Data Cube Approximation and Mining using Probabilistic Modeling

    On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes along different analysis axes (dimensions) and at different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
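
As a rough illustration of the log-linear idea in the abstract above, the sketch below fits the simplest log-linear model (independence of two dimensions) to a two-way slice of a cube, uses the fitted values as approximate cell answers, and flags cells with large Pearson residuals as candidate outliers. The table values, the function names and the threshold of 2.0 are all assumptions made for illustration; this is not the paper's own procedure, which considers richer models.

```python
import math

def independence_fit(table):
    """Fit the independence log-linear model to a 2-way table of counts.

    Expected cell (i, j) = row_total[i] * col_total[j] / grand_total.
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def pearson_residuals(table, expected):
    """Standardized gap between observed and fitted cells."""
    return [[(o - e) / math.sqrt(e) for o, e in zip(orow, erow)]
            for orow, erow in zip(table, expected)]

def flag_outliers(table, threshold=2.0):
    """Return (row, col) indices whose |residual| exceeds the threshold."""
    residuals = pearson_residuals(table, independence_fit(table))
    return [(i, j)
            for i, row in enumerate(residuals)
            for j, r in enumerate(row)
            if abs(r) > threshold]

# Hypothetical 3 x 3 slice (for example region x product counts).
cube = [[20, 30, 50],
        [40, 60, 100],
        [10, 15, 95]]

approx = independence_fit(cube)   # approximate answers to cell queries
suspects = flag_outliers(cube)    # cells the simple model explains poorly
```

Only the marginal totals need to be stored to answer cell queries approximately, which is where the compression comes from; cells with large residuals are the ones such a parsimonious model fails to reproduce.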

    Generalized multi-stream hidden Markov models.

    For complex classification systems, data is usually gathered from multiple sources of information that have varying degrees of reliability. In fact, assuming that the different sources have the same relevance in describing all the data might lead to erroneous behavior. The classification error accumulates and can be more severe for temporal data, where each sample is represented by a sequence of observations. Thus, there is compelling evidence that learning algorithms should include a relevance weight for each source of information (stream) as a parameter that needs to be learned. In this dissertation, we assume that the multi-stream temporal data is generated by independent and synchronous streams. Using this assumption, we develop, implement, and test multi-stream continuous and discrete hidden Markov model (HMM) algorithms. For the discrete case, we propose two new approaches to generalize the baseline discrete HMM. The first one combines unsupervised learning, feature discrimination, standard discrete HMMs and weighted distances to learn the codebook with feature-dependent weights for each symbol. The second approach consists of modifying the HMM structure to include stream relevance weights, generalizing the standard discrete Baum-Welch learning algorithm, and deriving the necessary conditions to optimize all model parameters simultaneously. We also generalize the minimum classification error (MCE) discriminative training algorithm to include stream relevance weights. For the continuous HMM, we introduce a new approach that integrates the stream relevance weights in the objective function. Our approach is based on the linearization of the probability density function. Two variations are proposed: the mixture and state level variations. As in the discrete case, we generalize the continuous Baum-Welch learning algorithm to accommodate these changes, and we derive the necessary conditions for updating the model parameters. We also generalize the MCE learning algorithm to derive the necessary conditions for the model parameters' update. The proposed discrete and continuous HMMs are tested on synthetic data sets. They are also validated on various applications including Australian Sign Language, audio classification, face classification, and more extensively on the problem of landmine detection using ground penetrating radar data. For all applications, we show that considerable improvement can be achieved compared to the baseline HMM and the existing multi-stream HMM algorithms.
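
To make the notion of stream relevance weights concrete, here is a minimal sketch of the forward algorithm for a discrete HMM whose emission score is a linear, weighted combination of per-stream symbol probabilities. The linear form, the function name `forward_likelihood` and the argument layout are assumptions chosen for illustration; the dissertation's exact formulations and their learning rules are not reproduced here.

```python
def forward_likelihood(pi, A, B_streams, weights, obs):
    """Forward pass with stream-weighted emissions.

    pi         : initial state probabilities, length n
    A          : n x n transition matrix, A[i][j] = P(state j | state i)
    B_streams  : B_streams[k][j][s] = P(symbol s | state j) in stream k
    weights    : one relevance weight per stream (assumed to sum to 1)
    obs        : sequence of tuples, one symbol per stream at each time step
    """
    n = len(pi)

    def emit(j, o):
        # Assumed linear combination of the per-stream emission probabilities.
        return sum(w * B[j][s] for w, B, s in zip(weights, B_streams, o))

    alpha = [pi[j] * emit(j, obs[0]) for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * emit(j, o)
                 for j in range(n)]
    return sum(alpha)
```

With a single stream and weight 1.0 this reduces to the standard forward algorithm, which gives a simple sanity check. Because the weighted score is not a normalized joint distribution over symbol tuples, the returned value is best read as a score for comparing models rather than as a probability.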

    SAMCLR: Contrastive pre-training on complex scenes using SAM for view sampling

    In computer vision, self-supervised contrastive learning enforces similar representations between different views of the same image. The pre-training is most often performed on image classification datasets, like ImageNet, where images mainly contain a single class of objects. However, when dealing with complex scenes with multiple items, it becomes very unlikely for several views of the same image to represent the same object category. In this setting, we propose SAMCLR, an add-on to SimCLR which uses SAM to segment the image into semantic regions, then samples the two views from the same region. Preliminary results show empirically that, when pre-trained on Cityscapes and ADE20K and then evaluated on classification with CIFAR-10, STL10 and ImageNette, SAMCLR performs at least on par with, and most often significantly outperforms, not only SimCLR but also DINO and MoCo. Comment: Accepted at NeurIPS 2023 Workshop on SS
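
The view-sampling step described above can be sketched as follows. A precomputed integer mask stands in for the SAM output, and both crops are drawn inside the bounding box of one randomly chosen region. The mask format, the function names and the `min_frac` parameter are hypothetical, and cropping within a bounding box is a simplification: a box may still contain pixels from neighbouring regions, and the paper's actual sampling rule may differ.

```python
import random

def region_bbox(mask, region_id):
    """Bounding box (top, left, bottom, right) of a region, end-exclusive."""
    rows = [r for r, row in enumerate(mask) if region_id in row]
    cols = [c for row in mask for c, v in enumerate(row) if v == region_id]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1

def sample_crop(bbox, rng, min_frac=0.5):
    """Random crop box lying fully inside bbox."""
    top, left, bottom, right = bbox
    h, w = bottom - top, right - left
    ch = rng.randint(max(1, int(h * min_frac)), h)   # crop height
    cw = rng.randint(max(1, int(w * min_frac)), w)   # crop width
    y = rng.randint(top, bottom - ch)
    x = rng.randint(left, right - cw)
    return y, x, y + ch, x + cw

def sample_view_pair(mask, rng):
    """Pick one region, then draw two crops from that same region's box."""
    region_ids = sorted({v for row in mask for v in row})
    region_id = rng.choice(region_ids)
    bbox = region_bbox(mask, region_id)
    return region_id, sample_crop(bbox, rng), sample_crop(bbox, rng)

# Hypothetical 4 x 4 segmentation mask with three regions.
mask = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 1, 1],
        [2, 2, 1, 1]]
region, view_a, view_b = sample_view_pair(mask, random.Random(0))
```

The two boxes would then be cropped from the image, augmented, and fed to the contrastive loss as a positive pair, in place of two crops drawn from anywhere in the image.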

    Two-dimensional electronic transport in rubrene: the impact of inter-chain coupling

    Organic semi-conductors have unique electronic properties and are important systems both at the fundamental level and for their applications in electronic devices. In this article we focus on the particular case of rubrene, which has some of the best electronic transport properties for application purposes. We show that this system can be well simulated by simple tight-binding systems representing one-dimensional (1D) chains that are weakly coupled to their neighboring chains in the same plane. In principle, this makes rubrene intermediate between 1D and isotropic 2D models. We analyse in detail the dc-transport and terahertz conductivity in the 1D and in the anisotropic 2D models. The transient localisation scenario allows us to reproduce satisfactorily some basic results such as mobility anisotropy and orders of magnitude, as well as ac-conductivity in the terahertz range. This model shows in particular that even a weak inter-chain coupling is able to notably improve the propagation along the chains. This also suggests that a strong inter-chain coupling is important to get organic semi-conductors with the best possible transport properties for applications. Comment: 21 pages, 17 figure
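
As a small illustration of weakly coupled chains, the sketch below evaluates a clean tight-binding band on a rectangular lattice with a strong hopping along the chains and a weak hopping between them. The rectangular geometry, the parameter names `t_chain` and `t_inter` and the numerical values are assumptions for illustration; the sketch covers band-structure anisotropy only and leaves out the dynamic disorder that is central to the transient localisation scenario.

```python
import math

def band_energy(kx, ky, t_chain, t_inter):
    """E(k) = -2 t_chain cos(kx) - 2 t_inter cos(ky), lattice constants = 1."""
    return -2.0 * t_chain * math.cos(kx) - 2.0 * t_inter * math.cos(ky)

def bandwidth(t_chain, t_inter, n=64):
    """Band width estimated on a uniform grid that includes k = 0 and k = pi."""
    ks = [-math.pi + 2.0 * math.pi * i / n for i in range(n + 1)]
    energies = [band_energy(kx, ky, t_chain, t_inter) for kx in ks for ky in ks]
    return max(energies) - min(energies)

def mean_square_velocities(t_chain, t_inter, n=64):
    """Brillouin-zone averages of v_x^2 and v_y^2, with v = dE/dk and hbar = 1."""
    ks = [-math.pi + 2.0 * math.pi * i / n for i in range(n)]
    vx2 = sum((2.0 * t_chain * math.sin(k)) ** 2 for k in ks) / n
    vy2 = sum((2.0 * t_inter * math.sin(k)) ** 2 for k in ks) / n
    return vx2, vy2

# Hypothetical hopping amplitudes in arbitrary energy units.
t_chain, t_inter = 1.0, 0.2
width = bandwidth(t_chain, t_inter)
vx2, vy2 = mean_square_velocities(t_chain, t_inter)
anisotropy = vx2 / vy2   # scales as (t_chain / t_inter) ** 2 in this model
```

Setting `t_inter` to zero recovers the purely 1D chain, and setting it equal to `t_chain` gives the isotropic 2D case, so the ratio of the two hoppings interpolates between the two limits mentioned in the abstract.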

    Alianzas y redes de parentesco de gitanos en Cataluña

    Drawing on thesis work centered on the "lived experience of social diversity" of settled or migrating Gypsy children and of Moroccan children from families that have recently immigrated to Toulouse, Perpignan and Barcelona, we sought to identify the processes by which these children drop out of school in community contexts where the school cannot, on its own, ensure the transmission of the cultural and social competences needed for their autonomy as adults and citizens. The analysis focuses on the interactions that link these two spaces of socialization, the family and the school, and on the ability of young children from disadvantaged socio-economic backgrounds to move across these different worlds of socialization. In this article, we present in particular two intergenerational trajectories of Catalan Gypsy families, which allowed us to understand how certain families were able to preserve the transmission of practical skills (savoir faire) and of school learning. From these genealogical tracings we see how the Gypsy family of Perpignan and the Gypsy family of Barcelona reinforce each other by clustering into a clan with new, transnational contours, whereas a reading limited by the political border between France and Spain would suggest the disintegration, sometimes even simultaneous, of one and the other. Some characteristics of new migrants from Moroccan and settled Gypsy communities are studied. It is demonstrated how they develop skills to be "here and there" based on a know-how for international travel, hence creating new models of identification relying on experiences of multiple interaction. These new types of migrants are highly mobile and produce micro-societies with singular norms and newly adapted social interactions that transform the institutions concerned: school, family and economic processes. The study of genealogical lines shows how Barcelona and Perpignan Gypsy families reinforce each other by aggregating themselves into clans with new transnational outlines, whereas a reading limited by Franco-Spanish political borders suggests the disintegration of one and sometimes of both simultaneously. Those genealogical lines help bring to light material and symbolic spaces that have led me to locate and analyse a new form of social autonomy.

    Deep Learning for Mean Field Games with non-separable Hamiltonians

    This paper introduces a new method based on Deep Galerkin Methods (DGMs) for solving high-dimensional stochastic Mean Field Games (MFGs). We achieve this by using two neural networks to approximate the unknown solutions of the MFG system and forward-backward conditions. Our method is efficient, even with a small number of iterations, and is capable of handling up to 300 dimensions with a single layer, which makes it faster than other approaches. In contrast, methods based on Generative Adversarial Networks (GANs) cannot solve MFGs with non-separable Hamiltonians. We demonstrate the effectiveness of our approach by applying it to a traffic flow problem, which was previously solved using the Newton iteration method only in the deterministic case. We compare the results of our method to analytical solutions and previous approaches, showing its efficiency. We also prove the convergence of our neural network approximation with a single hidden layer using the universal approximation theorem.