Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
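As a rough illustration of the log-linear idea, the sketch below fits the simplest such model (mutual independence of all dimensions) to a small cube, uses the fitted cells to answer an aggregate query approximately, and flags cells with large Pearson residuals as candidate outliers. The function names, the toy cube, and the choice of the independence model are assumptions made for this example; the abstract does not say which log-linear models the authors select.

```python
import numpy as np

def independence_fit(cube):
    """Fitted cell values under the mutual-independence log-linear model.

    Each fitted cell is N * prod_d p_d(index_d), where p_d is the observed
    marginal distribution of dimension d. Assumes all marginals are positive.
    """
    cube = np.asarray(cube, dtype=float)
    total = cube.sum()
    fitted = np.ones_like(cube)
    for axis in range(cube.ndim):
        other_axes = tuple(a for a in range(cube.ndim) if a != axis)
        marginal = cube.sum(axis=other_axes)
        shape = [1] * cube.ndim
        shape[axis] = -1  # broadcast this marginal along its own axis
        fitted = fitted * (marginal.reshape(shape) / total)
    return fitted * total

def pearson_residuals(cube, fitted):
    """Standardized gap between observed and fitted cells."""
    return (np.asarray(cube, dtype=float) - fitted) / np.sqrt(fitted)

# Hypothetical 2 x 2 x 2 cube (e.g. region x product x year).
cube = np.array([[[10, 20], [30, 40]],
                 [[20, 40], [60, 200]]])
fitted = independence_fit(cube)
residuals = pearson_residuals(cube, fitted)

# Approximate answer to "total for region 0, product 1" from the model,
# next to the exact answer from the stored cube.
approx_answer = fitted[0, 1, :].sum()
exact_answer = cube[0, 1, :].sum()

# Cell with the largest absolute residual: a candidate outlier.
outlier_cell = np.unravel_index(np.abs(residuals).argmax(), cube.shape)
```

The independence model stores only the one-dimensional marginals, which is where the compression comes from; richer models add interaction terms at the cost of more parameters.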
Generalized multi-stream hidden Markov models.
For complex classification systems, data is usually gathered from multiple sources of information that have varying degrees of reliability. In fact, assuming that the different sources have the same relevance in describing all the data might lead to erroneous behavior. The classification error accumulates and can be more severe for temporal data where each sample is represented by a sequence of observations. Thus, there is compelling evidence that learning algorithms should include a relevance weight for each source of information (stream) as a parameter that needs to be learned. In this dissertation, we assume that the multi-stream temporal data is generated by independent and synchronous streams. Using this assumption, we develop, implement, and test multi-stream continuous and discrete hidden Markov model (HMM) algorithms. For the discrete case, we propose two new approaches to generalize the baseline discrete HMM. The first one combines unsupervised learning, feature discrimination, standard discrete HMMs and weighted distances to learn the codebook with feature-dependent weights for each symbol. The second approach consists of modifying the HMM structure to include stream relevance weights, generalizing the standard discrete Baum-Welch learning algorithm, and deriving the necessary conditions to optimize all model parameters simultaneously. We also generalize the minimum classification error (MCE) discriminative training algorithm to include stream relevance weights. For the continuous HMM, we introduce a new approach that integrates the stream relevance weights in the objective function. Our approach is based on the linearization of the probability density function. Two variations are proposed: the mixture and state level variations. As in the discrete case, we generalize the continuous Baum-Welch learning algorithm to accommodate these changes, and we derive the necessary conditions for updating the model parameters.
We also generalize the MCE learning algorithm to derive the necessary conditions for the model parameters' update. The proposed discrete and continuous HMMs are tested on synthetic data sets. They are also validated on various applications including Australian Sign Language, audio classification, face classification, and more extensively on the problem of landmine detection using ground penetrating radar data. For all applications, we show that considerable improvement can be achieved compared to the baseline HMM and the existing multi-stream HMM algorithms.
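To make the notion of stream relevance weights concrete, the sketch below scores a synchronous multi-stream discrete observation sequence with a forward pass in which each stream's emission probability is raised to its weight (a geometric combination). This form is a common multi-stream HMM formulation and is an assumption here; the abstract does not give the dissertation's exact weighting, and the weights are passed in rather than learned. All names are illustrative.

```python
import numpy as np

def weighted_log_likelihood(pi, A, B_streams, weights, obs):
    """Scaled forward pass for a multi-stream discrete HMM.

    pi        : (N,) initial state probabilities
    A         : (N, N) state transition matrix
    B_streams : list of K arrays, each (N, M_k), strictly positive emissions
    weights   : (K,) stream relevance weights (assumed non-negative)
    obs       : (T, K) integer symbols, one column per stream

    Assumed combined emission: b_j(o_t) = prod_k b_jk(o_t^k) ** w_k.
    With weights other than a single 1.0 this is a score, not a
    normalized probability.
    """
    pi = np.asarray(pi, dtype=float)
    A = np.asarray(A, dtype=float)
    obs = np.asarray(obs)
    T = obs.shape[0]

    # log b_j(o_t) = sum_k w_k * log b_jk(o_t^k), shape (T, N)
    log_b = np.zeros((T, pi.shape[0]))
    for k, (B, w) in enumerate(zip(B_streams, weights)):
        log_b += w * np.log(np.asarray(B, dtype=float)[:, obs[:, k]]).T
    b = np.exp(log_b)

    alpha = pi * b[0]
    log_lik = 0.0
    for t in range(T):
        if t > 0:
            alpha = (alpha @ A) * b[t]
        scale = alpha.sum()          # rescale to avoid underflow
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik
```

Setting a stream's weight to zero removes its influence on the score entirely, which is the behavior one would want for an unreliable source.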
SAMCLR: Contrastive pre-training on complex scenes using SAM for view sampling
In Computer Vision, self-supervised contrastive learning enforces similar
representations between different views of the same image. The pre-training is
most often performed on image classification datasets, like ImageNet, where
images mainly contain a single class of objects. However, when dealing with
complex scenes with multiple items, it becomes very unlikely for several views
of the same image to represent the same object category. In this setting, we
propose SAMCLR, an add-on to SimCLR which uses SAM to segment the image into
semantic regions, then sample the two views from the same region. Preliminary
results show empirically that when pre-training on Cityscapes and ADE20K, then
evaluating on classification on CIFAR-10, STL10 and ImageNette, SAMCLR performs
at least on par with, and most often significantly outperforms not only SimCLR,
but also DINO and MoCo.
Comment: Accepted at NeurIPS 2023 Workshop on SS
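The region-constrained view sampling can be sketched as follows. The code takes an integer segment map of the kind a segmentation model such as SAM could provide (SAM itself is not called here), picks one region, and returns two square crop boxes that each cover at least one pixel of that region. The sampling scheme, function name, and box format are assumptions for illustration; the abstract does not describe how views are drawn within a region.

```python
import numpy as np

def sample_two_views(mask, crop, rng):
    """Return (region_id, [box1, box2]) with boxes as (top, left, height, width).

    mask : (H, W) integer array of segment ids (hypothetical input format)
    crop : side length of each square view, assumed <= min(H, W)
    rng  : numpy.random.Generator
    """
    mask = np.asarray(mask)
    h, w = mask.shape
    region = rng.choice(np.unique(mask))      # uniform over regions
    ys, xs = np.nonzero(mask == region)       # pixels belonging to the region
    boxes = []
    for idx in rng.integers(0, len(ys), size=2):
        # Center a crop on a region pixel, then clip it inside the image.
        top = int(np.clip(ys[idx] - crop // 2, 0, h - crop))
        left = int(np.clip(xs[idx] - crop // 2, 0, w - crop))
        boxes.append((top, left, crop, crop))
    return int(region), boxes

# Toy segment map: left half is region 0, right half is region 1.
rng = np.random.default_rng(0)
mask = np.zeros((8, 8), dtype=int)
mask[:, 4:] = 1
region, boxes = sample_two_views(mask, crop=4, rng=rng)
```

In a SimCLR-style pipeline the two boxes would be cropped from the image and passed through the usual augmentations before the contrastive loss.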
Two-dimensional electronic transport in rubrene: the impact of inter-chain coupling
Organic semi-conductors have unique electronic properties and are important
systems both at the fundamental level and also for their applications in
electronic devices. In this article we focus on the particular case of rubrene
which has some of the best electronic transport properties for application
purposes. We show that this system can be well simulated by simple
tight-binding systems representing one-dimensional (1D) chains that are weakly
coupled to their neighboring chains in the same plane. In principle, this makes
the rubrene system intermediate between 1D and isotropic 2D models. We
analyse in detail the dc-transport and terahertz conductivity in the 1D and in
the anisotropic 2D models. The transient localisation scenario allows us to
reproduce satisfactorily some basic results such as mobility anisotropy and
orders of magnitude as well as ac-conductivity in the terahertz range. This
model shows in particular that even a weak inter-chain coupling is able to
notably improve the propagation along the chains. This also suggests that a
strong inter-chain coupling is important to get organic semi-conductors with
the best possible transport properties for applicative purposes.
Comment: 21 pages, 17 figures
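A minimal sketch of the underlying lattice model, assuming a clean (disorder-free) rectangular lattice with periodic boundaries, hopping t along the chains and a weaker hopping t_perp between chains. It builds the tight-binding Hamiltonian and compares its spectrum with the dispersion E(kx, ky) = -2 t cos(kx) - 2 t_perp cos(ky). The parameter values and names are hypothetical, and the dynamic disorder behind the transient localisation scenario is not modeled here.

```python
import numpy as np

def anisotropic_tb_hamiltonian(lx, ly, t, t_perp):
    """Tight-binding Hamiltonian on an lx x ly periodic lattice (lx, ly >= 3).

    t      : hopping along the chain direction (x)
    t_perp : hopping between neighboring chains (y)
    """
    n_sites = lx * ly
    H = np.zeros((n_sites, n_sites))

    def site(x, y):
        # Flatten (x, y) with periodic wrap-around.
        return (x % lx) * ly + (y % ly)

    for x in range(lx):
        for y in range(ly):
            i = site(x, y)
            for j, hop in ((site(x + 1, y), t), (site(x, y + 1), t_perp)):
                H[i, j] -= hop
                H[j, i] -= hop
    return H

# Hypothetical parameters: inter-chain coupling is 10% of intra-chain coupling.
lx, ly, t, t_perp = 6, 4, 1.0, 0.1
H = anisotropic_tb_hamiltonian(lx, ly, t, t_perp)
numeric_levels = np.sort(np.linalg.eigvalsh(H))

# Analytic band energies on the allowed k-points of the periodic lattice.
kx = 2 * np.pi * np.arange(lx) / lx
ky = 2 * np.pi * np.arange(ly) / ly
analytic_levels = np.sort(
    (-2 * t * np.cos(kx)[:, None] - 2 * t_perp * np.cos(ky)[None, :]).ravel()
)
```

Setting t_perp to zero recovers decoupled 1D chains, and t_perp equal to t gives the isotropic 2D model, so the ratio t_perp / t interpolates between the two limits discussed in the abstract.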
Alianzas y redes de parentesco de gitanos en Cataluña (Alliances and kinship networks of Gypsies in Catalonia)
Drawing on thesis research centered on the "lived experience of social diversity" of Gypsy children, whether sedentary or migrating, and of Moroccan children from families that have recently immigrated to Toulouse, Perpignan and Barcelona, we have sought to identify the processes by which these children drop out of school in community contexts where the school cannot, by itself, ensure the transmission of the cultural and social competences needed for their autonomy as adults and citizens. The analysis focuses on the interactions linking these two spaces of socialization, the family and the school, and on the competences that young children from disadvantaged socio-economic backgrounds need in order to move across these different worlds of socialization. In this article, we present in particular two intergenerational trajectories of Catalan Gypsy families, which have allowed us to understand how certain families were able to preserve the transmission of practical competences (savoir faire) and of school learning. From these genealogical tracings we will see how the Gypsy family of Perpignan and the Gypsy family of Barcelona reinforce each other by coming together into a clan with new, transnational outlines, whereas a reading limited by the political border between France and Spain would suggest the disintegration, sometimes even simultaneous, of one and the other.
Some characteristics of new migrants from Moroccan and settled Gypsy communities are studied. It is demonstrated how they develop skills to be "here and there" based on a know-how for international travel, hence creating new models of identification relying on experiences of multiple interaction. These new types of migrants are highly mobile and produce micro-societies with singular norms and newly adapted social interactions that transform the institutions concerned: school, family and economic processes. The study of genealogical lines shows how Barcelona and Perpignan Gypsy families reinforce each other by aggregating themselves into clans with new transnational outlines, whereas by contrast a reading limited by Franco-Spanish political borders suggests the disintegration of one and sometimes of both simultaneously. Those genealogical lines help bring to light material and symbolic spaces that have led me to locate and analyse a new form of social autonomy.
Deep Learning for Mean Field Games with non-separable Hamiltonians
This paper introduces a new method based on Deep Galerkin Methods (DGMs) for
solving high-dimensional stochastic Mean Field Games (MFGs). We achieve this by
using two neural networks to approximate the unknown solutions of the MFG
system and forward-backward conditions. Our method is efficient, even with a
small number of iterations, and is capable of handling up to 300 dimensions
with a single layer, which makes it faster than other approaches. In contrast,
methods based on Generative Adversarial Networks (GANs) cannot solve MFGs with
non-separable Hamiltonians. We demonstrate the effectiveness of our approach by
applying it to a traffic flow problem, which was previously solved using the
Newton iteration method only in the deterministic case. We compare the results
of our method to analytical solutions and previous approaches, showing its
efficiency. We also prove the convergence of our neural network approximation
with a single hidden layer using the universal approximation theorem.
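For orientation, a second-order MFG system with a non-separable Hamiltonian is commonly written as below, together with the kind of residual loss a Deep Galerkin approach would minimize over two networks u_theta and m_omega. The notation (nu, H, m_0, g), the sign conventions, and the unweighted sum of loss terms are assumptions of this sketch; the abstract does not state the paper's exact formulation.

```latex
% Coupled backward HJB and forward Fokker-Planck equations on [0,T] x Omega.
% "Non-separable" means H(x, m, p) cannot be split as H_0(x, p) - f(x, m).
\begin{aligned}
-\partial_t u - \nu \Delta u + H(x, m, \nabla u) &= 0, \\
\partial_t m - \nu \Delta m - \nabla \cdot \big( m \, \nabla_p H(x, m, \nabla u) \big) &= 0, \\
m(0, x) = m_0(x), \qquad u(T, x) &= g\big(x, m(T, \cdot)\big).
\end{aligned}

% Assumed DGM-style loss, estimated on randomly sampled points (t, x):
\begin{aligned}
\mathcal{L}(\theta, \omega) ={}& \big\| -\partial_t u_\theta - \nu \Delta u_\theta + H(x, m_\omega, \nabla u_\theta) \big\|^2 \\
&+ \big\| \partial_t m_\omega - \nu \Delta m_\omega - \nabla \cdot \big( m_\omega \, \nabla_p H(x, m_\omega, \nabla u_\theta) \big) \big\|^2 \\
&+ \big\| m_\omega(0, \cdot) - m_0 \big\|^2 + \big\| u_\theta(T, \cdot) - g\big(\cdot, m_\omega(T, \cdot)\big) \big\|^2 .
\end{aligned}
```

The first two terms penalize the equation residuals in the interior, and the last two enforce the initial condition on the density and the terminal condition on the value function, which is how the forward-backward structure enters the training objective.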