Search CORE

328 research outputs found

Data Cube Approximation and Mining using Probabilistic Modeling

Author: Boujenoui Ameur
Goutte Cyril
Missaoui Rokia
Publication date: 01/01/2007

On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes along different analysis axes (dimensions) and at different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are essentially multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output for cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
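As an illustration of the log-linear side only (not the authors' implementation), the sketch below fits the simplest log-linear model for a three-way cube, mutual independence, which has a closed-form maximum-likelihood fit, and flags cells with large Pearson residuals as candidate outliers. The function names, the toy cube, and the |residual| > 2 cut-off are hypothetical.

```python
from itertools import product
from math import sqrt


def independence_fit(cube):
    """Fitted cell counts of the mutual-independence log-linear model
    log m_ijk = u + u_i + u_j + u_k for a 3-way cube of counts.
    Its maximum-likelihood fit has the closed form
    m_ijk = n_i.. * n_.j. * n_..k / n**2 (assumes every margin is positive)."""
    I, J, K = len(cube), len(cube[0]), len(cube[0][0])
    n = sum(cube[i][j][k] for i, j, k in product(range(I), range(J), range(K)))
    a = [sum(cube[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    b = [sum(cube[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    c = [sum(cube[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    return [[[a[i] * b[j] * c[k] / n**2 for k in range(K)]
             for j in range(J)] for i in range(I)]


def flag_outliers(cube, threshold=2.0):
    """Return (cell, Pearson residual) for every cell whose residual
    (observed - fitted) / sqrt(fitted) exceeds the threshold in absolute value."""
    fitted = independence_fit(cube)
    I, J, K = len(cube), len(cube[0]), len(cube[0][0])
    flagged = []
    for i, j, k in product(range(I), range(J), range(K)):
        m = fitted[i][j][k]
        r = (cube[i][j][k] - m) / sqrt(m)
        if abs(r) > threshold:
            flagged.append(((i, j, k), round(r, 2)))
    return flagged


# Hypothetical 2 x 2 x 2 cube of counts (e.g. region x product x quarter).
toy = [[[42, 2], [6, 6]], [[4, 4], [12, 12]]]
print(flag_outliers(toy))
```

Models that add interaction terms, which a search for a parsimonious model would compare against this baseline, generally have no closed-form fit and are usually estimated by iterative proportional fitting.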

A Genealogy of Correspondence Analysis: Part 2 - The Variants

Author: Beh Eric J
Lombardo Rosaria
Publication venue: Coordinamento SIBA - Università del Salento
Publication date: 17/10/2019

In 2012, a comprehensive historical and genealogical discussion of correspondence analysis was published in the Australian and New Zealand Journal of Statistics. That genealogy consisted of more than 270 key books and articles and focused on the historical development of correspondence analysis, a statistical tool that provides the analyst with a visual inspection of the association between two or more categorical variables. In this new genealogy, we provide a brief overview of over 30 variants of correspondence analysis that now exist outside the traditional approaches used to analyse the association between two or more categorical variables. It comprises a bibliography of more than 300 books and articles that were not included in the 2012 bibliography and highlights the growth in the development of correspondence analysis across all areas of research.

ESE - Salento University Publishing

Università del Salento: ESE - Salento University Publishing

A genealogy of correspondence analysis: part 2 - the variants

Author: Lombardo Rosaria
Publication date: 01/01/2019

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

The zCOSMOS redshift survey: the three-dimensional classification cube and bimodality in galaxy physical properties

Author: A. Bongiorno
A. Cappi
A. Cimatti
A. Iovino
A. Leauthaud
A. Renzini
B. Garilli
B. Meneux
C. Halliday
C. Knobel
C. M. Carollo
C. Maier
C. Marinoni
C. Porciani
C. Scarlata
D. Bottini
D. Maccagni
D. Vergani
E. Perez Montero
E. Ricciardelli
E. Zucca
F. Lamareille
G. Coppa
G. Zamorani
H. J. McCracken
J. D. Silverman
J.-F. Le Borgne
J.-P. Kneib
K. Caputi
K. Kovač
L. de Ravel
L. Guzzo
L. Pozzetti
L. Tasca
L. Tresse
M. Bolzonella
M. Fumana
M. Mignoli
M. Scodeggio
M. Tanaka
N. Scoville
O. Cucciati
O. Le Fèvre
P. Capak
P. Cassata
P. Franzetti
P. Kampczyk
P. Memeo
P. Oesch
R. Pellò
R. Scaramella
S. Bardelli
S. de la Torre
S. J. Lilly
T. Contini
U. Abbas
V. Le Brun
V. Mainieri
Y. Peng
Publication venue: EDP Sciences
Publication date: 01/01/2008

Aims. We investigate the relationships between three main optical galaxy observables (spectral properties, colours, and morphology), exploiting the data set provided by the COSMOS/zCOSMOS survey. The purpose of this paper is to define a simple galaxy classification cube, using a carefully selected sample of around 1000 galaxies. Methods. Using medium-resolution spectra of the first 1k zCOSMOS-bright sample, optical photometry from the Subaru/COSMOS observations, and morphological measurements derived from ACS imaging, we analyze the properties of the galaxy population out to z ~ 1. Applying three straightforward classification schemes (spectral, photometric, and morphological), we identify two main galaxy types, which appear to be linked to the bimodality of the galaxy population. The three parametric classifications constitute the axes of a "classification cube". Results. There is very good agreement between the classification from spectral data (quiescent/star-forming galaxies) and that based on colours (red/blue galaxies). The third parameter (morphology) is less well correlated with the first two: in fact, a good correlation between the spectral classification and that based on morphological analysis (early-/late-type galaxies) is achieved only after partially complementing the morphological classification with additional colour information. Finally, analyzing the 3D distribution of all galaxies in the sample, we find that about 85% of the galaxies show a fully concordant classification, being either quiescent, red, bulge-dominated galaxies (~20%) or star-forming, blue, disk-dominated galaxies (~65%). These results imply that galaxy bimodality is consistent across morphology, colour, and dominant stellar population, at least out to z ~ 1. Comment: 11 pages, Accepted for publication in A&
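To make the "fully concordant" fraction concrete: each galaxy receives one binary label per axis of the classification cube, and it counts as fully concordant when it falls in one of the two diagonal cells (quiescent/red/bulge-dominated or star-forming/blue/disk-dominated). The sketch below counts those cells; the 0/1 encoding, the names, and the ten-galaxy toy sample are hypothetical and are not zCOSMOS data.

```python
from collections import Counter

# Hypothetical encoding of the three axes (spectral, colour, morphology):
# 0 = quiescent / red / bulge-dominated, 1 = star-forming / blue / disk-dominated.
EARLY = (0, 0, 0)
LATE = (1, 1, 1)


def concordance(labels):
    """Return the fractions of galaxies in the two fully concordant cells
    of the 2 x 2 x 2 classification cube, and their sum."""
    cube = Counter(labels)          # number of galaxies per cell of the cube
    total = sum(cube.values())
    early = cube[EARLY] / total
    late = cube[LATE] / total
    return early, late, early + late


# Toy sample: 2 early-type, 6 late-type, and 2 discordant galaxies.
sample = [EARLY] * 2 + [LATE] * 6 + [(1, 0, 1), (0, 1, 1)]
print(concordance(sample))
```

The remaining six cells of the cube hold the discordant galaxies, so the same `Counter` can be inspected to see which pair of classifications disagrees most often.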

Archivio istituzionale della ricerca - INRIM

University of Groningen

EDP Sciences OAI-PMH repository (1.2.0)

Archivio istituzionale della ricerca - Università di Padova

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

HAL-INSU

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

HAL-OBSPM

The zCOSMOS redshift survey: the three-dimensional classification cube and bimodality in galaxy physical properties

Author: Mignoli, M.; Zamorani, G.; Scodeggio, M.; Cimatti, A.; Halliday, C.; Lilly, S. J.; Pozzetti, L.; Vergani, D.; Carollo, C. M.; Contini, T.; Le Fèvre, O.; Mainieri, V.; Renzini, A.; Bardelli, S.; Bolzonella, M.; Bongiorno, A.; Caputi, K.; Coppa, G.; Cucciati, O.; de la Torre, S.; de Ravel, L.; Franzetti, P.; Garilli, B.; Iovino, A.; Kampczyk, P.; Kneib, J.-P.; Knobel, C.; Kovač, K.; Lamareille, F.; Le Borgne, J.-F.; Le Brun, V.; Maier, C.; Pellò, R.; Peng, Y.; Perez Montero, E.; Ricciardelli, E.; Scarlata, C.; Silverman, J. D.; Tanaka, M.; Tasca, L.; Tresse, L.; Zucca, E.; Abbas, U.; Bottini, D.; Capak, P.; Cappi, A.; Cassata, P.; Fumana, M.; Guzzo, L.; Leauthaud, A.; Maccagni, D.; Marinoni, C.; McCracken, H. J.; Memeo, P.; Meneux, B.; Oesch, P.; Porciani, C.; Scaramella, R.; Scoville, N.
Publication venue: EDP Sciences
Publication date: 20/11/2008

Aims. We investigate the relationships between three main optical galaxy observables (spectral properties, colors, and morphology), exploiting the data set provided by the COSMOS/zCOSMOS survey. The purpose of this paper is to define a simple galaxy classification cube, with a carefully selected sample of ≈1000 galaxies. Methods. Using medium-resolution spectra of the first zCOSMOS-bright sample, optical photometry from the Subaru/COSMOS observations, and morphological measurements derived from ACS imaging, we analyze the properties of the galaxy population out to z ~ 1. Applying three straightforward classification schemes (spectral, photometric, and morphological), we identify two main galaxy types, which appear to be linked to the bimodality of the galaxy population. The three parametric classifications constitute the axes of a “classification cube”. Results. There is very good agreement between the classification from spectral data (quiescent/star-forming galaxies) and the one based on colors (red/blue galaxies). The third parameter (morphology) is not as well correlated with the first two; in fact, a good correlation between the spectral classification and the classification based on morphological analysis (early-/late-type galaxies) is achieved only after partially complementing the morphological classification with additional color information. Finally, analyzing the 3D distribution of all galaxies in the sample, we find that about 85% of the galaxies show a fully concordant classification, being either quiescent, red, bulge-dominated galaxies (~20%) or star-forming, blue, disk-dominated galaxies (~65%). These results imply that galaxy bimodality is consistent across morphology, color, and dominant stellar population, at least out to z ~ 1.

Archivio istituzionale della ricerca - INRIM

Glosarium Matematika (Mathematics Glossary)

Author: Kerami Djati
Sitanggang Cormentyna
Publication venue: Pusat Bahasa IAIN Sultan Amai Gorontalo
Publication date: 01/01/2008

273 p.; 24 cm

library.uny.ac.id

Repositori Institusi Kemendikbud

Glosarium Matematika (Mathematics Glossary)

Author: Iswati Ellya
Kerami Djati
Publication venue: Pusat Pembinaan dan Pengembangan Bahasa
Publication date: 01/01/1993

Repositori Institusi Kemendikbud

Explanation of Exceptional Values in Multi-dimensional Business Databases

Author: Caron E.A.M. (Emiel)
Publication date: 14/11/2013

“How can the functionality of multi-dimensional business databases be extended with diagnostic capabilities to support managerial decision-making?” This question states the main research problem addressed in this thesis. Before it can be answered, the question first requires clarification and delineation. In this chapter, the research question is briefly placed in context with respect to both its academic and its business relevance. This leads to the formulation of three specific research questions. Subsequently, a section is dedicated to each specific research question. An outline of this thesis concludes the chapter.

EUR Research Repository

Erasmus University Digital Repository

Contributions to the Multivariate Analysis of Marine Environmental Monitoring

Author: Graffelman Jan
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2000

The thesis starts from the view that statistics begins with data, and first introduces the data sets studied: marine benthic species counts and chemical measurements made at a set of sites in the Norwegian Ekofisk oil field, with replicates and repeated annually. An introductory chapter details the sampling procedure and uses reliability calculations to show that the (transformed) chemical variables have excellent reliability, whereas the biological variables have poor reliability, except for a small subset of abundant species. Transformed chemical variables are shown to be approximately normal. Bootstrap methods are used to assess whether the biological variables follow a Poisson distribution, leading to the conclusion that the Poisson distribution must be rejected, except for rare species. A separate chapter details further work on the distribution of the species variables: truncated and zero-inflated Poisson distributions, as well as Poisson mixtures, are used to account for sparseness and overdispersion. Species are thought to respond to environmental variables, and regressions of the abundance of a few selected species on chemical variables are reported. For rare species, logistic regression and Poisson regression are the tools considered, though overdispersion causes problems. For abundant species, random coefficient models are needed to cope with intraclass correlation. The environmental variables, mainly heavy metals, are highly correlated, which leads to multicollinearity problems. The next chapters take a multivariate approach, in which all species data are treated simultaneously. The theory of correspondence analysis is reviewed, and some theoretical results on this method are reported (bounds for singular values, centring matrices).
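The abstract does not name the statistic used to bootstrap-test the Poisson assumption. A common choice, assumed here, is the dispersion index D = s²/x̄ (sample variance over sample mean), which is near 1 for Poisson counts and larger under overdispersion; a parametric bootstrap compares the observed D with values from simulated Poisson samples of the same size and mean. The function names and defaults below are hypothetical.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method
    (fine for small to moderate lam; exp(-lam) underflows for very large lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if overdispersed."""
    return variance(counts) / mean(counts)


def poisson_bootstrap_pvalue(counts, n_boot=999, seed=1):
    """One-sided parametric-bootstrap p-value for overdispersion relative to a
    Poisson model with the sample mean (assumes len(counts) >= 2 and mean > 0)."""
    rng = random.Random(seed)
    lam = mean(counts)
    observed = dispersion_index(counts)
    valid = exceed = 0
    for _ in range(n_boot):
        sim = [poisson_draw(lam, rng) for _ in counts]
        if sum(sim) == 0:
            continue  # index undefined for an all-zero resample; skip it
        valid += 1
        if dispersion_index(sim) >= observed:
            exceed += 1
    return (exceed + 1) / (valid + 1)


# Hypothetical counts of one species over 20 replicate samples.
counts = [0] * 15 + [20, 25, 30, 40, 35]
print(poisson_bootstrap_pvalue(counts))
```

A small p-value indicates more variance than a Poisson model allows, which is the situation the abstract describes for all but the rare species.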
An applied chapter discusses the correspondence analysis of the species data in detail, detects outliers, addresses stability issues, and considers different ways of stacking data matrices to obtain an integrated analysis of several years of data and to decompose variation into within-site and between-site components. More than 40% of the total inertia is due to variation within stations. Principal component analysis is used to analyse the set of chemical variables. Attempts are made to integrate the analysis of the biological and chemical variables. A detailed theoretical development shows how continuous variables can be mapped optimally as supplementary vectors into a correspondence analysis biplot. Geometrical properties are worked out in detail and measures of the quality of the display are given, while artificial data and data from the monitoring survey are used to illustrate the theory. The theory of displaying supplementary variables in biplots is also worked out in detail for principal component analysis, with attention to the different types of scaling and to the optimality of the displayed correlations. A theoretical chapter follows that gives an in-depth treatment of canonical correspondence analysis (CCA, i.e., linearly constrained correspondence analysis), detailing many mathematical properties and aspects of this multivariate method, such as geometrical properties, biplots, the use of generalized inverses, and relationships with other methods. Some applications of CCA to the survey data are presented in a separate chapter, with their interpretation and an indication of the quality of the display of the different matrices involved in the analysis. Weighted principal component analysis of weighted averages is proposed as an alternative to CCA.
This leads to a better display of the weighted averages of the species and, in the cases studied so far, also to biplots with a higher amount of explained variance for the environmental data. The thesis closes with a bibliography and outlines some suggestions for further research, such as the generalization of canonical correlation analysis to singular covariance matrices, the use of partial least squares methods to account for the excess of predictors, and data fusion for estimating missing biological data.

Postprint (published version)
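The inertia figures in this abstract ("more than 40% of the total inertia is due to variation within stations") rest on two properties of correspondence analysis: total inertia equals Pearson's chi-squared statistic divided by the grand total, and, when rows are grouped (for example replicate samples within sites), it splits exactly into a between-group and a within-group part, as sums of squares do. The sketch below computes both under that row-grouping assumption; the thesis's own stacking schemes may differ, and the function names and toy counts are hypothetical.

```python
def total_inertia(table):
    """Total inertia of a two-way table of non-negative counts:
    sum_ij (p_ij - r_i c_j)^2 / (r_i c_j), which equals Pearson's chi-squared / n.
    Assumes every row and column total is positive."""
    n = sum(map(sum, table))
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    return sum(
        (table[i][j] / n - r[i] * c[j]) ** 2 / (r[i] * c[j])
        for i in range(len(r))
        for j in range(len(c))
    )


def within_between(table, groups):
    """Split the total inertia of the row profiles into a between-group part
    (inertia of the table with rows summed per group) and a within-group part."""
    merged = {}
    for row, g in zip(table, groups):
        acc = merged.setdefault(g, [0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    total = total_inertia(table)
    between = total_inertia(list(merged.values()))
    return total, between, total - between


# Hypothetical species counts: rows are replicate samples, columns are species.
counts = [[10, 0, 5], [8, 2, 6], [1, 9, 2], [0, 12, 3]]
sites = ["A", "A", "B", "B"]
total, between, within = within_between(counts, sites)
print(f"within-site share of inertia: {within / total:.1%}")
```

Summing rows within a group leaves the column masses unchanged and replaces member profiles by their mass-weighted mean, so Huygens' decomposition applies and the within-group part is never negative (up to rounding).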

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Secretaría de Estado de Cultura