439 research outputs found

Quand l'informatique observe les réseaux

Author: Lelu Alain
Publication venue: CNRS Editions
Publication date: 31/07/2012

National audience. State of the art in two parts: 1) which disciplines are involved in the large-scale observation and exploitation of network traces (social networks, citation networks, networks of links between Web pages, ...); 2) what is being prepared in research laboratories.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Slimming down a high-dimensional binary datatable: relevant eigen-subspace and substantial content

Author: Lelu Alain
Publication venue: Physica-Verlag
Publication date: 22/08/2010

ISBN: 978-3-7908-2603-6. International audience. Determining the number of relevant dimensions in the eigen-space of a data matrix is a central issue in many data-mining applications. We tackle here the sub-problem of finding the "right" dimensionality of a type of data matrix often encountered in the domains of text or usage mining: large, sparse, high-dimensional binary datatables. We present the application of a randomization test to this problem. We validate our approach first on artificial datasets, then on a real documentary data collection, i.e. 1900 documents described in a dataspace of 3600 keywords, where the actual, intrinsic dimension appears to be 28 times smaller than the number of keywords, which is important information when preparing to cluster or discriminate such data. We also present preliminary results on the problem of clearing the datatable of non-essential information bits.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Visualiser les textes et les mots : approches numériques, approches par les graphes

Author: Lelu Alain
Publication venue: CEPADUES, Toulouse
Publication date: 01/12/2008

State of the art: 1) moving from a collection of texts to its vector representation; 2) techniques for visualizing a collection of vectors: by linear algebra, by graphs, and by hybrid techniques.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Relevant Eigen-Subspace of a Graph: A Randomization Test.

Author: Lelu Alain
Publication venue: HAL CCSD
Publication date: 16/05/2011

12 pages. International audience. Determining the number of relevant dimensions in the eigen-space of a graph Laplacian matrix is a central issue in many spectral graph-mining applications. We tackle here the sub-problem of finding the "right" dimensionality of Laplacian matrices, especially those often encountered in the domains of social or biological graphs: the ones underlying large, sparse, undirected and unweighted graphs with a power-law degree distribution. We present the application of a randomization test to this problem. We validate our approach first on an artificial sparse power-law graph with two intermingled clusters, then on two real-world social graphs ("Football-league", "Mexican Politician Network"), where the actual, intrinsic dimensions appear to be 11 and 2 respectively; we illustrate the optimality of the transformed dataspaces both visually, and numerically by means of a decision tree.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Jean-Baptiste Estoup and the origins of Zipf's law: a stenographer with a scientific mind (1868-1950)

Author: Lelu Alain
Publication venue: HAL CCSD
Publication date: 03/02/2014

International audience. Statistical distributions following a power law have been observed for over a century in many domains of the social sciences, as well as in the natural and life sciences. They are of utmost importance for those building models applicable to human activities (e.g. "long tail" phenomena). We present here the life and accomplishments of J-B. Estoup, who was the first to notice this type of distribution in the language domain, and who inspired the subsequent formulations by G.K. Zipf and B. Mandelbrot. This study, first presented at the seminar on the history of probabilities and statistics held at the Ecole des Hautes Etudes en Sciences Sociales in Paris on December 7, 2007, is also a family testimony, the author being the grandson of J-B. Estoup.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Representing interaction in multiway contingency tables: MIDOVA, CA and log-linear model

Authors: Cadot Martine, Lelu Alain
Publication venue: HAL CCSD
Publication date: 08/02/2011

International audience. Besides CA and the log-linear model, which come from the statistics domain, other research streams originating in Artificial Intelligence have addressed the problem of interacting variables: we present here the extension to categorical variables of our results on extracting and statistically validating "itemsets" in boolean datatables. We named MIDOVA (Multidimensional Interaction Differential of Variation) our method for highlighting and representing complex links between qualitative variables, including interaction, which is well suited to socio-economic data. We compare it to the CA and log-linear model approaches, using the same 3-way example as Escofier and her colleagues. We show that our method is effective for general N-way interactions (N may be far greater than 3), whether symmetric or not, and that it offers both easy and detailed interpretability, as CA does, and statistical significance testing, as the log-linear model does in the case of few variables.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Assessing livelihood and ecological benefits from restoration initiatives in the Philippines.

Author: Gata Larissa Lelu P.
Publication venue: Alliance of Bioversity International and CIAT
Publication date: 16/03/2020

CGSpace

Prostitute Praising Represented by Male Novelists in Post-1998 Religious Society

Author: Apristia Lelu Dina
Publication venue: Universitas Gadjah Mada
Publication date: 31/10/2022

Prostitute praising is represented by Remy Sylado in the novel Ca-Bau-Kan: Hanya Sebuah Dosa (1999) and by Arswendo Atmowiloto in the novel Dewi Kawi (2008). The problem addressed by this research is prostitute praising in novels written by men in a religious society, amid the discourse on freedom of expression that spread in Indonesia in the post-1998 era. This research aims to identify (1) how prostitute praising is represented by men in their novels, and (2) why male novelists produce such representations, by applying Stuart Hall’s representation theory in relation to the production of meaning through language and the production of knowledge through discourse. Applying the theory reveals that male novelists represent prostitute praising in private and public domains that are mixed together, and that the relation between male and female in these domains sides with the male, as constructed by a post-1998 discursive formation involving the state and religions to uphold masculine domination.

Jurnal POETIKA

A Proposition for Fixing the Dimensionality of a Laplacian Low-rank Approximation of any Binary Data-matrix

Authors: Cadot Martine, Lelu Alain
Publication venue: IARIA
Publication date: 24/02/2013

International audience. Laplacian low-rank approximations are much appreciated in the context of graph spectral methods and Correspondence Analysis. We address here the problem of determining the dimensionality K* of the relevant eigenspace of a general binary datatable by a statistically well-founded method. We propose 1) a general framework for graph adjacency matrices and any rectangular binary matrix, 2) a randomization test for fixing K*. We illustrate with both artificial and real data.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Espaces intrinsèques des relations entre mots : une exploration multi-échelle.

Authors: Lelu Alain, Roussanaly Azim
Publication venue: INALCO
Publication date: 03/06/2014

International audience. Determining the co-occurrence links between the words of a set of texts requires choosing a span, i.e. a segmentation into statistical units of greater or lesser size: from the simple N-gram (a sliding span of N words) up to the complete text, by way of the comma-delimited segment ("virgulot"), the sentence, the paragraph, etc. These links can give rise to various categorizations of the words, depending on the "focal length" used. Our study deals with a corpus of press articles (3 months of controversies over GMOs and endocrine disruptors) to which we apply 1) our Morph procedure for morpho-syntactic tagging, in order to disambiguate, tag and lemmatize the sequence of word forms as well as possible, 2) our link validation test, based on multiple randomizations of the presence matrix of the tagged lemmas in the textual units of the chosen level, 3) our procedure for determining the intrinsic dimension of this matrix, which yields an estimate of the number of relevant clusters for each level of granularity of the analysis. Our results show that the largest levels detect the "stories" discussed in the corpus, while those of intermediate grain detect first the styles, then the collocations, which are fixed to a greater or lesser degree. This approach 1) generalizes the unsupervised tagging approach of Schütze et al. (1995), based on word N-grams, 2) determines the optimal representation space for the words and the chosen text units, i.e. that of the first K* non-trivial factors of the correspondence analysis of the matrix (binary, so far), where K* is determined by a randomization test suited to any distribution of row and column totals.

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server