
    How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining

    This article is an extended version of a paper presented at the WSOM'2012 conference [1]. We present a combination of factorial projections, the SOM algorithm and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts which were used to teach arithmetic techniques to merchants. Among data analysis techniques, those used in lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts, because they focus on the deviation from independence between words and manuscripts. Yet we also want to discover and characterize the vocabulary common to the whole corpus. Using the properties of stochastic Kohonen maps, which define the neighborhood between inputs in a non-deterministic way, we highlight the words that seem to play a special role in the vocabulary. We call them fickle and use them to improve both the robustness of Kohonen maps and the significance of the FCA visualization. Finally, we use graph algorithms to exploit this fickleness for the classification of words.
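    Correspondence analysis, the factorial method usually applied in lexicometry, works on each cell's deviation from independence in the word × manuscript contingency table. The following is a minimal sketch, assuming the table is given as raw counts in a list of lists with nonzero row and column totals. The function names are hypothetical, and the singular value decomposition that yields the factorial axes is left out.

```python
import math

def standardized_residuals(table):
    """Return S with S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j).

    `table` holds raw counts: rows are words, columns are manuscripts.
    Correspondence analysis decomposes this matrix (via an SVD), so each entry
    measures how far a word/manuscript cell deviates from independence.
    Assumes every row and column total is nonzero.
    """
    n = sum(sum(row) for row in table)
    p = [[x / n for x in row] for row in table]   # relative frequencies
    r = [sum(row) for row in p]                   # row masses (words)
    c = [sum(col) for col in zip(*p)]             # column masses (manuscripts)
    return [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))]
            for i in range(len(r))]

def chi_square(table):
    """Pearson chi-square statistic: n times the sum of squared residuals."""
    n = sum(sum(row) for row in table)
    return n * sum(s * s for row in standardized_residuals(table) for s in row)
```

    When the rows are proportional, as in [[2, 4], [3, 6]], every residual is zero, so the table carries no deviation from independence. For [[10, 0], [0, 10]], `chi_square` returns 20, the usual Pearson statistic.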

    The socio-economic role of the medieval Parisian universities through the Studium Parisiense database

    Studium Parisiense is a database that aims to identify all the students and masters of the University of Paris. Still in progress, it currently holds nearly 20,000 records and may be about halfway complete. We used these data to assess the impact of the college system in medieval Paris. A chronological trend emerges: the development of the college system in the 14th century proved a more efficient way to accommodate the growing academic population than the creation of houses of Augustinian canons (12th century) and of mendicant convents (13th century). On the other hand, in terms of both international recruitment and literary output, the Paris colleges remained second-rank institutions, with the exception of the Sorbonne. Nevertheless, these colleges, located on the Left Bank of the Seine, provided students with better conditions of study and helped discipline the student population, and by the end of the fifteenth century the Paris colleges had improved their reputation and were once again attracting European students.
    Keywords: college; university; Paris; mendicant convents; students

    The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul

    In the last two decades, many random graph models have been proposed to extract knowledge from networks. Most of them look for communities or, more generally, clusters of vertices with homogeneous connection profiles. While the first models focused on networks with binary edges only, extensions now make it possible to handle valued networks. Recently, new models were also introduced to characterize connection patterns in networks through mixed memberships. This work was motivated by the need to analyze a historical network in which a partition of the vertices is given and the edges are typed. The known partition is seen as a decomposition of the network into subgraphs, which we propose to model using a stochastic model with unknown latent clusters. Each subgraph has its own mixing vector, according to which its vertices are associated with the clusters. The vertices then connect with a probability that depends on the subgraphs only, while the edge types are assumed to be drawn according to the latent clusters. A variational Bayes expectation-maximization algorithm is proposed for inference, together with a model selection criterion for estimating the number of clusters. Experiments are carried out on simulated data to assess the approach. The proposed methodology is then applied to an ecclesiastical network in Merovingian Gaul. An R code, called Rambo, implementing the inference algorithm is available from the authors upon request.
    Comment: Published at http://dx.doi.org/10.1214/13-AOAS691 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
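    The generative mechanism summarized above can be made concrete with a short sampler. The following is a minimal sketch, assuming a directed network without self-loops, K latent clusters and C edge types. The function and parameter names (`sample_rsm`, `alpha`, `gamma`, `pi`) are our own illustrative choices, and this is not the authors' Rambo code, which performs inference rather than simulation.

```python
import random

def sample_rsm(subgraph_of, alpha, gamma, pi, seed=None):
    """Draw one network from a random subgraph model (illustrative sketch).

    subgraph_of: known subgraph index s(i) of each vertex i.
    alpha[s]:    mixing vector over the K latent clusters for subgraph s.
    gamma[s][u]: probability of an edge from a vertex of subgraph s to one of subgraph u.
    pi[k][l]:    distribution over the C edge types for an edge from cluster k to cluster l.
    Returns (z, edges): z[i] is the latent cluster of vertex i, and edges maps
    each present directed edge (i, j) to its type. Assumes no self-loops.
    """
    rng = random.Random(seed)
    n = len(subgraph_of)
    K = len(pi)
    # Each vertex draws its latent cluster from its own subgraph's mixing vector.
    z = [rng.choices(range(K), weights=alpha[subgraph_of[i]])[0] for i in range(n)]
    edges = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Edge presence depends on the known subgraphs only...
            if rng.random() < gamma[subgraph_of[i]][subgraph_of[j]]:
                # ...while the edge type depends on the latent clusters.
                type_probs = pi[z[i]][z[j]]
                edges[(i, j)] = rng.choices(range(len(type_probs)), weights=type_probs)[0]
    return z, edges
```

    Setting every `gamma` entry to 1 produces all directed edges, which makes it easy to check that the edge types follow `pi` for the sampled clusters.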

    Lexical Recount between Factor Analysis and Kohonen Map: Mathematical Vocabulary of Arithmetic in the Vernacular Language of the Late Middle Ages

    International audience
    In this paper we present a combination of factorial projections and the SOM algorithm applied to a text mining problem. The corpus consists of 8 medieval texts which were used to teach arithmetic techniques to merchants. Classical Factorial Component Analysis (FCA) gives good representations of the selected words in association with the texts, but the quality of the representation is poor in the center of the graphs, and it is not easy to draw conclusions from the successive projections. Using the useful properties of Kohonen maps, we can therefore highlight the words that seem to play a special role in the vocabulary, since they are associated with very different words from one map to another. Finally, we show that combining both representations is a powerful aid to text analysis.
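    The instability of word neighborhoods from one map to another can be measured by training several stochastic Kohonen maps on the same data and counting, for each pair of words, how often they land in neighboring units. The following is a minimal sketch, assuming each training run has already been reduced to a dict mapping each word to its (row, col) unit on a rectangular grid. The function names, the neighborhood radius and the `low`/`high` thresholds are illustrative assumptions, not the authors' statistical test.

```python
from itertools import combinations

def are_neighbors(u, v, radius=1):
    """Units (row, col) are neighbors if they lie within `radius` on the grid (Chebyshev distance)."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1])) <= radius

def neighbor_frequencies(runs, radius=1):
    """Fraction of runs in which each pair of words falls in neighboring units.

    `runs` is a list of dicts mapping word -> (row, col), one per SOM run;
    every run is assumed to contain the same words.
    """
    words = sorted(runs[0])
    freq = {}
    for a, b in combinations(words, 2):
        hits = sum(are_neighbors(run[a], run[b], radius) for run in runs)
        freq[(a, b)] = hits / len(runs)
    return freq

def fickleness(runs, radius=1, low=0.2, high=0.8):
    """Count, for each word, the pairs whose neighborhood is unstable across runs.

    A pair is treated as unstable (hypothetical rule) when it is neighboring
    in more than `low` but fewer than `high` of the runs.
    """
    score = {w: 0 for w in runs[0]}
    for (a, b), f in neighbor_frequencies(runs, radius).items():
        if low < f < high:
            score[a] += 1
            score[b] += 1
    return score
```

    With two hypothetical runs in which `a` and `b` always occupy neighboring units while `c` moves next to `b` only once, the pair (`b`, `c`) is the only unstable one, so `b` and `c` each score 1 and `a` scores 0.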

    New frontiers of the historian

    The impact of the digital environment on historians' practices is usually reduced to a change in how the products of historical research are disseminated. We show that the development of new data processing techniques has a specific impact on historical research. Historical data are rarely born digital. Producing data suited to historians' work requires building complex platforms, which calls for collaboration with physicists and computer scientists. The resulting data are often incomplete and unevenly documented, so the statistical tools must be finely tuned, which in turn requires exchanges with mathematicians and statisticians. We conclude that this configuration redraws the map of historians' professional and disciplinary alliances.

    Linking history and computing, teaching and research: the PIREH of Panthéon-Sorbonne University

    The creation and activity of the Center for Computing in History (Pôle informatique de recherche et d'enseignement en histoire, PIREH) at Paris 1 Panthéon-Sorbonne University belong to a tradition of using computing and statistical tools in history that dates back to the founding of the university in 1971. This paper first presents the people involved in this field at Paris 1 in the 1970s and 1980s, their teaching, and their output (journals, software), placing them in the intellectual and technical context of the time. The creation of the PIREH in 1999 gave this activity structure and organized a curriculum built around databases, lexicometry and factorial correspondence analysis. Since the late 1990s, these practices and the related teaching have diversified, driven by the development of the Web, better access to a variety of computing tools, and the interdisciplinary collaborations in which the PIREH takes part. By retracing this history, the paper characterizes the PIREH's specific approach to computing in teaching and research in history.

    The cautelae: a corpus of mathematical problems between collection, series and mathematical culture

    International audience