    SOM-based algorithms for qualitative variables

    It is well known that the SOM algorithm achieves a clustering of data which can be interpreted as an extension of Principal Component Analysis, because of its topology-preserving property. But the SOM algorithm can only process real-valued data. In previous papers, we have proposed several methods based on the SOM algorithm to analyze categorical data, as is the case with survey data. In this paper, we present these methods in a unified manner. The first one (Kohonen Multiple Correspondence Analysis, KMCA) deals only with the modalities, while the two others (Kohonen Multiple Correspondence Analysis with individuals, KMCA_ind, and Kohonen algorithm on DISJonctive table, KDISJ) can take into account the individuals and the modalities simultaneously. Comment: Special Issue après WSOM 03 à Kitakiush
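
    The disjunctive table named in KDISJ is the usual starting point for analysing categorical survey data: each individual becomes a row of 0/1 indicators, one column per modality. The abstract does not give the algorithm itself, so the sketch below only illustrates how such a table can be built; the function name `disjunctive_table` and the survey answers are hypothetical and not taken from the paper.

```python
def disjunctive_table(records):
    """Build a complete disjunctive (one-hot) table from categorical records.

    records: list of dicts mapping variable name -> modality.
    Returns (columns, table), where columns is a list of (variable, modality)
    pairs and table[i][j] is 1 if individual i has modality columns[j].
    """
    # Collect every (variable, modality) pair seen in the data, in sorted order.
    columns = sorted({(var, mod) for rec in records for var, mod in rec.items()})
    index = {col: j for j, col in enumerate(columns)}
    table = []
    for rec in records:
        row = [0] * len(columns)
        for var, mod in rec.items():
            row[index[(var, mod)]] = 1
        table.append(row)
    return columns, table


# Hypothetical survey answers (illustrative only, not from the paper).
survey = [
    {"sex": "F", "status": "employed"},
    {"sex": "M", "status": "student"},
    {"sex": "F", "status": "student"},
]
columns, table = disjunctive_table(survey)
```

    Each row sums to the number of variables, since an individual takes exactly one modality per variable. Rows describe individuals and columns describe modalities, which is why a method working on this table can treat both at once.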

    How to use the Kohonen algorithm to simultaneously analyse individuals in a survey

    The Kohonen algorithm (SOM, Kohonen, 1984, 1995) is a very powerful tool for data analysis. It was originally designed to model organized connections between some biological neural networks. It was also immediately recognized as a very good algorithm for vector quantization and, at the same time, for pertinent classification, with nice properties for visualization. If the individuals are described by quantitative variables (ratios, frequencies, measurements, amounts, etc.), the straightforward application of the original algorithm builds code vectors and associates with each of them the class of all the individuals that are more similar to this code vector than to the others. But in the case of individuals described by categorical (qualitative) variables having a finite number of modalities (as in a survey), it is necessary to define a specific algorithm. In this paper, we present a new algorithm inspired by the SOM algorithm, which provides a simultaneous classification of the individuals and of their modalities. Comment: Special issue ESANN 0
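
    For the quantitative case described above, the following is a minimal sketch of a Kohonen map: code vectors are trained, then each individual joins the class of its closest code vector. It assumes a one-dimensional (string) map, squared Euclidean distance, and linearly decreasing learning rate and neighbourhood radius. These choices, the function names, and the toy data are illustrative assumptions, not the settings used by the authors.

```python
import random


def winner(codes, x):
    """Index of the code vector closest to x (squared Euclidean distance)."""
    return min(range(len(codes)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codes[k], x)))


def train_som(data, n_units, n_iter=2000, seed=0):
    """Train a one-dimensional (string) Kohonen map on real-valued vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise the code vectors by sampling observations.
    codes = [list(rng.choice(data)) for _ in range(n_units)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac) + 0.01                     # decreasing learning rate
        radius = int(round((n_units / 2) * (1.0 - frac)))  # shrinking neighbourhood
        x = rng.choice(data)
        win = winner(codes, x)
        # Move the winner and its neighbours on the string towards x.
        for k in range(max(0, win - radius), min(n_units, win + radius + 1)):
            for d in range(dim):
                codes[k][d] += lr * (x[d] - codes[k][d])
    return codes


def classify(codes, data):
    """Assign each individual to the class of its closest code vector."""
    return [winner(codes, x) for x in data]


# Hypothetical toy data: two well-separated groups in the plane.
toy = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 5.0), (4.9, 5.0)]
codes = train_som(toy, n_units=4)
classes = classify(codes, toy)
```

    The distance computation in `winner` is the step that requires real-valued inputs, which is why categorical data need a dedicated variant.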

    How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining

    This article is an extended version of a paper presented at the WSOM'2012 conference [1]. We present a combination of factorial projections, the SOM algorithm and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts which were used to teach arithmetic techniques to merchants. Among the techniques for Data Analysis, those used for Lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts. The reason for this is that they focus on the deviation from the independence between words and manuscripts. Still, we also want to discover and characterize the common vocabulary among the whole corpus. Using the properties of stochastic Kohonen maps, which define neighborhood between inputs in a non-deterministic way, we highlight the words which seem to play a special role in the vocabulary. We call them fickle and use them to improve both Kohonen map robustness and the significance of FCA visualization. Finally, we use graph algorithms to exploit this fickleness for the classification of words

    Analysis of Professional Trajectories using Disconnected Self-Organizing Maps

    In this paper we address an important economic question. Is there, as mainstream economic theory asserts, a homogeneous labor market with mechanisms which govern supply and demand for work, producing an equilibrium with its remarkable properties? Using the Panel Study of Income Dynamics (PSID) collected over the period 1984-2003, we study the situations of American workers with respect to employment. The data include all heads of household (men or women) as well as the partners who are on the labor market, working or not. They are extracted from the complete survey and we compute a few relevant features which characterize the workers' situations. To perform this analysis, we suggest using a Self-Organizing Map (SOM, Kohonen algorithm) with a specific structure based on planar graphs, with disconnected components (called D-SOM), especially interesting for clustering. We compare the results to those obtained with a classical SOM grid and a star-shaped map (called SOS). Each component of D-SOM takes the form of a string and corresponds to an organized cluster. From this clustering, we study the trajectories of the individuals among the classes by using the transition probability matrices for each period and the corresponding stationary distributions. We find clear evidence of heterogeneous parts, each one with high homogeneity, representing situations well identified in terms of activity and wage levels and in degree of stability in the workplace. These results and their interpretation in economic terms contribute to the debate about flexibility, which is commonly seen as a way to obtain a better level of equilibrium on the labor market
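
    The trajectory analysis mentioned above rests on two standard objects: a transition probability matrix estimated from observed class changes, and its stationary distribution. The sketch below shows one way to compute both, assuming each individual's trajectory is a list of class indices and using power iteration for the stationary distribution. The function names and the example trajectories are hypothetical, and the paper's own estimation (one matrix per period) may differ in detail.

```python
def transition_matrix(trajectories, n_classes):
    """Estimate P[i][j], the share of observed moves from class i to class j."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for path in trajectories:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No move observed out of class i: treat it as absorbing (assumption).
            matrix.append([1.0 if j == i else 0.0 for j in range(n_classes)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_steps=1000):
    """Approximate pi such that pi = pi P, by power iteration from the uniform law."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_steps):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical class sequences for three individuals over four dates.
paths = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
P = transition_matrix(paths, n_classes=2)
pi = stationary_distribution(P)
```

    With these toy trajectories the estimated matrix is [[0.5, 0.5], [0.2, 0.8]], and the stationary distribution is approximately (2/7, 5/7): in the long run most of the mass sits in the class that is rarely left.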

    Recent Trends in Deep Learning Based Personality Detection

    Recently, the automatic prediction of personality traits has received a lot of attention. Specifically, personality trait prediction from multimodal data has emerged as a hot topic within the field of affective computing. In this paper, we review significant machine learning models which have been employed for personality detection, with an emphasis on deep learning-based methods. This review paper provides an overview of the most popular approaches to automated personality detection, various computational datasets, its industrial applications, and state-of-the-art machine learning models for personality detection with specific focus on multimodal approaches. Personality detection is a very broad and diverse topic: this survey only focuses on computational approaches and leaves out psychological studies on personality detection

    A robust framework for medical image segmentation through adaptable class-specific representation

    Medical image segmentation is an increasingly important component in virtual pathology, diagnostic imaging and computer-assisted surgery. Better hardware for image acquisition and a variety of advanced visualisation methods have paved the way for the development of computer-based tools for medical image analysis and interpretation. The routine use of medical imaging scans of multiple modalities has been growing over the last decades, and data sets such as the Visible Human Project have introduced a new modality in the form of colour cryo section data. These developments have given rise to an increasing need for better automatic and semi-automatic segmentation methods. The work presented in this thesis concerns the development of a new framework for robust semi-automatic segmentation of medical imaging data of multiple modalities. Following the specification of a set of conceptual and technical requirements, the framework known as ACSR (Adaptable Class-Specific Representation) is developed in the first instance for 2D colour cryo section segmentation. This is achieved through the development of a novel algorithm for adaptable class-specific sampling of point neighbourhoods, known as the PGA (Path Growing Algorithm), combined with Learning Vector Quantization. The framework is extended to accommodate 3D volume segmentation of cryo section data and subsequently segmentation of single and multi-channel greyscale MRI data. For the latter, the issues of inhomogeneity and noise are specifically addressed. Evaluation is based on comparison with previously published results on standard simulated and real data sets, using visual presentation, ground truth comparison and human observer experiments. ACSR provides the user with a simple and intuitive visual initialisation process followed by a fully automatic segmentation.
Results on both cryo section and MRI data compare favourably to existing methods, demonstrating robustness both to common artefacts and multiple user initialisations. Further developments into specific clinical applications are discussed in the future work section

    Brazilian Higher Education Analysis Through Knowledge Discovery: Annual and Temporal Approaches

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Survey Methodologies and Marketing Research. This project presents the Ph.D. thesis proposal in the Information Management area and aims to contextualize the scenario of Higher Education Institutions (HEIs) in Brazil, generate new knowledge and provide subsidies to justify the relevance of the problem investigated and its contributions. It explores the Brazilian Higher Education Census, from 2010 to 2015, and other official and public databases in order to generate new knowledge, based on the fact that knowledge is the main factor of social development in the Age of the Knowledge Society and Economy. It proposes to answer the following research question: "How does the annual and temporal analysis of the Brazilian Higher Education Census and other public and official databases generate new knowledge and provide strategic information to ensure the Higher Education Institutions mission’s accomplishment?" To achieve its objective, it adopts an inductive research process as a research strategy, divided into two phases: an exploratory study, followed by the knowledge generation phase. It is an interpretative, constructionist, and quantitative study. As a methodological resource, it uses Self-Organizing Maps (SOM), a type of neural network that explores hidden patterns in a large volume of data. In this case, specifically, it is used to discover new knowledge in the area of higher education, considering the higher education institutions, their undergraduate courses, teachers, and students. Moreover, and as a consequence, it assesses the internal dynamics of the higher education institutions and, according to the Resource-Based View (RBV) theory, presents a new approach to identify their internal resources, a gap in the current literature.
The proposed approach contributes to fostering new forms of relationship, based on the combination of similar or complementary resources between and among the institutions, which will enable them to become more entrepreneurial and to behave more collaboratively. The research also contributes to 1) the adoption of an innovative methodology, SOM, for the area of Education, specifically Higher Education, and a new typology for grouping the educational institutions, courses, teachers and students; 2) the advancement of the theory of RBV, by proposing a new approach to identify and analyze the institutions' internal resources; 3) the area of Education, which lacks quantitative studies; and 4) the extension of the concept of the entrepreneurial university (the enhanced triple helices), based on their complementary and similar resources. This new knowledge plays a significant role in the implementation of competitive responses or decisions to take in a fiercely competitive environment and contributes to the advancement of the theory under study. Keywords: knowledge discovery, higher education, Self-Organizing Maps - SOM, entrepreneurial university.

    Visualisation of multi-dimensional medical images with application to brain electrical impedance tomography

    Medical imaging plays an important role in modern medicine. With the increasing complexity and information presented by medical images, visualisation is vital for medical research and clinical applications to interpret the information presented in these images. The aim of this research is to investigate improvements to medical image visualisation, particularly for multi-dimensional medical image datasets. A recently developed medical imaging technique known as Electrical Impedance Tomography (EIT) is presented as a demonstration. To fulfil the aim, three main efforts are included in this work. First, a novel scheme for the processing of brain EIT data with SPM (Statistical Parametric Mapping) to detect ROI (Regions of Interest) in the data is proposed based on a theoretical analysis. To evaluate the feasibility of this scheme, two types of experiments are carried out: one is implemented with simulated EIT data, and the other is performed with human brain EIT data under visual stimulation. The experimental results demonstrate that SPM is able to localise the expected ROI in EIT data correctly, and that it is reasonable to use the balloon hemodynamic change model to simulate the impedance change during brain function activity. Secondly, to deal with the absence of human morphology information in EIT visualisation, an innovative landmark-based registration scheme is developed to register brain EIT images with a standard anatomical brain atlas. Finally, a new task typology model is derived for task exploration in medical image visualisation, and a task-based system development methodology is proposed for the visualisation of multi-dimensional medical images. As a case study, a prototype visualisation system, named EIT5DVis, has been developed, following this methodology, to visualise five-dimensional brain EIT data.
The EIT5DVis system is able to accept visualisation tasks through a graphical user interface; apply appropriate methods to analyse tasks, which include the ROI detection approach and registration scheme mentioned in the preceding paragraphs; and produce various visualisations
