
3 research outputs found

An Enhanced Expectation Maximization Text Document Clustering Algorithm for E-Content Analysis

Author: K Ponmani
M Thangaraj
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 06/02/2023

Many types of digital materials can now be used in the classroom. Students and scholars are migrating from textbooks to digital study materials because textbooks are bulky and expensive. Teachers and college students can use and modify materials that are available freely or under some constraints for learning and teaching. E-content can be designed, developed, used, re-used, and distributed electronically from anywhere at any time. Because of its flexibility in the time, place, and pace of learning, e-content is becoming extremely popular. It can be shared instantly with a practically unlimited number of users across the globe. Document clustering is most commonly used to group documents that relate to a specific topic. Text document clustering can group a collection of documents according to the information they contain and can help deliver search results when a user searches the internet. This paper focuses on text document clustering to cope with massive collections of e-content documents. An Enhanced Expectation Maximization Text Document Clustering (EEMTDC) algorithm is proposed and compared with the Expectation Maximization (EM), K-Means, and Hierarchical Clustering (HC) algorithms. The experiments show that the proposed EEMTDC algorithm achieves greater clustering accuracy than these existing clustering algorithms.

International Journal on Recent and Innovation Trends in Computing and Communication

On mathematical optimization for clustering categories in contingency tables

Author: Carrizosa Emilio
Guerrero Lozano Vanesa
Romero Morales Dolores
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/06/2022

Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to attain reasonable sample sizes in the cells of the table and, more importantly, to incorporate expert knowledge on the allowable groupings. However, it is known that the conclusions on independence depend, in general, on the chosen granularity, as in Simpson's paradox. In this paper we propose a methodology that, for a given contingency table and a fixed granularity, finds a clustered table with the highest χ² statistic. Repeating this procedure for different values of the granularity, we can either identify an extreme grouping, namely the largest granularity for which the statistical dependence is still detected, or conclude that it does not exist and that the two variables are dependent regardless of the size of the clustered table. For this problem, we propose an assignment mathematical formulation and a set partitioning one. Our approach is flexible enough to include constraints on the desirable structure of the clusters, such as must-link or cannot-link constraints on the categories that can, or cannot, be merged together, and to ensure reasonable sample sizes in the cells of the clustered table, from which trustworthy statistical conclusions can be derived. We illustrate the usefulness of our methodology using a dataset from a medical study.

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has been financed in part by research projects EC H2020 MSCA RISE NeEDS (Grant agreement ID: 822214), FQM-329, P18-FR-2369 and US-1381178 (Junta de Andalucía, with FEDER Funds), PID2019-110886RB-I00 and PID2019-104901RB-I00 (funded by MCIN/AEI/10.13039/501100011033). This support is gratefully acknowledged.
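
The core idea in the abstract above, merging categories and then looking for the grouping with the highest χ² statistic, can be illustrated with a small sketch. This is not the authors' assignment or set-partitioning formulation: it is an exhaustive search, it merges rows only, and it ignores must-link, cannot-link and sample-size constraints. The function names and the toy table are hypothetical and made up purely for illustration.

```python
from itertools import product

def chi2_statistic(table):
    """Pearson chi-square statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells with zero expected count
                stat += (observed - expected) ** 2 / expected
    return stat

def merge_rows(table, labels, k):
    """Sum the rows that share a group label; each label lies in range(k)."""
    merged = [[0] * len(table[0]) for _ in range(k)]
    for row, group in zip(table, labels):
        for j, value in enumerate(row):
            merged[group][j] += value
    return merged

def best_row_clustering(table, k):
    """Brute-force search over assignments of rows to k non-empty groups.

    Only feasible for very small tables; shown to make the objective concrete.
    """
    best_stat, best_labels = -1.0, None
    for labels in product(range(k), repeat=len(table)):
        if len(set(labels)) < k:  # every group must be used
            continue
        stat = chi2_statistic(merge_rows(table, labels, k))
        if stat > best_stat:
            best_stat, best_labels = stat, labels
    return best_stat, best_labels

# Toy table (made-up counts): rows 0-1 are proportional, as are rows 2-3.
toy_table = [[10, 20], [20, 40], [30, 10], [60, 20]]
stat, labels = best_row_clustering(toy_table, k=2)
print(labels, round(stat, 3))
```

Merging rows can never increase the χ² statistic, so the search is really asking which grouping loses the least evidence of dependence. In the toy table, merging proportional rows loses none, which is why the best two-group clustering keeps the full table's statistic.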

Universidad Carlos III de Madrid e-Archivo

Skill requirements and labour polarisation: An association analysis based on Polish online job offers.

Author: Arendt Lukasz
Galecka-Burdziak Ewa
Núñez Fernando
Pater Robert
Usabiaga Carlos
Publication venue: 'Elsevier BV'
Publication date: 01/08/2022

This paper uses the methodological scheme of contingency tables to explore polarisation in the Polish labour market. We use a large database of online job offers published on selected Polish job portals in the period 2017-2019, whereas most studies of the polarisation hypothesis are based on employment data. The main advantage of our microdata is that they include information on the skills required for each vacancy. The contingency table allows us to generate clusters of vacancies whose attributes tend to appear jointly. The study reveals that office skills do not offer a particular advantage in an automated labour market, while information and computer technology skills and communication skills seem to have a shielding effect in such an environment. In addition, a cluster of transversal skills (self-organisational, technical and interpersonal skills) constitutes an important requirement for most job offers. These skills should be widely developed within the educational system, at different levels.

Departamento de Economía, Métodos Cuantitativos e Historia Económica. Universidad Pablo de Olavide

Repositorio Institucional Olavide

idUS. Depósito de Investigación Universidad de Sevilla