3 research outputs found

    An Enhanced Expectation Maximization Text Document Clustering Algorithm for E-Content Analysis

    Nowadays, many types of digital materials can be used in the classroom. Students and scholars are migrating from textbooks to digital study materials because textbooks are bulky and expensive. Teachers and college students can use and modify materials that are available freely or under some constraints for their learning and teaching. E-content can be designed, evolved, utilized, re-used, and distributed electronically from anywhere at any time. Because of its flexibility in the time, place, and pace of learning, e-content is becoming extremely popular. It can be shared instantly with an unlimited number of clients across the globe. Document clustering is most commonly used to group documents that are related to a specific topic. Text document clustering can be used to group a collection of documents according to the information they contain and to deliver search results when a user searches the internet. This paper focuses mainly on text document clustering to cope with massive collections of e-content documents. The Enhanced Expectation Maximization Text Document Clustering (EEMTDC) algorithm is proposed and compared with the Expectation Maximization (EM), K-Means, and Hierarchical Clustering (HC) algorithms. The experiments show that the proposed EEMTDC algorithm achieves greater clustering accuracy than the existing clustering algorithms.

    On mathematical optimization for clustering categories in contingency tables

    Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to attain reasonable sample sizes in the cells of the table and, more importantly, to incorporate expert knowledge on the allowable groupings. However, it is known that the conclusions on independence depend, in general, on the chosen granularity, as in the Simpson paradox. In this paper we propose a methodology to find, for a given contingency table and a fixed granularity, a clustered table with the highest χ² statistic. Repeating this procedure for different values of the granularity, we can either identify an extreme grouping, namely the largest granularity for which the statistical dependence is still detected, or conclude that it does not exist and that the two variables are dependent regardless of the size of the clustered table. For this problem, we propose an assignment mathematical formulation and a set partitioning one. Our approach is flexible enough to include constraints on the desirable structure of the clusters, such as must-link or cannot-link constraints on the categories that can, or cannot, be merged together, and ensure reasonable sample sizes in the cells of the clustered table from which trustful statistical conclusions can be derived. We illustrate the usefulness of our methodology using a dataset of a medical study.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has been financed in part by research projects EC H2020 MSCA RISE NeEDS (Grant agreement ID: 822214), FQM-329, P18-FR-2369 and US-1381178 (Junta de Andalucía, with FEDER Funds), PID2019-110886RB-I00 and PID2019-104901RB-I00 (funded by MCIN/AEI/10.13039/501100011033). This support is gratefully acknowledged.
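    To make the idea of a clustered table with the highest χ² statistic concrete, the sketch below computes the Pearson χ² statistic of a small table and then searches, by plain enumeration, for the split of its rows into two groups that keeps the statistic highest. It is only an illustration under stated assumptions: the counts are invented, only rows are merged, every row and column total is assumed to be positive, and brute-force enumeration stands in for the assignment and set partitioning formulations mentioned in the abstract, which are not reproduced here.

```python
from itertools import combinations


def chi2_statistic(table):
    """Pearson chi-squared statistic of a contingency table (list of rows).

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def merge_rows(table, groups):
    """Sum the rows inside each group, giving a coarser (clustered) table."""
    n_cols = len(table[0])
    return [[sum(table[i][j] for i in group) for j in range(n_cols)]
            for group in groups]


def best_two_row_clustering(table):
    """Try every split of the rows into two non-empty groups.

    Returns (statistic, groups) for the split with the highest chi-squared.
    Exhaustive search, so only practical for very small tables.
    """
    rows = range(len(table))
    best = None
    for size in range(1, len(table)):
        for group in combinations(rows, size):
            if 0 not in group:
                continue  # row 0 always sits in the first group, so no split is tried twice
            rest = tuple(i for i in rows if i not in group)
            stat = chi2_statistic(merge_rows(table, [group, rest]))
            if best is None or stat > best[0]:
                best = (stat, [group, rest])
    return best


# Hypothetical counts: three row categories, two column categories.
example = [[30, 10],
           [28, 12],
           [10, 30]]

full_stat = chi2_statistic(example)
clustered_stat, clustered_groups = best_two_row_clustering(example)
print(full_stat, clustered_stat, clustered_groups)
```

    In this invented example the first two rows have similar profiles, so merging them loses very little of the statistic, whereas either other merge loses most of it. Constraints such as must-link or cannot-link pairs could be imitated here by skipping disallowed groups inside the loop.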

    Skill requirements and labour polarisation: An association analysis based on Polish online job offers.

    Abstract. This paper uses the methodological scheme of contingency tables to explore polarisation in the Polish labour market. We use a large database of online job offers published on selected Polish job portals in the period 2017-2019, whereas most studies on the polarisation hypothesis are based on employment data. The main advantage of our microdata is the use of information on the skills required for the vacancy. The contingency table allows us to generate clusters of vacancies whose attributes tend to appear jointly. The study reveals that office skills do not offer a particular advantage in an automated labour market, while information and computer technology skills and communication skills seem to have a shield effect in such an environment. In addition, a cluster of transversal skills (self-organisational, technical and interpersonal skills) constitutes an important requirement for most job offers. These skills should be widely developed within the educational system, at different levels.
    Departamento de Economía, Métodos Cuantitativos e Historia Económica, Universidad Pablo de Olavide