    Using KNN Algorithms for Determining the Recipient of Smart Indonesia Scholarship Program

    The Smart Indonesia Card (KIP) scholarship is a government program, administered through the Ministry of Religion of the Republic of Indonesia, for students with good academic records but limited economic means. Sultan Syarif Kasim State Islamic University, Riau admits new students every year, but the quota for the KIP scholarship is limited. Because of this limit, a system is needed that can classify the applications submitted by students who register for the program, so that selection can be carried out quickly, accurately, and within the required quota. This study used the K-Modes and K-Nearest Neighbor (KNN) algorithms with the following variables: achievements, report cards, and national exam scores from high school, father's income, parental status, and homeownership status. The data were preprocessed before testing. Testing proceeded in three stages: clustering with the K-Modes algorithm, then validating the data with the Grid Search Cross-Validation (GSCV) method, and finally predicting with the KNN algorithm. The test resulted in a performance value of 66.79
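
    The last two stages described above (choosing a KNN setting by cross-validation, then predicting) can be illustrated with a minimal pure-Python sketch. It is not the study's implementation: the toy records, the labels, the use of Hamming distance on categorical attributes, and the helper names (`knn_predict`, `cv_accuracy`, `grid_search_k`) are all assumptions made for illustration, and the K-Modes clustering stage is omitted.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training records closest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(X, y, k, folds):
    """Accuracy pooled over `folds` interleaved cross-validation folds."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)

def grid_search_k(X, y, candidates, folds=3):
    """Score every candidate k by cross-validation and return the best one."""
    scores = {k: cv_accuracy(X, y, k, folds) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores

# Hypothetical applicants: (father's income level, parental status, home ownership).
X = [
    ("low", "orphan", "rent"),
    ("low", "both", "rent"),
    ("low", "single", "none"),
    ("high", "both", "own"),
    ("high", "both", "rent"),
    ("mid", "both", "own"),
]
y = ["yes", "yes", "yes", "no", "no", "no"]  # hypothetical eligibility labels

best_k, scores = grid_search_k(X, y, candidates=[1, 3])
prediction = knn_predict(X, y, ("high", "both", "own"), k=3)
```

    A real grid search would usually cover more hyperparameters than `k` (for example the distance function), but the selection loop has the same shape.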


    Indonesia is currently classified as a country with an aging population structure, because its elderly population exceeds 7% of the total population; 2% of it comes from South Sumatra. The large number of elderly citizens requires the government to formulate special policies and programs to support this population. To help the local government of South Sumatra determine such policies and programs, the elderly population needs to be clustered, here using the k-modes algorithm available in RStudio. This study uses 2010 population census data for South Sumatra obtained from Badan Pusat Statistik, with a sample of 47,358 records. The study produced four clusters: K1 with 16,244 people, K2 with 6,061, K3 with 18,681, and K4 with 6,372. K1 is a group of mostly elderly men who live in villages and still work in agriculture and plantations. K2 is a cluster of women who still work and live in villages. K3 is an unemployed elderly group that mostly lives in cities, 25% of whom live alone. K4 is a cluster of women who no longer work and live in villages, 73% of whom are illiterate. With these clusters, the government can determine which measures are most appropriate for each cluster.

    A Global-Relationship Dissimilarity Measure for the k

    The k-modes clustering algorithm has been widely used to cluster categorical data. In this paper, we first analyze the k-modes algorithm and its dissimilarity measure. Based on this analysis, we propose a novel dissimilarity measure, named GRD. GRD considers not only the relationships between an object and all cluster modes but also the differences between attributes. Finally, experiments on four real data sets from UCI show that GRD achieves better performance than the two existing dissimilarity measures used in the k-modes and Cao's algorithms.

    Identification of improvement opportunities related to production planning and control in the core processes of the Brazilian Army logistics command

    Undergraduate final project (Trabalho de Conclusão de Curso), Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia de Produção, 2019. A recurring productivity challenge in the Brazilian public and private sectors is the elimination of waste. The public sector is at a disadvantage compared with the private sector because of legislative constraints, but it is clear that eliminating waste is the way to reduce costs and make better use of available resources. This research therefore performs statistical analyses of selected production planning and control (PCP) variables for some of the Brazilian Army's (EB) core logistics processes in information technology, aviation, ammunition, and armaments. The research considered the supply procurement planning activities of the EB logistics command. The analysis started from 27 processes mapped in Business Process Model and Notation (BPMN). The elements identified in the maps were classified in Excel at the level of each process's activities (1093 in total), and this classification was analyzed to list the wastes and the activities that added no value to the process. After these variables were arranged in a numerical matrix that supports statistical tests such as chi-square, association, and clustering, the activities of the processes involved could be classified quantitatively. The results are presented in tables because the sample size made it impractical to show the correlations between the variables of the binary matrix in line graphs.
Nevertheless, the research indicated the high impact of 3 of the 7 major wastes of lean thinking (over-processing, waiting, and transportation) as well as their relationships with the other significant variables presented.
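
    As an illustration of the kind of chi-square test on binary variables mentioned above, the following sketch computes the Pearson chi-square statistic and its p-value (1 degree of freedom, no continuity correction) for two 0/1 columns. The column names and values are hypothetical and are not taken from the study's matrix.

```python
import math

def chi_square_2x2(a, b):
    """Pearson chi-square statistic and p-value (1 d.f.) for two 0/1 sequences."""
    n = len(a)
    # Observed counts: table[i][j] = number of rows with a == i and b == j.
    table = [[0, 0], [0, 0]]
    for x, y in zip(a, b):
        table[x][y] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in row or 0 in col:
        raise ValueError("each variable must take both values 0 and 1")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical activity flags: 1 = waste of type "waiting", 1 = adds no value.
waiting = [1, 1, 1, 1, 0, 0, 0, 0]
no_value = [1, 1, 1, 0, 0, 0, 0, 1]
stat, p_value = chi_square_2x2(waiting, no_value)
```

    With samples this small an exact test would normally be preferred; the sketch only shows how the statistic is assembled from observed and expected counts.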

    An Efficient k-modes Algorithm for Clustering Categorical Datasets

    Mining clusters from data is an important endeavor in many applications. The k-means method is a popular, efficient, and distribution-free approach for clustering numerical-valued data, but does not apply to categorical-valued observations. The k-modes method addresses this lacuna by replacing the Euclidean with the Hamming distance and the means with the modes in the k-means objective function. We provide a novel, computationally efficient implementation of k-modes, called OTQT. We prove that OTQT finds updates to improve the objective function that are undetectable to existing k-modes algorithms. Although slightly slower per iteration due to algorithmic complexity, OTQT is always more accurate per iteration and almost always faster (and only barely slower on some datasets) to the final optimum. Thus, we recommend OTQT as the preferred, default algorithm for k-modes optimization. Comment: 16 pages, 10 figures, 5 tables
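
    For orientation, the baseline that the abstract describes (k-means with Hamming distance in place of Euclidean distance and modes in place of means) can be sketched as a simple alternating procedure. This is a plain Lloyd-style k-modes, not the OTQT algorithm; the function names, the toy data, and the choice of initial modes are assumptions for illustration.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent category in each column of `rows`."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def cost(data, modes, labels):
    """k-modes objective: total Hamming distance from records to their modes."""
    return sum(hamming(row, modes[c]) for row, c in zip(data, labels))

def k_modes(data, initial_modes, max_iter=100):
    """Alternate assignment and mode-update steps until labels stop changing."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda c: hamming(row, modes[c]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each mode becomes the column-wise mode of its members.
        for c in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the old mode if a cluster is empty
                modes[c] = column_modes(members)
    return modes, labels

data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]
modes, labels = k_modes(data, initial_modes=[data[0], data[3]])
total_cost = cost(data, modes, labels)
```

    Like k-means, this procedure only reaches a local optimum that depends on the initial modes, which is the setting in which alternative update rules such as the one proposed in the paper become relevant.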

    Multimorbidity patterns in inpatients with prostate cancer in Portugal: a cluster analysis approach

    Multimorbidity can be defined as the co-occurrence of two or more chronic conditions. It is common among cancer patients, increases the risk of negative outcomes such as premature death, serious complications, and poor quality of life, and makes care more complex. The coexistence of multiple chronic conditions alongside cancer requires adapted and integrated health care approaches, which poses a significant challenge for patients, physicians, and healthcare services as a whole. This study aims to use cluster analysis to identify and characterize multimorbidity patterns among prostate cancer patients using clinically coded hospital data. Data on hospital admissions with a diagnosis of prostate cancer in all public hospitals in mainland Portugal during 2011-2017 were considered. Two partitioning algorithms, k-modes and PAM (Partitioning Around Medoids), and hierarchical clustering were employed to identify multimorbidity clusters. The results of the different clustering approaches were compared and assessed for clinical relevance. A total of 10,394 inpatient episodes were analyzed, of which 6,091 (58.6%) reported multimorbidity. The different approaches produced similar clusters, with PAM showing high stability and better results in terms of average silhouette coefficient.
The analysis of the 6 clusters obtained with PAM indicates a pattern of diabetes co-occurring with hypertension and a high co-occurrence of single comorbidities, namely hypertension, chronic pulmonary disease, obesity, and arrhythmia, with prostate cancer itself. Cluster analysis was a useful approach to detect and characterize the different patterns and profiles of multimorbidity among prostate cancer hospitalizations in Portugal. Greater integration between cancer care and comorbidity care should be reinforced to meet the needs of patients with multiple chronic conditions.
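
    The average silhouette coefficient used above to compare clusterings can be sketched as follows for binary comorbidity indicators. The Hamming dissimilarity, the indicator columns, and the toy records are assumptions for illustration; the study's actual dissimilarity measure and data are not given in the abstract.

```python
def hamming(a, b):
    """Number of positions at which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_distance(row, others):
    """Mean Hamming distance from `row` to each record in `others`."""
    return sum(hamming(row, o) for o in others) / len(others)

def average_silhouette(data, labels):
    """Mean silhouette width over all records (singletons contribute 0)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, row in enumerate(data):
        own = [data[j] for j in range(len(data))
               if labels[j] == labels[i] and j != i]
        if not own:
            continue  # a record alone in its cluster has silhouette 0
        a = mean_distance(row, own)  # cohesion within the record's own cluster
        b = min(  # separation from the nearest other cluster
            mean_distance(row, [data[j] for j in range(len(data)) if labels[j] == c])
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)

# Hypothetical indicators per episode: (diabetes, hypertension, obesity).
data = [(1, 1, 0), (1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 1)]
good = average_silhouette(data, [0, 0, 0, 1, 1, 1])
bad = average_silhouette(data, [0, 1, 0, 1, 0, 1])
```

    Values close to 1 indicate compact, well-separated clusters, so the labeling that groups similar comorbidity profiles together scores higher than the alternating one.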