Discovering Better AAAI Keywords via Clustering with Community-Sourced Constraints
Selecting good conference keywords is important because they often determine the composition of review committees and hence which papers are reviewed by whom. But presently conference keywords are generated in an ad hoc manner by a small set of conference organizers. This approach is plainly not ideal. There is no guarantee, for example, that the generated keyword set aligns with what the community is actually working on and submitting to the conference in a given year. This is especially true in fast-moving fields such as AI. The problem is exacerbated by the tendency of organizers to draw heavily on preceding years' keyword lists when generating a new set. Rather than a select few ordaining a keyword set that represents AI at large, it would be preferable to generate these keywords more directly from the data, with input from research community members. To this end, we solicited feedback from seven AAAI PC members regarding a previously existing keyword set and used these 'community-sourced constraints' to inform a clustering over the abstracts of all submissions to AAAI 2013. We show that the keywords discovered via this data-driven, human-in-the-loop method are at least as preferred (by AAAI PC members) as 2013's manually generated set, and that they include categories previously overlooked by organizers. Many of the discovered terms were used for this year's conference.
Clustering de Documentos con Restricciones de Tamaño
[EN] Cluster analysis aims to divide data objects into groups so that objects within a
group are very similar to one another and different from objects in other groups. Traditionally,
clustering is known as a method of unsupervised learning, which groups data objects
only based on the information presented in the dataset, without external information.
K-Medoids is one of the best-known and simplest clustering algorithms; the
user specifies the desired number of clusters.
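As an illustration of the K-Medoids idea described above, here is a minimal sketch of the alternating assign/update variant in Python. It assumes a precomputed pairwise distance matrix; the function name `k_medoids`, its parameters, and the toy data are hypothetical and not taken from the thesis, which may use a different variant.

```python
import random


def k_medoids(dist, k, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed distance matrix (illustrative sketch).

    dist: square list of lists with pairwise distances between objects
    k:    number of clusters, chosen by the user
    Returns (medoids, labels), where labels[i] is an index into medoids.
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # k distinct objects as initial medoids

    def assign(meds):
        # Each object joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Can only happen with duplicate objects; keep the old medoid.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy usage: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Unlike K-Means, the cluster representatives here are always actual data objects, so only pairwise distances are needed, which suits document similarity measures.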
In many real-world applications, such as image coding, spatial clustering in geoinformatics,
customer segmentation or grouping of documents, there are usually
constraints or priorities in the problem definition that limit the space of possible
solutions or rank solutions by interest. Problems of this kind are
addressed by semi-supervised clustering methods.
This work aims to design, implement and test modifications to traditional clustering
algorithms that incorporate a size constraint on each cluster. Specifically, two new
semi-supervised clustering algorithms are proposed, based respectively on binary integer linear
programming with cannot-link constraints and on a variation of the K-Medoids algorithm.
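For orientation, one common way to write a size-constrained assignment with cannot-link constraints as a binary integer linear program is sketched below. The abstract does not give the thesis's exact formulation, so everything here is an assumption: the k cluster representatives are taken as fixed, d_{ij} is the distance from object i to representative j, x_{ij} = 1 means object i is assigned to cluster j, s_j is the required size of cluster j (with the s_j summing to n), and \mathcal{C} is the set of cannot-link pairs.

```latex
\begin{align}
\min_{x}\quad & \sum_{i=1}^{n}\sum_{j=1}^{k} d_{ij}\,x_{ij} \\
\text{s.t.}\quad
  & \sum_{j=1}^{k} x_{ij} = 1        && i = 1,\dots,n
    && \text{(each object in exactly one cluster)} \\
  & \sum_{i=1}^{n} x_{ij} = s_j      && j = 1,\dots,k
    && \text{(size constraint on each cluster)} \\
  & x_{ij} + x_{i'j} \le 1           && (i,i') \in \mathcal{C},\ j = 1,\dots,k
    && \text{(cannot-link pairs are separated)} \\
  & x_{ij} \in \{0,1\}               && i = 1,\dots,n,\ j = 1,\dots,k
\end{align}
```

In the conference-scheduling setting, s_j would play the role of the number of papers a session can hold.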
The applicability of the proposed semi-supervised clustering methods is illustrated by
addressing the problem of automatic configuration of conference schedules by clustering
articles by similarity. We include experiments applying the new techniques to real
conference datasets: ICMLA-2014, AAAI-2013 and AAAI-2014. The results of these
experiments show that the new methods are able to solve practical, real-world problems.
Vallejo Huanga, DF. (2016). Clustering de Documentos con Restricciones de Tamaño. http://hdl.handle.net/10251/69089
TFG