115 research outputs found
Consensus-Based Agglomerative Hierarchical Clustering
Producción Científica. In this contribution, we consider a set of agents who assess a set of alternatives
through numbers in the unit interval. In this setting, we introduce a measure
that assigns a degree of consensus to each subset of agents with respect to every
subset of alternatives. This consensus measure is defined as 1 minus the outcome
obtained by applying a symmetric aggregation function to the distances between
the corresponding individual assessments. We establish some properties of the
consensus measure, some of which depend on the aggregation function used.
We also introduce an agglomerative hierarchical clustering procedure generated
by similarity functions based on these consensus measures. Ministerio de Economía, Industria y Competitividad (ECO2012-32178). Junta de Castilla y León (research project support programme, Ref. VA066U13).
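The abstract above describes the consensus measure only in words. The following minimal sketch illustrates one possible reading, under assumptions not confirmed by the source: the distance is the absolute difference, the symmetric aggregation function defaults to the arithmetic mean, a single agent has consensus 1 by convention, and the similarity between two clusters is taken to be the consensus of their union. The names `consensus` and `agglomerate` are hypothetical.

```python
from itertools import combinations


def consensus(assessments, agents, alternatives, aggregate=None):
    """Return 1 minus the aggregated pairwise distances between assessments.

    assessments: dict mapping agent -> list of values in [0, 1].
    aggregate: symmetric aggregation function (assumed: arithmetic mean).
    """
    if aggregate is None:
        aggregate = lambda ds: sum(ds) / len(ds)
    distances = [
        abs(assessments[a][x] - assessments[b][x])  # assumed distance
        for a, b in combinations(sorted(agents), 2)
        for x in alternatives
    ]
    if not distances:
        # Assumption: a single agent is in full consensus with itself.
        return 1.0
    return 1.0 - aggregate(distances)


def agglomerate(assessments, alternatives):
    """Merge clusters of agents step by step, highest consensus first."""
    clusters = [frozenset([agent]) for agent in assessments]
    merges = []
    while len(clusters) > 1:
        # Assumed similarity: consensus of the union of the two clusters.
        left, right = max(
            combinations(clusters, 2),
            key=lambda pair: consensus(
                assessments, pair[0] | pair[1], alternatives
            ),
        )
        merged = left | right
        level = consensus(assessments, merged, alternatives)
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(merged)
        merges.append((set(left), set(right), level))
    return merges


if __name__ == "__main__":
    data = {"a": [0.1, 0.9], "b": [0.2, 0.8], "c": [0.9, 0.1]}
    for left, right, level in agglomerate(data, alternatives=[0, 1]):
        print(left, right, round(level, 3))
```

Passing another symmetric function as `aggregate` (for example `max`) changes the consensus values and may change the merge order, which matches the abstract's remark that some properties depend on the aggregation function.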
Consensus-based clustering under hesitant qualitative assessments
Producción Científica. In this paper, we consider that agents judge the feasible alternatives through linguistic terms (when they are confident in their opinions) or through linguistic expressions formed by several consecutive linguistic terms (when they hesitate). In this context, we propose an agglomerative hierarchical clustering process in which the clusters of agents are generated by using a distance-based consensus measure. Ministerio de Economía, Industria y Competitividad (ECO2012-32178). Junta de Castilla y León (research project support programme, Ref. VA066U13).
Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review
Fifty years have gone by since the publication of the first paper on clustering based on fuzzy set theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba, and Zadeh [33], in which they proposed a prototype clustering algorithm based on fuzzy set theory.
Modified balanced random forest for improving imbalanced data prediction
This paper proposes a Modified Balanced Random Forest (MBRF) algorithm as a classification technique to address imbalanced data. MBRF modifies the Balanced Random Forest procedure by applying a clustering-based under-sampling strategy to the bootstrap sample of each decision tree in the Random Forest algorithm. To find the best-performing variant of the proposed method, four clustering techniques are compared: K-Means, Spectral Clustering, Agglomerative Clustering, and Ward Hierarchical Clustering. The experimental results show that the Ward Hierarchical Clustering technique achieved the best performance, and that the proposed MBRF method outperformed the Balanced Random Forest (BRF) and Random Forest (RF) algorithms, with a sensitivity or true positive rate (TPR) of 93.42%, a specificity or true negative rate (TNR) of 93.60%, and the best AUC value of 93.51%. Moreover, MBRF also reduced the running time.
Knowledge aggregation in people recommender systems: matching skills to tasks
People recommender systems (PRS) are a special type of recommender system (RS). They are often adopted to identify people capable of performing a task. Recommending people poses several challenges not exhibited in traditional RS. Elements such as availability, overload, unresponsiveness, and bad recommendations can have adverse effects. This thesis explores how people's preferences can be elicited for single-event matchmaking under uncertainty and how to align them with appropriate tasks. Different methodologies are introduced to profile people, each based on the nature of the information from which the profile is obtained. These methodologies are developed into three use cases to illustrate the challenges of PRS and the steps taken to address them. Each one emphasizes the priorities of the matching process and the constraints under which the recommendations are made. First, multi-criteria profiles are derived entirely from heterogeneous sources in an implicit manner, characterizing users from multiple perspectives and multi-dimensional points of view without influence from the user. The profiles are applied to the conference reviewer assignment problem. Attention is given to distributing people across items in order to reduce the potential overloading of a person and the neglect or rejection of a task. Second, people's areas of interest are inferred from their resumes and expressed in terms of their uncertainty, avoiding explicit elicitation from an individual or an outsider. The profile is applied to a personnel selection problem in which emphasis is placed on the preferences of the candidate, leading to an asymmetric matching process. Third, profiles are created by integrating implicit information and explicitly stated attributes. A model is developed to classify citizens according to their lifestyles, which maintains the original information in the data set throughout the cluster formation. These use cases serve as pilot tests for generalization to real-life implementations.
Areas for future application are discussed from new perspectives. Postprint (published version)
COMMUNITY DETECTION AND INFLUENCE MAXIMIZATION IN ONLINE SOCIAL NETWORKS
Detecting and clustering data and users into communities on the social web is an important and complex problem for developing smart marketing models in changing and evolving social ecosystems. These marketing models are driven by an individual's decision to purchase a product, which is influenced by friends and acquaintances. This leads to novel marketing models that view users as members of online social network communities, rather than the traditional view of marketing to individuals. This thesis starts by examining models that detect communities in online social networks. An enhanced community detection approach that clusters similar nodes together is then suggested. Social relationships play an important role in determining user behavior. For example, a user might purchase a product that his/her friend recently bought. Such a phenomenon is called social influence and is used to study how far the action of one user can affect the behaviors of others. Next, an original metric is proposed to compute the influential power of social network users based on logs of common actions, in order to infer a probabilistic influence propagation model. Finally, combining the community detection algorithm with the suggested influence propagation approach yields a new influence maximization model that identifies and uses the most influential users within their communities. In doing so, we employed a fuzzy-logic-based technique to determine the key users who drive this influence in their communities and diffuse a certain behavior. This original approach contrasts with previous influence propagation models, which did not exploit similarity among members of communities to maximize influence propagation. The performance results show that the model activates a higher number of overall nodes in contemporary social networks, starting from a smaller set of key users, compared with existing landmark approaches, which influence fewer nodes yet employ a larger set of key users.
Determine OWA operator weights using kernel density estimation
Some subjective methods need to divide input values into local
clusters before determining the ordered weighted averaging
(OWA) operator weights based on the data distribution characteristics
of input values. However, the process of clustering input values
is complex. In this paper, a novel probability density based
OWA (PDOWA) operator is put forward based on the data distribution
characteristics of input values. To capture the local cluster
structures of input values, kernel density estimation (KDE) is
used to estimate the probability density function (PDF) that fits
the input values. The derived PDF contains the density information
of input values, which reflects the importance of input
values. Therefore, the input values with high probability densities
(PDs) should be assigned large weights, while the ones with
low PDs should be assigned small weights. Afterwards, the
desirable properties of the proposed PDOWA operator are investigated.
Finally, the proposed PDOWA operator is applied to handle
the multicriteria decision making problem concerning the evaluation
of smart phones and it is compared with some existing
OWA operators. The comparative analysis shows that the proposed
PDOWA operator is simpler and more efficient than the
existing OWA operators.
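The density-based weighting idea in the abstract above can be illustrated with a short sketch. It is not the paper's PDOWA definition: it assumes a Gaussian kernel, a user-supplied bandwidth, and weights obtained by normalizing the estimated densities at the ordered input values so that they sum to 1. The function names and the default bandwidth are hypothetical.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a Gaussian kernel density estimate fitted to the values."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return pdf


def density_weights(values, bandwidth):
    """Weights proportional to the estimated density at each value."""
    pdf = gaussian_kde(values, bandwidth)
    densities = [pdf(v) for v in values]
    total = sum(densities)
    # Assumption: simple normalization so the weights sum to 1.
    return [d / total for d in densities]


def density_owa(values, bandwidth=1.0):
    """Aggregate values ordered from largest to smallest with density weights."""
    ordered = sorted(values, reverse=True)  # OWA reordering step
    weights = density_weights(ordered, bandwidth)
    return sum(w * v for w, v in zip(weights, ordered))


if __name__ == "__main__":
    scores = [1.0, 1.1, 0.9, 5.0]
    print(density_weights(sorted(scores, reverse=True), bandwidth=0.5))
    print(density_owa(scores, bandwidth=0.5))
```

In this sketch the isolated value 5.0 lies in a low-density region and therefore receives the smallest weight, so the aggregate is pulled toward the dense group around 1.0 rather than toward the arithmetic mean.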
- …