23 research outputs found

Iterative Optimization and Simplification of Hierarchical Clusterings

Author: Fisher D.
Publication date: 01/01/1995

Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a 'tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been constructed, it is judged by analysts, often according to task-specific criteria. Several authors have abstracted these criteria and posited a generic performance task akin to pattern completion, where the error rate over completed patterns is used to 'externally' judge clustering utility. Given this performance task, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus promising to ease post-clustering analysis. Finally, we propose a number of objective functions, based on attribute-selection measures for decision-tree induction, that might perform well on the error rate and simplicity dimensions.
Comment: See http://www.jair.org/ for any accompanying files

arXiv.org e-Print Archive

CiteSeerX
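The abstract above describes control strategies that repeatedly modify an initial clustering in search of a better one. As a hedged illustration only, the Python sketch below implements one simple strategy of that kind: iterative redistribution of single objects in a flat partition, scored with category utility over nominal attributes. The choice of objective, the function names `category_utility` and `redistribute`, and the pass limit are assumptions for illustration; this is not presented as the paper's method, which also addresses hierarchical clusterings.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Obj = Tuple[str, ...]  # one object: a tuple of nominal attribute values


def _sum_sq_probs(group: Sequence[Obj]) -> float:
    """Sum over attributes i and values j of P(A_i = V_ij | group) ** 2."""
    total = 0.0
    for i in range(len(group[0])):
        counts = Counter(obj[i] for obj in group)
        total += sum((c / len(group)) ** 2 for c in counts.values())
    return total


def category_utility(clusters: List[List[Obj]]) -> float:
    """Category utility of a flat partition; empty clusters are ignored."""
    nonempty = [c for c in clusters if c]
    n = sum(len(c) for c in nonempty)
    base = _sum_sq_probs([obj for c in nonempty for obj in c])
    gain = sum(len(c) / n * (_sum_sq_probs(c) - base) for c in nonempty)
    return gain / len(nonempty)


def redistribute(clusters: List[List[Obj]], max_passes: int = 10) -> List[List[Obj]]:
    """Hill-climb by moving one object at a time to the cluster that most
    increases category utility; stop when a full pass makes no move."""
    clusters = [list(c) for c in clusters]  # work on copies
    for _ in range(max_passes):
        moved = False
        for src in range(len(clusters)):
            for obj in list(clusters[src]):
                best_score, best_dst = category_utility(clusters), src
                clusters[src].remove(obj)
                for dst in range(len(clusters)):
                    if dst == src:
                        continue
                    clusters[dst].append(obj)      # tentatively move obj
                    score = category_utility(clusters)
                    clusters[dst].pop()            # undo the tentative move
                    if score > best_score + 1e-12:
                        best_score, best_dst = score, dst
                clusters[best_dst].append(obj)     # commit the best placement
                moved = moved or best_dst != src
        if not moved:
            break
    return [c for c in clusters if c]
```

Starting from a deliberately mixed partition such as `[[r, r, b], [b, b, r]]`, with `r = ("red", "round")` and `b = ("blue", "square")`, the hill-climbing moves the misplaced objects until each cluster is homogeneous. Single-object moves are cheap but can stall in local optima, which is one reason to consider richer strategies that move whole subtrees.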

Data Stream Clustering: Challenges and Issues

Author: Khalilian Madjid
Mustapha Norwati
Publication date: 01/01/2010

Very large databases are required to store massive amounts of data that are continuously inserted and queried. Analyzing huge data sets and extracting valuable patterns are of interest to researchers in many applications. We can identify two main groups of techniques for mining huge databases: one group treats the data as a stream and applies mining techniques to it, whereas the second group attempts to solve the problem directly with efficient algorithms. Recently, many researchers have focused on data streams as an efficient alternative to mining the entire database. The main problem in data stream mining is that evolving data is harder to detect with these techniques; therefore, unsupervised methods should be applied, and clustering techniques in particular can reveal hidden information. In this survey, we try to clarify: first, the different problem definitions related to data stream clustering in general; second, the specific difficulties encountered in this field of research; third, the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and fourth, how several prominent solutions tackle these problems. Index Terms: Data Stream, Clustering, K-Means, Concept drift
Comment: IMECS201

arXiv.org e-Print Archive

CiteSeerX
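The survey above lists K-Means among its index terms. A common way to adapt k-means to a stream is to look at each point once and update only the nearest centroid. The sketch below is a minimal, hypothetical `SequentialKMeans` class, assuming numeric points, seeding with the first k points, and a 1/n step size so that each centroid stays the running mean of the points assigned to it. The optional constant step `alpha` is an assumption one might use so that older points fade out under concept drift; it is not taken from the survey.

```python
from typing import List, Optional, Sequence


class SequentialKMeans:
    """Single-pass k-means: each arriving point updates its nearest centroid."""

    def __init__(self, k: int, alpha: Optional[float] = None):
        self.k = k
        self.alpha = alpha  # None -> 1/n step (running mean); float -> constant step
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def _nearest(self, x: Sequence[float]) -> int:
        # Squared Euclidean distance to every centroid; ties go to the lower index.
        dists = [sum((a - c) ** 2 for a, c in zip(x, cen)) for cen in self.centroids]
        return dists.index(min(dists))

    def update(self, x: Sequence[float]) -> int:
        """Absorb one point and return the index of the cluster it joined."""
        if len(self.centroids) < self.k:  # seed with the first k points
            self.centroids.append([float(v) for v in x])
            self.counts.append(1)
            return len(self.centroids) - 1
        j = self._nearest(x)
        self.counts[j] += 1
        step = self.alpha if self.alpha is not None else 1.0 / self.counts[j]
        self.centroids[j] = [c + step * (a - c) for c, a in zip(self.centroids[j], x)]
        return j
```

Memory stays at O(k) regardless of stream length, which is the property that makes this family of methods attractive for streams; the price is sensitivity to the seeding order.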

Comparison and validation of community structures in complex networks

Author: Anna Lombardi
Michael Hörnquist
Mika Gustafsson
Publication venue: 'Elsevier BV'
Publication date: 10/01/2006

The issue of partitioning a network into communities has attracted a great deal of attention recently. Most authors seem to equate this issue with that of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is NP-hard, most effort has gone into the construction of search algorithms, and less into other measures of community structure, similarities between various partitionings, and validation with respect to external information. Here we concentrate on a class of computer-generated networks and on three well-studied real networks that constitute a benchmark for network studies: the karate club, the US college football teams, and a gene network of yeast. We utilize some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure, and show by examples their features and drawbacks. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is possible here since we know everything about the computer-generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated by use of the Gene Ontology database, where we show that a community in general corresponds to a biological process.
Comment: To appear in Physica A; 25 pages

arXiv.org e-Print Archive

Crossref

CERN Document Server
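The abstract above refers to Newman's modularity. For an undirected, unweighted graph with m edges it can be written Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), which rearranges to Q = Σ_c [l_c / m − (d_c / 2m)^2], where l_c is the number of edges inside community c and d_c is the total degree of its nodes. The sketch below computes the second form from an edge list; the function name `modularity` and the assumption of no self-loops or duplicate edges are illustrative choices, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman modularity of a partition of an undirected, unweighted graph.

    Uses Q = sum_c [ l_c / m - (d_c / (2m))**2 ], where l_c counts edges inside
    community c and d_c is the total degree of its nodes. Assumes no self-loops
    or duplicate edges.
    """
    m = 0
    inside: Dict[Hashable, int] = defaultdict(int)
    degree: Dict[Hashable, int] = defaultdict(int)
    for u, v in edges:
        m += 1
        cu, cv = community[u], community[v]
        degree[cu] += 1  # each edge adds 1 to the degree of both endpoints
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, the natural two-community split gives Q = 2(3/7 − 1/4) = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0. Maximising Q over all partitions is the NP-hard search problem the abstract mentions; evaluating Q for a given partition, as here, is linear in the number of edges.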

Optimal metric for condition rating of existing buildings: is five the right number?

Author: Aguado de Cea Antonio
Casas Rius Joan Ramon
Ruiz Gorrindo Félix
Serrat Piè Carles
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2019

This is an Accepted Manuscript of an article published by Taylor & Francis in Structure and Infrastructure Engineering in January 2019, available online: http://www.tandfonline.com/10.1080/15732479.2018.1557702
In recent years, the concept of maintenance in the built environment has shifted from corrective to preventive maintenance. There is evidence that preventive maintenance is much more efficient than corrective maintenance, since severe deterioration that may endanger people is avoided and money is also saved. Periodic inspections of buildings are useful for quantifying the extent to which deterioration is severe, in order to facilitate decision making and prioritise interventions. To this end, many scales have been, and still are, used to assess the severity of damage and degradation of building components. However, there is evidently no consensus among users: the scales differ from one another, with different numbers of degrees and different metrics for measuring the condition state. The main goal of this paper is to calculate the optimal metric (i.e., the optimal number of degrees) of a severity scale for damage in buildings, so that the corresponding scale could come into widespread, common use among professionals, avoiding the problems of comparison between different evaluators. The proposed methodology for calculating the optimal metric of a scale can also be extended to other scopes.
Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Stepwise iterative maximum likelihood clustering approach

Author: Alok Sharma
Daichi Shigemizu
Keith A. Boroevich
Michiaki Kubo
Tatsuhiko Tsunoda
Yoichiro Kamatani
Yosvany López
Publication venue: 'Springer Science and Business Media LLC'

Crossref

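Application of ART neural networks and texture analysis to classify the alteration state of mineral aggregates (original title: Aplicação de redes neurais ART e análise de textura para a classificação do estado de alteração de agregados minerais)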

Author: de Gouveia Lilian Tais
Senger Luciano Jose
Publication venue: 'Universidade Federal do Rio Grande do Sul'
Publication date: 16/03/2011

A new approach is presented for identifying the alteration state of mineral aggregates intended for civil construction works. Such identification is fundamentally important for avoiding failures and premature defects in construction works that can be attributed to the quality of the aggregate used, with respect to its alteration state. Image processing techniques are employed to acquire the histograms of the images' color channels, followed by computing the entropy of the histograms, which provides the main features for classification. Finally, an incremental knowledge-acquisition and classification model employing ART (Adaptive Resonance Theory) neural networks is built to automate the classification process. The classification model is organized in two stages. In the first stage, the aggregates are classified as altered or unaltered; in the second stage, the group of altered aggregates is classified according to its degree of alteration. The proposed model yields better classification results than those obtained with other classification algorithms.

Em Questao

Archives of the Faculty of Veterinary Medicine UFRGS
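The aggregate-classification work above uses the entropy of each color channel's histogram as the main classification features. A minimal sketch of that feature step, assuming 8-bit RGB pixels given as tuples and Shannon entropy in bits, is shown below; the function name `channel_entropies`, the `bins` parameter, and the choice of RGB are assumptions, since the abstract does not state the color space, bin count, or logarithm base. The ART classifier itself is not sketched.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def channel_entropies(pixels: Sequence[Tuple[int, int, int]], bins: int = 256) -> List[float]:
    """Shannon entropy (in bits) of the histogram of each color channel.

    `pixels` holds 8-bit (R, G, B) values in 0..255; `bins` sets histogram resolution.
    """
    n = len(pixels)
    width = 256 / bins
    features = []
    for ch in range(3):
        hist = Counter(int(p[ch] // width) for p in pixels)  # bin index per pixel
        probs = (count / n for count in hist.values())       # empty bins never appear
        features.append(sum(-q * math.log2(q) for q in probs))
    return features
```

A uniform channel yields 0 bits and a channel split evenly between two values yields 1 bit, so the three numbers summarise how varied each channel's intensities are; a model such as the ART network described would then receive this short feature vector rather than raw pixels.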

2D–EM clustering approach for high-dimensional data through folding feature vectors

Author: Alok Sharma
Piotr J. Kamola
Tatsuhiko Tsunoda
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Method and system for data clustering for very large databases

Author: Livny Miron
Ramakrishnan Raghu
Zhang Tian
Publication date: 03/11/1998

Multi-dimensional data contained in very large databases are clustered efficiently and accurately to determine patterns therein and to extract useful information from such patterns. Conventional computer processors with limited memory capacity and conventional operating speed may be used, allowing massive data sets to be processed in a reasonable time and with reasonable computer resources. The clustering process is organized using a clustering feature tree structure, wherein each clustering feature comprises the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can be carried out continuously as new data points are received and processed, with the clustering feature tree restructured as necessary to accommodate the information from the newly received data points.

NASA Technical Reports Server
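The patent abstract above summarises each cluster by its number of points, linear sum, and square sum. Such a triple is additive, so two clusters can be merged without revisiting their points, and the centroid and radius follow directly from it. The sketch below illustrates this under stated assumptions: the square sum is kept as a single scalar (the sum of squared norms), and the class name `ClusteringFeature` and its methods are hypothetical. Tree construction, absorption thresholds, and outlier handling are not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Summary of a set of points: count, linear sum, square sum."""
    n: int = 0
    ls: List[float] = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                                # sum of squared norms

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        return cls(1, [float(v) for v in x], float(sum(v * v for v in x)))

    def __add__(self, other: "ClusteringFeature") -> "ClusteringFeature":
        """Merging two clusters just adds their summaries (additivity)."""
        if self.n == 0:
            return ClusteringFeature(other.n, list(other.ls), other.ss)
        if other.n == 0:
            return ClusteringFeature(self.n, list(self.ls), self.ss)
        return ClusteringFeature(self.n + other.n,
                                 [a + b for a, b in zip(self.ls, other.ls)],
                                 self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the points from the centroid."""
        mean_sq_norm = self.ss / self.n
        centroid_sq_norm = sum(c * c for c in self.centroid())
        return math.sqrt(max(mean_sq_norm - centroid_sq_norm, 0.0))
```

For the four corners of a 2 x 2 square, the summary is n = 4, linear sum (4, 4), and square sum 16, giving centroid (1, 1) and radius √2. Because only these few numbers are stored per cluster, memory use is independent of how many points a dense region absorbs, which matches the abstract's goal of working within limited memory.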

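The influence of family functioning on students' cognitive distortion, with resilience as a mediator (original title: Pengaruh kefungsian keluarga terhadap pengherotan kognitif pelajar dan ketahanan diri sebagai perantara)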

Author: Siti Hajar Mohamad Yusoff
Publication date: 01/01/2017

In line with the National Education Philosophy (NEP), the Malaysian Education Blueprint (2013-2025) was launched to produce students who are capable of thinking constructively and able to face obstacles, while developing their leadership skills and potential. Family functioning also plays a role in developing students' resilience and positive thinking. Therefore, this study aims to identify the influence of family functioning on students' cognitive distortion, with resilience as a mediator. A quantitative approach was used, with cross-sectional surveys applied in the data collection process. The instruments used were the Family Adaptability and Cohesion Evaluation Scales III, the Resilient Scale, and the Cognitive Distortion Scale. A total of 376 daily secondary school students in the Northern Zone participated in this study, selected by systematic random sampling and disproportionate stratified random sampling. Quantitative data were analyzed with t-tests, Analysis of Variance (ANOVA), Pearson correlation, and multiple regression using the Statistical Package for the Social Sciences for Windows (SPSS). The findings showed significant relationships between family functioning and both resilience and cognitive distortion. The results of the multiple regression analysis showed that family functioning contributed to the resilience and cognitive distortion experienced by students. The findings also showed that resilience acts as a partial mediator in the relationship between family functioning and cognitive distortion, with five main dimensions of resilience (self assurance, problem solver, organized, socially connected, and proactive) identified as the main contributors.
In conclusion, this study builds a significant theoretical framework showing that a good level of family functioning can affect resilience and cognitive distortion among students, and it can contribute to various parties such as school management, counselors, parents, and researchers in producing future leaders who are capable in various aspects.

Universiti Utara Malaysia: UUM eTheses
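The thesis above reports resilience as a partial mediator between family functioning and cognitive distortion, analysed with multiple regression in SPSS. The exact procedure is not stated. As a hedged sketch, the regression-based product-of-coefficients decomposition below splits the total effect c of X on Y into a direct effect c' and an indirect effect a*b; with ordinary least squares on the same sample, c = c' + a*b holds exactly. Partial mediation corresponds to both a*b and c' being non-zero. The function names are hypothetical, and significance testing (for example a Sobel test or a bootstrap interval) is omitted.

```python
import numpy as np


def _ols(y: np.ndarray, *predictors: np.ndarray) -> np.ndarray:
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mediation_effects(x, m, y) -> dict:
    """Product-of-coefficients decomposition for one mediator m between x and y."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    total = _ols(y, x)[1]         # c: y regressed on x alone
    a = _ols(m, x)[1]             # a: mediator regressed on x
    _, direct, b = _ols(y, x, m)  # c' and b: y regressed on x and m together
    return {"total": total, "direct": direct, "indirect": a * b, "a": a, "b": b}
```

Here x would stand for family functioning scores, m for resilience scores, and y for cognitive distortion scores. In practice one would also report confidence intervals before calling a mediation "partial", since a non-zero point estimate alone does not establish it.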