Proceedings of the Computing Centre. 36
http://www.ester.ee/record=b1181233*es
A study of some hierarchical methods for cluster analysis
Advisor: Gabriela Stangenhaus. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Ciencia da Computação. Abstract: Ten agglomerative hierarchical methods for cluster analysis were compared on their performance across different data structures. A factorial experiment was designed in which the factors were different aspects of data structure. The presence of overlapping groups, the within-group dispersion matrix and the correlation between variables were among the factors considered. Samples were simulated under the conditions defined by crossing the factor levels. The methods were applied to these samples and their performance was measured by how well they recovered the group structures embedded in the samples. Of the factors studied, group overlap had the greatest effect on the methods' performance. Suggestions are made for the use of some of the methods. When no outlying observations are detected in the data, the Average Linkage and Centroid methods are suggested. When such observations are present, the Two-Stage Density Linkage and Flexible-Beta methods are suggested. In both cases, or when no information about the data is available, it is suggested that Ward's method should always be used. Degree: Master's in Statistics.
An evaluation of cluster analysis and related multivariate techniques for operational research
The following work is an investigation into methods of Cluster Analysis and Ordination. The main objective of this thesis has been to investigate the capabilities of these methods for practical usage. An important subsidiary aim has been to collect together, in one work, related work which has been carried out in many different areas: ecology, biology, archaeology, psychology, etc. After a brief introduction to the general concepts of multivariate analysis in Section A of the thesis, Section B gives an introductory account of the methods of Clustering, Ordination and Seriation, putting them into the context of the by now better established multivariate techniques. Section C considers Cluster Analysis in depth, explaining and examining various methods reported in the literature, together with methods developed by the author. The suitability of the methods for practical use is discussed, and decision rules are set out for the choice of method to be used in any particular study, based on the results of extensive comparative tests of the methods. In Section D the various ordination methods are considered, giving an overall view and relating the methods to each other. Particular emphasis is placed on the rather neglected metric methods. Section E, after a survey of published applications of the methods, suggests new areas where the methods previously discussed could be valuable aids for data investigation and problem solving. An addendum is included which describes several operational research case studies using these methods. Computer programs are given for the most successful of the newly introduced cluster methods, and an extensive reference section is also included.
Clustering Information Retrieval Search Outputs
Users are known to have difficulties in dealing with information retrieval search outputs, especially if the outputs are above a certain size. It has been argued by several researchers that search output clustering can help users in their interaction with IR systems. Clustering may provide users with an overview of the output by exploiting the topicality information that resides in the output but has not been used in the retrieval stage. It can enable them to find the relevant documents more easily and also help them to form an understanding of the different facets of the query that have been provided for their inspection. This project aimed to investigate the viability of using clustering as a way of mediating users’ interaction with search outputs and attempted to identify its possible benefits.
Can & Ozkarahan’s (90) C3M algorithm was used to test the effectiveness of clustering as a way of search output presentation. C3M is a relatively simple, non-hierarchical method that has been shown to give comparable or superior results to the best-known hierarchical methods.
The method was implemented in TCL and linked to the department’s experimental IR system Okapi. Implementation included a procedure of term selection for document representation which preceded the clustering process and a procedure involving cluster representation for users’ viewing following the clustering process. After some tuning of the implementation parameters for the databases used, several experiments were designed and conducted to assess whether clusters could group documents in useful ways.
One group of experiments aimed to assess the ability of the implementation to bring together topically related documents. It was quite difficult to gather data for such an assessment, but the existence of a set of data generated for the TREC Interactive track (1996) enabled us to design experiments that at least approximately satisfied our objective. TREC provided a set of queries, and groups of relevant documents with facet assignments made by expert users. It was thus possible to make an inference by measuring the correlation between the clusters that relevant documents were assigned to and the facet assignments made for the documents by TREC experts.
The utility of this data set was limited for various reasons discussed in the related chapters; however, it can be concluded that clusters cannot be relied on to bring together relevant documents assigned to a certain facet. While there was some correlation between the cluster and facet assignments of the documents when the clustering was done only on relevant documents, no correlation could be found when the clustering was based on results of queries defined by City participants in the Interactive track.
Another group of experiments was conducted to compare output clustering with relevance ranking as a search output representation method. This comparison was necessary as an immediate consequence of clustering search output would be the loss of relevance ranking. It had to be assessed whether clustering could help users to find the relevant documents more easily than by relevance ranking, before any clustering solution could be proposed as an alternative to relevance ranked output.
For this purpose, two sets of user experiments (n=20 and n=57) were conducted based on the users’ own information needs. While changes were made to the implementation between the first and the second set of experiments, the experimental design was almost the same in both runs. Users were first asked to rank clusters formed from the search output (top 50 documents) and then make relevance judgements for the individual documents in the same output. The precision of the cluster(s) marked best by the users was then compared to the precision values that would be attained by relevance ranking at comparable thresholds.
The results from the first group of user experiments were not conclusive (in part due to the smallness of the data set), but they drew our attention to the importance of the representation of clusters and documents for users’ viewing. After some changes to the implementation, mainly related to representation issues, and an intermediate set of 10 experiments to assess two new representation formats, a set of 57 user experiments was conducted to measure and compare precision values attainable by clustering versus relevance ranking.
These experiments revealed no significant precision difference between clustered outputs and ranked lists. The number of cases where one method performed better than the other was slightly higher for the ranked lists at the top cluster level and slightly higher for the clustered representation at the top two clusters level. However, the overall average precision values were higher for the ranked list at both levels.
As such, clustering did not appear to be preferable to ranked lists, especially as it also represented overheads in both the computing time and resources involved in creating the clusters, and the time and effort taken by the users to inspect them.
An interesting outcome of the user experiments was the ability of the users to identify clusters that do not include relevant information. There were fewer relevant documents among the clusters marked last by the users than among the documents ranked last at similar threshold levels. This brought out the possibility of using clusters as an exclusion tool to improve the precision of ranked lists. After exclusion of documents from the last cluster, ranked lists performed significantly better than the clusters at the top cluster level.
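The comparison described in this paragraph can be illustrated with a minimal sketch. It assumes that a "comparable threshold" means cutting the ranked list at the same number of documents as the cluster marked best; the abstract does not state how thresholds were matched. The function names and the toy judgements are hypothetical.

```python
def precision(judgements):
    """Fraction of judgements that are True (relevant); 0.0 for an empty list."""
    return sum(judgements) / len(judgements) if judgements else 0.0


def cluster_vs_ranking(ranked_relevance, cluster_of, best_cluster):
    """Return (cluster precision, ranked-list precision at the same cutoff).

    ranked_relevance: relevance judgements (bool) in rank order.
    cluster_of: cluster label of each document, parallel to ranked_relevance.
    best_cluster: label of the cluster the user marked best.
    Assumption: the ranked list is cut at the size of the best cluster.
    """
    members = [rel for rel, c in zip(ranked_relevance, cluster_of)
               if c == best_cluster]
    cutoff = len(members)
    return precision(members), precision(ranked_relevance[:cutoff])


# Toy data: 8 ranked documents, 3 clusters (hypothetical values).
ranked_relevance = [True, False, True, False, True, True, False, False]
cluster_of = ["a", "b", "a", "b", "a", "a", "c", "c"]

cluster_p, ranked_p = cluster_vs_ranking(ranked_relevance, cluster_of, "a")
print(cluster_p, ranked_p)
```

In this toy case the best cluster holds four documents, so its precision is compared with the precision of the top four ranked documents.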
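The exclusion idea can be sketched as follows: documents from the cluster the user marked last are dropped from the ranked list, and precision is recomputed at a fixed cutoff. This is an illustration under assumptions, not the procedure used in the thesis; the function names, the cutoff, and the toy data are hypothetical.

```python
def precision_at(judgements, k):
    """Precision over the first k judgements; 0.0 if there are none."""
    top = judgements[:k]
    return sum(top) / len(top) if top else 0.0


def exclude_cluster(ranked_relevance, cluster_of, worst_cluster):
    """Drop documents in the cluster marked last, preserving rank order."""
    return [rel for rel, c in zip(ranked_relevance, cluster_of)
            if c != worst_cluster]


# Toy data: 8 ranked documents; cluster "c" was marked last (hypothetical).
ranked_relevance = [True, False, True, False, True, True, False, False]
cluster_of = ["a", "c", "a", "b", "a", "a", "c", "b"]

before = precision_at(ranked_relevance, 4)
filtered = exclude_cluster(ranked_relevance, cluster_of, "c")
after = precision_at(filtered, 4)
print(before, after)
```

Because the filtered list keeps the original rank order, any gain in precision comes only from removing documents the user had already judged unpromising at the cluster level.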
There was also some evidence (consisting of observation of users during the experiments and a few user comments) that clusters could be used to provide the users with a glimpse of the search results, in order to decide whether to inspect the search results or initiate a new query straight away.
In summary, the cumulative experiment results imply that clustering cannot outperform relevance ranking, and seems to deserve only a secondary role in users’ interaction with IR systems. However, it should also be noted that the experiment results are not representative of the whole set of possible user types and search situations, and it may be possible to identify search situations where clustering can be more beneficial than relevance ranking.
Place in social process : an exploratory data analysis of outcomes from localised labour exchange
The principal tenet of this study is that place and its role in social process is poorly understood. This is a serious problem in human geography, where one of the major tasks is to elucidate the spatial elements in social process. The resulting difficulties are compounded in empirical analysis where the spatial and social are highly disaggregated. Any response must, therefore, address these features of the problem if the situation is to be redressed. A twofold response was formulated. The first concentrates attention on labour exchange as a key element of social process and investigates spatial differences from the highly disaggregated local perspective. The second involves transferring Tukey's philosophy of EXPLORATORY DATA ANALYSIS to geographical research. This has been done to overcome analytical rigidities which impede progress where theory, concept or data are sufficiently suspect as to cause uncertainty. Implementation of this strategy progresses from comparatively simple and conventional treatments of place in labour exchange to more sophisticated examinations which explore spatial aspects of differentiation in controlled analytical environments.
Substantive investigation of labour exchange, from an exploratory point of view, provides powerful insights into the role of place in labour exchange because it is less constrained than conventional treatments. These insights are manifest through analyses of the extent and nature of differentiation between places. Results are of three types: structure in place and social process which establishes a prima facie case for more general analysis; structure of place in a widely defined social environment; and structure in social process which is sufficiently general as to sustain hypotheses of ubiquitous spatial structure. These interim findings, of merit in their own right, combine to provide a sound foundation for the proposition of a model relating place to social process. This model is significant because it reverses the principal tenet of traditional empirical models, which reduce place to the status of an analytical convenience, and argues that it is inherent in considerations of social process.