Parallel computing in information retrieval - An updated review
The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular, we stress the motivation for using parallel computing in text retrieval. We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing, and we describe areas of research which we believe are still needed.
Methods of Hierarchical Clustering
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm.
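As a minimal illustration of the agglomerative approach surveyed above (not any specific algorithm from the survey), the sketch below implements naive single-linkage clustering of one-dimensional points in pure Python: every point starts as its own cluster, and the two clusters whose closest members are nearest are merged repeatedly, recording the merge height at each step.

```python
# Naive single-linkage agglomerative clustering (illustrative sketch).
# Each point begins as a singleton cluster; at every step the pair of
# clusters with the smallest inter-point distance is merged, and the
# merge height is recorded, yielding a dendrogram as a merge history.

def single_linkage(points):
    """Return the merge history [(cluster_a, cluster_b, height), ...]."""
    clusters = {i: [p] for i, p in enumerate(points)}
    merges = []
    next_id = len(points)  # ids for newly formed clusters
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

merges = single_linkage([0.0, 0.1, 1.0, 1.1, 5.0])
```

This quadratic-per-step formulation is for exposition only; the linear-time algorithm described in the abstract relies on far more careful data structures.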
Distributed Inverted Files and Performance: A Study of Parallelism and Data Distribution Methods in IR
The study investigates the performance of parallel information retrieval (IR) algorithms under different data distribution methods for inverted files, to identify which is best for the requirements of specific IR tasks. We define a data distribution method as a way of distributing inverted file data to local disks on a parallel machine. A data distribution method may be on-the-fly (with one copy of the index held), replication (all nodes hold all of the index) or partitioning (the index data is split amongst nodes). Partitioning of inverted file data can be done in many ways, but we consider only two: by term (TermId) and by document (DocId). TermId partitioning assigns all the data for a unique word to a single partition, while DocId partitioning assigns all the data for a unique document to a single partition. We consider the issue of improving the performance of standard IR algorithms under these data distribution methods by looking at sequential rather than concurrent job service, i.e. sequential rather than concurrent query service. This methodology rules out some distribution methods for some of the tasks studied. We consider the following main IR tasks: indexing, search, passage retrieval, inverted file update, and query optimisation for routing/filtering. We produce a synthetic performance model for each of these tasks for the purposes of comparison. We have two subsidiary aims: first, to demonstrate the portability of our implemented data structures and algorithms across different parallel machines; second, to study the possibility of increased retrieval effectiveness by examining a larger section of the search space for both passage retrieval and routing/filtering. We also consider the implications of concurrency in updates to inverted files.
Our theoretical and empirical results show that in most cases the DocId partitioning method is the best data distribution method, apart from routing/filtering, where replication was found to be superior.
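The two partitioning schemes compared in the study can be sketched in a few lines; the code below is a toy illustration (the function names and the hash-based assignment are assumptions, not the thesis's implementation). TermId partitioning sends a term's entire postings list to one node; DocId partitioning lets each node build a complete local index over its own subset of the documents.

```python
# Toy sketch of TermId vs DocId partitioning of an inverted file.

def build_inverted_file(docs):
    """docs: {doc_id: text}. Returns {term: sorted list of doc_ids}."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.split()):
            index.setdefault(term, []).append(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

def partition_by_term(index, n_nodes):
    # TermId: all postings for a term live on one node (chosen here by hash).
    parts = [dict() for _ in range(n_nodes)]
    for term, postings in index.items():
        parts[hash(term) % n_nodes][term] = postings
    return parts

def partition_by_doc(docs, n_nodes):
    # DocId: each node indexes its own subset of the documents.
    parts = []
    for node in range(n_nodes):
        subset = {d: t for d, t in docs.items() if d % n_nodes == node}
        parts.append(build_inverted_file(subset))
    return parts

docs = {0: "a b", 1: "b c", 2: "a c"}
index = build_inverted_file(docs)
term_parts = partition_by_term(index, 2)
doc_parts = partition_by_doc(docs, 2)
```

The operational difference follows directly: under TermId a single-term query touches exactly one node, while under DocId every node must be consulted but each holds a shorter postings list.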
Characterisation of Condition Monitoring Information for Diagnosis and Prognosis using Advanced Statistical Models
This research focuses on the classification of categorical events using advanced statistical models, primarily to detect and identify individual component faults and deviations from normal, healthy operation of reciprocating compressors. Effective condition monitoring ensures optimal efficiency and reliability while maintaining the highest possible safety standards and reducing the costs and inconvenience caused by impaired performance.
The variability of operating conditions is revealed through examination of vibration signals recorded at strategic points in the process. Analysis of these signals informs expectations with respect to tolerable degrees of imperfection in specific components.
Isolating inherent process variability from extraneous variability affords a reliable means of ascertaining system health and functionality; vibration envelope spectra offer highly responsive model parameters for diagnostic purposes.
This thesis examines novel approaches to alleviating the computational burden of large-scale data analysis through investigation of the potential input variables. Three methods are investigated, as follows:
Method one employs multivariate variable clustering to ascertain homogeneity amongst input variables. A series of heterogeneous groups is formed, from each of which explanatory input variables are selected.
Data reduction techniques, method two, offer an alternative means of constructing predictive classifiers. A reduced number of reconstructed explanatory variables provides enhanced modelling capability and ensures algorithmic convergence.
The final novel approach combines both of these methods with wavelet data compression techniques, simplifying both the number of input parameters and the volume of each signal while retaining the information crucial to classification performance.
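The idea behind method one can be sketched with a simple correlation-based variable selection: group input variables by absolute pairwise correlation and keep one representative per group. This is an illustrative stand-in, not the thesis's actual clustering procedure; all names and the 0.9 threshold are assumptions.

```python
# Illustrative sketch of variable clustering for input reduction:
# greedily keep one representative per group of highly correlated variables.

def correlation(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def select_representatives(variables, threshold=0.9):
    """variables: {name: samples}. Keep a variable only if it is not
    highly correlated (|r| >= threshold) with any variable already kept."""
    kept = {}
    for name, values in variables.items():
        if all(abs(correlation(values, v)) < threshold for v in kept.values()):
            kept[name] = values
    return list(kept)

vibration = [1.0, 2.0, 3.0, 4.0]
variables = {
    "rms": vibration,
    "rms_scaled": [2 * v for v in vibration],  # redundant: perfectly correlated
    "kurtosis": [4.0, 1.0, 3.0, 2.0],          # carries distinct information
}
reps = select_representatives(variables)
```

Here the redundant scaled copy is dropped while the uncorrelated variable survives, shrinking the classifier's input set without discarding distinct information.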
A new framework for clustering
The difficulty of clustering and the variety of clustering methods suggest the need for a theoretical study of clustering. Using the idea of a standard statistical framework, we propose a new framework for clustering.
For a well-defined clustering goal we assume that the data to be clustered come from an underlying distribution and we aim to find a high-density cluster tree. We regard this tree as a parameter of interest for the underlying distribution. However, it is not obvious how to determine a connected subset in a discrete distribution whose support is located in a Euclidean space. Building a cluster tree for such a distribution is an open problem and presents interesting conceptual and computational challenges. We solve this problem using graph-based approaches and further parameterize clustering using the high-density cluster tree and its extension.
Motivated by the connection between clustering outcomes and graphs, we propose a graph family framework, which plays an important role in our clustering framework. A direct application of the graph family framework is a new cluster-tree distance measure. This distance measure can be written as an inner product or kernel, which enables our clustering framework to perform statistical assessment of clustering via simulation. Other applications, such as a method for integrating partitions into a cluster tree and methods for cluster-tree averaging and bagging, are also derived from the graph family framework.
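The inner-product idea can be illustrated on flat partitions (a toy case, not the thesis's actual cluster-tree measure): represent each clustering as the edge set of its co-membership graph, take the inner product to be the size of the edge intersection, and derive the induced squared distance from it.

```python
# Toy graph-based view of a clustering: points are joined by an edge iff
# they share a cluster; the inner product of two clusterings is the size
# of their edge-set intersection, which induces a squared distance.

def comembership_edges(labels):
    """labels: list of cluster labels, one per point. Returns the edge set."""
    return {(i, j)
            for i in range(len(labels))
            for j in range(i + 1, len(labels))
            if labels[i] == labels[j]}

def inner_product(labels_a, labels_b):
    return len(comembership_edges(labels_a) & comembership_edges(labels_b))

def distance(labels_a, labels_b):
    # Squared distance induced by the inner product:
    # ||a - b||^2 = <a, a> - 2<a, b> + <b, b>.
    aa = inner_product(labels_a, labels_a)
    bb = inner_product(labels_b, labels_b)
    ab = inner_product(labels_a, labels_b)
    return aa - 2 * ab + bb

a = [0, 0, 1, 1]  # clusters {0,1} and {2,3}
b = [0, 0, 0, 1]  # clusters {0,1,2} and {3}
```

Because the measure is an inner product, identical clusterings are at distance zero, and the kernel form is what permits simulation-based statistical assessment.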
Cluster analysis applied to success/failure in mathematics
According to [Mirkin B., 1996], classification is an existing or ideal grouping of those that resemble one another (or are similar), and a separation of those that are dissimilar. The purposes of classification are: (1) to form and acquire knowledge, (2) to analyse the structure of the phenomenon, and (3) to relate different aspects of the phenomenon in question to one another.
In studying success/failure in Mathematics, our objectives implicitly include "classifying" students according to the factors expected to be determinant for their results in Mathematics. On the other hand, we turn to classification again when we wish to establish the types of factors that determine results in Mathematics.
The objectives of cluster analysis are: (1) to analyse the structure of the data; (2) to verify/relate aspects of the data to one another; (3) to assist in the design of the classification.
We believed that this exploratory data analysis technique could be a very powerful tool for studying success/failure in Mathematics in basic education (Ensino Básico).
The work developed in this dissertation demonstrates that cluster analysis responds adequately to the questions that can be posed when trying to frame success/failure in Mathematics socially and pedagogically.
Rita Vasconcelo