Random Indexing K-tree

Author: De Vine Lance
De Vries Christopher M.
Geva Shlomo
Publication date: 01/01/2009

Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm.

Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed clevere

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive
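
The abstract above does not spell out how Random Indexing builds its vectors, so the following is only a minimal sketch of the general idea: each term gets a sparse random index vector of +1/-1 entries, and a document vector is the sum of the index vectors of its terms, which gives a dense, fixed-width representation. The dimension, the number of non-zero entries, and the function names are illustrative assumptions, not the paper's settings, and the K-tree clustering step is not shown.

```python
import random
from collections import Counter

def index_vector(term, dim=64, nonzeros=4, seed=0):
    """Sparse ternary index vector for a term, as {position: +1 or -1}.

    dim and nonzeros are illustrative values (assumptions).
    """
    rng = random.Random(f"{seed}:{term}")  # same term always gets the same vector
    positions = rng.sample(range(dim), nonzeros)
    return {p: rng.choice((-1, 1)) for p in positions}

def document_vector(tokens, dim=64, nonzeros=4, seed=0):
    """Dense document vector: the count-weighted sum of its terms' index vectors."""
    vec = [0.0] * dim
    for term, count in Counter(tokens).items():
        for pos, sign in index_vector(term, dim, nonzeros, seed).items():
            vec[pos] += sign * count
    return vec
```

Because the output width is fixed by `dim` rather than by the vocabulary size, documents with very different vocabularies still map to vectors of the same length.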

k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

Author: Cunningham Padraig
Delany Sarah Jane
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/04/2020

Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.

Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN

arXiv.org e-Print Archive

Arrow@TUDublin
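
The procedure described in the abstract above (find the nearest neighbours of a query, then let them determine its class) can be sketched in a few lines. This is a brute-force illustration, not the code from the paper's appendix; Euclidean distance, majority voting, and the function names are assumptions chosen for simplicity.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, examples, k=3, distance=euclidean):
    """Classify query by majority vote among its k nearest labelled examples.

    examples is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(examples, key=lambda ex: distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Sorting every example for each query costs O(n log n) per query, which is the kind of run-time concern the abstract refers to; the `distance` parameter is left pluggable because the choice of similarity measure is one of the paper's main topics.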

Seeding hESCs to achieve optimal colony clonality

Author: Bojic S.
Lako M.
Laude A.
Neganova I.
Orozco-Fuentes Sirio
Parker N. G.
Shukurov A.
Wadkin L. E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2019

Human embryonic stem cells (hESCs) and induced pluripotent stem cells (iPSCs) have promising clinical applications which often rely on clonally-homogeneous cell populations. To achieve this, it is important to ensure that each colony originates from a single founding cell and to avoid subsequent merging of colonies during their growth. Clonal homogeneity can be obtained with low seeding densities; however, this leads to low yield and viability. It is therefore important to quantitatively assess how seeding density affects clonality loss so that experimental protocols can be optimised to meet the required standards. Here we develop a quantitative framework for modelling the growth of hESC colonies from a given seeding density based on stochastic exponential growth. This allows us to identify the timescales for colony merges and over which colony size no longer predicts the number of founding cells. We demonstrate the success of our model by applying it to our own experiments on hESC colony growth; while this is based on a particular experimental set-up, the model can be applied more generally to other cell lines and experimental conditions to predict these important timescales.

Northumbria Research Link
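
To make "stochastic exponential growth" in the abstract above concrete, here is a minimal sketch that assumes each cell divides independently at a constant rate (a pure-birth process). The rate, the time horizon, and the function names are illustrative assumptions; the paper's actual model, including colony placement and merging, may differ and is not reproduced here.

```python
import math
import random

def simulate_colony(n0=1, rate=0.05, t_end=48.0, rng=None):
    """Simulate the cell count of one colony at time t_end.

    Assumes a pure-birth process: each cell divides independently at
    `rate` per unit time (an illustrative value, not a measured one).
    """
    rng = rng or random.Random()
    n, t = n0, 0.0
    while True:
        t += rng.expovariate(rate * n)  # waiting time until the next division
        if t > t_end:
            return n
        n += 1

def expected_size(n0=1, rate=0.05, t_end=48.0):
    """Mean colony size under the same assumption: n0 * exp(rate * t_end)."""
    return n0 * math.exp(rate * t_end)
```

Repeated runs from a single founding cell give widely varying sizes, which illustrates why, under this assumption, colony size alone becomes an unreliable indicator of the number of founding cells as time passes.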

Anisotropic diffusion limited aggregation in three dimensions: universality and nonuniversality

Author: Ellák Somfai
M. E. Glicksman
Nicholas R. Goold
Robin C. Ball
Publication venue: 'American Physical Society (APS)'
Publication date: 28/01/2005

We explore the macroscopic consequences of lattice anisotropy for diffusion limited aggregation (DLA) in three dimensions. Simple cubic and bcc lattice growths are shown to approach universal asymptotic states in a coherent fashion, and the approach is accelerated by the use of noise reduction. These states are strikingly anisotropic dendrites with a rich hierarchy of structure. For growth on an fcc lattice, our data suggest at least two stable fixed points of anisotropy, one matching the bcc case. Hexagonal growths, favoring six planar and two polar directions, appear to approach a line of asymptotic states with continuously tunable polar anisotropy. The more planar of these growths visually resembles real snowflake morphologies. Our simulations use a new and dimension-independent implementation of the DLA model. The algorithm maintains a hierarchy of sphere coverings of the growth, supporting efficient random walks onto the growth by spherical moves. Anisotropy was introduced by restricting growth to certain preferred directions.

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository
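
For readers unfamiliar with DLA, the sketch below grows a small cluster on a simple cubic lattice using plain single-step random walks. It is deliberately not the algorithm from the abstract above: there is no hierarchy of sphere coverings, no noise reduction, and no anisotropy control. Launching walkers from a cube surface and the cut-off distance are simplifying assumptions that introduce some bias.

```python
import random

# The six nearest-neighbour steps on a simple cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def touches(pos, cluster):
    """True if any lattice neighbour of pos already belongs to the cluster."""
    x, y, z = pos
    return any((x + dx, y + dy, z + dz) in cluster for dx, dy, dz in STEPS)

def grow_dla(n_sites=30, seed=0):
    """Grow an on-lattice DLA cluster of n_sites sites from a seed at the origin."""
    rng = random.Random(seed)
    cluster = {(0, 0, 0)}
    extent = 0  # largest |coordinate| of any cluster site
    while len(cluster) < n_sites:
        launch = extent + 2
        # Release a walker on a random face of the cube of half-width `launch`.
        pos = [rng.randint(-launch, launch) for _ in range(3)]
        pos[rng.randrange(3)] = rng.choice((-launch, launch))
        pos = tuple(pos)
        # Walk until the walker sticks; abandon it if it strays too far.
        while max(abs(c) for c in pos) <= launch + 5:
            if touches(pos, cluster):
                cluster.add(pos)
                extent = max(extent, max(abs(c) for c in pos))
                break
            dx, dy, dz = rng.choice(STEPS)
            pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
    return cluster
```

Single-step walks become very slow for large clusters, which is the cost that the sphere-covering approach mentioned in the abstract is designed to avoid.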

Visualising the structure of document search results: A comparison of graph theoretic approaches

Author: Busing F.
Chen C.
Coxon A.
Leuski A.
Salton G.
Skupin A.
Timothy Cribbin
Van Rijsbergen C.J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/04/2009

This is the post-print of the article - Copyright @ 2010 Sage Publications

Previous work has shown that distance-similarity visualisation or ‘spatialisation’ can provide a potentially useful context in which to browse the results of a query search, enabling the user to adopt a simple local foraging or ‘cluster growing’ strategy to navigate through the retrieved document set. However, faithfully mapping feature-space models to visual space can be problematic owing to their inherent high dimensionality and non-linearity. Conventional linear approaches to dimension reduction tend to fail at this kind of task, sacrificing local structure in order to preserve a globally optimal mapping. In this paper the clustering performance of a recently proposed algorithm called isometric feature mapping (Isomap), which deals with non-linearity by transforming dissimilarities into geodesic distances, is compared to that of non-metric multidimensional scaling (MDS). Various graph pruning methods, for geodesic distance estimation, are also compared. Results show that Isomap is significantly better at preserving local structural detail than MDS, suggesting it is better suited to cluster growing and other semantic navigation tasks. Moreover, it is shown that applying a minimum-cost graph pruning criterion can provide a parameter-free alternative to the traditional K-neighbour method, resulting in spatial clustering that is equivalent to or better than that achieved using an optimal-K criterion.

Crossref

Brunel University Research Archive
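
The step the abstract above describes as "transforming dissimilarities into geodesic distances" can be illustrated with the traditional K-neighbour pruning it mentions: keep only each item's k nearest neighbours as graph edges, then take shortest-path lengths through that graph. This is a minimal sketch of that first stage only; the function name, the symmetrised graph, and the use of the Floyd-Warshall algorithm are assumptions, and neither the embedding step nor the minimum-cost pruning criterion is shown.

```python
import math

def geodesic_distances(dist, k=2):
    """Estimate geodesic distances from a symmetric dissimilarity matrix.

    Keeps an edge between i and j if either is among the other's k nearest
    neighbours, then runs Floyd-Warshall over the pruned graph.
    Unreachable pairs keep the value math.inf.
    """
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            graph[i][j] = graph[j][i] = dist[i][j]
    for m in range(n):  # allow paths through intermediate node m
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph
```

For points lying on a curve, the resulting distances follow the curve rather than cutting straight across it, which is what lets the later embedding preserve local structure.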

Using distributional similarity to organise biomedical terminology

Author: Dowdall James
Keller Bill
Schneider Gerold
Weeds Julie
Weir David
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2005

We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.

ZORA

Sussex Research Online
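
The abstract above mentions "a variety of different measures" without naming them. As one common example of a distributional similarity measure, the sketch below computes cosine similarity between two terms represented by counts of the contexts they occur in. The choice of cosine, the function name, and the context labels in the usage note are illustrative assumptions, not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse context-count vectors.

    a and b map context features (for example grammatical relations
    produced by a parser) to co-occurrence counts.
    """
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

With hypothetical counts such as `{"obj-of:activate": 3, "subj-of:bind": 1}` for one term and `{"obj-of:activate": 2, "mod:nuclear": 1}` for another, the score rises with the weight of the contexts the two terms share.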