Identifying hidden contexts
In this study we investigate how to identify hidden contexts from the data in classification tasks.
Contexts are artifacts in the data that do not predict the class label directly.
For instance, in a speech recognition task, speakers may have different accents, which do not directly discriminate between the spoken words.
We treat identifying hidden contexts as a data preprocessing task, which can help to build more accurate classifiers tailored to particular contexts and gives insight into the data structure.
We present three techniques for identifying hidden contexts; each hides the class label information from the input data and partitions the data using clustering.
We assemble a collection of performance measures to ensure that the resulting contexts are valid.
We evaluate the proposed techniques on thirty real datasets.
We present a case study illustrating how the identified contexts can be used to build specialized, more accurate classifiers.
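As a rough illustration of the general idea (not the paper's three specific techniques), the sketch below hides the class label from a toy dataset and partitions the remaining features with a small, self-contained k-means; the resulting clusters then play the role of candidate contexts. All data and names here are invented for the example.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points; returns a cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        for i, (x, y) in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: (x - centers[c][0]) ** 2 +
                                          (y - centers[c][1]) ** 2)
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return assign

# Labelled toy data: (features, class). The class label is *hidden* before
# clustering, so the partition reflects context (here: two feature regimes),
# not the class itself -- note both classes occur in both regimes.
data = [((0.1, 0.2), "a"), ((0.2, 0.1), "b"), ((0.0, 0.3), "a"),
        ((5.1, 5.2), "a"), ((5.3, 5.0), "b"), ((5.0, 4.9), "b")]
features = [x for x, _ in data]          # class label dropped here
contexts = kmeans(features, k=2)
print(contexts)
```

A per-context classifier could then be trained on each cluster separately, as in the case study described above.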
Hierarchical growing cell structures: TreeGCS
We propose a hierarchical clustering algorithm (TreeGCS) based upon the Growing Cell Structure (GCS) neural network of Fritzke. Our algorithm refines and builds upon the GCS base, overcoming an inconsistency in the original GCS algorithm, where the network topology is susceptible to the ordering of the input vectors. Our algorithm is unsupervised, flexible, and dynamic, and we impose no additional parameters on the underlying GCS algorithm. Our ultimate aim is a hierarchical clustering neural network that is both consistent and stable and that identifies the innate hierarchical structure present in vector-based data. We demonstrate improved stability of the GCS foundation and evaluate our algorithm against the hierarchy generated by an ascendant hierarchical clustering dendrogram. Our approach emulates the hierarchical clustering of the dendrogram. It demonstrates the importance of the parameter settings for GCS and how they affect the stability of the clustering.
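For reference, the ascendant (bottom-up) hierarchical clustering that produces the comparison dendrogram can be sketched as single-linkage agglomeration. This is the evaluation baseline, not the TreeGCS algorithm itself, and the 1-D data are invented:

```python
def single_linkage(points):
    """Ascendant (agglomerative) clustering with single linkage:
    repeatedly merge the two closest clusters and record each merge,
    yielding the merge history a dendrogram would visualise."""
    clusters = [[p] for p in points]          # start: one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((d, tuple(clusters[i]), tuple(clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

history = single_linkage([0.0, 0.1, 1.0, 5.0])
for d, left, right in history:
    print(round(d, 2), left, right)
```

The merge history is exactly what a dendrogram visualises: nearby points join at low heights and distant clusters join last.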
How Many Topics? Stability Analysis for Topic Models
Topic modeling refers to the task of discovering the underlying thematic
structure in a text corpus, where the output is commonly presented as a report
of the top terms appearing in each topic. Despite the diversity of topic
modeling algorithms that have been proposed, a common challenge in successfully
applying these techniques is the selection of an appropriate number of topics
for a given corpus. Choosing too few topics will produce results that are
overly broad, while choosing too many will result in the "over-clustering" of a
corpus into many small, highly-similar topics. In this paper, we propose a
term-centric stability analysis strategy to address this issue, the idea being
that a model with an appropriate number of topics will be more robust to
perturbations in the data. Using a topic modeling approach based on matrix
factorization, evaluations performed on a range of corpora show that this
strategy can successfully guide the model selection process.
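A minimal sketch of the term-centric idea, assuming stability is scored as the average best-match Jaccard overlap between top-term lists from repeated runs; the paper's exact agreement measure may differ, and the runs below are invented:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of top terms."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def topic_stability(runs):
    """Term-centric stability across repeated runs: match each topic in one
    run to its best-overlapping topic in another run and average the
    agreement over all pairs of runs."""
    scores = []
    for run_a, run_b in combinations(runs, 2):
        matched = [max(jaccard(t_a, t_b) for t_b in run_b) for t_a in run_a]
        scores.append(sum(matched) / len(matched))
    return sum(scores) / len(scores)

# Three hypothetical runs with k = 2 topics: the top terms barely change
# between runs, so the model is comparatively stable at this k.
runs_k2 = [
    [["match", "team", "goal"], ["market", "bank", "rate"]],
    [["match", "team", "win"],  ["market", "bank", "rate"]],
    [["match", "team", "goal"], ["market", "bank", "price"]],
]
print(round(topic_stability(runs_k2), 3))  # prints 0.667
```

Repeating this score for each candidate k and choosing the k with the highest stability is the model-selection recipe the abstract describes.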
Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
It is only until recently that this issue has received more and more attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview of this new yet fast-growing topic for
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
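One common way to quantify the stability discussed here is the average pairwise Jaccard similarity between the feature subsets selected on different resamples of the data; the sketch below uses that measure (the gene names and subsets are invented for illustration):

```python
from itertools import combinations

def selection_stability(subsets):
    """Average pairwise Jaccard similarity between the feature subsets
    selected on different resamples: 1.0 means the selector always picks
    the same biomarkers; values near 0 mean it is unstable."""
    pairs = list(combinations(subsets, 2))
    sims = [len(set(a) & set(b)) / len(set(a) | set(b)) for a, b in pairs]
    return sum(sims) / len(sims)

# Hypothetical gene subsets chosen on three bootstrap samples of a dataset.
stable   = [{"TP53", "BRCA1", "EGFR"}, {"TP53", "BRCA1", "EGFR"},
            {"TP53", "BRCA1", "MYC"}]
unstable = [{"TP53", "KRAS", "EGFR"}, {"BRCA1", "MYC", "PTEN"},
            {"ALK", "RET", "BRAF"}]
print(round(selection_stability(stable), 3),
      round(selection_stability(unstable), 3))  # prints 0.667 0.0
```

A biomarker panel with low stability under resampling is unlikely to generalise, which is why the review treats stability as a first-class evaluation criterion.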
Constraining the Power Spectrum using Clusters
(Shortened Abstract). We analyze a redshift sample of Abell/ACO clusters and
compare them with numerical simulations based on the truncated Zel'dovich
approximation (TZA), for a list of eleven dark matter (DM) models. For each
model we run several realizations, on which we estimate cosmic variance
effects. We analyze correlation statistics, the probability density function,
and supercluster properties from percolation analysis. As a general result, we
find that the distribution of galaxy clusters provides a constraint only on the
shape of the power spectrum, but not on its amplitude: a shape parameter 0.18 <
\Gamma < 0.25 and an effective spectral index at 20 Mpc/h in the range
[-1.1,-0.9] are required by the Abell/ACO data. In order to obtain
complementary constraints on the spectrum amplitude, we consider the cluster
abundance as estimated using the Press--Schechter approach, whose reliability
is explicitly tested against N--body simulations. We conclude that, of the
cosmological models considered here, the only viable models are either Cold+Hot
DM ones with \Omega_\nu = [0.2-0.3], better if shared between two massive
neutrinos, or flat low-density CDM models with \Omega_0 = [0.3-0.5]. (New
Astronomy, in press.)
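For reference, the Press--Schechter approach mentioned above estimates the comoving abundance of collapsed objects of mass M from the linear density field. Its standard textbook form (a reminder, not quoted from the paper) is:

```latex
% Standard Press--Schechter mass function:
% \bar{\rho} is the mean comoving matter density, \sigma(M) the rms linear
% fluctuation on mass scale M, and \delta_c \simeq 1.686 the critical
% linear overdensity for spherical collapse.
n(M)\,\mathrm{d}M \;=\;
  \sqrt{\frac{2}{\pi}}\;\frac{\bar{\rho}}{M^{2}}\;
  \frac{\delta_c}{\sigma(M)}\,
  \left|\frac{\mathrm{d}\ln\sigma}{\mathrm{d}\ln M}\right|\,
  \exp\!\left(-\frac{\delta_c^{2}}{2\sigma^{2}(M)}\right)\mathrm{d}M
```

Because the predicted abundance depends exponentially on \sigma(M), the observed cluster number density constrains the amplitude of the power spectrum, complementing the shape constraint from the clustering statistics.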