Document Clustering with K-tree

Author: De Vries Christopher M.
Geva Shlomo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines. Comment: 12 pages, INEX 200

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

TopSig: Topology Preserving Document Signatures

Author: De Vries Christopher M.
Geva Shlomo
Publication date: 01/01/2011

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and, from a theoretical perspective, position the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive
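
To make the idea of a file signature in the TopSig abstract more concrete, here is a minimal sketch of one generic way to build a fixed-width binary document signature and compare signatures by Hamming distance. It is an illustration under stated assumptions, not the construction used in the paper: the function names `signature` and `hamming`, the 64-bit width, and the per-term seeded random ±1 vectors are all hypothetical choices made for this example.

```python
import random


def signature(tokens, bits=64):
    """Build a binary signature by summing per-term random +/-1 vectors
    and keeping only the sign of each component (assumed scheme)."""
    acc = [0] * bits
    for term in tokens:
        # Seeding with the term makes its pseudo-random vector reproducible.
        rng = random.Random(term)
        for i in range(bits):
            acc[i] += rng.choice((-1, 1))
    sig = 0
    for i, value in enumerate(acc):
        if value > 0:
            sig |= 1 << i  # set bit i when the summed component is positive
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


doc_a = signature("k-tree scales to large document collections".split())
doc_b = signature("k-tree clusters large document collections".split())
print(hamming(doc_a, doc_b))
```

Signatures of this kind are compared with cheap bitwise operations, which is one reason signature-based indexes can be attractive; how TopSig itself weights terms and builds signatures is described in the paper rather than here.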

K-tree: Large Scale Document Clustering

Author: De Vries Christopher M.
Geva Shlomo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk-based implementations where space requirements exceed the capacity of main memory. Comment: 2 pages, SIGIR 200

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

Massive galaxies with very young AGN

Author: de Vries
I. A. G. Snellen
M. D. Lehnert
M. N. Bremer
Nathan de Vries
R. T. Schilizzi
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 16/07/2007

Gigahertz Peaked Spectrum (GPS) radio galaxies are generally thought to be the young counterparts of classical extended radio sources and live in massive ellipticals. GPS sources are vital for studying the early evolution of radio-loud AGN, the trigger of their nuclear activity, and the importance of feedback in galaxy evolution. We study the Parkes half-Jansky sample of GPS radio galaxies, for which all host galaxies have now been identified and 80% have their redshifts determined (0.122 < z < 1.539). Analysis of the absolute magnitudes of the GPS host galaxies shows that at z > 1 they are on average a magnitude fainter than classical 3C radio galaxies. This suggests that the AGN in young radio galaxies have not yet much influenced the overall properties of the host galaxy. However, their restframe UV luminosities indicate that there is a low level of excess as compared to passive evolution models. Comment: To appear in the proceedings of "Formation and Evolution of Galaxy Bulges", IAUS 245; M. Bureau, E. Athanassoula & B. Barbuy, ed

arXiv.org e-Print Archive

Crossref

HAL-INSU

HAL-OBSPM

Random Indexing K-tree

Author: De Vine Lance
De Vries Christopher M.
Geva Shlomo
Publication date: 01/01/2009

Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm. Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed clevere

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive
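
As a rough illustration of why Random Indexing helps with sparse document vectors, the sketch below shows the general technique: each term receives a sparse ternary index vector, and a document is represented by the dense, fixed-width sum of the index vectors of its terms. This is a generic sketch under assumptions, not the RI K-tree implementation from the paper; the names `index_vector` and `document_vector`, the dimension of 64, and the 8 non-zero entries per term are hypothetical choices for the example.

```python
import random


def index_vector(term, dim=64, nonzero=8):
    """Sparse ternary index vector for a term: a few +1/-1 entries, rest 0.
    Seeding with the term keeps the vector reproducible (assumed scheme)."""
    rng = random.Random(term)
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzero)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1  # half +1, half -1
    return vec


def document_vector(tokens, dim=64, nonzero=8):
    """Dense fixed-width document vector: sum of its terms' index vectors."""
    doc = [0] * dim
    for term in tokens:
        for i, value in enumerate(index_vector(term, dim, nonzero)):
            doc[i] += value
    return doc


vec = document_vector("random indexing reduces dimensionality".split())
print(len(vec))
```

Because the width is fixed regardless of vocabulary size, new documents and new terms can be added without rebuilding existing vectors, which is consistent with the abstract's remark about managing a changing collection.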

The role of geographic mobility in reducing education-job mismatches in the Netherlands.

Author: Corvers Frank
Hensen Maud M.
Vries M. Robert de

In this paper we investigate the relationship between geographic mobility and education-job mismatch in the Netherlands. We focus on the role of geographic mobility in reducing the probability of graduates working (i) jobs below their education level; (ii) jobs outside their study field; (iii) part-time jobs; (iv) flexible jobs; or (v) jobs paid below the wage expected at the beginning of the career. For this purpose we use data on secondary and higher vocational education graduates in the period 1996–2001. We show that graduates who are mobile have a higher probability of finding jobs at the acquired education level than those who are not. Moreover, mobile graduates have a higher probability of finding full-time or permanent jobs. This suggests that mobility is sought to prevent not only having to take a job below the acquired education level, but also other education-job mismatches; graduates are spatially flexible particularly to ensure full-time jobs. Keywords: geographic labour mobility; job mismatch; occupational choice

Research Papers in Economics

On Multidominance and Linearization

Author: de Vries M.
Publication date: 01/01/2009

This article centers around two questions: What is the relation between movement and structure sharing, and how can complex syntactic structures be linearized? It is shown that regular movement involves internal remerge, and sharing or ‘sideward movement’ external remerge. Without ad hoc restrictions on the input, both options follow from Merge. They can be represented in terms of multidominance. Although more structural freedom ensues than standardly thought, the grammar is not completely unconstrained: Arguably, proliferation of roots is prohibited. Furthermore, it is explained why external remerge has somewhat different consequences than internal remerge. For instance, apparent non-local behavior is attested. At the PF interface, the linearization of structures involving remerge is non-trivial. A central problem is identified, apart from the general issue why remerged material is only pronounced once: There are seemingly contradictory linearization demands for internal and external remerge. This can be resolved by taking into account the different structural configurations. It is argued that the linearization is a PF procedure involving a recursive structure scanning algorithm that makes use of the inherent asymmetry between sister nodes imposed by the operation of Merge.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Directory of Open Access Journals

Biolinguistics (E-Journal)

Dissertations of the University of Groningen

A semi-empirical dynamic soil acidification model for use in spatially explicit integrated assessment models for Europe

Author: Posch M.
Reinds G.J.
Vries W., de
Publication venue: Alterra

A semi-empirical soil acidification model was developed for use in integrated assessment models on a European scale. The model simulates the time development of base saturation and aluminium concentration using an empirical relationship with pH. An accompanying data set was developed by overlaying European maps of soils, land use, climate and altitude, followed by a procedure that aggregates the input data over soil-texture combinations in each EMEP 150 km x 150 km grid cell. Model tests show that the model gives results comparable to the SMART model, although it overestimates initial base saturation in some areas with high acid input and simulates a faster recovery from acidification than SMART.

Wageningen University & Research Publications