Document Clustering with K-tree
This paper describes the approach taken to the XML Mining track at INEX 2008
by a group at the Queensland University of Technology. We introduce the K-tree
clustering algorithm in an Information Retrieval context by adapting it for
document clustering. Document clustering poses many large-scale problems.
K-tree scales well with large inputs due to its low complexity. It offers
promising results in terms of both efficiency and quality. Document
classification was performed using Support Vector Machines. Comment: 12 pages, INEX 200
TopSig: Topology Preserving Document Signatures
Performance comparisons between file signatures and inverted files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as language and
probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but so far these have not been linked to general-purpose, signature-file-based
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We demonstrate significant
improvements in the performance of signature-file-based indexing and
retrieval, performance that is comparable to that of state-of-the-art
inverted-file-based systems, including language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and, from a theoretical perspective, position the file
signature model in the class of vector space retrieval models. Comment: 12 pages, 8 figures, CIKM 201
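One way to picture a signature file built with dimensionality reduction is the sketch below. It is a generic random-projection signature, assuming term weights have already been computed; it is not claimed to be TopSig's exact construction. Each term maps to a sparse ternary vector seeded from a stable hash, a document's weighted sum of these vectors is thresholded at zero to give one bit per dimension, and documents are ranked by Hamming distance to the query signature. `SIG_BITS`, `NONZERO` and all function names are hypothetical choices for illustration.

```python
import hashlib
import random

SIG_BITS = 64  # assumed signature width (illustrative, not from the paper)
NONZERO = 6    # assumed number of +1/-1 entries per term vector


def term_vector(term: str) -> dict[int, int]:
    """Sparse ternary vector for a term: a few positions set to +1 or -1.
    Seeded from a stable hash so a term always maps to the same vector."""
    seed = int.from_bytes(hashlib.sha256(term.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    positions = rng.sample(range(SIG_BITS), NONZERO)
    return {p: rng.choice((1, -1)) for p in positions}


def signature(weights: dict[str, float]) -> int:
    """Project weighted terms onto SIG_BITS dimensions and keep only the signs."""
    acc = [0.0] * SIG_BITS
    for term, w in weights.items():
        for pos, sign in term_vector(term).items():
            acc[pos] += w * sign
    bits = 0
    for i, value in enumerate(acc):
        if value > 0:  # positive component -> bit set, otherwise bit clear
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def rank(query: dict[str, float], docs: dict[str, dict[str, float]]) -> list[str]:
    """Order document names by Hamming distance to the query signature."""
    q = signature(query)
    return sorted(docs, key=lambda name: hamming(q, signature(docs[name])))
```

Because each signature is a fixed-width integer, a comparison reduces to an XOR and a bit count, which keeps scanning a signature file cheap.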
K-tree: Large Scale Document Clustering
We introduce K-tree in an information retrieval context. It is an efficient
approximation of the k-means clustering algorithm. Unlike k-means, it forms a
hierarchy of clusters. It has been extended to address issues with sparse
representations. We compare its performance and quality with CLUTO on document
collections. K-tree has a low time complexity that is suitable for large
document collections. Its tree structure allows for efficient disk-based
implementations when space requirements exceed main memory. Comment: 2 pages, SIGIR 200
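K-tree is described above as an approximation of k-means that builds a hierarchy. In the K-tree design as we understand it, a node that exceeds a fixed order is split with k-means using k = 2, and the two resulting centroids become entries in the parent node. The sketch below shows only that splitting step on dense vectors with squared Euclidean distance; tree insertion, centroid updates along the insertion path and root growth are omitted, and `two_means_split` is a hypothetical name.

```python
import random

Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means_split(vectors: list[Vector], iters: int = 20, seed: int = 0):
    """Split an overflowing node's vectors into two groups with k-means (k = 2).
    Returns (centroid_a, group_a, centroid_b, group_b)."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors to split")
    rng = random.Random(seed)
    # Seed the two centroids with two distinct entries of the node.
    c_a, c_b = (list(v) for v in rng.sample(vectors, 2))
    for _ in range(iters):
        group_a, group_b = [], []
        for v in vectors:
            # Assign each vector to its nearest centroid (ties go to group a).
            (group_a if sq_dist(v, c_a) <= sq_dist(v, c_b) else group_b).append(v)
        if not group_a or not group_b:
            # Degenerate case (e.g. identical seeds): fall back to an even split.
            half = len(vectors) // 2
            group_a, group_b = vectors[:half], vectors[half:]
        new_a, new_b = mean(group_a), mean(group_b)
        if new_a == c_a and new_b == c_b:
            break  # assignments are stable
        c_a, c_b = new_a, new_b
    return c_a, group_a, c_b, group_b
```

The fallback keeps the split from ever producing an empty node, which a tree structure could not accommodate.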
Massive galaxies with very young AGN
Gigahertz Peaked Spectrum (GPS) radio galaxies are generally thought to be
the young counterparts of classical extended radio sources and live in massive
ellipticals. GPS sources are vital for studying the early evolution of
radio-loud AGN, the trigger of their nuclear activity, and the importance of
feedback in galaxy evolution. We study the Parkes half-Jansky sample of GPS
radio galaxies, for which all host galaxies have now been identified and 80% have
had their redshifts determined (0.122 < z < 1.539). Analysis of the absolute
magnitudes of the GPS host galaxies shows that at z > 1 they are on average a
magnitude fainter than classical 3C radio galaxies. This suggests that the AGN
in young radio galaxies have not yet significantly influenced the overall
properties of the host galaxy. However, their rest-frame UV luminosities indicate
a low level of excess compared with passive evolution models. Comment: To appear in the proceedings of "Formation and Evolution of Galaxy
Bulges", IAUS 245; M. Bureau, E. Athanassoula & B. Barbuy, ed
Models of Dynamic Data for Emergency Response: A Comparative Study
The first hours after a disaster are chaotic and difficult, but perhaps the most important for successfully dealing with the consequences, saving human lives and reducing damage to private and public property. Despite some advances, a complete inventory of the information needed during emergency response remains challenging. In recent years, several nationally and internationally funded projects have concentrated on inventorying emergency response processes, structures for storing dynamic information, and standards and services for accessing the required data sets. A good inventory would clarify many aspects of information exchange, such as data sets, models and representations; good structuring would facilitate fast access to a desired piece of information, as well as automated analysis of that information. Consequently, the information can be used better in the decision-making process.
This paper presents our work on models of dynamic data for different disasters and incidents in Europe. The Dutch data models are derived from a thorough study of emergency response procedures in the Netherlands. Two further models, developed within the HUMBOLDT project, reflect several cross-border disaster management scenarios in Europe. These models are compared with the Geospatial Data Model of the US Department of Homeland Security. The paper draws conclusions about the type of geographical information needed to perform emergency response operations and the possibility of a generic model that could be used worldwide.
Random Indexing K-tree
Random Indexing (RI) K-tree combines two algorithms, Random Indexing and K-tree,
for clustering. Document clustering poses many large-scale problems. RI K-tree
scales well with large inputs due to its low complexity. It also exhibits
features that are useful for managing a changing collection. Furthermore, it
solves previous issues with sparse document vectors when using K-tree. The
algorithms and data structures are defined, explained and motivated. Specific
modifications are made to K-tree for use with RI. Experiments were
conducted to measure quality. The results indicate that RI K-tree improves
document cluster quality over the original K-tree algorithm. Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted.
Removed clevere
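Random Indexing replaces each sparse, high-dimensional document vector with a dense vector of fixed, much smaller dimension by summing a random index vector for every term. A minimal sketch, assuming raw term-frequency weights, 100 dimensions and 4 non-zero ±1 entries per index vector (illustrative values, not the settings reported in the paper); `RandomIndexer` and `cosine` are hypothetical names. The dense vectors it produces are the kind of input a K-tree would then cluster.

```python
import math
import random
from collections import Counter


class RandomIndexer:
    """Maps token lists to dense vectors of fixed dimension via Random Indexing.
    Index vectors are created lazily, so unseen terms never change the
    dimensionality of the space."""

    def __init__(self, dim: int = 100, nonzero: int = 4, seed: int = 0) -> None:
        self.dim = dim          # assumed reduced dimensionality
        self.nonzero = nonzero  # assumed +1/-1 entries per index vector
        self.rng = random.Random(seed)
        self.index: dict[str, list[tuple[int, int]]] = {}

    def index_vector(self, term: str) -> list[tuple[int, int]]:
        """Return (position, sign) pairs for a term, drawing them on first use."""
        if term not in self.index:
            positions = self.rng.sample(range(self.dim), self.nonzero)
            self.index[term] = [(p, self.rng.choice((1, -1))) for p in positions]
        return self.index[term]

    def document_vector(self, tokens: list[str]) -> list[float]:
        """Sum term-frequency-weighted index vectors into one dense vector."""
        vec = [0.0] * self.dim
        for term, tf in Counter(tokens).items():
            for pos, sign in self.index_vector(term):
                vec[pos] += tf * sign
        return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Index vectors are drawn once and stored, so adding documents with new terms leaves the dimensionality and the vectors of earlier documents unchanged.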
Paramagnetic resonance effect in viscoelastic materials: Annual progress report, 1 Jan. - 31 Dec. 1968
Electron paramagnetic resonance investigation of fracture in viscoelastic material
A semi-empirical dynamic soil acidification model for use in spatially explicit integrated assessment models for Europe
A semi-empirical soil acidification model was developed for use in integrated assessment models on a European scale. The model simulates the time development of base saturation and aluminium concentration using an empirical relationship with pH. An accompanying data set was developed by overlaying European maps of soils, land use, climate and altitude, followed by a procedure that aggregates the input data over soil-texture combinations in each EMEP 150 km × 150 km grid cell. Model tests show that the model gives results comparable to those of the SMART model, although it overestimates initial base saturation in some areas with high acid input and simulates a faster recovery from acidification than SMART does.
The role of geographic mobility in reducing education-job mismatches in the Netherlands
In this paper we investigate the relationship between geographic mobility and education-job mismatch in the Netherlands. We focus on the role of geographic mobility in reducing the probability that graduates work in (i) jobs below their education level; (ii) jobs outside their field of study; (iii) part-time jobs; (iv) flexible jobs; or (v) jobs paid below the wage expected at the beginning of their career. For this purpose, we use data on secondary and higher vocational education graduates from the period 1996–2001. We show that mobile graduates have a higher probability than non-mobile graduates of finding jobs at their acquired education level. Moreover, mobile graduates have a higher probability of finding full-time or permanent jobs. This suggests that graduates use mobility to avoid not only jobs below their acquired education level but also other education-job mismatches; in particular, graduates are spatially flexible in order to secure full-time jobs.
Keywords: Geographic labour mobility; job mismatch; occupational choice
- …