    Document Clustering with K-tree

    This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large-scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines. Comment: 12 pages, INEX 200

    TopSig: Topology Preserving Document Signatures

    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings. From the theoretical perspective, they position the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201

    K-tree: Large Scale Document Clustering

    We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. Its tree structure allows for efficient disk-based implementations where space requirements exceed main memory. Comment: 2 pages, SIGIR 200

    Massive galaxies with very young AGN

    Gigahertz Peaked Spectrum (GPS) radio galaxies are generally thought to be the young counterparts of classical extended radio sources and live in massive ellipticals. GPS sources are vital for studying the early evolution of radio-loud AGN, the trigger of their nuclear activity, and the importance of feedback in galaxy evolution. We study the Parkes half-Jansky sample of GPS radio galaxies, of which all host galaxies have now been identified and 80% have their redshifts determined (0.122 < z < 1.539). Analysis of the absolute magnitudes of the GPS host galaxies shows that at z > 1 they are on average a magnitude fainter than classical 3C radio galaxies. This suggests that the AGN in young radio galaxies have not yet much influenced the overall properties of the host galaxy. However, their restframe UV luminosities indicate that there is a low level of excess as compared to passive evolution models. Comment: To appear in the proceedings of "Formation and Evolution of Galaxy Bulges", IAUS 245; M. Bureau, E. Athanassoula & B. Barbuy, ed

    Random Indexing K-tree

    Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large-scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm. Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed clevere

    Models of Dynamic Data for Emergency Response: A Comparative Study

    The first hours after a disaster are chaotic and difficult, but perhaps the most important for successfully fighting the consequences, saving human lives and reducing damage to private and public property. Despite some advances, a complete inventory of the information needed during the emergency response remains challenging. In recent years several nationally and internationally funded projects have concentrated on inventories of emergency response processes, structures for storing dynamic information, and standards and services for accessing the needed data sets. A good inventory would clarify many aspects of the information exchange, such as data sets, models and representations; a good structuring would facilitate fast access to a desired piece of information, as well as the automated analysis of that information. Consequently, the information can be put to better use in the decision-making process. This paper presents our work on models for dynamic data for different disasters and incidents in Europe. The Dutch data models are derived from a thorough study of emergency response procedures in the Netherlands. Two more models developed within the project HUMBOLDT reflect several cross-border disaster management scenarios in Europe. These models are compared with the Geospatial Data Model of the Department of Homeland Security in the USA. The paper draws conclusions about the type of geographical information needed to perform emergency response operations and the possibility of a generic model to be used world-wide.

    A semi-empirical dynamic soil acidification model for use in spatially explicit integrated assessment models for Europe

    A semi-empirical soil acidification model was developed for use in integrated assessment models on a European scale. The model simulates the time development of base saturation and aluminium concentration using an empirical relationship with pH. An accompanying data set was developed by overlaying European maps of soils, land use, climate and altitude, followed by a procedure that aggregates the input data over soil-texture combinations in each EMEP 150 km x 150 km grid cell. Model tests show that the model gives results comparable to the SMART model, although it overestimates initial base saturation in some areas with high acid input and simulates a faster recovery from acidification than SMART.

    Paramagnetic resonance effect in viscoelastic materials Annual progress report, 1 Jan. - 31 Dec. 1968

    Electron paramagnetic resonance investigation of fracture in viscoelastic material

    The role of geographic mobility in reducing education-job mismatches in the Netherlands.

    In this paper we investigate the relationship between geographic mobility and education-job mismatch in the Netherlands. We focus on the role of geographic mobility in reducing the probability of graduates working in (i) jobs below their education level; (ii) jobs outside their study field; (iii) part-time jobs; (iv) flexible jobs; or (v) jobs paid below the wage expected at the beginning of the career. For this purpose we use data on secondary and higher vocational education graduates in the period 1996–2001. We show that graduates who are mobile have a higher probability of finding jobs at the acquired education level than those who are not. Moreover, mobile graduates have a higher probability of finding full-time or permanent jobs. This suggests that mobility is sought to prevent not only having to take a job below the acquired education level, but also other education-job mismatches; graduates are spatially flexible particularly to secure full-time jobs. Keywords: geographic labour mobility; job mismatch; occupational choice