
    A study on mutual information-based feature selection for text categorization

    Feature selection plays an important role in text categorization (TC). Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization. Many existing experiments show that IG is one of the most effective methods; by contrast, MI has been shown to perform relatively poorly. According to one existing MI method, the mutual information of a category c and a term t can be negative, which conflicts with the information-theoretic definition of MI, under which it is always non-negative. We show that the form of MI used in TC is not derived correctly from information theory. Two different MI-based feature selection criteria are both referred to as MI in the TC literature; one of them should correctly be termed "pointwise mutual information" (PMI). In this paper, we clarify the terminological confusion surrounding the notion of "mutual information" in TC, and detail an MI method derived correctly from information theory. Experiments with the Reuters-21578 collection and the OHSUMED collection show that the corrected MI method’s performance is similar to that of IG, and considerably better than that of PMI.
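    The distinction between PMI and MI described in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions computed from document counts; the function names, argument names, and example counts are illustrative assumptions, and the paper's exact estimators may differ.

```python
import math

def pmi(n_tc, n_t, n_c, n):
    """Pointwise MI, log2(P(t,c) / (P(t) * P(c))), for the single event
    'term t is present and the document belongs to category c'.
    Arguments are document counts (hypothetical names):
    n_tc = docs in c containing t, n_t = docs containing t,
    n_c = docs in c, n = all docs."""
    if n_tc == 0:
        return float("-inf")  # the joint event was never observed
    return math.log2((n_tc * n) / (n_t * n_c))

def mutual_information(n_tc, n_t, n_c, n):
    """MI between two binary variables (term present/absent, document
    in/not in c): the sum over all four cells of the 2x2 table."""
    cells = [
        (n_tc, n_t, n_c),                          # t present, in c
        (n_t - n_tc, n_t, n - n_c),                # t present, not in c
        (n_c - n_tc, n - n_t, n_c),                # t absent, in c
        (n - n_t - n_c + n_tc, n - n_t, n - n_c),  # t absent, not in c
    ]
    total = 0.0
    for joint, row, col in cells:
        if joint > 0:  # an empty cell contributes nothing to the sum
            total += (joint / n) * math.log2((joint * n) / (row * col))
    return total

# Made-up counts: the term is rarer inside the category than chance predicts.
print(pmi(5, 100, 200, 1000))                 # negative
print(mutual_information(5, 100, 200, 1000))  # non-negative
```

    With these made-up counts PMI is negative because the term and category co-occur less often than independence would predict, while the four-cell sum stays non-negative, which matches the property of MI that the abstract appeals to.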

    Blip10000: a social video dataset containing SPUG content for tagging and retrieval

    The increasing amount of digital multimedia content available is inspiring potential new types of user interaction with video data. Users want to easily find content by searching and browsing. For this reason, techniques are needed that allow automatic categorisation, searching of the content, and linking to related information. In this work, we present a dataset that contains comprehensive semi-professional user generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'. We describe the principal characteristics of this dataset and present results that have been achieved on different tasks.

    Silicon spin diffusion transistor: materials, physics and device characteristics

    The realisation that everyday electronics has ignored the spin of the carrier in favour of its charge is the foundation of the field of spintronics. Starting with simple two-terminal devices based on GMR and tunnel magnetoresistance, the technology has advanced to consider three-terminal devices that aim to combine spin sensitivity with a high current gain and a large current output. These devices require both efficient spin injection and semiconductor fabrication. In this paper, a discussion is presented of the design, operation and characteristics of the only spin transistor that has yielded a current gain greater than one in combination with reasonable output current.

    Local exchange-correlation vector potential with memory in Time-Dependent Density Functional Theory: the generalized hydrodynamics approach

    Using Landau Fermi liquid theory we derive a nonlinear non-adiabatic approximation for the exchange-correlation (xc) vector potential defined by the xc stress tensor. The stress tensor is a local nonlinear functional of two basic variables: the displacement vector and the second-rank tensor which describes the evolution of momentum in a local frame moving with Eulerian velocity. For irrotational motion and an equilibrium initial state, the dependence on the tensor variable reduces to a dependence on a metric generated by a dynamical deformation of the system. Comment: RevTex, 5 pages, no figures. Final version published in PR

    On a two variable class of Bernstein-Szego measures

    The one variable Bernstein-Szego theory for orthogonal polynomials on the real line is extended to a class of two variable measures. The polynomials orthonormal in the total degree ordering and the lexicographical ordering are constructed and their recurrence coefficients discussed. Comment: minor change

    Random Networks with given Rich-club Coefficient

    In complex networks it is common to model a network or generate a surrogate network based on the conservation of the network's degree distribution. We provide an alternative network model based on the conservation of connection density within a set of nodes. This density is measured by the rich-club coefficient. We present a method to generate surrogate networks with a given rich-club coefficient. We show that by choosing a suitable local linking term, the generated random networks can reproduce the degree distribution and the mixing pattern of real networks. The method is easy to implement and produces good models of real networks. Comment: revised version, new figure
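    As a rough illustration of the quantity this abstract proposes to conserve, the sketch below computes a rich-club coefficient under the common degree-based definition: the fraction of possible links actually present among nodes whose degree exceeds k. The function name and the toy edge list are assumptions for illustration, and the paper may define the coefficient differently (for example by node rank); the surrogate-generation method itself is not reproduced here.

```python
from collections import defaultdict

def rich_club_coefficient(edges, k):
    """Connection density among nodes with degree > k.
    Assumes a simple undirected graph: every edge is listed once,
    with no self-loops and no duplicate edges."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    rich = {node for node, d in degree.items() if d > k}
    n_rich = len(rich)
    if n_rich < 2:
        return None  # density is undefined for fewer than two nodes
    # Count the edges whose endpoints both belong to the rich set.
    e_rich = sum(1 for u, v in edges if u in rich and v in rich)
    return 2 * e_rich / (n_rich * (n_rich - 1))

# Toy graph: a triangle of hubs, each hub with one extra leaf neighbour.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("a", "d"), ("b", "e"), ("c", "f")]
print(rich_club_coefficient(toy_edges, 1))  # density among the three hubs
print(rich_club_coefficient(toy_edges, 0))  # density of the whole graph
```

    In the toy graph the three hubs are fully connected to one another, so the coefficient for k = 1 is at its maximum, while the whole graph (k = 0) is much sparser.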

    Comprehensive two-dimensional gas chromatography (GC x GC) measurements of volatile organic compounds in the atmosphere

    During the MINOS campaign in August 2001 comprehensive two-dimensional gas chromatography (GC×GC) was applied to the in situ measurements of atmospheric volatile organic compounds (VOCs) at the Finokalia ground station, Crete. The measurement system employs a thermal desorption unit for on-line sampling and injection, and a GC×GC separation system equipped with a flame ionization detector (FID) for detection. The system was optimized to resolve C7 − C14 organic components. Two-dimensional chromatograms from measurements of Finokalia air samples show several hundred well-separated peaks. To facilitate peak identification, cartridge samples collected at Finokalia were analyzed using the same GC×GC system coupled with a time-of-flight mass spectrometer (TOF-MS). The resulting mass spectra were deconvoluted and compared to spectra from a database for tentative peak identification. About 650 peaks have been identified in the two-dimensional plane, with significant signal/noise ratios (>100) and high spectra similarities (>800). By comparing observed retention indices with those found in the literature, 235 of the identifications have been confirmed. 150 of the confirmed compounds show up in the C7 − C14 range of the chromatogram from the in situ measurement. However, at least as many peaks remain unidentified. For quantification of the GC×GC measurements, peak volumes of measured compounds have been integrated and externally calibrated using a standard gas mixture.