A study on mutual information-based feature selection for text categorization
Feature selection plays an important role in text categorization (TC). Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG) and mutual information (MI) are commonly applied in TC. Many existing experiments show that IG is one of the most effective methods, whereas MI has been shown to perform relatively poorly. According to one existing MI method, the mutual information of a category c and a term t can be negative, which conflicts with the definition of MI derived from information theory, where it is always non-negative. We show that the form of MI used in TC is not derived correctly from information theory. Two different MI-based feature selection criteria are referred to as MI in the TC literature, and one of them should correctly be termed "pointwise mutual information" (PMI). In this paper, we clarify the terminological confusion surrounding the notion of "mutual information" in TC and detail an MI method derived correctly from information theory. Experiments with the Reuters-21578 and OHSUMED collections show that the corrected MI method's performance is similar to that of IG and considerably better than that of PMI.
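The distinction between the two criteria can be made concrete with a 2×2 table of document counts. The sketch below is a minimal illustration, assuming that term presence and category membership are binary variables estimated by simple relative frequencies; the function names and the count names n11, n10, n01, n00 are hypothetical, and the paper's exact estimator may differ. `pmi` is the single-cell score that can go negative, while `mutual_information` takes the expectation over all four cells, as in the information-theoretic definition.

```python
import math


def pmi(n11: int, n10: int, n01: int, n00: int) -> float:
    """Pointwise MI, log[P(t, c) / (P(t) P(c))], for term t and category c.

    Document counts (first index: term present, second: in category):
    n11 term and category, n10 term only, n01 category only, n00 neither.
    Assumes n11 > 0, otherwise the logarithm is undefined.
    """
    n = n11 + n10 + n01 + n00
    return math.log(n11 * n / ((n11 + n10) * (n11 + n01)))


def mutual_information(n11: int, n10: int, n01: int, n00: int) -> float:
    """Information-theoretic MI between the binary variables 'term present'
    and 'document in category': the expectation of the pointwise values over
    all four outcomes, which is never negative."""
    n = n11 + n10 + n01 + n00
    term_yes, term_no = n11 + n10, n01 + n00
    cat_yes, cat_no = n11 + n01, n10 + n00
    cells = [  # (joint count, term marginal, category marginal)
        (n11, term_yes, cat_yes),
        (n10, term_yes, cat_no),
        (n01, term_no, cat_yes),
        (n00, term_no, cat_no),
    ]
    total = 0.0
    for joint, term_total, cat_total in cells:
        if joint > 0:  # convention: 0 * log 0 = 0
            total += (joint / n) * math.log(joint * n / (term_total * cat_total))
    return total


# Hypothetical term that occurs mostly outside the category (natural logs, nats):
print(pmi(1, 9, 9, 1))                 # ≈ -1.609, i.e. log(0.2)
print(mutual_information(1, 9, 9, 1))  # ≈ 0.368
```

For these hypothetical counts the pointwise score is negative while the expected MI is positive, which mirrors the conflict with information theory described above.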
Blip10000: a social video dataset containing SPUG content for tagging and retrieval
The increasing amount of available digital multimedia content is inspiring potential new types of user interaction with video data. Users want to find content easily by searching and browsing. For this reason, techniques are needed that allow automatic categorisation, content search and linking to related information.
In this work, we present a dataset that contains comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple "social levels". We describe the principal characteristics of this dataset and present results that have been achieved on different tasks.
Silicon spin diffusion transistor: materials, physics and device characteristics
The realisation that everyday electronics has ignored the spin of the carrier in favour of its charge is the foundation of the field of spintronics. Starting with simple two-terminal devices based on giant magnetoresistance (GMR) and tunnel magnetoresistance, the technology has advanced to consider three-terminal devices that aim to combine spin sensitivity with a high current gain and a large current output. These devices require both efficient spin injection and semiconductor fabrication. This paper discusses the design, operation and characteristics of the only spin transistor that has yielded a current gain greater than one in combination with a reasonable output current.
Local exchange-correlation vector potential with memory in Time-Dependent Density Functional Theory: the generalized hydrodynamics approach
Using Landau Fermi liquid theory, we derive a nonlinear non-adiabatic approximation for the exchange-correlation (xc) vector potential defined by the xc stress tensor. The stress tensor is a local nonlinear functional of two basic variables: the displacement vector and the second-rank tensor that describes the evolution of momentum in a local frame moving with the Eulerian velocity. For irrotational motion and an equilibrium initial state, the dependence on the tensor variable reduces to a dependence on a metric generated by a dynamical deformation of the system.
Comment: RevTex, 5 pages, no figures. Final version published in PR
On a two variable class of Bernstein-Szego measures
The one-variable Bernstein-Szego theory for orthogonal polynomials on the real line is extended to a class of two-variable measures. The polynomials orthonormal in the total degree ordering and in the lexicographical ordering are constructed, and their recurrence coefficients are discussed.
Comment: minor change
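The one-variable theory that is being extended has a classical signature that is easy to check numerically. For one common form of the Bernstein-Szego weight, (1 - x^2)^(-1/2) / ρ(x) on [-1, 1] with ρ a polynomial positive there, the recurrence coefficients coincide with those of the Chebyshev polynomials beyond a small index. The sketch below is a one-variable illustration only, not the two-variable construction of the paper: it computes recurrence coefficients with a discretized Stieltjes procedure on Gauss-Chebyshev nodes. The function name `jacobi_parameters`, the node count and the choice ρ(x) = 1 + x/2 are hypothetical.

```python
import numpy as np


def jacobi_parameters(rho, n_coeffs, n_nodes=400):
    """Recurrence coefficients of the orthonormal polynomials for the weight
    (1 - x^2)^(-1/2) / rho(x) on [-1, 1], via the discretized Stieltjes procedure.

    Returns (a, b) with x p_n = a[n] p_{n+1} + b[n] p_n + a[n-1] p_{n-1}.
    """
    k = np.arange(1, n_nodes + 1)
    x = np.cos((2 * k - 1) * np.pi / (2 * n_nodes))  # Gauss-Chebyshev nodes
    w = (np.pi / n_nodes) / rho(x)                   # quadrature weights times 1/rho
    a = np.zeros(n_coeffs)
    b = np.zeros(n_coeffs)
    p_prev = np.zeros_like(x)
    p = np.full_like(x, 1.0 / np.sqrt(w.sum()))      # p_0, normalized
    a_prev = 0.0
    for n in range(n_coeffs):
        b[n] = np.sum(w * x * p * p)                 # <x p_n, p_n>
        q = (x - b[n]) * p - a_prev * p_prev         # equals a[n] * p_{n+1}
        a[n] = np.sqrt(np.sum(w * q * q))
        p_prev, p = p, q / a[n]
        a_prev = a[n]
    return a, b


a, b = jacobi_parameters(lambda x: 1.0 + 0.5 * x, 12)
print(np.round(a, 6))  # for this rho, a[n] ≈ 0.5 from n = 2 on
print(np.round(b, 6))  # and b[n] ≈ 0 from n = 2 on
```

Only the first few coefficients depend on ρ, which is the property that makes the one-variable Bernstein-Szego class explicitly computable.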
Random Networks with given Rich-club Coefficient
In complex networks, it is common to model a network or generate a surrogate network based on the conservation of the network's degree distribution. We provide an alternative network model based on the conservation of connection density within a set of nodes. This density is measured by the rich-club coefficient. We present a method to generate surrogate networks with a given rich-club coefficient. We show that, by choosing a suitable local linking term, the generated random networks can reproduce the degree distribution and the mixing pattern of real networks. The method is easy to implement and produces good models of real networks.
Comment: revised version, new figure
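The connection density that the model conserves is commonly defined as φ(k) = 2 E_{>k} / (N_{>k} (N_{>k} - 1)), where N_{>k} is the number of nodes with degree greater than k and E_{>k} the number of edges among them. The sketch below computes φ(k) for a simple undirected graph given as an edge list (assumed free of self-loops and duplicate edges). It does not reproduce the paper's surrogate-generation method or its local linking term; the function name and the example graph are hypothetical.

```python
import math
from collections import defaultdict


def rich_club_coefficient(edges, k):
    """phi(k) = 2 E_{>k} / (N_{>k} (N_{>k} - 1)) for a simple undirected graph.

    N_{>k}: number of nodes with degree greater than k
    E_{>k}: number of edges among those nodes
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    rich = {node for node, deg in degree.items() if deg > k}
    n_rich = len(rich)
    if n_rich < 2:
        return math.nan  # density undefined for fewer than two nodes
    e_rich = sum(1 for u, v in edges if u in rich and v in rich)
    return 2 * e_rich / (n_rich * (n_rich - 1))


# Hypothetical graph: a 4-node clique (the "rich club"), each member with one leaf.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (0, 4), (1, 5), (2, 6), (3, 7)]
print(rich_club_coefficient(edges, 1))  # 1.0: the degree-4 nodes are fully connected
print(rich_club_coefficient(edges, 0))  # 20 / 56 ≈ 0.357
```

A surrogate generator of the kind described above would then be judged by how closely its φ(k) curve matches that of the real network.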
Comprehensive two-dimensional gas chromatography (GC×GC) measurements of volatile organic compounds in the atmosphere
During the MINOS campaign in August 2001, comprehensive two-dimensional gas chromatography (GC×GC) was applied to in situ measurements of atmospheric volatile organic compounds (VOCs) at the Finokalia ground station, Crete. The measurement system employs a thermal desorption unit for on-line sampling and injection, and a GC×GC separation system equipped with a flame ionization detector (FID). The system was optimized to resolve C7–C14 organic components. Two-dimensional chromatograms from measurements of Finokalia air samples show several hundred well-separated peaks. To facilitate peak identification, cartridge samples collected at Finokalia were analyzed using the same GC×GC system coupled with a time-of-flight mass spectrometer (TOF-MS). The resulting mass spectra were deconvoluted and compared with spectra from a database for tentative peak identification. About 650 peaks have been identified in the two-dimensional plane with significant signal-to-noise ratios (>100) and high spectral similarities (>800). By comparing observed retention indices with those found in the literature, 235 of the identifications have been confirmed, and 150 of the confirmed compounds appear in the C7–C14 range of the chromatogram from the in situ measurement. However, at least as many peaks remain unidentified. For quantification of the GC×GC measurements, peak volumes of measured compounds have been integrated and externally calibrated using a standard gas mixture.
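The quantification step (integrating peak volumes and calibrating them externally against a standard gas mixture) can be sketched as simple arithmetic. This is a hedged illustration, not the authors' processing chain: it assumes a rectangle-rule peak volume (summed signal inside a hypothetical peak mask times the pixel area), a single-point calibration with a detector response that is linear through the origin, and equal sampled volumes for sample and standard. All function names and numbers are hypothetical.

```python
import numpy as np


def peak_volume(signal, mask, dt1, dt2):
    """Approximate the volume of one two-dimensional peak as the summed detector
    signal inside its mask times the pixel area (dt1 x dt2 retention-time steps)."""
    return float(signal[mask].sum() * dt1 * dt2)


def externally_calibrated_mixing_ratio(sample_volume, standard_volume,
                                       standard_mixing_ratio):
    """Single-point external calibration: assumes a linear detector response
    through the origin and identical sampled gas volumes for sample and standard."""
    response_factor = standard_volume / standard_mixing_ratio
    return sample_volume / response_factor


# Hypothetical 5 x 5 chromatogram patch with one square peak of height 10.
signal = np.zeros((5, 5))
signal[1:3, 1:3] = 10.0
v = peak_volume(signal, signal > 0, dt1=1.0, dt2=0.5)
print(v)  # 20.0
# Hypothetical standard: peak volume 80 for a mixing ratio of 2.0 (e.g. ppb).
print(externally_calibrated_mixing_ratio(v, 80.0, 2.0))  # 0.5
```

Real GC×GC software usually delimits peaks with more elaborate detection algorithms; the mask here simply stands in for whatever peak region such a step produces.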