
    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys.

    Time separation as a hidden variable to the Copenhagen school of quantum mechanics

    The Bohr radius is a space-like separation between the proton and electron in the hydrogen atom. According to the Copenhagen school of quantum mechanics, the proton is sitting in the absolute Lorentz frame. If this hydrogen atom is observed from a different Lorentz frame, there is a time-like separation linearly mixed with the Bohr radius. Indeed, the time-separation is one of the essential variables in high-energy hadronic physics where the hadron is a bound state of the quarks, while thoroughly hidden in the present form of quantum mechanics. It will be concluded that this variable is hidden in Feynman's rest of the universe. It is noted first that Feynman's Lorentz-invariant differential equation for the bound-state quarks has a set of solutions which describe all essential features of hadronic physics. These solutions explicitly depend on the time separation between the quarks. This set also forms the mathematical basis for two-mode squeezed states in quantum optics, where both photons are observable, but one of them can be treated as a variable hidden in the rest of the universe. The physics of this two-mode state can then be translated into the time-separation variable in the quark model. As in the case of the un-observed photon, the hidden time-separation variable manifests itself as an increase in entropy and uncertainty. Comment: LaTeX, 10 pages with 5 figures. Invited paper presented at the Conference on Advances in Quantum Theory (Vaxjo, Sweden, June 2010), to be published in one of the AIP Conference Proceedings series.

    MSV3d: database of human MisSense variants mapped to 3D protein structure

    The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The multi-level characterization includes details of the physico-chemical changes induced by amino acid modification, as well as information related to the conservation of the mutated residue and its position relative to functional features in the available or predicted 3D model. Major releases of the database are automatically generated and updated regularly in line with the dbSNP (database of Single Nucleotide Polymorphism) and SwissVar releases, by exploiting the extensive Décrypthon computational grid resources. The database (http://decrypthon.igbmc.fr/msv3d) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in XML or flat file formats

    The asparagine-transamidosome from Helicobacter pylori: a dual-kinetic mode in non-discriminating aspartyl-tRNA synthetase safeguards the genetic code

    Helicobacter pylori catalyzes Asn-tRNAAsn formation by use of the indirect pathway that involves charging of Asp onto tRNAAsn by a non-discriminating aspartyl-tRNA synthetase (ND-AspRS), followed by conversion of the mischarged Asp into Asn by the GatCAB amidotransferase. We show that the partners of asparaginylation assemble into a dynamic Asn-transamidosome, which uses a different strategy than the Gln-transamidosome to prevent the release of the mischarged aminoacyl-tRNA intermediate. The complex is described by gel-filtration, dynamic light scattering and kinetic measurements. Two strategies for asparaginylation are shown: (i) tRNAAsn binds GatCAB first, allowing aminoacylation and immediate transamidation once ND-AspRS joins the complex; (ii) tRNAAsn is bound by ND-AspRS which releases the Asp-tRNAAsn product much slower than the cognate Asp-tRNAAsp; this kinetic peculiarity allows GatCAB to bind and transamidate Asp-tRNAAsn before its release by the ND-AspRS. These results are discussed in the context of the interrelation between the Asn and Gln-transamidosomes which use the same GatCAB in H. pylori, and shed light on a kinetic mechanism that ensures faithful codon reassignment for Asn

    From sleep spindles of natural sleep to spike and wave discharges of typical absence seizures: is the hypothesis still valid?

    The temporal coincidence of sleep spindles and spike-and-wave discharges (SWDs) in patients with idiopathic generalized epilepsies, together with the transformation of spindles into SWDs following intramuscular injection of the weak GABAA receptor (GABAAR) antagonist, penicillin, in an experimental model, brought about the view that SWDs may represent ‘perverted’ sleep spindles. Over the last 20 years, this hypothesis has received considerable support, in particular from in vitro studies of thalamic oscillations following pharmacological/genetic manipulations of GABAARs. However, from a critical appraisal of the evidence in absence epilepsy patients and well-established models of absence epilepsy it emerges that SWDs can occur as frequently during wakefulness as during sleep, with their preferential occurrence in either one of these behavioural states often being patient dependent. Moreover, whereas the EEG expression of both SWDs and sleep spindles requires the integrity of the entire cortico-thalamo-cortical network, SWDs initiate in the cortex while sleep spindles initiate in the thalamus. Furthermore, the hypothesis of a reduction in GABAAR function across the entire cortico-thalamo-cortical network as the basis for the transformation of sleep spindles into SWDs is no longer tenable. In fact, while a decreased GABAAR function may be present in some cortical layers and in the reticular thalamic nucleus, both phasic and tonic GABAAR inhibitions of thalamo-cortical neurons are either unchanged or increased in this epileptic phenotype. In summary, these differences between SWDs and sleep spindles question the view that the EEG hallmark of absence seizures results from a transformation of this EEG oscillation of natural sleep.

    The genetic diversity, phylogeography and morphology of Elphidiidae (Foraminifera) in the Northeast Atlantic

    Genetic characterisation (SSU rRNA genotyping) and Scanning Electron Microscope (SEM) imaging of individual tests were used in tandem to determine the modern species richness of the foraminiferal family Elphidiidae (Elphidium, Haynesina and related genera) across the Northeast Atlantic shelf biomes. Specimens were collected at 25 locations from the High Arctic to Iberia, and a total of 1013 individual specimens were successfully SEM imaged and genotyped. Phylogenetic analyses were carried out in combination with 28 other elphidiid sequences from GenBank and seventeen distinct elphidiid genetic types were identified within the sample set, seven being sequenced for the first time. Genetic types cluster into seven main clades which largely represent their general morphological character. Differences between genetic types at the genetic, morphological and biogeographic levels are indicative of species level distinction. Their biogeographic distributions, in combination with elphidiid SSU sequences from GenBank and high resolution images from the literature show that each of them exhibits species-specific rather than clade-specific biogeographies. Due to taxonomic uncertainty and divergent taxonomic concepts between schools, we believe that morphospecies names should not be placed onto molecular phylogenies unless both the morphology and genetic type have been linked to the formally named holotype, or equivalent. Based on strict morphological criteria, we advocate using only a three-stage approach to taxonomy for practical application in micropalaeontological studies. It comprises genotyping, the production of a formal morphological description of the SEM images associated with the genetic type and then the allocation of the most appropriate taxonomic name by comparison with the formal type description. Using this approach, we were able to apply taxonomic names to fifteen genetic types. One of the remaining two may be potentially cryptic, and one is undescribed in the literature. In general, the phylogeographic distribution is in agreement with our knowledge of the ecology and biogeographical distribution of the corresponding morphospecies, highlighting the generally robust taxonomic framework of the Elphidiidae in time and space.