474 research outputs found

    Fusion of tf.idf Weighted Bag of Visual Features for Image Classification

    No full text
    International audience. Image representation using the bag-of-visual-words approach is commonly used in image classification. Features are extracted from images and clustered into a visual vocabulary. Images can then be represented as normalized histograms of visual words, much as textual documents are represented as weighted vectors of terms. As a result, text categorization techniques are applicable to image classification. In this paper, our contribution is twofold. First, we propose a suitable term-frequency and inverse-document-frequency (tf.idf) weighting scheme to characterize the importance of visual words. Second, we present a method to fuse different bags of words obtained with different vocabularies. We show that our tf.idf normalization and the fusion lead to better classification rates than other normalization methods, other fusion schemes, or other approaches evaluated on the SIMPLIcity collection.
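
As a rough illustration of tf.idf weighting applied to visual-word histograms, the sketch below uses the textbook tf.idf formula (relative term frequency times log inverse document frequency). This is an assumption made for illustration only: the abstract does not give the specific weighting scheme the paper proposes, and the function name `tfidf_histograms` and its input format are hypothetical.

```python
import math
from collections import Counter


def tfidf_histograms(images):
    """Weight visual-word histograms with textbook tf.idf.

    `images` is a list of lists of visual-word ids, one list per image.
    Returns one {visual_word: weight} dict per image.
    NOTE: standard tf.idf, assumed here; not the paper's own scheme.
    """
    n_images = len(images)
    # Document frequency: number of images in which each visual word occurs.
    df = Counter(word for words in images for word in set(words))
    vectors = []
    for words in images:
        counts = Counter(words)
        total = sum(counts.values())
        # tf = relative frequency in the image, idf = log(N / df).
        vectors.append({
            w: (c / total) * math.log(n_images / df[w])
            for w, c in counts.items()
        })
    return vectors
```

A visual word that occurs in every image receives weight 0 under this formula, which is the usual way tf.idf discounts uninformative terms.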

    Simulation based estimation of branching models for LTR retrotransposons

    Motivation: LTR retrotransposons are mobile elements that are able, like retroviruses, to copy and move inside eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model takes into account both the positions and the degradation of LTR retrotransposon copies. In our model, the duplication rate is also allowed to vary with the degradation level. Results: Various functions have been implemented to simulate their spread, and visualization tools are proposed. Based on these simulation tools, we show that the parameters of this propagation model can be estimated accurately. We applied this method to the study of the spread of the transposable elements ROO, GYPSY, and DM412 on a chromosome of Drosophila melanogaster. Availability: Our proposal has been implemented in Python. Source code is freely available on the web at https://github.com/SergeMOULIN/retrotransposons-spread. Comment: 7 pages, 3 figures, 7 tables. Submitted to "Bioinformatics" on March 1, 201

    Combinaison d'information visuelle et textuelle pour la recherche d'information multimédia

    No full text
    International audience. In this article we present a representation model for multimedia documents that combines textual information and visual descriptors. The text and the image that make up a document are each described by a vector of tf.idf weights, following a bag-of-words approach. The model supports multimedia queries for information retrieval. Our method is evaluated on the ImageCLEF'08 collection, for which ground truth is available. Several experiments were carried out with different descriptors and several combinations of modalities. Analysis of the results shows that a multimedia document model improves on the performance of a retrieval system based on a single modality, whether textual or visual.

    UJM at INEX 2009 XML Mining Track

    No full text
    8 pages. International audience. This paper reports our experiments carried out for the INEX XML Mining track 2009, which consists of developing categorization methods for multi-labeled XML documents. We represent XML documents as vectors of indexed terms. The purpose of our experiments is twofold. First, we aim to compare strategies that reduce the index size using an improved feature selection criterion, CCD. Second, we compare a thresholding strategy we proposed (MCut) with the common RCut and PCut strategies. The index size was reduced in such a way that the results were worse than expected. However, we obtained good improvements with the MCut thresholding strategy.
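
For readers unfamiliar with thresholding strategies in multi-label categorization, here is a minimal sketch of RCut (rank-based cut), which assigns each document its k top-scoring categories. The function name, the score dictionary, and the alphabetical tie-break are illustrative assumptions; the authors' MCut strategy is not shown because the abstract does not describe it.

```python
def rcut(scores, k):
    """Rank-based thresholding for one document.

    `scores` maps category name -> classifier score (hypothetical input).
    Returns the k highest-scoring categories; ties are broken
    alphabetically, an arbitrary choice made for determinism.
    """
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]
```

With k = 1 this reduces to single-label classification; larger k trades precision for recall.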

    Impact de l'information visuelle pour la Recherche d'Images par le contenu et le contexte

    No full text
    15 pages. National audience. Multimedia documents composed of text and images are increasingly common thanks to the Internet and to growing storage capacities. This article presents a representation model for multimedia documents that combines textual and visual information. Using a bag-of-words approach, a document composed of text and images can be described by vectors corresponding to each type of information. For a given multimedia query, a list of relevant documents is returned by linearly combining the results obtained separately on each modality. The aim of this article is to study how the weight given to visual information relative to textual information affects the results. Experiments on the ImageCLEF multimedia collection, extracted from the Wikipedia encyclopedia, show that the results can be improved after a first step in which this weight is learned.

    UJM at INEX 2008 XML mining Track

    No full text
    International audience. This paper reports our experiments carried out for the INEX XML Mining track, which consists of developing categorization (or classification) and clustering methods for XML documents. We represent XML documents as vectors of index terms. For our first participation, the purpose of our experiments is twofold. First, our overall aim is to set up a text-only categorization approach that can be used as a baseline for further work that will take into account the structure of the XML documents. Second, our goal is to define two criteria based on term distribution for reducing the size of the index. The results of our baseline are good, and using our two criteria we improve them while slightly reducing the term index. The results are slightly worse when we sharply reduce the term index.

    Fisher Linear Discriminant Analysis for Text-Image Combination in Multimedia Information Retrieval

    No full text
    International audience. In multimedia information retrieval, combining different modalities (text, image, audio or video) provides additional information and generally improves overall system performance. For this purpose, the linear combination method is presented as simple, flexible and effective. However, it requires choosing the weight assigned to each modality. This issue is still an open problem and is addressed in this paper. Our approach, based on Fisher Linear Discriminant Analysis, aims to learn these weights for multimedia documents composed of text and images. Text and images are both represented with the classical bag-of-words model. Our method was tested on the ImageCLEF 2008 and 2009 datasets. Results demonstrate that our combination approach not only outperforms the use of the textual modality alone but also provides nearly optimal learning of the weights with efficient computation. Moreover, the method allows more than two modalities to be combined without increasing the complexity and thus the computing time.
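
The linear combination step mentioned in the abstract can be sketched as follows. This shows only the fusion of per-document scores with a fixed, hand-chosen weight; it does not implement the Fisher Linear Discriminant Analysis used in the paper to learn that weight. The convex form (1 - w, w), the function name, and the score dictionaries are assumptions made for illustration.

```python
def combine_scores(text_scores, visual_scores, visual_weight):
    """Linearly fuse text and visual retrieval scores.

    Each argument maps document id -> score for one modality
    (hypothetical inputs). A document missing from one modality
    scores 0 there. Returns document ids ranked best first.
    ASSUMPTION: weights are (1 - visual_weight) and visual_weight.
    """
    docs = set(text_scores) | set(visual_scores)
    fused = {
        d: (1.0 - visual_weight) * text_scores.get(d, 0.0)
        + visual_weight * visual_scores.get(d, 0.0)
        for d in docs
    }
    # Sort by descending fused score, then by id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))
```

Setting `visual_weight` to 0 recovers the text-only ranking, which makes it easy to compare the fused system against a single-modality baseline.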

    UJM at ImageCLEFwiki 2008

    No full text
    6 pages. This paper reports our multimedia information retrieval experiments carried out for the ImageCLEF track (ImageCLEFwiki). The task is to answer user information needs, i.e. queries that may be composed of several modalities (text, image, concept), with ranked lists of relevant documents. The purpose of our experiments is twofold. First, our overall aim is to develop a multimedia document model combining text and/or image modalities. Second, we aim to compare the results of our model using a multimedia query with those of a text-only model. Our multimedia document model is based on a vector of textual and visual terms. The textual terms correspond to words. The visual ones result from local colour descriptors that are automatically extracted and quantized by k-means, leading to an image vocabulary. They represent the colour property of an image region. To perform a query, we compute a similarity score between each document vector (textual + visual terms) and the query using the Okapi method, based on the tf.idf approach. We submitted 6 runs, either automatic or manual, using textual information, visual information, or both. With these 6 runs, we aim to study several aspects of our model, such as the choice of the visual words and local features, the way of combining textual and visual words for a query, and the performance improvements obtained when adding visual information to a purely textual model. Concerning the choice of the visual words, results show that they are significant in some cases where the visualness of the query is meaningful. The conclusion about the combination of textual and visual words is surprising: we obtain worse results when we add the text directly to the visual words. Finally, results also indicate that visual information brings complementary relevant documents that were not found with the text query. These initial results are promising and encourage the development of our multimedia model.

    Sorption of Aldrich Humic Acids onto Hematite: Insights into Fractionation Phenomena by Electrospray Ionization with Quadrupole Time-of-Flight Mass Spectrometry

    International audience. Sorption-induced fractionation of purified Aldrich humic acid (PAHA) on hematite is studied through the modification of electrospray ionization (ESI) quadrupole time-of-flight (QToF) mass spectra of supernatants from retention experiments. The ESI mass spectra show an increase in the “mean molecular masses” of the molecules that constitute humic aggregates. The low molecular weight fraction (LMWF; m/z ≤ 600 Da) is preferentially sorbed compared to the two other fractions. The resolution of the ESI-QToF mass spectrometer in the low-mass range provided evidence of further sorption-induced fractionation within the LMWF. Of the two other fractions, the high molecular weight fraction (HMWF; m/z ≈ 1700 Da) seems to be more prone to sorption than the intermediate molecular weight fraction (IMWF; m/z ≈ 900 Da). The IMWF seems to be more hydrophilic, as it should be richer in O, N and alkyl C judging from the proportion of even masses, and poorer in aromatic structures judging from mass defect analysis of the ESI mass spectra.