    Reducing the loss of information through annealing text distortion

    Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011.
    Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words, in such a way that the complexity of a document is slowly reduced, helps compression-based text clustering and improves its accuracy. In fact, we show how nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent across different data sets and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting.
    This work was supported by the Spanish Ministry of Education and Science under the TIN2010-19872 and TIN2010-19607 projects.
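    The compression distance discussed above can be sketched with Python's standard library. This is a minimal illustration, not the authors' code. It uses the standard NCD definition and two stdlib compressors: `zlib` (DEFLATE, from the Lempel-Ziv family) and `bz2` (Burrows-Wheeler block-sorting). No statistical (PPM-type) compressor ships with the standard library, so that family is left out. The names `COMPRESSORS`, `ncd` and `ncd_matrix` are hypothetical.

```python
import bz2
import zlib

# C(s): compressed size in bytes. zlib implements DEFLATE (Lempel-Ziv family);
# bz2 implements Burrows-Wheeler block-sorting compression.
COMPRESSORS = {
    "zlib": lambda data: len(zlib.compress(data, 9)),
    "bz2": lambda data: len(bz2.compress(data, 9)),
}


def ncd(x: str, y: str, compressor: str = "zlib") -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    size = COMPRESSORS[compressor]
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy, cxy = size(bx), size(by), size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(docs: list[str], compressor: str = "zlib") -> list[list[float]]:
    """Pairwise NCD matrix; a clustering step would take this as its input."""
    return [[ncd(a, b, compressor) for b in docs] for a in docs]
```

    A text compared with itself should score close to 0, and unrelated texts should score close to 1. Real compressors can push NCD slightly above 1. The resulting matrix is what a compression-based clustering method would consume; the clustering algorithm itself is not shown.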

    Analysis and study on text representation to improve the accuracy of the Normalized Compression Distance

    The huge amount of information stored in text form makes methods that deal with text particularly interesting. This thesis focuses on processing texts with compression distances. More specifically, it takes a small step towards understanding both the nature of texts and the nature of compression distances. Broadly speaking, it does so by exploring the effects that several distortion techniques have on one of the most successful distances in the family of compression distances, the Normalized Compression Distance (NCD).
    Comment: PhD thesis; 202 pages
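    For reference, the standard definition of the NCD is given below. C(·) denotes the length in bytes of the compressed input, and xy denotes the concatenation of x and y. A worked example follows, using made-up compressed sizes that are purely illustrative and not taken from the thesis.

```latex
\[
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}
\]
% Hypothetical sizes: C(x) = 100, C(y) = 120, C(xy) = 150
\[
\mathrm{NCD}(x, y) = \frac{150 - 100}{\max\{100, 120\}} = \frac{50}{120} \approx 0.417
\]
```

    Values near 0 indicate that one input compresses almost entirely in terms of the other. Values near 1 indicate that the inputs share little exploitable structure.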

    Evaluating the impact of information distortion on normalized compression distance

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-540-87448-5_8
    Proceedings of the Second International Castle Meeting, ICMCTA 2008, Castillo de la Mota, Medina del Campo, Spain, September 15-19, 2008.
    In this paper we apply different techniques of information distortion to a set of classical books written in English. We study the impact that these distortions have on the Kolmogorov complexity and on the clustering by compression technique (the latter based on the Normalized Compression Distance, NCD). We show how to decrease the complexity of the books considered by introducing several modifications into them. We measure how well the information contained in each book is maintained using a clustering error measure. We find experimentally that the best way to preserve the clustering error is by means of modifications in the most frequent words. We explain the details of these information distortions and compare them with other kinds of modifications, such as random word distortions and infrequent word distortions. Finally, we present some phenomenological explanations of the different empirical results obtained.
    This work was supported by TIN 2004-04363-CO03-03, TIN 2007-65989, CAM S-SEM-0255-2006, TIN2007-64718 and TSI 2005-08255-C07-06. We would also like to thank Francisco Sánchez for his useful comments on this draft.
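    The three distortion families compared above (frequent, infrequent and random words) can be sketched as follows. This is an assumption-laden illustration, not the paper's exact procedure. The regex tokenizer, the `fraction` parameter, alphabetical tie-breaking, and masking each character of a targeted word with `*` are all choices made here. The helpers `words_to_distort` and `distort` are hypothetical.

```python
import random
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def words_to_distort(text, fraction, strategy, seed=0):
    """Choose which vocabulary words to distort (hypothetical helper).

    strategy: "frequent" picks the most frequent words,
              "infrequent" the least frequent, "random" a random sample.
    """
    counts = Counter(w.lower() for w in WORD_RE.findall(text))
    # Descending frequency; ties broken alphabetically so results are deterministic.
    vocab = sorted(counts, key=lambda w: (-counts[w], w))
    k = round(fraction * len(vocab))
    if strategy == "frequent":
        return set(vocab[:k])
    if strategy == "infrequent":
        return set(vocab[len(vocab) - k:])
    if strategy == "random":
        return set(random.Random(seed).sample(vocab, k))
    raise ValueError(f"unknown strategy: {strategy}")


def distort(text, targets):
    """Replace every character of each targeted word with '*'."""
    def mask(match):
        word = match.group()
        return "*" * len(word) if word.lower() in targets else word

    return WORD_RE.sub(mask, text)
```

    For example, `distort(text, words_to_distort(text, 0.4, "frequent"))` masks the top 40% of the vocabulary. Comparing the compressed size of the result (e.g. with `zlib`) against the original shows how much complexity each strategy removes.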

    Contextual Information Retrieval based on Algorithmic Information Theory and Statistical Outlier Detection

    The main contribution of this paper is the design of an Information Retrieval (IR) technique based on Algorithmic Information Theory (using the Normalized Compression Distance, NCD), statistical techniques (outliers), and a novel organization of the database structure. The paper shows how these can be integrated to retrieve information from generic databases using long (text-based) queries. Two important problems are analyzed. On the one hand, how to detect "false positives" when the distance among the documents is very low and there is actual similarity. On the other hand, we propose a way to structure a document database whose similarity-distance estimation depends on the length of the selected text. Finally, we present the experimental evaluations carried out to study these problems.
    Comment: Submitted to 2008 IEEE Information Theory Workshop (6 pages, 6 figures)
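    One simple reading of "statistical techniques (outliers)" is to treat documents whose NCD to the query is unusually low as relevant. The sketch below does that with a threshold of mean − k·stdev over the collection. The threshold rule, the default `k = 1.5`, and the names `ncd` and `retrieve` are assumptions for illustration, not the paper's method. The database organization the paper proposes is not modeled.

```python
import statistics
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance using zlib as the compressor."""
    cx, cy, cxy = (len(zlib.compress(s, 9)) for s in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def retrieve(query: str, docs: dict[str, str], k: float = 1.5) -> list[str]:
    """Return ids of documents whose NCD to the query is a low outlier,
    i.e. below mean - k * stdev of all query-document distances."""
    q = query.encode("utf-8")
    dists = {doc_id: ncd(q, text.encode("utf-8")) for doc_id, text in docs.items()}
    values = list(dists.values())
    threshold = statistics.mean(values) - k * statistics.pstdev(values)
    # Most similar (smallest distance) first.
    return sorted((d for d in dists if dists[d] < threshold), key=dists.get)
```

    A fixed z-score cut-off is the simplest outlier rule. A robust alternative (median and MAD) would be less sensitive to a few very close matches inflating the spread.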

    Nonlinear power spectrum in the presence of massive neutrinos: perturbation theory approach, galaxy bias and parameter forecasts

    Future or ongoing galaxy redshift surveys can put stringent constraints on neutrino masses via high-precision measurements of the galaxy power spectrum, when combined with cosmic microwave background (CMB) information. In this paper we develop a method to model the galaxy power spectrum in the weakly nonlinear regime for a mixed dark matter (CDM plus finite-mass neutrinos) model, based on perturbation theory (PT), whose validity has been well tested by simulations for a CDM model. In doing this we carefully study various aspects of the nonlinear clustering and arrive at a useful approximation that allows a quick computation of the nonlinear power spectrum, as in the CDM case. The nonlinear galaxy bias is also included in a self-consistent manner within the PT framework. Thus our PT model can give a more robust understanding of the measured galaxy power spectrum, as well as allow higher sensitivity to neutrino masses due to the gain of Fourier modes beyond the linear regime. Based on the Fisher matrix formalism, we find that a BOSS or Stage-III type survey, when combined with Planck CMB information, gives a precision on the total neutrino mass of sigma(m_nu,tot) ~ 0.1 eV, while a Stage-IV type survey may achieve sigma(m_nu,tot) ~ 0.05 eV, i.e. more than a 1-sigma detection of neutrino masses. We also discuss possible systematic errors on dark energy parameters caused by the neutrino mass uncertainty. We find a significant correlation between neutrino mass and dark energy parameters if the information on the power spectrum amplitude is included. More importantly, for a Stage-IV type survey, the best-fit dark energy model may be biased away from the true underlying model by more than the 1-sigma statistical errors if the neutrino mass is ignored in the model fitting.
    Comment: 33 pages, 11 figures
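    The Fisher-matrix step mentioned above can be illustrated with a toy two-parameter case; think of total neutrino mass and one dark energy parameter. In the usual Fisher formalism, the marginalized 1-sigma error on parameter i is sqrt((F^-1)_ii). Holding the other parameter fixed gives the conditional error 1/sqrt(F_ii). The Fisher matrix values used below are invented for illustration and are not the paper's forecasts, and the helper names are hypothetical.

```python
import math


def invert_2x2(f):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = f
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular Fisher matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def forecast_errors(fisher):
    """Return (conditional, marginalized) 1-sigma errors for each parameter."""
    cov = invert_2x2(fisher)  # covariance = inverse Fisher matrix
    conditional = [1 / math.sqrt(fisher[i][i]) for i in range(2)]
    marginalized = [math.sqrt(cov[i][i]) for i in range(2)]
    return conditional, marginalized
```

    Consider F = [[100, 80], [80, 100]]. The conditional errors are 0.1, and the marginalized errors are 1/6 ≈ 0.167. The correlation coefficient is -0.8, so marginalizing inflates each error by 1/sqrt(1 - 0.8²) = 1/0.6. A degeneracy of this kind is what lets neutrino mass uncertainty widen dark energy constraints.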

    Image Compression and Quality Assessment Based on Deep Learning

    Waseda University degree record no. Shin 8427; Waseda University

    Towards video streaming in IoT environments: vehicular communication perspective

    Multimedia-oriented Internet of Things (IoT) enables pervasive and real-time communication of video, audio and image data among devices in their immediate surroundings. Today's vehicles are capable of real-time multimedia acquisition. Vehicles with high-illumination infrared cameras and customized sensors can communicate with other on-road devices using dedicated short-range communication (DSRC) and 5G-enabled communication technologies. Real-time incidents in both urban and highway vehicular traffic environments can be captured and transmitted using vehicle-to-vehicle and vehicle-to-infrastructure communication modes. Video streaming in vehicular IoT (VSV-IoT) environments is still an emerging area, with several challenges that need to be addressed. These range from limited resources in IoT devices, intermittent connectivity in vehicular networks, device heterogeneity, and dynamism and scalability in video encoding, to bandwidth underutilization in video delivery and attaining application-specific quality of service in video streaming. In this context, this paper presents a comprehensive review of video streaming in IoT environments from a vehicular communication perspective. Specifically, the significance of video streaming in vehicular IoT environments is highlighted, focusing on the integration of vehicular communication with 5G-enabled IoT technologies and on smart-city-oriented application areas for VSV-IoT. A taxonomy is presented for classifying the related literature on video streaming in vehicular network environments. Following the taxonomy, a critical review of the literature is performed, focusing on major functional models, strengths and weaknesses. Metrics for video streaming in vehicular IoT environments are derived and comparatively analyzed in terms of their usage and evaluation capabilities. Open research challenges in VSV-IoT are identified as future directions of research in the area.
    The survey should benefit both IoT and vehicle-industry practitioners and researchers by deepening their understanding of vehicular video streaming and its IoT-related trends and issues.