2,259 research outputs found

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has attracted booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys

    On thermodynamics second law in the modified Gauss Bonnet gravity

    The second law and the generalized second law of thermodynamics in cosmology in the framework of the modified Gauss-Bonnet theory of gravity are investigated. The conditions upon which these laws hold are derived and discussed. Comment: 9 pages, typos corrected, references added

    Oscillations of the F(R) dark energy in the accelerating universe

    Oscillations of the $F(R)$ dark energy around the phantom divide line, $\omega_{DE}=-1$, both during the matter era and also in the de Sitter epoch are investigated. The analysis during the de Sitter epoch is revisited by expanding the modified equations of motion around the de Sitter solution. Then, during the matter epoch, the time dependence of the dark energy perturbations is discussed by using two different local expansions. For high values of the redshift, the matter epoch is a stable point of the theory, giving the possibility to expand the $F(R)$-functions in terms of the dark energy perturbations. In the late-time matter era, the realistic case is considered where dark energy tends to a constant. The results obtained are confirmed by precise numerical computation on a specific model of exponential gravity. A novel and very detailed discussion is provided on the critical points in the matter era and on the relation of the oscillations with possible singularities. Comment: 23 pages, 11 figures, version to appear in EPJ

    Collective emotions online and their influence on community life

    E-communities, social groups interacting online, have recently become an object of interdisciplinary research. As with face-to-face meetings, Internet exchanges may not only include factual information but also emotional information - how participants feel about the subject discussed or other group members. Emotions are known to be important in affecting interaction partners in offline communication in many ways. Could emotions in Internet exchanges affect others and systematically influence quantitative and qualitative aspects of the trajectory of e-communities? The development of automatic sentiment analysis has made large scale emotion detection and analysis possible using text messages collected from the web. It is not clear if emotions in e-communities primarily derive from individual group members' personalities or if they result from intra-group interactions, and whether they influence group activities. We show the collective character of affective phenomena on a large scale as observed in 4 million posts downloaded from Blogs, Digg and BBC forums. To test whether the emotions of a community member may influence the emotions of others, posts were grouped into clusters of messages with similar emotional valences. The frequency of long clusters was much higher than it would be if emotions occurred at random. Distributions for cluster lengths can be explained by preferential processes because conditional probabilities for consecutive messages grow as a power law with cluster length. For BBC forum threads, average discussion lengths were higher for larger values of absolute average emotional valence in the first ten comments and the average amount of emotion in messages fell during discussions. Our results prove that collective emotional states can be created and modulated via Internet communication and that emotional expressiveness is the fuel that sustains some e-communities. Comment: 23 pages including Supporting Information, accepted to PLoS ONE

    Maximum Satisfiability in Software Analysis: Applications and Techniques

    A central challenge in software analysis concerns balancing different competing tradeoffs. To address this challenge, we propose an approach based on the Maximum Satisfiability (MaxSAT) problem, an optimization extension of the Boolean Satisfiability (SAT) problem. We demonstrate the approach on three diverse applications that advance the state-of-the-art in balancing tradeoffs in software analysis. Enabling these applications on real-world programs necessitates solving large MaxSAT instances comprising over $10^{30}$ clauses in a sound and optimal manner. We propose a general framework that scales to such instances by iteratively expanding a subset of clauses while providing soundness and optimality guarantees. We also present new techniques to instantiate and optimize the framework.

    FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium

    [Background] The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), so it is not necessary to genotype all SNPs for an association study. Many algorithms have been developed to find a small subset of SNPs, called tag SNPs, that are sufficient to infer all the other SNPs. Algorithms based on the $r^2$ LD statistic have gained popularity because $r^2$ is directly related to statistical power to detect disease associations. Most existing $r^2$-based algorithms use pairwise LD. Recent studies show that multi-marker LD can help further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time-consuming and memory-consuming. They cannot work on chromosomes containing more than 100k SNPs using length-3 tagging rules. [Results] We propose an efficient algorithm called FastTagger to calculate multi-marker tagging rules and select tag SNPs based on multi-marker LD. FastTagger uses several techniques to reduce running time and memory consumption. Our experimental results show that FastTagger is several times faster than existing multi-marker based tag SNP selection algorithms, and it consumes much less memory at the same time. As a result, FastTagger can work on chromosomes containing more than 100k SNPs using length-3 tagging rules. FastTagger also produces smaller sets of tag SNPs than existing multi-marker based algorithms, and the reduction ratio ranges from 3% to 9% when length-3 tagging rules are used. The generated tagging rules can also be used for genotype imputation. We studied the prediction accuracy of individual rules, and the average accuracy is above 96% when $r^2 \geq 0.9$. [Conclusions] Generating multi-marker tagging rules is a computation-intensive task, and it is the bottleneck of existing multi-marker based tag SNP selection methods. FastTagger is a practical and scalable algorithm to solve this problem.

    A hierarchical and modular approach to the discovery of robust associations in genome-wide association studies from pooled DNA samples

    [Background] One of the challenges of the analysis of pooling-based genome wide association studies is to identify authentic associations among potentially thousands of false positive associations. [Results] We present a hierarchical and modular approach to the analysis of genome wide genotype data that incorporates quality control, linkage disequilibrium, physical distance and gene ontology to identify authentic associations among those found by statistical association tests. The method is developed for the allelic association analysis of pooled DNA samples, but it can be easily generalized to the analysis of individually genotyped samples. We evaluate the approach using data sets from diverse genome wide association studies including fetal hemoglobin levels in sickle cell anemia and a sample of centenarians and show that the approach is highly reproducible and allows for discovery at different levels of synthesis. [Conclusion] Results from the integration of Bayesian tests and other machine learning techniques with linkage disequilibrium data suggest that we do not need to use overly stringent thresholds to reduce the number of false positive associations. This method yields increased power even with relatively small samples. In fact, our evaluation shows that the method can reach almost 70% sensitivity with samples of only 100 subjects. Supported by NHLBI grants R21 HL080463 (PS); R01 HL68970 (MHS); K-24, AG025727 (TP); K23 AG026754 (D.T.)

    The ASIMOV Prize for scientific publishing - HEP researchers trigger young people toward science

    This work presents the ASIMOV Prize for scientific publishing, which was launched in Italy in 2016. The prize aims to bring the young generations closer to scientific culture, through the critical reading of popular science books. The books are selected by a committee that includes scientists, professors, Ph.D. holders and Ph.D. students, writers, journalists and friends of culture, and most importantly, over 800 school teachers. Students are actively involved in the prize, according to the best practices of public engagement: they read and review the books and vote for them, choosing the winner. The experience is quite successful: 12,000 students from 270 schools all over Italy participated in the last edition. The possibility of replicating this experience in other countries is indicated, as was done in Brazil in 2020 with more than encouraging results.