Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting of the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
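As a rough illustration of the inductive approach this abstract describes (a classifier learned from preclassified documents), the sketch below trains a tiny multinomial Naive Bayes model in plain Python. The choice of Naive Bayes, the whitespace tokenizer, the function names `train` and `classify`, and the four training documents are all assumptions made for the example; the survey covers many other learners and representations.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Learn per-category word counts from (text, category) pairs."""
    word_counts = defaultdict(Counter)   # category -> word frequencies
    doc_counts = Counter()               # category -> number of documents
    vocab = set()
    for text, category in labelled_docs:
        tokens = text.lower().split()    # naive whitespace tokenizer (assumption)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the category with the highest log-posterior under Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)  # log prior
        total_words = sum(word_counts[category].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical preclassified documents, invented for this sketch.
training_set = [
    ("stocks fell on market news", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the match", "sport"),
    ("striker scored a late goal", "sport"),
]
model = train(training_set)
print(classify(model, "market interest rates"))
print(classify(model, "late goal in the match"))
```

The same `train`/`classify` split mirrors the separation between classifier construction and classifier use; evaluating effectiveness would additionally require a held-out test set, which this sketch omits.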
On thermodynamics second law in the modified Gauss Bonnet gravity
The second law and the generalized second law of thermodynamics in cosmology
in the framework of the modified Gauss-Bonnet theory of gravity are
investigated. The conditions upon which these laws hold are derived and
discussed.
Comment: 9 pages, typos corrected, references added
Oscillations of the F(R) dark energy in the accelerating universe
Oscillations of the dark energy around the phantom divide line,
w_DE = -1, both during the matter era and also in the de Sitter epoch
are investigated. The analysis during the de Sitter epoch is revisited by
expanding the modified equations of motion around the de Sitter solution. Then,
during the matter epoch, the time dependence of the dark energy perturbations
is discussed by using two different local expansions. For high values of the
redshift, the matter epoch is a stable point of the theory, making it
possible to expand the F(R)-functions in terms of the dark energy
perturbations. In the late-time matter era, the realistic case is considered
where dark energy tends to a constant. The results obtained are confirmed by
precise numerical computation on a specific model of exponential gravity. A
novel and very detailed discussion is provided on the critical points in the
matter era and on the relation of the oscillations with possible singularities.
Comment: 23 pages, 11 figures, version to appear in EPJ
Collective emotions online and their influence on community life
E-communities, social groups interacting online, have recently become an
object of interdisciplinary research. As with face-to-face meetings, Internet
exchanges may not only include factual information but also emotional
information - how participants feel about the subject discussed or other group
members. Emotions are known to be important in affecting interaction partners
in offline communication in many ways. Could emotions in Internet exchanges
affect others and systematically influence quantitative and qualitative aspects
of the trajectory of e-communities? The development of automatic sentiment
analysis has made large scale emotion detection and analysis possible using
text messages collected from the web. It is not clear if emotions in
e-communities primarily derive from individual group members' personalities or
if they result from intra-group interactions, and whether they influence group
activities. We show the collective character of affective phenomena on a large
scale as observed in 4 million posts downloaded from Blogs, Digg and BBC
forums. To test whether the emotions of a community member may influence the
emotions of others, posts were grouped into clusters of messages with similar
emotional valences. The frequency of long clusters was much higher than it
would be if emotions occurred at random. Distributions for cluster lengths can
be explained by preferential processes because conditional probabilities for
consecutive messages grow as a power law with cluster length. For BBC forum
threads, average discussion lengths were higher for larger values of absolute
average emotional valence in the first ten comments and the average amount of
emotion in messages fell during discussions. Our results prove that collective
emotional states can be created and modulated via Internet communication and
that emotional expressiveness is the fuel that sustains some e-communities.
Comment: 23 pages including Supporting Information, accepted to PLoS ONE
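The clustering step described in this abstract (grouping consecutive messages with similar emotional valence and asking how likely a cluster is to continue) can be sketched as follows. This is a minimal illustration, not the authors' code: the valence labels -1/0/1, the function names, and the example thread are assumptions, all valences are pooled into one estimate, and the run cut off by the end of the thread is counted as if it had ended naturally.

```python
from collections import Counter
from itertools import groupby

def cluster_lengths(valences):
    """Lengths of maximal runs of consecutive messages with the same valence."""
    return [sum(1 for _ in run) for _, run in groupby(valences)]

def continuation_probability(valences):
    """Estimate P(cluster grows beyond n | cluster has reached length n)."""
    counts = Counter(cluster_lengths(valences))
    probabilities = {}
    for n in sorted(counts):
        reached = sum(c for length, c in counts.items() if length >= n)
        extended = sum(c for length, c in counts.items() if length > n)
        probabilities[n] = extended / reached
    return probabilities

# Hypothetical thread: -1 negative, 0 neutral, 1 positive.
thread = [-1, -1, -1, 0, 1, 1, -1, -1, -1, -1, 0]
print(cluster_lengths(thread))           # [3, 1, 2, 4, 1]
print(continuation_probability(thread))
```

Under the random-emotion null hypothesis mentioned in the abstract, the continuation probability would not depend on the cluster length; comparing the estimate against a shuffled copy of the same thread is one simple way to check that.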
Maximum Satisfiability in Software Analysis: Applications and Techniques
A central challenge in software analysis concerns balancing different competing tradeoffs. To address this challenge, we propose an approach based on the Maximum Satisfiability (MaxSAT) problem, an optimization extension of the Boolean Satisfiability (SAT) problem. We demonstrate the approach on three diverse applications that advance the state of the art in balancing tradeoffs in software analysis. Enabling these applications on real-world programs necessitates solving large MaxSAT instances comprising over 10^30 clauses in a sound and optimal manner. We propose a general framework that scales to such instances by iteratively expanding a subset of clauses while providing soundness and optimality guarantees. We also present new techniques to instantiate and optimize the framework.
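To make the MaxSAT formulation concrete, here is a toy weighted partial MaxSAT solver: hard clauses must all hold, and the total weight of satisfied soft clauses is maximized. It uses exhaustive search, which is only viable for a handful of variables and is not the iterative clause-expansion framework the abstract proposes. The DIMACS-style integer literals, the function names, and the example instance are assumptions for illustration.

```python
from itertools import product

def satisfies(clause, assignment):
    """A clause (list of nonzero ints) holds if any literal is true.

    Literal v means variable v is True; literal -v means variable v is False.
    """
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def brute_force_maxsat(num_vars, hard, soft):
    """Return (best_weight, assignment), or None if the hard clauses conflict.

    hard: list of clauses that must all be satisfied.
    soft: list of (weight, clause) pairs whose satisfied weight is maximized.
    """
    best = None
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bit for i, bit in enumerate(bits)}
        if not all(satisfies(clause, assignment) for clause in hard):
            continue  # violates a hard constraint
        weight = sum(w for w, clause in soft if satisfies(clause, assignment))
        if best is None or weight > best[0]:
            best = (weight, assignment)
    return best

# Hypothetical instance: x1 and x2 cannot both hold (hard);
# prefer x1 (weight 4), x2 (weight 2), and "x1 implies x2" (weight 1).
hard_clauses = [[-1, -2]]
soft_clauses = [(4, [1]), (2, [2]), (1, [-1, 2])]
print(brute_force_maxsat(2, hard_clauses, soft_clauses))
```

The search enumerates 2^n assignments, which is why instances of the size quoted in the abstract need a fundamentally different strategy.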
FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium
[Background]
The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), so it is not necessary to genotype all SNPs for an association study. Many algorithms have been developed to find a small subset of SNPs, called tag SNPs, that are sufficient to infer all the other SNPs. Algorithms based on the r^2 LD statistic have gained popularity because r^2 is directly related to the statistical power to detect disease associations. Most existing r^2-based algorithms use pairwise LD. Recent studies show that multi-marker LD can help further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time-consuming and memory-consuming. They cannot work on chromosomes containing more than 100 k SNPs using length-3 tagging rules.
[Results]
We propose an efficient algorithm called FastTagger to calculate multi-marker tagging rules and select tag SNPs based on multi-marker LD. FastTagger uses several techniques to reduce running time and memory consumption. Our experimental results show that FastTagger is several times faster than existing multi-marker based tag SNP selection algorithms, and it consumes much less memory at the same time. As a result, FastTagger can work on chromosomes containing more than 100 k SNPs using length-3 tagging rules.
FastTagger also produces smaller sets of tag SNPs than existing multi-marker based algorithms, and the reduction ratio ranges from 3% to 9% when length-3 tagging rules are used. The generated tagging rules can also be used for genotype imputation. We studied the prediction accuracy of individual rules, and the average accuracy is above 96% when r^2 ≥ 0.9.
[Conclusions]
Generating multi-marker tagging rules is a computation-intensive task, and it is the bottleneck of existing multi-marker based tag SNP selection methods. FastTagger is a practical and scalable algorithm to solve this problem.
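For readers unfamiliar with the r^2 statistic mentioned above, the sketch below computes pairwise r^2 between two biallelic SNPs from phased haplotypes and then picks tag SNPs with a simple greedy cover. This illustrates only the pairwise baseline, not FastTagger or its multi-marker tagging rules. The 0/1 allele coding, the function names, the SNP identifiers, and the convention of returning 0.0 for a monomorphic SNP are assumptions made for the example.

```python
def pairwise_r2(hap_a, hap_b):
    """Pairwise LD r^2 = D^2 / (pA(1-pA) pB(1-pB)) from 0/1 haplotype alleles."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return 0.0  # r^2 is undefined for a monomorphic SNP; 0.0 is an assumption
    d = p_ab - p_a * p_b
    return d * d / denominator

def greedy_tag_snps(haplotypes, threshold=0.9):
    """Greedily choose SNPs until every SNP has r^2 >= threshold with some tag."""
    names = list(haplotypes)
    covers = {
        s: {t for t in names
            if s == t or pairwise_r2(haplotypes[s], haplotypes[t]) >= threshold}
        for s in names
    }
    uncovered = set(names)
    tags = []
    while uncovered:
        # Pick the SNP that covers the most still-uncovered SNPs.
        best = max(names, key=lambda s: len(covers[s] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags

# Hypothetical phased haplotypes for three SNPs across eight chromosomes.
haplotypes = {
    "rs1": [0, 0, 1, 1, 0, 1, 0, 1],
    "rs2": [0, 0, 1, 1, 0, 1, 0, 1],
    "rs3": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(pairwise_r2(haplotypes["rs1"], haplotypes["rs3"]))  # 0.25
print(greedy_tag_snps(haplotypes))                        # ['rs1', 'rs3']
```

Here `rs1` and `rs2` are perfectly correlated, so one tag suffices for both, while `rs3` needs its own tag at the 0.9 threshold.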
A hierarchical and modular approach to the discovery of robust associations in genome-wide association studies from pooled DNA samples
[Background]
One of the challenges of the analysis of pooling-based genome-wide association studies is to identify authentic associations among potentially thousands of false positive associations.
[Results]
We present a hierarchical and modular approach to the analysis of genome-wide genotype data that incorporates quality control, linkage disequilibrium, physical distance and gene ontology to identify authentic associations among those found by statistical association tests. The method is developed for the allelic association analysis of pooled DNA samples, but it can be easily generalized to the analysis of individually genotyped samples. We evaluate the approach using data sets from diverse genome-wide association studies, including fetal hemoglobin levels in sickle cell anemia and a sample of centenarians, and show that the approach is highly reproducible and allows for discovery at different levels of synthesis.
[Conclusion]
Results from the integration of Bayesian tests and other machine learning techniques with linkage disequilibrium data suggest that we do not need to use overly stringent thresholds to reduce the number of false positive associations. This method yields increased power even with relatively small samples. In fact, our evaluation shows that the method can reach almost 70% sensitivity with samples of only 100 subjects.
Supported by NHLBI grants R21 HL080463 (PS); R01 HL68970 (MHS); K-24, AG025727 (TP); K23 AG026754 (D.T.)
The ASIMOV Prize for scientific publishing - HEP researchers trigger young people toward science
This work presents the ASIMOV Prize for scientific publishing, which was launched in Italy in 2016. The prize aims to bring the young generations closer to scientific culture, through the critical reading of popular science books. The books are selected by a committee that includes scientists, professors, Ph.D. and Ph.D. students, writers, journalists and friends of culture, and most importantly, over 800 school teachers. Students are actively involved in the prize, according to the best practices of public engagement: they read, review the books and vote for them, choosing the winner. The experience is quite successful: 12,000 students from 270 schools all over Italy participated in the last edition.
The possibility of replicating this experience in other countries is indicated, as was done in Brazil in 2020 with more than encouraging results.