1,593 research outputs found

    SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

    In the last few years thousands of scientific papers have investigated sentiment analysis, several startups that measure opinions on real data have emerged, and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiments, including lexical-based and supervised machine learning methods. Despite the vast interest in the theme and the wide popularity of some methods, it is unclear which one is better for identifying the polarity (i.e., positive or negative) of a message. Accordingly, there is a strong need to conduct a thorough apples-to-apples comparison of sentiment analysis methods, as they are used in practice, across multiple datasets originating from different data sources. Such a comparison is key for understanding the potential limitations, advantages, and disadvantages of popular methods. This article aims at filling this gap by presenting a benchmark comparison of twenty-four popular sentiment analysis methods (which we call the state-of-the-practice methods). Our evaluation is based on a benchmark of eighteen labeled datasets, covering messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. Our results highlight the extent to which the prediction performance of these methods varies across datasets. Aiming at boosting the development of this research area, we open the methods' codes and datasets used in this article, deploying them in a benchmark system, which provides an open API for accessing and comparing sentence-level sentiment analysis methods.

    In silico genomic analyses reveal three distinct lineages of Escherichia coli O157:H7, one of which is associated with hyper-virulence

    Background: Many approaches have been used to study the evolution, population structure and genetic diversity of Escherichia coli O157:H7; however, observations made with different genotyping systems are not easily relatable to each other. Three genetic lineages of E. coli O157:H7 designated I, II and I/II have been identified using octamer-based genome scanning and microarray comparative genomic hybridization (mCGH). Each lineage contains significant phenotypic differences, with lineage I strains being the most commonly associated with human infections. Similarly, a clade of hyper-virulent O157:H7 strains implicated in the 2006 spinach and lettuce outbreaks has been defined using single-nucleotide polymorphism (SNP) typing. In this study an in silico comparison of six different genotyping approaches was performed on 19 E. coli genome sequences from 17 O157:H7 strains and single O145:NM and K12 MG1655 strains to provide an overall picture of diversity of the E. coli O157:H7 population, and to compare genotyping methods for O157:H7 strains. Results: In silico determination of lineage, Shiga-toxin bacteriophage integration site, comparative genomic fingerprint, mCGH profile, novel region distribution profile, SNP type and multi-locus variable number tandem repeat analysis type was performed and a supernetwork based on the combination of these methods was produced. This supernetwork showed three distinct clusters of strains that were O157:H7 lineage-specific, with the SNP-based hyper-virulent clade 8 synonymous with O157:H7 lineage I/II. Lineage I/II/clade 8 strains clustered closest on the supernetwork to E. coli K12 and E. coli O55:H7, O145:NM and sorbitol-fermenting O157 strains. Conclusion: The results of this study highlight the similarities in relationships derived from multi-locus genome sampling methods and suggest a "common genotyping language" may be devised for population genetics and epidemiological studies. Future genotyping methods should provide data that can be stored centrally and accessed locally in an easily transferable, informative and extensible format based on comparative genomic analyses.

    Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions

    Background: The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq. Results: Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and core genome and single nucleotide polymorphisms (SNPs) of six L. monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100 loci set among 10 strains in 1 s, compared to the 449 s required to exhaustively search for all possible combinations; it also found the most discriminatory 20 loci from a 96 loci E. coli O157:H7 SNP dataset. Conclusion: Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence/absence of accessory regions and SNPs within core regions and produces a graphical overview of the output. Panseq also includes a loci selector that calculates the most variable and discriminatory loci among sets of accessory loci or core gene SNPs. Availability: Panseq is freely available online at http://76.70.11.198/panseq. Panseq is written in Perl.
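
To illustrate why a loci selector can beat exhaustive search, the sketch below contrasts a greedy heuristic with a brute-force search on binary presence/absence profiles. Choosing four loci from a set of 100 means scoring C(100, 4) = 3,921,225 combinations exhaustively. This is not Panseq's actual algorithm, which the abstract does not describe: the scoring rule (number of strain pairs separated), the function names, and the toy profiles are all assumptions made for illustration.

```python
from itertools import combinations


def count_distinguished_pairs(profiles, loci):
    """Count strain pairs that differ at one or more of the given loci."""
    return sum(
        any(profiles[a][i] != profiles[b][i] for i in loci)
        for a, b in combinations(list(profiles), 2)
    )


def greedy_select(profiles, k):
    """Hypothetical heuristic: repeatedly add the locus that separates the most pairs."""
    n_loci = len(next(iter(profiles.values())))
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(n_loci) if i not in chosen]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda i: count_distinguished_pairs(profiles, chosen + [i]),
        )
        chosen.append(best)
    return chosen


def exhaustive_select(profiles, k):
    """Brute force: score every k-locus combination and keep the best one."""
    n_loci = len(next(iter(profiles.values())))
    return list(
        max(
            combinations(range(n_loci), k),
            key=lambda loci: count_distinguished_pairs(profiles, loci),
        )
    )


# Toy presence/absence data (made up): four strains, four loci.
profiles = {
    "strainA": [1, 0, 1, 1],
    "strainB": [1, 0, 0, 1],
    "strainC": [1, 1, 0, 0],
    "strainD": [1, 1, 1, 0],
}

selected = greedy_select(profiles, 2)
print(selected, count_distinguished_pairs(profiles, selected))
```

On this toy input the greedy choice matches the exhaustive one. In general a greedy pass is not guaranteed to find the optimal combination, but it scores only on the order of k times the number of loci candidate sets instead of every combination.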

    Force transmission in a packing of pentagonal particles

    We perform a detailed analysis of the contact force network in a dense confined packing of pentagonal particles simulated by means of the contact dynamics method. The effect of particle shape is evidenced by comparing the data from the pentagon packing with data from a packing with identical characteristics except for the circular shape of the particles. A counterintuitive finding of this work is that, under steady shearing, the pentagon packing develops a lower structural anisotropy than the disk packing. We show that this weakness is compensated by a higher force anisotropy, leading to enhanced shear strength of the pentagon packing. We revisit "strong" and "weak" force networks in the pentagon packing, but our simulation data also provide evidence for a large class of "very weak" forces carried mainly by vertex-to-edge contacts. The strong force chains are mostly composed of edge-to-edge contacts with a marked zig-zag aspect and a decreasing exponential probability distribution, as in a disk packing.

    Comparative genomics profiling of clinical isolates of Aeromonas salmonicida using DNA microarrays

    BACKGROUND: Aeromonas salmonicida has been isolated from numerous fish species and shows wide variation in virulence and pathogenicity. As part of a larger research program to identify virulence genes and candidates for vaccine development, a DNA microarray was constructed using a subset of 2024 genes from the draft genome sequence of A. salmonicida subsp. salmonicida strain A449. The microarray included genes encoding known virulence-associated factors in A. salmonicida and homologs of virulence genes of other pathogens. We used microarray-based comparative genomic hybridizations (M-CGH) to compare selected A. salmonicida sub-species and other Aeromonas species from different hosts and geographic locations. RESULTS: Results showed variable carriage of virulence-associated genes and generally increased variation in gene content across sub-species and species boundaries. The greatest variation was observed among genes associated with plasmids and transposons. There was little correlation between geographic region and degree of variation for all isolates tested. CONCLUSION: We have used the M-CGH technique to identify subsets of conserved genes from among this set of A. salmonicida virulence genes for further investigation as potential vaccine candidates. Unlike other bacterial characterization methods that use a small number of gene or DNA-based functions, M-CGH examines thousands of genes and/or whole genomes and thus is a more comprehensive analytical tool for veterinary or even human health research.

    Gastric obstruction by multiple polyps in an Akita: case report

    The article has no abstract.

    The Sensitivity of HAWC to High-Mass Dark Matter Annihilations

    The High Altitude Water Cherenkov (HAWC) observatory is a wide field-of-view detector sensitive to gamma rays of 100 GeV to a few hundred TeV. Located in central Mexico at 19 degrees North latitude and 4100 m above sea level, HAWC will observe gamma rays and cosmic rays with an array of water Cherenkov detectors. The full HAWC array is scheduled to be operational in Spring 2015. In this paper, we study the HAWC sensitivity to the gamma-ray signatures of high-mass (multi-TeV) dark matter annihilation. The HAWC observatory will be sensitive to diverse searches for dark matter annihilation, including annihilation from extended dark matter sources, the diffuse gamma-ray emission from dark matter annihilation, and gamma-ray emission from non-luminous dark matter subhalos. Here we consider the HAWC sensitivity to a subset of these sources, including dwarf galaxies, the M31 galaxy, the Virgo cluster, and the Galactic center. We simulate the HAWC response to gamma rays from these sources in several well-motivated dark matter annihilation channels. If no gamma-ray excess is observed, we show the limits HAWC can place on the dark matter cross-section from these sources. In particular, in the case of dark matter annihilation into gauge bosons, HAWC will be able to detect a narrow range of dark matter masses to cross-sections below thermal. HAWC should also be sensitive to non-thermal cross-sections for masses up to nearly 1000 TeV. The constraints placed by HAWC on the dark matter cross-section from known sources should be competitive with current limits in the mass range where HAWC has similar sensitivity. HAWC can additionally explore higher dark matter masses than are currently constrained. Comment: 15 pages, 4 figures, version to be published in PR

    TRIB3 suppresses tumorigenesis by controlling mTORC2/AKT/FOXO signaling.

    In a recent article, we found that Tribbles pseudokinase 3 (TRIB3) plays a tumor suppressor role and that this effect relies on the dysregulation of the phosphorylation of v-akt murine thymoma viral oncogene homolog (AKT) by the mammalian target of rapamycin complex 2 (mTORC2), and the subsequent hyperphosphorylation and inactivation of the transcription factor Forkhead box O3 (FOXO3).