
    #Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds

    Compounding of natural language units is a very common phenomenon. In this paper, we show, for the first time, that Twitter hashtags, which could be considered as correlates of such linguistic units, undergo compounding. We identify reasons for this compounding and propose a prediction model that can identify with 77.07% accuracy whether a pair of hashtags compounding in the near future (i.e., 2 months after compounding) will become popular. At longer times, T = 6 and 10 months, the accuracies are 77.52% and 79.13% respectively. This technique has strong implications for trending hashtag recommendation, since newly formed hashtag compounds can be recommended early, even before the compounding has taken place. Further, humans can predict compounds with an overall accuracy of only 48.7% (treated as baseline). Notably, while humans can discriminate the relatively easier cases, the automatic framework is successful in classifying the relatively harder cases. Comment: 14 pages, 4 figures, 9 tables; published in Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016).

    Native Speaker Perceptions of Accented Speech: The English Pronunciation of Macedonian EFL Learners

    The paper reports on the results of a study that aimed to describe the vocalic and consonantal features of the English pronunciation of Macedonian EFL learners as perceived by native speakers of English, and to find out whether native speakers who speak different standard variants of English perceive the same segments as non-native. A specially designed web application was employed to gather two types of data: a) quantitative (frequency of segment variables and global foreign accent ratings on a 5-point scale), and b) qualitative (open-ended questions). The analysis of the results points to the three most frequent markers of foreign accent in the English speech of Macedonian EFL learners: final obstruent devoicing, vowel shortening, and substitution of English dental fricatives with Macedonian dental plosives. It also reflects additional phonetic aspects poorly explained in the available reference literature, such as allophonic distributional differences between the two languages and intonational mismatch.

    Convergence towards a European strategic culture? A constructivist framework for explaining changing norms.

    The article contributes to the debate about the emergence of a European strategic culture to underpin a European Security and Defence Policy. Noting both conceptual and empirical weaknesses in the literature, the article disaggregates the concept of strategic culture and focuses on four types of norms concerning the means and ends for the use of force. The study argues that national strategic cultures are less resistant to change than commonly thought and that they have been subject to three types of learning pressures since 1989: changing threat perceptions, institutional socialization, and mediatized crisis learning. The combined effect of these mechanisms would be a process of convergence with regard to strategic norms prevalent in current EU countries. If the outlined hypotheses can be substantiated by further research the implications for ESDP are positive, especially if the EU acts cautiously in those cases which involve norms that are not yet sufficiently shared across countries

    Fast index based algorithms and software for matching position specific scoring matrices

    BACKGROUND: In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. RESULTS: We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM-scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to factor 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330. 
CONCLUSION: Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than |A|^m + m - 1, where m is the length of the PSSM and A a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript.
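
For orientation, the matching problem that the abstract describes can be sketched with a naive linear scan. This is only a baseline illustration of what it means for a PSSM to match a sequence position at a given score threshold; it is not ESAsearch, it uses no enhanced suffix array, and the names `pssm_score`, `naive_pssm_matches`, and the toy matrix are hypothetical.

```python
from typing import Dict, List, Tuple

# A PSSM is modelled here as a list of columns; each column maps a
# symbol to its score at that position (an assumed toy representation).
PSSM = List[Dict[str, float]]


def pssm_score(pssm: PSSM, window: str) -> float:
    """Sum the per-position scores of `window` under `pssm`."""
    return sum(column[symbol] for column, symbol in zip(pssm, window))


def naive_pssm_matches(pssm: PSSM, seq: str, threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window of seq scoring >= threshold.

    Runs in O(len(seq) * len(pssm)) time: the brute-force baseline that
    index-based methods aim to beat.
    """
    m = len(pssm)
    hits = []
    for start in range(len(seq) - m + 1):
        score = pssm_score(pssm, seq[start:start + m])
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy nucleotide PSSM of length 3 that favours the motif "ACG".
toy_pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
hits = naive_pssm_matches(toy_pssm, "TTACGACGA", threshold=6)
print(hits)
```

The threshold here is chosen by hand; the abstract's contribution includes deriving such a threshold from an E-value or p-value, which this sketch does not attempt.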

    Faster Algorithms for Algebraic Path Properties in Recursive State Machines with Constant Treewidth

    Interprocedural analysis is at the heart of numerous applications in programming languages, such as alias analysis, constant propagation, etc. Recursive state machines (RSMs) are standard models for interprocedural analysis. We consider a general framework with RSMs where the transitions are labeled from a semiring, and path properties are algebraic with semiring operations. RSMs with algebraic path properties can model interprocedural dataflow analysis problems, the shortest path problem, the most probable path problem, etc. The traditional algorithms for interprocedural analysis focus on path properties where the starting point is fixed as the entry point of a specific method. In this work, we consider possible multiple queries, as required in many applications such as alias analysis. The study of multiple queries allows us to bring in a very important algorithmic distinction between the resource usage of the one-time preprocessing and that of each individual query. The second aspect that we consider is that the control flow graphs for most programs have constant treewidth. Our main contributions are simple and implementable algorithms that support multiple queries for algebraic path properties for RSMs that have constant treewidth. Our theoretical results show that our algorithms have small additional one-time preprocessing, but can answer subsequent queries significantly faster as compared to the current best-known solutions for several important problems, such as interprocedural reachability and shortest path. We provide a prototype implementation for interprocedural reachability and intraprocedural shortest path that gives a significant speed-up on several benchmarks.
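
The idea of labeling transitions from a semiring can be illustrated on an ordinary directed graph. The sketch below is not the paper's algorithm: it ignores recursion and treewidth entirely and uses a Bellman-Ford style relaxation, which is only adequate under the stated assumptions that the semiring's `plus` is idempotent and that no cycle keeps improving a path value. The class `Semiring` and function `algebraic_path` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Semiring(Generic[T]):
    plus: Callable[[T, T], T]    # combines the values of alternative paths
    times: Callable[[T, T], T]   # extends a path value by one edge label
    zero: T                      # value meaning "no path"
    one: T                       # value of the empty path


def algebraic_path(n: int, edges: List[Tuple[int, int, T]],
                   source: int, sr: Semiring) -> List[T]:
    """Single-source algebraic path values on a graph with nodes 0..n-1.

    Assumes an idempotent `plus` and no improving cycles, so that
    n - 1 rounds of edge relaxation reach a fixed point.
    """
    value = [sr.zero] * n
    value[source] = sr.one
    for _ in range(n - 1):
        for u, v, label in edges:
            value[v] = sr.plus(value[v], sr.times(value[u], label))
    return value


# Shortest path: the (min, +) semiring. Node 4 is unreachable.
tropical = Semiring(plus=min, times=lambda a, b: a + b, zero=float("inf"), one=0)
weighted = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
shortest = algebraic_path(5, weighted, 0, tropical)

# Reachability: the (or, and) semiring over the same graph shape.
boolean = Semiring(plus=lambda a, b: a or b, times=lambda a, b: a and b,
                   zero=False, one=True)
reach = algebraic_path(5, [(u, v, True) for u, v, _ in weighted], 0, boolean)
print(shortest, reach)
```

Swapping the semiring changes the analysis while the traversal code stays the same, which is the point of the algebraic framing.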

    Targeting a Versatile Actuator for EU-DEMO: Real Time Monitoring of Pellet Delivery to Facilitate Burn Control

    Core particle fueling, an essential task in the European demonstration fusion power plant EU-DEMO, relies on adequate pellet injection. However, pellets are fragile objects, and their delivery efficiency can hardly be assumed to be unity. Exploring kinetic control of the EU-DEMO1 scenario indicates that such missed-out pellets do cause a considerable problem for keeping a burning plasma. Missed-out pellets can cause a severe drop of plasma density that in turn results in a potential drastic loss of burn power. Efforts are under way at the ASDEX Upgrade (AUG) tokamak aiming to provide real-time monitoring of pellet arrival and announcement of missed-out cases to the control systems. To further optimize the controllers, system identification experiments have been performed to identify the dynamic response of the system to the actuator

    Maximum expected accuracy structural neighbors of an RNA secondary structure

    BACKGROUND: Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as paRNAss, RNAshapes and RNAbor, can be used to predict potential conformational switches; nevertheless, no existing tool can detect general (i.e., not family specific) entire riboswitches (both aptamer and expression platform) with accuracy. Thus, the development of additional algorithms to detect conformational switches seems important, especially since the difference in free energy between the two metastable secondary structures may be as large as 15-20 kcal/mol. It has recently emerged that RNA secondary structure can be more accurately predicted by computing the maximum expected accuracy (MEA) structure, rather than the minimum free energy (MFE) structure. RESULTS: Given an arbitrary RNA secondary structure S₀ for an RNA nucleotide sequence a = a₁, ..., aₙ, we say that another secondary structure S of a is a k-neighbor of S₀ if the base pair distance between S₀ and S is k. In this paper, we prove that the Boltzmann probability of all k-neighbors of the minimum free energy structure S₀ can be approximated with accuracy ε and confidence 1 - p, simultaneously for all 0 ≤ k < K, using a sample size governed by N(ε,p,K) = Φ⁻¹(p/2K)²/4ε², where Φ(z) is the cumulative distribution function (CDF) for the standard normal distribution. We go on to describe the algorithm RNAborMEA, which for an arbitrary initial structure S₀ and for all values 0 ≤ k < K, computes the secondary structure MEA(k), having maximum expected accuracy over all k-neighbors of S₀. Computation time is O(n³ * K²), and memory requirements are O(n² * K). We analyze a sample TPP riboswitch, and apply our algorithm to the class of purine riboswitches.
CONCLUSIONS: The approximation of RNAbor by sampling, with rigorous bound on accuracy, together with the computation of maximum expected accuracy k-neighbors by RNAborMEA, provide additional tools toward conformational switch detection. Results from RNAborMEA are quite distinct from other tools, such as RNAbor, RNAshapes and paRNAss, hence may provide orthogonal information when looking for suboptimal structures or conformational switches. Source code for RNAborMEA can be downloaded from http://sourceforge.net/projects/rnabormea/ or http://bioinformatics.bc.edu/clotelab/RNAborMEA/
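
Two quantities from this abstract are easy to make concrete: the base pair distance that defines k-neighbors, and the expression N(ε,p,K). The sketch below assumes structures are given in dot-bracket notation without pseudoknots, and it reads the expression as (Φ⁻¹(p/(2K)))² / (4ε²); that parsing is an assumption, since the abstract's notation is ambiguous. The function names are hypothetical and this is not code from RNAborMEA.

```python
from statistics import NormalDist
from typing import Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs in a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def base_pair_distance(s0: str, s: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(s0) ^ base_pairs(s))


def sample_size_expression(eps: float, p: float, K: int) -> float:
    """Evaluate (Phi^-1(p / (2K)))^2 / (4 eps^2), under the assumed parsing."""
    z = NormalDist().inv_cdf(p / (2 * K))
    return z * z / (4 * eps * eps)


s0 = "((..))"
print(base_pair_distance(s0, "(....)"))   # a 1-neighbor of s0
print(base_pair_distance(s0, "......"))   # a 2-neighbor of s0
print(sample_size_expression(eps=0.05, p=0.05, K=10))
```

Because the expression scales with 1/ε², halving the target error ε quadruples its value, while increasing K only affects it through the normal quantile.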

    Splitting ‘intervocalic’: Expanding the typology of lenition environments

    The basic types of lenition environments (‘initial’, ‘intervocalic’, ‘final’) need to be separately evaluated as they differ along parameters like word position (e.g., pre-consonantal vs. final codas) or stress relations. This paper argues that we need to recognise an additional such parameter: the length of the vowel preceding an intervocalic consonant. We show that a number of phenomena from varieties of English and German show lenition patterns which draw a distinction between reflexes found in post-short (vc) and post-long (vvc) environments. The theoretical consequence of our observations is that phonological theory needs to be able to account for the post-short vs. post-long distinction in the form of a parametrically-determined representational difference