274 research outputs found

    Towards shared datasets for normalization research

    Get PDF
    In this paper we present a Dutch and English dataset that can serve as a gold standard for evaluating text normalization approaches. With the combination of text messages, message board posts and tweets, these datasets represent a variety of user generated content. All data was manually normalized to their standard form using newly-developed guidelines. We perform automatic lexical normalization experiments on these datasets using statistical machine translation techniques. We focus on both the word and character level and find that we can improve the BLEU score with ca. 20% for both languages. In order for this user generated content data to be released publicly to the research community some issues first need to be resolved. These are discussed in closer detail by focussing on the current legislation and by investigating previous similar data collection projects. With this discussion we hope to shed some light on various difficulties researchers are facing when trying to share social media data

    Stochastic decomposition in discrete-time queues with generalized vacations and applications

    Get PDF
    For several specific queueing models with a vacation policy, the stationary system occupancy at the beginning of a rantdom slot is distributed as the sum of two independent random variables. One of these variables is the stationary number of customers in an equivalent queueing system with no vacations. For models in continuous time with Poissonian arrivals, this result is well-known, and referred to as stochastic decomposition, with proof provided by Fuhrmann and Cooper. For models in discrete time, this result received less attention, with no proof available to date. In this paper, we first establish a proof of the decomposition result in discrete time. When compared to the proof in continuous time, conditions for the proof in discrete time are somewhat more general. Second, we explore four different examples: non-preemptive proirity systems, slot-bound priority systems, polling systems, and fiber delay line (FDL) buffer systems. The first two examples are known results from literature that are given here as an illustration. The third is a new example, and the last one (FDL buffer systems) shows new results. It is shown that in some cases the queueing analysis can be considerably simplified using this property

    Normalization of Dutch user-generated content

    Get PDF
    Abstract This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system where a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work

    SOFIA : software and control flow integrity architecture

    Get PDF
    Microprocessors used in safety-critical systems are extremely sensitive to software vulnerabilities, as their failure can lead to injury, damage to equipment, or environmental catastrophe. This paper proposes a hardware-based security architecture for microprocessors used in safety-critical systems. The proposed architecture provides protection against code injection and code reuse attacks. It has mechanisms to protect software integrity, perform control flow integrity, prevent execution of tampered code, and enforce copyright protection. We are the first to propose a mechanism to enforce control flow integrity at the finest possible granularity. The proposed architectural features were added to the LEON3 open source soft microprocessor, and were evaluated on an FPGA running a software benchmark. The results show that the hardware area is 28.2% larger and the clock is 84.6% slower, while the software benchmark has a cycle overhead of 13.7% and a total execution time overhead of 110% when compared to an unmodified processor

    SCM : Secure Code Memory Architecture

    Get PDF
    An increasing number of applications implemented on a SoC (System-on-chip) require security features. This work addresses the issue of protecting the integrity of code and read-only data that is stored in memory. To this end, we propose a new architecture called SCM, which works as a standalone IP core in a SoC. To the best of our knowledge, there exist no architectural elements similar to SCM that offer the same strict security guarantees while, at the same time, not requiring any modifications to other IP cores in its SoC design. In addition, SCM has the flexibility to select the parts of the software to be protected, which eases the integration of our solution with existing software. The evaluation of SCM was done on the Zynq platform which features an ARM processor and an FPGA. The design was evaluated by executing a number of different benchmarks from memory protected by SCM, and we found that it introduces minimal overhead to the system

    Delay analysis of a discrete-time multiclass slot-bound priority system

    Get PDF
    This paper introduces a new priority mechanism in discrete-time queueing systems that compromises between rst-come-rst-served (FCFS) and head-of-line (HoL) priority. In this scheduling discipline - which we dubbed slot-bound priority - customers of dierent priority classes entering the system during the same time-slot are served in order of their respective priority class. Customers entering during dierent slots are served on a FCFS basis. In this paper we study the delay in anN-class discrete-time queueing system under slot-bound priority. General independent arrivals and class-specic general service time distributions are assumed. Expressions for the probability generating function of the delay of a random type-jcustomer are derived, from which the respective moments are easily obtained. The tail behaviour of these distributions is analyzed as well, and some numerical examples show the eect slot-bound priority can have on the performance measures

    Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

    Get PDF
    In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocess- ing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres

    Frame-bound priority scheduling in discrete-time queueing systems

    Get PDF
    A well-known problem with priority policies is starvation of delay-tolerant traffic. Additionally, insufficient control over delay differentiation (which is needed for modern network applications) has incited the development of sophisticated scheduling disciplines. The priority policy we present here has the benefit of being open to rigorous analysis. We study a discrete time queueing system with a single server and single queue, in which N types of customers enter pertaining to different priorities. A general i.i.d. arrival process is assumed and service times are generally distributed. We divide the time axis into 'frames' of fixed size (counted as a number of time-slots), and re-order the customers that enter the system during the same frame such that the high-priority customers are served first. This paper gives an analytic approach to studying such a system, and in particular focuses on the system content (meaning the customers of each type in the system at random slot marks)in stationary regime, and the delay distribution of a random customer. Clearly, in such a system the frame's size is the key factor in the delay differentiation between the N priority classes. The numerical results at the end of this paper illustrate this observation
    • …
    corecore