74 research outputs found

    Sentiment Analysis for micro-blogging platforms in Arabic

    Sentiment Analysis (SA) concerns the automatic extraction and classification of sentiments conveyed in a given text, i.e. labelling a text instance as positive, negative or neutral. SA research has attracted increasing interest in the past few years due to its numerous real-world applications. The recent interest in SA is also fuelled by the growing popularity of social media platforms (e.g. Twitter), as they provide large amounts of freely available and highly subjective content that can be readily crawled. Most previous SA work has focused on English with considerable success. In this work, we focus on studying SA in Arabic, as a less-resourced language. This work reports on a wide set of investigations for SA in Arabic tweets, systematically comparing three existing approaches that have been shown to be successful in English. Specifically, we report experiments evaluating fully-supervised (SL), distant-supervision-based (DS), and machine-translation-based (MT) approaches for SA. The investigations cover training SA models on manually-labelled (i.e. in SL methods) and automatically-labelled (i.e. in DS methods) data-sets. In addition, we explored an MT-based approach that utilises existing off-the-shelf SA systems for English with no need for training data, assessing the impact of translation errors on the performance of SA models, which has not been previously addressed for Arabic tweets. Unlike previous work, we benchmark the trained models against an independent test-set of >3.5k instances collected at different points in time to account for topic-shift issues in the Twitter stream. Despite the challenging noisy medium of Twitter and the mixed use of Dialectal and Standard forms of Arabic, we show that our SA systems are able to attain performance scores on Arabic tweets that are comparable to the state-of-the-art SA systems for English tweets. 
The thesis also investigates the role of a wide set of features, including syntactic, semantic, morphological, language-style and Twitter-specific features. We introduce a set of affective-cues/social-signals features that capture information about the presence of contextual cues (e.g. prayers, laughter, etc.) to correlate them with the sentiment conveyed in an instance. Our investigations reveal a generally positive impact for utilising these features for SA in Arabic. Specifically, we show that a rich set of morphological features, which has not been previously used, extracted using a publicly-available morphological analyser for Arabic can significantly improve the performance of SA classifiers. We also demonstrate the usefulness of language-independent features (e.g. Twitter-specific) for SA. Our feature-sets outperform results reported in previous work on a previously built data-set
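
To make the idea of language-independent, Twitter-specific features and affective cues more concrete, the following is a minimal sketch and not the thesis's actual feature extractor. The function name `twitter_features`, the feature names, and all regular expressions (including the laughter patterns) are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; the thesis does not specify how its features are computed.
HASHTAG = re.compile(r"#\w+")        # \w is Unicode-aware, so Arabic hashtags match too
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
# Assumed laughter cues: Latin "haha..." or a run of three or more Arabic "ه".
LAUGHTER = re.compile(r"\b(?:ha){2,}h?\b|ه{3,}", re.IGNORECASE)


def twitter_features(text):
    """Return a small dictionary of surface features for one tweet."""
    return {
        "n_hashtags": len(HASHTAG.findall(text)),
        "n_mentions": len(MENTION.findall(text)),
        "n_urls": len(URL.findall(text)),
        "has_laughter": bool(LAUGHTER.search(text)),
    }
```

Features like these could be appended to a classifier's input vector alongside lexical or morphological features; which cues actually help is an empirical question that the abstract reports only in general terms.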

    Evaluating the Impact of Java Virtual Machines on Energy Consumption

    Background. Java Virtual Machine (JVM) platforms have evolved considerably over the last decades to enhance both the performance they exhibit and the features they offer. With regard to energy consumption, a few studies have investigated the energy consumption of code and data structures. Yet, we still lack an evaluation of the energy efficiency of existing JVM platforms and an identification of the configurations that minimize the energy consumption of software hosted on the JVM. Aims. The purpose of this paper is to investigate the variations in energy consumption between different JVM distributions and parameters to help developers configure the least energy-consuming environment for their Java application. Method. We thus assess the energy consumption of some of the most popular and supported JVM platforms using 12 Java benchmarks that explore different performance objectives. Moreover, we investigate the impact of the different JVM parameters and configurations on the energy consumption of software. Results. Our results show that some JVM platforms can exhibit up to 100% more energy consumption. JVM configurations can also play a substantial role in reducing energy consumption during software execution. Interestingly, the default configuration of the garbage collector was energy efficient in only 50% of our experiments. Conclusion. Finally, we provide an OSS tool, named J-Referral, that recommends an energy-efficient JVM distribution and configuration for any Java application

    Montana Journalism Review, 1968

    A Study of the 'Orthodox' Press: The Reporting of Dissent -- Dean A. L. Stone Address: Toward a Two-Newspaper Town -- A Publisher's Statement: Anatomy of a Failing Newspaper -- A Lee Executive's Response: The Economics of Success -- Let's Hear It for Shigella: The Science Story Shuffle -- No Fudging in Missoula: A Newspaper Laid Out -- In Memoriam-W.J.B.: Reflections on Mencken's Style -- Montana's 'Vile Scribbler': The Post's Mysterious Franklin -- A Professor Looks Back: Education for Journalism -- More Than 'Fruit Arranging': The Case for Public Relations -- Craighead's New Northwest: The Defense of Louis Levine -- From Bog to Gridiron: Happy Years on a Weekly -- The Poetic Image: Mansfield of Montana -- Strident Critic of the U.S.: The Vietnam Courier in 196

    Building cliques and alliances as practices to 'make things happen' in complex networks

    The Roma population has become a highly debated policy issue in the European Union (EU). The EU acknowledges that this ethnic minority faces extreme poverty and complex social and economic problems. 52% of the Roma population live in extreme poverty, 75% in poverty (Soros Foundation, 2007, p. 8), with a life expectancy at birth of about ten years less than the majority population. As a result, Romania has received a great deal of policy attention and EU funding, being eligible for 19.7 billion Euros from the EU for 2007-2013. Yet progress is slow; it is debated whether Romania's government and companies were capable of using these funds (EurActiv.ro, 2012). Analysing three case studies, this research looks at policy implementation in relation to the role of Roma networks in different geographical regions of Romania. It gives insights about how to get things done in complex settings and it explains responses to the Roma problem as a 'wicked' policy issue. This longitudinal research was conducted between 2008 and 2011, comprising 86 semi-structured interviews, 15 observations, and documentary sources and using a purposive sample focused on institutions responsible for implementing social policies for Roma: Public Health Departments, School Inspectorates, City Halls, Prefectures, and NGOs. Respondents included: governmental workers, academics, Roma school mediators, Roma health mediators, Roma experts, Roma Councillors, NGO workers, and Roma service users. By triangulating the data collected with various methods and applied to various categories of respondents, a comprehensive and precise representation of Roma network practices was created. The provisions of the 2001 'Governmental Strategy to Improve the Situation of the Roma Population' facilitated forming a Roma network by introducing special jobs in local and central administration. In different counties, resources, people, their skills, and practices varied. 
As opposed to the communist period, a new Roma elite emerged: social entrepreneurs set the pace of change by creating either closed cliques or open alliances and by using more or less transparent practices. This research deploys the concept of social/institutional entrepreneurs to analyse how key actors influence clique and alliance formation and functioning. Significantly, by contrasting three case studies, it shows that both closed cliques and open alliances help to achieve public policy network objectives, but that closed cliques can also lead to failure to improve the health and education of Roma people in a certain region

    The Use of Firewalls in an Academic Environment

    No full text

    Efficient and private distance approximation in the communication and streaming models

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 109-114). This thesis studies distance approximation in two closely related models - the streaming model and the two-party communication model. In the streaming model, a massive data stream is presented in an arbitrary order to a randomized algorithm that tries to approximate certain statistics of the data with only a few (usually one) passes over the data. For instance, the data may be a flow of packets on the internet or a set of records in a large database. The size of the data necessitates the use of extremely efficient randomized approximation algorithms. Problems of interest include approximating the number of distinct elements, approximating the surprise index of a stream, or more generally, approximating the norm of a dynamically-changing vector in which coordinates are updated multiple times in an arbitrary order. In the two-party communication model, there are two parties who wish to efficiently compute a relation of their inputs. We consider the problem of approximating $L_p$ distances for any p > 0. It turns out that lower bounds on the communication complexity of these relations yield lower bounds on the memory required of streaming algorithms for the problems listed above. Moreover, upper bounds in the streaming model translate to constant-round protocols in the communication model with communication proportional to the memory required of the streaming algorithm. The communication model also has its own applications, such as secure datamining, where in addition to low communication, the goal is to prevent either party from learning more about the other's input than what follows from the output and his/her private input. We develop new algorithms and lower bounds that resolve key open questions in both of these models. The highlights of the results are as follows. 1. 
We give an $\Omega(1/\epsilon^2)$ lower bound for approximating the number of distinct elements of a data stream in one pass to within a $(1 \pm \epsilon)$ factor with constant probability, as well as the p-th frequency moment $F_p$ for any $p \ge 0$. This is tight up to very small factors, and greatly improves upon the earlier $\Omega(1/\epsilon)$ lower bound for these problems. It also gives the same quadratic improvement for the communication complexity of 1-round protocols for approximating the $L_p$ distance for any $p \ge 0$. 2. We give a 1-pass $O(m^{1-2/p})$-space streaming algorithm for $(1 \pm \epsilon)$-approximating the $L_p$ norm of an m-dimensional vector presented as a data stream for any $p \ge 2$. This algorithm improves the previous $O(m^{1-1/(p-1)})$ bound, and is optimal up to polylogarithmic factors. As a special case our algorithm can be used to approximate the frequency moments $F_p$ of a data stream with the same optimal amount of space. This resolves the main open question of the 1996 paper by Alon, Matias, and Szegedy. 3. In the two-party communication model, we give a protocol for privately approximating the Euclidean distance ($L_2$) between two m-dimensional vectors, held by different parties, with only polylog m communication and $O(1)$ rounds. This tremendously improves upon the earlier protocol of Feigenbaum, Ishai, Malkin, Nissim, Strauss, and Wright, which achieved $O(\sqrt{m})$ communication for privately approximating the Hamming distance only. This thesis also contains several previously unpublished results concerning the first item above, including new lower bounds for the communication complexity of approximating the $L_p$ distances when the vectors are uniformly distributed and the protocol is only correct for most inputs, as well as tight lower bounds for the multiround complexity for a restricted class of protocols that we call linear. by David P. Woodruff. Ph.D
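
For readers unfamiliar with the streaming setting, the sketch below illustrates the general idea of estimating the second frequency moment $F_2$ (the squared $L_2$ norm) of a vector whose coordinates are updated in arbitrary order, in the spirit of the "tug-of-war" technique from the Alon, Matias, and Szegedy paper mentioned above. It is not the algorithm developed in the thesis. The class name `F2Sketch`, its parameters, and the use of a BLAKE2 hash as a stand-in for a 4-wise independent sign function are all assumptions made for illustration.

```python
import hashlib
import statistics


class F2Sketch:
    """Illustrative sketch of F2 = sum_i x_i^2 for a vector x updated in a stream."""

    def __init__(self, groups=5, per_group=16, seed=0):
        self.groups = groups          # number of group means fed to the median
        self.per_group = per_group    # counters averaged inside each group
        self.seed = seed
        self.counters = [0] * (groups * per_group)

    def _sign(self, item, j):
        # Assumption: a cryptographic hash stands in for a 4-wise independent
        # hash family mapping (counter index, item) to +1 or -1.
        data = f"{self.seed}:{j}:{item!r}".encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=1).digest()
        return 1 if digest[0] & 1 else -1

    def update(self, item, delta=1):
        # Apply the update x[item] += delta to every counter.
        for j in range(len(self.counters)):
            self.counters[j] += self._sign(item, j) * delta

    def estimate(self):
        # Median of group means of the squared counters.
        squares = [c * c for c in self.counters]
        means = [
            sum(squares[g * self.per_group:(g + 1) * self.per_group]) / self.per_group
            for g in range(self.groups)
        ]
        return statistics.median(means)
```

The memory used depends only on the number of counters, not on the number of distinct items, which is the property that makes sketches of this kind attractive for massive streams. Negative `delta` values model coordinates that are decreased later in the stream.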

    The Development of a bi-level geographic information systems (GIS) database model for informal settlement upgrading

    Bibliography: leaves 348-369. Existing Urban GIS models are faced with several limitations. Firstly, these models tend to be single-scale in nature. They are usually designed to operate at either the metropolitan level or the local level. Secondly, they are generally designed to cater only for the needs of the formal and environmental sectors of the city system. These models do not cater for the "gaps" of data that exist in digital cadastres throughout the world. In the developed countries, these gaps correspond to areas of physical decay or economic decline. In the developing countries, they correspond to informal settlement areas. In this thesis, a new two-scale urban GIS database model, termed the "Bi-level model", is proposed. This model has been specifically designed to address these gaps in the digital cadastre. Furthermore, the model addresses the shortcomings facing current informal settlement upgrading models by providing mechanisms for community participation, project management, creating linkages to formal and environmental sectoral models, and for co-ordinating initiatives at a global level. The Bi-level model is comprised of a metropolitan-level and a series of local-level database components. These components are inter-linked through bi-directional database warehouse connections. While the model requires Internet connectivity to achieve its full potential across a metropolitan region, it recognises the need for community participation-based methods at a local level. Members of the community are actually involved in capturing and entering informal settlement data into the local-level database