180 research outputs found

    MultiLexNorm: A Shared Task on Multilingual Lexical Normalization

    Lexical normalization is the task of transforming an utterance into its standardized form. This task is beneficial for downstream analysis, as it provides a way to harmonize (often spontaneous) linguistic variation. Such variation is typical for social media, on which information is shared in a multitude of ways, including diverse languages and code-switching. Since the seminal work of Han and Baldwin (2011) a decade ago, lexical normalization has attracted attention in English and multiple other languages. However, there is no common benchmark for comparing systems across languages with a homogeneous data and evaluation setup. The MULTILEXNORM shared task sets out to fill this gap. We provide the largest publicly available multilingual lexical normalization benchmark, including 12 language variants. We propose a homogenized evaluation setup with both intrinsic and extrinsic evaluation. As extrinsic evaluation, we use dependency parsing and part-of-speech tagging with adapted evaluation metrics (a-LAS, a-UAS, and a-POS) to account for alignment discrepancies. The shared task hosted at W-NUT 2021 attracted 9 participants and 18 submissions. The results show that neural normalization systems outperform the previous state-of-the-art system by a large margin. Downstream parsing and part-of-speech tagging performance is positively affected but to varying degrees, with improvements of up to 1.72 a-LAS, 0.85 a-UAS, and 1.54 a-POS for the winning system.

    Authenticating the Query Results of Text Search Engines

    The number of successful attacks on the Internet shows that it is very difficult to guarantee the security of online search engines. A breached server that is not detected in time may return incorrect results to the users. To prevent that, we introduce a methodology for generating an integrity proof for each search result. Our solution is targeted at search engines that perform similarity-based document retrieval, and utilize an inverted list implementation (as most search engines do). We formulate the properties that define a correct result, map the task of processing a text search query to adaptations of existing threshold-based algorithms, and devise an authentication scheme for checking the validity of a result. Finally, we confirm the efficiency and practicality of our solution through an empirical evaluation with real documents and benchmark queries.

    Direct, Indirect and Collider Detection of Neutralino Dark Matter In SUSY Models with Non-universal Higgs Masses

    In supersymmetric models with gravity-mediated SUSY breaking, universality of soft SUSY breaking sfermion masses m_0 is motivated by the need to suppress unwanted flavor changing processes. The same motivation, however, does not apply to soft breaking Higgs masses, which may in general have independent masses from matter scalars at the GUT scale. We explore phenomenological implications of both the one-parameter and two-parameter non-universal Higgs mass models (NUHM1 and NUHM2), and examine the parameter ranges compatible with Omega_CDM h^2, BF(b --> s,gamma) and (g-2)_mu constraints. In contrast to the mSUGRA model, in both NUHM1 and NUHM2 models, the dark matter A-annihilation funnel can be reached at low values of tan(beta), while the higgsino dark matter annihilation regions can be reached for low values of m_0. We show that there may be observable rates for indirect and direct detection of neutralino cold dark matter in phenomenologically acceptable ranges of parameter space. We also examine implications of the NUHM models for the Fermilab Tevatron, the CERN LHC and a Sqrt(s)=0.5-1 TeV e+e- linear collider. Novel possibilities include: very light s-top_R, s-charm_R squark and slepton_L masses as well as light charginos and neutralinos and H, A and H^+/- Higgs bosons. Comment: LaTeX, 48 pages, 26 figures. The version with high-resolution figures is available at http://hep.pa.msu.edu/belyaev/public/projects/nuhm/nuhm.p

    Embellishing Text Search Queries to Protect User Privacy

    Users of text search engines are increasingly wary that their activities may disclose confidential information about their business or personal profiles. It would be desirable for a search engine to perform document retrieval for users while protecting their intent. In this paper, we identify the privacy risks arising from semantically related search terms within a query, and from recurring high-specificity query terms in a search session. To counter the risks, we propose a solution for a similarity text retrieval system to offer anonymity and plausible deniability for the query terms, and hence the user intent, without degrading the system’s precision-recall performance. The solution comprises a mechanism that embellishes each user query with decoy terms that exhibit a specificity spread similar to that of the genuine terms, but point to plausible alternative topics. We also provide an accompanying retrieval scheme that enables the search engine to compute the encrypted document relevance scores from only the genuine search terms, yet remain oblivious to their distinction from the decoys. Empirical evaluation results are presented to substantiate the effectiveness of our solution.

    Managing Spoilers in a Hybrid War: The Democratic Republic of Congo (1996-2010)

    Scholarship on the management of spoilers in a hybrid type of conflict is almost non-existent. Through an examination of the recent Congolese wars and peace efforts (1996–2010), we develop an understanding of how spoilers are managed in a conflict characterised by both interstate and intrastate dynamics. Certainly, more strategies for dealing with spoiler behaviours in this type of conflict are likely to emerge as similar cases are investigated, but our discussion recommends these unrelated but strongly interacting principles: the practice of inclusivity, usually preferred in the management of spoilers, is more complex, and in fact ineffective, particularly when concerned groups’ internal politics and supportive alliances are unconventional. Because holding elections is often deemed indispensable in peacemaking efforts, it is vital that total spoilers be prevented from winning or disrupting them. The toughest challenge is the protection of civilians, especially when the state lacks a monopoly on the use of violence and governance remains partitioned across the country.