29 research outputs found

    Randomized Maximum Entropy Language Models

    We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the dictionary structure that maps each feature to its corresponding weight, the feature hashing trick [1], [2] can be used. We also replace the explicit storage of features with a Bloom filter. We show with extensive experiments that false positive errors of Bloom filters and random hash collisions do not degrade model performance. Both perplexity and WER improvements are demonstrated by building MELM that would otherwise be prohibitively large to estimate or store.
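The two randomized structures the abstract describes can be sketched together: feature hashing maps each feature string to a slot in a fixed-size weight array (colliding features share a slot), and a Bloom filter replaces the explicit feature set, so a lookup may report a false positive but never a false negative. This is a minimal illustration of the general technique, not the paper's implementation; all names and sizes are assumptions.

```python
import hashlib

class BloomFilter:
    """Compact set membership with a small false-positive rate."""
    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _indices(self, item):
        # Derive k bit positions from salted hashes of the item.
        for i in range(self.k):
            h = int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16)
            yield h % self.size

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(item))

# Feature hashing: no feature->index dictionary, just a fixed weight array.
NUM_WEIGHTS = 2 ** 20
weights = [0.0] * NUM_WEIGHTS  # colliding features share a weight slot

def weight_index(feature):
    h = int(hashlib.md5(feature.encode()).hexdigest(), 16)
    return h % NUM_WEIGHTS

# Bloom filter stands in for the exact set of observed features.
seen = BloomFilter(size_bits=2 ** 22, num_hashes=4)
for feat in ["w_prev=the w=cat", "w_prev=cat w=sat"]:
    seen.add(feat)

def score(feature):
    # A false positive merely returns a hashed (possibly zero) weight,
    # which is why rare errors need not degrade model performance.
    return weights[weight_index(feature)] if feature in seen else 0.0
```

Neither structure can report the original feature strings, but the model only ever needs membership tests and weight lookups, so both exact structures can be discarded.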

    NETWORK SANDBOX FOR CLOSED-SOURCE COMPONENTS WITH ACCESS TO SENSITIVE DATA

    A computing device (e.g., a smartphone, a laptop computer, a tablet computer, a smartwatch, etc.) may include a system application for managing both the ability of software (e.g., an application, a program, a widget, etc.) to access a network and the type of information that can be transmitted to a computing system via the network. Rather than use a permission-based model (e.g., a model in which a user manually permits an application to access the network), which may grant the application unconstrained network access, the system application may use a dataflow model (e.g., a model in which a framework defines a policy for how an application may access the network) that results in more granular network access. In some examples, the system application may comprise a first component (e.g., an application package (APK)) that delegates all requests for network access to a second component (e.g., an application programming interface (API)) to ensure policy enforcement (e.g., limiting data exfiltration from the first component). Source code for the second component may be made available for inspection or review by anyone (e.g., open sourced) to provide a means for auditing the operation of the system application. In addition, the system application may provide a ledger to enable a user of the computing device to monitor dataflows and network usage. In this way, the system application may increase trust in applications executing at the computing device (e.g., by enabling researchers to ensure that no party is receiving preferential treatment with regard to data retention policies) and may increase transparency in how applications are using and sharing data (e.g., by allowing interested parties to verify network usage).
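The delegation pattern described above can be sketched as follows: the closed-source first component never opens a socket itself; it hands every request to an auditable gateway (the second component) that enforces a dataflow policy and records each flow in a ledger. This is a hypothetical illustration of the architecture; all class and field names are invented for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Dataflow policy defined by the framework, not by the sandboxed app."""
    allowed_hosts: set
    max_bytes: int  # cap on data leaving the sandboxed component per request

@dataclass
class NetworkGateway:
    """Open-source second component: the only path to the network."""
    policy: Policy
    ledger: list = field(default_factory=list)

    def send(self, host, payload: bytes) -> bool:
        allowed = (host in self.policy.allowed_hosts
                   and len(payload) <= self.policy.max_bytes)
        # Every request, allowed or denied, is recorded for user review.
        self.ledger.append({"host": host, "bytes": len(payload),
                            "allowed": allowed})
        return allowed  # a real gateway would forward the payload here

gateway = NetworkGateway(Policy(allowed_hosts={"telemetry.example.com"},
                                max_bytes=1024))
gateway.send("telemetry.example.com", b"ok")    # permitted by policy
gateway.send("tracker.example.net", b"secret")  # denied, but still logged
```

Because the gateway is the sole network path and its source is open, auditors can verify the policy enforcement, while the ledger gives the device's user a complete record of attempted dataflows.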

    Efficiency-Effectiveness Trade-offs in Recommendation Systems

    Throughout the years, numerous recommendation algorithms have been developed to address the information filtering problem by leveraging users' tastes through implicit or explicit feedback. In this paper, we present the work undertaken as part of a PhD thesis focused on exploring new evaluation dimensions centred around the efficiency-effectiveness trade-offs present in state-of-the-art recommendation systems. Firstly, we highlight the lack of efficiency-oriented studies and formulate the research problem. Then, we propose a mapping of the design space and a classification of the recommendation algorithms/models with respect to salient attributes and characteristics. At the same time, we explain why and how assessing the recommendations on an accuracy versus training cost curve would advance the current knowledge in the area of evaluation, as well as open new research avenues for exploring parameter configurations within well-known algorithms. Finally, we make the case for a comprehensive methodology that incorporates predictive efficiency-effectiveness models, which illustrate the performance and behaviour of the recommendation systems under different recommendation tasks, while satisfying user-defined quality-of-service constraints and goals.
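The accuracy-versus-training-cost evaluation the abstract argues for can be sketched as placing each model on a two-dimensional plane and keeping only the Pareto-efficient ones. The models and numbers below are invented for illustration, not results from the thesis.

```python
def pareto_front(models):
    """models: list of (name, accuracy, training_cost_hours) tuples.
    Keep models not strictly dominated on the accuracy/cost plane."""
    front = []
    for name, acc, cost in models:
        dominated = any(a >= acc and c <= cost and (a > acc or c < cost)
                        for _, a, c in models)
        if not dominated:
            front.append((name, acc, cost))
    return front

# Hypothetical measurements: (name, accuracy metric, training cost).
models = [
    ("Popularity", 0.45, 0.1),
    ("ItemKNN",    0.62, 1.0),
    ("SVD",        0.60, 2.0),   # dominated: ItemKNN is better and cheaper
    ("BPR-MF",     0.68, 3.5),
    ("NeuralCF",   0.69, 40.0),  # +0.01 accuracy for >10x the cost
]
front = pareto_front(models)
```

A point such as the hypothetical NeuralCF entry, buying a marginal accuracy gain for an order of magnitude more training cost, is exactly the kind of trade-off that an accuracy-only evaluation hides and an efficiency-aware one surfaces.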

    Discounted likelihood linear regression for rapid speaker adaptation

    Rapid adaptation schemes that employ the EM algorithm may suffer from overtraining problems when used with small amounts of adaptation data. An algorithm to alleviate this problem is derived within the information geometric framework of Csiszár and Tusnády, and is used to improve MLLR adaptation on NAB and Switchboard adaptation tasks. It is shown how this algorithm approximately optimizes a discounted likelihood criterion.
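For context, MLLR ties all speaker-adapted Gaussian means to one shared affine transform, mu_hat = A mu + b, so only the transform is estimated from the small amount of adaptation data. The sketch below applies such a transform and then illustrates the discounting idea as simple shrinkage toward the identity; this shrinkage is an illustrative stand-in, not the paper's information-geometric derivation, and all names and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3
means = rng.normal(size=(5, dim))   # speaker-independent Gaussian means

# Transform estimated (in practice, by EM) from the adaptation data.
A = np.eye(dim) + 0.05 * rng.normal(size=(dim, dim))
b = 0.1 * rng.normal(size=dim)

# One shared transform adapts every tied mean: mu_hat = A mu + b.
adapted = means @ A.T + b

# Discounting idea (illustrative shrinkage): with little adaptation data,
# trust the EM estimate only partially, pulling A toward the identity and
# b toward zero so the adapted model stays near the prior model.
alpha = 0.3                                   # weight on the adaptation estimate
A_disc = alpha * A + (1 - alpha) * np.eye(dim)
b_disc = alpha * b
adapted_disc = means @ A_disc.T + b_disc
```

With this linear shrinkage, the discounted means are exactly the interpolation alpha * adapted + (1 - alpha) * means, which captures the intended behaviour: as adaptation data shrinks, alpha can be lowered and the model falls back toward the speaker-independent means instead of overtraining.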