9 research outputs found

    Explicit and Implicit Semantic Ranking Framework

    Full text link
    The core challenge in numerous real-world applications is to match a query to the best document from a mutable and finite set of candidates. Existing industry solutions, especially latency-constrained services, often rely on similarity algorithms that sacrifice quality for speed. In this paper we introduce a generic semantic learning-to-rank framework, Self-training Semantic Cross-attention Ranking (sRank). This transformer-based framework uses a linear pairwise loss with mutable training batch sizes, achieves both quality gains and high efficiency, and has been applied effectively to two industry tasks at Microsoft over real-world large-scale datasets: Smart Reply (SR) and Ambient Clinical Intelligence (ACI). In Smart Reply, sRank assists live customers with technical support by selecting the best reply from predefined solutions based on consumer and support-agent messages. It achieves an 11.7% gain in offline top-one accuracy on the SR task over the previous system, and has enabled a 38.7% reduction in message-composition time in telemetry recorded since its general release in January 2021. In the ACI task, sRank selects relevant historical physician templates that serve as guidance for a text summarization model to generate higher-quality medical notes. It achieves a 35.5% top-one accuracy gain, along with a 46% relative ROUGE-L gain in generated medical notes.
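
    The abstract does not spell out the loss; as a rough, hypothetical illustration of a linear pairwise ranking objective of this kind, a minimal PyTorch sketch might look like the following (the function name, margin, and example scores are assumptions, not the exact sRank formulation):

        # Minimal sketch of a pairwise ranking loss, assuming an encoder that
        # scores (query, candidate) pairs; illustrative only, not sRank itself.
        import torch

        def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
            """Penalize cases where a negative candidate outscores a positive one."""
            # pos_scores, neg_scores: tensors of shape (batch,)
            return torch.clamp(margin - (pos_scores - neg_scores), min=0.0).mean()

        # Example: scores produced by some scoring model for correct vs. competing replies
        pos = torch.tensor([2.1, 0.4, 1.7])   # scores of correct replies
        neg = torch.tensor([1.9, 0.9, 0.2])   # scores of competing replies
        loss = pairwise_hinge_loss(pos, neg)  # backpropagate through the encoder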

    Data-driven prototyping via natural-language-based GUI retrieval

    Get PDF
    Rapid GUI prototyping has evolved into a widely applied technique in the early stages of software development to facilitate the clarification and refinement of requirements. High-fidelity GUI prototyping in particular has been shown to enable productive discussions with customers and to mitigate potential misunderstandings; however, these benefits come at the cost of development that is expensive, time-consuming, and dependent on experience. In this work, we present RaWi, a data-driven GUI prototyping approach that retrieves GUIs for reuse from a large-scale, semi-automatically created GUI repository for mobile apps on the basis of Natural Language (NL) searches, thereby facilitating GUI prototyping and improving its productivity by leveraging the vast prototyping knowledge embodied in the repository. Retrieved GUIs can be directly reused and adapted in the graphical editor of RaWi. Moreover, we present a comprehensive evaluation methodology that enables (i) the systematic evaluation of NL-based GUI ranking methods through a novel high-quality gold standard, together with an in-depth evaluation of traditional IR and state-of-the-art BERT-based models for GUI ranking, and (ii) the assessment of GUI prototyping productivity, accompanied by an extensive user study in a practical GUI prototyping environment.
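
    As an illustration of NL-based GUI retrieval in its simplest lexical form, the sketch below ranks GUI screens by TF-IDF similarity to a query; the repository contents and function names are hypothetical, and RaWi's evaluated rankers (traditional IR and BERT-based models) are more sophisticated:

        # Sketch of NL-based GUI retrieval via lexical matching; placeholders
        # throughout, not RaWi's actual implementation.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # Hypothetical GUI repository: text extracted from each screen.
        gui_texts = [
            "login email password sign in forgot password",
            "shopping cart checkout total price remove item",
            "settings notifications dark mode language",
        ]

        vectorizer = TfidfVectorizer()
        gui_matrix = vectorizer.fit_transform(gui_texts)

        def retrieve(query, top_k=2):
            """Rank GUI screens by cosine similarity to an NL query."""
            query_vec = vectorizer.transform([query])
            scores = cosine_similarity(query_vec, gui_matrix).ravel()
            ranked = scores.argsort()[::-1][:top_k]
            return [(int(i), float(scores[i])) for i in ranked]

        print(retrieve("screen where users sign in with email"))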

    IST Austria Thesis

    Get PDF
    Because of the increasing popularity of machine learning methods, it is becoming important to understand the impact of learned components on automated decision-making systems and to guarantee that their consequences are beneficial to society. In other words, it is necessary to ensure that machine learning is sufficiently trustworthy to be used in real-world applications. This thesis studies two properties of machine learning models that are highly desirable for the sake of reliability: robustness and fairness. In the first part of the thesis we study the robustness of learning algorithms to training data corruption. Previous work has shown that machine learning models are vulnerable to a range of training set issues, from label noise through systematic biases to worst-case data manipulations. This problem is especially relevant today, since modern machine learning methods are particularly data-hungry and practitioners therefore often have to rely on data collected from various external sources, e.g. from the Internet, from app users, or via crowdsourcing. Naturally, such sources vary greatly in the quality and reliability of the data they provide. With these considerations in mind, we study the problem of designing machine learning algorithms that are robust to corruptions in data coming from multiple sources. We show that, in contrast to the case of a single dataset with outliers, successful learning within this model is possible both theoretically and practically, even under worst-case data corruptions. The second part of this thesis deals with fairness-aware machine learning. There are multiple areas where machine learning models have shown promising results, but where careful consideration is required in order to avoid discriminatory decisions by such learned components. Ensuring fairness can be particularly challenging, because real-world training datasets are expected to contain various forms of historical bias that may affect the learning process. In this thesis we show that data corruption can indeed render the problem of achieving fairness impossible, by tightly characterizing the theoretical limits of fair learning under worst-case data manipulations. However, assuming access to clean data, we also show how fairness-aware learning can be made practical in contexts beyond binary classification, in particular in the challenging learning-to-rank setting.
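
    The thesis's exact algorithms are not given in the abstract; shown purely as a sketch, one generic and standard way to limit the influence of a corrupted source is to aggregate per-source estimates with a coordinate-wise median rather than a mean (not necessarily the method developed in the thesis):

        # Illustrative robust aggregation across data sources.
        import numpy as np

        def robust_aggregate(per_source_grads):
            """per_source_grads: array of shape (n_sources, n_params)."""
            # The median tolerates a minority of arbitrarily corrupted sources,
            # whereas a single bad source can move the mean arbitrarily far.
            return np.median(per_source_grads, axis=0)

        grads = np.array([
            [0.10, -0.20],   # source 1 (clean)
            [0.12, -0.18],   # source 2 (clean)
            [9.00,  5.00],   # source 3 (corrupted)
        ])
        print(robust_aggregate(grads))   # close to the clean sources' estimates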

    Search Among Sensitive Content

    Get PDF
    Current search engines are designed to find what we want. But many collections cannot be made available to search engines because they contain sensitive content that needs to be protected. Before release, such content must be examined through a sensitivity review process, which can be difficult and time-consuming. To address this challenge, search technology should be capable of providing access to relevant content while protecting sensitive content. In this dissertation, we present an approach that leverages evaluation-driven information retrieval (IR) techniques: we optimize an objective function that balances the value of finding relevant content against the imperative to protect sensitive content. This requires evaluation measures that balance relevance against sensitivity. We introduce baselines for the problem and describe a proposed approach based on a listwise learning-to-rank model, trained with a modified loss function to optimize for the evaluation measure. Initial experiments re-purpose a LETOR benchmark dataset, OHSUMED, by using Medical Subject Heading (MeSH) labels to represent sensitivity. A second test collection is based on the Avocado Research Email Collection. Search topics were developed as a basis for assessing relevance, and two personas describing the sensitivities of representative (but fictional) content creators were created as a basis for assessing sensitivity. These personas were based on interviews with potential donors of historically significant email collections and with archivists who currently manage access to such collections. Two annotators then created relevance and sensitivity judgments for 65 topics for one or both personas. Experiment results show the efficacy of the learning-to-rank approach. The dissertation also includes four extensions that increase the quality of retrieved results with respect to relevance and sensitivity. First, the use of alternative optimization measures is explored. Second, transformer-based rankers are compared with rankers based on hand-crafted features. Third, a cluster-based replacement strategy that can further improve the score of our evaluation measures is introduced. Fourth, a policy that truncates the ranked list according to the query's expected difficulty is investigated. Results show improvements in each case.
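
    As a hedged sketch of what an evaluation measure balancing relevance against sensitivity could look like (the weights, cutoff, and names are illustrative assumptions, not the dissertation's actual measures):

        # Cost-sensitive measure: reward retrieved relevant documents,
        # penalize retrieved sensitive ones. Illustrative only.
        def penalized_precision_at_k(ranking, relevant, sensitive, k=10, penalty=2.0):
            """ranking: ordered doc ids; relevant/sensitive: sets of doc ids."""
            score = 0.0
            for doc in ranking[:k]:
                if doc in sensitive:
                    score -= penalty        # protecting sensitive content dominates
                elif doc in relevant:
                    score += 1.0
            return score / k

        ranking = ["d3", "d7", "d1", "d9"]
        print(penalized_precision_at_k(ranking, {"d3", "d9"}, {"d7"}, k=4))  # 0.0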

    Unfairness Assessment, Explanation and Mitigation in Machine Learning Models for Personalization

    Get PDF
    The last decade has been pervaded by automatic applications leveraging Artificial Intelligence technologies. Novel systems have been adopted to solve relevant tasks automatically, from scanning passengers during border controls to suggesting groceries to fill the fridge. One of the most captivating applications of Artificial Intelligence is voice assistants, like Alexa. They enable people to use their voice to perform simple tasks, such as setting an alarm or saving an appointment in an online calendar. Due to their worldwide usage, voice assistants are required to aid a diverse range of individuals encompassing various cultures, languages, accents, and preferences. It is therefore crucial for these systems to function fairly across different groups of people, to ensure reliability and to provide assistance without being influenced by sensitive attributes that may vary among them. This thesis deals with the design, implementation, and evaluation of Artificial Intelligence models that are optimized to operate fairly in the context of voice assistant systems. Assessing the performance of existing fairness-aware solutions is an essential step towards understanding how much effort is needed to provide fair and reliable technologies. The contributions comprise extensive analyses of existing methods for counteracting unfairness, and novel techniques for mitigating and explaining unfairness that capitalize on Data Balancing, Counterfactuality, and Graph Neural Network Explainability. The proposed solutions aim to support system designers and decision makers across several fairness requirements: methodologies to evaluate the fairness of model outcomes, techniques to improve users' trust by mitigating unfairness, and strategies that generate explanations of the potential causes behind the estimated unfairness. Through our studies, we explore opportunities and challenges introduced by the latest advancements in Fair Artificial Intelligence, a relevant and timely topic in the literature. Supported by extensive experiments, our findings illustrate the feasibility of designing Artificial Intelligence solutions for the mitigation and explanation of unfairness issues in the models adopted in voice assistants. Our results provide guidelines on fairness evaluation and on the design of methods to counteract unfairness in the voice assistant scenario. Researchers can use our findings to follow a schematic protocol for fairness assessment, to discover the data aspects affecting model fairness, and to mitigate outcome unfairness, among others. We expect that this thesis can support the adoption of fairness-aware solutions in the voice assistant pipeline, from voice authentication to the resolution of the requested task.
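
    As a minimal sketch of one common group-fairness check that such an assessment protocol might include (the data, group labels, and choice of metric here are hypothetical, not taken from the thesis):

        # Demographic parity difference: gap in positive-outcome rates
        # between two groups. Illustrative example only.
        import numpy as np

        def demographic_parity_diff(preds, groups):
            """Absolute gap in positive-outcome rates between groups 'a' and 'b'."""
            preds, groups = np.asarray(preds), np.asarray(groups)
            rate_a = preds[groups == "a"].mean()
            rate_b = preds[groups == "b"].mean()
            return abs(rate_a - rate_b)

        # e.g. 1 = voice request successfully resolved; groups = a speaker attribute
        preds  = [1, 1, 0, 1, 0, 0, 1, 0]
        groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
        print(demographic_parity_diff(preds, groups))   # 0.75 - 0.25 = 0.5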