173 research outputs found

    A Stochastic Model for Estimating the Number of Offenders and Targets on Snapchat Platform

    Snapchat, like many other social platforms, provides mechanisms for its users to report content and private/public interactions that violate their sense of safety and decency. Experience and common sense suggest that not everybody makes the effort to report, potentially leaving a large number of offending users and content unnoticed. The goal of this work is to directly estimate the probability of someone reporting on Snapchat using the current in-app reporting options and, thereby, to provide estimates of the total prevalence (count) of offenders and of users subjected to their unwanted, unwelcome or unsafe interactions.
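
The step from a reporting probability to a total count can be illustrated with a minimal sketch. It assumes, purely for illustration, that every offender is reported independently with the same known probability; the paper's stochastic model is not described in this abstract and is likely more elaborate. The function name `estimate_total` and the numbers are hypothetical.

```python
def estimate_total(reported_count: int, report_probability: float) -> float:
    """Scale an observed count of reported offenders up to an estimated total.

    Assumption (not from the paper): each offender is reported independently
    with the same probability, so the observed count is roughly
    total * report_probability.
    """
    if not 0.0 < report_probability <= 1.0:
        raise ValueError("report_probability must be in (0, 1]")
    return reported_count / report_probability


# Hypothetical numbers: 200 reported offenders and a 25% chance of being reported.
estimated = estimate_total(200, 0.25)
```

Under this simplification, the lower the reporting probability, the more the observed count understates the true prevalence.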

    Query Processing at Snapchat: How we Handle Query Completion, Suggestion and Localization

    From the Publisher: Software is a commodity sold across diverse language and cultural groups, whether in the commercial marketplace or as customized applications. Developers must structure their applications so that they can be readily and cheaply localized for sale in this range of markets. Obvious differences such as scripts and languages must be understood, as well as a range of more subtle cultural conventions. Further topics covered include: the overall architecture for internationalized products and an outline of an internationalization API; the use of computational linguistics methods; quality assurance, testing and documentation. Appendices contain summaries of the facilities available for localization on major platforms, characteristics of European languages, commercial tools and further reading. The book is aimed at small and medium-sized software producers and the IT departments of multinational corporations.

    clValid: An R Package for Cluster Validation

    The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available, "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods which allow the user to display the optimal validation scores and extract clustering results.

    Fuzzy Substring Matching: On-device Fuzzy Friend Search at Snapchat

    About 50% of all queries in the Snapchat app are aimed at finding the right friend to interact with. Since everyone has a unique list of friends and that list is not very large (a few thousand at most), it makes sense to perform this search locally, on users' devices. In addition, the friend list is already available for other purposes, such as showing the chat feed, and avoiding a server round trip can yield significant latency savings. Historically, we resorted to substring matching, ranking prefix matches at the top of the result list. Introducing fuzzy search on a resource-constrained device, in an environment where typos are prevalent, is both prudent and challenging. In this paper, we describe our efficient and accurate two-step approach to fuzzy search, characterized by a skip-bigram retrieval layer and a novel local Levenshtein distance computation used for final ranking.
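
A two-step search of this kind (retrieve candidates through shared skip-bigrams, then rank them by edit distance) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the paper's "local Levenshtein" computation, so a standard approximate-substring edit distance stands in for it, and every function name, the `max_skip` window, and the thresholds are hypothetical.

```python
from collections import defaultdict


def skip_bigrams(text, max_skip=1):
    """Ordered character pairs with at most max_skip characters between them."""
    text = text.lower()
    grams = set()
    for i in range(len(text)):
        for j in range(i + 1, min(i + 2 + max_skip, len(text))):
            grams.add(text[i] + text[j])
    return grams


def build_index(names, max_skip=1):
    """Inverted index from each skip-bigram to the names that contain it."""
    index = defaultdict(set)
    for name in names:
        for gram in skip_bigrams(name, max_skip):
            index[gram].add(name)
    return index


def substring_edit_distance(query, name):
    """Smallest Levenshtein distance between query and any substring of name.

    Stand-in for the paper's ranking step (an assumption, not their algorithm).
    """
    query, name = query.lower(), name.lower()
    prev = [0] * (len(name) + 1)  # zero row: a match may start anywhere in name
    for i, qc in enumerate(query, start=1):
        curr = [i]
        for j, nc in enumerate(name, start=1):
            cost = 0 if qc == nc else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return min(prev)  # a match may also end anywhere in name


def fuzzy_search(query, index, max_skip=1, min_shared=1, max_distance=2):
    """Step 1: retrieve names sharing skip-bigrams. Step 2: rank by distance."""
    shared = defaultdict(int)
    for gram in skip_bigrams(query, max_skip):
        for name in index.get(gram, ()):
            shared[name] += 1
    scored = [
        (substring_edit_distance(query, name), name)
        for name, count in shared.items()
        if count >= min_shared
    ]
    return [name for distance, name in sorted(scored) if distance <= max_distance]


friends = ["Alice Johnson", "Bob Stone", "Jonathan Lee"]  # hypothetical friend list
index = build_index(friends)
results = fuzzy_search("johson", index)  # query with a dropped "n"
```

The retrieval layer keeps the expensive distance computation off most of the friend list, which is the point of a two-step design on a resource-constrained device.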

    RankAggreg, an R package for weighted rank aggregation

    Background: Researchers in the field of bioinformatics often face the challenge of combining several ordered lists in a proper and efficient manner. Rank aggregation techniques offer a general and flexible framework that allows one to objectively perform the necessary aggregation. With the rapid growth of high-throughput genomic and proteomic studies, the potential utility of rank aggregation in the context of meta-analysis becomes even more apparent. One of the major strengths of rank-based aggregation is the ability to combine lists coming from different sources and platforms, for example different microarray chips, which may or may not be directly comparable otherwise. Results: The RankAggreg package provides two methods for combining the ordered lists: the Cross-Entropy method and the Genetic Algorithm. Two examples of rank aggregation using the package are given in the manuscript: one in the context of clustering based on gene expression, and the other in the context of meta-analysis of prostate cancer microarray experiments. Conclusion: The two examples described in the manuscript clearly show the utility of the RankAggreg package in the current bioinformatics context, where ordered lists are routinely produced as a result of modern high-throughput technologies.
