58 research outputs found
A Stochastic Model for Estimating the Number of Offenders and Targets on Snapchat Platform
Snapchat, like many other social platforms, provides mechanisms for its users
to report content and private/public interactions that violate their sense of
safety and decency. From our experience and common sense, we can safely assume
that not everybody makes an effort to report, potentially leaving a large
number of offending users and content unnoticed. The goal of this work is to
directly estimate the probability of someone reporting on Snapchat using
current in-app reporting options and, thereby, to provide estimates of the
total prevalence (count) of offenders and users subjected to their unwanted,
unwelcome or unsafe interactions.
Query Processing at Snapchat: How we Handle Query Completion, Suggestion and Localization
From the Publisher: Software is a commodity being sold across diverse language
and cultural groups, whether in the commercial marketplace, or as customized
applications. Developers must structure their applications so that they can be
readily and cheaply localized for sale in this range of markets. Obvious
differences such as scripts and languages must be understood as well as a range
of more subtle cultural conventions. Further topics covered include: the
overall architecture for internationalized products and an outline of an
internationalization API; the use of computational linguistics methods; quality
assurance, testing and documentation. Appendices contain summaries of the
facilities available for localization on major platforms, characteristics of
European languages, commercial tools and further reading. The book is aimed at
small and medium sized software producers, and the IT departments of
multinational corporations.
Fuzzy Substring Matching: On-device Fuzzy Friend Search at Snapchat
About 50% of all queries on the Snapchat app are targeted at finding the right
friend to interact with. Since everyone has a unique list of friends and that
list is not very large (at most a few thousand), it makes sense to perform this
search locally, on users' devices. In addition, the friend list is already
available for other purposes, such as showing the chat feed, and the latency
savings can be significant by avoiding a server round-trip call. Historically,
we resorted to substring matching, ranking prefix matches at the top of the
result list. Introducing the ability to perform fuzzy search on a
resource-constrained device and in an environment where typos are prevalent
is both prudent and challenging. In this paper, we describe our efficient and
accurate two-step approach to fuzzy search, characterized by a skip-bigram
retrieval layer and a novel local Levenshtein distance computation used for
final ranking.
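The two-step idea in this abstract (cheap retrieval, then edit-distance ranking) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the `max_skip`, `min_shared` and `max_distance` parameters, and the sample names are all assumptions, and the ranking step uses a standard substring edit distance (the best Levenshtein match of the query against any substring of the name) as a stand-in for the paper's "local Levenshtein distance", whose exact definition is not given here.

```python
from collections import defaultdict


def skip_bigrams(text, max_skip=1):
    """Ordered character pairs (text[i], text[j]) with 1 <= j - i <= max_skip + 1."""
    text = text.lower()
    grams = set()
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_skip + 2, len(text))):
            grams.add(text[i] + text[j])
    return grams


def build_index(names, max_skip=1):
    """Inverted index: skip-bigram -> set of positions in `names`."""
    index = defaultdict(set)
    for idx, name in enumerate(names):
        for gram in skip_bigrams(name, max_skip):
            index[gram].add(idx)
    return index


def substring_edit_distance(query, name):
    """Smallest Levenshtein distance between `query` and any substring of `name`.

    Stand-in for the paper's local Levenshtein distance (an assumption).
    """
    query, name = query.lower(), name.lower()
    prev = [0] * (len(name) + 1)  # an empty query matches anywhere at cost 0
    for i, qc in enumerate(query, start=1):
        curr = [i] + [0] * len(name)
        for j, nc in enumerate(name, start=1):
            curr[j] = min(
                prev[j] + 1,               # drop a query character
                curr[j - 1] + 1,           # skip a name character
                prev[j - 1] + (qc != nc),  # match or substitute
            )
        prev = curr
    return min(prev)  # the match may end anywhere in the name


def fuzzy_search(query, names, index, min_shared=1, max_distance=2, max_skip=1):
    """Step 1: retrieve names sharing skip-bigrams; step 2: rank by edit distance."""
    shared_counts = defaultdict(int)
    for gram in skip_bigrams(query, max_skip):
        for idx in index.get(gram, ()):
            shared_counts[idx] += 1
    scored = []
    for idx, shared in shared_counts.items():
        if shared < min_shared:
            continue
        distance = substring_edit_distance(query, names[idx])
        if distance <= max_distance:
            scored.append((distance, names[idx]))
    scored.sort()  # lowest distance first, ties broken alphabetically
    return [name for _, name in scored]


if __name__ == "__main__":
    friends = ["Alice Johnson", "Alicia Keys", "Bob Marley", "Jonathan Smith"]
    friend_index = build_index(friends)
    print(fuzzy_search("jonh", friends, friend_index))
```

Skip-bigrams tolerate a single dropped or swapped character better than plain bigrams, which is why they suit a retrieval layer for typo-heavy input; the more expensive distance computation then runs only on the few retrieved candidates.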
clValid: An R Package for Cluster Validation
The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available, "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods which allow the user to display the optimal validation scores and extract clustering results.
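As a rough illustration of what an "internal" validation measure does when comparing partitions with different numbers of clusters, the sketch below computes the Dunn index in plain Python. It does not use or reproduce the clValid API: the abstract does not list the individual measures, so the choice of the Dunn index, the function name `dunn_index`, and the toy data are assumptions made for the example.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter.

    Higher is better. Requires at least two clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Largest distance between two points in the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        dist(a, b)
        for members_1, members_2 in combinations(clusters.values(), 2)
        for a in members_1
        for b in members_2
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidate_partitions = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}
scores = {k: dunn_index(points, labels) for k, labels in candidate_partitions.items()}
best_k = max(scores, key=scores.get)
```

Scoring each candidate partition and keeping the best one mirrors, in miniature, the workflow the abstract describes of evaluating several clusterings while varying the number of clusters.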
RankAggreg, an R package for weighted rank aggregation
Background: Researchers in the field of bioinformatics often face a challenge of combining several ordered lists in a proper and efficient manner. Rank aggregation techniques offer a general and flexible framework that allows one to objectively perform the necessary aggregation. With the rapid growth of high-throughput genomic and proteomic studies, the potential utility of rank aggregation in the context of meta-analysis becomes even more apparent. One of the major strengths of rank-based aggregation is the ability to combine lists coming from different sources and platforms, for example different microarray chips, which may or may not be directly comparable otherwise. Results: The RankAggreg package provides two methods for combining the ordered lists: the Cross-Entropy method and the Genetic Algorithm. Two examples of rank aggregation using the package are given in the manuscript: one in the context of clustering based on gene expression, and the other one in the context of meta-analysis of prostate cancer microarray experiments. Conclusion: The two examples described in the manuscript clearly show the utility of the RankAggreg package in the current bioinformatics context where ordered lists are routinely produced as a result of modern high-throughput technologies.
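To make the idea of weighted rank aggregation concrete, here is a small sketch that finds the ordering minimizing the weighted sum of Spearman footrule distances to the input lists. It is not the RankAggreg package and does not use its Cross-Entropy or Genetic Algorithm search: an exhaustive search over permutations is swapped in, which is only feasible for a handful of items. The function names, the choice of distance, and the assumption that every list ranks the same items are all illustrative.

```python
from itertools import permutations


def footrule(candidate, ranking):
    """Spearman footrule: sum over items of |position in candidate - position in ranking|."""
    position = {item: i for i, item in enumerate(ranking)}
    return sum(abs(i - position[item]) for i, item in enumerate(candidate))


def aggregate(rankings, weights=None):
    """Return (best_ordering, cost) minimizing the weighted total footrule distance.

    Assumes every ranking is a full ordering of the same items.
    Brute force: cost grows factorially with the number of items.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    items = sorted(rankings[0])
    best, best_cost = None, float("inf")
    for candidate in permutations(items):
        cost = sum(w * footrule(candidate, r) for w, r in zip(weights, rankings))
        if cost < best_cost:
            best, best_cost = list(candidate), cost
    return best, best_cost


# Three hypothetical ordered gene lists from different sources.
rankings = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["b", "a", "c", "d"],
]
consensus, total_cost = aggregate(rankings)
```

Heuristic searches such as those named in the abstract become necessary once the lists are long enough that enumerating every permutation is out of reach.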