A Stochastic Model for Estimating the Number of Offenders and Targets on Snapchat Platform

Author: Vasyl Pihur
Publication date: 07/11/2022

Snapchat, like many other social platforms, provides mechanisms for its users to report content and private or public interactions that violate their sense of safety and decency. From experience and common sense, we can safely assume that not everybody makes the effort to report, potentially leaving a large number of offending users and content unnoticed. The goal of this work is to directly estimate the probability of someone reporting on Snapchat using current in-app reporting options and, thereby, to provide estimates of the total prevalence (count) of offenders and of users subjected to their unwanted, unwelcome or unsafe interactions.

arXiv.org e-Print Archive

Query Processing at Snapchat: How we Handle Query Completion, Suggestion and Localization

Author: Vasyl Pihur
Publication date: 04/11/2022

From the Publisher: Software is a commodity sold across diverse language and cultural groups, whether in the commercial marketplace or as customized applications. Developers must structure their applications so that they can be readily and cheaply localized for sale in this range of markets. Obvious differences such as scripts and languages must be understood, as well as a range of more subtle cultural conventions. Further topics covered include: the overall architecture for internationalized products and an outline of an internationalization API; the use of computational linguistics methods; quality assurance, testing and documentation. Appendices contain summaries of the facilities available for localization on major platforms, characteristics of European languages, commercial tools and further reading. The book is aimed at small and medium-sized software producers, and the IT departments of multinational corporations.

arXiv.org e-Print Archive

clValid: An R Package for Cluster Validation

Authors: Guy Brock, Somnath Datta, Susmita Datta, Vasyl Pihur

The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available, "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods which allow the user to display the optimal validation scores and extract clustering results.

Research Papers in Economics

Fuzzy Substring Matching: On-device Fuzzy Friend Search at Snapchat

Authors: Vasyl Pihur, Scott Thompson
Publication date: 08/11/2022

About 50% of all queries on the Snapchat app are targeted at finding the right friend to interact with. Since everyone has a unique list of friends and that list is not very large (a few thousand at most), it makes sense to perform this search locally, on users' devices. In addition, the friend list is already available for other purposes, such as showing the chat feed, and avoiding a server round-trip call can yield significant latency savings. Historically, we resorted to substring matching, ranking prefix matches at the top of the result list. Introducing the ability to perform fuzzy search on a resource-constrained device and in an environment where typos are prevalent is both prudent and challenging. In this paper, we describe our efficient and accurate two-step approach to fuzzy search, characterized by a skip-bigram retrieval layer and a novel local Levenshtein distance computation used for final ranking.
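The two-step idea in this abstract (retrieve candidates by shared skip-bigrams, then rank by an edit distance) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the `max_skip` and `min_shared` parameters, and the sample friend list are hypothetical, and the ranking step uses the classic approximate-substring variant of Levenshtein distance as a stand-in, not the paper's own local Levenshtein computation.

```python
from collections import defaultdict


def skip_bigrams(text, max_skip=1):
    """Character pairs (a, b) where b follows a with at most max_skip characters skipped."""
    text = text.lower()
    grams = set()
    for i, first in enumerate(text):
        for j in range(i + 1, min(i + 2 + max_skip, len(text))):
            grams.add(first + text[j])
    return grams


def build_index(names, max_skip=1):
    """Inverted index from skip-bigram to the names that contain it."""
    index = defaultdict(set)
    for name in names:
        for gram in skip_bigrams(name, max_skip):
            index[gram].add(name)
    return index


def retrieve(index, query, min_shared=1, max_skip=1):
    """Step 1: keep names sharing at least min_shared skip-bigrams with the query."""
    counts = defaultdict(int)
    for gram in skip_bigrams(query, max_skip):
        for name in index.get(gram, ()):
            counts[name] += 1
    return [name for name, shared in counts.items() if shared >= min_shared]


def substring_edit_distance(query, name):
    """Smallest Levenshtein distance between the query and any substring of the name."""
    query, name = query.lower(), name.lower()
    previous = [0] * (len(name) + 1)  # a match may start anywhere in the name
    for i, q_char in enumerate(query, start=1):
        current = [i]  # i query characters against an empty substring
        for j, n_char in enumerate(name, start=1):
            cost = 0 if q_char == n_char else 1
            current.append(min(previous[j] + 1,          # drop a query character
                               current[j - 1] + 1,       # skip a name character
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return min(previous)  # a match may end anywhere in the name


def fuzzy_search(index, query):
    """Step 2: rank retrieved candidates by edit distance, ties broken by name."""
    candidates = retrieve(index, query)
    return sorted(candidates, key=lambda name: (substring_edit_distance(query, name), name))


# Hypothetical friend list; "jonatan" is a typo for "Jonathan".
friends = ["Jonathan Smith", "Joanna Lee", "Maria Garcia"]
index = build_index(friends)
print(fuzzy_search(index, "jonatan"))
```

The retrieval step keeps the expensive distance computation off names that share no skip-bigram with the query, which is the kind of saving that matters on a resource-constrained device.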

arXiv.org e-Print Archive

RankAggreg, an R package for weighted rank aggregation

Authors: Somnath Datta, Susmita Datta, Vasyl Pihur
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Researchers in the field of bioinformatics often face a challenge of combining several ordered lists in a proper and efficient manner. Rank aggregation techniques offer a general and flexible framework that allows one to objectively perform the necessary aggregation. With the rapid growth of high-throughput genomic and proteomic studies, the potential utility of rank aggregation in the context of meta-analysis becomes even more apparent. One of the major strengths of rank-based aggregation is the ability to combine lists coming from different sources and platforms, for example different microarray chips, which may or may not be directly comparable otherwise. Results: The RankAggreg package provides two methods for combining the ordered lists: the Cross-Entropy method and the Genetic Algorithm. Two examples of rank aggregation using the package are given in the manuscript: one in the context of clustering based on gene expression, and the other one in the context of meta-analysis of prostate cancer microarray experiments. Conclusion: The two examples described in the manuscript clearly show the utility of the RankAggreg package in the current bioinformatics context where ordered lists are routinely produced as a result of modern high-throughput technologies.
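To make the idea of combining ordered lists concrete, here is a small Python sketch that treats rank aggregation as a search for the ordering closest to all input lists. It is not the RankAggreg API and does not implement the Cross-Entropy method or the Genetic Algorithm named in the abstract: it swaps in an exhaustive search that is only feasible for a handful of items. The Spearman footrule distance used as the objective, the function names, the optional weights and the sample lists are all assumptions for illustration.

```python
from itertools import permutations


def footrule_distance(candidate, ranked_list):
    """Sum of absolute position differences between two orderings of the same items."""
    position = {item: rank for rank, item in enumerate(ranked_list)}
    return sum(abs(rank - position[item]) for rank, item in enumerate(candidate))


def aggregate_brute_force(ranked_lists, weights=None):
    """Return the ordering with the smallest weighted total distance to all lists."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    items = sorted(ranked_lists[0])  # assumes every list ranks the same items
    best, best_cost = None, float("inf")
    for candidate in permutations(items):
        cost = sum(w * footrule_distance(candidate, ranked)
                   for w, ranked in zip(weights, ranked_lists))
        if cost < best_cost:
            best, best_cost = list(candidate), cost
    return best, best_cost


# Three hypothetical rankings of the same four items.
lists = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
consensus, cost = aggregate_brute_force(lists)
print(consensus, cost)
```

The number of candidate orderings grows factorially with the number of items, which is why stochastic search methods such as those the abstract mentions are needed for lists of realistic length.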

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central