25 research outputs found
Catching a Viral Video
The sharing and re-sharing of videos through social sites, blogs, e-mail, and other means has given rise to the phenomenon of viral videos: videos that become popular through internet sharing. In this paper we seek to better understand viral videos on YouTube by analyzing sharing and its relationship to video popularity using millions of YouTube videos. The socialness of a video is quantified by classifying the referrer sources for video views as social (e.g. an emailed link or a Facebook referral) or non-social (e.g. a link from related videos). We find that viewership patterns of highly social videos are very different from those of less social videos. For example, highly social videos rise to, and fall from, their peak popularity more quickly than less social videos. We also find that not all highly social videos become popular, and not all popular videos are highly social. Using our insights on viral videos, we are able to develop a method for ranking blogs and websites on their ability to spread viral videos.
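As a rough illustration of the socialness idea described in this abstract, the sketch below scores a video by the fraction of its views that arrive from social referrers. The scoring formula, the referrer labels (`email`, `facebook`, `related_videos`, `search`), and the choice to treat unlisted sources as non-social are all assumptions made for this example; the abstract does not give the paper's exact definition.

```python
# Hypothetical referrer labels treated as "social"; the paper's real
# taxonomy is not given in the abstract.
SOCIAL_REFERRERS = {"email", "facebook"}


def socialness(referrer_counts):
    """Return the fraction of views that came from social referrers.

    referrer_counts maps a referrer label to a view count. Any label not
    in SOCIAL_REFERRERS is treated as non-social (an assumption).
    """
    total = sum(referrer_counts.values())
    if total == 0:
        return 0.0  # no views recorded, so no evidence of social sharing
    social = sum(
        count for source, count in referrer_counts.items()
        if source in SOCIAL_REFERRERS
    )
    return social / total


# Made-up view counts for one video, for illustration only.
views = {"email": 30, "facebook": 50, "related_videos": 15, "search": 5}
score = socialness(views)
```

With these made-up counts, 80 of the 100 views come from social referrers, so the video would count as highly social under this assumed measure.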
Expert-augmented machine learning.
Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with the greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.
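The rule-filtering step mentioned in this abstract can be pictured with a small sketch. It assumes that disagreement is measured as the absolute difference between clinician-assessed and empirical relative risk and that rules above a cutoff are dropped; the abstract does not state the actual measure or cutoff, and the field names, function name, and numbers below are hypothetical.

```python
def filter_rules(rules, max_disagreement):
    """Keep rules where clinicians and data roughly agree.

    Each rule is a dict with hypothetical keys "name", "empirical_risk"
    and "clinician_risk". Disagreement is taken to be the absolute
    difference of the two risks, which is an assumption for this sketch.
    """
    kept = []
    for rule in rules:
        disagreement = abs(rule["clinician_risk"] - rule["empirical_risk"])
        if disagreement <= max_disagreement:
            kept.append(rule)
    return kept


# Made-up rules for illustration; these are not values from the study.
rules = [
    {"name": "A", "empirical_risk": 1.2, "clinician_risk": 1.1},
    {"name": "B", "empirical_risk": 0.8, "clinician_risk": 2.0},
    {"name": "C", "empirical_risk": 1.5, "clinician_risk": 1.3},
]
kept_names = [rule["name"] for rule in filter_rules(rules, max_disagreement=0.5)]
```

Here rule B is dropped because the assumed disagreement (1.2) exceeds the cutoff, while A and C are kept for training.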
Approximation algorithm for random MAX-k-SAT
Abstract. We provide a rigorous analysis of a greedy approximation algorithm for the maximum random k-SAT (MAX-R-kSAT) problem. The algorithm assigns variables one at a time in a predefined order. A variable is assigned TRUE if it occurs more often positively than negatively; otherwise, it is assigned FALSE. After each variable assignment, the problem instance is simplified and a new variable is selected. We show that this algorithm gives a 10/9.5-approximation, improving over the 9/8-approximation given by de la Vega and Karpinski [7]. The new approximation ratio is achieved by using a different algorithm than the one proposed in [7], along with a new upper bound on the maximum number of clauses that can be satisfied in a random k-SAT formula [2].
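The greedy procedure described in this abstract can be sketched as follows. This is a minimal reading of the description, not the authors' implementation: the DIMACS-style literal encoding (positive and negative integers), the function name `greedy_max_sat`, the fixed order 1..n, and the choice to count occurrences in the simplified formula are assumptions made for the example.

```python
def greedy_max_sat(clauses, num_vars):
    """Greedy assignment for MAX-SAT, one variable at a time.

    clauses is a list of clauses, each a list of nonzero ints: literal v
    means variable v is TRUE, -v means it is FALSE (assumed encoding).
    Returns (assignment, number of clauses satisfied).
    """
    active = [set(clause) for clause in clauses]  # clauses not yet decided
    assignment = {}
    satisfied = 0
    for var in range(1, num_vars + 1):  # predefined order: 1, 2, ..., n
        positive = sum(1 for clause in active if var in clause)
        negative = sum(1 for clause in active if -var in clause)
        value = positive > negative  # TRUE only on a strict majority
        assignment[var] = value
        true_literal = var if value else -var
        # Simplify: drop satisfied clauses, strip the falsified literal.
        remaining = []
        for clause in active:
            if true_literal in clause:
                satisfied += 1
            else:
                clause.discard(-true_literal)
                if clause:  # an emptied clause can no longer be satisfied
                    remaining.append(clause)
        active = remaining
    return assignment, satisfied


formula = [[1, 2, -3], [-1, 2, 3], [1, -2, 3], [-1, -2, -3]]
assignment, satisfied = greedy_max_sat(formula, num_vars=3)
```

On this small hand-made formula every tie is broken toward FALSE, so all three variables are set to FALSE and all four clauses end up satisfied.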
Computing Genomic Midpoints
This paper proposes a new algorithm for the genomic median problem that combines greedy and stochastic search. Our computational experiments suggest that for more complex problems our algorithm finds better solutions than previous approaches. In particular, we find an improved midpoint for a human-mouse-rat comparison with 424 markers. To understand why such problems are hard, we explore a phase transition in the complexity of the median problem for random data, associated with the emergence of a giant component in the breakpoint graph.