63 research outputs found

    Statistical Aggregation: Theory and Applications

    Due to their size and complexity, massive data sets bring many computational challenges for statistical analysis, such as overcoming memory limitations and improving the computational efficiency of traditional statistical methods. In this dissertation, I propose the statistical aggregation strategy to conquer the challenges posed by massive data sets. Statistical aggregation partitions the entire data set into smaller subsets, compresses each subset into certain low-dimensional summary statistics, and aggregates the summary statistics to approximate the desired computation based on the entire data. Results from statistical aggregation are required to be asymptotically equivalent to those computed from the entire data set. Statistical aggregation processes the entire data set part by part, and hence overcomes the memory limitation. Moreover, statistical aggregation can also improve the computational efficiency of statistical algorithms with computational complexity on the order of O(N^m) (m > 1) or even higher, where N is the size of the data. Statistical aggregation is particularly useful for online analytical processing (OLAP) in data cubes and stream data, where fast response to queries is the top priority. The “partition-compression-aggregation” strategy in statistical aggregation has been considered previously for OLAP computing in data cubes. But existing research in this area tends to overlook the statistical properties of the analysis and aims to obtain identical results from aggregation, which has limited the application of this strategy to very simple analyses. Statistical aggregation can instead support OLAP in more sophisticated statistical analyses. In this dissertation, I apply statistical aggregation to two large families of statistical methods, estimating equation (EE) estimation and U-statistics, develop proper compression-aggregation schemes, and show that statistical aggregation tremendously reduces their computational burden while maintaining their efficiency. I further apply statistical aggregation to U-statistic based estimating equations and propose new estimating equations that need much less computational time but give asymptotically equivalent estimators.
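
    The partition-compression-aggregation idea described above can be sketched on the simplest possible case, a least-squares line fit. This is an illustrative sketch, not the dissertation's scheme: the names `Summary`, `compress`, `aggregate`, and `fit_line` and the toy data are assumptions. Least squares is one of the "very simple analyses" where aggregation reproduces the full-data result exactly; the EE and U-statistic schemes in the abstract target cases where only asymptotic equivalence is available.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Low-dimensional summary of one data subset (hypothetical layout)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def compress(xs, ys):
    """Compress one subset into five summary statistics."""
    return Summary(
        n=len(xs),
        sx=sum(xs),
        sy=sum(ys),
        sxx=sum(x * x for x in xs),
        sxy=sum(x * y for x, y in zip(xs, ys)),
    )


def aggregate(summaries):
    """Combine per-subset summaries by adding them component-wise."""
    total = Summary()
    for s in summaries:
        total = Summary(
            total.n + s.n,
            total.sx + s.sx,
            total.sy + s.sy,
            total.sxx + s.sxx,
            total.sxy + s.sxy,
        )
    return total


def fit_line(s):
    """Least-squares (intercept, slope) computed from summaries only."""
    denom = s.n * s.sxx - s.sx ** 2
    slope = (s.n * s.sxy - s.sx * s.sy) / denom
    intercept = (s.sy - slope * s.sx) / s.n
    return intercept, slope


# Toy data, split into three subsets that are processed one at a time.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0, 10.8]
chunks = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, len(xs), 2)]

agg_fit = fit_line(aggregate(compress(cx, cy) for cx, cy in chunks))
full_fit = fit_line(compress(xs, ys))
```

    Only the five numbers per subset need to be kept in memory, so the raw data can be streamed or stored elsewhere; `agg_fit` and `full_fit` agree up to floating-point rounding.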

    Economics of ‘Tipping’ Button in Social Media: An Empirical Analysis of Content Monetization

    As the success of social media platforms heavily depends on the amount and the nature of user-generated content, content monetization has been introduced as a mechanism to incentivize users to generate content. In particular, content contributors can be paid (i.e., tipped) by readers who like the story. We adopted a difference-in-differences approach with a robustness matching estimator to examine the impact of content monetization. Our results confirm that content monetization effectively motivates content demand and supply and also improves content quality. Furthermore, such economic incentives have a spillover effect on ordinary Weibo users before they are eligible to adopt the “tipping” function. However, the verified users who are already experts or celebrities in society may be depressed after the open application of the program. This result suggests that start-ups are able to survive and earn a profit even in markets that are dominated by famous celebrities because of the monetization mechanism.
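
    For readers unfamiliar with the difference-in-differences approach mentioned above, a minimal sketch of the basic two-group, two-period estimator follows. The function name `did_estimate` and all numbers are made up for illustration; they are not the study's data, and the matching step the abstract mentions is not shown.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences estimate.

    Change in the treated group minus change in the control group.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values (e.g. posts per user), invented for the example.
effect = did_estimate(
    treated_pre=[10.0, 12.0],
    treated_post=[18.0, 20.0],
    control_pre=[9.0, 11.0],
    control_post=[12.0, 14.0],
)
print(effect)  # prints 5.0
```

    The control group's change (3.0) stands in for what would have happened to the treated group without monetization, so only the remaining 5.0 of the treated group's 8.0 increase is attributed to the treatment.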

    Spillover Effect of Content Marketing in E-commerce Platform under the Fan Economy Era

    With the proliferation of social media and live streaming, online celebrity endorsement has become a common practice of content marketing on e-commerce platforms. Despite the prevalent use of social media and online communities, empirical research investigating the economic value of user-generated content (UGC) and marketer-generated content (MGC) still lags. This study seeks to contribute theoretically and practically to an understanding of how online celebrity endorsement and fan interaction behaviors affect e-commerce sales. We adopt cross-sectional regression to assess the economic value of online celebrity endorsement, and we employ a panel vector autoregressive model to explain the dynamic relationship between marketers’ and consumers’ content marketing behaviors and e-commerce product sales. Empirical results highlight that interaction within the fan community has a spillover effect on content marketing in the “Fan Economy” era.

    Internet Celebrity Endorsement: How Internet Celebrities Bring Referral Traffic to E-commerce Sites?

    Endorsement marketing has been widely used to generate consumer attention, interest, and purchase behaviors among the targeted audiences of celebrities. Internet celebrities who become famous by means of the Internet are more dependent on strategy intimacy to appeal to their followers. Limited studies have addressed the new business models in the Internet celebrity economy: content advertising and online retailing. Our study aims to examine how Internet celebrity endorsement influences consumers’ click-on behaviors and purchase behaviors in the context of e-commerce business. Results suggest that content marketing using Internet celebrity endorsement plays a significant role in bringing referral traffic to e-commerce sites but is less helpful in boosting sales. The impact of Internet celebrity endorsement on consumers’ click-on decisions is U-shaped, but the role of Internet celebrities as online retailers will “shape-flip” this relationship to a negative linear relation. Therefore, Internet celebrity endorsement provides effective ways to bring referral traffic to e-commerce sites.

    Feature screening for clustering analysis

    In this paper, we consider feature screening for ultrahigh dimensional clustering analyses. Based on the observation that the marginal distribution of any given feature is a mixture of its conditional distributions in different clusters, we propose to screen clustering features by independently evaluating the homogeneity of each feature's mixture distribution. Important cluster-relevant features have heterogeneous components in their mixture distributions and unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate the homogeneity. Under general parametric settings, we establish the tail probability bounds of the EM-test statistic for the homogeneous and heterogeneous features, and further show that the proposed screening procedure can achieve the sure independent screening and even the consistency in selection properties. The limiting distribution of the EM-test statistic is also obtained for general parametric distributions. The proposed method is computationally efficient, can accurately screen for important cluster-relevant features, and helps to significantly improve clustering, as demonstrated in our extensive simulation and real data analyses.

    Online Bayesian Analysis

    In the last few years, there has been active research on aggregating advanced statistical measures in multidimensional data cubes from partitioned subsets of data. In this paper, we propose an online compression and aggregation scheme to support Bayesian estimation in data cubes based on the asymptotic properties of Bayesian statistics. In the proposed approach, we compress each data segment by retaining only the model parameters and a small number of auxiliary measures. We then develop an aggregation formula that allows us to reconstruct the Bayesian estimate from partitioned segments with a small approximation error. We show that the Bayesian estimates and the aggregated Bayesian estimates are asymptotically equivalent.
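
    The abstract does not state its aggregation formula, so the following is only a sketch of one familiar way to combine per-segment Bayesian summaries: a precision-weighted average under a normal model. The assumptions here (a normal mean with known noise variance and a flat prior) and the function names `compress_segment` and `aggregate_posteriors` are illustrative and are not taken from the paper.

```python
def compress_segment(values, noise_var):
    """Keep only the posterior mean and precision for one data segment.

    Assumes a normal likelihood with known variance and a flat prior,
    so the posterior mean is the sample mean and the posterior
    precision is n / noise_var.
    """
    n = len(values)
    post_mean = sum(values) / n
    post_precision = n / noise_var
    return post_mean, post_precision


def aggregate_posteriors(segments):
    """Precision-weighted combination of per-segment posterior summaries."""
    total_precision = sum(precision for _, precision in segments)
    combined_mean = sum(m * precision for m, precision in segments) / total_precision
    return combined_mean, total_precision


# Two segments of made-up observations, compressed independently.
seg_a = compress_segment([1.0, 2.0, 3.0], noise_var=1.0)
seg_b = compress_segment([4.0, 5.0], noise_var=1.0)
agg_mean, agg_precision = aggregate_posteriors([seg_a, seg_b])
print(agg_mean, agg_precision)  # prints 3.0 5.0
```

    In this conjugate toy case the aggregated estimate equals the full-data posterior mean exactly; for general models a combination of this kind is at best an approximation, which is where an asymptotic-equivalence argument like the one in the abstract becomes necessary.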

    BIC-seq: a fast algorithm for detection of copy number alterations based on high-throughput sequencing data

    DNA copy number alterations (CNA), which are amplifications and deletions of certain regions in the genome, play an important role in the pathogenesis of cancer and have been shown to be associated with other diseases such as autism, schizophrenia and obesity

    Elongated Physiological Structure Segmentation via Spatial and Scale Uncertainty-aware Network

    Robust and accurate segmentation of elongated physiological structures is challenging, especially in ambiguous regions, such as the corneal endothelium microscope image with uneven illumination or the fundus image with disease interference. In this paper, we present a spatial and scale uncertainty-aware network (SSU-Net) that fully uses both spatial and scale uncertainty to highlight ambiguous regions and integrate hierarchical structure contexts. First, we estimate epistemic and aleatoric spatial uncertainty maps using Monte Carlo dropout to approximate Bayesian networks. Based on these spatial uncertainty maps, we propose the gated soft uncertainty-aware (GSUA) module to guide the model to focus on ambiguous regions. Second, we extract the uncertainty under different scales and propose the multi-scale uncertainty-aware (MSUA) fusion module to integrate structure contexts from hierarchical predictions, strengthening the final prediction. Finally, we visualize the uncertainty map of the final prediction, providing interpretability for segmentation results. Experimental results show that the SSU-Net performs best on corneal endothelial cell and retinal vessel segmentation tasks. Moreover, compared with counterpart uncertainty-based methods, SSU-Net is more accurate and robust.
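
    As a rough illustration of how epistemic and aleatoric uncertainty can be separated from Monte Carlo dropout samples, the sketch below uses a common entropy-based decomposition for a single pixel. This is an assumption made for illustration: the abstract does not say which decomposition SSU-Net uses, and the probability lists stand in for repeated stochastic forward passes of a real network.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) prediction, clipped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose_uncertainty(samples):
    """Split predictive uncertainty for one pixel.

    samples: foreground probabilities from T stochastic forward passes.
    Returns (mean probability, aleatoric, epistemic), where
    aleatoric = mean entropy of the individual passes and
    epistemic = entropy of the mean prediction minus that value.
    """
    mean_p = sum(samples) / len(samples)
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in samples) / len(samples)
    epistemic = total - aleatoric
    return mean_p, aleatoric, epistemic


# Passes agree that the pixel is ambiguous: uncertainty is mostly aleatoric.
agree = decompose_uncertainty([0.5, 0.5, 0.5, 0.5])

# Passes disagree with each other: a large share is epistemic.
disagree = decompose_uncertainty([0.1, 0.9, 0.1, 0.9])
```

    Applying `decompose_uncertainty` to every pixel would give two maps of the kind the abstract describes, which a module such as GSUA could then use to weight ambiguous regions.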