    Normalization of Dutch user-generated content

    This paper describes a phrase-based machine translation approach to normalizing Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to obtain a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system in which a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.

    User-generated content as a feature of Internet culture

    An Equilibrium Model of User Generated Content

    This paper considers the joint creation and consumption of content on user-generated content platforms (e.g., reviews or articles, chat, videos, etc.). On these platforms, users’ utilities depend upon the participation of others; hence, users’ expectations regarding the participation of others on the site become germane to their own involvement levels. Yet these beliefs are often assumed to be fixed. Accordingly, we develop a dynamic rational expectations equilibrium model of joint consumption and generation of information. We estimate the model on a novel data set from a large Internet forum site and use the model to offer recommendations regarding site strategy. Results indicate that beliefs play a major role in UGC, that ignoring these beliefs leads to erroneous inferences about consumer behavior, and that these beliefs have important implications for the marketing strategy of UGC sites. We find that user and site generated content can be either strategic complements or substitutes depending on whether the competition for existing readers exceeds the potential to attract new ones. In our data, the competitive effect substantially dilutes the market expansion effect of site generated content. Likewise, past and current content can also be either strategic substitutes or complements. Results indicate that more durable content increases overall site participation, suggesting that the site should invest in making past information easier to find (via better search or page design). Finally, because content consumption and generation interact, it is unclear which factor dominates in network growth. We find that decreasing content consumption costs (perhaps by changing site design or via search tools) enhances site engagement more than decreasing content generation costs. Overall, enhancing content durability and reducing content consumption cost appear to be the most effective strategies for increasing site visitation.

    Online Terrorist Speech, Direct Government Regulation, and the Communications Decency Act

    The Communications Decency Act (CDA) provides Internet platforms complete protection from liability for user-generated content. This Article discusses the costs of this current legal framework and several potential solutions. It proposes three modifications to the CDA that would use a carrot and stick to incentivize companies to take a more active role in addressing some of the most blatant downsides of user-generated content on the Internet. Despite the modest nature of these proposed changes, they would have a significant impact.