Normalization of Dutch user-generated content
This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system where a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.
An Equilibrium Model of User Generated Content
This paper considers the joint creation and consumption of content on
user generated content platforms (e.g., reviews or articles, chat,
videos, etc.). On these platforms, users’ utilities depend upon
the participation of others; hence, users’ expectations regarding
the participation of others on the site become germane to their own
involvement levels. Yet these beliefs are often assumed to be fixed.
Accordingly, we develop a dynamic rational expectations equilibrium
model of joint consumption and generation of information. We estimate
the model on a novel data set from a large Internet forum site and use
the model to offer recommendations regarding site strategy. Results
indicate that beliefs play a major role in UGC, that ignoring these beliefs
leads to erroneous inferences about consumer behavior, and that these
beliefs have important implications for the marketing strategy of UGC
sites. We find that user and site generated content can be either
strategic complements or substitutes depending on whether the
competition for existing readers exceeds the potential to attract new
ones. In our data, the competitive effect substantially dilutes the
market expansion effect of site generated content. Likewise, past and
current content can also be either strategic substitutes or complements.
Results indicate more durable content increases overall site
participation, suggesting that the site should invest in making past
information easier to find (via better search or page design). Finally,
because content consumption and generation interact, it is unclear which
factor dominates in network growth. We find that decreasing content
consumption costs (perhaps by changing site design or via search tools)
enhances site engagement more than decreasing content generation costs.
Overall, enhancing content durability and reducing content consumption
costs appear to be the most effective strategies for increasing site visitation.
Online Terrorist Speech, Direct Government Regulation, and the Communications Decency Act
The Communications Decency Act (CDA) provides Internet platforms complete protection from liability for user-generated content. This Article discusses the costs of this current legal framework and several potential solutions. It proposes three modifications to the CDA that would use a carrot and stick to incentivize companies to take a more active role in addressing some of the most blatant downsides of user-generated content on the Internet. Despite the modest nature of these proposed changes, they would have a significant impact.
