106 research outputs found

    Thread Reconstruction in Conversational Data using Neural Coherence Models

    Discussion forums are an important source of information. They are often used to answer specific questions a user might have and to discover more about a topic of interest. Discussions in these forums may evolve in intricate ways, making it difficult for users to follow the flow of ideas. We propose a novel approach for automatically identifying the underlying thread structure of a forum discussion. Our approach is based on a neural model that computes coherence scores for possible reconstructions and then selects the highest-scoring, i.e., the most coherent, one. Preliminary experiments show promising results, outperforming a number of strong baseline methods. Comment: Neu-IR: Workshop on Neural Information Retrieval 201

    Adjacency Pair Recognition in Wikipedia Discussions using Lexical Pairs

    Online community search using thread structure

    Template Induction over Unstructured Email Corpora

    Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are known for structured data, e.g. machine-generated HTML emails. However, much less work has been done on performing the same task over unstructured email data. We propose a technique for inducing high-quality templates from plain-text emails at scale, based on the suffix array data structure. We evaluate this method against an industry-standard approach for finding similar content based on shingling, running both algorithms over two corpora: a synthetically created email corpus for a high level of experimental control, and user-generated emails from the well-known Enron email corpus. Our experimental results show that the proposed method is more robust to variations in cluster quality than the baseline and that its templates contain more text from the emails, which would benefit extraction tasks by identifying transient parts of the emails. Our study indicates that templates induced using suffix arrays contain approximately half as much noise (measured as entropy) as templates induced using shingling. Furthermore, the suffix array approach is substantially more scalable, proving to be an order of magnitude faster than shingling even for modestly sized training clusters. Public corpus analysis shows that email clusters contain on average 4 segments of common phrases, each containing on average 9 words, which suggests that templatization could reduce users' email writing effort by an average of 35 words per email in an assistance or auto-reply task.
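
To give a feel for how a suffix array exposes shared phrases between emails, here is a minimal sketch. It is not the authors' template-induction method: the function name `longest_common_phrase`, the word-level tokenization, the separator token, and the naive sort-based construction are all assumptions chosen for brevity.

```python
def longest_common_phrase(email_a: str, email_b: str) -> list[str]:
    """Longest run of words shared by two emails, via a naive suffix array."""
    a, b = email_a.split(), email_b.split()
    sep = "\x00"  # assumed never to occur as a word in either email
    tokens = a + [sep] + b

    # Naive suffix array: suffix start positions sorted by the token
    # sequence they begin. Fine for a demo, far too slow for real corpora.
    suffix_array = sorted(range(len(tokens)), key=lambda i: tokens[i:])

    best: list[str] = []
    for i, j in zip(suffix_array, suffix_array[1:]):
        # Only neighbouring suffixes from different emails can yield a
        # phrase common to both.
        if (i < len(a)) == (j < len(a)):
            continue
        k = 0
        while (
            i + k < len(tokens)
            and j + k < len(tokens)
            and tokens[i + k] == tokens[j + k]
            and tokens[i + k] != sep
        ):
            k += 1
        if k > len(best):
            best = tokens[i:i + k]
    return best


first = "Hello Alice your order 123 has been shipped today"
second = "Hello Bob your order 456 has been shipped yesterday"
print(longest_common_phrase(first, second))
```

Sorting the suffixes places occurrences of the same phrase next to each other, which is why only adjacent entries need to be compared. A template inducer would keep every sufficiently long shared segment rather than just the longest one.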

    Internet... the final frontier: an ethnographic account: exploring the cultural space of the Net from the inside

    The research project "The Internet as a space for interaction", which completed its mission in Autumn 1998, studied the constitutive features of network culture and network organisation. Special emphasis was given to the dynamic interplay of technical and social conventions regarding both the Net’s organisation and its change. The ethnographic perspective chosen studied the Internet from the inside. Research concentrated on three fields of study: the hegemonic operating technology of net nodes (UNIX), the network’s basic transmission technology (the Internet Protocol, IP), and a popular communication service (Usenet). The project’s final report includes the results of the three branches explored. Drawing on developments in the three fields, it shows that changes that come about on the Net are neither anarchic nor arbitrary. Instead, the decentrally organised Internet is based on technically and organisationally distributed forms of coordination within which individual preferences collectively attain the power to develop into definitive standards.

    Constructing collective identities in the Internet age: a case study of Taiwanese-based internet forums

    The thesis presents a case study of asynchronous Taiwanese-based internet forums, aimed at exploring new perspectives on the question of collective identity construction via internet-forum participation. It develops a discursive-constructivist approach that incorporates the theories and models of Goffman, Butler, Laclau & Mouffe and Melucci, investigating the performative, antagonistic and negotiated dimensions of identities. Methodologically, it deploys a series of analytic tools from linguistics and micro-sociology, as well as the methods of content analysis and online ethnography. Focusing on the questions of gay identities and national identities, the case study traces ten years of archives of the local gay forums and political forums, examining the ways in which collective identities take form through speech performance and social interactions in cyberspace. The case analysis of the gay forums finds that the internet gives rise to networked online gay communities, where individual gays’ subject-positions are performed. Meanwhile, the forums permit the reconstruction of the Other of the gay community, which ironically results in the creation of an internal Other within the community. Furthermore, the forums allow their grassroots participants to engage in the local gay movement, which eventually leads to change in the public identity of the movement. The case of national identity shows that antagonism between the two oppositional nationalisms in Taiwan penetrates identity practices in this domain; cyberspace is no exception. The local political forums become the space for marking, creating and stigmatising the Other. Nevertheless, they also provide the space for negotiated interactions concerning identity-oriented national projects, and they facilitate dialogues between Chinese and Taiwanese online participants on the question of Taiwan’s future.
    To conclude, internet forums do not necessarily lead to the devolution of symbolic and political power to their participants. Mainstream discourses still deeply influence the discourses in cyberspace. Grassroots participation in debates concerning social projects may intervene in decision-making; however, this depends on the participants’ access to valid information and the decision makers’ attitudes towards the grassroots forums. Finally, while connecting people, the internet is also dividing them by spreading antagonism and animosity.

    A Supervised Approach to Predict the Hierarchical Structure of Conversation Threads for Comments

    User-generated texts such as comments in social media are rich sources of information. In general, the reply structure of comments is not publicly accessible on the web: websites present comments as a list in chronological order, so some information is lost. One solution to this problem is to reconstruct the thread structure (RTS) automatically. RTS predicts a semantic tree for the reply structure, which is useful for understanding users’ behaviours and makes the actual conversation streams easier to follow. This paper addresses the RTS task in blogs, online news agencies, and news websites. These types of websites cover various types of articles reflecting real-world events. People with different views participate in arguments by writing comments, which express opinions, sentiments, or ideas about the articles. The reply structure of threads on these types of websites differs fundamentally from that of threads in forums, chats, and emails. To perform RTS, we define a set of textual and non-textual features and then use supervised learning to combine them. The proposed method is evaluated on five different datasets and its accuracy is compared with baselines. The results show that our method is more accurate than the baselines on all datasets.