731 research outputs found

    Community based Question Answer Detection

    Each day, millions of people ask questions and search for answers on the World Wide Web. As a result, the Internet has grown into a worldwide database of questions and answers, accessible to almost everyone. Because this database is so large, it is hard to find out whether a question has been answered or even asked before. As a consequence, users ask the same questions again and again, producing a vicious circle of new content that hides the important information. Web forums, also known as discussion boards, are one platform for questions and answers. They present discussions as item streams where each item contains the contribution of one author. These contributions contain questions and answers in human-readable form. People use search engines to search for information on such platforms. However, current search engines are neither optimized to highlight individual questions and answers nor to show which questions are asked often and which ones are already answered. In order to close this gap, this thesis introduces the Effingo system. The Effingo system is intended to extract forums from around the Web and find question and answer items. It also needs to link equal questions and aggregate associated answers. That way it is possible to find out whether a question has been asked before and whether it has already been answered. Based on this information it is possible to derive the most urgent questions from the system, and to determine which ones are new and which ones are discussed and answered frequently. As a result, users are prevented from creating useless discussions, thus reducing the server load and information overload for further searches. The first research area explored by this thesis is forum data extraction. The results from this area are intended to be used to create a database of forum posts that is as large as possible. Furthermore, it uses question-answer detection in order to find out which forum items are questions and which ones are answers and, finally, topic detection to aggregate questions on the same topic as well as discover duplicate answers. These areas are either extended by Effingo, using forum-specific features such as the user graph, forum item relations and forum link structure, or adapted as a means to cope with the specific problems created by user-generated content. Such problems arise from poorly written and very short texts as well as from hidden or distributed information.

    A Corpus-assisted Discourse Analysis of Music-related Practices Discussed within Chipmusic.org

    This study examined discussion forum posts within a website dedicated to a medium and genre of music (chiptunes) with potential for music-centered making, a phrase I use to describe maker culture practices that revolve around music-related purposes. Three research questions guided this study: (1) What chiptune-related practices did members of chipmusic.org discuss between December 30th, 2009 and November 13th, 2017? (2) What do chipmusic.org discussion forum posts reveal about the multidisciplinary aspects of chiptunes? (3) What import might music-centered making evident within chipmusic.org discussion forum posts hold for music education? To address these research questions, I used corpus-assisted discourse analysis tools and techniques to reveal and analyze patterns of discourse within 245,098 discussion forum posts within chipmusic.org. The analysis cycle consisted of (a) using corpus analysis techniques to reveal patterns of discourse across and within data consisting of 10,892,645 words, and (b) using discourse analysis techniques for a close reading of revealed patterns. Findings revealed seven interconnected themes of chiptune-related practices: (a) composition practices, (b) performance practices, (c) maker practices, (d) coding practices, (e) entrepreneurial practices, (f) visual art practices, and (g) community practices. Members of chipmusic.org primarily discussed composing and performing chiptunes on a variety of instruments, as well as through retro computer and video game hardware. Members also discussed modifying and creating hardware and software for a multitude of electronic devices. Some members engaged in entrepreneurial practices to promote, sell, buy, and trade with other members. Throughout each of the revealed themes, members engaged in visual art practices, as well as community practices such as collective learning, collaborating, constructive criticism, competitive events, and collective efficacy. Findings suggest the revealed themes incorporated practices from a multitude of academic disciplines or fields of study for music-related purposes. However, I argue that many of the music-related practices people discussed within chipmusic.org are not apparent within music education discourse, curricula, or standards. I call for an expansion of music education discourse and practices to include additional ways of being musical through practices that might borrow from multiple academic disciplines or fields of study for music-related purposes.
    Dissertation/Thesis. Doctoral Dissertation, Music Education 201

    Investigating and extending the methods in automated opinion analysis through improvements in phrase based analysis

    Opinion analysis is an area of research which deals with the computational treatment of opinion statements and subjectivity in textual data. Opinion analysis has emerged over the past couple of decades as an active area of research, as it provides solutions to the issues raised by information overload. The problem of information overload has emerged with the advancements in communication technologies, which gave rise to an exponential growth in user-generated subjective data available online. Opinion analysis has a rich set of applications which are used to enable opportunities for organisations, such as tracking user opinions about products, social issues in communities, through to engagement in political participation, etc. The opinion analysis area has been highly active in recent years, and research at different levels of granularity has been, and is being, undertaken. However, it is observed that there are limitations in the state of the art, especially as dealing with the levels of granularity on their own does not solve current research issues. Therefore a novel sentence-level opinion analysis approach utilising clause- and phrase-level analysis is proposed. This approach uses linguistic and syntactic analysis of sentences to understand the interdependence of words within sentences, and further uses rule-based analysis for phrase-level analysis to calculate the opinion at each hierarchical structure of a sentence. The proposed opinion analysis approach requires lexical and contextual resources for implementation. In the context of this Thesis the approach is further presented as part of an extended unifying framework for opinion analysis, resulting in the design and construction of a novel corpus. The above contributions to the field (approach, framework and corpus) are evaluated within the Thesis and are found to make improvements on existing limitations in the field, particularly with regard to opinion analysis automation. Further work is required in integrating a mechanism for greater word sense disambiguation and in lexical resource development.

    Sticks and Stones May Break My Bones but Words Will Never Hurt Me...Until I See Them: A Qualitative Content Analysis of Trolls in Relation to the Gricean Maxims and (IM)Polite Virtual Speech Acts

    The troll is one of the most obtrusive and disruptive bad actors on the internet. Unlike other bad actors, the troll interacts on a more personal and intimate level with other internet users. Social media platforms, online communities, comment boards, and chatroom forums provide them with this opportunity. What distinguishes these social provocateurs from other bad actors are their virtual speech acts and online behaviors. These acts aim to incite anger, shame, or frustration in others through the weaponization of words, phrases, and other rhetoric. Online trolls come in all forms and use various speech tactics to insult and demean their target audiences. The goal of this research is to investigate trolls' virtual speech acts and the impact of troll-like behaviors on online communities. Using Gricean maxims and politeness theory, this study seeks to identify common vernacular, word usage, and other language behaviors that trolls use to divert the conversation, insult others, and possibly affect fellow internet users' mental health and well-being.

    Capitalization of Feminine Beauty on Chinese Social Media

    4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022)

    Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 4th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences, as well as to discuss current and future challenges. Due to the COVID pandemic, CARMA 2022 is planned as a simultaneous virtual and face-to-face conference.
    Doménech I De Soria, J.; Vicente Cuervo, MR. (2022). 4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2022.2022.1595

    Grounding event references in news

    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.

    Developing and Validating the Secondary Literacy Professionals Needs Assessment Matrix

    The purpose of this study was to develop and validate a needs assessment matrix for secondary specialized literacy professionals that identified the professional learning needs of literacy coaches. This tool was developed in order to inform school districts and secondary specialized literacy professionals about the types of professional learning support they will need to effectively meet the literacy needs of teachers in secondary schools. The Secondary Literacy Professionals Needs Assessment Matrix (SLPNAM) was created using a variety of methods. A synthesis of literature regarding school improvement, adolescent literacy, 21st century skills, adult learning, literacy coaching and the 2017 International Literacy Association's Standards for Specialized Literacy Professionals was used to provide the conceptual framework for the SLPNAM. The SLPNAM items were developed by interviewing coaching and content experts, going through several iterations before the final instrument was developed. Construct validity was established through exploratory factor analysis, and internal reliability was determined through Cronbach's Alpha. Sixty-four participants from 18 school districts in Florida responded to the SLPNAM. Data analysis indicated that the SLPNAM had a high level of internal reliability, and data reduction was used to ensure that items correlated with the constructs they were intended to correlate with. Data from the exploratory factor analysis of the SLPNAM confirmed that construct validity was established. The results from this study provide opportunities for school districts to differentiate professional learning for literacy professionals. They also provide data for school administrators to define the role of the coach and assist secondary literacy professionals in setting professional learning goals specific to their roles.

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Enhanced web-based summary generation for search.

    After a user types in a search query on a major search engine, they are presented with a number of search results. Each search result is made up of a title, a brief text summary and a URL. It is then the user's job to select documents for further review. Our research aims to improve the accuracy of users selecting relevant documents by improving the way these web pages are summarized. Improvements in accuracy will lead to time improvements and user experience improvements. We propose ReClose, a system for generating web document summaries. ReClose generates summary content by combining summarization techniques from query-biased and query-independent summary generation. Query-biased summaries generally provide query terms in context. Query-independent summaries focus on summarizing documents as a whole. Combining these summary techniques led to a 10% improvement in user decision making over Google-generated summaries. Color-coded ReClose summaries provide keyword usage depth at a glance and also alert users to topic departures. Color-coding further enhanced ReClose results and led to a 20% improvement in user decision making over Google-generated summaries. Many online documents include structure and multimedia of various forms such as tables, lists, forms and images. We propose to include this structure in web page summaries. We found that the expert user was insignificantly slowed in decision making, while the majority of average users made decisions more quickly using summaries including structure without any decrease in decision accuracy. We additionally extended ReClose for use in summarizing large numbers of tweets in tracking flu outbreaks in social media. The resulting summaries have variable length and are effective at summarizing flu-related trends. Users of the system obtained an accuracy of 0.86 labeling multi-tweet summaries. This showed that the basis of ReClose is effective outside of web documents and that variable-length summaries can be more effective than fixed-length ones. Overall the ReClose system provides unique summaries that contain more informative content than current search engines produce, highlight the results in a more meaningful way, and add structure when meaningful. The applications of ReClose extend far beyond search and have been demonstrated in summarizing pools of tweets.