5 research outputs found
Using Social Media to Explore Mental Health-Related Behaviors and Discussions among Young Adults
There have been recurring reports of online harassment and abuse among adolescents and young adults on anonymous social networking websites (ASNs). We explored discussions related to social and mental health behaviors among college students, including cyberbullying, on the popular ASN Yik Yak. From April 6, 2016, to May 7, 2016, we collected anonymous conversations posted on Yik Yak at 19 universities in four different states. We found that prosocial messages were approximately five times as prevalent as bullying messages. The frequency of cyberbullying messages was positively associated with messages seeking emotional help. We found significant geographic variation in the frequency of supportive versus bullying messages. Across campuses, bullying and political discussion were positively associated. Results suggest that ASN sites can be mined for real-time data about students' mental health-related attitudes and behaviors. We discuss the implications of using this information in education and healthcare services.
Domain-Specific Analysis and Search on User-Generated Content
User-generated content on the Internet has grown explosively in the current Web 2.0 era. This growth has been facilitated by widespread user access to the web through mobile devices, the rapid growth of social media applications, and review-based provider websites. The majority of this data is in the form of free text, as in social posts. Storing and querying this massive unstructured textual data is a challenging task that has been studied extensively in recent years. Current search solutions, such as Google, Bing, and Amazon's internal search, are effective in allowing users to find relevant documents in large collections. Those solutions rely on several content- and reputation-based factors, including document relevance to the user query. However, capturing and exploiting user intent, particularly in a domain-specific setting, remains an open problem with a variety of research challenges. In this thesis, we study several such settings where existing search techniques are inadequate. In particular, we study the following subproblems, in which we showcase the benefit of leveraging domain-specific knowledge and user-generated content: 1) We argue for more effective item ranking for crowd-sourced review platforms and provide efficient algorithms to support it. 2) We provide a practical, high-quality solution for building domain-specific ontologies from unstructured text documents. We describe our approach and provide fast and simple algorithms that use the generated ontology to extract domain-specific features from the textual data. In particular, we describe our approach using a real-estate agency case study in which domain agents are interested in evaluating the textual property descriptions. 3) We study how to search for similar documents, given a set of input documents, when the data source can only be accessed through a query interface (such as Google search). We propose a ranking model to extract effective query keywords from the input documents to retrieve similar documents through keyword-based search APIs. 4) We use data mining techniques to classify user-generated content on online forums in terms of its characteristics, such as bullying behavior. In particular, we crawl Yik Yak, an anonymous social media platform, to detect potentially harmful behaviors.
Querying Documents Annotated by Interconnected Entities
In a large number of applications, from biomedical literature to social networks, there are collections of text documents that are annotated by interconnected entities, which are related to each other through association graphs. For example, social posts are related through the friendship graph of their authors, and PubMed articles are annotated by MeSH terms, which are related through ontological relationships. To effectively query such collections, in addition to the text content relevance of a document, the semantic distance between the entities of a document and the query must be taken into account. In this paper, we propose a novel query framework, which we refer to as keyword querying on graph-annotated documents, and query techniques to answer such queries. Our methods automatically balance the impact of the graph entities and the text content in the ranking. Our qualitative evaluation on a real dataset shows that our methods improve the ranking quality compared to baseline ranking systems.