2 research outputs found
CLOUD-BASED MACHINE LEARNING AND SENTIMENT ANALYSIS
The role of a Data Scientist is becoming increasingly ubiquitous as companies and institutions see the need to gain additional insights and information from data to make better decisions to improve the quality-of-service delivery to customers. This thesis document contains three aspects of data science projects aimed at improving tools and techniques used in analyzing and evaluating data. The first research study involved the use of a standard cybersecurity dataset and cloud-based auto-machine learning algorithms were applied to detect vulnerabilities in the network traffic data. The performance of the algorithms was measured and compared using standard evaluation metrics. The second research study involved the use of text-mining social media, specifically Reddit. We mined up to 100,000 comments in multiple subreddits and tested for hate speech via a custom designed version of the Python Vader sentiment analysis package. Our work integrated standard sentiment analysis with Hatebase.org and we demonstrate our new method can better detect hate speech in social media. Following sentiment analysis and hate speech detection, in the third research project, we applied statistical techniques in evaluating the significant difference in text analytics, specifically the sentiment-categories for both lexicon-based software and cloud-based tools. We compared the three big cloud providers, AWS, Azure, and GCP with the standard python Vader sentiment analysis library. We utilized statistical analysis to determine a significant difference between the cloud platforms utilized as well as Vader and demonstrated that each platform is unique in its analysis scoring mechanism
Privacy Intelligence: A Survey on Image Sharing on Online Social Networks
Image sharing on online social networks (OSNs) has become an indispensable
part of daily social activities, but it has also led to an increased risk of
privacy invasion. The recent image leaks from popular OSN services and the
abuse of personal photos using advanced algorithms (e.g. DeepFake) have
prompted the public to rethink individual privacy needs when sharing images on
OSNs. However, OSN image sharing itself is relatively complicated, and systems
currently in place to manage privacy in practice are labor-intensive yet fail
to provide personalized, accurate and flexible privacy protection. As a result,
an more intelligent environment for privacy-friendly OSN image sharing is in
demand. To fill the gap, we contribute a systematic survey of 'privacy
intelligence' solutions that target modern privacy issues related to OSN image
sharing. Specifically, we present a high-level analysis framework based on the
entire lifecycle of OSN image sharing to address the various privacy issues and
solutions facing this interdisciplinary field. The framework is divided into
three main stages: local management, online management and social experience.
At each stage, we identify typical sharing-related user behaviors, the privacy
issues generated by those behaviors, and review representative intelligent
solutions. The resulting analysis describes an intelligent privacy-enhancing
chain for closed-loop privacy management. We also discuss the challenges and
future directions existing at each stage, as well as in publicly available
datasets.Comment: 32 pages, 9 figures. Under revie