19 research outputs found

    Parallel Methods for Evidence and Trust based Selection and Recommendation of Software Apps from Online Marketplaces

    Get PDF
    With the popularity of various online software marketplaces, third-party vendors are creating many instances of software applications ('apps') for mobile and desktop devices targeting the same set of requirements. This abundance makes the task of selecting and recommending (S&R) apps, with a high degree of assurance, for a specific scenario a significant challenge. The S&R process is a precursor for composing any trusted system made out of such individually selected apps. In addition to feature-based information, about these apps, these marketplaces contain large volumes of user reviews. These reviews contain unstructured user sentiments about app features and the onus of using these reviews in the S&R process is put on the user. This approach is ad-hoc, laborious and typically leads to a superficial incorporation of the reviews in the S&R process by the users. However, due to the large volumes of such reviews and associated computing, these two techniques are not able to provide expected results in real-time or near real-time. Therefore, in this paper, we present two parallel versions (i.e., batch processing and stream processing) of these algorithms and empirically validate their performance using publically available datasets from the Amazon and Android marketplaces. The results of our study show that these parallel versions achieve near real-time performance, when measured as the end-to-end response time, while selecting and recommending apps for specific queries

    ART: A Large Scale Microblogging Data Management System

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Understanding and Optimizing Flash-based Key-value Systems in Data Centers

    Get PDF
    Flash-based key-value systems are widely deployed in today’s data centers for providing high-speed data processing services. These systems deploy flash-friendly data structures, such as slab and Log Structured Merge(LSM) tree, on flash-based Solid State Drives(SSDs) and provide efficient solutions in caching and storage scenarios. With the rapid evolution of data centers, there appear plenty of challenges and opportunities for future optimizations. In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, considering the unique characteristics of key-value workloads, to virtually enlarge the cache space, increase the hit ratio, and improve the cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters with additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, the 99th percentile tail latency, convergence time, real-time system throughput, and the iteration process, etc. Last but not least, we conduct an in-depth, comprehensive measurement work on flash-optimized key-value stores with recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in the current key-value store design and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also contributes to providing system implications for auto-tuning the key-value system on flash-based SSDs and optimizing it on revolutionary 3D XPoint based SSDs

    Mining Social Media to Understand Consumers' Health Concerns and the Public's Opinion on Controversial Health Topics.

    Full text link
    Social media websites are increasingly used by the general public as a venue to express health concerns and discuss controversial medical and public health issues. This information could be utilized for the purposes of public health surveillance as well as solicitation of public opinions. In this thesis, I developed methods to extract health-related information from multiple sources of social media data, and conducted studies to generate insights from the extracted information using text-mining techniques. To understand the availability and characteristics of health-related information in social media, I first identified the users who seek health information online and participate in online health community, and analyzed their motivations and behavior by two case studies of user-created groups on MedHelp and a diabetes online community on Twitter. Through a review of tweets mentioning eye-related medical concepts identified by MetaMap, I diagnosed the common reasons of tweets mislabeled by natural language processing tools tuned for biomedical texts, and trained a classifier to exclude non medically-relevant tweets to increase the precision of the extracted data. Furthermore, I conducted two studies to evaluate the effectiveness of understanding public opinions on controversial medical and public health issues from social media information using text-mining techniques. The first study applied topic modeling and text summarization to automatically distill users' key concerns about the purported link between autism and vaccines. The outputs of two methods cover most of the public concerns of MMR vaccines reported in previous survey studies. In the second study, I estimated the public's view on the ac{ACA} by applying sentiment analysis to four years of Twitter data, and demonstrated that the the rates of positive/negative responses measured by tweet sentiment are in general agreement with the results of Kaiser Family Foundation Poll. Finally, I designed and implemented a system which can automatically collect and analyze online news comments to help researchers, public health workers, and policy makers to better monitor and understand the public's opinion on issues such as controversial health-related topics.PhDInformationUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120714/1/owenliu_1.pd

    Privacy engineering for social networks

    Get PDF
    In this dissertation, I enumerate several privacy problems in online social networks (OSNs) and describe a system called Footlights that addresses them. Footlights is a platform for distributed social applications that allows users to control the sharing of private information. It is designed to compete with the performance of today's centralised OSNs, but it does not trust centralised infrastructure to enforce security properties. Based on several socio-technical scenarios, I extract concrete technical problems to be solved and show how the existing research literature does not solve them. Addressing these problems fully would fundamentally change users' interactions with OSNs, providing real control over online sharing. I also demonstrate that today's OSNs do not provide this control: both user data and the social graph are vulnerable to practical privacy attacks. Footlights' storage substrate provides private, scalable, sharable storage using untrusted servers. Under realistic assumptions, the direct cost of operating this storage system is less than one US dollar per user-year. It is the foundation for a practical shared filesystem, a perfectly unobservable communications channel and a distributed application platform. The Footlights application platform allows third-party developers to write social applications without direct access to users' private data. Applications run in a confined environment with a private-by-default security model: applications can only access user information with explicit user consent. I demonstrate that practical applications can be written on this platform. The security of Footlights user data is based on public-key cryptography, but users are able to log in to the system without carrying a private key on a hardware token. Instead, users authenticate to a set of authentication agents using a weak secret such as a user-chosen password or randomly-assigned 4-digit number. The protocol is designed to be secure even in the face of malicious authentication agents.This work was supported by the Rothermere Foundation and the Natural Sciences and Engineering Research Council of Canada (NSERC)

    Graph database management systems: storage, management and query processing

    Get PDF
    The proliferation of graph data, generated from diverse sources, have given rise to many research efforts concerning graph analysis. Interactions in social networks, publication networks, protein networks, software code dependencies and transportation systems are all examples of graph-structured data originating from a variety of application domains and demonstrating different characteristics. In recent years, graph database management systems (GDBMS) have been introduced for the management and analysis of graph data. Motivated by the growing number of real-life applications making use of graph database systems, this thesis focuses on the effectiveness and efficiency aspects of such systems. Specifically, we study the following topics relevant to graph database systems: (i) modeling large-scale applications in GDBMS; (ii) storage and indexing issues in GDBMS, and (iii) efficient query processing in GDBMS. In this thesis, we adopt two different application scenarios to examine how graph database systems can model complex features and perform relevant queries on each of them. Motivated by the popular application of social network analytics, we selected Twitter, a microblogging platform, to conduct our detailed analysis. Addressing limitations of existing models, we pro- pose a data model for the Twittersphere that proactively captures Twitter-specific interactions. We examine the feasibility of running analytical queries on GDBMS and offer empirical analysis of the performance of the proposed approach. Next, we consider a use case of modeling software code dependencies in a graph database system, and investigate how these systems can support capturing the evolution of a codebase overtime. We study a code comprehension tool that extracts software dependencies and stores them in a graph database. On a versioned graph built using a very large codebase, we demonstrate how existing code comprehension queries can be efficiently processed and also show the benefit of running queries across multiple versions. Another important aspect of this thesis is the study of storage aspects of graph systems. Throughput of many graph queries can be significantly affected by disk I/O performance; therefore graph database systems need to focus on effective graph storage for optimising disk operations. We observe that the locality of edges plays an important role and we address the edge-labeling problem which aims to label both incoming and outgoing edges of a graph maximizing the ‘edge-consecutiveness’ metric. By achieving a better layout and locality of edges on disk, we show that our proposed algorithms result in significantly improved disk I/O performance leading to faster execution of neighbourhood queries. Some applications require the integrated processing of queries from graph and the textual domains within a graph database system. Aggregation of these dimensions facilitates gaining key insights in several application scenarios. For example, in a social network setting, one may want to find the closest k users in the network (graph traversal) who talk about a particular topic A (textual search). Motivated by such practical use cases, in this thesis we study the top-k social-textual ranking query that essentially requires efficient combination of a keyword search query with a graph traversal. We propose algorithms that leverage graph partitioning techniques, based on the premise that socially close users will be placed within the same partition, allowing more localised computations. We show that our proposed approaches are able to achieve significantly better results compared to standard baselines and demonstrating robust behaviour under changing parameters

    Deep learning approach to sentiment analysis in health and well-being

    Get PDF
    Sentiment analysis, also known as opinion mining, is an area of natural language processing which focuses on the classification of the sentiment that is expressed in a written document. Sentiment analysis has found applications in various domains including finance, politics, and health. This thesis is focused on sentiment analysis in the domain of health and well-being. An extensive systematic literature review was carried out to establish the state of the art in sentiment analysis in this domain. This systematic review provides evidence that the state-of-the-art results in sentiment analysis in the domain of health and well-being lags behind that in other domains. Additionally, it revealed that deep learning has not been used to classify the sentiment within the aforementioned domain. Furthermore, we performed a study and showed that the language that is used within the health and well-being domain is biased towards the negative sentiment. Aspect-based sentiment analysis refines the focus of sentiment analysis by classifying the sentiment associated with a specific aspect. Subsequently, we focus specifically on aspect-based sentiment analysis. To support it within the domain of health and well-being we created a dataset consisting of drug reviews, where the aspects were automatically annotated by matching concepts from the Unified Medical Language System. We have successfully shown that graph convolution can effectively utilise the context, represented with syntactic dependencies, to determine the intended sentiment of inherently negative aspects and consequently close the performance gap regardless of the domain. The advent of transformer-based architectures initiated a breakthrough in various tasks in natural language processing, including sentiment analysis. There-fore, we presented an approach to fine-tuning a transformer-based language model for the specific task of aspect-based sentiment analysis. The findings show the evidence that transformer-based models account for syntactic dependencies when classifying the sentiment of the given aspect
    corecore