23 research outputs found

    Social Media Operationalized for GIS: The Prequel

    Social media is a de facto global communication channel used to disseminate news, entertainment, and personal self-revelations; the latter contain double-talk, peculiar insight, and contextual observation about real-world events. The primary objective is to propose a novel pipeline that classifies a tweet as either “useful” or “not useful” using widely accepted Natural Language Processing (NLP) techniques, and to measure the effect of this method by the change in performance of a Geographical Information System (GIS) artifact. A 1,000-tweet sample is manually tagged and compared with the output of a rule-based social media NLP pipeline that applies an innovative social media grammar. The evaluation addresses one question: prior to content analysis of a tweet, does a method exist to identify the tweet as “useful” for subsequent processing? Indeed, “useful” tweet identification via NLP returned a precision of 0.9256, a recall of 0.6590, and an F-measure of 0.7699; consequently, GIS social media processing increased by 0.2194 over baseline.
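The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0  # avoid dividing by zero when both inputs are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract above.
reported_precision = 0.9256
reported_recall = 0.6590

# Assumption: the abstract's "F-measure" is F1 (beta = 1).
f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 4))  # prints 0.7699
```

Under that assumption the three reported figures are mutually consistent.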

    Veer: Verifying Equivalence of Workflow Versions in Iterative Data Analytics

    Data analytics using GUI-based workflows is an iterative process in which an analyst repeatedly changes the workflow to refine it, generating a different version at each iteration. In many cases, the result of executing a workflow version is equivalent to the result of a previously executed version. Identifying such equivalence between the execution results of different workflow versions is important for optimizing the performance of a workflow by reusing results from a previous run. The size of the workflows and the complexity of their operators often make existing equivalence verifiers (EVs) unable to solve the problem. In this paper, we present "Veer," which leverages the fact that two workflow versions can be very similar except for a few changes. The solution divides the workflow version pair into small parts, called windows, and verifies the equivalence within each window by using an existing EV as a black box. We develop solutions to efficiently generate windows and verify the equivalence within each window. Our thorough experiments on real workflows show that Veer can not only verify the equivalence of workflows that existing EVs cannot support but also do so efficiently.
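The window idea described above can be illustrated with a toy sketch. This is not Veer's algorithm: the abstract does not specify how windows are generated, so everything here is an assumption made for illustration. Workflows are modeled as linear lists of operator labels, the two versions are assumed to be aligned position by position, a window is a maximal run of differing positions, and `toy_ev` is a hypothetical stand-in for an existing equivalence verifier used as a black box.

```python
from typing import Callable, List, Sequence, Tuple

Operator = str  # hypothetical: an operator is just a label such as "filter:a>1"
EV = Callable[[Sequence[Operator], Sequence[Operator]], bool]


def changed_windows(old: Sequence[Operator],
                    new: Sequence[Operator]) -> List[Tuple[int, int]]:
    """Return half-open index ranges covering maximal runs of differing operators."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes position-aligned versions of equal length")
    windows: List[Tuple[int, int]] = []
    start = None
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b and start is None:
            start = i                    # a run of changes begins
        elif a == b and start is not None:
            windows.append((start, i))   # the run of changes ends
            start = None
    if start is not None:
        windows.append((start, len(old)))
    return windows


def verify_by_windows(old: Sequence[Operator],
                      new: Sequence[Operator],
                      ev: EV) -> bool:
    """True if the black-box EV accepts every changed window; False means 'not verified'."""
    return all(ev(old[s:e], new[s:e]) for s, e in changed_windows(old, new))


def toy_ev(a: Sequence[Operator], b: Sequence[Operator]) -> bool:
    """Stand-in EV: accepts windows made only of filters, in any order."""
    only_filters = all(op.startswith("filter:") for op in list(a) + list(b))
    return only_filters and sorted(a) == sorted(b)


v1 = ["scan", "filter:a>1", "filter:b<5", "project:a", "sort:a"]
v2 = ["scan", "filter:b<5", "filter:a>1", "project:a", "sort:a"]  # filters swapped
v3 = ["scan", "filter:a>1", "filter:b<5", "project:b", "sort:a"]  # projection changed

print(changed_windows(v1, v2))            # [(1, 3)]
print(verify_by_windows(v1, v2, toy_ev))  # True
print(verify_by_windows(v1, v3, toy_ev))  # False
```

In this sketch a `False` result only means that equivalence was not verified with these windows, not that the two versions are proven inequivalent.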

    Big Social Data and GIS: Visualize Predictive Crime

    Social media is a desirable Big Data source used to examine the relationship between crime and social behavior. Observation of this connection is enriched within a geographic information system (GIS) rooted in environmental criminology theory, and produces several different results to substantiate such a claim. This paper presents the construction and implementation of a GIS artifact that produces visualization and statistical outcomes as evidence supporting predictive crime analysis. An information system research prototype guides inquiry, using crime as the dependent variable and a social media tweet corpus, operationalized via natural language processing, as the independent variable. This inescapable realization of social media as a predictive crime variable is prudent; researchers and practitioners will better appreciate its capability. The inclusive visual and statistical results are novel, represent state-of-the-art predictive analysis, increase the baseline R² value by 7.26%, and support future predictive crime-based research when front-run with real-time social media.

    Quantifying the Offline Interactions between Hosts and Guests of Airbnb

    In this paper, the offline interactions between hosts and guests of Airbnb are investigated. While the platform-supported communications between hosts and guests are easily tracked, new solutions are required to quantify the offline interactions. These interactions were investigated through the development of an IT artifact that determines whether a review written by a guest mentions a host. Manual labeling of 1,024 randomly selected reviews indicated that 85% of reviews include a reference to a host. Two primary patterns in which hosts are mentioned were discovered. A new method to detect whether a host is referenced in a review is proposed. The method is based on automatically detecting these patterns using Word Embeddings and Named Entity Recognition. The method achieved an accuracy score of 91.5% and was applied to thousands of reviews from Airbnb. Results demonstrated that over 80% of reviews include references to hosts.

    Efficient Approaches for Homing Complex Network Services

    Network service providers (NSPs) offer a wide array of network services to their customers at a global scale. In recent years, NSPs have been migrating their infrastructures to virtualized, software-based ones enabled by the network functions virtualization (NFV) paradigm. One critical aspect of operating NFV-based network services is homing. Homing (or placement) of virtual network functions (VNFs) on cloud and NSP infrastructures is a crucial step in the orchestration of network services, involving complex interactions with the cloud, SDN, and service controllers. Large NSPs process thousands of homing requests submitted by their customers on a daily basis. At such a large scale, it is imperative that homing of network services is performed efficiently. In this dissertation, guided by our extensive discussions and collaboration with a Tier-1 NSP, we identify limitations and challenges across multiple layers of the homing stack. Starting at the bottom of the stack, we look at how it is extremely challenging to provision VNFs in a truly elastic manner, which hinders the ability to manage them efficiently. At upper layers, we identify limitations in the current approaches that service and cloud controllers use to aggregate data from end nodes, and we analyze why such approaches are not efficient when deployed at large scale. Finally, we identify several dependency problems that result from deploying distributed instances of the homing service. Accordingly, we design systems that efficiently address these challenges across the homing stack. Specifically, we design a novel stateless architecture for VNFs that provides true elasticity when deployed at NSP cloud sites, allowing VNFs to scale and fail over seamlessly. In addition, we propose and design a peer-to-peer search service that offers efficient real-time data retrieval at global scale. We also design and evaluate a novel homing service that provides quality homing solutions while significantly reducing load on the service and cloud controllers. We extensively evaluate each of these solutions and demonstrate their efficiency in addressing the different challenges across the homing stack.

    GIS, Big Data, and a Tweet Corpus Operationalized via Natural Language Processing

    Whereas ad hoc single-domain Big Data inquiry is successful, observation of a multi-domain GIS artifact needs consideration. A GIS solution for multi-domain data analysis must provide visualization and overt statistical analysis tools, e.g., regression capabilities for constituent data streams, in order to enable large-scale dataset processing and evaluation. Such guidelines direct inquiry and creation of a robust GIS artifact considering a social media tweet corpus and a domain-specific crime dataset. The tweet corpus is operationalized via natural language processing treatments and used in GIS artifact construction and evaluation. Although results are not statistically significant and visualizing crime data is not novel, learning how to combine the two in predictive ways via GIS is. As such, extensions and possible future work support social media natural language processing techniques and Big Data processing for predictive crime-based incident interactions as front-run by real-time social media analysis.