Social Media Operationalized for GIS: The Prequel
Social media has become a de facto global communication channel for disseminating news, entertainment, and personal self-revelations, and these self-revelations contain double-talk, peculiar insight, and contextual observations about real-world events. The primary objective is to propose a novel pipeline that classifies a tweet as either “useful” or “not useful” using widely accepted Natural Language Processing (NLP) techniques. A second objective is to measure the effect of this method by the change in performance of a Geographical Information System (GIS) artifact. A sample of 1,000 tweets is manually tagged and compared against an innovative social media grammar applied by a rule-based social media NLP pipeline. The evaluation addresses the question: prior to content analysis of a tweet, does a method exist to identify a tweet as “useful” for subsequent processing? Indeed, “useful” tweet identification via NLP achieved a precision of 0.9256, a recall of 0.6590, and an F-measure of 0.7699; consequently, GIS social media processing improved by 0.2194 over the baseline.
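The reported F-measure is consistent with the balanced F1 score, the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal check below assumes the abstract uses this balanced (β = 1) form and reproduces the figure from the reported precision and recall. The `f_measure` helper is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for "useful" tweet identification.
f1 = f_measure(0.9256, 0.6590)
print(round(f1, 4))  # 0.7699
```

The helper also accepts other β values. For example, β > 1 weights recall more heavily, which is relevant here because recall (0.6590) is the weaker of the two scores.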
GIS Investigation of Crime Prediction with an Operationalized Tweet Corpus
Social media, as the de facto communication channel, is used to disseminate people's daily self-revelations. These disclosures often contain double-talk, peculiar insights, or contextual information about real-world events. Natural language processing is regularly used to uncover both obvious and latent knowledge claims within disclosures published in this complex environment. For example, a perpetrator with first-hand knowledge of a criminal incident may use social media to post critical information about it. A geographic information system (GIS) is capable of large-scale point data analysis and provides methods for dataset processing, evaluation, and automatic spatial visualization. Such an artifact, fused with traditional environmental criminology theory and social media, establishes guidelines, tools, and models for the substantive construction and evaluation of GIS crime analysis solutions. Provided the social media stream is timely and correctly processed, corrective action can be taken. A natural language processing annotation pipeline for social media identifies latent indicators extracted from a social media corpus and is integral to predicting societal mishaps. Spatial visualizations and regression analyses were used to describe and evaluate the project artifacts. As a result, a social media corpus was operationalized and subsequently used as a proxy for a traditional environmental criminology risk layer in the construction of a social media GIS crime analysis artifact. Through this multi-domain collaboration, the artifact improved crime incident prediction, with an overall R-squared increase of 21.94%. This result is the state of the art; there are no other results to compare it to.
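The abstract reports its gain as an increase in R-squared but does not give the regression specification. The sketch below shows one common way such a gain is measured. It assumes ordinary least squares with an intercept and compares a baseline model against a model that adds a tweet-derived layer. All data are synthetic, and the variable names (`risk_layer`, `tweet_layer`, `crime_counts`) are hypothetical. The sketch prints the absolute difference in R²; the abstract does not say whether its 21.94% is absolute or relative.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Fit ordinary least squares with an intercept and return R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Hypothetical synthetic data: one baseline risk layer and one tweet-derived layer.
rng = np.random.default_rng(0)
n = 500
risk_layer = rng.normal(size=n)
tweet_layer = rng.normal(size=n)
crime_counts = 1.5 * risk_layer + 1.0 * tweet_layer + rng.normal(scale=1.0, size=n)

baseline_r2 = r_squared(risk_layer.reshape(-1, 1), crime_counts)
augmented_r2 = r_squared(np.column_stack([risk_layer, tweet_layer]), crime_counts)
print(f"baseline R^2 = {baseline_r2:.4f}, augmented R^2 = {augmented_r2:.4f}")
print(f"absolute R^2 increase = {augmented_r2 - baseline_r2:.4f}")
```

For nested OLS models, adding a predictor can never lower in-sample R², so a comparison like this is usually paired with adjusted R² or held-out evaluation.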
Veer: Verifying Equivalence of Workflow Versions in Iterative Data Analytics
Data analytics using GUI-based workflows is an iterative process in which an analyst makes many rounds of changes to refine a workflow, generating a different version at each iteration. In many cases, the result of executing a workflow version is equivalent to the result of a previously executed version. Identifying such equivalence between the execution results of different workflow versions is important for optimizing workflow performance by reusing results from a previous run. The size of the workflows and the complexity of their operators often prevent existing equivalence verifiers (EVs) from solving the problem. In this paper, we present "Veer," which leverages the fact that two workflow versions can be very similar except for a few changes. The solution divides the workflow version pair into small parts, called windows, and verifies the equivalence within each window by using an existing EV as a black box. We develop solutions to efficiently generate windows and verify the equivalence within each window. Our thorough experiments on real workflows show that Veer not only verifies the equivalence of workflows that existing EVs cannot support but also performs the verification efficiently.
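The abstract describes the windowing idea only at a high level. As a rough illustration, the sketch below simplifies heavily. It assumes linear (not DAG-shaped) workflows whose operators are plain string labels and whose two versions have the same length. It groups changed positions, plus some identical context around them, into non-overlapping windows, then passes each window pair to a black-box EV. If every window is proven equivalent, the whole pair is equivalent, because everything outside the windows is identical. If any check fails, the sketch reports "unknown" rather than "not equivalent". `make_windows`, `verify`, `canonical`, and `toy_ev` are hypothetical names, not Veer's actual algorithm or API.

```python
from typing import Callable, List, Sequence, Tuple

Operator = str  # hypothetical: each operator is represented by a string label
Window = Tuple[Sequence[Operator], Sequence[Operator]]
EquivalenceVerifier = Callable[[Sequence[Operator], Sequence[Operator]], bool]


def make_windows(v1: Sequence[Operator], v2: Sequence[Operator], context: int = 1) -> List[Window]:
    """Split two equal-length linear workflows into windows around changed positions."""
    if len(v1) != len(v2):
        raise ValueError("this sketch only handles versions with the same number of operators")
    changed = [i for i, (a, b) in enumerate(zip(v1, v2)) if a != b]
    spans: List[List[int]] = []
    for i in changed:
        lo, hi = max(0, i - context), min(len(v1), i + context + 1)
        if spans and lo <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], hi)  # overlapping or touching: merge windows
        else:
            spans.append([lo, hi])
    return [(v1[lo:hi], v2[lo:hi]) for lo, hi in spans]


def verify(v1: Sequence[Operator], v2: Sequence[Operator],
           ev: EquivalenceVerifier, context: int = 1) -> str:
    """Return 'equivalent' if the black-box EV proves every window equivalent, else 'unknown'."""
    for w1, w2 in make_windows(v1, v2, context):
        if not ev(w1, w2):
            return "unknown"  # the EV failing to prove equivalence does not prove a difference
    return "equivalent"


def canonical(window: Sequence[Operator]) -> List[Operator]:
    """Sort each run of consecutive filters, since adjacent filters commute."""
    out: List[Operator] = []
    run: List[Operator] = []
    for op in window:
        if op.startswith("filter:"):
            run.append(op)
        else:
            out.extend(sorted(run))
            run = []
            out.append(op)
    out.extend(sorted(run))
    return out


def toy_ev(w1: Sequence[Operator], w2: Sequence[Operator]) -> bool:
    """Hypothetical stand-in for an existing EV, based on filter commutativity only."""
    return canonical(w1) == canonical(w2)


v1 = ["scan", "filter:age>30", "filter:city=LA", "project", "aggregate"]
v2 = ["scan", "filter:city=LA", "filter:age>30", "project", "aggregate"]
v3 = ["scan", "filter:age>30", "filter:city=LA", "project", "limit:10"]
print(verify(v1, v2, toy_ev))  # equivalent
print(verify(v1, v3, toy_ev))  # unknown
```

Only the small window around the reordered filters is sent to the EV. The identical prefix and suffix never are, which is the property that lets this approach scale beyond what a single EV call could handle.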
Big Social Data and GIS: Visualize Predictive Crime
Social media is a desirable Big Data source for examining the relationship between crime and social behavior. Observation of this connection is enriched within a geographic information system (GIS) rooted in environmental criminology theory, which produces several different results that substantiate such a claim. This paper presents the construction and implementation of a GIS artifact that produces visualizations and statistical outcomes as evidence supporting predictive crime analysis. An information system research prototype guides the inquiry. It uses crime as the dependent variable and a social media tweet corpus, operationalized via natural language processing, as the independent variable. Recognizing social media as a predictive crime variable is prudent; researchers and practitioners will better appreciate its capability. Taken together, the visual and statistical results are novel, represent state-of-the-art predictive analysis, and increase the baseline R² value by 7.26%. They also support future predictive crime-based research when front-run with real-time social media.
Quantifying the Offline Interactions between Hosts and Guests of Airbnb
In this paper, the offline interactions between hosts and guests of Airbnb are investigated. While platform-supported communications between hosts and guests are easily tracked, new solutions are required to quantify offline interactions. These interactions were investigated through the development of an IT artifact that determines whether a review written by a guest mentions the host. Manual labeling of 1,024 randomly selected reviews indicated that 85% of reviews include a reference to a host. Two primary patterns in which hosts are mentioned were discovered, and a new method to detect whether a host is referenced in a review is proposed. The method automatically detects these patterns using word embeddings and named entity recognition. It achieved an accuracy of 91.5% and was applied to thousands of Airbnb reviews. Results demonstrated that over 80% of reviews include references to hosts.
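The abstract does not spell out the two mention patterns. Purely for illustration, the sketch below assumes a review references the host either by the host's first name or by a role word such as "host". It also swaps the word-embedding and named-entity-recognition components for simple case-insensitive token matching, so it is a baseline for comparison, not the proposed method. `ROLE_WORDS`, `mentions_host`, and `accuracy` are hypothetical, and the example reviews are invented.

```python
import re
from typing import Iterable

# Hypothetical role words; the actual patterns found in the study are not listed in the abstract.
ROLE_WORDS = {"host", "hosts", "hostess", "owner"}


def mentions_host(review: str, host_first_name: str) -> bool:
    """Return True if the review contains the host's first name or a host role word.

    Stand-in for the embedding/NER method: exact, case-insensitive token matching.
    """
    tokens = set(re.findall(r"\w+", review.lower()))
    return host_first_name.lower() in tokens or bool(tokens & ROLE_WORDS)


def accuracy(predictions: Iterable[bool], labels: Iterable[bool]) -> float:
    """Fraction of predictions that agree with the manual labels."""
    pairs = list(zip(predictions, labels))
    return sum(p == label for p, label in pairs) / len(pairs)


# Invented examples: (review text, host first name, manual label).
examples = [
    ("Maria was so welcoming and gave great tips.", "Maria", True),
    ("Great location, clean apartment.", "Maria", False),
    ("Our host left us a bottle of wine.", "Tom", True),
]
preds = [mentions_host(text, name) for text, name, _ in examples]
labels = [label for _, _, label in examples]
print(accuracy(preds, labels))  # 1.0
```

Exact matching misses misspelled names and indirect references such as "she" or "the lady who owns the flat". Closing that gap is the kind of problem that embeddings and NER are typically brought in to address.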
Efficient Approaches for Homing Complex Network Services
Network service providers (NSPs) offer a wide array of network services to their customers at a global scale. In recent years, NSPs have been migrating to virtualized, software-based infrastructures enabled by the network functions virtualization (NFV) paradigm. One critical aspect of operating NFV-based network services is homing. Homing (or placement) of virtual network functions (VNFs) on cloud and NSP infrastructures is a crucial step in the orchestration of network services, involving complex interactions with the cloud, SDN, and service controllers. Large NSPs process thousands of homing requests submitted by their customers every day. At such a large scale, it is imperative that network services be homed efficiently. In this dissertation, guided by extensive discussions and collaboration with a Tier-1 NSP, we identify limitations and challenges across multiple layers of the homing stack. Starting at the bottom of the stack, we show why it is extremely challenging to provision VNFs in a truly elastic manner, which hinders the ability to manage them efficiently. At the upper layers, we identify limitations in the current approaches that service and cloud controllers use to aggregate data from end nodes, and we analyze why these approaches are inefficient when deployed at large scale. Finally, we identify several dependency problems that result from deploying distributed instances of the homing service. Accordingly, we design systems that efficiently address these challenges across the homing stack. Specifically, we design a novel stateless architecture for VNFs that provides true elasticity when deployed at NSP cloud sites, allowing VNFs to scale and fail over seamlessly. In addition, we propose and design a peer-to-peer search service that offers efficient, real-time data retrieval at global scale.
We also design and evaluate a novel homing service that provides high-quality homing solutions while significantly reducing the load on the service and cloud controllers. We extensively evaluate each of these solutions and demonstrate their efficiency in addressing the different challenges across the homing stack.
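The abstract does not describe how the stateless VNF architecture is built. The sketch below illustrates only the general idea the phrase suggests, under one assumption: per-flow state is externalized to a shared store instead of being kept inside each VNF instance. Because instances hold no local state, a second instance can continue a flow after the first fails, with no state migration. `StateStore` and `StatelessCounterVNF` are hypothetical names, and a Python dict stands in for a real distributed store.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StateStore:
    """Stand-in for an external, shared key-value store (hypothetical)."""
    data: Dict[str, int] = field(default_factory=dict)

    def increment(self, key: str) -> int:
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


@dataclass
class StatelessCounterVNF:
    """A toy VNF that counts packets per flow but keeps no per-flow state itself."""
    name: str
    store: StateStore

    def process(self, flow_id: str) -> int:
        # All per-flow state lives in the shared store, so any instance
        # can handle any packet of any flow.
        return self.store.increment(f"pkt_count:{flow_id}")


store = StateStore()
a = StatelessCounterVNF("instance-a", store)
b = StatelessCounterVNF("instance-b", store)

a.process("flow-1")
a.process("flow-1")
# Suppose instance-a fails; instance-b takes over the flow without any state transfer.
print(b.process("flow-1"))  # 3
```

The trade-off in this style of design is that every packet or flow update touches the external store. Its latency and consistency then become the main performance concerns, rather than the VNF instances themselves.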
GIS, Big Data, and a Tweet Corpus Operationalized via Natural Language Processing
Whereas ad hoc, single-domain Big Data inquiry has been successful, the observation of multi-domain GIS artifacts still needs consideration. A GIS solution for multi-domain data analysis must provide visualization and overt statistical analysis tools (e.g., regression over its constituent data streams) to enable large-scale dataset processing and evaluation. These guidelines direct the inquiry and the creation of a robust GIS artifact that combines a social media tweet corpus with a domain-specific crime dataset. The tweet corpus is operationalized via natural language processing treatments and used in constructing and evaluating the GIS artifact. Although the results are not statistically significant and visualizing crime data is not novel, learning how to combine the two in predictive ways via GIS is. As such, extensions and possible future work support social media natural language processing techniques and Big Data processing for predicting crime-based incident interactions, front-run by real-time social media analysis.