93 research outputs found
Harnessing the power of the general public for crowdsourced business intelligence: a survey
Crowdsourced business intelligence (CrowdBI), which leverages crowdsourced user-generated data to extract useful knowledge about business and create marketing intelligence to excel in the business environment, has become a surging research topic in recent years. Compared with traditional business intelligence, which is based on firm-owned data and survey data, CrowdBI faces numerous unique issues, such as customer behavior analysis, brand tracking and product improvement, demand forecasting and trend analysis, competitive intelligence, business popularity analysis and site recommendation, and urban commercial analysis. This paper first characterizes the concept model and unique features of CrowdBI and presents a generic framework for it. It also investigates novel application areas as well as the key challenges and techniques of CrowdBI. Furthermore, we discuss future research directions for CrowdBI.
Multivariate Spatiotemporal Hawkes Processes and Network Reconstruction
There is often latent network structure in spatial and temporal data and the
tools of network analysis can yield fascinating insights into such data. In
this paper, we develop a nonparametric method for network reconstruction from
spatiotemporal data sets using multivariate Hawkes processes. In contrast to
prior work on network reconstruction with point-process models, which has often
focused on exclusively temporal information, our approach uses both temporal
and spatial information and does not assume a specific parametric form of
network dynamics. This leads to an effective way of recovering an underlying
network. We illustrate our approach using both synthetic networks and networks
constructed from real-world data sets (a location-based social media network, a
narrative of crime events, and violent gang crimes). Our results demonstrate
that, in comparison to using only temporal data, our spatiotemporal approach
yields improved network reconstruction, providing a basis for meaningful
subsequent analysis --- such as community structure and motif analysis --- of
the reconstructed networks.
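As a rough illustration of the kind of model this abstract refers to, the sketch below evaluates the conditional intensity of a purely temporal multivariate Hawkes process. It is not the paper's method: the paper is nonparametric and also uses spatial information, whereas this sketch assumes an exponential triggering kernel and ignores space. The names `hawkes_intensity`, `mu`, `K`, and `omega` are hypothetical and chosen only for this example.

```python
import math

def hawkes_intensity(t, node, events, mu, K, omega=1.0):
    """Conditional intensity of `node` at time `t` for a temporal multivariate Hawkes process.

    events: list of (time, node) pairs observed so far.
    mu[j]:  background rate of node j.
    K[i][j]: expected number of events at node j triggered by one event at node i
             (the weighted network one would try to reconstruct).
    omega:  decay rate of the assumed exponential kernel (an illustrative choice).
    """
    rate = mu[node]
    for t_i, u_i in events:
        if t_i < t:  # only past events can trigger future ones
            rate += K[u_i][node] * omega * math.exp(-omega * (t - t_i))
    return rate

# Toy two-node example: events at node 0 excite node 1, but not the reverse.
mu = [0.1, 0.2]
K = [[0.0, 0.5],
     [0.0, 0.0]]
events = [(0.0, 0)]

rate_node0 = hawkes_intensity(1.0, 0, events, mu, K)
rate_node1 = hawkes_intensity(1.0, 1, events, mu, K)
```

In this toy setup the intensity of node 1 rises above its background rate after the event at node 0, while node 0 stays at its background rate. Network reconstruction amounts to estimating the entries of `K` from observed events.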
Relating Group Size and Posting Activity of an Online Community of Financial Investors: Regularities and Seasonal Patterns
Group size can potentially affect collective activity and individual propensity to contribute to collective goods. Mancur Olson, in his Logic of Collective Action, argued that individual contribution to a collective good tends to be lower in large groups. Today, online communication platforms offer an interesting ground to study such collaborative dynamics under possibly different conditions (e.g., lower costs of gathering and sharing information). This paper examines the relationship between group size and activity in an online financial forum, where users invest time in sharing news, analysis and comments with other investors. We looked at about 24 million messages shared over more than ten years in the finanzaonline.com online forum. We found that the relationship between the number of active users and the number of posts shared by those users follows a power law (with exponent α > 1) and is subject to periodic fluctuations, mostly driven by hour-of-the-day and day-of-the-week effects. The daily patterns of the exponent showed a divergence between working-week and weekend days. In general, the exponent was lower before noon, when investors are typically interested in market news, and higher in the late afternoon, when markets are closing and investors need a better understanding of the situation. Further research is needed, especially at the micro level, to dissect the mechanisms behind these regularities.
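The power-law relationship described in this abstract can be illustrated with a minimal sketch that estimates the exponent by ordinary least squares on log-transformed values. This is one common estimation approach and is assumed here for illustration; the abstract does not state which procedure the authors used. The function name `fit_power_law_exponent` and the data are hypothetical, and the numbers are synthetic, not forum data.

```python
import math

def fit_power_law_exponent(active_users, posts):
    """Estimate (alpha, c) in posts ≈ c * active_users**alpha via least squares in log-log space."""
    xs = [math.log(u) for u in active_users]
    ys = [math.log(p) for p in posts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # slope of the log-log line
    c = math.exp(mean_y - alpha * mean_x)  # intercept mapped back to the original scale
    return alpha, c

# Synthetic data generated from posts = 2 * users**1.2 (illustration only).
users = [10, 20, 40, 80, 160]
posts = [2.0 * u ** 1.2 for u in users]
alpha, c = fit_power_law_exponent(users, posts)
```

An exponent above 1 means that posting activity grows faster than the number of active users. Repeating the fit separately per hour of day or per day of week would be one way to look for the periodic fluctuations the abstract mentions.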
A Tensor-based eLSTM Model to Predict Stock Price Using Financial News
Stock market prediction has attracted much attention from both academia and business. Both traditional finance and behavioral finance hold that market information affects stock movements. Typically, market information consists of fundamentals and news information. To study how information shapes stock markets, a common strategy is to concatenate the various kinds of information into one compound vector. However, such concatenation ignores the interlinks between fundamentals and news information. In addition, fundamental data are continuous values sampled at fixed time intervals, while news information occurs at random times. Such heterogeneity causes valuable information to be partially lost or distorts the feature spaces. In this article, we propose a tensor-based event-LSTM (eLSTM) to solve these two challenges. In particular, we model the market information space with tensors instead of concatenated vectors and balance the heterogeneity of different data types with an event-driven mechanism in the LSTM. Experiments performed on an entire year of data from China's securities markets demonstrate the superiority of the proposed approach over state-of-the-art algorithms including AZfinText, eMAQT, and TeSIA.
Social media and GIScience: Collection, analysis, and visualization of user-generated spatial data
Over the last decade, social media platforms have eclipsed the height of popular culture and communication technology, which, in combination with widespread access to GIS-enabled hardware (i.e. mobile phones), has resulted in the continuous creation of massive amounts of user-generated spatial data. This thesis explores how social media data have been utilized in GIS research and provides a commentary on the impacts of this next iteration of technological change with respect to GIScience. First, the roots of GIS technology are traced to set the stage for the examination of social media as a technological catalyst for change in GIScience. Next, a scoping review is conducted to gather and synthesize a summary of methods used to collect, analyze, and visualize these data. Finally, a case study exploring the spatio-temporality of crowdfunding behaviours in Canada during the COVID-19 pandemic is presented to demonstrate the utility of social media data in spatial research.
Hybrid intelligence for data mining
Today, enormous amounts of data are being recorded in all kinds of activities. The sheer size of these data provides an excellent opportunity for data scientists to retrieve valuable information using data mining techniques. Due to the complexity of data in many neoteric problems, one-size-fits-all solutions are seldom able to provide satisfactory answers. Although data mining has been an active research area, hybrid techniques are rarely scrutinized in detail. Currently, few techniques can handle time-varying properties while performing their core functions, nor can they retrieve and combine information from heterogeneous dimensions, e.g., textual and numerical ones. This thesis summarizes our investigations on hybrid methods to provide data mining solutions to problems involving non-trivial datasets, such as trajectories, microblogs, and financial data. First, time-varying dynamic Bayesian networks are extended to consider both causal and dynamic regularization requirements. Combined with density-based clustering, the enhancements overcome the difficulties in modeling spatial-temporal data where heterogeneous patterns, data sparseness and distribution skewness are common. Secondly, topic-based methods are proposed for emerging outbreak and virality predictions on microblogs. Complicated models that consider structural details are popular, while others adopt overly simplified assumptions, sacrificing accuracy for efficiency. Our proposed virality prediction solution delivers the benefits of both worlds: it considers the important characteristics of a structure without the burden of fine details, which reduces complexity. Thirdly, the proposed topic-based approach for microblog mining is extended to sentiment prediction problems in finance. Sentiment-of-topic models are learned from both commentaries and prices for better risk management. Moreover, a previously proposed supervised topic model provides an avenue to associate market volatility with financial news, yet it displays poor resolution in extreme regions. To overcome this problem, an extreme topic model is proposed to predict volatility in financial markets by using supervised learning. By mapping extreme events into Poisson point processes, volatile regions are magnified to reveal their hidden volatility-topic relationships. Lastly, some of the proposed hybrid methods are applied to service computing to verify that they are sufficiently generic for wider applications.
Large-scale and Deep Spatiotemporal Point-Process Models
Many accurate spatiotemporal data sets have recently become available for research. Real-world applications create strong demand for better multivariate point-process modeling. In this thesis, we develop new multivariate models with generalization ability and scalability. The first two chapters provide the research background, real-world problems and a mathematical introduction to point-process models. In chapter 3, we develop a nonparametric method for multivariate spatiotemporal Hawkes processes with applications to network reconstruction. In contrast to prior work, which has often focused on exclusively temporal information, our approach uses spatiotemporal information and does not assume a specific parametric form. Our results demonstrate that, in comparison to using only temporal data, our approach yields improved network reconstruction, providing a basis for meaningful subsequent analysis, such as examinations of community structure and motifs, of the reconstructed networks. In chapter 4, we present a fast and accurate estimation method for multivariate Hawkes processes. Our method, with guaranteed consistency, combines two estimation approaches. Extensive numerical experiments, with synthetic data and real-world social network data, show that our method improves the accuracy, scalability and computational efficiency of prevailing estimation approaches. Moreover, it greatly boosts the performance of Hawkes process-based models on social network reconstruction and helps to understand the spatiotemporal triggering dynamics over social media. In chapter 5, we focus on multivariate spatial point processes, which can describe heterotopic data over space. However, highly multivariate intensities are computationally challenging due to the curse of dimensionality. To bridge this gap, we introduce a declustering-based hidden-variable model that leads to efficient inference via a variational autoencoder (VAE). We also prove that this model is a generalization of the VAE-based model for collaborative filtering. This leads to an interesting application of spatial point-process models to recommender systems. Experimental results show the method's utility on both synthetic data and real-world data. Finally, in chapter 6, we show how multivariate point processes can be applied to opioid overdose events and real-time prediction of the hourly crime rate. In chapter 7, we discuss future directions and conclude the thesis.