Tracking time evolving data streams for short-term traffic forecasting
Data streams have emerged in recent years as a relevant topic and as an efficient means of extracting knowledge from big data. In the robust layered ensemble model (RLEM) proposed in this paper for short-term traffic flow forecasting, incoming traffic flow data of all connected road links are organized in chunks corresponding to an optimal time lag. The RLEM model is composed of two layers. In the first layer, we cluster the chunks using the Graded Possibilistic c-Means method. The second layer is made up of an ensemble of forecasters, each trained for short-term traffic flow forecasting on the chunks belonging to a specific cluster. In the operational phase, when a new chunk of traffic flow data is presented as input to the RLEM, its memberships to all clusters are evaluated and, if it is not recognized as an outlier, the outputs of all forecasters are combined in an ensemble, yielding a forecast of traffic flow for a short-term time horizon. The proposed RLEM model is evaluated on a synthetic data set, on a traffic flow data simulator and on two real-world traffic flow data sets. The model gives accurate forecasts of traffic flow rates with outlier detection and adapts well to non-stationary traffic regimes. Given its outlier detection, accuracy, and robustness, RLEM can be fruitfully integrated in traffic flow management systems.
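The chunking and the operational-phase combination described above can be sketched in Python. This is a minimal sketch, not the authors' RLEM code: it assumes the cluster memberships of a chunk have already been computed (for example by a graded possibilistic clustering step), and the membership-weighted average, the names `make_chunks` and `combine_forecasts`, and the `outlier_threshold` parameter are hypothetical choices for illustration.

```python
import numpy as np

def make_chunks(flows, lag, horizon=1):
    """Slide a window of `lag` steps over a (T, n_links) flow matrix.

    Returns inputs X with shape (N, lag * n_links) and targets Y with shape
    (N, horizon * n_links), where N = T - lag - horizon + 1.
    """
    flows = np.asarray(flows, dtype=float)
    T = flows.shape[0]
    X, Y = [], []
    for t in range(lag, T - horizon + 1):
        X.append(flows[t - lag:t].ravel())       # the last `lag` steps of every link
        Y.append(flows[t:t + horizon].ravel())   # the next `horizon` steps to predict
    return np.array(X), np.array(Y)

def combine_forecasts(memberships, forecasts, outlier_threshold=0.1):
    """Membership-weighted ensemble for one chunk (hypothetical combination rule).

    memberships: (c,) possibilistic memberships of the chunk to the c clusters.
    forecasts:   (c, h) output of the forecaster trained on each cluster.
    Returns the combined (h,) forecast, or None if the chunk is treated as an outlier.
    """
    u = np.asarray(memberships, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    if u.max() < outlier_threshold:   # no cluster claims the chunk strongly enough
        return None
    weights = u / u.sum()             # normalise memberships into ensemble weights
    return weights @ f
```

With memberships [0.8, 0.2] and per-cluster forecasts of 100 and 200, `combine_forecasts` returns a forecast of 120; a chunk whose largest membership falls below the threshold comes back as None and is handled as an outlier.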
Unsupervised tracking of time-evolving data streams and an application to short-term urban traffic flow forecasting
I am indebted to many people for the help and support I received during my Ph.D. study and research at DIBRIS, University of Genoa. First and foremost, I would like to express my sincere thanks to my supervisors, Prof. Dr. Masulli and Prof. Dr. Rovetta, for their invaluable guidance, frequent meetings and discussions, and their encouragement and support throughout my research. I thank all the members of DIBRIS for their support and kindness during my four-year Ph.D. I would also like to acknowledge the contribution of the projects Piattaforma per la mobilità Urbana con Gestione delle INformazioni da sorgenti eterogenee (PLUG-IN) and COST Action IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet). Last and most importantly, I wish to thank my family: my wife Shaimaa, who stays with me through the joys and pains; my daughter and son, who give me happiness every day; and my parents, for their constant love and encouragement.
Scalable Bayesian modeling, monitoring and analysis of dynamic network flow data
Traffic flow count data in networks arise in many applications, such as
automobile or aviation transportation, certain directed social network
contexts, and Internet studies. Using an example of Internet browser traffic
flow through site-segments of an international news website, we present
Bayesian analyses of two linked classes of models which, in tandem, allow fast,
scalable and interpretable Bayesian inference. We first develop flexible
state-space models for streaming count data, able to adaptively characterize
and quantify network dynamics efficiently in real-time. We then use these
models as emulators of more structured, time-varying gravity models that allow
formal dissection of network dynamics. This yields interpretable inferences on
traffic flow characteristics, and on dynamics in interactions among network
nodes. Bayesian monitoring theory defines a strategy for sequential model
assessment and adaptation in cases when network flow data deviates from
model-based predictions. Exploratory and sequential monitoring analyses of
evolving traffic on a network of web site-segments in e-commerce demonstrate
the utility of this coupled Bayesian emulation approach to analysis of
streaming network count data. Comment: 29 pages, 16 figures
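A minimal sketch of the kind of streaming count model and monitoring described above, assuming a single count series with a conjugate gamma-Poisson level and a discount factor. This is a standard textbook construction, not the paper's models (which are richer state-space and gravity models); the class and function names, the discount values and the Bayes-factor threshold are hypothetical. Each step compares the model's one-step negative binomial predictive with that of a more diffuse alternative; a strongly negative log Bayes factor flags the point, and the update then uses the wider prior.

```python
from math import lgamma, log

def nb_logpmf(y, r, p):
    """log P(Y = y) for a negative binomial: Gamma(r+y) / (Gamma(r) y!) * p^r * (1-p)^y."""
    return lgamma(r + y) - lgamma(r) - lgamma(y + 1) + r * log(p) + y * log(1.0 - p)

class GammaPoissonDiscount:
    """Poisson counts with a Gamma(a, b) (shape, rate) level and discount factor delta."""

    def __init__(self, a=1.0, b=1.0, delta=0.9):
        self.a, self.b, self.delta = a, b, delta

    def _prior(self, delta):
        # Discounting keeps the mean a/b but inflates the variance by 1/delta.
        return delta * self.a, delta * self.b

    def predictive_logpmf(self, y, delta=None):
        r, rate = self._prior(self.delta if delta is None else delta)
        # Gamma-Poisson mixture: the one-step predictive is negative binomial.
        return nb_logpmf(y, r, rate / (rate + 1.0))

    def update(self, y, delta=None):
        r, rate = self._prior(self.delta if delta is None else delta)
        self.a, self.b = r + y, rate + 1.0   # conjugate posterior after observing y
        return self.a / self.b               # posterior mean of the level

def monitor(counts, model, delta_alt=0.3, log_bf_threshold=-2.0):
    """Flag counts where a more diffuse alternative predicts much better."""
    flags = []
    for y in counts:
        log_bf = model.predictive_logpmf(y) - model.predictive_logpmf(y, delta_alt)
        flagged = log_bf < log_bf_threshold
        flags.append(flagged)
        # Adapt: after a flagged point, update with the alternative's larger uncertainty.
        model.update(y, delta_alt if flagged else None)
    return flags
```

On the counts 10, 9, 11, 10, 60 with a prior mean of 10 (a = 10, b = 1), only the final count is flagged.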
Analysis and Forecasting of Trending Topics in Online Media Streams
Among the vast information available on the web, social media streams capture
what people currently pay attention to and how they feel about certain topics.
Awareness of such trending topics plays a crucial role in multimedia systems
such as trend aware recommendation and automatic vocabulary selection for video
concept detection systems.
Correctly utilizing trending topics requires a better understanding of their
various characteristics in different social media streams. To this end, we
present the first comprehensive study across three major online and social
media streams, Twitter, Google, and Wikipedia, covering thousands of trending
topics during an observation period of an entire year. Our results indicate
that, depending on one's requirements, one does not necessarily have to turn to
Twitter for information about current events and that some media streams
strongly emphasize content of specific categories. As our second key
contribution, we further present a novel approach for the challenging task of
forecasting the life cycle of trending topics in the very moment they emerge.
Our fully automated approach is based on a nearest neighbor forecasting
technique exploiting our assumption that semantically similar topics exhibit
similar behavior.
We demonstrate on a large-scale dataset of Wikipedia page view statistics
that forecasts by the proposed approach are about 9-48k views closer to the
actual viewing statistics compared to baseline methods and achieve a mean
average percentage error of 45-19% for time periods of up to 14 days. Comment: ACM Multimedia 201
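The nearest-neighbour idea above can be sketched as follows, assuming each past topic has a semantic vector (how such vectors are built is not specified here, so the embedding is hypothetical) and a daily view series aligned at its emergence. The rescaling of neighbours to the new topic's early volume and the choice of k are illustrative assumptions, not the paper's exact procedure. The `mape` helper computes the usual mean absolute percentage error.

```python
import numpy as np

def knn_trend_forecast(query_vec, query_prefix, hist_vecs, hist_series, k=3, horizon=14):
    """Forecast the next `horizon` days of a new topic from its k most similar past topics.

    query_vec:    (d,) semantic vector of the new topic (hypothetical embedding).
    query_prefix: the new topic's views observed so far (length p >= 1).
    hist_vecs:    (n, d) vectors of past topics.
    hist_series:  (n, T) daily views of past topics, aligned at emergence, T >= p + horizon.
    """
    q = np.asarray(query_vec, dtype=float)
    H = np.asarray(hist_vecs, dtype=float)
    S = np.asarray(hist_series, dtype=float)
    p = len(query_prefix)
    # Cosine similarity between the new topic and every past topic.
    sims = H @ q / (np.linalg.norm(H, axis=1) * np.linalg.norm(q))
    nn = np.argsort(-sims)[:k]
    # Rescale each neighbour so its first p days match the new topic's volume.
    scale = np.sum(query_prefix) / np.maximum(S[nn, :p].sum(axis=1), 1e-12)
    future = S[nn, p:p + horizon] * scale[:, None]
    return future.mean(axis=0)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((a - f) / a))
```

Averaging rescaled neighbour trajectories keeps the forecast shape driven by semantically similar topics while the level follows the new topic's own early views.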
Graded possibilistic clustering of non-stationary data streams
Multidimensional data streams are a major paradigm in data science. This work focuses on possibilistic clustering algorithms as a means to cluster multidimensional streaming data. The proposed approach exploits fuzzy outlier analysis to provide good learning and tracking abilities under both concept shift and concept drift.
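Below is a minimal sketch of graded possibilistic memberships applied chunk by chunk. It assumes the graded form u = v / (sum over clusters of v)^alpha with v = exp(-d / beta), where alpha in [0, 1] moves between probabilistic (alpha = 1) and possibilistic (alpha = 0) behaviour; treat this form, the outlier rule `outlier_v` and the warm start between chunks as assumptions for illustration rather than the paper's exact algorithm.

```python
import numpy as np

def graded_memberships(X, centroids, beta, alpha=0.5):
    """Graded possibilistic memberships (assumed form u = v / (sum_j v)^alpha).

    alpha = 1 gives memberships that sum to one per point (probabilistic);
    alpha = 0 gives the raw possibilistic values v.
    """
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, c) squared distances
    v = np.exp(-d / beta)                                            # beta: scalar or (c,) widths
    kappa = v.sum(axis=1, keepdims=True) ** alpha
    return v / np.maximum(kappa, 1e-300), v

def process_chunk(X, centroids, beta, alpha=0.5, outlier_v=1e-3, n_iter=10):
    """Update centroids on one chunk, warm-started from the previous chunk's centroids."""
    X = np.asarray(X, dtype=float)
    C = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        u, v = graded_memberships(X, C, beta, alpha)
        inlier = v.max(axis=1) >= outlier_v     # points no cluster claims are left out
        w = u[inlier]
        mass = w.sum(axis=0)
        ok = mass > 1e-12                       # keep clusters that received no mass unchanged
        C[ok] = (w.T @ X[inlier])[ok] / mass[ok][:, None]
    _, v = graded_memberships(X, C, beta, alpha)
    return C, v.max(axis=1) < outlier_v         # new centroids and per-point outlier flags
```

Passing the returned centroids into the next call lets them follow gradual drift, while a point far from every centroid (such as [100, 100] next to clusters near the origin and [10, 10]) is flagged and kept out of the update.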
Using big data for customer centric marketing
This chapter deliberates on “big data” and provides a short overview of business intelligence and emerging analytics. It underlines the importance of data for customer-centricity in marketing. This contribution contends that businesses ought to adopt marketing automation tools and apply them to create relevant, targeted customer experiences. Today’s businesses increasingly rely on digital media and mobile technologies, as on-demand, real-time marketing has become more personalised than ever. Therefore, companies and brands are striving to nurture fruitful and long-lasting relationships with customers. In a nutshell, this chapter explains why companies should recognise the value of data analysis and mobile applications as tools that drive consumer insights and engagement. It suggests that a strategic approach to big data could drive consumer preferences and may also help to improve organisational performance. Peer-reviewed.
Clustering of nonstationary data streams: a survey of fuzzy partitional methods
Data streams have arisen as a relevant research topic during the past decade. They are real-time, incremental in nature, temporally ordered, and massive; they contain outliers; and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available on stream data clustering; however, less attention has been devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms, focusing mainly on fuzzy methods, including their treatment of outliers and of concept drift and shift. Ministero dell'Istruzione, dell'Università e della Ricerca (Italian Ministry of Education, University and Research)
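As one concrete instance of the fuzzy partitional family the survey covers, the sketch below runs weighted fuzzy c-means chunk by chunk and carries each chunk's centroids into the next chunk as weighted points, in the spirit of single-pass fuzzy c-means. The `forget` factor that discounts old mass (to let centroids follow concept drift) and all function names are illustrative assumptions, not a method taken from the survey.

```python
import numpy as np

def weighted_fcm(X, w, C, m=2.0, n_iter=50, eps=1e-12):
    """Fuzzy c-means on points X with per-point weights w, starting from centroids C."""
    for _ in range(n_iter):
        d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)) + eps  # (n, c)
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)); each row sums to 1.
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        W = w[:, None] * U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]
    return C, U

def stream_fcm(chunks, C0, m=2.0, forget=0.5):
    """Single-pass scheme: carry each chunk's centroids into the next chunk as weighted points."""
    C = np.asarray(C0, dtype=float)
    cw = np.zeros(len(C))                   # mass represented by each carried centroid
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        X = np.vstack([chunk, C])
        w = np.concatenate([np.ones(len(chunk)), forget * cw])  # forgetting discounts old mass
        C, U = weighted_fcm(X, w, C.copy(), m)
        cw = (w[:, None] * U).sum(axis=0)
    return C
```

A smaller `forget` makes the centroids track recent chunks more closely (useful under drift) at the cost of noisier estimates; `forget=1` keeps all past mass.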
Scaling forecasting algorithms using clustered modeling
Research on forecasting has traditionally focused on building more accurate statistical models for a given time series. The models are mostly applied to limited data due to efficiency and scalability problems. However, many enterprise applications require scalable forecasting on a large number of data series. For example, telecommunication companies need to forecast each of their customers' traffic load to understand their usage behavior and to tailor targeted campaigns. Forecasting models are typically applied to aggregate data to estimate the total traffic volume for revenue estimation and resource planning. However, they cannot be easily applied to each user individually, as building accurate models for a large number of users would be time-consuming. The problem is exacerbated when the forecasting process is continuous and the models need to be updated periodically. This paper addresses the problem of building and updating forecasting models continuously for multiple data series. We propose dynamic clustered modeling for forecasting by utilizing representative models as an analogy to cluster centers. We apply the models to each individual series through iterative nonlinear optimization. We develop two approaches: Integrated Clustered Modeling, which performs clustering and modeling simultaneously, and Sequential Clustered Modeling, which applies them sequentially. Our findings indicate that modeling an individual's behavior using its segment can be more scalable and accurate than the individual model itself. The grouped models avoid overfitting and capture common motifs even on noisy data. Experimental results from a telco CRM application show that the method is efficient and scalable, and also more accurate than separate individual models.
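A minimal sketch of the sequential variant, under simplifying assumptions: series are z-normalised so that clustering compares shapes rather than scales, clustered with plain k-means, and each cluster gets one autoregressive model fitted by linear least squares on the pooled normalised series. The paper instead applies the representative models to each series through iterative nonlinear optimization, so the AR(p) model, the k-means initialisation and all names here are hypothetical stand-ins.

```python
import numpy as np

def kmeans(Z, k, n_iter=20):
    """Plain k-means on the rows of Z; initialised with the first k rows for brevity."""
    C = Z[:k].copy()
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = Z[labels == j].mean(axis=0)
    return labels

def fit_ar(series_list, p):
    """Least-squares AR(p) coefficients pooled over all series in one cluster."""
    X, y = [], []
    for s in series_list:
        for t in range(p, len(s)):
            X.append(s[t - p:t])
            y.append(s[t])
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return coef

def sequential_clustered_forecast(Y, k=2, p=2):
    """Cluster z-normalised series, fit one AR model per cluster, forecast each series one step."""
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=1, keepdims=True)
    sd = Y.std(axis=1, keepdims=True) + 1e-12
    Z = (Y - mu) / sd                              # compare shapes, not scales
    labels = kmeans(Z, k)
    coefs = {j: fit_ar(Z[labels == j], p) for j in np.unique(labels)}
    z_next = np.array([Z[i, -p:] @ coefs[labels[i]] for i in range(len(Z))])
    return z_next * sd[:, 0] + mu[:, 0], labels    # back to each series' own scale
```

For series that are rescaled copies of a rising or a falling trend, the two trends land in separate clusters, and each series is forecast on its own scale by its cluster's shared model, so only k models are fitted instead of one per series.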
- …