
2,138 research outputs found

Explicit probabilistic models for databases and networks

Author: De Bie Tijl
Publication date: 01/01/2009

Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, to which the found patterns can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation-intensive randomization approaches in estimating the statistical significance of data mining results. Here, we introduce a methodology to construct non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they also have the benefit that they can be represented explicitly. We argue that our approach can be used for a variety of data types. However, for concreteness, we have chosen to demonstrate it in particular for databases and networks.

arXiv.org e-Print Archive

Explore Bristol Research

An Improved Technique for Multi-Dimensional Constrained Gradient Mining

Author: Elugbadebo O. J.
Folorunso O
Sodiya A. S.
Publication venue: Federal University of Agriculture, Abeokuta (FUNAAB)
Publication date: 26/02/2013

Multi-dimensional Constrained Gradient Mining, an aspect of data mining, is based on mining constrained frequent gradient pattern pairs with significant differences in their measures in a transactional database. Top-k Fp-growth with Gradient Pruning and Top-k Fp-growth with No Gradient Pruning were the two algorithms used for Multi-dimensional Constrained Gradient Mining in previous studies. However, these algorithms have shortcomings. The first requires construction of an Fp-tree before searching through the database, and the second requires searching the database twice to find frequent pattern pairs. Both consume large amounts of time and memory, which makes mining the database cumbersome. To eliminate these drawbacks, a new algorithm is designed that combines Top-k Fp-growth with Gradient pruning and Top-k Fp-growth with No Gradient pruning. The new algorithm, called Top-K Fp-growth with support Gradient pruning (SUPGRAP), scans the database once, searching for the node and all the descendants of the node of every task at each level. The idea is to form a projected multidimensional database and then find the multidimensional patterns within the projected databases. The evaluation of the new algorithm shows significant improvement in the time and space required over the existing algorithms.

Federal University of Agriculture, Abeokuta: FUNAAB Journal

SLOTH: Structured Learning and Task-based Optimization for Time Series Forecasting on Hierarchies

Author: Hu Xuanwei
Hu Yun
Hu Yunhua
Lei Lei
Liu Yu
Ma Lintao
Pan Chen
Wang Shiyu
Zhang James
Zheng Yangfei
Zhou Fan
Zhu Xinxin
Publication date: 26/02/2023

Multivariate time series forecasting with hierarchical structure is widely used in real-world applications, e.g., sales predictions for the geographical hierarchy formed by cities, states, and countries. Hierarchical time series (HTS) forecasting includes two sub-tasks, i.e., forecasting and reconciliation. In previous work, hierarchical information is integrated only in the reconciliation step to maintain coherency, but not in the forecasting step to improve accuracy. In this paper, we propose two novel tree-based feature integration mechanisms, i.e., top-down convolution and bottom-up attention, to leverage the information of the hierarchical structure to improve the forecasting performance. Moreover, unlike most previous reconciliation methods, which either rely on strong assumptions or focus on coherent constraints only, we utilize deep neural optimization networks, which not only achieve coherency without any assumptions, but also allow more flexible and realistic constraints to achieve task-based targets, e.g., lower under-estimation penalty and meaningful decision-making loss to facilitate the subsequent downstream tasks. Experiments on real-world datasets demonstrate that our tree-based feature integration mechanism achieves superior performance on hierarchical forecasting tasks compared to the state-of-the-art methods, and our neural optimization networks can be applied to real-world tasks effectively without any additional effort under coherence and task-based constraint

arXiv.org e-Print Archive

Challenges of Big Data Analysis

Author: Fan Jianqing
Han Fang
Liu Han
Publication venue: Oxford University Press (OUP)
Publication date: 06/02/2014

Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Frequent pattern mining: current status and future directions

Author: A Nanopoulos
Dong Xin
E Omiecinski
H Mannila
Hong Cheng
J Wang
J Yang
Jiawei Han
M Eirinaki
M Zaki
MJ Zaki
R Agrawal
RM Karp
T Imielinski
Xifeng Yan
Publication venue: Springer Science and Business Media LLC

Crossref

05051 Abstracts Collection -- Probabilistic, Logical and Relational Learning - Towards a Synthesis

Author: De Raedt Luc
Dietterich Tom
Getoor Lise
Muggleton Stephen H.
Publication venue: Dagstuhl Seminar Proceedings. 05051 - Probabilistic, Logical and Relational Learning - Towards a Synthesis
Publication date: 01/01/2006

From 30.01.05 to 04.02.05, the Dagstuhl Seminar 05051 "Probabilistic, Logical and Relational Learning - Towards a Synthesis" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Dagstuhl Research Online Publication Server

Sensing and making sense of crowd dynamics using Bluetooth tracking : an application-oriented approach

Author: Versichele Mathias
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2014

Ghent University Academic Bibliography