10,389 research outputs found
Personal Income and Population in Arkansas Counties - 1950 - 1962
Personal income and population estimates in Arkansas countie
Fast and Accurate Mining of Correlated Heavy Hitters
The problem of mining Correlated Heavy Hitters (CHH) from a two-dimensional
data stream has been introduced recently, and a deterministic algorithm based
on the use of the Misra--Gries algorithm has been proposed by Lahiri et al. to
solve it. In this paper we present a new counter-based algorithm for tracking
CHHs, formally prove its error bounds and correctness and show, through
extensive experimental results, that our algorithm outperforms the Misra--Gries
based algorithm with regard to accuracy and speed whilst requiring
asymptotically much less space
Personal Income and Population in Iowa Counties - 1950-1962 - Iowa Appendix to Methodological Summary Report, 1 June 1966
Allocation methods of state personal income to Iowa counties, and tabulated results including population figure
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review
Approximate Parallel High Utility Itemset Mining
High utility itemset mining discovers itemsets whose utility is above a given threshold, where utilities measure the importance of itemsets. In high utility itemset mining, memory and time performance limitations cause scalability issues, when the dataset is very large. In this thesis, the problem is addressed by proposing a distributed parallel algorithm, PHUI-Miner, and a sampling strategy, which can be used either separately or simultaneously. PHUI-Miner parallelizes the state-of-the-art high utility itemset mining algorithm HUI-Miner. The sampling strategy investigates the required sample size of a dataset, in order to achieve a given accuracy. We also propose an approach combining sampling with PHUI-Miner, which provides better time performance. In our experiments, we show that PHUI-Miner has high performance and outperforms the state-of-the-art non-parallel algorithm. The sampling strategy achieves accuracies much higher than the guarantee. Extensive experiments are also conducted to compare the time performance of PHUI-Miner with and without sampling
- …