Algorithms for the Problems of Length-Constrained Heaviest Segments
We present algorithms for length-constrained maximum sum segment and maximum
density segment problems, in particular, and the problem of finding
length-constrained heaviest segments, in general, for a sequence of real
numbers. Given a sequence of n real numbers and two real parameters L and U (L
<= U), the maximum sum segment problem is to find a consecutive subsequence,
called a segment, of length at least L and at most U such that the sum of the
numbers in the subsequence is maximum. The maximum density segment problem is
to find a segment of length at least L and at most U such that the density of
the numbers in the subsequence is the maximum. For the first problem with
non-uniform width there is an algorithm with time and space complexities in
O(n). We present an algorithm with time complexity in O(n) and space complexity
in O(U). For the second problem with non-uniform width there is a combinatorial
solution with time complexity in O(n) and space complexity in O(U). We present
a simple geometric algorithm with the same time and space complexities.
We extend our algorithms to respectively solve the length-constrained k
maximum sum segments problem in O(n+k) time and O(max{U, k}) space, and the
length-constrained k maximum density segments problem in O(n min{k, U-L})
time and O(U+k) space. We present extensions of our algorithms to find all the
length-constrained segments having user-specified sum and density in O(n+m) and
O(n log(U-L)+m) time, respectively, where m is the size of the output.
Previously, there was no known algorithm with non-trivial result for these
problems. We indicate the extensions of our algorithms to higher dimensions.
All the algorithms can be extended in a straightforward way to solve the
problems with non-uniform width and non-uniform weight.
Comment: 21 pages, 12 figures
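To make the length-constrained maximum sum segment problem from this abstract concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it uses the well-known sliding-window-minimum technique over prefix sums, assumes uniform width with integer bounds L and U, and keeps the whole prefix-sum array, so it runs in O(n) time but uses O(n) space rather than O(U). The function name `max_sum_segment` and its return convention are assumptions made for illustration.

```python
from collections import deque
from itertools import accumulate


def max_sum_segment(a, L, U):
    """Return (best_sum, start, end) for the segment a[start:end] with
    L <= end - start <= U whose sum is maximum.

    Illustrative sketch only: assumes integer L and U with 1 <= L <= U
    and L <= len(a).
    """
    n = len(a)
    if not (1 <= L <= U) or L > n:
        raise ValueError("need 1 <= L <= U and L <= len(a)")

    # prefix[i] is the sum of a[:i], so sum(a[s:e]) == prefix[e] - prefix[s].
    prefix = [0, *accumulate(a)]

    # Candidate start indices whose prefix values are strictly increasing;
    # the front of the deque is always the best start for the current end.
    window = deque()
    best = None

    for end in range(L, n + 1):
        start = end - L  # newest start that satisfies the lower bound L
        # Discard candidates that can never beat the new start.
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()
        window.append(start)
        # Discard starts that would make the segment longer than U.
        while window[0] < end - U:
            window.popleft()
        total = prefix[end] - prefix[window[0]]
        if best is None or total > best[0]:
            best = (total, window[0], end)

    return best
```

For example, `max_sum_segment([2, -5, 3, 4, -1, 6, -10], 2, 3)` returns `(9, 3, 6)`, the segment `[4, -1, 6]`. Each index enters and leaves the deque at most once, which is where the linear running time comes from.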
Data Mining Techniques for Fraud Detection
The paper presents the application of data mining techniques to fraud analysis. We present several classification and prediction techniques that we consider important for fraud detection. Of the many data mining algorithms available, we present a statistics-based algorithm, a decision tree-based algorithm, and a rule-based algorithm. We present a Bayesian classification model to detect fraud in automobile insurance. Naïve Bayesian visualization is used to analyze and interpret the classifier's predictions. We illustrate how ROC curves can be used for model assessment to provide a more intuitive analysis of the models.
Keywords: Data Mining, Decision Tree, Bayesian Network, ROC Curve, Confusion Matrix
Data Mining Techniques in Fraud Detection
The paper presents the application of data mining techniques to fraud analysis. We present several classification and prediction techniques that we consider important for fraud detection. Of the many data mining algorithms available, we present a statistics-based algorithm, a decision tree-based algorithm, and a rule-based algorithm. We present a Bayesian classification model to detect fraud in automobile insurance. Naïve Bayesian visualization is used to analyze and interpret the classifier's predictions. We illustrate how ROC curves can be used for model assessment to provide a more intuitive analysis of the models.
Cloud-Scale Entity Resolution: Current State and Open Challenges
Entity resolution (ER) is the process of identifying records in information systems that refer to the same real-world entity. Because data volumes have grown so large over the past two decades, parallel techniques are needed to meet ER's requirements for high performance and scalability. The development of parallel ER has reached a relatively mature stage and has found its way into several applications. In this work, we first comprehensively survey the state of the art of parallel ER approaches. From this overview, we then extract classification criteria for parallel ER, and classify and compare the approaches based on these criteria. Finally, we identify open research questions and challenges, and discuss potential solutions and directions for further research in this field.