
    Data Mining

    The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising, leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computing, and business analytics. This book presents basic concepts, ideas, and research in data mining.

    Matching Vehicle License Plate Numbers Using License Plate Recognition and Text Mining Techniques

    License plate recognition (LPR) technology has been widely applied in many transportation applications such as enforcement, vehicle monitoring, and access control. In most applications involving enforcement (e.g. cashless toll collection, congestion charging) and access control (e.g. car parking), a plate is recognized at one location (or checkpoint) and compared against a list of authorized vehicles. In this research I dealt with applications where a vehicle is detected at two locations and there is no reference list for vehicle identification. Little effort seems to have been made in the past to exploit all the information generated by LPR systems. Nowadays, LPR machines can recognize most characters on vehicle plates even under the harshest practical conditions. Therefore, even though the equipment is not perfect at plate reading, it is still possible to judge with some confidence whether a pair of imperfect readings, in the form of character sequences (strings), most likely belongs to the same vehicle. The challenge is to design a matching procedure that decides whether or not two readings belong to the same vehicle. In view of this problem, the research designed and assessed a matching procedure that takes advantage of a similarity measure between two strings called edit distance (ED). The ED measures the minimum editing cost needed to convert one string into another. The study first assessed a simple dual-LPR setup using the traditional ED formulation with 0/1 cost assignments (0 if a pair of aligned characters is identical, 1 otherwise). For this dual setup, the research further proposed a symbol-based weight function derived from a probabilistic approach whose input parameter is the conditional probability matrix of character association. This new formulation outperformed the original ED formulation. Lastly, the research incorporated passage-time information into the procedure, which improved matching performance considerably, yielding a high positive matching rate and a much lower (about 2%) false matching rate.
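
    The core of such a procedure is the classic dynamic-programming edit distance with a pluggable substitution cost. The sketch below (Python; the confusion costs are invented for illustration, not taken from the thesis) shows the traditional 0/1 formulation and how a probabilistic, symbol-based weight function slots in.

    def edit_distance(a, b, sub_cost=None):
        """Minimum editing cost to convert string a into string b."""
        if sub_cost is None:
            # Traditional formulation: 0 if characters match, 1 otherwise.
            sub_cost = lambda x, y: 0.0 if x == y else 1.0
        m, n = len(a), len(b)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = float(i)  # i deletions
        for j in range(1, n + 1):
            d[0][j] = float(j)  # j insertions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                d[i][j] = min(
                    d[i - 1][j] + 1.0,  # delete a[i-1]
                    d[i][j - 1] + 1.0,  # insert b[j-1]
                    d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),
                )
        return d[m][n]

    # Hypothetical confusion costs: visually similar characters (a common
    # LPR misread) are cheap to substitute, so near-identical readings of
    # the same plate score as likely matches.
    confusion = {("8", "B"): 0.2, ("B", "8"): 0.2, ("0", "O"): 0.1, ("O", "0"): 0.1}
    weighted = lambda x, y: 0.0 if x == y else confusion.get((x, y), 1.0)

    print(edit_distance("8ABC123", "BABC123"))            # 1.0 (0/1 costs)
    print(edit_distance("8ABC123", "BABC123", weighted))  # 0.2 (weighted)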

    Comparison of different algorithms for exploiting the hidden trends in data sources

    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2003. Includes bibliographical references (leaves: 92-97). Text in English; abstract in Turkish and English. 97 leaves.
    The growth of large-scale transactional databases, time-series databases, and other kinds of databases has given rise to several efficient algorithms that cope with the computationally expensive task of association rule mining. In this study, three algorithms, Apriori, FP-tree, and CHARM, for exploiting hidden trends such as frequent itemsets, frequent patterns, and closed frequent itemsets respectively, are discussed and their performances evaluated. The performances of the algorithms were measured at different support levels, and the algorithms were tested on different data sets (both synthetic and real). The algorithms were compared according to their data preparation performance, mining performance, run-time performance, and knowledge extraction capabilities. The Apriori algorithm is the most prevalent association rule mining algorithm; it makes multiple passes over the database, finding the set of frequent itemsets at each level. The FP-tree algorithm is a scalable algorithm that finds the crucial information regarding the complete set of prefix paths, conditional pattern bases, and frequent patterns by using a compact FP-tree-based mining method. CHARM is a novel algorithm that brings remarkable improvements over existing association rule mining algorithms by showing that mining the set of closed frequent itemsets is adequate instead of mining the set of all frequent itemsets. From our experimental results, we conclude that the Apriori algorithm performs well on sparse data sets. The FP-tree algorithm extracts fewer associations than Apriori; however, it is a completely feasible solution that facilitates mining dense data sets at low support levels. The CHARM algorithm, on the other hand, is appropriate for mining closed frequent itemsets (a substantial portion of all frequent itemsets) on both sparse and dense data sets, even at low support levels.
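
    As a pointer to what the compared algorithms do, here is a minimal level-wise Apriori pass in Python (an illustrative sketch, not the implementation evaluated in the thesis): each level k joins frequent (k-1)-itemsets into candidate k-itemsets, prunes candidates whose subsets are infrequent, and keeps those above the support threshold.

    from itertools import combinations

    def apriori(transactions, min_support):
        """Return all itemsets whose support meets min_support (a fraction)."""
        transactions = [frozenset(t) for t in transactions]
        n = len(transactions)
        support = lambda items: sum(items <= t for t in transactions) / n
        # Pass 1: frequent single items.
        items = {i for t in transactions for i in t}
        frequent = [{frozenset([i]) for i in items
                     if support(frozenset([i])) >= min_support}]
        k = 2
        while frequent[-1]:
            # Join step: unions of frequent (k-1)-itemsets that have size k.
            candidates = {a | b for a in frequent[-1] for b in frequent[-1]
                          if len(a | b) == k}
            # Prune step: every (k-1)-subset must itself be frequent.
            candidates = {c for c in candidates
                          if all(frozenset(s) in frequent[-1]
                                 for s in combinations(c, k - 1))}
            frequent.append({c for c in candidates if support(c) >= min_support})
            k += 1
        return [s for level in frequent for s in level]

    db = [{"bread", "milk"}, {"bread", "beer", "eggs"},
          {"milk", "beer", "bread"}, {"bread", "milk", "beer"}]
    print(apriori(db, min_support=0.5))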

    Knowledge Modelling and Learning through Cognitive Networks

    One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning and, more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks map correlations in cognitive data implicitly, as weights obtained after training over labelled data, whose interpretation is not immediately evident to the experimenter. This book brings together quantitative, innovative research that models knowledge through cognitive and neural networks to gain insight into the mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot.
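
    For readers unfamiliar with the representation, a cognitive (semantic) network is simply a graph whose nodes are concepts and whose edges are explicit associations. A toy sketch follows (Python with networkx as an assumed dependency; the edges are invented for illustration, not drawn from the book).

    import networkx as nx

    # Toy free-association network; the measures below are the kind of
    # structural quantities cognitive network science relates to recall,
    # creativity, and learning.
    associations = [
        ("dog", "cat"), ("dog", "bone"), ("cat", "milk"),
        ("milk", "cow"), ("bone", "skeleton"), ("cat", "whiskers"),
    ]
    g = nx.Graph(associations)

    print(nx.degree_centrality(g))            # hub concepts in the network
    print(nx.average_clustering(g))           # tightness of associative clusters
    print(nx.shortest_path(g, "dog", "cow"))  # associative path between concepts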

    Unsupervised learning for anomaly detection in Australian medical payment data

    Fraudulent or wasteful medical insurance claims made by health care providers are costly for insurers. Typically, OECD healthcare organisations lose 3-8% of total expenditure to fraud. As Australia's universal public health insurer, Medicare Australia, spends approximately A$34 billion per annum on the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme, wasted spending of A$1-2.7 billion could be expected. However, fewer than 1% of claims to Medicare Australia are detected as fraudulent, below international benchmarks. Variation is common in medicine, and health conditions, along with their presentation and treatment, are heterogeneous by nature. Increasing volumes of data and rapidly changing patterns bring challenges which require novel solutions. Machine learning and data mining are becoming commonplace in this field, but no gold standard is yet available. In this project, requirements are developed for real-world application to compliance analytics at the Australian Government Department of Health and Aged Care (DoH), covering: unsupervised learning; problem generalisation; human interpretability; context discovery; and cost prediction. Three novel methods are presented which rank providers by potentially recoverable costs. These methods use association analysis, topic modelling, and sequential pattern mining to provide interpretable, expert-editable models of typical provider claims. Anomalous providers are identified through comparison to the typical models, using metrics based on the costs of excess or upgraded services. Domain knowledge is incorporated in a machine-friendly way in two of the methods through the use of the MBS as an ontology. Validation by subject-matter experts and comparison to existing techniques show that the methods perform well. The methods are implemented in a software framework which enables rapid prototyping and quality assurance. The code is implemented at the DoH, and further applications as decision-support systems are in progress. The developed requirements will apply to future work in this field.
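
    The ranking idea can be illustrated with a deliberately simplified sketch (Python; the item numbers, fees, and peer rates below are hypothetical, and the thesis's actual models are built with association analysis, topic modelling, and sequential pattern mining): score each provider by the cost of services billed in excess of a typical per-patient rate, then rank providers by that potentially recoverable cost.

    from collections import Counter

    def excess_cost(claims, n_patients, typical_rate, fees):
        """Cost of services billed above the typical per-patient rate."""
        counts = Counter(claims)
        excess = 0.0
        for item, count in counts.items():
            expected = typical_rate.get(item, 0.0) * n_patients
            if count > expected:
                excess += (count - expected) * fees[item]
        return excess

    # Hypothetical MBS-style item numbers, schedule fees, and peer rates.
    fees = {"23": 41.40, "36": 80.10, "44": 118.00}
    typical_rate = {"23": 1.0, "36": 0.2, "44": 0.05}  # services per patient

    providers = {
        "A": (["23"] * 100 + ["36"] * 20, 100),  # close to the typical profile
        "B": (["23"] * 100 + ["44"] * 30, 100),  # many upgraded long consults
    }
    ranked = sorted(providers,
                    key=lambda p: excess_cost(*providers[p], typical_rate, fees),
                    reverse=True)
    print(ranked)  # providers ordered by potentially recoverable cost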