9 research outputs found

    Data Mining using MLC++: A Machine Learning Library in C++

    Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. MLC++ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers. 1 Introduction Data warehouses containing massive amounts of data have been b..

    Improving Simple Bayes

    The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. We examine different approaches for handling unknowns and zero counts when estimating probabilities. Large scale experiments on 37 datasets were conducted to determine the effects of these approaches and several interesting insights are given, including a new variant of the Laplace estimator that outperforms other methods for dealing with zero counts. Using the bias-variance decomposition [15, 10], we show that while the SBC has performed well on common benchmark datasets, its accuracy will not scale up as the dataset sizes grow. Even with these limitations in mind, the SBC can serve as an excellent tool for initial exp..

    Visualizing the Simple Bayesian Classifier

    The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. The SBC can serve as an excellent tool for initial exploratory data analysis when coupled with a visualizer that makes its structure comprehensible. We describe such a visual representation of the SBC model that has been successfully implemented. We describe the requirements we had for such a visualization and the design decisions we made to satisfy them. Keywords: Classification, simple/naive-Bayes, visualization

    How are subaqueous sediment density flows triggered, what is their internal structure and how does it evolve? Direct observations from monitoring of active flows

    Subaqueous sediment density flows are one of the volumetrically most important processes for moving sediment across our planet, and form the largest sediment accumulations on Earth (submarine fans). They are also arguably the most sparsely monitored major sediment transport processes on our planet. Significant advances have been made in documenting their timing and triggers, especially within submarine canyons and delta-fronts, and freshwater lakes and reservoirs, but the sediment concentration of flows that run out beyond the continental slope has never been measured directly. This limited amount of monitoring data contrasts sharply with other major types of sediment flow, such as river systems, and ensures that understanding submarine sediment density flows remains a major challenge for Earth science. The available monitoring data define a series of flow types whose character and deposits differ significantly. Large (> 100 km³) failures on the continental slope can generate fast-moving (up to 19 m/s) flows that reach the deep ocean, and deposit thick layers of sand across submarine fans. Even small volume (0.008 km³) canyon head failures can sometimes generate channelised flows that travel at > 5 m/s for several hundred kilometres. A single event off SE Taiwan shows that river floods can generate powerful flows that reach the deep ocean, in this case triggered by failure of recently deposited sediment in the canyon head. Direct monitoring evidence of powerful oceanic flows produced by plunging hyperpycnal flood water is lacking, although this process has produced shorter and weaker oceanic flows. Numerous flows can occur each year on river-fed delta fronts, where they can generate up-slope migrating crescentic bedforms. These flows tend to occur during the flood season, but are not necessarily associated with individual flood discharge peaks, suggesting that they are often triggered by delta-front slope failures.
    Powerful flows occur several times each year in canyons fed by sand from the shelf, associated with strong wave action. These flows can also generate up-slope migrating crescentic bedforms that most likely originate due to retrogressive breaching associated with a dense near-bed layer of sediment. Expanded dilute flows that are supercritical and fully turbulent are also triggered by wave action in canyons. Sediment density flows in lakes and reservoirs generated by plunging river flood water have been monitored in much greater detail. They are typically very dilute (< 0.01 vol.% sediment) and travel at < 50 cm/s, and are prone to generating interflows within the density stratified freshwater. A key objective for future work is to develop measurement techniques for seeing through overlying dilute clouds of sediment, to determine whether dense near-bed layers are present. There is also a need to combine monitoring of flows with detailed analyses of flow deposits, in order to understand how flows are recorded in the rock record. Finally, a source-to-sink approach is needed because the character of submarine flows can change significantly along their flow path.