9 research outputs found

Controlled experiments on the web: survey and practical guide

Author: Dan Sommerfield
Randal M. Henne
Roger Longbotham
Ron Kohavi
Publication venue: 'Springer Science and Business Media LLC'

Data Mining using MLC++: A Machine Learning Library in C++

Author: Dan Sommerfield
James Dougherty
Ron Kohavi
Publication venue
Publication date: 01/01/1996
Field of study

Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. MLC++ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers. 1 Introduction Data warehouses containing massive amounts of data have been b..

CiteSeerX

Data Mining Using $\mathcal{MLC}++$: A Machine Learning Library in C++

Author: Dan Sommerfield
James Dougherty
Ron Kohavi
Publication venue: 'World Scientific Pub Co Pte Lt'

Crossref

Improving Simple Bayes

Author: Barry Becker
Dan Sommerfield
Ron Kohavi

The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. We examine different approaches for handling unknowns and zero counts when estimating probabilities. Large-scale experiments on 37 datasets were conducted to determine the effects of these approaches and several interesting insights are given, including a new variant of the Laplace estimator that outperforms other methods for dealing with zero counts. Using the bias-variance decomposition [15, 10], we show that while the SBC has performed well on common benchmark datasets, its accuracy will not scale up as the dataset sizes grow. Even with these limitations in mind, the SBC can serve as an excellent tool for initial exp..

CiteSeerX

Visualizing the Simple Bayesian Classifier

Author: Barry Becker
Dan Sommerfield
Ron Kohavi

The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. The SBC can serve as an excellent tool for initial exploratory data analysis when coupled with a visualizer that makes its structure comprehensible. We describe such a visual representation of the SBC model that has been successfully implemented. We describe the requirements we had for such a visualization and the design decisions we made to satisfy them. Keywords: Classification, simple/naive-Bayes, visualization

CiteSeerX

How are subaqueous sediment density flows triggered, what is their internal structure and how does it evolve? Direct observations from monitoring of active flows

Author: Charles K. Paull
David J.W. Piper
Peter J. Talling
Publication venue: 'Elsevier BV'
Publication date: 01/10/2013

Subaqueous sediment density flows are one of the volumetrically most important processes for moving sediment across our planet, and form the largest sediment accumulations on Earth (submarine fans). They are also arguably the most sparsely monitored major sediment transport processes on our planet. Significant advances have been made in documenting their timing and triggers, especially within submarine canyons and delta-fronts, and freshwater lakes and reservoirs, but the sediment concentration of flows that run out beyond the continental slope has never been measured directly. This limited amount of monitoring data contrasts sharply with other major types of sediment flow, such as river systems, and ensures that understanding submarine sediment density flows remains a major challenge for Earth science. The available monitoring data define a series of flow types whose character and deposits differ significantly. Large (> 100 km³) failures on the continental slope can generate fast-moving (up to 19 m/s) flows that reach the deep ocean, and deposit thick layers of sand across submarine fans. Even small volume (0.008 km³) canyon head failures can sometimes generate channelised flows that travel at > 5 m/s for several hundred kilometres. A single event off SE Taiwan shows that river floods can generate powerful flows that reach the deep ocean, in this case triggered by failure of recently deposited sediment in the canyon head. Direct monitoring evidence of powerful oceanic flows produced by plunging hyperpycnal flood water is lacking, although this process has produced shorter and weaker oceanic flows. Numerous flows can occur each year on river-fed delta fronts, where they can generate up-slope migrating crescentic bedforms. These flows tend to occur during the flood season, but are not necessarily associated with individual flood discharge peaks, suggesting that they are often triggered by delta-front slope failures.
Powerful flows occur several times each year in canyons fed by sand from the shelf, associated with strong wave action. These flows can also generate up-slope migrating crescentic bedforms that most likely originate due to retrogressive breaching associated with a dense near-bed layer of sediment. Expanded dilute flows that are supercritical and fully turbulent are also triggered by wave action in canyons. Sediment density flows in lakes and reservoirs generated by plunging river flood water have been monitored in much greater detail. They are typically very dilute (< 0.01 vol.% sediment) and travel at < 50 cm/s, and are prone to generating interflows within the density stratified freshwater. A key objective for future work is to develop measurement techniques for seeing through overlying dilute clouds of sediment, to determine whether dense near-bed layers are present. There is also a need to combine monitoring of flows with detailed analyses of flow deposits, in order to understand how flows are recorded in the rock record. Finally, a source-to-sink approach is needed because the character of submarine flows can change significantly along their flow path.

Southampton (e-Prints Soton)

Crossref

NERC Open Research Archive