Identifying Mislabeled Training Data
This paper presents a new approach to identifying and eliminating mislabeled
training instances for supervised learning. The goal of this approach is to
improve classification accuracies produced by learning algorithms by improving
the quality of the training data. Our approach uses a set of learning
algorithms to create classifiers that serve as noise filters for the training
data. We evaluate single-algorithm, majority-vote, and consensus filters on five
datasets that are prone to labeling errors. Our experiments illustrate that
filtering significantly improves classification accuracy for noise levels up to
30 percent. An analytical and empirical evaluation of the precision of our
approach shows that consensus filters are conservative at throwing away good
data at the expense of retaining bad data and that majority filters are better
at detecting bad data at the expense of throwing away good data. This suggests
that for situations in which there is a paucity of data, consensus filters are
preferable, whereas majority vote filters are preferable for situations with an
abundance of data.
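The majority-vote and consensus filtering schemes described above can be sketched as follows. This is a minimal illustration only: the paper builds its filters from cross-validated classifiers, and the function and variable names here are hypothetical.

```python
def filter_noise(predictions, labels, mode="majority"):
    """Flag training instances whose label disagrees with the predictions
    of an ensemble of classifiers (obtained e.g. via cross-validation).

    predictions: predictions[c][i] is classifier c's prediction for instance i.
    labels: the (possibly noisy) training labels.
    Returns the indices of instances to keep.
    """
    n_clf = len(predictions)
    keep = []
    for i, y in enumerate(labels):
        errors = sum(1 for c in range(n_clf) if predictions[c][i] != y)
        if mode == "majority":
            noisy = errors > n_clf / 2   # most classifiers disagree with the label
        else:  # "consensus"
            noisy = errors == n_clf      # every classifier disagrees with the label
        if not noisy:
            keep.append(i)
    return keep
```

Note how the consensus condition is strictly harder to satisfy, which is why it discards less data (good or bad) than the majority filter.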
Smash Guard: A Hardware Solution to Prevent Security Attacks on the Function Return Address
A buffer overflow attack is perhaps the most common attack used to compromise the security of a host. A buffer overflow can be used to change the function return address and redirect execution to the attacker's code. We present a hardware-based solution, called SmashGuard, to protect the return addresses stored on the program stack. SmashGuard protects against all known forms of attack on the function return address pointer. With each function call instruction, a new return address is pushed onto an extra hardware stack. A return instruction compares its return address to the address on top of the hardware stack. If a mismatch is detected, an exception is raised. Because the stack operations and checks are done in hardware, in parallel with the usual execution of call and return instructions, our best-performing implementation scheme has virtually no performance overhead. While previous software-based approaches' average performance degradation for the SPEC2000 benchmarks is only 2.8%, their worst-case degradation is up to 8.3%. Apart from this lack of robustness in performance, the software approaches' key disadvantages are narrower security coverage and the need to recompile applications. SmashGuard, on the other hand, is secure and does not require recompilation, though the OS must be modified to save and restore the hardware stack at context switches and when function call nesting exceeds the hardware stack depth.
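The push-on-call, compare-on-return protocol can be modeled in a few lines. This is a software sketch of the idea only; SmashGuard itself performs these operations in hardware, in parallel with the call and return instructions, and the class and method names are illustrative.

```python
class ShadowStack:
    """Software model of a SmashGuard-style hardware return-address stack."""

    def __init__(self):
        self._stack = []

    def call(self, return_addr):
        # On a call instruction, push the return address onto the shadow stack.
        self._stack.append(return_addr)

    def ret(self, addr_from_program_stack):
        # On return, compare against the top of the shadow stack; a mismatch
        # means the copy on the program stack was overwritten.
        expected = self._stack.pop()
        if addr_from_program_stack != expected:
            raise RuntimeError("return address mismatch: possible stack smash")
        return expected
```

A normal call/return pair passes the check; an overwritten return address raises the (modeled) hardware exception.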
Are You Tampering With My Data?
We propose a novel approach towards adversarial attacks on neural networks
(NN), focusing on tampering the data used for training instead of generating
attacks on trained models. Our network-agnostic method creates a backdoor
during training which can be exploited at test time to force a neural network
to exhibit abnormal behaviour. We demonstrate on two widely used datasets
(CIFAR-10 and SVHN) that a universal modification of just one pixel per image
for all the images of a class in the training set is enough to corrupt the
training procedure of several state-of-the-art deep neural networks causing the
networks to misclassify any images to which the modification is applied. Our
aim is to bring to the attention of the machine learning community the
possibility that even learning-based methods trained locally on public
datasets can be subject to attacks by a skillful adversary.
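The one-pixel-per-image poisoning described above is simple to state in code. This sketch only shows the data-tampering step, not the training or the resulting misclassification; the pixel position, value, and function name are illustrative assumptions, not the paper's exact recipe.

```python
def poison_class(images, labels, target_class, pixel=(0, 0), value=255):
    """Overwrite a single, fixed pixel in every training image of one class.

    images: list of images as nested lists, images[i][row][col].
    labels: class label for each image.
    The same (pixel, value) modification applied at test time then acts
    as the backdoor trigger.
    """
    r, c = pixel
    for img, y in zip(images, labels):
        if y == target_class:
            img[r][c] = value
    return images
```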
Decision Tree Classifiers for Star/Galaxy Separation
We study the star/galaxy classification efficiency of 13 different decision
tree algorithms applied to photometric objects in the Sloan Digital Sky Survey
Data Release Seven (SDSS DR7). Each algorithm is defined by a set of parameters
which, when varied, produce different final classification trees. We
extensively explore the parameter space of each algorithm, using the set of
SDSS objects with spectroscopic data as the training set. The
efficiency of star-galaxy separation is measured using the completeness
function. We find that the Functional Tree algorithm (FT) yields the best
results as measured by the mean completeness in the two magnitude intervals
considered. We compare the performance of the tree generated with the optimal
FT configuration to the classifications provided by the SDSS parametric
classifier, 2DPHOT, and Ball et al. (2006). We find that our FT classifier is
comparable or better in completeness over the full magnitude range, with much
lower contamination than all but the Ball et al. classifier. At the faintest
magnitudes, our classifier is the only one able to maintain high completeness
(80%) while still achieving low contamination. Finally, we apply our FT
classifier to separate stars from galaxies in the full set of SDSS
photometric objects over the same magnitude range.
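The completeness and contamination measures used to compare the classifiers above have standard definitions, sketched here. The function names are illustrative; this assumes completeness is the fraction of true class members recovered and contamination the fraction of selected objects that are interlopers.

```python
def completeness(true_labels, pred_labels, cls):
    """Fraction of true members of `cls` that the classifier recovers."""
    total = sum(1 for t in true_labels if t == cls)
    hits = sum(1 for t, p in zip(true_labels, pred_labels)
               if t == cls and p == cls)
    return hits / total if total else 0.0

def contamination(true_labels, pred_labels, cls):
    """Fraction of objects classified as `cls` that are not truly `cls`."""
    selected = [(t, p) for t, p in zip(true_labels, pred_labels) if p == cls]
    wrong = sum(1 for t, _ in selected if t != cls)
    return wrong / len(selected) if selected else 0.0
```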
OWA-FRPS: A Prototype Selection method based on Ordered Weighted Average Fuzzy Rough Set Theory
The Nearest Neighbor (NN) algorithm is a well-known and effective classification algorithm. Prototype Selection (PS), which provides NN with a good training set to pick its neighbors from, is an important topic as NN is highly susceptible to noisy data. Accurate state-of-the-art PS methods are generally slow, which motivates us to propose a new PS method, called OWA-FRPS. Based on the Ordered Weighted Average (OWA) fuzzy rough set model, we express the quality of instances, and use a wrapper approach to decide which instances to select. An experimental evaluation shows that OWA-FRPS is significantly more accurate than state-of-the-art PS methods without requiring a high computational cost.
Spanish Government TIN2011-2848
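The Ordered Weighted Average underlying OWA-FRPS is a simple aggregation operator: sort the inputs, then take a weighted sum by position rather than by source. This is only the basic OWA operator, not the paper's fuzzy rough set construction built on top of it.

```python
def owa(values, weights):
    """Ordered Weighted Average: sort values in descending order and take
    the weighted sum, with weights attached to positions (ranks), not to
    the original arguments. Weights are assumed to sum to 1; choosing
    them to decay from the top softens the max, decaying from the bottom
    softens the min.
    """
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```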
Collusion through Joint R&D: An Empirical Assessment
This paper tests whether upstream R&D cooperation leads to downstream collusion. We consider an oligopolistic setting where firms enter in research joint ventures (RJVs) to lower production costs or coordinate on collusion in the product market. We show that a sufficient condition for identifying collusive behavior is a decline in the market share of RJV-participating firms, which is also necessary and sufficient for a decrease in consumer welfare. Using information from the US National Cooperation Research Act, we estimate a market share equation correcting for the endogeneity of RJV participation and R&D expenditures. We find robust evidence that large networks between direct competitors (created through firms being members in several RJVs at the same time) are conducive to collusive outcomes in the product market which reduce consumer welfare. By contrast, RJVs among non-competitors are efficiency enhancing.
Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars
Catalogs of periodic variable stars contain large numbers of periodic
light-curves (photometric time series data from the astrophysics domain).
Separating anomalous objects from well-known classes is an important step
towards the discovery of new classes of astronomical objects. Most anomaly
detection methods for time series data assume either a single continuous time
series or a set of time series whose periods are aligned. Light-curve data
precludes the use of these methods as the periods of any given pair of
light-curves may be out of sync. One may use an existing anomaly detection
method if, prior to similarity calculation, one performs the costly act of
aligning two light-curves, an operation that scales poorly to massive data
sets. This paper presents PCAD, an unsupervised anomaly detection method for
large sets of unsynchronized periodic time-series data, that outputs a ranked
list of both global and local anomalies. It calculates its anomaly score for
each light-curve in relation to a set of centroids produced by a modified
k-means clustering algorithm. Our method is able to scale to large data sets
through the use of sampling. We validate our method on both light-curve data
and other time series data sets. We demonstrate its effectiveness at finding
known anomalies, and discuss the effect of sample size and number of centroids
on our results. We compare our method to naive solutions and existing time
series anomaly detection methods for unphased data, and show that PCAD's
reported anomalies are comparable to or better than all other methods. Finally,
astrophysicists on our team have verified that PCAD finds true anomalies that
might be indicative of novel astrophysical phenomena.
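The centroid-based scoring described above can be sketched as follows: each light-curve's anomaly score is its distance to the nearest centroid, under a distance that tolerates unsynchronized periods. Both functions are illustrative assumptions; PCAD's actual clustering and similarity measure differ in detail.

```python
def phase_invariant_dist(a, b):
    """Minimum Euclidean distance over all cyclic shifts of `a`:
    a simple stand-in for comparing periodic curves whose phases
    are out of sync."""
    n = len(a)
    best = float("inf")
    for s in range(n):
        shifted = a[s:] + a[:s]
        d = sum((x - y) ** 2 for x, y in zip(shifted, b)) ** 0.5
        best = min(best, d)
    return best

def anomaly_score(series, centroids, distance=phase_invariant_dist):
    """Score a light-curve by its distance to the nearest centroid
    (e.g. centroids produced by a k-means-style clustering); larger
    scores indicate more anomalous curves."""
    return min(distance(series, c) for c in centroids)
```

Scanning all cyclic shifts is quadratic per pair, which is exactly the cost the paper's sampling strategy is designed to keep manageable on large sets.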
Computing: report leaps geographical barriers but stumbles over gender
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/62917/1/441025a.pd