    Active Learning with Irrelevant Examples

    An improved active learning method has been devised for training data classifiers. One example of a data classifier is the algorithm used by the United States Postal Service since the 1960s to recognize scans of handwritten digits for processing ZIP codes. Active learning algorithms enable rapid training with minimal investment of time on the part of human experts to provide training examples consisting of correctly classified (labeled) input data. They function by identifying which examples would be most profitable for a human expert to label. The goal is to maximize classifier accuracy while minimizing the number of examples the expert must label. Although there are several well-established methods for active learning, they may not operate well when irrelevant examples are present in the data set. That is, they may select an item for labeling that the expert simply cannot assign to any of the valid classes. In the context of classifying handwritten digits, the irrelevant items may include stray marks, smudges, and mis-scans. Querying the expert about these items results in wasted time or, if the expert is forced to assign the item to one of the valid classes, erroneous labels. In contrast, the new algorithm provides a specific mechanism for avoiding querying the irrelevant items. This algorithm has two components: an active learner (which could be a conventional active learning algorithm) and a relevance classifier. The combination of these components yields a method, denoted Relevance Bias, that enables the active learner to avoid querying irrelevant data so as to increase its learning rate and efficiency when irrelevant items are present. The algorithm collects irrelevant data in a set of rejected examples, then trains the relevance classifier to distinguish between labeled (relevant) training examples and the rejected ones. The active learner combines its ranking of the items with the probability that they are relevant to yield a final decision about which item to present to the expert for labeling. Experiments on several data sets have demonstrated that the Relevance Bias approach significantly decreases the number of irrelevant items queried and also accelerates learning.
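
    The final selection step can be illustrated with a short sketch. The Python code below assumes uncertainty sampling as the base active-learning heuristic, a scikit-learn-style base classifier exposing predict_proba, and a logistic-regression relevance classifier; the function name select_query and the multiplicative combination of the two scores are illustrative choices, not the exact formulation used in the reported work.

```python
# Illustrative sketch of a Relevance Bias selection step (not the published
# implementation): uncertainty sampling combined with a relevance classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_query(unlabeled_X, labeled_X, rejected_X, base_classifier):
    """Return the index of the unlabeled item to present to the expert next."""
    # 1. Train the relevance classifier: labeled items count as relevant (1),
    #    previously rejected items count as irrelevant (0).
    rel_X = np.vstack([labeled_X, rejected_X])
    rel_y = np.concatenate([np.ones(len(labeled_X)), np.zeros(len(rejected_X))])
    relevance = LogisticRegression(max_iter=1000).fit(rel_X, rel_y)

    # 2. Base active-learning score: uncertainty of the current classifier
    #    (1 - highest class probability); higher means more informative.
    informativeness = 1.0 - base_classifier.predict_proba(unlabeled_X).max(axis=1)

    # 3. Estimated probability that each unlabeled item is relevant.
    p_relevant = relevance.predict_proba(unlabeled_X)[:, 1]

    # 4. Combine the two signals and query the highest-scoring item.
    return int(np.argmax(informativeness * p_relevant))
```

    Multiplying the informativeness score by the relevance probability biases selection away from items that resemble previously rejected examples while still favoring the most informative of the remaining candidates.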

    NASA Tech Briefs, November 2009

    Topics covered include: Cryogenic Chamber for Servo-Hydraulic Materials Testing; Apparatus Measures Thermal Conductance Through a Thin Sample from Cryogenic to Room Temperature; Rover Attitude and Pointing System Simulation Testbed; Desktop Application Program to Simulate Cargo-Air-Drop Tests; Multimodal Friction Ignition Tester; Small-Bolt Torque-Tension Tester; Integrated Spacesuit Audio System Enhances Speech Quality and Reduces Noise; Hardware Implementation of a Bilateral Subtraction Filter; Simple Optoelectronic Feedback in Microwave Oscillators; Small X-Band Oscillator Antennas; Free-Space Optical Interconnect Employing VCSEL Diodes; Discrete Fourier Transform Analysis in a Complex Vector Space; Miniature Scroll Pumps Fabricated by LIGA; Self-Assembling, Flexible, Pre-Ceramic Composite Preforms; Flight-speed Integral Image Analysis Toolkit; Work Coordination Engine; Multi-Mission Automated Task Invocation Subsystem; Autonomously Calibrating a Quadrupole Mass Spectrometer; Determining Spacecraft Reaction Wheel Friction Parameters; Composite Silica Aerogels Opacified with Titania; Multiplexed Colorimetric Solid-Phase Extraction; Detecting Airborne Mercury by Use of Polymer/Carbon Films; Lattice-Matched Semiconductor Layers on Single Crystalline Sapphire Substrate; Pressure-Energized Seal Rings to Better Withstand Flows; Rollerjaw Rock Crusher; Microwave Sterilization and Depyrogenation System; Quantifying Therapeutic and Diagnostic Efficacy in 2D Microvascular Images; NiF2/NaF:CaF2/Ca Solid-State High-Temperature Battery Cells; Critical Coupling Between Optical Fibers and WGM Resonators; Microwave Temperature Profiler Mounted in a Standard Airborne Research Canister; Alternative Determination of Density of the Titan Atmosphere; Solar Rejection Filter for Large Telescopes; Automated CFD for Generation of Airfoil Performance Tables; Progressive Classification Using Support Vector Machines; Active Learning with Irrelevant Examples; A Data Matrix Method for Improving the Quantification of Element Percentages of SEM/EDX Analysis; Deployable Shroud for the International X-Ray Observatory; Improved Model of a Mercury Ring Damper; Optoelectronic pH Meter: Further Details; X-38 Advanced Sublimator; and Solar Simulator Represents the Mars Surface Solar Environment.

    A Data Matrix Method for Improving the Quantification of Element Percentages of SEM/EDX Analysis

    A simple 2D M × N matrix involving sample preparation enables the microanalyst to peer below the noise floor of element percentages reported by the SEM/EDX (scanning electron microscopy/energy-dispersive x-ray) analysis, thus yielding more meaningful data. Using the example of a 2 × 3 sample set, there are M = 2 concentration levels of the original mix under test: 10 percent ilmenite (90 percent silica) and 20 percent ilmenite (80 percent silica). For each of these M samples, N = 3 separate SEM/EDX samples were drawn. In this test, ilmenite is the element of interest. By plotting the linear trend of the M samples' known concentrations versus the average of the N samples, a much higher resolution of elemental analysis can be performed. The resulting trend also shows how the noise is affecting the data, and at what point (of smaller concentrations) it becomes impractical to try to extract any further useful data.
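
    A minimal numerical sketch of this procedure is shown below, assuming M = 2 known ilmenite concentrations with N = 3 SEM/EDX readings each; the measured percentages are placeholder values, not data from the original test.

```python
# Illustrative M x N data-matrix analysis with placeholder measurements.
import numpy as np

known_pct = np.array([10.0, 20.0])           # M = 2 known ilmenite concentrations (%)
measured = np.array([[8.7, 9.4, 10.6],       # N = 3 SEM/EDX readings for the 10% sample
                     [18.2, 19.9, 21.1]])    # N = 3 readings for the 20% sample

means = measured.mean(axis=1)                # average the N readings for each sample
slope, intercept = np.polyfit(known_pct, means, 1)

# The fitted line relates reported percentages to the known concentrations;
# the scatter of the readings about the line indicates where noise makes
# smaller concentrations impractical to quantify.
print(f"trend: measured = {slope:.2f} * known + {intercept:.2f}")
```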

    Progressive Classification Using Support Vector Machines

    An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) a slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
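
    The two-pass scheme can be sketched as follows, assuming a binary problem and scikit-learn models; the choice of LinearSVC as the fast classifier, an RBF-kernel SVC as the slow one, distance to the decision boundary as the confidence index, and the fixed budget are illustrative assumptions rather than the original implementation.

```python
# Illustrative two-pass progressive classification with a fast and a slow SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=700, random_state=0)
X_train, X_new, y_train, _ = train_test_split(X, y, test_size=200, random_state=0)

fast = LinearSVC(dual=False).fit(X_train, y_train)      # cheap, approximate classifier
slow = SVC(kernel="rbf", C=10.0).fit(X_train, y_train)  # slower, more accurate model

# Pass 1: coarse baseline labels plus a confidence index for each point,
# taken here as the distance from the fast model's decision boundary.
labels = fast.predict(X_new)
confidence = np.abs(fast.decision_function(X_new))

# Pass 2: progressively reclassify the least-confident points with the slow
# SVM; the loop can be halted at any time (here, after `budget` items).
budget = 50
for i in np.argsort(confidence)[:budget]:
    labels[i] = slow.predict(X_new[i:i + 1])[0]
```

    Increasing the budget (or letting the loop run to completion) trades additional computation for accuracy, mirroring the progressive refinement described above.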

    Active Learning with Irrelevant Examples

    Active learning algorithms attempt to accelerate the learning process by requesting labels for the most informative items first. In real-world problems, however, there may exist unlabeled items that are irrelevant to the user's classification goals. Queries about these points slow down learning because they provide no information about the problem of interest. We have observed that when irrelevant items are present, active learning can perform worse than random selection, requiring more time (queries) to achieve the same level of accuracy. Therefore, we propose a novel approach, Relevance Bias, in which the active learner combines its default selection heuristic with the output of a simultaneously trained relevance classifier to favor items that are likely to be both informative and relevant. In our experiments on a real-world problem and two benchmark datasets, the Relevance Bias approach significantly improved the learning rate of three different active learning methods.
