Active Class Incremental Learning for Imbalanced Datasets
Incremental Learning (IL) allows AI systems to adapt to streamed data. Most
existing algorithms make two strong hypotheses which reduce the realism of the
incremental scenario: (1) new data are assumed to be readily annotated when
streamed and (2) tests are run with balanced datasets while most real-life
datasets are actually imbalanced. These hypotheses are discarded and the
resulting challenges are tackled with a combination of active and imbalanced
learning. We introduce sample acquisition functions which tackle imbalance and
are compatible with IL constraints. We also consider IL as an imbalanced
learning problem instead of the established usage of knowledge distillation
against catastrophic forgetting. Here, imbalance effects are reduced during
inference through class prediction scaling. Evaluation is done with four visual
datasets and compares existing and proposed sample acquisition functions.
Results indicate that the proposed contributions have a positive effect and
reduce the gap between active and standard IL performance.
Comment: Accepted in IPCV workshop from ECCV202
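The abstract mentions reducing imbalance effects at inference through class prediction scaling but does not give the scaling rule. As a minimal sketch, assuming one common choice (dividing each class score by that class's training-set frequency and renormalizing), the idea might look like the following. The function name `scale_predictions` and the example numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def scale_predictions(probs: Sequence[float], class_counts: Sequence[int]) -> List[float]:
    """Rescale class probabilities by inverse training-set frequency.

    Assumed rule (not confirmed by the abstract): divide each probability
    by the class prior estimated from training counts, then renormalize.
    """
    if len(probs) != len(class_counts):
        raise ValueError("probs and class_counts must have the same length")
    if any(c <= 0 for c in class_counts):
        raise ValueError("every class needs at least one training sample")

    total = sum(class_counts)
    priors = [c / total for c in class_counts]

    # Frequent classes are scaled down, rare classes are scaled up.
    scaled = [p / prior for p, prior in zip(probs, priors)]

    # Renormalize so the scaled scores sum to 1 again.
    norm = sum(scaled)
    return [s / norm for s in scaled]


# Hypothetical example: class 0 has 900 training samples, class 1 has 100.
probs = [0.6, 0.4]
counts = [900, 100]
scaled = scale_predictions(probs, counts)

# Index of the class with the highest scaled score.
predicted = max(range(len(scaled)), key=scaled.__getitem__)
```

In this example the raw scores favor the majority class, while the scaled scores favor the minority class, which is the kind of correction such scaling is meant to provide.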
Mining the Web With Active Hidden Markov Models
Introduction
Given the enormous amounts of information available only in unstructured or semi-structured textual documents, tools for information extraction (IE) have become highly important. IE tools identify the relevant information in such documents and convert it into a structured format such as a database or an XML document. While the first IE algorithms were hand-crafted sets of rules, researchers soon turned to learning extraction rules from hand-labeled documents. Unfortunately, rule-based approaches sometimes fail to provide the necessary robustness against the inherent variability of document structure, which has led to the recent interest in the use of hidden Markov models (HMMs) [1] for this purpose. Speech recognition and computational biochemistry are well-known applications of HMMs. Markov model algorithms that are used for part-of-speech tagging, as well as known hidden Markov models for information extraction [1], require the training documents to b