96 research outputs found
Classifier Chains for Multi-Label Classification with Incomplete Labels
Many methods have been explored in the multi-label learning literature, ranging from simple problem transformations to more complex methods that capture correlations among labels. However, almost all existing work fails to address the challenge of incomplete label data. The goal of this project is to extend the ensemble classifier chain approach to learn models from training examples with incomplete label assignments. This scenario arises in many real-world applications; in image annotation, for example, a user provides only partial tags, or label assignments, for an image. We propose a new method for the multi-label learning problem in which a portion of the label assignments is missing. The project also includes an evaluation studying the effect of the different parameters that accompany this approach.
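The chain structure described above can be sketched in a few lines. Note the assumptions: the base learner here is a toy 1-NN lookup on Hamming distance, and unobserved labels are simply skipped when training the corresponding link; this is an illustrative reconstruction of the chain idea, not the paper's actual method.

```python
def train_chain(X, Y):
    """Train one link per label; entries of Y may be None (unobserved).

    Base learner: a toy 1-NN on Hamming distance -- an illustrative
    stand-in, not the ensemble classifier chain from the project.
    """
    models = []
    for j in range(len(Y[0])):
        data = []
        for x, ys in zip(X, Y):
            if ys[j] is None:
                continue  # skip examples where this label is missing
            # feature vector = inputs + earlier labels in the chain
            feats = tuple(x) + tuple(0 if v is None else v for v in ys[:j])
            data.append((feats, ys[j]))
        models.append(data)
    return models

def predict_chain(models, x):
    preds = []
    for data in models:
        feats = tuple(x) + tuple(preds)  # feed earlier predictions forward
        nearest = min(data, key=lambda d: sum(a != b for a, b in zip(d[0], feats)))
        preds.append(nearest[1])
    return preds
```

At prediction time each link consumes the predictions of the earlier links, which is what lets the chain exploit label correlations even when some training labels were never observed.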
Exploiting symmetry properties in the evaluation of inductive learning algorithms : an empirical domain-independent comparative study
Although numerous Boolean concept learning algorithms have been introduced in the literature, little is known about what categories of concepts are actually learned satisfactorily by most of these algorithms. Conventional comparison studies, which test various algorithms in some chosen domain, do not provide such information, since their conclusions are limited to the domain considered. A more general way to evaluate a learning algorithm is to test it on all the possible concepts defined on a given number of Boolean features. However, this immediately leads to unaffordable computational costs, since we need to consider as many as 2^(2^n) concepts when the number of features is n. In [D89], experiments of this type were reported for the case of three features, while the cases of four or more features were concluded to be infeasible.
This paper directly builds on the work of [D89]. We introduce two techniques that significantly cut the computational costs of the desired experiments and enable us to perform experiments over the space of concepts defined on up to five variables. The first technique is to exploit the fact that inductive learning algorithms are generally insensitive to permuting and/or complementing the features of the domain. We give a method for eliminating redundancy in the experiments by computing a set of representative concepts that suffices to characterize the behavior of a given algorithm over the space of all concepts. The second technique is to resort to statistical approximation to avoid running algorithms on all the possible samples of a concept. We show that testing a feasibly small number of samples suffices to obtain results with a high level of confidence.
Applying these techniques, we report experimental results analogous to those of [D89] on some decision tree building algorithms over five Boolean features. The results we present are rather surprising and demonstrate that there is still much to be learned about the algorithms we tested.
The paper also discusses the possibility of enhancing the above techniques to work for the cases of six or more Boolean features.
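The symmetry argument in the abstract can be made concrete with a small script that canonicalizes every Boolean concept on n features under input permutation and complementation, keeping one representative per equivalence class. This is a toy reconstruction of the idea, not the paper's implementation:

```python
from itertools import permutations, product

def canonical(table, n):
    """Smallest truth table reachable by permuting and/or complementing inputs."""
    best = None
    for perm in permutations(range(n)):          # permute the n features
        for flips in product([0, 1], repeat=n):  # optionally complement each
            cand = []
            for x in range(2 ** n):
                bits = [(x >> i) & 1 for i in range(n)]
                src = sum((bits[perm[i]] ^ flips[i]) << i for i in range(n))
                cand.append(table[src])
            cand = tuple(cand)
            if best is None or cand < best:
                best = cand
    return best

def representatives(n):
    """One concept per symmetry class among all 2^(2^n) Boolean concepts."""
    return {canonical(t, n) for t in product([0, 1], repeat=2 ** n)}
```

For n = 2 the 16 concepts collapse to just 6 representatives, and the saving grows rapidly with n, which is what makes experiments over all concepts on up to five features affordable.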
SOAP: Efficient Feature Selection of Numeric Attributes
Attribute selection techniques for supervised learning, used in the preprocessing phase to emphasize the most relevant attributes, make classification models simpler and easier to understand. Depending on the method applied (its starting point, search organization, evaluation strategy, and stopping criterion), there is an added cost for the classification algorithm to be used, which is normally compensated, to a greater or lesser extent, by the attribute reduction in the classification model. The algorithm SOAP (Selection of Attributes by Projection) has some interesting characteristics: a lower computational cost, O(m·n log n) for m attributes and n examples in the data set, than other typical algorithms, owing to the absence of distance and statistical calculations, and no need for data transformation. The performance of SOAP is analysed in two ways: percentage of reduction and classification accuracy. SOAP has been compared to CFS [6] and ReliefF [11]; the results are generated by C4.5 and 1NN before and after the application of the algorithms.
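The projection idea can be illustrated with a short sketch: sort the examples along each attribute and count how often the class label changes, preferring attributes whose projection keeps the classes in long runs. This is our reading of the core of the algorithm, reduced to its essentials; the published SOAP includes further details.

```python
def rank_by_projection(X, y):
    """Rank attributes by the number of label changes along their sorted projection.

    Sorting dominates the cost, giving O(m * n log n) for m attributes and
    n examples -- no distance or statistical computations are needed.
    """
    m = len(X[0])
    scores = []
    for j in range(m):
        order = sorted(range(len(y)), key=lambda i: X[i][j])
        changes = sum(y[a] != y[b] for a, b in zip(order, order[1:]))
        scores.append((changes, j))
    return [j for _, j in sorted(scores)]
```

An attribute that separates the classes perfectly produces a single label change along its projection and is ranked first.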
Efficient algorithms for identifying relevant features
This paper describes efficient methods for exact and approximate implementation of the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This bias is useful for learning domains where many irrelevant features are present in the training data.
We first introduce FOCUS-2, a new algorithm that exactly implements the MIN-FEATURES bias. This algorithm is empirically shown to be substantially faster than the FOCUS algorithm previously given in [Almuallim and Dietterich 91]. We then introduce the Mutual-Information-Greedy, Simple-Greedy and Weighted-Greedy algorithms, which apply efficient heuristics for approximating the MIN-FEATURES bias. These algorithms employ greedy heuristics that trade optimality for computational efficiency. Experimental studies show that the learning performance of ID3 is greatly improved when these algorithms are used to preprocess the training data by eliminating the irrelevant features from ID3's consideration. In particular, the Weighted-Greedy algorithm provides an excellent and efficient approximation of the MIN-FEATURES bias.
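A minimal sketch of the Simple-Greedy flavor of heuristic (our illustrative reconstruction, not the authors' code): repeatedly add the feature that separates the largest number of still-conflicting example pairs, until every pair of differently labeled examples is distinguished by some chosen feature.

```python
from itertools import combinations

def simple_greedy(X, y):
    """Greedy approximation of the MIN-FEATURES bias (illustrative sketch)."""
    n = len(X[0])
    # pairs of examples with different labels that must be distinguished
    conflicts = [(a, b) for a, b in combinations(range(len(X)), 2) if y[a] != y[b]]
    chosen = []
    while conflicts:
        gain = lambda f: sum(X[a][f] != X[b][f] for a, b in conflicts)
        best = max((f for f in range(n) if f not in chosen), key=gain)
        if gain(best) == 0:
            break  # data inconsistent: no feature separates the remaining pairs
        chosen.append(best)
        conflicts = [(a, b) for a, b in conflicts if X[a][best] == X[b][best]]
    return chosen
```

On an XOR target with an extra irrelevant feature, the sketch keeps only the two relevant features, which is the behavior the MIN-FEATURES bias prefers.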
On learning more concepts
The coverage of a learning algorithm is the number of concepts that can be learned by that algorithm from samples of a given size. This paper asks whether good learning algorithms can be designed by maximizing their coverage. The paper extends a previous upper bound on the coverage of any Boolean concept learning algorithm and describes two algorithms, Multi-Balls and Large-Ball, whose coverage approaches this upper bound. Experimental measurement of the coverage of the ID3 and FRINGE algorithms shows that their coverage is far below this bound. Further analysis of Large-Ball shows that although it learns many concepts, these do not seem to be very interesting concepts. Hence, coverage maximization alone does not appear to yield practically useful learning algorithms. The paper concludes with a definition of coverage within a bias, which suggests a way that coverage maximization could be applied to strengthen weak preference biases.
Keywords: inductive learning, concept coverage, theoretical analysis
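Coverage can be measured exhaustively for tiny domains. The sketch below uses a hypothetical "memorize and default to 0" learner and counts how many of the 2^(2^n) concepts it recovers exactly from every training sample of a given size; this strict all-samples variant is our simplification, chosen so the example stays deterministic.

```python
from itertools import combinations, product

def memorize_default0(sample):
    """Hypothetical learner: memorize seen instances, predict 0 elsewhere."""
    seen = dict(sample)
    return lambda x: seen.get(x, 0)

def coverage(learner, n, sample_size):
    """Concepts on n Boolean features learned exactly from EVERY sample."""
    instances = list(range(2 ** n))
    covered = 0
    for concept in product([0, 1], repeat=2 ** n):
        learned_all = True
        for subset in combinations(instances, sample_size):
            h = learner([(x, concept[x]) for x in subset])
            if any(h(x) != concept[x] for x in instances):
                learned_all = False
                break
        covered += learned_all
    return covered
```

With a full truth table (sample size 4 for n = 2) the memorizer covers all 16 concepts; with one instance withheld, only the constant-0 concept survives every possible sample, illustrating how sharply coverage depends on sample size.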
Osteology and relationships of Rhinopycnodus gabriellae gen. et sp. nov. (Pycnodontiformes) from the marine Late Cretaceous of Lebanon
The osteology of Rhinopycnodus gabriellae gen. et sp. nov., a pycnodontiform fish from the marine Cenomanian (Late Cretaceous) of Lebanon, is studied in detail. This new fossil genus belongs to the family Pycnodontidae, as shown by the presence of a posterior brush-like process on its parietal. Its long and broad premaxilla, bearing one short and very broad tooth, is the principal autapomorphy of this fish. Within the phylogeny of Pycnodontidae, Rhinopycnodus occupies an intermediate position between Ocloedus and Tepexichthys.
Heuristic Search over a Ranking for Feature Selection
In this work, we suggest a new feature selection technique that lets us use the wrapper approach to find a well-suited feature set for distinguishing experiment classes in high-dimensional data sets. Our method is based on the relevance and redundancy idea, in the sense that a ranked feature is chosen only if additional information is gained by adding it. Across twelve well-known data sets, this heuristic leads to considerably better accuracy than the full feature set and other representative feature selection algorithms, coupled with notable dimensionality reduction.
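The relevance-plus-redundancy heuristic reduces to a simple incremental wrapper: walk down the ranking and keep a feature only if the wrapper's accuracy estimate improves. A minimal sketch, where the `evaluate` callable is an assumed stand-in for any cross-validated classifier score:

```python
def incremental_wrapper(ranking, evaluate):
    """Keep a ranked feature only if it improves the wrapper's score."""
    selected, best = [], float("-inf")
    for f in ranking:
        score = evaluate(selected + [f])
        if score > best:                       # information was gained
            selected, best = selected + [f], score
    return selected
```

Because each candidate is evaluated together with the features already kept, a feature that is individually relevant but redundant given the current set is discarded, which is where the dimensionality reduction comes from.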
Thermally conductive polymer nanocomposites for filament-based additive manufacturing
Thermal management is a crucial factor affecting performance and lifetime in several applications, such as electronics, generators, and heat exchangers. Additive manufacturing (AM) techniques are driving a revolution in manufacturing by expanding the freedom to design and fabricate complex geometries, but the polymers commonly used in AM have inherently low thermal conductivity. One way to overcome this limitation is to develop novel polymer-based composite materials with improved thermal conductivity for AM technologies. In this review, the fundamental principles of designing highly thermally conductive polymer nanocomposites are presented. Such nanocomposites generally consist of a base polymer and thermally conductive filler materials, such as aluminum oxide or boron nitride, which are reviewed in detail. The factors affecting the thermal conductivity of composites, such as the filler loading and overall composite structure, are also summarized. The article draws on statistical data from technical papers published during 2000–2020 on fused deposition modeling (FDM) polymers and their thermally conductive composites. Finally, the most critical factors affecting the thermal conductivity of polymer nanocomposites are described in detail. Various novel techniques show the potential of thermally conductive polymer nanocomposites processed by AM technologies, enabling applications in LED devices, energy, and electronic packaging.
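For the filler-loading effect, a standard first-order estimate is the Maxwell(-Eucken) effective-medium model for dilute spherical fillers. The sketch below is a textbook formula given for illustration, with example conductivity values that are rough literature figures, not data from the review:

```python
def maxwell_eucken(k_matrix, k_filler, phi):
    """Effective thermal conductivity (W/m.K) of a composite containing
    a volume fraction phi of spherical filler (valid for dilute loadings)."""
    num = k_filler + 2 * k_matrix + 2 * phi * (k_filler - k_matrix)
    den = k_filler + 2 * k_matrix - phi * (k_filler - k_matrix)
    return k_matrix * num / den

# e.g. a PLA-like matrix (~0.13 W/m.K) with 20 vol% boron-nitride-like
# filler (~30 W/m.K) -- rough illustrative values
k_eff = maxwell_eucken(0.13, 30.0, 0.2)
```

The model captures the qualitative trend discussed in the review: conductivity rises with filler loading, but slowly at low fractions, which is why percolating filler networks and composite structure matter so much in practice.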
A bi-objective feature selection algorithm for large omics datasets
Special Issue: Fourth special issue on knowledge discovery and business intelligence.
Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency-based methods are a significant category of feature selection research that substantially improves the comprehensibility of the result using the parsimony principle. In this work, the bi-objective version of the algorithm Logical Analysis of Inconsistent Data is applied to large volumes of data. In order to deal with hundreds of thousands of attributes, a heuristic decomposition uses parallel processing to solve a set covering problem, together with a cross-validation technique. The bi-objective solutions contain the number of reduced features and the accuracy. The algorithm is applied to omics datasets with genome-like characteristics from patients with rare diseases.
The authors would like to thank the FCT support UID/Multi/04046/2013. This work used the EGI, European Grid Infrastructure, with the support of IBERGRID, the Iberian Grid Infrastructure, and INCD (Portugal).
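The set covering subproblem mentioned above has a classic greedy approximation, sketched here in serial form; the paper solves it with a parallel heuristic decomposition, so this toy version only shows the covering step itself:

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets until the universe is covered (classic ln-n-approximate greedy)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        if not subsets[name] & uncovered:
            raise ValueError("universe is not coverable by the given subsets")
        chosen.append(name)       # take the subset covering most remaining items
        uncovered -= subsets[name]
    return chosen
```

In the feature selection setting, each "element" to cover is a pair of differently labeled examples, and each "subset" is the set of pairs a given attribute distinguishes, so a small cover corresponds to a small consistent feature set.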