96 research outputs found
Classifier Chains for Multi-Label Classification with Incomplete Labels
Many methods have been explored in the multi-label learning literature, ranging from simple problem transformations to more complex methods that capture correlations among labels. However, almost all existing work fails to address the challenge of incomplete label data. The goal of this project is to extend the ensemble classifier chain approach to learn models from training examples with incomplete label assignments. This scenario arises in many real-world applications; in image annotation, for example, a user provides only partial tags, or label assignments, for an image. We propose a new method for the multi-label learning problem in which a portion of the label assignments is missing. The project also includes an evaluation studying the effect of the different parameters that accompany this approach.
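The chain structure described above can be sketched in a few lines. Note the assumptions: the base learner here is a toy 1-NN lookup on Hamming distance, and unobserved labels are simply skipped when training the corresponding link; this is an illustrative reconstruction of the chain idea, not the paper's actual method.

```python
def train_chain(X, Y):
    """Train one link per label; entries of Y may be None (unobserved).

    Base learner: a toy 1-NN on Hamming distance -- an illustrative
    stand-in, not the ensemble classifier chain from the project.
    """
    models = []
    for j in range(len(Y[0])):
        data = []
        for x, ys in zip(X, Y):
            if ys[j] is None:
                continue  # skip examples where this label is missing
            # feature vector = inputs + earlier labels in the chain
            feats = tuple(x) + tuple(0 if v is None else v for v in ys[:j])
            data.append((feats, ys[j]))
        models.append(data)
    return models

def predict_chain(models, x):
    preds = []
    for data in models:
        feats = tuple(x) + tuple(preds)  # feed earlier predictions forward
        nearest = min(data, key=lambda d: sum(a != b for a, b in zip(d[0], feats)))
        preds.append(nearest[1])
    return preds
```

At prediction time each link consumes the predictions of the earlier links, which is what lets the chain exploit label correlations even when some training labels were never observed.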
Exploiting symmetry properties in the evaluation of inductive learning algorithms : an empirical domain-independent comparative study
Although numerous Boolean concept learning algorithms have been introduced in the literature, little is known about what categories of concepts are actually learned satisfactorily by most of these algorithms. Conventional comparison studies, which test various algorithms in some chosen domain, do not provide such information, since their conclusions are limited to the domain considered. A more general way to evaluate a learning algorithm is to test it on all the possible concepts defined on a given number of Boolean features. However, this immediately leads to unaffordable computational costs, since we need to consider as many as 2^(2^n) concepts when the number of features is n. In [D89], experiments of this type were reported for the case of three features, while the cases of four or more features were concluded to be infeasible.
This paper directly builds on the work of [D89]. We introduce two techniques that significantly cut the computational costs of the desired experiments and enable us to perform experiments over the space of concepts defined on up to five variables. The first technique is to exploit the fact that inductive learning algorithms are generally insensitive to permuting and/or complementing the features of the domain. We give a method for eliminating redundancy in the experiments by computing a set of representative concepts that suffices to characterize the behavior of a given algorithm over the space of all concepts. The second technique is to resort to statistical approximation to avoid running algorithms on all the possible samples of a concept. We show that testing a feasibly small number of samples suffices to obtain results with a high level of confidence.
Applying these techniques, we report experimental results analogous to those of [D89] on some decision tree building algorithms over five Boolean features. The results we present are rather surprising and demonstrate that there is still much to be learned about the algorithms we tested.
The paper also discusses the possibility of enhancing the above techniques to work for the cases of six or more Boolean features.
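The symmetry argument in the abstract can be made concrete with a small script that canonicalizes every Boolean concept on n features under input permutation and complementation, keeping one representative per equivalence class. This is a toy reconstruction of the idea, not the paper's implementation:

```python
from itertools import permutations, product

def canonical(table, n):
    """Smallest truth table reachable by permuting and/or complementing inputs."""
    best = None
    for perm in permutations(range(n)):          # permute the n features
        for flips in product([0, 1], repeat=n):  # optionally complement each
            cand = []
            for x in range(2 ** n):
                bits = [(x >> i) & 1 for i in range(n)]
                src = sum((bits[perm[i]] ^ flips[i]) << i for i in range(n))
                cand.append(table[src])
            cand = tuple(cand)
            if best is None or cand < best:
                best = cand
    return best

def representatives(n):
    """One concept per symmetry class among all 2^(2^n) Boolean concepts."""
    return {canonical(t, n) for t in product([0, 1], repeat=2 ** n)}
```

For n = 2 the 16 concepts collapse to just 6 representatives, and the saving grows rapidly with n, which is what makes experiments over all concepts on up to five features affordable.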
SOAP: Efficient Feature Selection of Numeric Attributes
Attribute selection techniques for supervised learning, used in the preprocessing phase to emphasize the most relevant attributes, make classification models simpler and easier to understand. Depending on the method applied (its starting point, search organization, evaluation strategy, and stopping criterion), there is an added cost for the classification algorithm to be used, which is normally compensated, to a greater or lesser extent, by the attribute reduction in the classification model. The algorithm SOAP (Selection of Attributes by Projection) has some interesting characteristics: a lower computational cost, O(m·n log n) for m attributes and n examples in the data set, than other typical algorithms, owing to the absence of distance and statistical calculations, and no need for data transformation. The performance of SOAP is analysed in two ways: percentage of reduction and classification accuracy. SOAP has been compared to CFS [6] and ReliefF [11]; the results are generated by C4.5 and 1NN before and after the application of the algorithms.
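The projection idea can be illustrated with a short sketch: sort the examples along each attribute and count how often the class label changes, preferring attributes whose projection keeps the classes in long runs. This is our reading of the core of the algorithm, reduced to its essentials; the published SOAP includes further details.

```python
def rank_by_projection(X, y):
    """Rank attributes by the number of label changes along their sorted projection.

    Sorting dominates the cost, giving O(m * n log n) for m attributes and
    n examples -- no distance or statistical computations are needed.
    """
    m = len(X[0])
    scores = []
    for j in range(m):
        order = sorted(range(len(y)), key=lambda i: X[i][j])
        changes = sum(y[a] != y[b] for a, b in zip(order, order[1:]))
        scores.append((changes, j))
    return [j for _, j in sorted(scores)]
```

An attribute that separates the classes perfectly produces a single label change along its projection and is ranked first.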
Efficient algorithms for identifying relevant features
This paper describes efficient methods for exact and approximate implementation of the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This bias is useful for learning domains where many irrelevant features are present in the training data.
We first introduce FOCUS-2, a new algorithm that exactly implements the MIN-FEATURES bias. This algorithm is empirically shown to be substantially faster than the FOCUS algorithm previously given in [Almuallim and Dietterich 91]. We then introduce the Mutual-Information-Greedy, Simple-Greedy and Weighted-Greedy algorithms, which apply efficient heuristics for approximating the MIN-FEATURES bias. These algorithms employ greedy heuristics that trade optimality for computational efficiency. Experimental studies show that the learning performance of ID3 is greatly improved when these algorithms are used to preprocess the training data by eliminating the irrelevant features from ID3's consideration. In particular, the Weighted-Greedy algorithm provides an excellent and efficient approximation of the MIN-FEATURES bias.
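A minimal sketch of the Simple-Greedy flavor of heuristic (our illustrative reconstruction, not the authors' code): repeatedly add the feature that separates the largest number of still-conflicting example pairs, until every pair of differently labeled examples is distinguished by some chosen feature.

```python
from itertools import combinations

def simple_greedy(X, y):
    """Greedy approximation of the MIN-FEATURES bias (illustrative sketch)."""
    n = len(X[0])
    # pairs of examples with different labels that must be distinguished
    conflicts = [(a, b) for a, b in combinations(range(len(X)), 2) if y[a] != y[b]]
    chosen = []
    while conflicts:
        gain = lambda f: sum(X[a][f] != X[b][f] for a, b in conflicts)
        best = max((f for f in range(n) if f not in chosen), key=gain)
        if gain(best) == 0:
            break  # data inconsistent: no feature separates the remaining pairs
        chosen.append(best)
        conflicts = [(a, b) for a, b in conflicts if X[a][best] == X[b][best]]
    return chosen
```

On an XOR target with an extra irrelevant feature, the sketch keeps only the two relevant features, which is the behavior the MIN-FEATURES bias prefers.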
On learning more concepts
The coverage of a learning algorithm is the number of concepts that can be learned by that algorithm from samples of a given size. This paper asks whether good learning algorithms can be designed by maximizing their coverage. The paper extends a previous upper bound on the coverage of any Boolean concept learning algorithm and describes two algorithms, Multi-Balls and Large-Ball, whose coverage approaches this upper bound. Experimental measurement of the coverage of the ID3 and FRINGE algorithms shows that their coverage is far below this bound. Further analysis of Large-Ball shows that although it learns many concepts, these do not seem to be very interesting concepts. Hence, coverage maximization alone does not appear to yield practically useful learning algorithms. The paper concludes with a definition of coverage within a bias, which suggests a way that coverage maximization could be applied to strengthen weak preference biases.
Keywords: inductive learning, concept coverage, theoretical analysis
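Coverage can be measured exhaustively for tiny domains. The sketch below uses a hypothetical "memorize and default to 0" learner and counts how many of the 2^(2^n) concepts it recovers exactly from every training sample of a given size; this strict all-samples variant is our simplification, chosen so the example stays deterministic.

```python
from itertools import combinations, product

def memorize_default0(sample):
    """Hypothetical learner: memorize seen instances, predict 0 elsewhere."""
    seen = dict(sample)
    return lambda x: seen.get(x, 0)

def coverage(learner, n, sample_size):
    """Concepts on n Boolean features learned exactly from EVERY sample."""
    instances = list(range(2 ** n))
    covered = 0
    for concept in product([0, 1], repeat=2 ** n):
        learned_all = True
        for subset in combinations(instances, sample_size):
            h = learner([(x, concept[x]) for x in subset])
            if any(h(x) != concept[x] for x in instances):
                learned_all = False
                break
        covered += learned_all
    return covered
```

With a full truth table (sample size 4 for n = 2) the memorizer covers all 16 concepts; with one instance withheld, only the constant-0 concept survives every possible sample, illustrating how sharply coverage depends on sample size.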
Osteology and relationships of Rhinopycnodus gabriellae gen. et sp. nov. (Pycnodontiformes) from the marine Late Cretaceous of Lebanon
The osteology of Rhinopycnodus gabriellae gen. et sp. nov., a pycnodontiform fish from the marine Cenomanian (Late Cretaceous) of Lebanon, is studied in detail. This new fossil genus belongs to the family Pycnodontidae, as shown by the presence of a posterior brush-like process on its parietal. Its long and broad premaxilla, bearing one short and very broad tooth, is the principal autapomorphy of this fish. Within the phylogeny of Pycnodontidae, Rhinopycnodus occupies an intermediate position between Ocloedus and Tepexichthys.
Heuristic Search over a Ranking for Feature Selection
In this work, we suggest a new feature selection technique that lets us use the wrapper approach to find a well-suited feature set for distinguishing experiment classes in high-dimensional data sets. Our method is based on the relevance and redundancy idea, in the sense that a ranked feature is chosen only if additional information is gained by adding it. Across twelve well-known data sets, this heuristic leads to considerably better accuracy than the full feature set and other representative feature selection algorithms, coupled with notable dimensionality reduction.
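The relevance-plus-redundancy heuristic reduces to a simple incremental wrapper: walk down the ranking and keep a feature only if the wrapper's accuracy estimate improves. A minimal sketch, where the `evaluate` callable is an assumed stand-in for any cross-validated classifier score:

```python
def incremental_wrapper(ranking, evaluate):
    """Keep a ranked feature only if it improves the wrapper's score."""
    selected, best = [], float("-inf")
    for f in ranking:
        score = evaluate(selected + [f])
        if score > best:                       # information was gained
            selected, best = selected + [f], score
    return selected
```

Because each candidate is evaluated together with the features already kept, a feature that is individually relevant but redundant given the current set is discarded, which is where the dimensionality reduction comes from.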
Thermally conductive polymer nanocomposites for filament-based additive manufacturing
Thermal management is a crucial factor affecting performance and lifetime in several applications, such as electronics, generators, and heat exchangers. Additive manufacturing (AM) techniques are driving a revolution in manufacturing by expanding the freedom to design and fabricate complex geometries, but the polymers commonly used in AM have inherently low thermal conductivity. One way to overcome this limitation is to develop novel polymer-based composite materials with improved thermal conductivity for AM technologies. In this review, the fundamental principles of designing highly thermally conductive polymer nanocomposites are presented. Such nanocomposites generally consist of a base polymer and thermally conductive filler materials, such as aluminum oxide or boron nitride, which are reviewed in detail. The factors affecting the thermal conductivity of composites, such as the filler loading and overall composite structure, are also summarized. The article draws on statistical data from technical papers published during 2000–2020 on fused deposition modeling (FDM) polymers and their thermally conductive composites. Finally, the most critical factors affecting the thermal conductivity of polymer nanocomposites are described in detail. Various novel techniques show the potential of thermally conductive polymer nanocomposites processed by AM technologies, enabling applications in LED devices, energy, and electronic packaging.
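For the filler-loading effect, a standard first-order estimate is the Maxwell(-Eucken) effective-medium model for dilute spherical fillers. The sketch below is a textbook formula given for illustration, with example conductivity values that are rough literature figures, not data from the review:

```python
def maxwell_eucken(k_matrix, k_filler, phi):
    """Effective thermal conductivity (W/m.K) of a composite containing
    a volume fraction phi of spherical filler (valid for dilute loadings)."""
    num = k_filler + 2 * k_matrix + 2 * phi * (k_filler - k_matrix)
    den = k_filler + 2 * k_matrix - phi * (k_filler - k_matrix)
    return k_matrix * num / den

# e.g. a PLA-like matrix (~0.13 W/m.K) with 20 vol% boron-nitride-like
# filler (~30 W/m.K) -- rough illustrative values
k_eff = maxwell_eucken(0.13, 30.0, 0.2)
```

The model captures the qualitative trend discussed in the review: conductivity rises with filler loading, but slowly at low fractions, which is why percolating filler networks and composite structure matter so much in practice.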
A bi-objective feature selection algorithm for large omics datasets
Special Issue: Fourth special issue on knowledge discovery and business intelligence.
Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency-based methods are a significant category of feature selection research that substantially improves the comprehensibility of the result using the parsimony principle. In this work, the bi-objective version of the algorithm Logical Analysis of Inconsistent Data is applied to large volumes of data. In order to deal with hundreds of thousands of attributes, a heuristic decomposition uses parallel processing to solve a set covering problem, together with a cross-validation technique. The bi-objective solutions contain the number of reduced features and the accuracy. The algorithm is applied to omics datasets with genome-like characteristics from patients with rare diseases.
The authors would like to thank the FCT support UID/Multi/04046/2013. This work used the EGI, European Grid Infrastructure, with the support of IBERGRID, the Iberian Grid Infrastructure, and INCD (Portugal).
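The set covering subproblem mentioned above has a classic greedy approximation, sketched here in serial form; the paper solves it with a parallel heuristic decomposition, so this toy version only shows the covering step itself:

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets until the universe is covered (classic ln-n-approximate greedy)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        if not subsets[name] & uncovered:
            raise ValueError("universe is not coverable by the given subsets")
        chosen.append(name)       # take the subset covering most remaining items
        uncovered -= subsets[name]
    return chosen
```

In the feature selection setting, each "element" to cover is a pair of differently labeled examples, and each "subset" is the set of pairs a given attribute distinguishes, so a small cover corresponds to a small consistent feature set.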