    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research.
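
The bag formulation described above can be illustrated with a minimal sketch. It assumes the standard MIL assumption (a bag is positive when at least one of its instances is positive), which the abstract itself does not state; the scorer `toy_instance_score`, the threshold, and the example bags are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def bag_score(bag: Bag, instance_score: Callable[[Instance], float]) -> float:
    """Score a bag as the maximum of its instance scores.

    This encodes the standard MIL assumption (an assumption of this sketch):
    one sufficiently positive instance makes the whole bag positive.
    """
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_score(x) for x in bag)


def predict_bag(bag: Bag,
                instance_score: Callable[[Instance], float],
                threshold: float = 0.5) -> int:
    """Return 1 if the bag score reaches the threshold, else 0."""
    return int(bag_score(bag, instance_score) >= threshold)


def toy_instance_score(x: Instance) -> float:
    """Hypothetical instance scorer: the first feature, clipped to [0, 1]."""
    return min(1.0, max(0.0, x[0]))


# Only the bag carries a label; individual instances are unlabeled.
positive_bag: Bag = [[0.1, 0.3], [0.9, 0.2], [0.2, 0.8]]
negative_bag: Bag = [[0.1, 0.3], [0.2, 0.2]]

if __name__ == "__main__":
    print(predict_bag(positive_bag, toy_instance_score))
    print(predict_bag(negative_bag, toy_instance_score))
```

Max pooling is only one way to aggregate instance scores; other bag-level assumptions would replace `max` with a different aggregation.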

    Machine learning with multistage classifiers for identification of ectoparasite infected mud crab genus Scylla

    Mud-crab farming can help rural populations economically. However, parasites in mud crabs can interfere with the crabs' longevity. Unfortunately, the parasite has been found in hundreds of mud crabs, particularly in the coastal waters of Terengganu, Malaysia. This study investigates the initial identification of the parasite features based on their classes by using machine learning techniques. We employed five classifiers: logistic regression (LR), k-nearest neighbors (kNN), Gaussian Naive Bayes (GNB), support vector machine (SVM), and linear discriminant analysis (LDA). We compared these five classifiers to determine which performs best at classifying the parasites. The classification process involves three stages. First, the parasites are classified into two classes (normal and abnormal) regardless of their ventral types. Second, sex (female or male) and maturity (mature or immature) are classified. Finally, the five classifiers are compared on identifying the species of the parasite. The experimental results showed that GNB and LDA are the most effective classifiers for carrying out the initial classification of the rhizocephalan parasite within the mud crab genus Scylla.

    A Review on Detection of Medical Plant Images

    Both human and non-human life on Earth depends heavily on plants, which are the most significant influence on the natural cycle. Because of the sophistication of recent plant discoveries and the computerization of plants, plant identification is particularly challenging in biology and agriculture. There are a variety of reasons why automatic plant classification systems must be put into place, including instruction, resource evaluation, and environmental protection. It is thought that the leaves of medicinal plants are what distinguish them. Identifying plant species automatically from photographs of their leaves is an interesting goal because taxonomists are undertrained and biodiversity is quickly vanishing in the current environment. Due to the need for mass production, these plants must be identified immediately. The physical and emotional health of people must be taken into consideration when developing drugs. An important step in processing medicinal herbs is identifying and classifying them. Since there are few specialists in this field, it can be difficult to correctly identify and categorize medicinal plants. Therefore, a fully automated approach is optimal for identifying medicinal plants. This article briefly summarizes the numerous methods for categorizing medicinal plants based on the silhouette and roughness of a plant's leaf.

    Multiple instance learning for sequence data with across bag dependencies

    In the Multiple Instance Learning (MIL) problem for sequence data, the instances inside the bags are sequences. In some real-world applications such as bioinformatics, comparing a random pair of sequences makes no sense. In fact, each instance may have structural and/or functional relations with instances of other bags. Thus, the classification task should take into account this across-bag relation. In this work, we present two novel MIL approaches for sequence data classification named ABClass and ABSim. ABClass extracts motifs from related instances and uses them to encode sequences. A discriminative classifier is then applied to compute a partial classification result for each set of related sequences. ABSim uses a similarity measure to discriminate the related instances and to compute a scores matrix. For both approaches, an aggregation method is applied in order to generate the final classification result. We applied both approaches to solve the problem of bacterial Ionizing Radiation Resistance prediction. The experimental results of the presented approaches are satisfactory.

    Machine Learning Approach for Vigilance State Classification in Mice

    Sleep has a significant impact on cognitive abilities such as memory, reaction time, productivity, and creative thinking; however, there are many aspects of this important activity that are not clearly understood. Over the last century, researchers have developed technology and animal models to assist in the study of sleep. Manual sleep scoring is time consuming, reduces productivity, and is impacted by human scorer subjectivity. On the other hand, automatic sleep stage categorization can enhance consistency and reliability, aiding professionals in identifying sleep-related health problems. In recent times, various studies have reported significant achievements in automatic vigilance detection and have overcome the drawback of REM stage detection. Two models that reported very good performance are SCOPRISM and UTSN-L, which replicate the manual scoring criteria. In this study, the performance of these models is documented on an independent dataset. The same dataset is also employed in feature-based machine learning approaches, where features from EEG and EMG signals are incorporated into the scoring process and NB, LDA, DT, KNN, SVM and RF models are assessed in a comparative study on the same feature set. Results show that the random forest model achieves the highest overall accuracy of 84.7%, while the SCOPRISM and UTSN-L models achieve 76.1% and 77.1% respectively. When evaluated on an animal-by-animal basis, this RF model exhibits a reduced standard deviation with higher accuracy. However, although the random forest model performs better than SCOPRISM and UTSN-L, it lacks REM sensitivity and still exhibits lower classification performance for genetically engineered mice of higher age groups. Animal-wise feature normalization was then carried out, producing results that outperform all prior outcomes: the best result for vigilance stage detection, with an overall accuracy of 90.8% and a REM sensitivity of 90%. The animal-wise evaluation also shows that this approach performs more robustly over the set of test animals than prior models. Furthermore, when the algorithm trained on 28 animal datasets is applied to the recordings utilized in the UTSN-L model, the overall accuracy is 40%, with a REM recall of 16.6%. This reinforces the point that while the machine learning algorithm excels at detecting key patterns in the dataset, performance varies depending on the equipment employed in different environments.
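
The animal-wise feature normalization mentioned above could look roughly like the following sketch. The abstract does not say which normalization was used; per-animal z-scoring (subtracting each animal's feature mean and dividing by its standard deviation) is an assumption here, and the function name `normalize_per_animal` and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (animal_id, feature vector for one epoch)


def normalize_per_animal(rows: List[Row]) -> List[Tuple[str, List[float]]]:
    """Z-score every feature using statistics computed within each animal.

    Assumption of this sketch: z-scoring is the normalization; the source
    only says that normalization was done animal by animal.
    """
    by_animal: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for animal_id, features in rows:
        by_animal[animal_id].append(features)

    stats: Dict[str, Tuple[List[float], List[float]]] = {}
    for animal_id, vectors in by_animal.items():
        columns = list(zip(*vectors))  # one tuple per feature
        means = [mean(col) for col in columns]
        # A constant feature has zero spread; fall back to 1.0 to avoid
        # dividing by zero, which maps that feature to 0.0.
        stds = [pstdev(col) or 1.0 for col in columns]
        stats[animal_id] = (means, stds)

    normalized = []
    for animal_id, features in rows:
        means, stds = stats[animal_id]
        scaled = [(v - m) / s for v, m, s in zip(features, means, stds)]
        normalized.append((animal_id, scaled))
    return normalized


# Hypothetical recordings: two animals whose raw feature scales differ.
example_rows: List[Row] = [
    ("m1", [1.0, 10.0]),
    ("m1", [3.0, 30.0]),
    ("m2", [100.0, 5.0]),
    ("m2", [300.0, 5.0]),
]

if __name__ == "__main__":
    for animal_id, scaled in normalize_per_animal(example_rows):
        print(animal_id, scaled)
```

Normalizing within each animal removes differences in baseline signal scale between animals, which is one plausible reason such a step would help a classifier trained across animals.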

    Multi-Label Super Learner: Multi-Label Classification and Improving Its Performance Using Heterogenous Ensemble Methods

    Classification is the task of predicting the label(s) of future instances by learning and inferring from the patterns of instances with known labels. Traditional classification methods focus on single-label classification; however, many real-life problems require multi-label classification that classifies each instance into multiple categories. For example, in sentiment analysis, a person may feel multiple emotions at the same time; in bioinformatics, a gene or protein may have a number of functional expressions; in text categorization, an email, medical record, or social media posting can be identified by various tags simultaneously. As a result of such a wide range of applications, multi-label classification has become an emerging research area in recent years. There are two general approaches to realize multi-label classification: problem transformation and algorithm adaptation. The problem transformation methodology, at its core, converts a multi-label dataset into several single-label datasets, thereby allowing the transformed datasets to be modeled using existing binary or multi-class classification methods. On the other hand, the algorithm adaptation methodology modifies single-label classification algorithms so that they can be applied to the original multi-label datasets. This thesis proposes a new method, called Multi-Label Super Learner (MLSL), which is a stacking-based heterogeneous ensemble method. An improved multi-label classification algorithm following the problem transformation approach, MLSL combines the prediction power of several multi-label classification methods through an ensemble algorithm, super learner. The performance of this new method is compared to existing problem transformation algorithms, and our numerical results show that MLSL outperforms existing algorithms for almost all of the performance metrics.
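
The problem transformation idea described above can be sketched with binary relevance, one well-known transformation that builds a separate binary dataset for each label. This is an illustration only, not the MLSL method itself; the function name `binary_relevance` and the example texts and labels are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple, TypeVar

X = TypeVar("X")


def binary_relevance(
    features: Sequence[X],
    label_sets: Sequence[Set[Hashable]],
) -> Dict[Hashable, List[Tuple[X, int]]]:
    """Turn one multi-label dataset into one binary dataset per label.

    For each label, every instance is kept and tagged 1 if it carries
    that label and 0 otherwise.
    """
    if len(features) != len(label_sets):
        raise ValueError("features and label_sets must have the same length")

    all_labels = sorted(set().union(*label_sets))
    return {
        label: [(x, int(label in labels)) for x, labels in zip(features, label_sets)]
        for label in all_labels
    }


# Hypothetical sentiment example: one text may express several emotions.
texts = ["great but scary", "just sad", "happy"]
emotions = [{"joy", "fear"}, {"sadness"}, {"joy"}]

datasets = binary_relevance(texts, emotions)

if __name__ == "__main__":
    for label, rows in datasets.items():
        print(label, rows)
```

Each resulting dataset can then be passed to any existing binary classifier. Binary relevance treats labels independently, which is a known limitation that other transformations and ensemble methods try to address.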