
    BAGGING BASED ENSEMBLE ANALYSIS IN HANDLING UNBALANCED DATA ON CLASSIFICATION MODELING

    The purpose of this study is to identify, through a literature review, the algorithm behind each bagging-based method for handling unbalanced classes. The study uses bagging-based ensemble methods such as UnderBagging, OverBagging, UnderOverBagging, SMOTEBagging, Roughly Balanced Bagging, and Bagging Ensemble Variation. The data were taken from the UCI Repository and comprise 16 datasets, eight of which have a low class-imbalance problem while the rest are categorized as highly imbalanced. All datasets are two-class problems: the class with fewer instances is treated as the minority class and the other as the majority class. The result of this research is that the bagging-based methods give better results than classical methods such as the classification tree.
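
    As a concrete illustration of the idea behind these methods, the sketch below implements a minimal UnderBagging-style ensemble in Python with scikit-learn: each bagging round undersamples the majority class down to the size of the minority class before fitting a decision tree, and the trees then vote. The function names and parameter values are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal UnderBagging-style ensemble sketch (illustrative, not the
# paper's exact algorithm): each round undersamples the majority class
# to the minority-class size, fits a tree, and the trees vote.
# Assumes integer class labels (e.g. 0 for majority, 1 for minority).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_underbagging(X, y, n_estimators=10, seed=0):
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)
    trees = []
    for _ in range(n_estimators):
        # Undersample the majority class to match the minority-class size.
        sampled_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, sampled_maj])
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def predict_underbagging(trees, X):
    # Combine the individual trees by majority vote.
    votes = np.stack([t.predict(X) for t in trees])
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```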

    Survey on highly imbalanced multi-class data

    Machine learning technology has a massive impact on society because it offers solutions to many complicated problems such as classification, clustering analysis, and prediction, especially during the COVID-19 pandemic. Data distribution in machine learning has been an essential aspect of providing unbiased solutions. From the earliest literature published on highly imbalanced data until recently, machine learning research has focused mostly on binary classification problems. Research on highly imbalanced multi-class data remains largely unexplored, even though better analysis and prediction are needed for handling Big Data. This study reviews the models and techniques for handling highly imbalanced multi-class data, along with their strengths, weaknesses, and related domains. Furthermore, the paper uses statistical methods to explore a case study with a severely imbalanced dataset. This article aims to (1) understand the trend of highly imbalanced multi-class data through analysis of the related literature; (2) analyze the previous and current methods for handling highly imbalanced multi-class data; (3) construct a framework for highly imbalanced multi-class data. The chosen highly imbalanced multi-class dataset is also analyzed with current machine learning methods and techniques, followed by discussion of open challenges and the future direction of highly imbalanced multi-class data. Finally, this paper presents a novel framework for highly imbalanced multi-class data. We hope this research can provide insights into the potential development of better methods and techniques to handle and manipulate highly imbalanced multi-class data.
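
    As a small companion to the kind of case-study analysis described above, the snippet below computes per-class counts and imbalance ratios for a multi-class label vector; the function name and example labels are illustrative and not taken from the survey.

```python
# Minimal sketch: quantify class imbalance in a multi-class label vector.
# (Illustrative only; the survey's own case-study pipeline is not shown.)
import numpy as np
from collections import Counter

def imbalance_summary(y):
    counts = Counter(y)
    majority = max(counts.values())
    # Imbalance ratio of each class relative to the largest class.
    return {label: majority / n for label, n in counts.items()}

y = np.array(["A"] * 950 + ["B"] * 40 + ["C"] * 10)
print(imbalance_summary(y))  # {'A': 1.0, 'B': 23.75, 'C': 95.0}
```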

    A Feasibility Study of Azure Machine Learning for Sheet Metal Fabrication

    The research demonstrated that machine learning can give sheet metal fabrication machines a competitive advantage. Among the various possible applications of machine learning, it was decided to focus on predictive maintenance. Implementation of the predictive service is accomplished with Microsoft Azure Machine Learning. The aim was to demonstrate to the stakeholders at the case company the potential lying in machine learning. It was found that, although machine learning technologies are founded on sophisticated algorithms and mathematics, they can still be utilized and bring benefits with moderate effort. The significance of this study lies in demonstrating the potential of machine learning for improving operations management, especially for sheet metal fabrication machines.

    Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

    These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, which was held at the Technological University Dublin, Dublin, Ireland, from the 29th to the 31st of August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI).

    Automatic extraction of definitions

    Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2014. This doctoral research work provides a set of methods and heuristics for building a definition extractor or for fine-tuning an existing one. In order to develop and test the architecture, a generic definition extractor for the Portuguese language was built. Furthermore, the methods were tested in the construction of an extractor for two languages other than Portuguese, namely English and, less extensively, Dutch. The approach presented in this work makes the proposed extractor completely different in nature from the other works in the field. Most systems that automatically extract definitions have been constructed with a specific corpus on a specific topic in mind, and are based on the manual construction of a set of rules or patterns capable of identifying a definition in a text. This research focused on three types of definitions, characterized by the connector between the defined term and its description. The strategy adopted can be seen as a "divide and conquer" approach. Unlike other works representing the state of the art, specific heuristics were developed to deal with the different types of definitions, namely copula, verbal, and punctuation definitions. We used a different methodology for each type of definition: rule-based methods to extract punctuation definitions, machine learning with sampling algorithms for copula definitions, and machine learning with a method to increase the number of positive examples for verbal definitions. This architecture is justified by the increasing linguistic complexity that characterizes the different types of definitions. Numerous experiments have led to the conclusion that punctuation definitions are easily described using a set of rules. These rules can be easily adapted to the relevant context and translated into other languages. However, to deal with the other two definition types, the exclusive use of rules is not enough to obtain good performance, and more advanced methods are required, in particular a machine learning based approach. Unlike other similar systems, which were built with a specific corpus or a specific domain in mind, the one reported here is meant to obtain good results regardless of the domain or context. All the decisions made in the construction of the definition extractor take this central objective into consideration. Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/36732/2007).
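
    To make the "divide and conquer" idea concrete, the sketch below shows a toy rule-based extractor for punctuation and copula definitions using regular expressions; the patterns and function names are illustrative assumptions, not the rules actually used in the thesis.

```python
# Toy rule-based definition extractor (illustrative; not the thesis rules).
# Punctuation definitions: "Term: description" / "Term - description".
# Copula definitions: "Term is/are a(n) description".
import re

PUNCT_PATTERN = re.compile(r"^(?P<term>[A-Z][\w ]{1,40})\s*[:\u2013-]\s*(?P<definition>.+)$")
COPULA_PATTERN = re.compile(r"^(?P<term>[A-Z][\w ]{1,40})\s+(?:is|are)\s+(?:an?\s+)?(?P<definition>.+)$")

def extract_definitions(sentences):
    found = []
    for s in sentences:
        for kind, pattern in (("punctuation", PUNCT_PATTERN), ("copula", COPULA_PATTERN)):
            m = pattern.match(s.strip())
            if m:
                found.append((kind, m.group("term").strip(), m.group("definition").strip()))
                break  # take the first rule that fires for this sentence
    return found

sentences = [
    "Lexicon: the vocabulary of a language or domain.",
    "A parser is a program that analyses the structure of a sentence.",
]
print(extract_definitions(sentences))
```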

    Pattern recognition using genetic programming for classification of diabetes and modulation data

    The field of science whose goal is to assign each input object to one of a given set of categories is called pattern recognition. A standard pattern recognition system can be divided into two main components: feature extraction and pattern classification. During feature extraction, the information relevant to the problem is extracted from raw data, prepared as features and passed to a classifier for assignment of a label. Generally, the extracted feature vector has a fairly large number of dimensions, on the order of hundreds to thousands, which increases the computational complexity significantly. Feature generation is introduced to handle this problem by filtering out the unwanted features. The functionality of feature generation has become very important in modern pattern recognition systems, as it not only reduces the dimensionality of the data but also increases the classification accuracy. A genetic programming (GP) based framework has been utilised in this thesis for feature generation. GP is a process based on biological evolution in which combinations of the original features are evolved. The stronger features propagate through this evolution while weaker features are discarded. The process of evolution is optimised so as to improve the discriminatory power of the features in every new generation. The final generated features have more discriminatory power than the original features, making the job of the classifier easier. One of the main problems in GP is a tendency towards suboptimal convergence. In this thesis, the response of the features for each input instance, which gives insight into the strengths and weaknesses of the features, is used to avoid suboptimal convergence. These strengths and weaknesses are utilised to find the right partners during the crossover operation, which not only helps to avoid suboptimal convergence but also makes the evolution more effective. In order to thoroughly examine the capabilities of GP for feature generation and to cover different scenarios, different combinations of GP are designed, each differing in the way the capability of the features to solve the problem (the fitness function) is evaluated. In this research, the Fisher criterion, a Support Vector Machine and an Artificial Neural Network have been used to evaluate the fitness function for binary classification problems, while a K-nearest neighbour classifier has been used for fitness evaluation of multi-class classification problems. Two real-world classification problems, diabetes detection and modulation classification, are used to evaluate the performance of GP for feature generation. These two problems belong to different categories: diabetes detection is a binary classification problem, while modulation classification is a multi-class classification problem. Applying GP to both problems helps to evaluate its performance for both categories. A series of experiments is conducted to evaluate and compare the results obtained using GP. The results demonstrate the superiority of GP-generated features over features generated by conventional methods.
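
    As a simplified illustration of GP-based feature generation with a Fisher-criterion fitness, the sketch below evolves expression trees over the original features of a binary-labelled dataset. All names, operators, and parameter values are illustrative assumptions, and the thesis's response-based partner selection during crossover is not reproduced here.

```python
# Minimal sketch of GP-style feature construction with a Fisher-criterion
# fitness for a binary-labelled NumPy dataset (X, y with labels in {0, 1}).
# Illustrative only; not the thesis's full GP framework.
import numpy as np

rng = np.random.default_rng(0)
OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply}

def random_tree(n_features, depth=3):
    """Grow a random expression tree over the original feature indices."""
    if depth == 0 or rng.random() < 0.3:
        return ("feat", int(rng.integers(n_features)))
    op = str(rng.choice(list(OPS)))
    return (op, random_tree(n_features, depth - 1), random_tree(n_features, depth - 1))

def evaluate(tree, X):
    """Compute the constructed feature for every sample in X."""
    if tree[0] == "feat":
        return X[:, tree[1]]
    return OPS[tree[0]](evaluate(tree[1], X), evaluate(tree[2], X))

def fisher_score(f, y):
    """Fisher criterion: between-class separation over within-class spread."""
    f0, f1 = f[y == 0], f[y == 1]
    return (f0.mean() - f1.mean()) ** 2 / (f0.var() + f1.var() + 1e-12)

def crossover(a, b):
    """Replace a random subtree of `a` with the whole of `b` (simplified)."""
    if a[0] == "feat" or rng.random() < 0.5:
        return b
    children = list(a[1:])
    i = int(rng.integers(len(children)))
    children[i] = crossover(children[i], b)
    return (a[0], *children)

def evolve(X, y, pop_size=40, generations=20):
    """Keep the better half each generation and refill it with crossovers."""
    pop = [random_tree(X.shape[1]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fisher_score(evaluate(t, X), y), reverse=True)
        elite = pop[: pop_size // 2]
        pop = elite + [crossover(elite[int(rng.integers(len(elite)))],
                                 elite[int(rng.integers(len(elite)))])
                       for _ in elite]
    return max(pop, key=lambda t: fisher_score(evaluate(t, X), y))
```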