315 research outputs found
BAGGING BASED ENSEMBLE ANALYSIS IN HANDLING UNBALANCED DATA ON CLASSIFICATION MODELING
The purpose of this study is to identify, through a literature review, the algorithms behind bagging-based methods for handling class imbalance. The study examines bagging-based ensemble methods including UnderBagging, OverBagging, UnderOverBagging, SMOTEBagging, Roughly Balanced Bagging, and Bagging Ensemble Variation. Sixteen datasets from the UCI Repository are used, eight with a low degree of class imbalance and eight with a high degree. All datasets have two classes: the class with fewer examples is treated as the minority class and the other as the majority class. The results show that the bagging-based methods give better results than classical methods such as the classification tree.
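The resampling idea behind UnderBagging can be illustrated with a minimal, hypothetical sketch (the function names, the one-dimensional decision stump, and the toy data are illustrative, not the study's actual base learner or datasets): each round keeps every minority example, randomly undersamples the majority class to the same size, fits a weak learner on the balanced subset, and predicts by majority vote.

```python
import random

def best_stump(X, y):
    """Exhaustively fit a 1-D decision stump (threshold + orientation)."""
    best_t, best_pos, best_err = None, None, len(X) + 1
    for t in sorted(set(X)):
        for pos in (0, 1):  # pos = label predicted for x <= t
            err = sum((pos if x <= t else 1 - pos) != c for x, c in zip(X, y))
            if err < best_err:
                best_t, best_pos, best_err = t, pos, err
    return best_t, best_pos

def under_bagging(X, y, rounds=5, seed=0):
    """UnderBagging sketch: each round keeps all minority examples (label 1),
    undersamples the majority (label 0) to the same size, and fits a stump;
    the ensemble predicts by majority vote."""
    rng = random.Random(seed)
    minority = [i for i, c in enumerate(y) if c == 1]
    majority = [i for i, c in enumerate(y) if c == 0]
    stumps = []
    for _ in range(rounds):
        idx = minority + rng.sample(majority, len(minority))
        stumps.append(best_stump([X[i] for i in idx], [y[i] for i in idx]))
    def predict(x):
        votes = [pos if x <= t else 1 - pos for t, pos in stumps]
        return int(2 * sum(votes) >= len(votes))
    return predict
```

OverBagging and SMOTEBagging differ only in the resampling step (oversampling or synthesizing minority examples instead of discarding majority ones); the bagging-and-vote structure is the same.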
Survey on highly imbalanced multi-class data
Machine learning technology has a massive impact on society because it offers solutions to many complicated problems such as classification, cluster analysis, and prediction, especially during the COVID-19 pandemic. Data distribution has been an essential aspect of providing unbiased machine learning solutions. From the earliest literature on highly imbalanced data until recently, machine learning research has focused mostly on binary classification problems. Research on highly imbalanced multi-class data remains largely unexplored, even though better analysis and prediction are needed for handling Big Data. This study reviews the models and techniques for handling highly imbalanced multi-class data, along with their strengths, weaknesses, and related domains. Furthermore, the paper uses a statistical method to explore a case study with a severely imbalanced dataset. This article aims to (1) understand the trend of highly imbalanced multi-class data through an analysis of the related literature; (2) analyze previous and current methods for handling highly imbalanced multi-class data; and (3) construct a framework for highly imbalanced multi-class data. The chosen highly imbalanced multi-class dataset is also analyzed with current machine learning methods, followed by a discussion of open challenges and the future direction of highly imbalanced multi-class data. Finally, this paper presents a novel framework for highly imbalanced multi-class data. We hope this research provides insights for developing better methods and techniques to handle highly imbalanced multi-class data.
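Why highly imbalanced multi-class data demands metrics beyond plain accuracy can be shown with a small sketch (the function names and toy label vectors are illustrative, not from the survey): a classifier that always predicts the majority class scores high accuracy yet zero recall on every minority class.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the largest to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def macro_recall(y_true, y_pred):
    """Mean per-class recall; weights every class equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

y_true = [0] * 8 + [1] + [2]   # highly imbalanced 3-class labels
y_pred = [0] * 10              # majority-class-only predictor
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here accuracy is 0.8 while macro-averaged recall is only 1/3, which is why imbalance-aware evaluation is central to the methods this survey reviews.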
A Feasibility Study of Azure Machine Learning for Sheet Metal Fabrication
The research demonstrated that sheet metal fabrication machines can utilize machine learning to gain a competitive advantage. Among the various possible applications of machine learning, the study focuses on predictive maintenance. The predictive service is implemented with Microsoft Azure Machine Learning. The aim was to demonstrate to the stakeholders at the case company the potential of machine learning. It was found that although machine learning technologies are founded on sophisticated algorithms and mathematics, they can still be applied and bring benefits with moderate effort. The significance of this study lies in demonstrating the potential of machine learning to improve operations management, especially for sheet metal fabrication machines.
Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science
These proceedings contain the papers accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, held at the Technological University Dublin, Ireland, from 29 to 31 August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI).
Automatic extraction of definitions
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2014.
This doctoral research provides a set of methods and heuristics for building a definition extractor or for fine-tuning an existing one. In order to develop and test the architecture, a generic definition extractor for the Portuguese language was built. Furthermore, the methods were tested in the construction of an extractor for two languages other than Portuguese: English and, less extensively, Dutch. The approach presented in this work makes the proposed extractor completely different in nature from the other works in the field: most systems that automatically extract definitions have been constructed with a specific corpus on a specific topic in mind, and are based on the manual construction of a set of rules or patterns capable of identifying a definition in a text.
This research focused on three types of definitions, characterized by the connector between the defined term and its description. The strategy adopted can be seen as a "divide and conquer" approach. Differently from the other works representing the state of the art, specific heuristics were developed to deal with the different types of definitions, namely copula, verbal, and punctuation definitions.
A different methodology was used for each type of definition: rule-based methods to extract punctuation definitions, machine learning with sampling algorithms for copula definitions, and machine learning with a method to increase the number of positive examples for verbal definitions. This architecture is justified by the increasing linguistic complexity that characterizes the different types of definitions. Numerous experiments led to the conclusion that punctuation definitions are easily described using a set of rules. These rules can be easily adapted to the relevant context and translated into other languages. However, to deal with the other two definition types, the exclusive use of rules is not enough to achieve good performance, and more advanced methods are required, in particular a machine learning based approach.
Unlike other similar systems, which were built with a specific corpus or domain in mind, the one reported here is meant to obtain good results regardless of the domain or context. All the decisions made in the construction of the definition extractor take this central objective into consideration.
Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/36732/2007
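As an illustration of why punctuation definitions lend themselves to simple rules, here is a hypothetical regex-based sketch in the spirit of, but not taken from, the thesis: two patterns capture "Term: description" and "Term - description" lines.

```python
import re

# Hypothetical patterns; the thesis's actual rules are language-specific
# and considerably richer than these two.
PATTERNS = [
    re.compile(r"^(?P<term>[A-Z][\w -]{1,40}):\s+(?P<definition>.+)$"),
    re.compile(r"^(?P<term>[A-Z][\w -]{1,40})\s+[-–]\s+(?P<definition>.+)$"),
]

def extract_punctuation_definitions(lines):
    """Return (term, definition) pairs for lines matching a punctuation pattern."""
    found = []
    for line in lines:
        for pat in PATTERNS:
            m = pat.match(line.strip())
            if m:
                found.append((m.group("term"), m.group("definition")))
                break
    return found
```

Copula ("X is a Y") and verbal definitions resist this treatment because the connector words are ambiguous, which is what motivates the machine learning pipeline for those two types.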
Pattern recognition using genetic programming for classification of diabetes and modulation data
The field of science whose goal is to assign each input object to one of a given set of categories is called pattern recognition. A standard pattern recognition system can be divided into two main components: feature extraction and pattern classification. During feature extraction, the information relevant to the problem is extracted from raw data, prepared as features, and passed to a classifier for assignment of a label. Generally, the extracted feature vector has a fairly large number of dimensions, on the order of hundreds to thousands, increasing the computational complexity significantly. Feature generation is introduced to handle this problem by filtering out unwanted features. Feature generation has become very important in modern pattern recognition systems, as it not only reduces the dimensionality of the data but also increases classification accuracy. A genetic programming (GP) based framework is utilised in this thesis for feature generation. GP is a process modelled on biological evolution in which combinations of the original features are evolved. Stronger features propagate through this evolution while weaker features are discarded. The evolution is optimised to improve the discriminatory power of the features in every new generation, so the final generated features have more discriminatory power than the original ones, making the classifier's job easier. One of the main problems in GP is a tendency towards suboptimal convergence. In this thesis, the response of features to each input instance, which gives insight into the strengths and weaknesses of the features, is used to avoid suboptimal convergence. These strengths and weaknesses are used to find the right partners during the crossover operation, which not only helps to avoid suboptimal convergence but also makes the evolution more effective.
In order to thoroughly examine the capabilities of GP for feature generation and to cover different scenarios, different combinations of GP are designed. Each combination differs in how the fitness function, the capability of the features to solve the problem, is evaluated. In this research, the Fisher criterion, a Support Vector Machine, and an Artificial Neural Network are used to evaluate fitness for binary classification problems, while a K-nearest neighbour classifier is used for fitness evaluation on multi-class classification problems. Two real-world classification problems (diabetes detection and modulation classification) are used to evaluate the performance of GP for feature generation. These two problems belong to different categories: diabetes detection is a binary classification problem, while modulation classification is a multi-class classification problem. Applying GP to both problems evaluates its performance in both settings. A series of experiments is conducted to evaluate and compare the results obtained using GP. The results demonstrate the superiority of GP-generated features over features generated by conventional methods.
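To make the Fisher-criterion fitness concrete, here is a minimal, hypothetical sketch (the names, the tiny operator set, and the single-operator individuals are simplifications; the thesis evolves full expression trees with crossover): each individual combines two raw features with one arithmetic operator, and survival each generation is decided by the Fisher criterion (mu_a - mu_b)^2 / (var_a + var_b).

```python
import random

def fisher_score(values, labels):
    """Fisher criterion (mu_a - mu_b)^2 / (var_a + var_b) for one
    candidate feature on a binary problem (labels 0/1)."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / len(a)
    vb = sum((x - mb) ** 2 for x in b) / len(b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)  # epsilon avoids div by zero

# Hypothetical operator set; a full GP system uses a richer function set.
OPS = [lambda u, v: u + v, lambda u, v: u - v, lambda u, v: u * v]

def evolve_feature(X, y, generations=30, pop=20, seed=1):
    """Toy GP loop: an individual (op, i, j) combines raw features i and j;
    the top half survives each generation, ranked by Fisher score."""
    rng = random.Random(seed)
    d = len(X[0])
    rand_ind = lambda: (rng.randrange(len(OPS)), rng.randrange(d), rng.randrange(d))
    def fitness(ind):
        op, i, j = ind
        return fisher_score([OPS[op](row[i], row[j]) for row in X], y)
    population = [rand_ind() for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        population = population[: pop // 2] + [rand_ind() for _ in range(pop - pop // 2)]
    return max(population, key=fitness)
```

Swapping `fisher_score` for a classifier's validation accuracy gives the SVM, ANN, or K-nearest-neighbour fitness variants described above.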
- …