6 research outputs found

    Text Mining for Big Data Analysis in Financial Sector: A Literature Review

    Big data technologies have had a strong impact on different industries since the last decade, and that impact continues today, tending toward omnipresence. The financial sector, like most other sectors, has concentrated its operating activities mostly on the investigation of structured data. However, with the support of big data technologies, information stored in diverse sources of semi-structured and unstructured data can now be harvested. Recent research and practice indicate that such information can be of interest for the decision-making process. Questions about how and to what extent research on data mining in the financial sector has developed, and which tools are used for these purposes, remain largely unexplored. This study aims to answer three research questions: (i) What is the intellectual core of the field? (ii) Which techniques are used in the financial sector for text mining, especially in the era of the Internet, big data, and social media? (iii) Which data sources are most often used for text mining in the financial sector, and for which purposes? To answer these questions, a qualitative analysis of the literature is carried out using a systematic literature review and citation and co-citation analysis.

    Comparative Analysis of Supervised Learning and Transfer Learning Applied to Sentiment Analysis in Text

    Undergraduate thesis (Trabalho de Conclusão de Curso), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2020. Sentiment analysis in text is a widely studied task, and new work on it is published at an increasing rate. Companies that seek to adopt this kind of technology in their business have to understand and decide which implementation strategy is best for their specific scenario. Here, we show that classic, simpler techniques can be the best choice, mainly when computational resources are limited or small performance gains are not critical. To that end, we conduct a comparative analysis of two approaches to sentiment analysis: the classic approach, where supervised learning algorithms such as Support Vector Machine (SVM) and Naive Bayes are used to classify the sentiment (positive or negative), and the modern approach, where transfer learning methods are used. We perform a series of experiments to validate the proposed idea, applying a set of classic supervised-learning algorithms and a set of transfer-learning techniques to four data sets focused on sentiment analysis tasks. The analysis achieved relevant results, showing that classic algorithms remain competitive despite lower accuracy than the modern ones and that they can be a suitable alternative in resource-limited scenarios.

    Analysis of Document Pre-Processing Effects in Text and Opinion Mining

    Typically, textual information is available as unstructured data, which requires processing so that data mining algorithms can handle it; this processing is known as the pre-processing step in the overall text mining process. This paper analyzes the strong impact that the pre-processing step has on most mining tasks. To that end, we propose a methodology that varies distinct combinations of pre-processing steps and analyzes which combination yields high precision. Experiments were performed by comparing combinations of methods such as stemming, term weighting, term elimination based on a low-frequency cut, and stop-word elimination. These combinations were applied in text and opinion mining tasks, from which correct classification rates were computed to highlight the strong impact of the pre-processing combinations. Additionally, we provide graphical representations of each pre-processing combination to show how visual approaches help reveal the effects of processing on document similarities and group formation (i.e., cohesion and separation).
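    The idea of varying combinations of pre-processing steps can be illustrated with a minimal Python sketch. It is not the paper's implementation: the step functions, the toy stop-word list, the naive suffix-stripping stemmer, and the `min_df` threshold are all assumptions made for illustration, and term weighting is left out for brevity.

```python
from collections import Counter
from itertools import combinations

# Toy stop-word list (assumption; a real study would use a curated list).
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}


def remove_stop_words(docs):
    """Drop tokens that appear in the stop-word list."""
    return [[t for t in doc if t not in STOP_WORDS] for doc in docs]


def crude_stem(docs):
    """Naive suffix stripping, a stand-in for a real stemmer."""
    def stem(token):
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                return token[: -len(suffix)]
        return token
    return [[stem(t) for t in doc] for doc in docs]


def low_frequency_cut(docs, min_df=2):
    """Keep only terms that occur in at least `min_df` documents."""
    df = Counter(t for doc in docs for t in set(doc))
    return [[t for t in doc if df[t] >= min_df] for doc in docs]


# Hypothetical step registry; insertion order fixes the order of application.
STEPS = {
    "stop_words": remove_stop_words,
    "stemming": crude_stem,
    "low_freq_cut": low_frequency_cut,
}


def step_combinations():
    """Yield every subset of step names, from no steps to all steps."""
    names = list(STEPS)
    for r in range(len(names) + 1):
        yield from combinations(names, r)


def preprocess(raw_docs, combo):
    """Tokenize on whitespace, then apply the chosen steps in order."""
    docs = [doc.lower().split() for doc in raw_docs]
    for name in combo:
        docs = STEPS[name](docs)
    return docs


if __name__ == "__main__":
    corpus = [
        "The movie is boring",
        "Boring movies and bad acting",
        "The acting is great",
    ]
    for combo in step_combinations():
        print(combo or ("none",), preprocess(corpus, combo))
```

    In a full experiment, each combination's output would be fed to a classifier and the resulting classification rates compared across combinations.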
