8 research outputs found

    Predicting user satisfaction from textual reviews (Käyttäjätyytyväisyyden ennustaminen tekstimuotoisista arvosteluista)

    This work applies feature extraction and classification methods and tests their performance in estimating user satisfaction from textual data. Features are extracted from the text with the TF-IDF algorithm, and the resulting features are fed to the machine learning methods under comparison. The methods used are the random forest and the support vector machine, the latter in implementations using linear and radial (RBF) kernels. Both classification- and regression-based versions of the methods are compared. Based on the literature, the chosen methods are in common use for problems of analysing the meaning and sentiment of text. In addition to presenting the algorithms, the entire machine learning process for the task is walked through, starting from data preprocessing. The work presents the results of testing the algorithms and, based on them, assesses the suitability of the methods for predicting user satisfaction from textual reviews.
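The TF-IDF weighting step described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the thesis's implementation (which most likely used a library such as scikit-learn); the toy reviews, the function name `tfidf`, and the smoothed-IDF convention log((1+N)/(1+df)) are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute smoothed TF-IDF weights for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times smoothed inverse document frequency;
            # a term occurring in every document gets weight 0.
            t: (c / total) * math.log((1 + n) / (1 + df[t]))
            for t, c in tf.items()
        })
    return weights

# Toy review corpus (hypothetical, not from the thesis's data set).
reviews = [
    "great product works great".split(),
    "terrible product broke fast".split(),
    "works fine".split(),
]
w = tfidf(reviews)
```

Each resulting dictionary maps a term to its weight in that review; these sparse weight vectors are what would be fed to the random forest or support vector machine. Note that rarer terms ("great" in the first review) outweigh terms shared across documents ("product").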

    How to Conduct Rigorous Supervised Machine Learning in Information Systems Research: The Supervised Machine Learning Reportcard [in press]

    Within the last decade, the application of supervised machine learning (SML) has become increasingly popular in the field of information systems (IS) research. Although the choices among different data preprocessing techniques, as well as different algorithms and their individual implementations, are fundamental building blocks of SML results, their documentation—and therefore reproducibility—is inconsistent across published IS research papers. This may be quite understandable, since the goals and motivations for SML applications vary and since the field has been rapidly evolving within IS. For the IS research community, however, this poses a significant challenge, because even with full access to the data, neither a complete evaluation of the SML approaches nor a replication of the research results is possible. Therefore, this article aims to provide the IS community with guidelines for comprehensively and rigorously conducting, as well as documenting, SML research. First, we review the literature on SML process steps and frameworks to extract relevant problem characteristics and the relevant choices to be made when applying SML. Second, we integrate these into a comprehensive “Supervised Machine Learning Reportcard (SMLR)” as an artifact to be used in future SML endeavors. Third, we apply this reportcard to a set of 121 relevant articles published in renowned IS outlets between 2010 and 2018 and demonstrate how and where the documentation of current IS research articles can be improved. Thus, this work contributes to a more complete and rigorous application and documentation of SML approaches, enabling deeper evaluation as well as reproducibility and replication of results in IS research.

    Proposing a data analytics framework for studying innovation performance (A proposição de um framework de Data Analytics para o estudo do desempenho da inovação)

    The aim of this study is to propose a data analytics framework to classify economic sectors by innovation level, on a scale from highly to barely innovative, based on a database of innovation indicators. The problem is to understand how innovation performance behaves in these sectors, given the number of innovative companies they contain and the characteristics they present, and it is formulated as a classification problem. The framework combines methods for data normalization, determination of the number of classes (innovation levels) found in the data, handling of imbalanced classes, feature selection (sector innovation indicators), classification, and estimation of innovation performance (the share of innovating companies in a sector relative to the total sample). Different approaches are tested for each step. Random Forest, Extreme Gradient Boosting, and Support Vector Machine models are used in the observation classification, feature selection, and output estimation steps. To determine the number of classes, managerial and quartile-based approaches are tested. Synthetic Minority Oversampling Technique (SMOTE) variants are tested for balancing the classes. This analytical approach to studying company innovation data helps identify the factors that influence sector innovation performance and supports decision making about fostering actions.
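The class-balancing step can be illustrated with a minimal SMOTE sketch: each synthetic sample is an interpolation between a minority-class point and one of its nearest minority-class neighbours. The function name `smote`, the toy points, and the choice k=2 are illustrative assumptions; the study presumably relied on an off-the-shelf SMOTE implementation:

```python
import random

def smote(minority, n_new, k=2, seed=42):
    """Generate n_new synthetic samples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy 2-D minority-class points (hypothetical innovation indicators).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
new_pts = smote(minority, n_new=4)
```

Each synthetic point lies on a segment between two existing minority points, so the oversampled class stays inside the region the minority already occupies rather than duplicating observations exactly.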

    Project and production management of a steel railway bridge (Gestão de projeto e produção de uma ponte ferroviária em estrutura metálica)

    Project management is currently a very effective way of organizing all the steps needed to carry out a piece of work oriented toward the best possible results. One of these steps, which can be considered an integral part of project management, is production management: managing a company's input resources and transforming them into value-added products and/or services. Nowadays, products and services with the best price-to-quality ratio are increasingly sought after. This constant demand requires companies to adapt or find new solutions for their project management strategies in order to make their production ever more efficient without neglecting quality. The work developed here covers the project and production management of a steel railway bridge. To that end, several deliverables were established: a work plan setting the chronological order of execution of each task; an analysis of the structure from a manufacturing standpoint for subsequent optimization; a 3D model from which the fabrication drawings were obtained; a manufacturing plan organizing the operations inside the workshop; quality control; a study of the routes and means of transport required; and an assembly plan for erecting the structure on site. The final assessment of the work made it possible to see in which phases intervention is needed to correct errors, enabling the company to improve its processes in future projects.

    Performance modelling for scalable deep learning

    Performance modelling for scalable deep learning is important for quantifying the efficiency of large parallel workloads. Performance models are used to obtain run-time estimates by modelling various aspects of an application on a target system, and designing accurate models requires comprehensive analysis. Limitations of current performance models include poor explainability of the computation time of a neural network's internal processes and limited applicability to particular architectures. Existing performance models for deep learning broadly fall into two methodologies: analytical modelling and empirical modelling. Analytical modelling takes a transparent approach, converting the internal mechanisms of the model or application into a mathematical model that corresponds to the goals of the system. Empirical modelling predicts outcomes based on observation and experimentation, characterizing algorithm performance using sample data, and is a good alternative to analytical modelling. However, both approaches suffer from poor explainability of internal computation time and poor generalisation. To address these issues, the analytical and empirical approaches are hybridized, leading to a novel generic performance model that provides a general expression of a deep neural network framework in a distributed environment and allows accurate performance analysis and prediction. The contributions can be summarized as follows. In the initial study, a comprehensive literature review led to the development of a performance model based on synchronous stochastic gradient descent (S-SGD) for analysing the execution-time performance of deep learning frameworks in a multi-GPU environment.
    This model was evaluated on three deep learning models (a Convolutional Neural Network (CNN), an Autoencoder (AE), and a Multilayer Perceptron (MLP)), implemented in three popular deep learning frameworks (MXNet, Chainer, and TensorFlow, respectively), following an analytical approach. Additionally, a generic expression for the performance model was formulated, considering intrinsic parameters and extrinsic scaling factors that affect computing time in a distributed environment. This formulation posed a global optimization problem whose cost function depends on unknown constants within the generic expression; differential evolution was used to identify the best-fitting values, matching experimentally determined computation times. Furthermore, to enhance the accuracy and stability of the performance model, regularization techniques were applied. Lastly, the proposed generic performance model was experimentally evaluated in a real-world application. The results of this evaluation provided valuable insights into the influence of hyperparameters on performance, demonstrating the robustness and applicability of the performance model in understanding and optimizing model behavior.
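The constant-fitting step can be sketched as follows. A hypothetical step-time model, T(p) = t_fixed + t_compute/p + t_comm*p, stands in for the thesis's generic expression, and a minimal differential evolution (rand/1/bin) recovers its unknown constants from synthetic "measurements". The model form, constants, bounds, and DE settings are all illustrative assumptions, not those used in the work:

```python
import random

def de_fit(loss, bounds, pop_size=20, gens=200, f=0.6, cr=0.9, seed=0):
    """Minimal differential evolution (rand/1/bin) over box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [loss(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct individuals other than i for mutation.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # force at least one mutated component
            trial = [
                min(max(a[d] + f * (b[d] - c[d]), bounds[d][0]), bounds[d][1])
                if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            tc = loss(trial)
            if tc <= cost[i]:  # greedy selection
                pop[i], cost[i] = trial, tc
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]

# Hypothetical step-time model for synchronous SGD on p workers:
# compute work divided among workers plus communication growing with p.
def model(theta, p):
    t_fixed, t_comp, t_comm = theta
    return t_fixed + t_comp / p + t_comm * p

# Synthetic "measurements" generated from known constants (0.5, 8.0, 0.1).
workers = [1, 2, 4, 8, 16]
measured = [0.5 + 8.0 / p + 0.1 * p for p in workers]

def sse(theta):
    """Sum of squared errors between the model and the measurements."""
    return sum((model(theta, p) - m) ** 2 for p, m in zip(workers, measured))

theta, err = de_fit(sse, bounds=[(0, 2), (0, 20), (0, 1)])
```

With clean synthetic data the fit recovers constants close to the generating values; on real timings the residual would reflect measurement noise, which is where the regularization mentioned above becomes relevant.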