    A Text Mining Based Approach for Mining Customer Attribute Data on Undefined Quality Problem

    Understanding how consumers perceive quality is a key issue in supply chain management. However, as market structures continue to deepen, traditional evaluation methods based on SERVQUAL are unable to identify all issues related to customer quality or to supply solutions. The maturation of data mining technology has opened the possibility of mining customer attribute data on quality problems from unstructured data. Taking the consumer perspective, this research uses an unsupervised machine learning text mining approach and the Recursive Neural Tensor Network (RNTN) to resolve the attribution process for undefined quality problems. It was found that the consumer quality perception system has a typical line of sight that helps consumers quickly grasp the logical structure of a quality problem. Although attributions related to quality problems are highly scattered, a highly unified view was found to exist within each group, and a strategy for solving the undefined quality problem, reached through group consensus, was endorsed by 61% of the consumers.
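The Recursive Neural Tensor Network named above builds a vector for each phrase by composing child vectors bottom-up over a binary parse tree. The sketch below shows only that composition step, assuming the formulation commonly attributed to Socher et al. (2013), p = tanh([a;b]ᵀ V [a;b] + W[a;b]) with no bias term. The toy weights, embeddings, and the helper names `rntn_compose` and `encode` are hypothetical, training (which needs labelled phrases) is not shown, and this is not the authors' model.

```python
import math
from typing import Dict, List, Tuple, Union

Vector = List[float]
Matrix = List[List[float]]
Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf word or a (left, right) pair


def matvec(W: Matrix, x: Vector) -> Vector:
    """Standard matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def bilinear(x: Vector, M: Matrix) -> float:
    """Quadratic form x^T M x for one tensor slice M."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))


def rntn_compose(a: Vector, b: Vector, V: List[Matrix], W: Matrix) -> Vector:
    """Parent vector p_k = tanh(x^T V[k] x + (W x)_k), where x = [a; b].

    V holds d slices of size 2d x 2d; W has shape d x 2d (toy, untrained).
    """
    x = a + b  # concatenation of the two child vectors
    linear = matvec(W, x)
    return [math.tanh(bilinear(x, V[k]) + linear[k]) for k in range(len(a))]


def encode(tree: Tree, embeddings: Dict[str, Vector], V: List[Matrix], W: Matrix) -> Vector:
    """Recursively encode a binary parse tree bottom-up."""
    if isinstance(tree, str):
        return embeddings[tree]  # leaf: look up the word vector
    left, right = tree
    return rntn_compose(encode(left, embeddings, V, W),
                        encode(right, embeddings, V, W), V, W)


# Toy example with d = 2 (hypothetical weights and embeddings).
d = 2
identity = [[1.0 if i == j else 0.0 for j in range(2 * d)] for i in range(2 * d)]
zeros = [[0.0] * (2 * d) for _ in range(2 * d)]
V = [identity, zeros]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
embeddings = {"poor": [1.0, 0.0], "packaging": [0.0, 1.0]}
phrase_vector = encode(("poor", "packaging"), embeddings, V, W)
```

The tensor term xᵀV[k]x contains products of components of both children, which is what lets an RNTN model multiplicative interactions between words; a trained network would additionally attach a classifier to each node vector.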

    Information System Articulation Development - Managing Veracity Attributes and Quantifying Relationship with Readability of Textual Data

    Textual data are often disorganized or misinterpreted because Big Data is unstructured and spans multiple dimensions. Managing readable alphanumeric textual data and its analytics is challenging. In spatial dimensions, facts can be ambiguous and inconsistent, which complicates interpretation and the discovery of new knowledge. The information can be wordy, erratic, and noisy. This research aims to assimilate data characteristics through Information System (IS) artefacts suited to data analytics, especially in application domains that involve big data sources. Data heterogeneity and multidimensionality can both enable and preclude IS-guided veracity models in the data integration process, including in customer analytics services. The veracity of big data can therefore affect visualization and value, including the qualitative enhancement of knowledge drawn from large volumes of textual data. The way veracity features are construed in each schematic, semantic, and syntactic attribute dimension, across several IS artefacts and relevant documents, can robustly enhance the readability of textual data.

    Data-driven feedback for content production in eHealth services

    Web analytics has shown significant potential for the continuous improvement of web-based services and applications. By analyzing interaction data collected from web applications, it is possible to study in detail how the applications are used. The focus of this study is to analyze whether interaction data collected with the Piwik PRO web analytics platform using JavaScript tagging can provide sufficient detail about user behaviour and interaction in a modern single-page web application. Furthermore, the analysis seeks to determine whether the collected data can be refined in a way that helps the content managers of the web application continuously improve the content and spot dysfunctional content. The research is based on Omapolku, a Finnish public e-health service providing digital services for personalized healthcare. The analysis focuses on evaluating the digital treatment pathways in Omapolku, which provide various types of information and utilities designed for the needs of specific patient groups. The evaluation targets the graphical user interface of the treatment pathway view and analyzes a sample dataset consisting of actions performed by the users. The data is analyzed with general web analytics metrics and with statistical methods of web usage mining. The results show that the interaction data can provide sufficient detail for evaluating general usage metrics and basic usage patterns. However, the data does not provide the information needed to identify most actions performed by the users, which makes it practically impossible to link the data to the front-end components of the user interface. As an outcome of this study, it is recommended that additional identifiers be added to the front-end components of the treatment pathway interface and that the JavaScript tagging script be modified to record the corresponding identifiers and the action context.
In addition, a novel prototype was designed as a solution to the identified challenges and to support the work of the content managers.
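The finding that most recorded actions cannot be linked to front-end components can be made concrete with a small web usage mining sketch. It assumes a hypothetical flattened event export with the fields (session_id, timestamp, action, component_id), where component_id is None when the tagging script recorded no component identifier; these field names are illustrative and are not Piwik PRO's export schema. The code rebuilds per-session action sequences, counts first-order transitions between actions (a basic usage pattern), and measures the share of events that carry a component identifier.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical event record: (session_id, timestamp, action, component_id).
Event = Tuple[str, int, str, Optional[str]]


def session_sequences(events: List[Event]) -> Dict[str, List[str]]:
    """Group events by session and order them by timestamp."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for session_id, _ts, action, _component in sorted(events, key=lambda e: (e[0], e[1])):
        sessions[session_id].append(action)
    return dict(sessions)


def transition_counts(sequences: Dict[str, List[str]]) -> Counter:
    """Count first-order (action -> next action) transitions within sessions."""
    counts: Counter = Counter()
    for seq in sequences.values():
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1
    return counts


def attributable_share(events: List[Event]) -> float:
    """Fraction of events that can be linked to a front-end component."""
    if not events:
        return 0.0
    linked = sum(1 for e in events if e[3] is not None)
    return linked / len(events)


# Hypothetical sample log.
log: List[Event] = [
    ("s1", 1, "open_pathway", "pathway-header"),
    ("s1", 2, "click", None),
    ("s1", 3, "click", "task-list"),
    ("s2", 1, "open_pathway", "pathway-header"),
    ("s2", 2, "close", None),
]
seqs = session_sequences(log)
transitions = transition_counts(seqs)
share = attributable_share(log)
```

In this toy log only three of five events (60%) are attributable to a component. Adding identifiers to the components, as the study recommends, would raise that share and allow transitions to be reported per component rather than per generic action.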

    Recommendation systems and crowdsourcing: a good wedding for enabling innovation? Results from technology affordances and constraints theory

    Recommendation Systems (RS) have come a long way since their first appearance on e-commerce platforms. Since then, evolved Recommendation Systems have been successfully integrated into social networks. Now it is time to test their usability and replicate their success in exciting new areas of web-enabled phenomena, one of which is crowdsourcing. Research in the Information Systems (IS) field is investigating the need for, benefits of, and challenges of linking the two phenomena. So far, empirical works have only highlighted the need to implement these techniques for task assignment on crowdsourcing distributed work platforms and the resulting benefits for contributors and firms. We review the variety of tasks that can be crowdsourced through these platforms and theoretically evaluate the efficiency of using RS to recommend tasks on creative crowdsourcing platforms. Adopting Technology Affordances and Constraints Theory, an emerging perspective in the IS literature for understanding technology use and its consequences, we anticipate the tensions that this implementation can generate.
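The abstract evaluates task recommendation on creative crowdsourcing platforms only theoretically, so the following is a generic illustration rather than the paper's method. It sketches one common Recommendation System technique, content-based filtering: each contributor profile and each task is represented as a bag of skill tags, and tasks are ranked by cosine similarity to the profile. The tags, task identifiers, and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_tasks(profile: Counter, tasks: Dict[str, Counter], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k tasks most similar to a contributor profile."""
    scored = [(task_id, cosine(profile, tags)) for task_id, tags in tasks.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties broken by id
    return scored[:k]


# Hypothetical contributor profile and task catalogue.
profile = Counter({"design": 2, "logo": 1})
tasks = {
    "t1": Counter({"design": 1, "logo": 1}),
    "t2": Counter({"translation": 1}),
    "t3": Counter({"design": 1}),
}
top = recommend_tasks(profile, tasks, k=2)
```

Ties are broken by task identifier so the ranking is deterministic. Collaborative filtering, which ranks tasks by what similar contributors chose, is the usual alternative when task descriptions are sparse.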

    Implementation of the C4.5 algorithm for micro, small, and medium enterprises classification

    The coronavirus disease 2019 (COVID-19) pandemic has spread to many countries, including Indonesia. The resulting large-scale social restrictions (Indonesian: Pembatasan Sosial Berskala Besar, PSBB) have paralyzed the economy in Indonesia, and micro, small, and medium enterprises (MSMEs) have seen their turnover decrease, with some even going out of business. The Department of Cooperatives and Small and Medium Enterprises (SMEs) in Pesawaran Regency, Lampung, oversees 3,808 MSMEs, whose development should be monitored as a basis for policy decisions. However, classifying MSMEs into their categories is problematic because the existing data must be checked one by one, which takes a long time. This study therefore proposes the C4.5 algorithm to solve this problem and compares it with the naïve Bayes algorithm to determine which performs better and is better suited to this case. The results showed that 91% of MSMEs fell into the micro category, 8% into the small category, and 1% into the medium category. C4.5 outperformed naïve Bayes, achieving an accuracy of 99.2% versus 95.41%, a difference of 3.79 percentage points.
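C4.5 grows a decision tree by repeatedly splitting on the attribute with the highest gain ratio, that is, information gain divided by split information. The sketch below shows only that selection criterion for categorical attributes. The MSME attributes (`assets_band`, `workers_band`) and records are hypothetical, since the abstract does not list the features used, and continuous-threshold splits, missing-value handling, and pruning, which full C4.5 also performs, are omitted.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: List[Row], attr: str, target: str) -> float:
    """C4.5 split criterion: information gain / split information."""
    n = len(rows)
    partitions: Dict[Hashable, List[Hashable]] = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    base = entropy([r[target] for r in rows])
    conditional = sum(len(p) / n * entropy(p) for p in partitions.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    gain = base - conditional
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows: List[Row], attrs: List[str], target: str) -> str:
    """Attribute C4.5 would split on first (highest gain ratio)."""
    return max(attrs, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical MSME records; real classification criteria are not given here.
msmes: List[Row] = [
    {"assets_band": "low", "workers_band": "few", "category": "micro"},
    {"assets_band": "low", "workers_band": "many", "category": "micro"},
    {"assets_band": "high", "workers_band": "few", "category": "small"},
    {"assets_band": "high", "workers_band": "many", "category": "small"},
]
chosen = best_attribute(msmes, ["assets_band", "workers_band"], "category")
```

Dividing by split information penalizes attributes with many distinct values, which is the main difference from ID3's plain information gain.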

    Predicting customer satisfaction with product reviews: A comparative study of some machine learning approaches

    Over the past two decades, e-commerce platforms have developed exponentially, and with this growth have come several challenges posed by the vast amount of information they generate. Customers not only buy products online but also obtain valuable information about products they intend to buy through online platforms. Customers share their experiences by providing feedback, which creates a pool of textual information that grows every day. This feedback contains both subjective and objective text carrying rich information about customers' behaviour, likes and dislikes towards a product, and sentiments. Moreover, this information can help customers who have yet to buy or who are still in the decision-making process. This thesis compares four supervised machine learning approaches for predicting customer satisfaction: Naïve Bayes, Support Vector Machines (SVM), Logistic Regression (LR), and Decision Tree (DT). The models use term frequency-inverse document frequency (TF-IDF) vectorization of the training and test sets. The models are applied after basic pre-processing of the text data, which includes lowercasing, lemmatization, stop-word removal, smiley removal, and digit removal. We compare the performance of the models using accuracy, precision, recall, and F1-score. SVM outperforms the other models with an accuracy of 83%, while Naïve Bayes, LR, and DT reach accuracies of 82%, 78%, and 76%, respectively. We also evaluate the classifiers' performance using confusion matrices.
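The pre-processing and vectorization steps described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions: the stop-word list and emoticon pattern are tiny hypothetical stand-ins, lemmatization is omitted (it normally requires a lexical resource such as WordNet), and TF-IDF uses the textbook form tf(t, d) · ln(N / df(t)). Libraries such as scikit-learn's `TfidfVectorizer` apply smoothed, normalized variants by default, so their weights will differ.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word list (a real pipeline would use a fuller one).
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}
# Crude pattern for text emoticons such as :) ;-( =d :p (after lowercasing).
EMOTICON = re.compile(r"[:;=][-']?[()dp]")


def preprocess(text: str) -> List[str]:
    """Lowercase, strip emoticons and digits, tokenize, drop stop words."""
    text = text.lower()
    text = EMOTICON.sub(" ", text)
    text = re.sub(r"\d+", " ", text)
    tokens = re.findall(r"[a-z]+", text)  # also discards punctuation and emoji
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Textbook TF-IDF: (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors: List[Dict[str, float]] = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


# Hypothetical reviews.
reviews = [
    "Great product, works great :)",
    "Terrible product. Broke after 2 days",
    "Great value",
]
tokenized = [preprocess(r) for r in reviews]
vectors = tfidf(tokenized)
```

The resulting sparse dictionaries would then be fed to the classifiers; under this variant, a term that appears in every review receives a weight of zero.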