3 research outputs found

    The ensemble distance on model-based clustering for regions clustering based on rainfall: The case of rainfall in West Java Indonesia

    Time series clustering is an active research area, and the choice of distance metric largely determines how the clusters form. The ARIMA model is one of the models that can be employed in model-based clustering, but different model selection criteria can select different models, which introduces model uncertainty. In this study, we develop an ensemble distance technique for clustering time series data. Five distances, one for each of five model selection criteria, are computed between each pair of series, and their average is taken as the distance between the two series. In simulations, the ensemble distance method improved clustering accuracy by more than 11%. We applied the method to rainfall levels to find clusters of locations in the Province of West Java (Indonesia). The findings indicate that rainfall patterns within the same cluster are similar, and that the cluster model is an effective and feasible representation of the individual models in a cluster.
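The averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each model selection criterion has already produced a pairwise distance matrix, and the function name `ensemble_distance` and the toy matrices are hypothetical. Only two matrices are used here to keep the example short; the abstract uses five.

```python
import numpy as np

def ensemble_distance(distance_matrices):
    """Average several pairwise distance matrices element-wise.

    Each input is assumed to be an (n, n) matrix of distances between
    the same n series, one matrix per model selection criterion.
    """
    stacked = np.stack([np.asarray(d, dtype=float) for d in distance_matrices])
    if stacked.ndim != 3 or stacked.shape[1] != stacked.shape[2]:
        raise ValueError("expected square matrices of equal shape")
    return stacked.mean(axis=0)

# Hypothetical distances among three series under two criteria.
d_a = np.array([[0.0, 1.0, 4.0],
                [1.0, 0.0, 3.0],
                [4.0, 3.0, 0.0]])
d_b = np.array([[0.0, 3.0, 2.0],
                [3.0, 0.0, 5.0],
                [2.0, 5.0, 0.0]])

ensemble = ensemble_distance([d_a, d_b])
print(ensemble)
```

The resulting matrix can be passed to any clustering routine that accepts precomputed distances; which routine the paper uses is not stated in the abstract.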

    The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling

    In the era of big data and ubiquitous internet connectivity, user feedback data plays a crucial role in product development and improvement. However, extracting valuable insights from the large volume of unstructured text in user feedback is difficult. In this paper, we address this challenge by combining the Contextualized Topic Model (CTM) with the Masked and Permuted Pre-training for Language Understanding (MPNet) model. Our approach aims to create a more accurate and context-aware topic model that improves the understanding of user experiences and opinions. To achieve this, we first search for the optimal number of topics, focusing on generating distinguishable, general, and unique topics. Next, we perform hyperparameter optimization to fine-tune the model and maximize coherence metrics. The resulting model outperforms established topic modeling methods, including LSI, NMF, LDA, HDP, NeuralLDA, ProdLDA, ETM, and the default CTM, achieving the highest coherence (CV) score, 0.7091. The model generates coherent, distinguishable, and relevant user feedback topics that capture the nuanced nature of user feedback data. The topics generated by the model are ‘Music and Audio Streaming,’ ‘Application Performance,’ ‘Banking, Financial Services, and Customer Support,’ ‘User Experience,’ ‘Other Topics,’ ‘Application Content,’ and ‘Application Features.’ Our contributions include a tool for developers to gain deeper insights, prioritize actions, and enhance user satisfaction by incorporating feedback into future product iterations. Furthermore, we introduce a new dataset as an open-source resource for further exploration and validation of user feedback analysis techniques and general natural language processing applications.
    With the proposed approach, we aim to drive business success, improve user experiences, and inform data-driven decision-making, benefiting both developers and users.

    Lung and Infection CT-Scan-Based Segmentation with 3D UNet Architecture and Its Modification

    COVID-19 is a disease that has spread across the world since December 2019. It has had a negative impact on individuals, governments, and the global economy, which led the WHO to declare COVID-19 a PHEIC (Public Health Emergency of International Concern). To date, no medicine can completely cure COVID-19. Therefore, to prevent its spread and reduce its negative impact, an accurate and fast test is needed. Chest radiography imaging technology, such as CXR and CT scans, plays a significant role in the diagnosis of COVID-19. In this study, CT-scan segmentation is carried out using the 3D version of the most recommended segmentation algorithm for biomedical images, namely 3D UNet, and three architectures modified from 3D UNet, namely 3D ResUNet, 3D VGGUNet, and 3D DenseUNet. These four architectures are used in two segmentation cases: binary-class segmentation, where each architecture segments the lung area from a CT scan; and multi-class segmentation, where each architecture segments the lung and infection areas from a CT scan. Before entering the model, the dataset is preprocessed by applying a min-max scaler to scale the pixel values to the range zero to one, and the CLAHE method is applied to eliminate intensity inhomogeneity and noise from the data. Of the four models tested in this study, the original 3D UNet surprisingly produced the best results, although it requires more iterations to reach its maximum performance. For binary-class segmentation, 3D UNet achieved an IoU score, Dice score, and accuracy of 94.32%, 97.05%, and 99.37%, respectively. For multi-class segmentation, 3D UNet achieved an IoU score, Dice score, and accuracy of 81.58%, 88.61%, and 98.78%, respectively.
    The use of a 3D segmentation architecture will be very helpful for medical personnel because, apart from supporting the diagnosis of COVID-19, it also lets them assess the severity of the disease through 3D infection projections.
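The min-max scaling step and the IoU and Dice scores mentioned in this abstract can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's code: the function names `minmax_scale` and `iou_and_dice` are hypothetical, the masks are tiny 2D toy arrays (the same NumPy operations apply unchanged to 3D volumes), the intensity values are made up, and the CLAHE step is omitted.

```python
import numpy as np

def minmax_scale(volume):
    """Scale intensities linearly to the range [0, 1]."""
    volume = np.asarray(volume, dtype=float)
    lo, hi = volume.min(), volume.max()
    if hi == lo:
        # A constant volume carries no contrast; map it to zeros.
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)

def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Two empty masks are treated as a perfect match (an assumption).
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)

# Hypothetical intensities and masks, for illustration only.
scaled = minmax_scale([[-1000, 0], [500, 1000]])
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
iou, dice = iou_and_dice(pred, target)
print(scaled.min(), scaled.max(), iou, dice)
```

For a single pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU; scores averaged over many scans need not satisfy this identity exactly.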