International Journal of Information Technology and Computer Science Applications

    Comparing Holt-Winters Variants Accuracy in Forecasting Indonesia LQ45 Stock Prices:

    This study applies the Holt–Winters method, an exponential smoothing approach incorporating level, trend, and seasonal components, to compare the predictive accuracy of four variants of the method (multiplicative, additive, OR, and average) in forecasting the stock prices of companies listed in the LQ45 index. The dataset consists of stock prices from 2016–2021 for training and January–February 2022 for testing. Forecasting accuracy is evaluated using the Mean Absolute Percentage Error (MAPE), visualized through boxplots, and assessed with the nonparametric Kruskal–Wallis test. The Holt–Winters computations were performed in Microsoft Excel, while the boxplots and the Kruskal–Wallis test were produced in the R programming language. The results indicate significant differences in predictive performance among the four variants (Kruskal–Wallis p-value = 0.04059). The additive Holt–Winters method achieves the best performance with the lowest MAPE, while the multiplicative method performs the worst. Among LQ45 stocks, INDF records the lowest forecasting error (1.6799%), whereas TPIA exhibits the highest (83.0783%). These results suggest that the additive Holt–Winters method is more suitable for forecasting LQ45 stock prices under the observed conditions.
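The abstract does not spell out the recursions, so the sketch below shows the standard additive Holt–Winters updates for the level, trend, and seasonal components, together with the MAPE used to score forecasts. It is written in plain Python rather than the Excel workbook the authors used; the initialisation heuristic (first two seasons), the function names, and any smoothing-parameter values are assumptions for illustration.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: fit series y with season length m, forecast `horizon` steps."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")
    # Assumed initialisation: level = mean of season 1, trend = average per-step change
    # between the means of seasons 1 and 2, seasonal indices = deviations within season 1.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        s_prev = season[t % m]  # seasonal estimate from one season earlier (t - m)
        prev_level = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_prev
    n = len(y)
    # h-step forecast: level + h * trend + latest seasonal index for that phase.
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

For a stock series one might call `holt_winters_additive(train, m=5, alpha=0.3, beta=0.1, gamma=0.2, horizon=len(test))` and then `mape(test, forecast)`; the five-trading-day season and the parameter values are hypothetical. The multiplicative variant instead divides by the seasonal index in the level update, uses y_t / level in the seasonal update, and multiplies the forecast by the seasonal index.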

    Healthcare Data Integration Through Enterprise Data Warehousing: Architecture, Conformance Pipeline, and Experimental Validation for Readmission Analytics

    Healthcare organizations operate in a fragmented digital landscape in which hospital information systems (HIS), electronic health records (EHR), laboratory systems, billing platforms, and departmental applications are optimized for transaction processing but not for integrated analysis. The resulting interoperability gaps, semantic inconsistency, duplicated records, and uneven data quality constrain enterprise reporting and limit higher-value analytics. This paper proposes an implementable enterprise data warehouse architecture, formalizes its data-quality and conformance mechanisms, and validates the design through an experimental analytics use case. The proposed framework combines an integration layer for ETL/ELT, conformed dimensions, departmental marts, governance controls, and an analytics layer for OLAP and machine learning. To demonstrate practical value, the paper evaluates the framework on a de-identified inpatient diabetes dataset comprising 101,766 encounters and 50 raw attributes. The experimental pipeline performs profiling, conformance mapping, diagnosis grouping, missing-value treatment, and dimensional modeling before training benchmark readmission models. The best ranking performance is obtained by XGBoost, with an AUROC of 0.688 and an AUPRC of 0.235, while threshold tuning improves recall-oriented operational utility. The results show that healthcare warehousing should not be framed merely as centralized storage; rather, it is an architectural mechanism for interoperability, data quality control, reproducible analytics, and decision support. The manuscript concludes with implementation guidance and limitations relevant to hospitals seeking a scalable, governance-aware warehousing program.
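The abstract notes that threshold tuning improves recall-oriented utility without giving the rule used. One common approach, sketched below under that assumption, scans candidate cut-offs on predicted readmission scores and keeps the highest threshold that reaches a target recall; the function names and the target value are hypothetical.

```python
def recall_precision(threshold, scores, labels):
    """Recall and precision when cases with score >= threshold are flagged positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    flagged = sum(1 for s in scores if s >= threshold)
    positives = sum(labels)
    recall = tp / positives if positives else 0.0
    precision = tp / flagged if flagged else 0.0
    return recall, precision


def pick_threshold(scores, labels, target_recall):
    """Highest candidate threshold whose recall reaches target_recall (None if none does)."""
    # Recall can only grow as the threshold falls, so scan candidates from the top down.
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_precision(t, scores, labels)
        if recall >= target_recall:
            return t
    return None
```

The scores would come from a fitted model such as the XGBoost classifier, and the threshold should be chosen on validation data rather than on the test split, so that the reported metrics stay unbiased. Lowering the threshold trades precision for recall.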

    Application Of K-Means Clustering In Grouping Customer Preferences For K-Pop Albums And Merchandise

    K-Pop is increasingly popular in Indonesia, particularly as reflected in purchases of physical products. THJMINE Store faces challenges in inventory management and promotional strategy because its albums and merchandise are not grouped into product segments. This study applies the K-Means clustering algorithm to 110 sales transaction records from July 2022 to January 2025, following the CRISP-DM approach with the stages of business understanding, data understanding, data preparation, modeling, and evaluation. The K-Means algorithm forms three clusters corresponding to loyal customers (cluster 0), general customers (cluster 1), and premium or collector customers (cluster 2). The model achieves a Davies–Bouldin Index (DBI) of 0.6342, indicating good cluster quality. These clustering results can help THJMINE Store understand its customer segments, develop more targeted marketing strategies, and manage inventory more efficiently.
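To make the clustering and the DBI figure concrete, the sketch below runs Lloyd's K-Means from fixed initial centroids and computes the Davies–Bouldin Index: for each cluster, the worst ratio of combined within-cluster scatter to centroid separation, averaged over all clusters (lower is better). The toy two-dimensional points stand in for whatever customer features the study used, which the abstract does not list; the function names are illustrative.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean) for every point."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k])) for p in points]


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm from the given starting centroids."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Move to the mean of the members; keep the old centroid if the cluster empties.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[k])
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return assign(points, centroids), centroids


def davies_bouldin(points, labels, centroids):
    """Mean over clusters of max_j (S_i + S_j) / d(c_i, c_j); assumes no empty cluster."""
    k = len(centroids)
    scatter = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(math.dist(p, centroids[i]) for p in members) / len(members))
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k
```

Real K-Means runs usually try several random initialisations and keep the lowest-inertia solution; fixed starting centroids are used here only to keep the example deterministic.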

    Data Infrastructure Application in Education: An Integrated Architecture for Secure Learning Analytics and Student Performance Prediction

    Data infrastructure has become a strategic backbone of contemporary education because digital learning environments continuously generate student traces that can be transformed into actionable evidence for teaching, advising, and institutional planning. Yet the practical value of educational data depends on much more than storage capacity. Institutions must integrate heterogeneous sources, manage raw and curated data simultaneously, enforce privacy constraints, and deliver analytics outputs that are operationally useful and ethically defensible. This study develops a layered educational data infrastructure architecture that connects raw learning data, extract-transform-load processes, governance mechanisms, curated analytics repositories, and machine-learning services. The paper includes a reproducible empirical evaluation on the real-world xAPI-Edu-Data benchmark collected from the Kalboard 360 learning management environment. Three machine-learning models are compared under a common preprocessing pipeline, and an ablation analysis quantifies the incremental value of integrated behavioral, parental, and contextual features. The best-performing model achieves a test macro-F1 of 0.797 and a macro one-vs-rest ROC-AUC of 0.919, while the ablation study shows that the full integrated feature set clearly outperforms demographic-only and behavior-only alternatives. The paper contributes a structured architecture, a mathematical formalization of integrated learning analytics, and empirical evidence that richer, better-governed data pipelines produce more useful predictive signals for educational decision support.
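The macro-F1 reported above averages per-class F1 scores with equal weight, so smaller performance classes count as much as the largest one. A minimal sketch, using labels L, M, and H for low, middle, and high performance (assumed here to match the benchmark's three classes); the function name is illustrative.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)
```

For example, `macro_f1(["L", "M", "H", "H"], ["L", "H", "H", "H"])` averages F1 values of 1.0 (L), 0.0 (M), and 0.8 (H), giving 0.6.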

    Application of data warehouse and OLAP processes for retail analytics

    Retail organizations increasingly rely on heterogeneous operational platforms, including point-of-sale systems, customer relationship management applications, cloud data stores, and locally administered databases. Although these platforms are valuable for transaction processing, they often generate fragmented, duplicated, and semantically inconsistent data that constrain enterprise reporting, forecasting, and customer intelligence. This paper substantially extends a conceptual SwiftMart case into a full design-and-evaluation study of a retail data warehouse and Online Analytical Processing (OLAP) framework. The proposed artifact combines a Kimball-style dimensional architecture, a governed extract-transform-load (ETL) pipeline, conformed dimensions, and materialized OLAP summaries for managerial analytics. To ground the case empirically, the framework is evaluated using the open-access UCI Online Retail dataset, which contains 541,909 transaction records from a UK-based online retailer covering 1 December 2010 to 9 December 2011. The experiment transforms raw transactions into a star schema with 524,878 curated fact rows, 19,960 orders, 4,355 customer members, 4,158 product members, and 38 countries. Four representative analytical workloads are benchmarked across three storage designs: a normalized operational data store, a dimensional warehouse, and materialized aggregate tables. The dimensional warehouse reduces mean latency by 42.3% relative to baseline joins, while materialized aggregates reduce latency by approximately 99.9%. A forecasting demonstration on warehouse-generated daily revenue aggregates further shows that a random forest model outperforms a naive benchmark, achieving an RMSE of 23,715.84 versus 34,055.29. 
The paper contributes an end-to-end reference architecture for retail analytics, together with dimensional design rationale, mathematical formulations, algorithms, empirical results, and implementation guidance relevant to both academic researchers and practitioners
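The latency gap between joining the fact table and reading a materialized aggregate can be illustrated with a tiny star schema in SQLite. The table and column names below are hypothetical, not the paper's schema. Because SQLite has no materialized views, a summary table created with CREATE TABLE ... AS stands in for one; it is a snapshot that must be rebuilt whenever the fact table changes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT UNIQUE, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, stock_code TEXT UNIQUE);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    unit_price REAL
);
"""


def build_star(rows):
    """Load (day 'YYYY-MM-DD', stock_code, quantity, unit_price) rows into a small star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for day, code, qty, price in rows:
        # Conformed dimensions: one surrogate key per natural key, duplicates ignored.
        conn.execute("INSERT OR IGNORE INTO dim_date (day, month) VALUES (?, ?)", (day, day[:7]))
        conn.execute("INSERT OR IGNORE INTO dim_product (stock_code) VALUES (?)", (code,))
        conn.execute(
            """INSERT INTO fact_sales
               SELECT d.date_key, p.product_key, ?, ?
               FROM dim_date d, dim_product p
               WHERE d.day = ? AND p.stock_code = ?""",
            (qty, price, day, code),
        )
    # Pre-aggregated monthly revenue: later queries read this table instead of joining facts.
    conn.execute(
        """CREATE TABLE agg_monthly_revenue AS
           SELECT d.month AS month, SUM(f.quantity * f.unit_price) AS revenue
           FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
           GROUP BY d.month"""
    )
    return conn
```

A managerial query then becomes a lookup such as `conn.execute("SELECT month, revenue FROM agg_monthly_revenue ORDER BY month").fetchall()`, which scans one row per month rather than every transaction.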

    Enhancing Association Rule Mining with Metaheuristic Parameter Optimization: A Transactional Data Analysis in Micro-Enterprise Context

    Nasi Uduk Mama Ipan is a micro-enterprise that sells through both offline and online channels. However, only the online transaction data is available in analyzable form, and the owner lacks the expertise to process it. This situation highlights the need for data mining techniques that uncover hidden patterns to inform effective promotional strategies. This study applies association rule mining with the Apriori and FP-Growth algorithms, enhanced through metaheuristic hyperparameter tuning, to extract product-bundling insights from transactional data. Data preprocessing removes irrelevant columns and transforms the transactional records into a binary format. Four metaheuristic algorithms, namely a Genetic Algorithm (GA), Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), and Simulated Annealing (SA), are used to determine the optimal support and confidence thresholds for both Apriori and FP-Growth. The modeling phase is implemented in Python with the mlxtend.frequent_patterns module, and only rules with a lift greater than 1 are retained. With the parameters found by the Genetic Algorithm, Apriori and FP-Growth produce identical bundling recommendations; Apriori runs faster, while FP-Growth is more memory-efficient. This study demonstrates that combining association rule mining with metaheuristic optimization can effectively support MSMEs in making data-driven marketing decisions.
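The rule metrics behind the lift filter can be sketched without mlxtend. The function below is a brute-force enumeration of single-item rules A → B over hypothetical menu items; it is not Apriori or FP-Growth, which prune the search differently, but it computes support, confidence, and lift the same way. min_support and min_confidence are the two parameters the study tuned with metaheuristics.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Rules x -> y between single items, kept only if support, confidence and lift pass."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of baskets containing every item in itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x})
            lift = confidence / support({y})  # > 1 means x and y co-occur more than by chance
            if confidence >= min_confidence and lift > 1:
                rules.append((x, y, s_ab, confidence, lift))
    return rules
```

A metaheuristic such as a Genetic Algorithm would call a function like this inside its fitness evaluation while searching over (min_support, min_confidence); the fitness criterion itself is not stated in the abstract.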

    Public Sentiment Analysis on the Service Quality of PT PLN on X Using Naïve Bayes and K-Nearest Neighbor Algorithms.

    PT PLN is expected to deliver quality services, since electricity is a primary public need. However, numerous complaints still highlight PLN's lack of responsiveness, especially on the X platform (formerly Twitter). This study analyzes public sentiment toward PLN's service quality expressed on X and compares the performance of the Naïve Bayes and K-Nearest Neighbor (KNN) algorithms in classifying sentiments as positive, negative, or neutral. The research follows the Knowledge Discovery in Databases (KDD) approach: data collection by scraping tweets with Tweet-Harvest, preprocessing (case folding, tokenizing, filtering, stemming), transformation with TF-IDF weighting, and data mining with Naïve Bayes and KNN. Evaluation with a confusion matrix shows that Naïve Bayes achieves an accuracy of 87%, slightly outperforming KNN at 86%. These findings help PLN better understand public perception and can serve as a reference for future sentiment analysis research using machine learning.
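The TF-IDF weighting and KNN classification steps can be sketched in plain Python. This uses one common TF-IDF variant (term frequency normalized by document length, idf = log(N/df)) and cosine similarity; both choices, the toy English tokens, and the function names are assumptions, and the Naïve Bayes branch is omitted.

```python
import math
from collections import Counter


def fit_idf(docs):
    """idf(t) = log(N / df(t)) over tokenised training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Length-normalised term frequency times idf; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    return {t: c / len(doc) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(doc, train_docs, train_labels, idf, k=3):
    """Majority label among the k most cosine-similar training documents."""
    query = tfidf(doc, idf)
    ranked = sorted(
        ((cosine(query, tfidf(d, idf)), label) for d, label in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-seen order on ties, so the label seen nearest wins a tie.
    return votes.most_common(1)[0][0]
```

In the study's setting the token lists would be the case-folded, filtered, and stemmed tweets, with labels positive, negative, or neutral.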

    A Lakehouse-Oriented Big Data Infrastructure for Educational Analytics: Integrating Administrative and Assessment Data for Early Student Risk Prediction

    Educational institutions increasingly depend on heterogeneous digital systems, yet many analytics initiatives remain fragmented across student information, registration, assessment, and learning platforms. This paper proposes a lakehouse-oriented big data infrastructure for educational analytics and validates it through a reproducible early-risk prediction study using the Open University Learning Analytics Dataset (OULAD). The study integrates five public OULAD tables (student information, course registration, assessment metadata, student assessment submissions, and course presentation metadata) into temporally valid feature tables aligned to the student–module–presentation level. We define a windowed feature engineering framework that constructs actionable indicators, such as submission rate, weighted completion score, average submission lag, and assessment coverage gap, at 30%, 50%, 70%, and 100% of the course timeline. Two supervised classifiers, logistic regression and random forest, are evaluated under a stratified 80/20 train/test split. The results show that administrative data alone provides weak discrimination (AUC 0.673), whereas integrating mid-course assessment evidence substantially improves performance. At the 50% course window, the random-forest model achieves an AUC of 0.947, an F1 of 0.879, and a recall of 0.829; even at the 30% window the model already reaches an AUC of 0.904. These findings demonstrate that the value of educational prediction depends not only on model choice but also on data integration architecture. The paper contributes (i) a lakehouse-oriented reference architecture for higher-education analytics, (ii) a temporally constrained feature engineering strategy for early-warning systems, and (iii) an empirical ablation showing that multi-source integration yields large and operationally meaningful gains.
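A sketch of the temporally constrained feature construction: only assessments due, and submissions made, on or before the cutoff day (a fraction of the course length) are visible, which keeps later information out of early-window features. The record fields and the exact formulas for the weighted score (missing submissions count as zero) and the coverage gap (due but unsubmitted assessments) are assumptions; the abstract names these indicators without defining them.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    assessment_id: int
    due_day: int   # days from module start (hypothetical field name)
    weight: float  # share of the module grade, in percent


@dataclass
class Submission:
    assessment_id: int
    submitted_day: int
    score: float


def window_features(assessments, submissions, course_length, fraction):
    """Features visible at `fraction` of the course timeline; later events are ignored."""
    cutoff = fraction * course_length
    due = [a for a in assessments if a.due_day <= cutoff]
    seen = {s.assessment_id: s for s in submissions if s.submitted_day <= cutoff}
    done = [a for a in due if a.assessment_id in seen]
    total_weight = sum(a.weight for a in due)
    lags = [seen[a.assessment_id].submitted_day - a.due_day for a in done]
    return {
        "submission_rate": len(done) / len(due) if due else 0.0,
        # Missing submissions contribute zero to the weighted score (assumption).
        "weighted_score": (
            sum(a.weight * seen[a.assessment_id].score for a in done) / total_weight
            if total_weight else 0.0
        ),
        "avg_submission_lag": sum(lags) / len(lags) if lags else 0.0,  # negative = early
        "coverage_gap": len(due) - len(done),
    }
```

Calling this with fraction 0.3, 0.5, 0.7, and 1.0 for each student–module–presentation yields one feature row per window, so a model trained on the 30% window never sees later events.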

    Revisiting the IBM Retail Data Warehouse: A Governed One-Column Architecture and Reproducible Open-Dataset Validation for Retail Analytics

    The IBM Retail Data Warehouse (RDW) correctly recognized the importance of integrated retail data, but it remained largely descriptive, did not formalize the underlying architecture, and lacked a reproducible empirical validation. This paper reconstructs and substantially extends that early proposal into a publication-ready research article. We first synthesize the historical IBM RDW, Retail Data Warehouse Model (RDWM), Retail Services Data Model (RSDM), and Retail Business Solution Template (RBST) concepts with contemporary data warehousing, data governance, and retail analytics literature. We then propose a governed, RDW-informed logical architecture that separates ingestion, quality control, conformed dimensional modeling, analytics marts, and decision-support services. To move beyond conceptual discussion, we instantiate the architecture with an open retail dataset from the UCI Machine Learning Repository containing 541,909 transactions. After governance-oriented preprocessing, the final analytical mart contains 392,692 valid rows, 18,532 orders, 4,338 customers, 3,665 products, and 37 countries. We formulate the transformation and forecasting workflow mathematically, define an end-to-end algorithmic pipeline, and evaluate a retail revenue forecasting task using naive, seasonal naive, linear regression, ridge regression, random forest, and gradient boosting baselines. On the hold-out test window, the best model (linear regression on warehouse-engineered features) achieves an RMSE of 4,302.61 GBP and R² = 0.9766, while a raw, ungoverned pipeline yields a much weaker RMSE of 10,068.59 GBP. This corresponds to a 57.27% reduction in RMSE attributable to governance and dimensional integration. The results show that the practical value of an RDW-like architecture is not merely organizational; when implemented as a governed analytical platform, it measurably improves reproducibility, interpretability, and forecasting quality.
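The reported 57.27% figure is the relative RMSE reduction of the governed pipeline over the raw one, 100 × (10,068.59 − 4,302.61) / 10,068.59. The sketch below shows that calculation together with RMSE, R², and a seasonal naive baseline; the function names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def seasonal_naive(history, m, horizon):
    """Repeat the last observed season of length m over the forecast horizon."""
    return [history[-m + (h % m)] for h in range(horizon)]


def pct_reduction(baseline, improved):
    """Relative error reduction of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


print(round(pct_reduction(10068.59, 4302.61), 2))  # 57.27
```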

    An Extended Relational Database Model for Interval Probability Set-Valued Attributes

    In this paper, we introduce a new probabilistic relational database model that extends the classical relational database model with interval probability set-valued attributes, in order to represent and handle uncertain and imprecise information in practice. To develop the model, we use extended probabilistic values to represent interval probability set-valued relational attributes, apply the probabilistic interpretation of binary relations on sets to compute the uncertainty degree of functional dependencies, keys, and relations on attribute values, and propose new combination strategies for extended probabilistic values to build probabilistic relational algebraic operations. A set of properties of the basic probabilistic relational algebraic operations is also formulated and proved.
