6 research outputs found

    Recommendation system for web article based on association rules and topic modelling

    The World Wide Web is now the primary source for information discovery. Users visit websites and browse the information that matches their topic interests. While navigating, visitors often have to jump between menus to find the right content. A recommendation system can help visitors find the right content immediately. In this study, we propose a two-level recommendation system based on association rules and topic similarity. We generate association rules by applying the Apriori algorithm. The dataset for association rule mining consists of sessions of topics, built by combining the results of sessionization and topic modeling. Topic similarity, in turn, is computed by comparing the topic proportions of web articles, which are inferred with Latent Dirichlet Allocation (LDA). The results show that our dataset contains few interesting topic relations within a single session. This limitation can be addressed by the second level of recommendation, which looks for articles with similar topics.
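
    A minimal sketch of the two levels described in this abstract, under stated assumptions: the first function is a simplified Apriori-style pass limited to topic pairs (not the full Apriori algorithm), and the second compares topic proportions with cosine similarity, which the abstract does not specify. The function names, thresholds, and sample data are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import sqrt


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Mine topic -> topic rules from sessions (each a set of topic labels).

    Simplification: only 2-itemsets are considered, so this is an
    Apriori-style pass rather than the full level-wise algorithm.
    """
    n = len(sessions)
    item_count = {}
    pair_count = {}
    for session in sessions:
        topics = set(session)
        for t in topics:
            item_count[t] = item_count.get(t, 0) + 1
        for a, b in combinations(sorted(topics), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Check both directions of the rule for this frequent pair.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def cosine(p, q):
    """Cosine similarity between two topic-proportion vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = sqrt(sum(x * x for x in p)) * sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def most_similar(article_id, topic_props, k=1):
    """Second level: return the k articles whose topic mix is closest."""
    target = topic_props[article_id]
    scored = [
        (cosine(target, props), other)
        for other, props in topic_props.items()
        if other != article_id
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]
```

    In this sketch, a recommender would first look up rules whose left-hand side matches the topic of the current article and fall back to `most_similar` when no rule passes the thresholds, which mirrors the fallback role the abstract gives to the second level.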

    Web usage analysis of Pillar 3 disclosed information by deposit customers in turbulent times

    Market discipline has been a scrutinized area since the financial crisis of 2008. Regulators strengthened their role, particularly through Pillar 3 in Basel III. However, some aspects of market discipline still deserve special attention to avoid future failures. This study analyses the interest and behaviour of deposit stakeholders based on website data dedicated to the disclosures of a commercial bank in Slovakia during and after turbulent times (2009–2012). The data consist of log files, to which web mining techniques were applied (modelling web user behaviour as a function of time, based on the authors' proposals). The results show that even in turbulent times, stakeholders' interest in Pillar 3 disclosures is low (in line with Munk, Pilkova, Benko, & Blažeková, 2017), and the highest interest was identified for the Pricing List category. After turbulent times, the Pillar 3 categories (Pillar 3 related information and Pillar 3 disclosures) attract weak interest, with peaks at the beginning of the year, and the highest increase was in the Business Conditions category. The results suggest that enhancing the interest of key stakeholders in disclosures inevitably requires changes to deliver sufficient disclosure data structures and to design a disclosure policy that fulfils regulatory expectations. © 2021 The Authors. Funding: Scientific Grant Agency of the Ministry of Education of the Slovak Republic (ME SR) and Slovak Academy of Sciences (SAS), grants VEGA-1/0776/18 and VEGA-1/0821/21.

    Webometrics benefitting from web mining? An investigation of methods and applications of two research fields

    Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, and then focuses on their methods and applications. It also discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on the development of methods and algorithms. The fields also differ in the type of data they use: webometrics is more focused on analyses of the structure of the web, and web mining on web content and usage, even though both fields have been embracing the possibilities of user-generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.

    Contributions to comprehensible classification

    xxx, 240 p. The doctoral thesis described in this report has contributed to the improvement of two types of comprehensible classification algorithms: consolidated decision tree algorithms and PART-like rule induction algorithms. Regarding the contributions to the consolidation of decision tree algorithms, a new resampling strategy has been proposed that adjusts the number of subsamples so that the class distribution in the subsamples can be changed without losing information. Using this strategy, the consolidated version of C4.5 (CTC) obtains better results than a wide set of comprehensible algorithms based on genetic and classical algorithms. Three new algorithms have been consolidated: a variant of CHAID (CHAID*) and the Probability Estimation Tree versions of C4.5 and CHAID* (C4.4 and CHAIC). All the consolidated algorithms obtain better results than their base decision tree algorithms, with three consolidated algorithms ranking among the four best in a comparison. Finally, the effect of pruning on simple and consolidated decision tree algorithms has been analysed, and it has been concluded that the pruning strategy proposed in this thesis obtains the best results. Regarding the contributions to PART-like rule induction algorithms, a first proposal changes several aspects of how PART generates partial trees and extracts rules from them, which results in classifiers with better generalization capacity and lower structural complexity than those generated by PART. A second proposal uses fully developed trees, instead of partially developed ones, and generates rule sets that obtain even better classification results and lower structural complexity. These two new proposals and the original PART algorithm have been complemented with variants based on CHAID* to see whether these benefits carry over to other decision tree algorithms, and it has indeed been observed that the PART-like algorithms based on CHAID* also create simpler classifiers with better classification capacity than CHAID.