41 research outputs found

    Social stream classification with emerging new labels

    Singapore National Research Foundation under International Research Centres in Singapore Funding Initiative

    Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

    Today, many modern applications search for frequent, repeating patterns in the data sets they analyze. Patterns that appear frequently in a data set are marked as frequent patterns so that users can base decisions on them. Most algorithms for data stream mining and frequent pattern detection either work on uncertain data or use the sliding window model to assess data streams. The sliding window model uses a fixed-size window that keeps only the most recently inserted data and ignores everything outside the window. Many real-world applications, however, require all inserted or obtained data to be maintained. The question therefore arises whether other window models can be used to find frequent patterns in dynamic streams of uncertain data. In this paper, we use the landmark window model and the time-fading model to answer that question. The proposed algorithm, which uses the landmark window model to find frequent patterns in relational and uncertain data streams, performs better at finding functional dependencies than other methods in this field. Another advantage over other methods is that it shows tuples that do not follow a single dependency. This feature can be used to detect inconsistent data in a data set.
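
The difference between the window models named in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it counts plain items rather than uncertain relational tuples, and the function names, the `decay` parameter, and the sample stream are all assumptions made for illustration.

```python
from collections import Counter, deque


def sliding_window_counts(stream, window_size):
    """Count items over only the most recent `window_size` elements."""
    window = deque(maxlen=window_size)  # older items fall out automatically
    for item in stream:
        window.append(item)
    return Counter(window)


def landmark_window_counts(stream, landmark=0):
    """Count every item seen from position `landmark` to the end of the stream."""
    return Counter(item for i, item in enumerate(stream) if i >= landmark)


def time_fading_counts(stream, decay=0.5):
    """Keep all items, but multiply existing weights by `decay` on each arrival."""
    counts = {}
    for item in stream:
        for key in counts:
            counts[key] *= decay
        counts[item] = counts.get(item, 0.0) + 1.0
    return counts


# Hypothetical stream of items, oldest first.
stream = ["a", "b", "a", "c", "a", "b"]
sliding = sliding_window_counts(stream, window_size=3)
landmark = landmark_window_counts(stream)
faded = time_fading_counts(stream, decay=0.5)
```

With this stream, the sliding window sees only the last three items, the landmark window retains the full history since the landmark, and the time-fading variant retains everything while favoring recent arrivals.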

    Duomenų tyrybos sistemų galimybių tyrimas įvairių apimčių duomenims analizuoti (Investigation of the abilities of data mining systems to analyse various volume datasets)

    As modern information and communication technologies advance, the volume of processed and stored data is growing rapidly, so data analysis is becoming increasingly complex and it is hard to make fast, effective, and correct decisions. Data mining is often used for data analysis. Data mining is the process of extracting useful knowledge from data. Processing data and extracting knowledge requires data mining systems that can handle data of various volumes. The study aims to determine what volume of data the most popular data mining systems can process within an acceptable time. It examines and compares the computation time of classification and clustering algorithms implemented in three open-source data mining systems (WEKA, KNIME, ORANGE) when analysing datasets of different sizes. When evaluating the systems, not only the computation time matters but also the classification and clustering accuracy achieved within that time, so the paper also reports the values of the accuracy measures obtained in the experiments.
    Investigation of the abilities of data mining systems to analyse various volume datasets. Kotryna Paulauskienė, Olga Kurasova. Summary: The aim of the paper is to determine what volume of data the popular data mining systems are able to analyse within a reasonable period of time when solving classification and clustering problems. Three open source data mining systems are investigated: WEKA, KNIME, and ORANGE. The experiments were carried out with eight datasets in which the number of attributes was fixed at 100 and the number of instances ranged between 5000 and 600 000. The experimental investigation has shown that datasets of more than 50 000 instances are too large for the ORANGE system. To analyse larger datasets, the WEKA and KNIME systems need to be used. Datasets of more than 200 000 instances are too large for WEKA and KNIME; however, when simple classification methods are used, both systems are able to handle 400 000 instances, and KNIME can handle 600 000 instances. The results have shown that KNIME can handle larger datasets than WEKA when applying some classification methods. The accuracy of classification is high enough when the classification methods implemented in the systems are used.

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies poses a challenge to gaining a profound understanding of the field. In response to this, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin our research, related studies from the period 2007-2022 were collected from the Scopus database and then pre-processed. Following this, VOSviewer was used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, the application of co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and an anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, and it offers a deeper understanding of the current state and an insightful roadmap for future research in this domain. Comment: 73 pages, 28 figures

    Opinion Mining for Software Development: A Systematic Literature Review

    Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies. SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in code comments and extracting users’ criticisms of mobile apps. Given the large number of relevant studies available, it can take considerable time for researchers and developers to figure out which approaches they can adopt in their own studies and what perils these approaches entail. We conducted a systematic literature review involving 185 papers. More specifically, we present 1) well-defined categories of opinion mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4) concerns or limitations SE researchers might need to take into account when applying or customizing these opinion mining techniques. The results of our study serve as a reference for choosing suitable opinion mining tools for software development activities, and provide critical insights for the further development of opinion mining techniques in the SE domain.