New frontiers in applied data mining
Five high-quality workshops were held at the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2009) in Bangkok, Thailand, during April 27-30, 2009. There were 17, 6, 9, 4 and 5 papers accepted, respectively, for presentation at the Pacific Asia Workshop on Intelligence and Security Informatics (PAISI 2009), the workshop on Advances and Issues in Biomedical Data Mining (AIBDM 2009), the workshop on Data Mining with Imbalanced Classes and Error Cost (ICEC 2009), the workshop on Open Source in Data Mining (OSDM 2009), and the workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE 2009). One competition, the PAKDD 2009 Data Mining Competition, and one local workshop, the Thai Track Session, were also arranged. From each of these workshops (except PAISI, which published its works in separate LNCS proceedings), we selected the two or three best papers for this LNCS publication. PAKDD is a major international conference in the areas of data mining (DM) and knowledge discovery in databases (KDD). It provides an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all KDD-related areas, including data mining, data warehousing, machine learning, databases, statistics, knowledge acquisition and automatic scientific discovery, data visualization, causal induction and knowledge-based systems.
A series of case studies to enhance the social utility of RSS
RSS (really simple syndication, rich site summary or RDF site summary) is a dialect of
XML that provides a method of syndicating on-line content, where postings consist of
frequently updated news items, blog entries and multimedia. RSS feeds, produced by
organisations or individuals, are often aggregated, and delivered to users for consumption
via readers. The semi-structured format of RSS also allows the delivery/exchange of
machine-readable content between different platforms and systems.
Articles on web pages frequently include icons that represent social media services
which facilitate social data. Amongst these, RSS feeds deliver data which is typically
presented in the journalistic style of headline, story and snapshot(s). Consequently, applications
and academic research have employed RSS on this basis. Therefore, within the
context of social media, the question arises: can the social function, i.e. utility, of RSS be
enhanced by producing from it data which is actionable and effective?
This thesis is based upon the hypothesis that the fluctuations in the keyword frequencies
present in RSS can be mined to produce actionable and effective data, to enhance
the technology's social utility. To this end, we present a series of laboratory-based case
studies which demonstrate two novel and logically consistent RSS-mining paradigms. Our first paradigm allows users to define mining rules to mine data from feeds. The second
paradigm employs a semi-automated classification of feeds and correlates this with sentiment.
We visualise the outputs produced by the case studies for these paradigms, where
they can benefit users in real-world scenarios, varying from statistics and trend analysis
to mining financial and sporting data.
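As a rough illustration of the hypothesis, the following Python sketch counts how often a keyword appears in item titles per publication day and applies a simple user-defined rule to the day-on-day change. It is not the thesis's implementation (which the abstract describes as a Java JSP/servlet application); the sample feed, the function names `daily_keyword_counts` and `rule_fires`, and the threshold rule are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from email.utils import parsedate_to_datetime

# Hypothetical two-day RSS 2.0 feed, invented purely for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>Markets rally as bank shares rise</title>
<pubDate>Mon, 02 Mar 2009 09:00:00 GMT</pubDate></item>
<item><title>Bank merger talks stall</title>
<pubDate>Tue, 03 Mar 2009 09:00:00 GMT</pubDate></item>
<item><title>Regulator questions bank bonuses</title>
<pubDate>Tue, 03 Mar 2009 15:00:00 GMT</pubDate></item>
</channel></rss>"""


def daily_keyword_counts(feed_xml, keyword):
    """Count whole-word occurrences of `keyword` in item titles, grouped by publication date."""
    counts = Counter()
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        # RSS 2.0 pubDate values use the RFC 822 date format.
        day = parsedate_to_datetime(item.findtext("pubDate")).date()
        counts[day] += len(pattern.findall(title))
    # Return the counts in chronological order.
    return dict(sorted(counts.items()))


def rule_fires(counts, min_increase):
    """Assumed example rule: fire when the count rises by at least `min_increase` from one day to the next."""
    values = list(counts.values())
    return any(later - earlier >= min_increase
               for earlier, later in zip(values, values[1:]))
```

Calling `daily_keyword_counts(SAMPLE_FEED, "bank")` and passing the result to `rule_fires` with a chosen threshold shows how a fluctuation in keyword frequency could be turned into a yes/no signal for a user. A real system would also need to read item descriptions, handle missing or malformed dates, and aggregate several feeds.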
The contributions of this thesis to web engineering and text mining are the demonstration
of the proof of concept of our paradigms, through the integration of an array of
open-source, third-party products into a coherent and innovative, alpha-version prototype
software implemented in a Java JSP/servlet-based web application architecture.