An initial state of design and development of intelligent knowledge discovery system for stock exchange database
Data mining has been a challenging research topic for the last few years, and researchers apply a variety of techniques to it. This paper discusses the initial state of the design and development of an intelligent knowledge discovery system for stock exchange (SE) databases. We divide the problem into two modules. In the first module we define a fuzzy rule base system to handle vague information in stock exchange databases. After normalizing the massive amount of data, we will apply our proposed approach, Mining Frequent Patterns with Neural Networks. Factors that affect future prediction (e.g., political conditions, corporate factors, macroeconomic factors, and psychological factors of investors) play an important role in the stock exchange, so with our prediction model we will be able to predict results more precisely. In the second module we will develop a clustering algorithm. Our clustering algorithm consists of two steps: a training step and a running step. The training step generates the neural network knowledge base on which the clustering relies. In the running step, the neural network knowledge base is used to support the module in generating learned complete data, transformed data, and interesting clusters that will help to generate interesting rules.
The TRECVID 2007 BBC rushes summarization evaluation pilot
This paper provides an overview of a pilot evaluation of
video summaries using rushes from several BBC dramatic series. It was carried out under the auspices of TRECVID.
Twenty-two research teams submitted video summaries of
up to 4% duration, of 42 individual rushes video files aimed
at compressing out redundant and insignificant material.
The output of two baseline systems built on straightforward
content reduction techniques was contributed by Carnegie
Mellon University as a control. Procedures for developing
ground truth lists of important segments from each video
were developed at Dublin City University and applied to
the BBC video. At NIST each summary was judged by
three humans with respect to how much of the ground truth
was included, how easy the summary was to understand,
and how much repeated material the summary contained.
Additional objective measures included: how long it took
the system to create the summary, how long it took the assessor to judge it against the ground truth, and what the
summary's duration was. Assessor agreement on finding desired segments averaged 78%, and results indicate that while it is difficult to exceed the performance of baselines, a few systems did.
Web Mediators for Accessible Browsing
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number of all text characters on a web page. K-means clustering is used to create unique thresholds to differentiate index pages and article pages on individual web sites. Index pages contain mostly links to articles and other indices, while article pages contain mostly text. We also present a novel link grouping algorithm using agglomerative hierarchical clustering that groups links in the same spatial neighborhood together while preserving link structure. Grouping allows users with severe disabilities to use a scan-based mechanism to tab through a web page and select items. In experiments, we saw up to a 40-fold reduction in the number of commands needed to click on a link with a scan-based interface, which shows that we can vastly improve the rate of communication for users with disabilities. We used web page classification and link grouping to alter web page display on an accessible web browser that we developed to make a usable browsing interface for users with disabilities. Our classification method consistently outperformed a baseline classifier even when using minimal data to generate article and index clusters, and achieved classification accuracy of 94.0% on web sites with well-formed or slightly malformed HTML, compared with 80.1% accuracy for the baseline classifier. National Science Foundation (IIS-0308213, IIS-039009, IIS-0093367, P200A01031, EIA-0202067)
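The link-percentage idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the centroid initialisation at the extremes, and the midpoint threshold are all assumptions made for illustration. It computes a page's link percentage and runs a one-dimensional k-means with k = 2 over the pages of a single site to derive a per-site threshold.

```python
def link_percentage(link_chars: int, total_chars: int) -> float:
    """Share of a page's text characters that sit inside links."""
    return link_chars / total_chars if total_chars else 0.0


def two_means_threshold(values, iterations=100):
    """1-D k-means with k=2; returns the midpoint between the two centroids.

    Assumption: centroids start at the smallest and largest observed value.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to the nearer centroid (ties go to the low cluster).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def classify(page_link_percentage: float, threshold: float) -> str:
    """Pages dominated by link text are treated as index pages."""
    return "index" if page_link_percentage > threshold else "article"


# Hypothetical link percentages for six pages of one site.
site_pages = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
threshold = two_means_threshold(site_pages)
print(classify(0.70, threshold), classify(0.20, threshold))
```

Deriving the threshold per site, rather than fixing one global cut-off, mirrors the abstract's point that thresholds are created for individual web sites.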
Data mining techniques for complex application domains
The emergence of advanced communication techniques has increased the availability of large collections of data in electronic form in a number of application domains, including healthcare, e-business, and e-learning. Every day a large number of records are stored electronically. However, finding useful information in such large data collections is a challenging issue. Data mining technology aims at automatically extracting hidden knowledge from large data repositories by exploiting sophisticated algorithms. The hidden knowledge in the electronic data may potentially be used to improve the procedures, productivity, and reliability of several application domains.
The PhD activity has been focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: Healthcare data analysis and Textual data analysis.
The research activity, in the context of healthcare data, addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information to deal with the treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments.
The research effort in textual data analysis is twofold. On the one hand, a novel approach to the discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining technique to support domain experts in making decisions has been investigated. Both research activities focus on applying widely used exploratory data mining techniques to textual data analysis, which requires overcoming intrinsic limitations of traditional algorithms in handling textual documents efficiently and effectively.
A Comparative Study of Text Summarization on E-mail Data Using Unsupervised Learning Approaches
Over the last few years, email has met with enormous popularity. People send and receive many messages every day, connect with colleagues and friends, and share files and information. Unfortunately, email overload has developed into a personal burden for users as well as a financial concern for businesses. Accessing an ever-increasing number of lengthy emails has become a major concern for many users. Email text summarization is a promising approach to resolving this challenge. Email messages are general-domain text, unstructured and not always syntactically well formed. Such characteristics introduce challenges for text processing, especially for the task of summarization. This research employs quantitative and inductive methodologies to implement unsupervised learning models that address the summarization task, to efficiently generate more precise summaries, and to determine which approach to implementing unsupervised clustering models performs best. The precision score from the ROUGE-N metrics is used as the evaluation metric in this research. This research evaluates the performance, in terms of precision score, of four different approaches to text summarization that use various combinations of feature embedding techniques (Word2Vec or BERT) and hybrid or conventional clustering algorithms. The results reveal that both approaches that use Word2Vec and BERT feature embeddings with the hybrid PHA-ClusteringGain k-Means algorithm achieved an increase in precision compared with the conventional k-means clustering model. Among the hybrid approaches, the one using Word2Vec as the feature embedding method attained the maximum precision value of 55.73%.
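The abstract uses ROUGE-N precision as its evaluation metric. The sketch below shows one common way to compute that score, assuming whitespace tokenisation, lowercasing, and clipped n-gram counts. The function names and the example sentences are hypothetical, and the study itself may have used a library with different preprocessing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Overlapping n-grams divided by the number of n-grams in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


generated = "the meeting is moved to friday"
reference = "the project meeting is moved to friday morning"
print(rouge_n_precision(generated, reference, n=1))  # unigram precision
print(rouge_n_precision(generated, reference, n=2))  # bigram precision
```

Precision rewards summaries whose n-grams all appear in the reference, so it says nothing about how much of the reference is covered; that is what ROUGE-N recall measures.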
Finding groups in data: Cluster analysis with ants
We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach
mimics the clustering behavior observed in real ant colonies. This algorithm automatically discovers
clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus
on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on the effects on
the final clustering of using different dissimilarity metrics during classification: Euclidean, Cosine,
and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more
conventional clustering methods, such as k-means. Among the many bio-inspired techniques, ant
clustering algorithms have received special attention, especially because they still require much
investigation to improve performance, stability and other key features that would make such algorithms
mature tools for data mining.
As a case study, this paper focuses on the behavior of clustering procedures in those new approaches.
The proposed algorithm and its modifications are evaluated on a number of well-known benchmark
datasets. Empirical results clearly show that ant-based clustering algorithms perform well when
compared to other techniques.
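The three dissimilarity measures named in this abstract can be sketched for purely numerical data as follows. This is an illustrative sketch rather than the paper's code: the function names are assumptions, the Gower variant shown covers only numerical attributes (range-normalised absolute differences), and returning 1.0 for a zero vector in the cosine measure is a convention chosen here.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_dissimilarity(x, y):
    """One minus the cosine of the angle between the vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # assumed convention for zero vectors
    return 1.0 - dot / (norm_x * norm_y)


def gower_numeric(x, y, ranges):
    """Gower dissimilarity for numerical attributes only.

    `ranges` holds each attribute's (max - min) over the dataset.
    """
    terms = [abs(a - b) / r if r else 0.0 for a, b, r in zip(x, y, ranges)]
    return sum(terms) / len(terms)


a, b = (1.0, 10.0), (3.0, 20.0)
print(euclidean(a, b), cosine_dissimilarity(a, b), gower_numeric(a, b, (4.0, 20.0)))
```

Euclidean distance is sensitive to attribute scale, cosine dissimilarity depends only on direction, and Gower normalises each attribute by its range, which is one reason the choice of measure can change the final clustering.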
Language acquisition and machine learning
In this paper, we review recent progress in the field of machine learning and examine its implications for computational models of language acquisition. As a framework for understanding this research, we propose four component tasks involved in learning from experience - aggregation, clustering, characterization, and storage. We then consider four common problems studied by machine learning researchers - learning from examples, heuristics learning, conceptual clustering, and learning macro-operators - describing each in terms of our framework. After this, we turn to the problem of grammar acquisition, relating this problem to other learning tasks and reviewing four AI systems that have addressed the problem. Finally, we note some limitations of the earlier work and propose an alternative approach to modeling the mechanisms underlying language acquisition
An interpretable clustering approach to safety climate analysis: examining driver group distinction in safety climate perceptions
The transportation industry, particularly the trucking sector, is prone to
workplace accidents and fatalities. Accidents involving large trucks accounted
for a considerable percentage of overall traffic fatalities. Recognizing the
crucial role of safety climate in accident prevention, researchers have sought
to understand its factors and measure its impact within organizations. While
existing data-driven safety climate studies have made remarkable progress,
clustering employees based on their safety climate perception is innovative and
has not been extensively utilized in research. Identifying clusters of drivers
based on their safety climate perception allows the organization to profile its
workforce and devise more impactful interventions. The lack of utilizing the
clustering approach could be due to difficulties interpreting or explaining the
factors influencing employees' cluster membership. Moreover, existing
safety-related studies did not compare multiple clustering algorithms,
resulting in potential bias. To address these issues, this study introduces an
interpretable clustering approach for safety climate analysis. This study
compares 5 algorithms for clustering truck drivers based on their safety
climate perceptions. It proposes a novel method for quantitatively evaluating
partial dependence plots (QPDP). To better interpret the clustering results,
this study introduces different interpretable machine learning measures (SHAP,
PFI, and QPDP). Drawing on data collected from more than 7,000 American truck
drivers, this study significantly contributes to the scientific literature. It
highlights the critical role of supervisory care promotion in distinguishing
various driver groups. The Python code is available at
https://github.com/NUS-DBE/truck-driver-safety-climate.
Comment: Submitted to Journal: Accident Analysis and Prevention