964 research outputs found

    An initial state of design and development of intelligent knowledge discovery system for stock exchange database

    Get PDF
    Data mining is a challenging matter in research field for the last few years.Researchers are using different techniques in data mining.This paper discussed the initial state of Design and Development Intelligent Knowledge Discovery System for Stock Exchange (SE) Databases. We divide our problem in two modules.In first module we define Fuzzy Rule Base System to determined vague information in stock exchange databases.After normalizing massive amount of data we will apply our proposed approach, Mining Frequent Patterns with Neural Networks.Future prediction (e.g., political condition, corporation factors, macro economy factors, and psychological factors of investors) perform an important rule in Stock Exchange, so in our prediction model we will be able to predict results more precisely.In second module we will generate clustering algorithm. Generally our clustering algorithm consists of two steps including training and running steps.The training step is conducted for generating the neural network knowledge based on clustering.In running step, neural network knowledge based is used for supporting the Module in order to generate learned complete data, transformed data and interesting clusters that will help to generate interesting rules

    The TRECVID 2007 BBC rushes summarization evaluation pilot

    Get PDF
    This paper provides an overview of a pilot evaluation of video summaries using rushes from several BBC dramatic series. It was carried out under the auspices of TRECVID. Twenty-two research teams submitted video summaries of up to 4% duration, of 42 individual rushes video files aimed at compressing out redundant and insignificant material. The output of two baseline systems built on straightforward content reduction techniques was contributed by Carnegie Mellon University as a control. Procedures for developing ground truth lists of important segments from each video were developed at Dublin City University and applied to the BBC video. At NIST each summary was judged by three humans with respect to how much of the ground truth was included, how easy the summary was to understand, and how much repeated material the summary contained. Additional objective measures included: how long it took the system to create the summary, how long it took the assessor to judge it against the ground truth, and what the summary's duration was. Assessor agreement on finding desired segments averaged 78% and results indicate that while it is difficult to exceed the performance of baselines, a few systems did

    Web Mediators for Accessible Browsing

    Full text link
    We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number of all text characters on a web page. K-means clustering is used to create unique thresholds to differentiate index pages and article pages on individual web sites. Index pages contain mostly links to articles and other indices, while article pages contain mostly text. We also present a novel link grouping algorithm using agglomerative hierarchical clustering that groups links in the same spatial neighborhood together while preserving link structure. Grouping allows users with severe disabilities to use a scan-based mechanism to tab through a web page and select items. In experiments, we saw up to a 40-fold reduction in the number of commands needed to click on a link with a scan-based interface, which shows that we can vastly improve the rate of communication for users with disabilities. We used web page classification and link grouping to alter web page display on an accessible web browser that we developed to make a usable browsing interface for users with disabilities. Our classification method consistently outperformed a baseline classifier even when using minimal data to generate article and index clusters, and achieved classification accuracy of 94.0% on web sites with well-formed or slightly malformed HTML, compared with 80.1% accuracy for the baseline classifier.National Science Foundation (IIS-0308213, IIS-039009, IIS-0093367, P200A01031, EIA-0202067

    Data mining techniques for complex application domains

    Get PDF
    The emergence of advanced communication techniques has increased availability of large collection of data in electronic form in a number of application domains including healthcare, e- business, and e-learning. Everyday a large amount of records are stored electronically. However, finding useful information from such a large data collection is a challenging issue. Data mining technology aims automatically extracting hidden knowledge from large data repositories exploiting sophisticated algorithms. The hidden knowledge in the electronic data may be potentially utilized to facilitate the procedures, productivity, and reliability of several application domains. The PhD activity has been focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: Healthcare data analysis and Textual data analysis. The research activity, in the context of healthcare data, addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information to deal with the treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments. The research effort in textual data analysis is twofold. On the one hand, a novel approach to discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining to support domain experts in making decisions has been investigated. Both research activities are focused on adopting widely exploratory data mining techniques to textual data analysis, which require overcoming intrinsic limitations for traditional algorithms for handling textual documents efficiently and effectively

    A Comparative Study of Text Summarization on E-mail Data Using Unsupervised Learning Approaches

    Get PDF
    Over the last few years, email has met with enormous popularity. People send and receive a lot of messages every day, connect with colleagues and friends, share files and information. Unfortunately, the email overload outbreak has developed into a personal trouble for users as well as a financial concerns for businesses. Accessing an ever-increasing number of lengthy emails in the present generation has become a major concern for many users. Email text summarization is a promising approach to resolve this challenge. Email messages are general domain text, unstructured and not always well developed syntactically. Such elements introduce challenges for study in text processing, especially for the task of summarization. This research employs a quantitative and inductive methodologies to implement the Unsupervised learning models that addresses summarization task problem, to efficiently generate more precise summaries and to determine which approach of implementing Unsupervised clustering models outperform the best. The precision score from ROUGE-N metrics is used as the evaluation metrics in this research. This research evaluates the performance in terms of the precision score of four different approaches of text summarization by using various combinations of feature embedding technique like Word2Vec /BERT model and hybrid/conventional clustering algorithms. The results reveals that both the approaches of using Word2Vec and BERT feature embedding along with hybrid PHA-ClusteringGain k-Means algorithm achieved increase in the precision when compared with the conventional k-means clustering model. Among those hybrid approaches performed, the one using Word2Vec as feature embedding method attained 55.73% as maximum precision value

    Finding groups in data: Cluster analysis with ants

    Get PDF
    Wepresent in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. This algorithm discovers automatically clusters in numerical data without prior knowledge of possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of a swarm intelligent system, and on the effects on the final clustering by using during the classification differentmetrics of dissimilarity: Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as e.g. k-means, etc. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focus on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated in a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms performs well when compared to another techniques

    An interpretable clustering approach to safety climate analysis: examining driver group distinction in safety climate perceptions

    Full text link
    The transportation industry, particularly the trucking sector, is prone to workplace accidents and fatalities. Accidents involving large trucks accounted for a considerable percentage of overall traffic fatalities. Recognizing the crucial role of safety climate in accident prevention, researchers have sought to understand its factors and measure its impact within organizations. While existing data-driven safety climate studies have made remarkable progress, clustering employees based on their safety climate perception is innovative and has not been extensively utilized in research. Identifying clusters of drivers based on their safety climate perception allows the organization to profile its workforce and devise more impactful interventions. The lack of utilizing the clustering approach could be due to difficulties interpreting or explaining the factors influencing employees' cluster membership. Moreover, existing safety-related studies did not compare multiple clustering algorithms, resulting in potential bias. To address these issues, this study introduces an interpretable clustering approach for safety climate analysis. This study compares 5 algorithms for clustering truck drivers based on their safety climate perceptions. It proposes a novel method for quantitatively evaluating partial dependence plots (QPDP). To better interpret the clustering results, this study introduces different interpretable machine learning measures (SHAP, PFI, and QPDP). Drawing on data collected from more than 7,000 American truck drivers, this study significantly contributes to the scientific literature. It highlights the critical role of supervisory care promotion in distinguishing various driver groups. The Python code is available at https://github.com/NUS-DBE/truck-driver-safety-climate.Comment: Submitted to Journal:Accident Analysis and Preventio
    corecore