82 research outputs found

    CRIS-IR 2006

    The recognition of entities and their relationships in document collections is an important step towards the discovery of latent knowledge, as well as a way to support knowledge management applications. The challenge lies in how to extract and correlate entities so as to answer key knowledge management questions, such as: who works with whom, on which projects, with which customers, and on which research areas. The present work proposes a knowledge mining approach, supported by information retrieval and text mining tasks, whose core is the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD outperforms other correlation methods. We also present an application that demonstrates the approach in knowledge management scenarios
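    The abstract leaves the correlation step abstract. As a rough illustration only (the published LRD method is more involved and is not reproduced here), the Python sketch below scores entity pairs by the cosine similarity of their document co-occurrence profiles, the kind of baseline correlation method LRD is compared against; all entity and document names are hypothetical.

```python
# Toy sketch: correlating extracted entities by shared document contexts.
# NOT the authors' LRD method; a minimal co-occurrence baseline only.
from collections import Counter
from math import sqrt

# Hypothetical input: entities with the documents they were found in.
entity_docs = {
    "Alice":    ["d1", "d2", "d3"],
    "Bob":      ["d1", "d3"],
    "ProjectX": ["d2", "d3"],
}

def cosine(a, b):
    """Cosine similarity between two bags of document identifiers."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Score every entity pair; high scores suggest a latent relationship.
names = sorted(entity_docs)
for i, e1 in enumerate(names):
    for e2 in names[i + 1:]:
        print(e1, e2, round(cosine(entity_docs[e1], entity_docs[e2]), 3))
```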

    Combining rough and fuzzy sets for feature selection


    Classification Arabic Twitter User’s Insights Using Rough Set Theory

    Nowadays, people around the world use social media to share their daily affairs. Arabic Twitter, for example, is a platform where users read, reply to, and post messages known as ‘tweets’. Users exchange opinions on different trends, which are not equally important and differ according to the users’ power and interest. Tweets can provide rich information for decision making. The main objective of this paper is to present a framework for making valuable decisions by analyzing social users’ insights based on their proximity to a particular trend, while highlighting their power within that trend. Tweets are highly unstructured, which makes them difficult to analyze. Nevertheless, our proposed model differs from previous research in this field in that it combines supervised and unsupervised machine learning algorithms. The work proceeds as follows: users are classified by their degree of closeness/interest using Mendelow’s power/interest matrix, and rough set theory is used to eliminate features found in user profiles and obtain minimal sets of data. The proposed model applies two attribute reduction algorithms to our dataset to determine the optimal number of reducts for improving decision making from the user replies. In addition, unsupervised machine learning groups the replies into subcategories such as positive, negative, or neutral. The experimental evaluation shows that the Johnson algorithm reduced the user attributes by 71% compared with the genetic algorithm used in the classification model
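    For context, the Johnson algorithm mentioned above is a standard greedy heuristic for computing one short reduct from a discernibility matrix: repeatedly pick the attribute that discerns the most object pairs, drop the pairs it covers, and stop when all pairs are covered. The sketch below is a minimal generic version; the attribute names and discernibility entries are hypothetical, not the paper's Twitter data.

```python
# Minimal sketch of Johnson's greedy reduct heuristic (rough set theory).
from collections import Counter

# Each set lists the attributes distinguishing one pair of objects with
# different decisions (one non-empty entry of the discernibility matrix).
discernibility = [
    {"followers", "interest"},
    {"interest"},
    {"followers", "replies"},
    {"replies", "interest"},
]

def johnson_reduct(entries):
    entries = [set(e) for e in entries if e]
    reduct = set()
    while entries:
        counts = Counter(a for e in entries for a in e)
        best = counts.most_common(1)[0][0]        # most frequent attribute
        reduct.add(best)
        entries = [e for e in entries if best not in e]
    return reduct

print(johnson_reduct(discernibility))  # e.g. {'interest', 'replies'}
```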

    Delineating Knowledge Domains in Scientific Literature using Machine Learning (ML)

    The recent years have witnessed an upsurge in the number of published documents, and organizations are showing an increased interest in text classification for effective use of the information. Manual procedures for text classification can be fruitful for a handful of documents, but they lose credibility as the number of documents increases, besides being laborious and time-consuming. Text mining techniques facilitate assigning text strings to categories, rendering the classification process fast, accurate, and hence reliable. This paper classifies chemistry documents using machine learning and statistical methods. The procedure of text classification is described in chronological order: data preparation, followed by processing, transformation, and the application of classification techniques, culminating in the validation of the results
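    The chronological pipeline described above (preparation, transformation, classification, validation) maps naturally onto a few lines of scikit-learn. The sketch below is a generic illustration with placeholder documents and labels, not the paper's actual chemistry corpus or chosen classifiers.

```python
# Generic text-classification pipeline: TF-IDF transformation followed by
# a Naive Bayes classifier, validated with cross-validation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder corpus; real data would come from the prepared documents.
docs = ["benzene ring aromatic", "titration acid base",
        "aromatic substitution reaction", "buffer solution pH acid"]
labels = ["organic", "analytical", "organic", "analytical"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
scores = cross_val_score(clf, docs, labels, cv=2)  # validation step
print("accuracy per fold:", scores)
```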

    Fuzzy-Rough Attribute Reduction with Application to Web Categorization

    Due to the explosive growth of electronically stored information, automatic methods must be developed to aid users in maintaining and using this abundance of information effectively. In particular, the sheer volume of redundancy present must be dealt with, leaving only the information-rich data to be processed. This paper presents a novel approach, based on an integrated use of fuzzy and rough set theories, to greatly reduce this data redundancy. Formal concepts of fuzzy-rough attribute reduction are introduced and illustrated with a simple example. The work is applied to the problem of web categorization, considerably reducing dimensionality with minimal loss of information. Experimental results show that fuzzy-rough reduction is more powerful than the conventional rough set-based approach. Classifiers that use the lower-dimensional set of attributes retained by fuzzy-rough reduction outperform those that employ more attributes returned by the existing crisp rough reduction method.
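    Fuzzy-rough reduction is typically driven by a greedy QuickReduct-style search. The sketch below shows the crisp rough-set QuickReduct loop on a toy decision table; the paper's fuzzy-rough variant replaces the crisp equivalence classes used here with fuzzy similarity relations, which is omitted for brevity.

```python
# Crisp QuickReduct: greedily add the attribute that most increases the
# dependency degree gamma until it matches that of the full attribute set.
from collections import defaultdict

# Toy decision table: (condition attribute values, decision).
table = [((0, 1, 0), "a"), ((0, 1, 1), "a"),
         ((1, 0, 0), "b"), ((1, 1, 1), "b")]

def dependency(attrs, table):
    """gamma(attrs): fraction of objects whose attrs-equivalence class
    falls entirely within one decision class (the positive region)."""
    groups = defaultdict(list)
    for cond, dec in table:
        groups[tuple(cond[i] for i in attrs)].append(dec)
    pos = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return pos / len(table)

def quickreduct(table, n_attrs):
    full = dependency(range(n_attrs), table)
    reduct = []
    while dependency(reduct, table) < full:
        best = max((a for a in range(n_attrs) if a not in reduct),
                   key=lambda a: dependency(reduct + [a], table))
        reduct.append(best)
    return reduct

print(quickreduct(table, 3))  # attribute 0 alone separates "a" from "b"
```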

    Rough Set Based Rule Evaluations and Their Applications

    Knowledge discovery is an important process in data analysis, data mining and machine learning. Typically, knowledge is presented in the form of rules. However, knowledge discovery systems often generate a huge number of rules, and one of the challenges we face is how to automatically discover interesting and meaningful knowledge from them. It is infeasible for human beings to select important and interesting rules manually; our focus is therefore to provide measures that evaluate the quality of rules in order to facilitate the understanding of data mining results. In this thesis, we present a series of rule evaluation techniques for the purpose of facilitating the knowledge understanding process. These evaluation techniques help not only to reduce the number of rules, but also to extract higher quality rules. Empirical studies on both artificial and real world data sets demonstrate how such techniques can contribute to practical systems such as ones for medical diagnosis and web personalization. In the first part of this thesis, we discuss several rule evaluation techniques proposed for rule postprocessing. We show how properly defined rule templates can be used as a rule evaluation approach, and we propose two rough set based measures, a Rule Importance Measure and a Rules-As-Attributes Measure, to rank the important and interesting rules. In the second part of this thesis, we show how data preprocessing can help with rule evaluation: because well preprocessed data is essential for generating important rules, we propose a new approach for processing missing attribute values that enhances the generated rules. In the third part of this thesis, a rough set based rule evaluation system is demonstrated to show the effectiveness of the proposed measures, and a new user-centric web personalization system is used as a case study to demonstrate how the proposed evaluation measures can be used in an actual application
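    Two standard rule quality measures that work of this kind builds on, certainty and coverage, are easy to state concretely: certainty is the fraction of objects matching the rule's condition that also have its decision, and coverage is the fraction of objects with that decision that the condition captures. The toy sketch below computes both for a single rule over a hypothetical decision table; it is illustrative only and does not implement the thesis's Rule Importance or Rules-As-Attributes measures.

```python
# Certainty = P(decision | condition); coverage = P(condition | decision).
rows = [({"fever": "yes", "cough": "yes"}, "flu"),
        ({"fever": "yes", "cough": "no"},  "flu"),
        ({"fever": "no",  "cough": "yes"}, "cold"),
        ({"fever": "no",  "cough": "no"},  "healthy")]

def matches(cond, row):
    return all(row[0].get(k) == v for k, v in cond.items())

def certainty_and_coverage(cond, decision, rows):
    cond_rows = [r for r in rows if matches(cond, r)]
    both = [r for r in cond_rows if r[1] == decision]
    dec_rows = [r for r in rows if r[1] == decision]
    certainty = len(both) / len(cond_rows) if cond_rows else 0.0
    coverage = len(both) / len(dec_rows) if dec_rows else 0.0
    return certainty, coverage

# Rule: IF fever = yes THEN flu
print(certainty_and_coverage({"fever": "yes"}, "flu", rows))  # (1.0, 1.0)
```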

    Proceedings of the 9th International Workshop on Information Retrieval on Current Research Information Systems

    The recognition of entities and their relationships in document collections is an important step towards the discovery of latent knowledge, as well as a way to support knowledge management applications. The challenge lies in how to extract and correlate entities so as to answer key knowledge management questions, such as: who works with whom, on which projects, with which customers, and on which research areas. The present work proposes a knowledge mining approach, supported by information retrieval and text mining tasks, whose core is the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD outperforms other correlation methods. We also present an application that demonstrates the approach in knowledge management scenarios

    Internet-based solutions to support distributed manufacturing

    With globalisation and constant changes in the marketplace, enterprises are adapting themselves to face new challenges; strategic corporate alliances to share knowledge, expertise and resources therefore represent an advantage in an increasingly competitive world. This has led to the integration of companies, customers, suppliers and partners using networked environments. This thesis presents three novel solutions in the tooling area, developed for Seco Tools Ltd, UK. These approaches implement a proposed distributed computing architecture that uses Internet technologies to assist geographically dispersed tooling engineers in process planning tasks. The systems are summarised as follows. TTS is a Web-based system to support engineers and technical staff in the task of providing technical advice to clients. Seco sales engineers access the system from remote machining sites and submit/retrieve/update the required tooling data located in databases at the company headquarters. The communication platform used for this system provides an effective mechanism to share information nationwide. The system implements efficient methods, such as data relaxation techniques, confidence scores and importance levels of attributes, to help the user find the closest solutions when specific requirements are not fully matched in the database. Cluster-F has been developed to assist engineers and clients in the assessment of cutting parameters for the tooling process; in this approach the Internet acts as a vehicle to transport the data between users and the database. Cluster-F is a knowledge discovery (KD) approach that makes use of clustering and fuzzy set techniques. The novel proposal in this system is the use of fuzzy set concepts to obtain the proximity matrix that guides the classification of the data; hierarchical clustering methods are then applied to these data to link the closest objects. A general KD methodology applying rough set concepts is also proposed in this research, covering data redundancy, identification of relevant attributes, detection of data inconsistency, and generation of knowledge rules. R-sets, the third proposed solution, has been developed using this KD methodology. This system evaluates the variables of the tooling database to analyse known and unknown relationships in the data generated after the execution of technical trials, with the aim of discovering cause-effect patterns from selected attributes contained in the database. A fourth system, called DBManager, was also developed to administer the system users' accounts, sales engineers' accounts and the tool trial monitoring process. It supports the implementation of the proposed distributed architecture and the maintenance of users' accounts and access restrictions for the systems running under this architecture
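    The Cluster-F pipeline as described, a fuzzy proximity matrix followed by hierarchical clustering, can be illustrated generically. The sketch below derives a proximity matrix from fuzzy membership degrees via a simple mean-absolute-difference similarity and feeds it to SciPy's hierarchical clustering; the attributes, membership values and similarity measure are all hypothetical stand-ins, since the actual system and its tooling database are proprietary.

```python
# Sketch: fuzzy proximity matrix -> hierarchical clustering of tools.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Rows: tools described by fuzzy memberships in (speed, feed, depth).
memberships = np.array([[0.9, 0.1, 0.3],
                        [0.8, 0.2, 0.4],
                        [0.1, 0.9, 0.7],
                        [0.2, 0.8, 0.6]])

# Fuzzy similarity of two tools: 1 - mean absolute membership difference.
n = len(memberships)
proximity = np.array([[1 - np.abs(memberships[i] - memberships[j]).mean()
                       for j in range(n)] for i in range(n)])

distance = 1 - proximity                        # similarity -> distance
condensed = squareform(distance, checks=False)  # format linkage() expects
labels = fcluster(linkage(condensed, method="average"),
                  t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 2 2]: the two similar tools group together
```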

    Finding patterns in student and medical office data using rough sets

    Data have been obtained from King Khaled General Hospital in Saudi Arabia. In this project, I try to discover patterns in these data using algorithms implemented in an experimental tool called Rough Set Graphic User Interface (RSGUI). Several algorithms are available in RSGUI, each of which is based on rough set theory. My objective is to find short, meaningful, predictive rules. First, we need to find a minimum set of attributes that fully characterizes the data. Some of the rules generated from this minimum set will be obvious, and therefore uninteresting; others will be surprising, and therefore interesting. The usual measures of the strength of a rule, such as its length, certainty and coverage, were considered. In addition, a measure of the interestingness of the rules has been developed based on questionnaires administered to human subjects. There were bugs in the RSGUI Java code; in particular, the Inductive Learning Algorithm (ILA) missed some cases that were subsequently resolved in ILA2 but not updated in RSGUI. I fixed the ILA issue in RSGUI, so ILA now runs well and gives good results for all cases encountered in the hospital administration and student records data

    Rough set based ensemble classifier for web page classification

    Combining the results of a number of individually trained classification systems to obtain a more accurate classifier is a widely used technique in pattern recognition. In this article, we introduce a rough set based meta classifier to classify web pages. The proposed method consists of two parts. In the first part, the output of every individual classifier is used to construct a decision table. In the second part, rough set attribute reduction and rule generation processes are applied to the decision table to construct the meta classifier. It is shown that (1) the performance of the meta classifier is better than that of every constituent classifier and (2) the meta classifier is optimal with respect to a quality measure defined in the article. Experimental studies show that the meta classifier improves classification accuracy uniformly over several benchmark corpora and beats other ensemble approaches in accuracy by a decisive margin, thus demonstrating the theoretical results. Apart from this, it reduces the CPU load compared with other ensemble classification techniques by removing redundant classifiers from the combination
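    The decision-table construction in the first part can be illustrated concretely. In the sketch below, each base classifier's prediction is a condition attribute and the true class is the decision; a plain table lookup with a majority-vote fallback stands in for the rough-set reduction and rule generation stage, which is not reproduced here. All class names and base outputs are hypothetical.

```python
# Sketch of the decision-table step of an ensemble meta classifier.
from collections import Counter, defaultdict

# Hypothetical base-classifier outputs on five training pages:
# (clf1, clf2, clf3) predictions -> true class.
decision_table = [
    (("sport", "sport", "news"),  "sport"),
    (("sport", "sport", "sport"), "sport"),
    (("news",  "news",  "sport"), "news"),
    (("news",  "news",  "news"),  "news"),
    (("sport", "news",  "news"),  "news"),
]

# Build rules: condition vector -> most frequent decision for it.
votes = defaultdict(Counter)
for cond, dec in decision_table:
    votes[cond][dec] += 1
rules = {cond: c.most_common(1)[0][0] for cond, c in votes.items()}

def meta_predict(cond):
    # Fall back to majority vote among base outputs for unseen patterns.
    return rules.get(cond, Counter(cond).most_common(1)[0][0])

print(meta_predict(("sport", "sport", "news")))  # "sport" (learned rule)
print(meta_predict(("news",  "sport", "news")))  # "news" (fallback vote)
```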