2,040 research outputs found

    Feature Extraction and Duplicate Detection for Text Mining: A Survey

    Get PDF
    Text mining, also known as Intelligent Text Analysis is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of data. Feature Extraction is one of the important techniques in data reduction to discover the most important features. Proce- ssing massive amount of data stored in a unstructured form is a challenging task. Several pre-processing methods and algo- rithms are needed to extract useful features from huge amount of data. The survey covers different text summarization, classi- fication, clustering methods to discover useful features and also discovering query facets which are multiple groups of words or phrases that explain and summarize the content covered by a query thereby reducing time taken by the user. Dealing with collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (remove and replace duplicates).The survey provides existing text mining techniques to extract relevant features, detect duplicates and to replace the duplicate data to get fine grained knowledge to the user

    An Efficient Information Extraction Mechanism with Page Ranking and a Classification Strategy based on Similarity Learning of Web Text Documents

    Get PDF
    Users have recently had more access to information thanks to the growth of the www information system. In these situations, search engines have developed into an essential tool for consumers to find information in a big space. The difficulty of handling this wealth of knowledge grows more difficult every day. Although search engines are crucial for information gathering, many of the results they offer are not required by the user because they are ranked according on user string matches. As a result, there were semantic disparities between the terms used in the user inquiry and the importance of catch phrases in the results. The problem of grouping relevant information into categories of related topics hasn't been solved. A Ranking Based Similarity Learning Approach and SVM based classification frame work of web text to estimate the semantic comparison between words to improve extraction of information is proposed in the work. The results of the experiment suggest improvisation in order to obtain better results by retrieving more relevant results

    Enhancing web marketing by using ontology

    Get PDF
    The existence of the Web has a major impact on people\u27s life styles. Online shopping, online banking, email, instant messenger services, search engines and bulletin boards have gradually become parts of our daily life. All kinds of information can be found on the Web. Web marketing is one of the ways to make use of online information. By extracting demographic information and interest information from the Web, marketing knowledge can be augmented by applying data mining algorithms. Therefore, this knowledge which connects customers to products can be used for marketing purposes and for targeting existing and potential customers. The Web Marketing Project with Ontology Support has the purpose to find and improve marketing knowledge. In the Web Marketing Project, association rules about marketing knowledge have been derived by applying data mining algorithms to existing Web users\u27 data. An ontology was used as a knowledge backbone to enhance data mining for marketing. The Raising Method was developed by taking advantage of the ontology. Data are preprocessed by Raising before being fed into data mining algorithms. Raising improves the quality of the set of mined association rules by increasing the average support value. Also, new rules have been discovered after applying Raising. This dissertation thoroughly describes the development and analysis of the Raising method. Moreover, a new structure, called Intersection Ontology, is introduced to represent customer groups on demand. Only needed customer nodes are created. Such an ontology is used to simplify the marketing knowledge representation. Finally, some additional ontology usages are mentioned. By integrating an ontology into Web marketing, the marketing process support has been greatly improved

    Enhancing the interactivity of a clinical decision support system by using knowledge engineering and natural language processing

    Get PDF
    Mental illness is a serious health problem and it affects many people. Increasingly,Clinical Decision Support Systems (CDSS) are being used for diagnosis and it is important to improve the reliability and performance of these systems. Missing a potential clue or a wrong diagnosis can have a detrimental effect on the patient's quality of life and could lead to a fatal outcome. The context of this research is the Galatean Risk and Safety Tool (GRiST), a mental-health-risk assessment system. Previous research has shown that success of a CDSS depends on its ease of use, reliability and interactivity. This research addresses these concerns for the GRiST by deploying data mining techniques. Clinical narratives and numerical data have both been analysed for this purpose.Clinical narratives have been processed by natural language processing (NLP)technology to extract knowledge from them. SNOMED-CT was used as a reference ontology and the performance of the different extraction algorithms have been compared. A new Ensemble Concept Mining (ECM) method has been proposed, which may eliminate the need for domain specific phrase annotation requirements. Word embedding has been used to filter phrases semantically and to build a semantic representation of each of the GRiST ontology nodes.The Chi-square and FP-growth methods have been used to find relationships between GRiST ontology nodes. Interesting patterns have been found that could be used to provide real-time feedback to clinicians. Information gain has been used efficaciously to explain the differences between the clinicians and the consensus risk. A new risk management strategy has been explored by analysing repeat assessments. A few novel methods have been proposed to perform automatic background analysis of the patient data and improve the interactivity and reliability of GRiST and similar systems

    Investigation of Heterogeneous Approach to Fact Invention of Web Users’ Web Access Behaviour

    Get PDF
    World Wide Web consists of a huge volume of different types of data. Web mining is one of the fields of data mining wherein there are different web services and a large number of web users. Web user mining is also one of the fields of web mining. The web users’ information about the web access is collected through different ways. The most common technique to collect information about the web users is through web log file. There are several other techniques available to collect web users’ web access information; they are through browser agent, user authentication, web review, web rating, web ranking and tracking cookies. The web users find it difficult to retrieve their required information in time from the web because of the huge volume of unstructured and structured information which increases the complexity of the web. Web usage mining is very much important for various purposes such as organizing website, business and maintenance service, personalization of website and reducing the network bandwidth. This paper provides an analysis about the web usage mining techniques. Â
    • …
    corecore