118,252 research outputs found

    Intelligent Web Crawling using Semantic Signatures

    The quantity of text that is added to the web in digital form continues to grow, and the quest for tools that can process this huge amount of data to retrieve the data of interest is ongoing. Moreover, observing these large volumes of data over a period of time is a tedious task for any human being. Text mining is very helpful in performing these kinds of tasks. Text mining is a process of observing patterns in text data using sophisticated statistical measures, both quantitatively and qualitatively. Using these text mining techniques and the power of the internet and its technologies, we have developed a tool that retrieves documents concerning topics of interest and that utilizes novel and sensitive classification tools. This thesis presents an intelligent web crawler, named Intel-Crawl. This tool identifies web pages of interest without the user's guidance or monitoring. Documents of interest are logged (by URL or file name). This package uses automatically generated semantic signatures to identify documents with content of interest. The tool also produces a vector that is a quantification of a document's content based on the semantic signatures. This provides a rich and sensitive characterization of the document's content. Documents are classified according to content and presented to the user for further analysis and investigation. Intel-Crawl may be applied to any area of interest. It is likely to be very useful in areas such as law enforcement, intelligence gathering, and monitoring changes in web site contents over time. It is well suited for scrutinizing the web activity of a large collection of web pages pertaining to similar content. The utility of Intel-Crawl is demonstrated in various situations using different parameters and classification techniques.
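The abstract does not say how Intel-Crawl builds its semantic signatures or its content vector, so the following is only an illustrative sketch under a simple assumption: each signature is a plain set of terms, and each vector component is the fraction of a document's tokens that fall in that set. The names `tokenize`, `signature_vector`, and the example signatures are hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def signature_vector(text, signatures):
    """Return one score per signature, in the order the signatures are given.

    Assumption: a signature is a set of terms, and its score is the share
    of the document's tokens that belong to that set.
    """
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [
        sum(counts[term] for term in terms) / total
        for terms in signatures.values()
    ]


# Hypothetical signatures, for illustration only.
signatures = {
    "finance": {"fraud", "bank"},
    "weather": {"weather", "rain"},
}
vector = signature_vector("Fraud fraud bank weather", signatures)
```

A crawler built along these lines could log a page's URL whenever one component of the vector exceeds a chosen threshold; the threshold itself would be another assumption to tune.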

    Surveying human habit modeling and mining techniques in smart spaces

    A smart space is an environment, mainly equipped with Internet-of-Things (IoT) technologies, able to provide services to humans, helping them to perform daily tasks by monitoring the space and autonomously executing actions, giving suggestions, and sending alarms. Approaches suggested in the literature may differ in terms of required facilities, possible applications, amount of human intervention required, and ability to support multiple users at the same time while adapting to changing needs. In this paper, we propose a Systematic Literature Review (SLR) that classifies the most influential approaches in the area of smart spaces according to a set of dimensions identified by answering a set of research questions. These dimensions make it possible to choose a specific method or approach according to available sensors, amount of labeled data, need for visual analysis, and requirements in terms of enactment and decision-making on the environment. Additionally, the paper identifies a set of challenges to be addressed by future research in the field.

    A framework for an Integrated Mining of Heterogeneous data in decision support systems

    The volume of information available on the Internet and corporate intranets continues to increase, along with a corresponding increase in the data (structured and unstructured) stored by many organizations. Over the past years, data mining techniques have been used to explore large volumes of structured data in order to discover knowledge, often in the form of a decision support system. For effective decision making, there is a need to discover knowledge from both structured and unstructured data for completeness and comprehensiveness. The aim of this paper is to present a framework to discover this kind of knowledge and to report on work in progress in ongoing research. The proposed framework is composed of three basic phases: extraction and integration, data mining, and finally the relevance of such a system to the business decision support system. In the first phase, both the structured and unstructured data are combined to form an XML database (combined data warehouse (CDW)). Efficiency is enhanced by clustering the unstructured data (documents) using the SOM (Self-Organizing Map) clustering algorithm and by extracting keyphrases based on training and TF/IDF (Term Frequency/Inverse Document Frequency) using the KEA (Keyphrase Extraction Algorithm) toolkit. In the second phase, an association rule mining technique is applied to discover knowledge from the combined data warehouse. The final phase reflects the changes that such a system will bring about to the marketing decision support system. The paper also describes a developed system that evaluates the association rules mined from structured data, which forms the first phase of the research work. The proposed system is expected to improve the quality of decisions, and this will be evaluated using standard metrics for the interestingness of association rules, which are based on statistical independence and correlation analysis.
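The abstract does not name the interestingness metrics it uses. As a hedged illustration, the sketch below computes three widely used ones for a rule A → B: support, confidence, and lift, where lift compares the rule's confidence with what statistical independence would predict (lift near 1 suggests independence). The function name `rule_metrics` and the toy transactions are assumptions for illustration and do not come from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: an iterable of item collections (one per transaction).
    """
    baskets = [set(t) for t in transactions]
    a, c = set(antecedent), set(consequent)
    n = len(baskets)

    n_a = sum(1 for b in baskets if a <= b)          # baskets containing A
    n_c = sum(1 for b in baskets if c <= b)          # baskets containing B
    n_ac = sum(1 for b in baskets if (a | c) <= b)   # baskets containing both

    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    # Lift = P(B | A) / P(B); a value of 1.0 means A and B look independent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]
support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
```

In this toy data the lift for bread → milk is below 1, so the rule would rank as uninteresting under an independence-based criterion even though its confidence is fairly high.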

    A traffic classification method using machine learning algorithm

    Applying concepts of attack investigation from the IT industry, this work designs a traffic classification method that combines data mining techniques with a machine learning algorithm to classify traffic as normal or malicious. This classification will help in learning about the unknown attacks faced by the IT industry. The notion of traffic classification is not new; plenty of work has been done to classify network traffic for heterogeneous applications. Existing techniques (payload-based, port-based, and statistical) have their own pros and cons, which are discussed later in this work, but classification using machine learning techniques is still an open field to explore and has provided very promising results so far.
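The abstract does not specify which learning algorithm or flow features the authors use. Purely as a sketch of the general idea of learning to separate normal from malicious traffic from statistical flow features, the example below uses a nearest-centroid classifier, which is a stand-in technique chosen for brevity. The features (mean packet size in bytes, packets per second), the training values, and the function names are all hypothetical.

```python
from math import dist


def fit_centroids(samples, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical flows: (mean packet size in bytes, packets per second).
train_x = [(500, 10), (520, 12), (60, 900), (64, 1100)]
train_y = ["normal", "normal", "malicious", "malicious"]

centroids = fit_centroids(train_x, train_y)
prediction = classify(centroids, (62, 1000))
```

In practice the features would need scaling so that one dimension does not dominate the distance, and a real system would be evaluated on labeled captures rather than hand-written values.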