32 research outputs found

    A case study on the use of machine learning techniques for supporting technology watch

    In Technology Watch, human agents have to read many documents in order to manually categorize them and dispatch them to the correct expert, who later adds value information to each document. In this two-step process, the first step, the categorization of documents, is time-consuming and relies on the knowledge of a human categorizer; it adds no direct value to the process, which is only provided in the second step, when the document is reviewed by the appropriate expert. This paper proposes Machine Learning tools and techniques that learn from the manually pre-categorized data to automatically classify new content. The work was carried out in a real industrial context. Text from the original documents, text from the added-value information, and semantic annotations of those texts were used to generate different models over the manually pre-established categories. Three algorithms from different approaches were used to generate the models, and the results were compared to select the best model in terms of accuracy and of the reduction in the number of documents a human must read (workload).
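
    The abstract stays at a high level; as a hedged illustration of the kind of comparison it describes (scikit-learn and an invented toy corpus are assumptions, not material from the paper), training and scoring three candidate classifiers might look roughly like this:

    # Hedged sketch only: compare three text classifiers on manually
    # pre-categorized documents. The toy data and library choice
    # (scikit-learn) are assumptions, not taken from the paper.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Placeholder corpus: document text (original text, added-value text,
    # or semantic annotations) paired with its manually assigned category.
    documents = [
        "new battery chemistry for electric vehicles",
        "solid-state cells promise higher energy density",
        "low-power wireless protocol for sensor networks",
        "mesh networking standard for industrial sensors",
        "composite materials reduce aircraft weight",
        "novel alloy improves turbine blade durability",
    ]
    categories = ["energy", "energy", "iot", "iot", "materials", "materials"]

    candidates = {
        "naive_bayes": MultinomialNB(),
        "linear_svm": LinearSVC(),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }

    for name, clf in candidates.items():
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        # 2-fold cross-validation on the toy data; accuracy is the
        # comparison criterion mentioned in the abstract.
        scores = cross_val_score(pipeline, documents, categories, cv=2, scoring="accuracy")
        print(f"{name}: mean accuracy = {scores.mean():.2f}")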

    Knowledge Discovery in Databases: An Information Retrieval Perspective

    The current trend of increasing capabilities in data generation and collection has resulted in an urgent need for data mining applications, also called knowledge discovery in databases. This paper identifies and examines the issues involved in extracting useful grains of knowledge from large amounts of data. It describes a framework for categorising data mining systems and gives an overview of the issues in data pre-processing, as well as various information-gathering methodologies and techniques. The paper covers popular tools such as classification, clustering, and generalisation, and closes with a summary of the statistical and machine learning techniques currently in use.
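
    As a purely illustrative companion to the clustering tools the survey mentions (the library, scikit-learn, and the toy documents are assumptions, not material from the paper):

    # Illustrative only: the survey names clustering among popular KDD tools;
    # this generic k-means grouping of TF-IDF vectors is not code from the paper.
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "stock markets rallied after the interest rate decision",
        "central bank raises interest rates again",
        "new GPU architecture doubles neural network training throughput",
        "chip maker unveils a faster processor for data centres",
    ]
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)  # cluster index assigned to each document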

    Text Mining and Temporal Trend Detection on the Internet for Technology Assessment: Model and Tool

    In today's world, organizations conduct technology assessment (TAS) prior to making decisions about investments in existing, emerging, and hot technologies, in order to avoid costly mistakes and survive in a hyper-competitive business environment. When relying on web search engines to look for relevant information for TAS processes, decision makers face abundant unstructured information that limits their ability to assess technologies within a reasonable time frame. Thus the following question arises: how can valuable TAS knowledge be extracted from a diverse corpus of textual data on the web? To address this question, this paper presents a web-based model and tool for knowledge mapping. The proposed knowledge maps are constructed by a novel method of co-word analysis based on webometric web counts, combined with a temporal trend detection algorithm that employs the vector space model (VSM). The approach is demonstrated and validated on a spectrum of information technologies. Results show that the model's assessments are highly correlated with subjective expert (n=136) assessments (r > 0.91), with predictive validity above 85%, suggesting that the work can likely be generalized to other domains. The model's contribution is underscored by the growing attention to the big-data phenomenon.
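
    The knowledge maps rest on vector space model similarity over co-word counts; a minimal sketch of that building block, with invented counts and NumPy assumed, is:

    # Hedged sketch: cosine similarity in the vector space model (VSM) over
    # co-word (co-occurrence) counts, the building block behind the knowledge
    # maps described above. All counts and term names are invented.
    import numpy as np

    # Rows: assessed technologies; columns: webometric co-occurrence counts
    # with three context terms (invented numbers).
    co_occurrence = np.array([
        [120.0, 15.0, 60.0],   # "cloud computing"
        [ 95.0, 20.0, 55.0],   # "virtualization"
        [  5.0, 80.0, 10.0],   # "RFID"
    ])

    def cosine(u, v):
        """Cosine similarity between two co-word vectors."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(f"cloud vs. virtualization: {cosine(co_occurrence[0], co_occurrence[1]):.2f}")
    print(f"cloud vs. RFID:           {cosine(co_occurrence[0], co_occurrence[2]):.2f}")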

    Web news mining in an evolving framework

    Online news has become one of the major channels through which Internet users get their news. News websites are overwhelmed daily with new articles; huge amounts of online news are generated and updated every day, and processing and analysing this large corpus is an important challenge. It needs to be tackled with big data techniques that can process large volumes of data within limited run times. Moreover, as we head into a social-media data explosion, techniques such as text mining and social network analysis need to be taken seriously into consideration. In this work we focus on one of the most common daily activities: web news reading. News websites produce thousands of articles covering a wide spectrum of topics or categories, which can be considered a big data problem, and extracting useful information from these articles requires big data techniques. In this context, we present an approach for classifying huge amounts of news articles into categories (topic areas) based on the text content of the articles. Since these categories are constantly updated with new articles, our approach is based on Evolving Fuzzy Systems (EFS), which can update in real time the model that describes a category according to changes in the content of the corresponding articles. The novelty of the proposed system lies in the treatment of the web news articles so they can be used by these systems, and in their implementation and tuning for this task. Our proposal not only classifies news articles but also creates human-interpretable models of the different categories. The approach has been successfully tested on real online news. This work has been supported by the Spanish Government under the i-Support (Intelligent Agent Based Driver Decision Support) project (TRA2011-29454-C03-03).
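
    The paper's Evolving Fuzzy Systems are not reproduced here; as a rough, generic stand-in for the incremental per-batch update loop it describes (scikit-learn assumed, with an ordinary online linear classifier instead of an EFS), a sketch could look like this:

    # Rough stand-in only: the paper uses Evolving Fuzzy Systems (EFS), for
    # which no library is assumed here. This sketch merely illustrates an
    # incremental, per-batch update loop with generic scikit-learn primitives.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    categories = ["economy", "sports", "technology"]         # example topic areas
    vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
    model = SGDClassifier()                                   # online linear classifier

    def update(model, articles, labels):
        """Refine the category model with a new batch of labelled articles."""
        X = vectorizer.transform(articles)
        model.partial_fit(X, labels, classes=categories)
        return model

    # Each day's crawl arrives as a fresh batch and updates the model in place,
    # mimicking the evolving (real-time) behaviour described in the abstract.
    model = update(model, ["Shares fell after the rate decision was announced"], ["economy"])
    print(model.predict(vectorizer.transform(["The index rose on strong earnings"])))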