    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has attracted booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys.

    Probabilistic learning for selective dissemination of information

    New methods and new systems are needed to filter or selectively distribute the increasing volume of electronic information being produced nowadays. An effective information filtering system is one that provides exactly the information that fulfills the user's interests, with minimum effort by the user to describe them. Such a system will have to adapt to the user's changing interests. In this paper we describe and evaluate a learning model for information filtering which is an adaptation of the generalized probabilistic model of information retrieval. The model is based on the concept of 'uncertainty sampling', a technique that allows for relevance feedback on both relevant and nonrelevant documents. The proposed learning model is the core of a prototype information filtering system called ProFile.

    An Analysis of Using Expert Systems and Intelligent Agents for the Virtual Library Project at the Naval Surface Warfare Center-Carderock Division

    The Virtual Library Project at the Naval Surface Warfare Center/Carderock Division (NSWC/CD) is being developed to facilitate the incorporation and use of library documents via the Internet. These documents typically relate to the design and manufacture of ships for the U.S. Navy Fleet. As such, the libraries will store documents that contain not only text but also images, graphs and design configurations. Because of the dynamic nature of digital documents, particularly those related to design, rapid and effective cataloging of these documents becomes challenging. We conducted a research study to analyze the use of expert systems and intelligent agents to support the function of cataloging digital documents. This chapter provides an overview of past research in the use of expert systems and intelligent agents for cataloging digital documents and discusses our recommendations based on NSWC/CD’s requirements.

    Lunar motion analysis and laser data management

    Work completed in lunar motion analysis and laser data management during the period July 1, 1971 - September 30, 1975 is reported. In this context, analysis refers to theoretical or numerical studies involving real or potential applications of such observations to improvement of the physical model, and data management refers to the process by which observed photon events are turned into observations and made available to potential users. The data analysis work included: (1) bringing to operational status the computer programs for the numerical integration of the lunar orbit motion and for the application of lunar laser time delays to the improvement of the parameters of the physical model, (2) program improvement and program integrity, (3) three-dimensional ephemeris, and (4) miscellaneous independent studies. The data management work included: (1) data identification, (2) observatory interfaces, and (3) data distribution.

    Applications of optical processing for improving ERTS data, volume 1

    Application of optically diagnosed noise information toward development of filtering subroutines for improvement of digital sensing data tape quality - Vol.

    Methods of Technical Prognostics Applicable to Embedded Systems

    The main aim of the thesis is to provide a comprehensive overview of technical prognostics, which is applied in predictive (condition-based) maintenance built on continuous device monitoring and on estimation of the system's degradation level or remaining useful life, especially in the field of complex equipment and machinery. Technical diagnostics is by now fairly well mapped and deployed in real systems. Technical prognostics, by contrast, is still an evolving discipline with a limited number of real applications, and not all of its methods are sufficiently accurate or applicable to embedded systems. The thesis provides an overview of basic methods applicable to the prediction of remaining useful life and describes metrics for comparing the different approaches in terms of both accuracy and computational cost. One research core of the thesis consists of recommendations and a procedure for selecting an appropriate prognostic method with regard to the prognostic criteria. A second research core presents the particle filtering framework, which is suitable for model-based prognostics, together with verification and comparison of its implementations. The main research core is a case study on the highly topical subject of Li-Ion battery prognostics with respect to continuous monitoring. The case study demonstrates the model-based prognostic process and compares possible approaches for estimating both the remaining runtime before discharge and the capacity fade, while also tracking possible influences on battery degradation. The thesis includes a basic verification of the Li-Ion battery model and a proposed prognostic process; the proposed methodology is verified on real measured data.

    Crowd-sourced Photographic Content for Urban Recreational Route Planning

    Routing services are able to provide travel directions for users of all modes of transport. Most of them focus on functional journeys (i.e. journeys linking a given origin and destination with minimum cost) while paying less attention to recreational trips, in particular leisure walks in an urban context. These walks are additionally constrained by time or distance, and as their purpose is the process of walking itself, the attractiveness of the areas passed can be an important factor in route selection. This factor is hard to formalise and requires a reliable source of information covering the entire street network. Previous research shows that crowd-sourced data available from photo-sharing services has potential as a measure of space attractiveness, and thus as a base for a routing system that suggests leisure walks; ongoing PhD research aims to build such a system. This paper presents findings on four investigated data sources (Flickr, Panoramio, Picasa and Geograph) in Central London and discusses the requirements for the algorithm that is going to be implemented in the second half of this PhD research. Visual analytics was chosen as a method for understanding and comparing the obtained datasets, which contain hundreds of thousands of records. Interactive software was developed to find a number of problems, as well as to estimate the suitability of the sources in general. It was concluded that Picasa and Geograph have problems that make them less suitable for further research, while Panoramio and Flickr require filtering to remove photographs that do not contribute to an understanding of local attractiveness. Based on this analysis, a number of filtering methods were proposed in order to improve the quality of the datasets and thus provide a more reliable measure to support urban recreational routing.