
    Integrating case-based reasoning and hypermedia documentation: an application for the diagnosis of a welding robot at Odense Steel Shipyard

    Reliable and effective maintenance support is a vital consideration for management in today's manufacturing environment. This paper discusses the development of a maintenance system for the world's largest robot welding facility. The developed system combines a case-based reasoning approach for diagnosis with contextual information, in the form of electronic on-line manuals, linked using open hypermedia technology. The work discussed in this paper delivers not only a maintenance system for the robot stations under consideration, but also a design framework for developing maintenance systems for other, similar applications.
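    As a hedged illustration of the diagnosis step only, the Python sketch below shows a minimal case-based reasoning retrieval loop; the case base, symptom names, and overlap measure are invented for the example and are not taken from the paper.

```python
def retrieve_case(case_base, observed):
    """Return the stored case whose symptom set best overlaps the
    observed symptoms; a minimal stand-in for CBR retrieval."""
    return max(case_base, key=lambda case: len(case["symptoms"] & observed))

# Hypothetical case base; symptom and fault names are invented for illustration.
case_base = [
    {"symptoms": {"arc_unstable", "wire_feed_slip"}, "diagnosis": "worn feed rollers"},
    {"symptoms": {"no_arc", "gas_flow_low"}, "diagnosis": "blocked gas nozzle"},
]
print(retrieve_case(case_base, {"wire_feed_slip", "arc_unstable"})["diagnosis"])
# -> worn feed rollers
```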

    Spatially Aware Computing for Natural Interaction

    Spatial information refers to the location of an object in a physical or digital world, as well as its position relative to the objects around it. In this dissertation, three systems are designed and developed, each applying spatial information in a different field; the ultimate goal is to increase user friendliness and efficiency in those applications by utilizing spatial information. The first system is a novel Web page data extraction application, which takes advantage of 2D spatial information to discover structured records in a Web page. The extracted information is useful for re-organizing the layout of a Web page to fit mobile browsing. The second application utilizes the 3D spatial information of a mobile device within a large paper-based workspace to implement interactive paper that combines the merits of paper documents and mobile devices. This application can overlay digital information on top of a paper document based on the location of the mobile device within the workspace. The third application further integrates 3D spatial information with sound detection to realize an automatic camera management system. This application automatically controls multiple cameras in a conference room, and creates an engaging video by intelligently switching camera shots among meeting participants based on their activities. All three applications have been evaluated, and the results are promising. In summary, this dissertation comprehensively explores the usage of spatial information in various applications to improve their usability.
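    As a rough sketch of the 2D-layout idea (the dissertation's exact record-discovery algorithm is not reproduced here), repeated records can be found by grouping rendered elements that share a left edge and width:

```python
from collections import defaultdict

def candidate_record_groups(boxes, min_repeats=3):
    """Group rendered page elements by (left edge, width); a group with
    several vertically stacked members is a candidate record list.
    `boxes` are (x, y, width, height) tuples -- an assumed input format,
    not the dissertation's actual data structure."""
    groups = defaultdict(list)
    for box in boxes:
        x, y, w, h = box
        groups[(x, w)].append(box)
    return [group for group in groups.values() if len(group) >= min_repeats]

# Three equally aligned boxes form one candidate group; the outlier is dropped.
boxes = [(10, 0, 300, 40), (10, 50, 300, 40), (10, 100, 300, 40), (400, 0, 80, 20)]
print(candidate_record_groups(boxes))
```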

    CSM-398 - Data Extraction from Web Data Sources

    This paper explains the basic data structures used in a new page analysis technique for creating wrappers (data extractors) for the result pages produced by web sites in response to user queries submitted via web page forms. The key structure, called a tpGrid, is a representation of the web page that is easier to analyse than the raw HTML code. The analysis looks for repetition patterns among sets of tagSets, both of which are defined in the paper.
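    The paper's tpGrid and tagSet definitions are not reproduced in the abstract, so the following Python sketch is only a hypothetical illustration of the general idea: flattening a page's tag stream into rows whose repetition is easier to detect than in raw HTML.

```python
from html.parser import HTMLParser

class TagRowCollector(HTMLParser):
    """Hypothetical illustration, not the paper's actual tpGrid: collect
    the tag names seen between assumed record boundaries so that repeated
    rows (candidate result records) are easy to spot."""
    BOUNDARIES = {"tr", "li"}  # assumed record-closing tags

    def __init__(self):
        super().__init__()
        self.rows, self.current = [], []

    def handle_starttag(self, tag, attrs):
        self.current.append(tag)

    def handle_endtag(self, tag):
        if tag in self.BOUNDARIES and self.current:
            self.rows.append(tuple(self.current))
            self.current = []

collector = TagRowCollector()
collector.feed("<ul><li><a>x</a></li><li><a>y</a></li></ul>")
print(collector.rows)  # [('ul', 'li', 'a'), ('li', 'a')] -- the repeated ('li', 'a') pattern marks records
```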

    A Novel Web Scraping Approach Using the Additional Information Obtained from Web Pages

    Web scraping is the process of extracting valuable and interesting text information from web pages. Most current studies targeting this task concern automated web data extraction: they first create a DOM tree and then access the necessary data through it. Constructing this tree adds a time cost that depends on the structure of the DOM tree, yet in the current web scraping literature time efficiency is largely ignored. This study proposes a novel approach, UzunExt, which extracts content quickly using string methods and additional information, without creating a DOM tree. The string methods consist of the following consecutive steps: searching for a given pattern, calculating the number of closing HTML elements for this pattern, and extracting the content for the pattern. In the crawling process, the approach collects additional information, including the starting position for enhancing the search process, the number of inner tags for improving the extraction process, and tag repetition for terminating the extraction process. The string methods of this novel approach are about 60 times faster than extraction with the DOM-based method. Moreover, using this additional information improves extraction time by a factor of 2.35 compared to using only the string methods. Furthermore, the approach can easily be adapted to other DOM-based studies/parsers in this task to enhance their time efficiency. © 2013 IEEE
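    A minimal Python sketch of the string-based idea follows; it is simplified from the abstract's description, and UzunExt's actual implementation and its additional bookkeeping (starting positions, inner-tag counts, tag repetition) are not reproduced here.

```python
import re

def extract_element(html, start_tag, search_from=0):
    """Sketch of the three string-method steps named in the abstract:
    (1) search for the pattern, (2) count closing HTML elements to
    balance the nesting, (3) extract the content. No DOM tree is built."""
    tag_name = re.match(r"<(\w+)", start_tag).group(1)
    open_token, close_token = "<" + tag_name, "</" + tag_name + ">"

    start = html.find(start_tag, search_from)       # step 1: search
    if start == -1:
        return None

    depth, pos = 1, start + len(start_tag)
    while depth:                                    # step 2: balance tags
        next_open = html.find(open_token, pos)      # simplification: also matches
        next_close = html.find(close_token, pos)    # longer tag names, e.g. <divx
        if next_close == -1:
            return None                             # malformed markup
        if next_open != -1 and next_open < next_close:
            depth, pos = depth + 1, next_open + len(open_token)
        else:
            depth, pos = depth - 1, next_close + len(close_token)
    return html[start:pos]                          # step 3: extract

html = '<div class="post"><p>Hello</p></div><div class="post"><p>World</p></div>'
print(extract_element(html, '<div class="post">'))  # <div class="post"><p>Hello</p></div>
```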

    Community-based Question Answer Detection

    Each day, millions of people ask questions and search for answers on the World Wide Web. As a result, the Internet has grown into a worldwide database of questions and answers, accessible to almost everyone. Since this database is so huge, it is hard to find out whether a question has been answered, or even asked, before. As a consequence, users ask the same questions again and again, producing a vicious circle of new content that hides the important information. One platform for questions and answers is the Web forum, also known as a discussion board. Forums present discussions as item streams in which each item contains the contribution of one author; these contributions contain questions and answers in human-readable form. People use search engines to search for information on such platforms, but current search engines are neither optimized to highlight individual questions and answers nor to show which questions are asked often and which ones are already answered. In order to close this gap, this thesis introduces the Effingo system. Effingo is intended to extract forums from around the Web and find question and answer items. It also needs to link equivalent questions and aggregate the associated answers; that way it is possible to find out whether a question has been asked before and whether it has already been answered. Based on this information, the system can derive the most urgent questions and determine which ones are new and which ones are discussed and answered frequently. As a result, users are prevented from creating useless discussions, reducing both the server load and the information overload for further searches. The first research area explored by this thesis is forum data extraction, whose results are intended to be used to create a database of forum posts as large as possible. The thesis further uses question-answer detection to determine which forum items are questions and which are answers and, finally, topic detection to aggregate questions on the same topic and to discover duplicate answers. These areas are either extended by Effingo, using forum-specific features such as the user graph, forum item relations, and the forum link structure, or adapted to cope with the specific problems created by user-generated content. Such problems arise from poorly written and very short texts as well as from hidden or distributed information.
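    As an illustration of the question-detection subtask only (the abstract does not describe Effingo's actual classifiers), a naive baseline heuristic might look like the following sketch:

```python
import re

QUESTION_WORDS = {"how", "what", "why", "when", "where", "who", "which"}

def looks_like_question(post):
    """Naive baseline, not Effingo's method: flag a forum item as a
    question if a sentence ends in '?' or opens with an interrogative word."""
    for sentence in re.split(r"(?<=[.!?])\s+", post.strip()):
        words = sentence.strip().lower().split()
        if words and (words[-1].endswith("?") or words[0] in QUESTION_WORDS):
            return True
    return False

print(looks_like_question("How do I reset my password? Thanks."))  # True
print(looks_like_question("Try clearing the cache first."))        # False
```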

    Eat Your Vegetables and Be Sued for Patent Infringement


    A Survey on Region Extractors from Web Documents

    Extracting information from web documents has become a research area in which new proposals sprout up year after year. This has motivated several researchers to work on surveys that attempt to provide an overall picture of the many existing proposals. Unfortunately, none of these surveys provides a complete picture, because none takes region extractors into account. These tools are a kind of preprocessor: they help information extractors focus on the regions of a web document that contain relevant information. With the increasing complexity of web documents, region extractors are becoming a must for extracting information from many websites. Beyond information extraction, region extractors have also found their way into information retrieval, focused web crawling, topic distillation, adaptive content delivery, mashups, and metasearch engines. In this paper, we survey the existing proposals regarding region extractors and compare them side by side.
    Funding: Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Economía, Industria y Competitividad TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-

    Assessment and Prediction of Cardiovascular Status During Cardiac Arrest Through Machine Learning and Dynamical Time-Series Analysis

    In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction, and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic, and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision support in clinical settings. The approach, which focuses on the integration of multiple features through machine learning techniques, is well suited to the inclusion of multiple physiologic signals. Ventricular Fibrillation (VF) is a common presenting dysrhythmia in the setting of cardiac arrest; its main treatment is defibrillation through direct-current countershock to achieve return of spontaneous circulation. However, defibrillation is often unsuccessful and may even lead to the transition of VF to more pernicious rhythms such as asystole or pulseless electrical activity. Multiple methods have been proposed for predicting defibrillation success based on examination of the VF waveform, but to date no analytical technique has been widely accepted. For a given desired sensitivity, the proposed model provides significantly higher accuracy and specificity than the state of the art; notably, within the sensitivity range of 80-90%, the method provides about 40% higher specificity. This means that when trained to the same level of sensitivity, the model yields far fewer false positives (unnecessary shocks). Also introduced is a new model that predicts recurrence of arrest after a successful countershock is delivered; to date, no other work has sought to build such a model. I validate the method by reporting multiple performance metrics calculated on (blind) test sets.
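    For reference, the sensitivity/specificity trade-off discussed above is standard bookkeeping over a decision threshold; the sketch below is illustrative and is not the thesis's model, and the label semantics in the docstring are assumptions.

```python
def sensitivity_specificity(y_true, y_score, threshold):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP).
    Assumed semantics for illustration: y_true is 1 for shocks that
    achieved return of circulation, 0 otherwise; a shock is advised
    when the model score reaches the threshold."""
    tp = sum(t == 1 and s >= threshold for t, s in zip(y_true, y_score))
    fn = sum(t == 1 and s < threshold for t, s in zip(y_true, y_score))
    tn = sum(t == 0 and s < threshold for t, s in zip(y_true, y_score))
    fp = sum(t == 0 and s >= threshold for t, s in zip(y_true, y_score))
    return tp / (tp + fn), tn / (tn + fp)

# Raising the threshold trades sensitivity for specificity (fewer unnecessary shocks).
print(sensitivity_specificity([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2], threshold=0.5))
```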

    Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision

    Named Entity Recognition (NER) aims to extract and classify rigid designators in text, such as proper names, biological species, and temporal expressions. There has been growing interest in this field of research since the early 1990s. In this thesis, we document a trend moving away from handcrafted rules and towards machine learning approaches. Still, recent machine learning approaches suffer from limited annotated-data availability, a serious shortcoming in building and maintaining large-scale NER systems.

    In this thesis, we present an NER system built with very little supervision. Human supervision is limited to listing a few examples of each named entity (NE) type. First, we introduce a proof-of-concept semi-supervised system that can recognize four NE types. Then, we expand its capabilities by improving key technologies, and we apply the system to an entire hierarchy comprising 100 NE types.

    Our work makes the following contributions: the creation of a proof-of-concept semi-supervised NER system; the demonstration of an innovative noise-filtering technique for generating NE lists; the validation of a strategy for learning disambiguation rules from automatically identified, unambiguous NEs; and, finally, the development of an acronym detection algorithm, solving a rare but very difficult problem in alias resolution.

    We believe semi-supervised learning techniques are about to break new ground in the machine learning community. In this thesis, we show that limited supervision can build complete NER systems. On standard evaluation corpora, we report performance comparable to baseline supervised systems on the task of annotating NEs in text.
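    Of the listed contributions, acronym detection lends itself to a compact illustration. The sketch below is a common pattern-matching baseline, not the thesis's algorithm:

```python
import re

def find_acronym_pairs(text):
    """Pattern-matching baseline: accept 'Long Form (LF)' when the
    acronym's letters match, in order, the initials of the words
    immediately preceding the parenthesis."""
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        words = text[: match.start()].split()[-len(acronym):]
        if len(words) == len(acronym) and all(
            word[0].upper() == letter for word, letter in zip(words, acronym)
        ):
            pairs.append((" ".join(words), acronym))
    return pairs

print(find_acronym_pairs("Named Entity Recognition (NER) aims to classify designators."))
# [('Named Entity Recognition', 'NER')]
```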