p3d: a general data-reduction tool for fiber-fed integral-field spectrographs
The reduction of integral-field spectrograph (IFS) data is demanding work.
Many repetitive operations are required to convert raw data into what is
typically a large number of spectra. This effort can be markedly simplified
through the use of a tool or pipeline, which is designed to complete many of
the repetitive operations without human interaction. Here we present our
semi-automatic data-reduction tool p3d that is designed to be used with
fiber-fed IFSs. Important components of p3d include a novel algorithm for
automatic finding and tracing of spectra on the detector, and two methods of
optimal spectrum extraction in addition to standard aperture extraction. p3d
also provides tools to combine several images, perform wavelength calibration,
and flat-field the data. p3d is at the moment configured for four IFSs. In order to
evaluate its performance we have tested the different components of the tool.
For these tests we used both simulated and observational data. We demonstrate
that for three of the IFSs a correction for so-called cross-talk due to
overlapping spectra on the detector is required. Without such a correction
spectra will be inaccurate, in particular if there is a significant intensity
gradient across the object. Our tests showed that p3d is able to produce
accurate results. p3d is a highly general and freely available tool. It is
easily extended to include improved algorithms, new visualization tools and
support for additional instruments. The program code can be downloaded from the
p3d-project web site: http://p3d.sourceforge.net

Comment: 18 pages, 15 figures, 3 tables, accepted for publication in A&A
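The abstract mentions standard aperture extraction alongside p3d's optimal-extraction methods. The following toy sketch (not p3d's actual code; the function name, frame layout, and half-width parameter are all invented for illustration) shows the basic idea: given a traced spectrum centre for each detector column, sum the pixel values within a fixed window around the trace.

```python
# Toy sketch of standard aperture extraction (not p3d's actual algorithm):
# given a 2D detector frame and a traced spectrum centre per column,
# sum the pixel values within a fixed half-width around the trace.

def extract_aperture(frame, trace, half_width=2):
    """frame: 2D list indexed [row][col]; trace: centre row per column."""
    spectrum = []
    for col, centre in enumerate(trace):
        lo = max(0, centre - half_width)
        hi = min(len(frame), centre + half_width + 1)
        spectrum.append(sum(frame[row][col] for row in range(lo, hi)))
    return spectrum

# Example: a frame with one bright fibre running along row 3.
frame = [[0.0] * 5 for _ in range(7)]
for col in range(5):
    frame[3][col] = 10.0
print(extract_aperture(frame, trace=[3, 3, 3, 3, 3]))  # [10.0, 10.0, 10.0, 10.0, 10.0]
```

The cross-talk problem described in the abstract arises precisely here: if a neighbouring fibre's profile overlaps this window, its flux is summed in as well, which is why the overlap correction matters when the intensity gradient across the object is significant.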
HTML Format Tables Extraction with Differentiating Cell Content as Property Name
Websites present data in various forms and formats, one of which is the table. Tables on the Internet can be
captured by copy and paste, but this approach is impractical when many tables must be extracted and the results
merged with other tables. This article discusses research on extracting HTML tables into a database.
The approach uses one algorithm to determine the number of rows and columns of a
table, and another to match the contents of extracted table cells against a Property Name database, so that
it can be determined whether the extracted table carries properties in a row, in a column, or not at all. Both the tables
and the Property Name database contain data in the Indonesian language. At the pre-processing stage,
techniques are applied to enrich the instances of the Property Name database. The tables extracted are simple HTML
tables, with no merged rows or columns at the row 1/column 1 position.
This research provides techniques for enriching the instances of a database and, with the use of illustrations,
an approach by which tabular HTML data can be extracted semi-automatically. In addition, property cells in an
extracted table can be distinguished from cells that contain data.
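A minimal sketch of the two steps the abstract describes, under assumptions of my own (the class, the hard-coded property list, and the sample Indonesian terms are invented stand-ins, not the paper's actual database or code): parse a simple HTML table into rows of cells, then flag which cells match a known set of property names.

```python
# Illustrative sketch (names invented, not from the paper): parse a simple
# HTML table into rows of cells, then mark each cell as a property name or
# plain data by matching against a small stand-in for the paper's
# Indonesian-language Property Name database.

from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._in_cell = [], [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell, self._cell = True, ""

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append(self._cell.strip())
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._cell += data

# Tiny stand-in vocabulary: "nama" (name), "alamat" (address), "harga" (price).
PROPERTY_NAMES = {"nama", "alamat", "harga"}

def classify_cells(rows):
    """Pair each cell with a flag: True if it is a known property name."""
    return [[(cell, cell.lower() in PROPERTY_NAMES) for cell in row]
            for row in rows]

html = ("<table><tr><th>Nama</th><th>Harga</th></tr>"
        "<tr><td>Buku</td><td>5000</td></tr></table>")
p = TableExtractor()
p.feed(html)
print(classify_cells(p.rows))
```

Because the flags cluster in the first row here, a post-processing step could conclude the table carries its properties row-wise, which is the row/column/no-property distinction the paper draws.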
Ontology-based Information Extraction with SOBA
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
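The store-then-link loop in the last sentence can be sketched minimally as follows (this is an assumption-laden toy, not SOBA's actual API: the class, the normalization by lowercasing, and the soccer example are all invented): a newly extracted mention is matched against entities already in the knowledge base; a hit enriches the existing entity, a miss creates a new one.

```python
# Minimal sketch of store-then-link (not SOBA's actual implementation):
# new mentions either link to an existing knowledge-base entity or
# create a fresh one.

class KnowledgeBase:
    def __init__(self):
        self.entities = {}  # canonical key -> attribute dict

    def add_or_link(self, mention, attrs):
        key = mention.strip().lower()  # crude canonicalization
        if key in self.entities:
            self.entities[key].update(attrs)  # link: enrich existing entity
            return key, False
        self.entities[key] = dict(attrs)      # miss: create new entity
        return key, True

kb = KnowledgeBase()
kb.add_or_link("Ronaldo", {"team": "Brazil"})
key, created = kb.add_or_link("ronaldo", {"goals": 2})
print(key, created, kb.entities[key])
```

Real entity linking uses far richer evidence than string equality, but the control flow, interpreting new extractions against what the knowledge base already contains, is the point the abstract makes.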
Automated synthesis of data extraction and transformation programs
Due to the abundance of data in today’s data-rich world, end-users increasingly need to perform various data extraction and transformation tasks. While many of these tedious tasks can be performed in a programmatic way, most end-users lack the required programming expertise to automate them and end up spending their valuable time manually performing various data-related tasks. The field of program synthesis aims to overcome this problem by automatically generating programs from informal specifications, such as input-output examples or natural language.
This dissertation focuses on the design and implementation of new systems for automating important classes of data transformation and extraction tasks. It introduces solutions for automating data manipulation tasks on fully structured data formats like relational tables, and on semi-structured formats such as XML and JSON documents.
First, we describe a novel algorithm for synthesizing hierarchical data transformations from input-output examples. A key novelty of our approach is that it reduces the synthesis of tree transformations to the simpler problem of synthesizing transformations over the paths of the tree. We also describe a new and effective algorithm for learning path transformations that combines logical SMT-based reasoning with machine learning techniques based on decision trees.
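The path-based reduction can be illustrated with a small sketch (heavily simplified, and every name in it is invented, the dissertation learns such transformations from examples rather than hand-writing them): flatten a nested dict into root-to-leaf paths, transform each path independently, then rebuild the tree.

```python
# Hedged sketch of reducing a tree transformation to path transformations
# (simplified; the dissertation synthesizes these from input-output examples).

def to_paths(tree, prefix=()):
    """Flatten a nested dict into (path, leaf-value) pairs."""
    if not isinstance(tree, dict):
        return [(prefix, tree)]
    paths = []
    for key, sub in tree.items():
        paths.extend(to_paths(sub, prefix + (key,)))
    return paths

def from_paths(paths):
    """Rebuild a nested dict from (path, leaf-value) pairs."""
    tree = {}
    for path, value in paths:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return tree

# Example path transformation: rename the top-level key "person" to "author".
def rename_root(path, value):
    head = "author" if path[0] == "person" else path[0]
    return (head,) + path[1:], value

doc = {"person": {"name": "Ada", "year": 1815}}
out = from_paths([rename_root(p, v) for p, v in to_paths(doc)])
print(out)  # {'author': {'name': 'Ada', 'year': 1815}}
```

The payoff of the reduction is that a path is a flat sequence, so the learner can bring sequence-level tools (SMT reasoning, decision trees) to bear instead of searching over arbitrary tree rewrites.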
Next, we present a new methodology for learning programs that migrate tree-structured documents to relational table representations from input-output examples. Our approach achieves its goal by decomposing the synthesis task into two subproblems: (A) learning the column extraction logic, and (B) learning the row extraction logic. We propose a technique for learning column extraction programs using deterministic finite automata, and a new algorithm for predicate learning which combines integer linear programming and logic minimization.
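The column/row decomposition can be sketched as follows (an illustration under my own assumptions: the dissertation learns the extractors automatically with DFAs, whereas here a column extractor is hand-written as a path pattern with a `*` wildcard, and rows are formed by pairing up column values positionally):

```python
# Illustrative sketch of tree-to-table migration via per-column extractors
# (simplified stand-in for the learned DFA-based extraction programs).

def extract_column(tree, pattern):
    """Return leaf values whose path matches the pattern; '*' matches any key."""
    if not pattern:
        return [tree] if not isinstance(tree, dict) else []
    if not isinstance(tree, dict):
        return []
    head, rest = pattern[0], pattern[1:]
    values = []
    for key, sub in tree.items():
        if head == "*" or head == key:
            values.extend(extract_column(sub, rest))
    return values

doc = {"items": {"i1": {"name": "pen", "price": 2},
                 "i2": {"name": "ink", "price": 5}}}
names = extract_column(doc, ("items", "*", "name"))
prices = extract_column(doc, ("items", "*", "price"))
rows = list(zip(names, prices))
print(rows)  # [('pen', 2), ('ink', 5)]
```

Positional pairing only works when the columns align one-to-one; deciding how values group into rows in the general case is exactly the row-extraction subproblem (B) that the predicate-learning algorithm addresses.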
Finally, we address the problem of automating data extraction tasks from natural language. Specifically, we focus on data retrieval from relational databases and describe a novel approach for learning SQL queries from English descriptions. The method we describe is fully automatic and database-agnostic (i.e., it does not require customization for each database). Our method combines semantic parsing techniques from the NLP community with novel programming-languages ideas involving probabilistic type inhabitation and automated sketch repair.
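As a toy illustration of the task's shape only (a far cry from the semantic-parsing and type-inhabitation machinery of the dissertation; the question template, table naming convention, and schema here are all invented), one can map a tightly constrained English pattern to a SQL string:

```python
# Toy English-to-SQL mapping (illustration of the task, not the method):
# handles only questions of the form "show the <col> of <entity> <name>"
# over an invented schema where tables are pluralized entity names.

import re

def english_to_sql(question):
    m = re.match(r"show the (\w+) of (\w+) (\w+)", question.lower())
    if not m:
        raise ValueError("unsupported question")
    column, table, name = m.groups()
    return f"SELECT {column} FROM {table}s WHERE name = '{name}'"

print(english_to_sql("Show the salary of employee alice"))
# SELECT salary FROM employees WHERE name = 'alice'
```

The gap between this template and real usage, paraphrases, ambiguous column references, joins across tables, is precisely what the database-agnostic semantic-parsing approach is built to close.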