p3d: a general data-reduction tool for fiber-fed integral-field spectrographs
The reduction of integral-field spectrograph (IFS) data is demanding work.
Many repetitive operations are required to convert raw data into what is
typically a large number of spectra. This effort can be markedly simplified
through the use of a tool or pipeline, which is designed to complete many of
the repetitive operations without human interaction. Here we present our
semi-automatic data-reduction tool p3d that is designed to be used with
fiber-fed IFSs. Important components of p3d include a novel algorithm for
automatic finding and tracing of spectra on the detector, and two methods of
optimal spectrum extraction in addition to standard aperture extraction. p3d
also provides tools to combine several images, perform wavelength calibration,
and flat-field the data. p3d is at the moment configured for four IFSs. In order to
evaluate its performance we have tested the different components of the tool.
For these tests we used both simulated and observational data. We demonstrate
that for three of the IFSs a correction for so-called cross-talk due to
overlapping spectra on the detector is required. Without such a correction
spectra will be inaccurate, in particular if there is a significant intensity
gradient across the object. Our tests showed that p3d is able to produce
accurate results. p3d is a highly general and freely available tool. It is
easily extended to include improved algorithms, new visualization tools and
support for additional instruments. The program code can be downloaded from the
p3d-project web site: http://p3d.sourceforge.net

Comment: 18 pages, 15 figures, 3 tables, accepted for publication in A&A
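The abstract mentions standard aperture extraction alongside p3d's optimal-extraction methods. The following toy sketch (not p3d's actual code; the function name, frame layout, and half-width parameter are all invented for illustration) shows the basic idea: given a traced spectrum centre for each detector column, sum the pixel values within a fixed window around the trace.

```python
# Toy sketch of standard aperture extraction (not p3d's actual algorithm):
# given a 2D detector frame and a traced spectrum centre per column,
# sum the pixel values within a fixed half-width around the trace.

def extract_aperture(frame, trace, half_width=2):
    """frame: 2D list indexed [row][col]; trace: centre row per column."""
    spectrum = []
    for col, centre in enumerate(trace):
        lo = max(0, centre - half_width)
        hi = min(len(frame), centre + half_width + 1)
        spectrum.append(sum(frame[row][col] for row in range(lo, hi)))
    return spectrum

# Example: a frame with one bright fibre running along row 3.
frame = [[0.0] * 5 for _ in range(7)]
for col in range(5):
    frame[3][col] = 10.0
print(extract_aperture(frame, trace=[3, 3, 3, 3, 3]))  # [10.0, 10.0, 10.0, 10.0, 10.0]
```

The cross-talk problem described in the abstract arises precisely here: if a neighbouring fibre's profile overlaps this window, its flux is summed in as well, which is why the overlap correction matters when the intensity gradient across the object is significant.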
HTML Format Tables Extraction with Differentiating Cell Content as Property Name
Websites present data in various forms and formats, one of which is the table. Tables on the Internet can be
captured by copy and paste, but this approach is impractical when many tables must be extracted and the results
merged with other tables. This article discusses research on extracting HTML tables into a database.
The approach uses one algorithm to determine the number of rows and columns of a
table, and another to match the contents of extracted table cells against a Property Name database, so that
it can be determined whether the extracted table carries properties in a row, in a column, or not at all. Both the tables
and the Property Name database contain data in the Indonesian language. At the pre-processing stage,
techniques are applied to enrich the instances of the Property Name database. The tables extracted are simple HTML
tables, with no merged rows or columns at the row 1/column 1 position.
This research provides techniques for enriching the instances of a database and, with the use of illustrations,
an approach by which tabular HTML data can be extracted semi-automatically. In addition, property cells in an
extracted table can be distinguished from cells that contain data.
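A minimal sketch of the two steps the abstract describes, under assumptions of my own (the class, the hard-coded property list, and the sample Indonesian terms are invented stand-ins, not the paper's actual database or code): parse a simple HTML table into rows of cells, then flag which cells match a known set of property names.

```python
# Illustrative sketch (names invented, not from the paper): parse a simple
# HTML table into rows of cells, then mark each cell as a property name or
# plain data by matching against a small stand-in for the paper's
# Indonesian-language Property Name database.

from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._in_cell = [], [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell, self._cell = True, ""

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append(self._cell.strip())
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._cell += data

# Tiny stand-in vocabulary: "nama" (name), "alamat" (address), "harga" (price).
PROPERTY_NAMES = {"nama", "alamat", "harga"}

def classify_cells(rows):
    """Pair each cell with a flag: True if it is a known property name."""
    return [[(cell, cell.lower() in PROPERTY_NAMES) for cell in row]
            for row in rows]

html = ("<table><tr><th>Nama</th><th>Harga</th></tr>"
        "<tr><td>Buku</td><td>5000</td></tr></table>")
p = TableExtractor()
p.feed(html)
print(classify_cells(p.rows))
```

Because the flags cluster in the first row here, a post-processing step could conclude the table carries its properties row-wise, which is the row/column/no-property distinction the paper draws.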
Ontology-based Information Extraction with SOBA
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.
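The store-then-link loop in the last sentence can be sketched minimally as follows (this is an assumption-laden toy, not SOBA's actual API: the class, the normalization by lowercasing, and the soccer example are all invented): a newly extracted mention is matched against entities already in the knowledge base; a hit enriches the existing entity, a miss creates a new one.

```python
# Minimal sketch of store-then-link (not SOBA's actual implementation):
# new mentions either link to an existing knowledge-base entity or
# create a fresh one.

class KnowledgeBase:
    def __init__(self):
        self.entities = {}  # canonical key -> attribute dict

    def add_or_link(self, mention, attrs):
        key = mention.strip().lower()  # crude canonicalization
        if key in self.entities:
            self.entities[key].update(attrs)  # link: enrich existing entity
            return key, False
        self.entities[key] = dict(attrs)      # miss: create new entity
        return key, True

kb = KnowledgeBase()
kb.add_or_link("Ronaldo", {"team": "Brazil"})
key, created = kb.add_or_link("ronaldo", {"goals": 2})
print(key, created, kb.entities[key])
```

Real entity linking uses far richer evidence than string equality, but the control flow, interpreting new extractions against what the knowledge base already contains, is the point the abstract makes.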
Automated synthesis of data extraction and transformation programs
Due to the abundance of data in today’s data-rich world, end-users increasingly need to perform various data extraction and transformation tasks. While many of these tedious tasks can be performed in a programmatic way, most end-users lack the required programming expertise to automate them and end up spending their valuable time manually performing various data-related tasks. The field of program synthesis aims to overcome this problem by automatically generating programs from informal specifications, such as input-output examples or natural language.
This dissertation focuses on the design and implementation of new systems for automating important classes of data transformation and extraction tasks. It introduces solutions for automating data manipulation tasks on fully structured data formats like relational tables, and on semi-structured formats such as XML and JSON documents.
First, we describe a novel algorithm for synthesizing hierarchical data transformations from input-output examples. A key novelty of our approach is that it reduces the synthesis of tree transformations to the simpler problem of synthesizing transformations over the paths of the tree. We also describe a new and effective algorithm for learning path transformations that combines logical SMT-based reasoning with machine learning techniques based on decision trees.
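The path-based reduction can be illustrated with a small sketch (heavily simplified, and every name in it is invented, the dissertation learns such transformations from examples rather than hand-writing them): flatten a nested dict into root-to-leaf paths, transform each path independently, then rebuild the tree.

```python
# Hedged sketch of reducing a tree transformation to path transformations
# (simplified; the dissertation synthesizes these from input-output examples).

def to_paths(tree, prefix=()):
    """Flatten a nested dict into (path, leaf-value) pairs."""
    if not isinstance(tree, dict):
        return [(prefix, tree)]
    paths = []
    for key, sub in tree.items():
        paths.extend(to_paths(sub, prefix + (key,)))
    return paths

def from_paths(paths):
    """Rebuild a nested dict from (path, leaf-value) pairs."""
    tree = {}
    for path, value in paths:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return tree

# Example path transformation: rename the top-level key "person" to "author".
def rename_root(path, value):
    head = "author" if path[0] == "person" else path[0]
    return (head,) + path[1:], value

doc = {"person": {"name": "Ada", "year": 1815}}
out = from_paths([rename_root(p, v) for p, v in to_paths(doc)])
print(out)  # {'author': {'name': 'Ada', 'year': 1815}}
```

The payoff of the reduction is that a path is a flat sequence, so the learner can bring sequence-level tools (SMT reasoning, decision trees) to bear instead of searching over arbitrary tree rewrites.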
Next, we present a new methodology for learning programs that migrate tree-structured documents to relational table representations from input-output examples. Our approach achieves its goal by decomposing the synthesis task into two subproblems: (A) learning the column extraction logic, and (B) learning the row extraction logic. We propose a technique for learning column extraction programs using deterministic finite automata, and a new algorithm for predicate learning which combines integer linear programming and logic minimization.
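The column/row decomposition can be sketched as follows (an illustration under my own assumptions: the dissertation learns the extractors automatically with DFAs, whereas here a column extractor is hand-written as a path pattern with a `*` wildcard, and rows are formed by pairing up column values positionally):

```python
# Illustrative sketch of tree-to-table migration via per-column extractors
# (simplified stand-in for the learned DFA-based extraction programs).

def extract_column(tree, pattern):
    """Return leaf values whose path matches the pattern; '*' matches any key."""
    if not pattern:
        return [tree] if not isinstance(tree, dict) else []
    if not isinstance(tree, dict):
        return []
    head, rest = pattern[0], pattern[1:]
    values = []
    for key, sub in tree.items():
        if head == "*" or head == key:
            values.extend(extract_column(sub, rest))
    return values

doc = {"items": {"i1": {"name": "pen", "price": 2},
                 "i2": {"name": "ink", "price": 5}}}
names = extract_column(doc, ("items", "*", "name"))
prices = extract_column(doc, ("items", "*", "price"))
rows = list(zip(names, prices))
print(rows)  # [('pen', 2), ('ink', 5)]
```

Positional pairing only works when the columns align one-to-one; deciding how values group into rows in the general case is exactly the row-extraction subproblem (B) that the predicate-learning algorithm addresses.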
Finally, we address the problem of automating data extraction tasks from natural language. Specifically, we focus on data retrieval from relational databases and describe a novel approach for learning SQL queries from English descriptions. The method we describe is fully automatic and database-agnostic (i.e., it does not require customization for each database). Our method combines semantic parsing techniques from the NLP community with novel programming-languages ideas involving probabilistic type inhabitation and automated sketch repair.
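As a toy illustration of the task's shape only (a far cry from the semantic-parsing and type-inhabitation machinery of the dissertation; the question template, table naming convention, and schema here are all invented), one can map a tightly constrained English pattern to a SQL string:

```python
# Toy English-to-SQL mapping (illustration of the task, not the method):
# handles only questions of the form "show the <col> of <entity> <name>"
# over an invented schema where tables are pluralized entity names.

import re

def english_to_sql(question):
    m = re.match(r"show the (\w+) of (\w+) (\w+)", question.lower())
    if not m:
        raise ValueError("unsupported question")
    column, table, name = m.groups()
    return f"SELECT {column} FROM {table}s WHERE name = '{name}'"

print(english_to_sql("Show the salary of employee alice"))
# SELECT salary FROM employees WHERE name = 'alice'
```

The gap between this template and real usage, paraphrases, ambiguous column references, joins across tables, is precisely what the database-agnostic semantic-parsing approach is built to close.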