8 research outputs found

ALGORITHMS FOR HTML TABLE EXTRACTION ON THE WEB

Author: Purnamasari Detty
Ruhama Syamsi
Wicaksana I Wayan Simri

Data on the web may be available in structured, semi-structured, and unstructured formats. One form of structured data frequently presented on web pages is the HTML-based table. Business needs often require retrieving data from various sources so that it can be combined or processed further. The problem that arises is how to retrieve data from such tables automatically so that it can then be processed further, for example by extracting the parts considered important and merging tables from other web pages. This research develops an algorithm for extracting three forms of tables, namely standard tables, tables with merged rows (join row), and tables with merged cells/columns (join column), and provides an illustration of the developed algorithm.

Gunadarma University Repository
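The HTML table extraction entry above distinguishes standard tables, merged-row (join row) tables, and merged-cell/column (join column) tables. In HTML these merges are usually expressed with the `rowspan` and `colspan` attributes, so one common way to handle all three forms is to expand every table into a rectangular grid. The sketch below does that with Python's standard `html.parser`; it is not the authors' algorithm, and the names `TableParser` and `extract_table` are hypothetical. It assumes the input contains a single, flat (non-nested) table and copies each merged cell's text into every grid slot the cell covers.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect table rows as lists of (text, rowspan, colspan) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []            # one list of cells per <tr>
        self._in_cell = False
        self._cell_text = []
        self._span = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:     # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            # Missing or empty span attributes default to 1 (a standard cell).
            self._span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._in_cell = True
            self._cell_text = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            text = "".join(self._cell_text).strip()
            self.rows[-1].append((text, *self._span))
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell_text.append(data)


def extract_table(html):
    """Expand join-row (rowspan) and join-column (colspan) cells into a full grid."""
    parser = TableParser()
    parser.feed(html)
    grid = {}  # (row, col) -> cell text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Copy the merged cell's text into every slot it covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

For example, a table whose header "Score" spans two columns and whose cell "Ana" spans two rows expands to `[["Name", "Score", "Score"], ["Ana", "1", "2"], ["Ana", "3", "4"]]`. Duplicating merged values is only one design choice; a pipeline that later merges tables from other pages might instead keep the span information as metadata.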

A Natural Language Interface for Information Retrieval from Forms on the World Wide Web

Author: Meng Frank
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/1999

AIS Electronic Library (AISeL)

The Web-OEM approach to Web information extraction

Author: Abiteboul
Christophides
De Rosa
Florescu
Gruser
Hammer
Iocchi
Kistler
Lacroix
Luca Iocchi
Mendelzon
Papakonstantinou
Sahuguet
Publication venue: Elsevier BV

Crossref

Modeling tools for the integration of structured data sources

Author: Venkataramanan Jyotsna
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2011. "December 2010." Cataloged from PDF version of thesis. Includes bibliographical references (p. 61-64). Disparity in representations within structured documents such as XML or SQL makes interoperability challenging, error-prone and expensive. A model is developed to process disparate representations into an encompassing generic knowledge representation. Data sources were characterized according to a number of smaller models: their case; the underlying data storage structures; a content model based on the ontological structure defined by the document's schema; and the data model or physical structure of the schema. To harmonize different representations and give them semantic meaning, each representation is mapped, using the categories above, to a common dictionary. The models were implemented as a structured data analysis tool, and a basis was built for comparison across schemas and documents. Data exchange within modeling and simulation environments increasingly takes the form of XML using a variety of schemas. We therefore demonstrate the use of this modeling tool to automatically harmonize multiple disparate XML data sources in a prototype simulated environment. By Jyotsna Venkataramanan. M.Eng.

DSpace@MIT
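The thesis entry above describes mapping each source's representation to a common dictionary so that disparate XML sources can be compared. The sketch below illustrates only that final mapping step, using Python's standard `xml.etree.ElementTree`; it is not the thesis's tool. The dictionary contents and the `harmonize` function are hypothetical, matching is case-insensitive on element names, and XML namespaces are assumed to be absent. Elements with no dictionary entry are reported instead of silently dropped, so a curator can extend the dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical common dictionary: source-specific element names -> shared concepts.
COMMON_DICTIONARY = {
    "lat": "latitude",
    "latitude": "latitude",
    "lon": "longitude",
    "lng": "longitude",
    "longitude": "longitude",
    "alt": "altitude",
    "elevation": "altitude",
}


def harmonize(xml_text, dictionary=COMMON_DICTIONARY):
    """Map the leaf elements of one XML record onto common-dictionary concepts.

    Returns (record, unmapped): record maps concept -> text value, and unmapped
    lists the element tags that have no dictionary entry.
    """
    root = ET.fromstring(xml_text)
    record, unmapped = {}, []
    for elem in root.iter():
        if len(elem) == 0:  # leaf element: it carries a value
            concept = dictionary.get(elem.tag.lower())  # case-insensitive lookup
            if concept is None:
                unmapped.append(elem.tag)
            else:
                record[concept] = (elem.text or "").strip()
    return record, unmapped
```

With this sketch, `<Position><Lat>42.36</Lat><Lon>-71.09</Lon></Position>` and `<fix><latitude>42.36</latitude><lng>-71.09</lng><speed>3</speed></fix>` both yield the same `latitude`/`longitude` record, while `speed` in the second source is reported as unmapped.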

Agent-based services in the financial sector: an investigation of the possibilities for integrating information from different sources, and presentation of a selected solution for the sector

Author: Olsen Anne-Gro
Publication venue: Agder University College
Publication date: 01/01/1999

Master's thesis in information and communication technology, 1999, Høgskolen i Agder, Grimstad. This thesis addresses the information integration problem on the Internet. The problems of exploiting the information available on the Internet are general, and the financial sector, both customers and companies, suffers from them too. In the thesis I survey the existing research results intended to ease the integration and retrieval of information. The results are numerous, since the problem is large and many researchers have devoted themselves to it. Of all the possibilities I identify, however, few are used in today's financial sector on the Internet. A further survey carried out in the thesis shows that most providers of integrated financial information rely on "traditional" methods such as fax, telephone and e-mail for gathering information. After evaluating the research results, I have chosen a common format as the solution best suited to the financial sector. The common format solves fundamental difficulties by structuring the information on the source pages. Banks currently operate without any standard for what is published and how it should be published. I therefore develop a proposal for an unambiguous information content for banks and financial institutions. I then adapt this information content to the chosen common format and show how value-added services for the financial sector can be developed on the basis of structured information content. Among the consequences I foresee of a shift to a structured information format: on the producer side, it becomes easier to move away from the integrated value chain that has been common in banking toward more outsourcing of services.
Furthermore, it will support the emergence of new intermediaries on the Internet, since collecting and integrating information will become simpler; finally, customers will be able to exchange experiences more easily, for example in customer groups on the Internet, because the information they provide is unambiguously structured.

NORA - Norwegian Open Research Archives

Agder University Research Archive

Proposal for a semantic refiner for information retrieval from the Web

Author: Deco Claudia
Publication venue: UR. FI-INCO

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Wrapper Generation for Web Accessible Data Sources

Author: Jean-robert Gruser
Laura Bright
Louiqa Raschid
María Esther Vidal
Publication date: 01/01/1998

The number of data sources that can be queried across the WWW is increasing. Such sources typically support HTML forms-based interfaces, and search engines query collections of suitably indexed data. The data is displayed via a browser. These sources have several drawbacks. First, there is no standard programming interface suitable for applications to submit queries. Second, the output (the answer to a query) is not well structured: structured objects have to be extracted from HTML documents that contain irrelevant data and may be volatile. Third, domain knowledge about the data source is also embedded in HTML documents and must be extracted. To solve these problems, we present technology to define and (automatically) generate wrappers for Web accessible sources. Our contributions are as follows: (1) defining a wrapper interface to specify the capability of Web accessible data sources; (2) developing a wrapper generation toolkit of graphical interfaces and specification languages to specify the capability of sources and the functionality of the wrapper; (3) developing the technology to automatically generate a wrapper appropriate to the Web accessible source from the specifications.

CiteSeerX

Crossref
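The wrapper-generation entry above proposes a wrapper interface that specifies the capability of a forms-based Web source, so that applications can submit queries programmatically. The sketch below illustrates that idea under stated assumptions; it is not the authors' toolkit or specification language. `SourceCapability`, `Wrapper`, and the example URL are hypothetical, the source is assumed to accept GET form submissions, and fetching and parsing the answer page into structured objects is left out.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class SourceCapability:
    """Hypothetical capability description of a forms-based Web source."""
    form_url: str
    required: set = field(default_factory=set)  # form fields that must be bound
    optional: set = field(default_factory=set)  # form fields that may be bound
    outputs: tuple = ()                          # attributes the answer page exposes


class Wrapper:
    """Checks queries against a source's capability and builds form requests."""

    def __init__(self, capability):
        self.cap = capability

    def can_answer(self, bindings):
        # The query must bind every required field and nothing unsupported.
        keys = set(bindings)
        return self.cap.required <= keys and keys <= self.cap.required | self.cap.optional

    def query_url(self, bindings):
        """Translate attribute bindings into a GET form-submission URL."""
        if not self.can_answer(bindings):
            raise ValueError(f"source cannot answer a query on {sorted(bindings)}")
        return self.cap.form_url + "?" + urlencode(sorted(bindings.items()))
```

For instance, with a capability that requires `title` and optionally accepts `year`, `query_url({"title": "web wrappers", "year": 1998})` produces `https://example.org/search?title=web+wrappers&year=1998`, whereas a query binding only `year` is rejected before any request is sent. Separating the capability description from the wrapper code mirrors the abstract's split between specifying a source and generating its wrapper.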

An object-oriented approach to the translation between MOF Metaschemas

Author: Raventós Pagès Ruth
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2009

Since the 1960s, many formal languages have been developed in order to allow software engineers to specify conceptual models and to design software artifacts. A few of these languages, such as the Unified Modeling Language (UML), have become widely used standards. They employ notations and concepts that are not readily understood by "domain experts," who understand the actual problem domain and are responsible for finding solutions to problems.

The Object Management Group (OMG) developed the Semantics of Business Vocabulary and Rules (SBVR) specification as a first step towards providing a language to support the specification of "business vocabularies and rules." The function of SBVR is to capture business concepts and business rules in languages that are close enough to ordinary language, so that business experts can read and write them, and formal enough to capture the intended semantics and present them in a form that is suitable for engineering the automation of the rules.

The ultimate goal of business rules approaches is to build software systems directly from vocabularies and rules. One way of reaching this goal, within the context of model-driven architecture (MDA), is to transform SBVR models into UML models. OMG also notes the need for a reverse engineering transformation between UML schemas and SBVR vocabularies and rules in order to validate UML schemas. This thesis proposes an automatic approach to translation between UML schemas and SBVR vocabularies and rules, and vice versa. It consists of the application of a new generic schema translation approach to the particular case of UML and SBVR.

The main contribution of the generic approach is the extensive use of object-oriented concepts in the definition of translation mappings, particularly the use of operations (and their refinements) and invariants, both formalized in the Object Constraint Language (OCL). Translation mappings can be used to check that two schemas are translations of each other, and to translate one into the other, in either direction. Translation mappings are declaratively defined by means of preconditions, postconditions and invariants, and they can be implemented in any suitable language. The approach leverages the object-oriented constructs embedded in Meta Object Facility (MOF) metaschemas to achieve the goals of object-oriented software development in the schema translation problem.

The generic schema translation approach and its application to UML schemas and SBVR vocabularies and rules is fully implemented in the UML-based Specification Environment (USE) tool and validated by a case study based on the conceptual schema of the Digital Bibliography & Library Project (DBLP) system.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura