
111 research outputs found

Post Processing Wrapper Generated Tables For Labeling Anonymous Datasets

Author: Ahmed Emdad
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2011

A large number of wrappers generate tables without column names for human consumption, because the meaning of the columns is apparent from the context and easy for humans to understand. In emerging applications, however, labels are needed for autonomous assignment and schema mapping, where a machine tries to understand the tables. Autonomous label assignment is critical in volume data processing where ad hoc mediation, extraction and querying are involved. We propose an algorithm, Lads (Labeling Anonymous Datasets), which can holistically label/annotate tabular Web documents. The algorithm has been tested on anonymous datasets from a number of sites, yielding very promising results. We report here our experimental results on anonymous datasets from a number of sites (e.g., music, movie, watch, political, automobile, synthetic) obtained through different search engines such as Google, Yahoo and MSN. The comparative probabilities of attributes being candidate labels are very promising, reaching as high as a 98% probability of assigning a good label to an anonymous attribute. To the best of our knowledge, this is the first approach of its kind for label assignment based on multiple search engines' recommendations. We have introduced a new paradigm, a Web search engine based annotator, which can holistically label tabular Web documents. We categorize columns into three types: disjoint set column (DSC), repeated prefix/suffix column (RPS) and numeric column (NUM). For labeling a DSC column, our method relies on hit counts from Web search engines (e.g., Google, Yahoo and MSN). We formulate speculative queries to Web search engines and use the principle of disambiguation by maximal evidence to come up with our solution. Our algorithm Lads is guaranteed to work for the disjoint set column. Experimental results from a large number of sites in different domains and subjective evaluation of our approach show that the proposed algorithm Lads works fairly well.
Along these lines, we claim that our algorithm Lads is robust. In order to assign a label to a disjoint set column, we need a candidate set of labels (e.g., a label library), which can be collected on the fly from user SQL query variables as well as from Web form label tags. We classify a set of homogeneous anonymous datasets into meaningful labels and at the same time cluster those labels into a label library by learning the user's expectation and the materialization of her expectation from a site. Previous work in this field relies on extraction ontologies; we eliminate the need for domain-specific ontologies, as we can extract labels from the Web form. Our system is novel in the sense that we accommodate labels from the user query variable. We hypothesize that our proposed algorithm Lads will do a good job of autonomous label assignment. We bridge the gap between two orthogonal research directions: wrapper generation and ontology generation from Web sites (i.e., label extraction). We are not aware of any prior work that connects these two orthogonal research directions for value-added services such as online comparison shopping.
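
The idea in this abstract can be illustrated with a minimal sketch: classify a column as NUM, RPS or DSC, then, for a DSC column, pick the candidate label whose speculative queries collect the most hits ("maximal evidence"). This is not the authors' implementation. The classification heuristics, the `min_affix_len` threshold, the query template, and the `hit_count` callback are all assumptions made for illustration; the abstract does not specify them, and a real system would obtain hit counts from a search engine.

```python
import os
from typing import Callable, Dict, Sequence, Tuple


def _is_number(text: str) -> bool:
    """Return True if the text parses as a number (thousands commas allowed)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def classify_column(values: Sequence[str], min_affix_len: int = 3) -> str:
    """Classify a column as 'NUM', 'RPS', or 'DSC' using assumed heuristics."""
    stripped = [v.strip() for v in values]
    if not stripped:
        raise ValueError("cannot classify an empty column")
    # NUM: every value parses as a number.
    if all(_is_number(v) for v in stripped):
        return "NUM"
    # RPS: all values share a sufficiently long common prefix or suffix.
    if len(stripped) > 1:
        prefix = os.path.commonprefix(stripped)
        suffix = os.path.commonprefix([v[::-1] for v in stripped])
        if max(len(prefix), len(suffix)) >= min_affix_len:
            return "RPS"
    # Otherwise treat the column as a disjoint set of values.
    return "DSC"


def best_label(
    values: Sequence[str],
    candidates: Sequence[str],
    hit_count: Callable[[str], int],
) -> Tuple[str, Dict[str, int]]:
    """Pick the candidate label with the largest total hit count.

    `hit_count` is a hypothetical callback that returns the number of
    search results for a query string; the query template is assumed.
    """
    if not candidates:
        raise ValueError("need at least one candidate label")
    scores = {
        label: sum(hit_count(f'"{label}" "{value}"') for value in values)
        for label in candidates
    }
    return max(scores, key=scores.get), scores
```

In use, `candidates` would come from the label library described above (user query variables and Web form labels), and `hit_count` would wrap whichever search engine is available. Summing counts across all values in the column is one simple way to aggregate evidence; other aggregations are equally plausible.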

Digital Commons@Wayne State University

ProQuest OAI Repository

Judging Analogous Data Search In Resultant Web Databases

Author: Kanna Govinda Raju
Md Amanulla
Yaseen Sayeed
Publication venue: Kakinada Institute of Engineering and Technology for Women
Publication date: 28/10/2014

With today's Internet technologies, a huge amount of useful information resides in web databases (WDBs), but it is not retrieved effectively when users need it. Information retrieval is a major requirement for users, yet it depends on WDBs. The Web has become the accessible medium for many database applications, such as e-commerce and search media. These applications store information in huge databases that users access, query, and update through the Web. Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define the way that these forms can connect to and retrieve data from database servers. In this paper we present a novel approach for annotating web searches on search engines such as MSN. It automatically searches data using clustering techniques, then classifies and presents the retrieved data.

International Journal of Science Engineering and Advance Technology (IJSEAT)

Knowledge Rich Natural Language Queries over Structured Biological Databases

Author: Chu W. W.
Goldsmith E. J.
InterProlog
Kossmann D.
Lawrence C.
Maio C. D.
Mir S.
Mou X.
Nandi A.
Novik L.
Safran M.
Swofford D. L.
Publication date: 30/03/2017

Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity is undeniable for obvious reasons, their engineering is far from simple. For the most part, a semantics- and intent-preserving mapping of a well-understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware that facilitates such mappings by separating the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well-defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog-based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made.

arXiv.org e-Print Archive

Crossref

Twitter-demographer: a flow-based tool to enrich Twitter data

Author: Bianchi Federico
Cutrona Vincenzo
Hovy Dirk
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2022

Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific discoveries in recent years. However, the textual data alone are often not enough to conduct studies: especially social scientists need more variables to perform their analysis and control for various factors. How we augment this information, such as users' location, age, or tweet sentiment, has ramifications for anonymity and reproducibility, and requires dedicated effort. This paper describes Twitter-Demographer, a simple, flow-based tool to enrich Twitter data with additional information about tweets and users. Twitter-Demographer is aimed at NLP practitioners and (computational) social scientists who want to enrich their datasets with aggregated information, facilitating reproducibility, and providing algorithmic privacy-by-design measures for pseudo-anonymity. We discuss our design choices, inspired by the flow-based programming paradigm, to use black-box components that can easily be chained together and extended. We also analyze the ethical issues related to the use of this tool, and the built-in measures to facilitate pseudo-anonymity
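
The flow-based design mentioned in this abstract can be pictured with a small sketch in which black-box components are chained and each one enriches a list of records. This does not use or reproduce the Twitter-Demographer API: the `Flow` class, the two example components, and the record fields `text` and `user_id` are hypothetical names chosen for illustration.

```python
import hashlib
from typing import Callable, Dict, List

Record = Dict[str, object]
Component = Callable[[List[Record]], List[Record]]


class Flow:
    """A hypothetical chain of black-box components applied in order."""

    def __init__(self) -> None:
        self._components: List[Component] = []

    def add(self, component: Component) -> "Flow":
        # Return self so that components can be chained fluently.
        self._components.append(component)
        return self

    def run(self, records: List[Record]) -> List[Record]:
        # Each component receives the output of the previous one.
        for component in self._components:
            records = component(records)
        return records


def add_text_length(records: List[Record]) -> List[Record]:
    """Example enrichment step: attach the length of each text."""
    return [{**r, "text_length": len(str(r["text"]))} for r in records]


def pseudonymize_user(records: List[Record]) -> List[Record]:
    """Example privacy step: replace the user id with a SHA-256 digest.

    Plain unsalted hashing is only a weak form of pseudo-anonymity and is
    used here purely to keep the sketch short.
    """
    result = []
    for r in records:
        digest = hashlib.sha256(str(r["user_id"]).encode("utf-8")).hexdigest()
        rest = {k: v for k, v in r.items() if k != "user_id"}
        result.append({**rest, "user_hash": digest})
    return result
```

A flow would then be assembled as `Flow().add(add_text_length).add(pseudonymize_user).run(records)`. Treating every step as a function from records to records is what lets components be reordered or extended without touching the others.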

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

Creating ontology-based metadata by annotation for the semantic web

Author: Handschuh Siegfried
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005

KITopen

Semantic Systems: In the Era of Knowledge Graphs: 16th International Conference on Semantic Systems, SEMANTiCS 2020, Amsterdam, The Netherlands, September 7-10, 2020: proceedings

Author: Alam M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

International Migration, Integration and Social Cohesion online publications

Semantic Systems. In the Era of Knowledge Graphs : 16th International Conference on Semantic Systems, SEMANTiCS 2020, Amsterdam, The Netherlands, September 7–10, 2020, Proceedings

Author: Alam Mehwish
Blomqvist Eva
Boer Victor de
Groth Paul
Kieseberg Peter
Kirrane Sabrina
Käfer Tobias
Meroño-Peñuela Albert
Pandit Harshvardhan J.
Pellegrini Tassilo
Publication venue: Springer International Publishing
Publication date: 24/06/2021

KITopen

Semantic Web methods for knowledge management [online]

Author: Decker Stefan
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2002

KITopen

A widget library for creating policy-aware semantic Web applications

Author: Hollenbach James Dylan
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2010

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 79-81). In order to truly reap the benefits of the Semantic Web, there must be adequate tools for writing Web applications that aggregate, view, and edit the widely varying data the Semantic Web makes available. As a step toward this goal, I introduce a JavaScript widget library for creating Web applications that can both read from and write to the Semantic Web. In addition to providing widgets that perform editing operations, access control rules for user-generated content are supported using FOAF+SSL, a decentralized authentication technique, allowing users to independently manage the restrictions placed on their data. I demonstrate this functionality with two examples: an aggregator application for exploring information about musicians from multiple data stores, and a universal annotation widget that allows users to make public and private comments about any resource on the Semantic Web. By James Dylan Hollenbach. M.Eng.

DSpace@MIT