
    Generating adaptive hypertext content from the semantic web

    Accessing and extracting knowledge from online documents is crucial for the realisation of the Semantic Web and the provision of advanced knowledge services. The Artequakt project is an ongoing investigation tackling these issues to facilitate the creation of tailored biographies from information harvested from the web. In this paper we present the methods we currently use to model, consolidate and store knowledge extracted from the web so that it can be re-purposed as adaptive content. We look at how Semantic Web technology could be used within this process, and also how such techniques might be used to provide content to be published via the Semantic Web.
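
    As a rough illustration of what "modelling, consolidating and storing" harvested knowledge with Semantic Web technology can look like, the sketch below records a few biography facts as RDF triples using the rdflib Python library. The ontology terms, URIs and the example artist are illustrative assumptions, not Artequakt's actual schema or knowledge store.

```python
# Minimal sketch of storing facts harvested from the Web as RDF triples so they
# can later be queried and re-purposed as adaptive content. rdflib stands in
# for the project's actual knowledge store; the properties and URIs below are
# illustrative assumptions, not Artequakt's schema.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/artist/")

g = Graph()
g.bind("foaf", FOAF)

artist = URIRef(EX["RembrandtVanRijn"])
g.add((artist, RDF.type, FOAF.Person))
g.add((artist, FOAF.name, Literal("Rembrandt van Rijn")))
g.add((artist, EX.birthPlace, Literal("Leiden")))  # illustrative property

# Consolidated triples can then be serialised or queried (e.g. with SPARQL)
# to assemble a tailored biography.
print(g.serialize(format="turtle"))
```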

    Distantly Supervised Web Relation Extraction for Knowledge Base Population

    Extracting information from Web pages for populating large, cross-domain knowledge bases requires methods which are suitable across domains, do not require manual effort to adapt to new domains, are able to deal with noise, and integrate information extracted from different Web pages. Recent approaches have used existing knowledge bases to learn to extract information with promising results, one of those approaches being distant supervision. Distant supervision is an unsupervised method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. In this paper we propose the use of distant supervision for relation extraction from the Web. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains and extracting relations across sentence boundaries using unsupervised co-reference resolution methods. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. To combine information extracted from multiple sources for populating knowledge bases we present and evaluate several information integration strategies and show that those benefit immensely from additional relation mentions extracted using co-reference resolution, increasing precision by 8%. We further show that strategically selecting training data can increase precision by a further 3%.
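
    To make the distant supervision idea concrete, the sketch below labels a sentence with a relation whenever both entities of a knowledge-base triple occur in it, producing (noisy) training data for a relation classifier. The triples, sentences and plain string matching are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch of distant-supervision labelling: any sentence that mentions
# both entities of a knowledge-base triple is treated as a (possibly noisy)
# training example for that triple's relation. Entity matching here is plain
# string containment; the paper's approach is far more robust.

KB_TRIPLES = [
    # (subject, relation, object) -- illustrative facts, not from the paper
    ("Ada Lovelace", "birthPlace", "London"),
    ("Ada Lovelace", "field", "mathematics"),
]

def label_sentences(sentences, kb_triples):
    """Return (sentence, relation) pairs for sentences mentioning both entities."""
    training_data = []
    for sentence in sentences:
        for subj, relation, obj in kb_triples:
            if subj in sentence and obj in sentence:
                training_data.append((sentence, relation))
    return training_data

sentences = [
    "Ada Lovelace was born in London in 1815.",
    "Ada Lovelace is celebrated for her work in mathematics.",
    "London is the capital of the United Kingdom.",
]

for sentence, relation in label_sentences(sentences, KB_TRIPLES):
    print(f"{relation}: {sentence}")
```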

    Mining Web usage using FRS

    Web Usage Mining (WUM) is the application of data mining methods to extract potentially useful information from web usage data. Its applications include improving website design, personalised services, target marketing, etc. Outstanding research issues in WUM include inefficiency in mining large weblogs, extracted patterns that are not representative of actual user behavior, and mining results that are too general, uninteresting and lacking in insight. This paper attempts to address these problems using a mining method, based on the notion of regularity, that captures user traversal activities more effectively. A mining algorithm using a vertical database approach is introduced. The experiments suggest that the method is efficient, scalable, and able to address the confusion caused by a large number of extracted patterns.
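
    The vertical database idea mentioned above can be illustrated as follows: each page is mapped to the set of sessions in which it occurs, and the support of a set of pages is computed by intersecting those session-ID sets. This is only a sketch of the representation, not the paper's regularity-based (FRS) algorithm.

```python
# Minimal sketch of a vertical layout for web usage mining: each page is mapped
# to the set of session IDs containing it, and the support of a page set is the
# size of the intersection of those session-ID sets. Illustration only; this is
# not the paper's FRS method.

from collections import defaultdict
from functools import reduce

# Illustrative sessions: session_id -> sequence of visited pages
sessions = {
    "s1": ["/home", "/products", "/cart"],
    "s2": ["/home", "/blog"],
    "s3": ["/home", "/products", "/checkout"],
}

# Build the vertical layout: page -> set of session IDs containing it
vertical = defaultdict(set)
for sid, pages in sessions.items():
    for page in pages:
        vertical[page].add(sid)

def support(pageset):
    """Number of sessions containing every page in pageset."""
    tid_sets = [vertical[p] for p in pageset]
    return len(reduce(set.intersection, tid_sets)) if tid_sets else 0

print(support({"/home", "/products"}))  # 2 (sessions s1 and s3)
```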

    Generating Paired Transliterated-cognates Using Multiple Pronunciation Characteristics from Web corpora

    A novel approach to automatically extracting paired transliterated-cognates from Web corpora is proposed in this paper. One of the most important issues addressed is that of taking multiple pronunciation characteristics into account. Terms from different languages may be pronounced very differently, and incorporating knowledge of word origin may improve the pronunciation accuracy of terms. The accuracy of the generated phonetic information has an important impact on term transliteration and hence on transliterated-term extraction. Transliterated-term extraction, a fundamental task in natural language processing, extracts paired transliterated terms for the study of term transliteration. An experiment on transliterated-term extraction from two kinds of Web resources, Web pages and anchor texts, has been conducted and evaluated. The experimental results show that many transliterated-term pairs which cannot be extracted by an approach exploiting only English pronunciation characteristics are successfully extracted by the proposed approach. Taking multiple language-specific pronunciation transformations into account may further improve the output of transliterated-term extraction.
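
    The sketch below illustrates the basic pairing step: candidate terms are compared with a source term by similarity of rough phoneme sequences, and the closest candidate above a threshold is accepted as its transliteration. The placeholder "pronunciation", the similarity measure and the threshold are illustrative assumptions; the paper's multi-characteristic pronunciation modelling is considerably more elaborate.

```python
# Minimal sketch of pairing a source term with candidate transliterations by
# phonetic similarity. The phoneme mapping and scoring below are purely
# illustrative; the paper models multiple language-specific pronunciation
# characteristics far more carefully.

from difflib import SequenceMatcher

def rough_phonemes(term):
    """Crude placeholder 'pronunciation': lowercase letters only."""
    return [c for c in term.lower() if c.isalpha()]

def phonetic_similarity(a, b):
    """Similarity in [0, 1] between two rough phoneme sequences."""
    return SequenceMatcher(None, rough_phonemes(a), rough_phonemes(b)).ratio()

def best_transliteration(source_term, candidates, threshold=0.5):
    """Pick the candidate most phonetically similar to the source term."""
    scored = [(phonetic_similarity(source_term, c), c) for c in candidates]
    score, best = max(scored)
    return (source_term, best) if score >= threshold else None

# Candidates are romanised here for readability; real candidates would come
# from Web pages and anchor texts in the target script.
print(best_transliteration("clinton", ["ke-lin-dun", "ba-li", "lun-dun"]))
```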

    A Semantic-Based Framework for Summarization and Page Segmentation in Web Mining

    This chapter addresses two crucial issues that arise when one applies Web-mining techniques to extract relevant information. The first is the acquisition of useful knowledge from textual data; the second stems from the fact that a web page often contains a considerable amount of 'noise' with respect to the sections that are truly informative for the user's purposes. The novel contribution of this work is a framework that tackles both tasks at the same time, supporting text summarization and page segmentation. The approach achieves this goal by exploiting semantic networks to map natural language into an abstract representation, which eventually supports the identification of the topics addressed in a text source. A heuristic algorithm uses the abstract representation to highlight the relevant segments of text in the original document. The approach was verified on a publicly available benchmark, the DUC 2002 dataset, and satisfactory results confirmed the method's effectiveness.
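
    As a toy illustration of the general idea of separating informative segments from noise by comparing them against a document-level topic representation, the sketch below uses plain word overlap in place of the chapter's semantic-network representation. The heuristic, stop-word list and example segments are illustrative assumptions.

```python
# Minimal sketch: score page segments against a crude 'topic' representation of
# the whole document and keep the relevant ones. Word overlap stands in for the
# semantic-network mapping used in the chapter; illustration only.

import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def topic_terms(document, k=5):
    """Most frequent content words act as a crude 'topic' representation."""
    stop = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}
    counts = Counter(t for t in tokens(document) if t not in stop)
    return {term for term, _ in counts.most_common(k)}

def relevant_segments(segments, topics, min_overlap=1):
    """Keep segments sharing at least min_overlap terms with the topics."""
    return [s for s in segments if len(set(tokens(s)) & topics) >= min_overlap]

page_segments = [
    "Semantic networks map natural language into an abstract representation.",
    "Click here to subscribe to our newsletter!",
    "The representation supports identifying the topics of a text source.",
]

topics = topic_terms(" ".join(page_segments))
print(relevant_segments(page_segments, topics))
```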

    Hacking an Ambiguity Detection Tool to Extract Variation Points: an Experience Report

    Natural language (NL) requirements documents can be a precious source of variability information. This information can later be used to define feature models from which different systems can be instantiated. In this paper, we are interested in validating the approach we have recently proposed to extract variability issues from the ambiguity defects found in NL requirements documents. To this end, we single out ambiguities using an available NL analysis tool, QuARS, and we classify the ambiguities returned by the tool by distinguishing among false positives, real ambiguities, and variation points. We consider three medium-sized requirements documents from different domains, namely train control, social web, and home automation. We report the results of the assessment in this paper. Although the validation set is not very large, the results obtained are quite uniform and permit us to draw some interesting conclusions. Starting from these results, we can foresee the tailoring of an NL analysis tool for extracting variability from NL requirements documents.
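
    The sketch below illustrates the triage step described above: sentences flagged by an ambiguity detection tool are sorted into false positives, real ambiguities and variation-point candidates. The indicator word lists and the finding format are illustrative assumptions and do not reflect QuARS's actual output or the paper's classification criteria.

```python
# Minimal sketch of triaging flagged requirement sentences into false
# positives, genuine ambiguities, and variation-point candidates. The indicator
# lists and finding format are assumptions, not QuARS's output format.

VARIABILITY_INDICATORS = {"optionally", "either", "or", "alternatively", "may"}
VAGUENESS_INDICATORS = {"appropriate", "adequate", "user-friendly", "fast"}

def triage(finding):
    """Classify one flagged requirement sentence."""
    words = {w.strip(".,;:") for w in finding["sentence"].lower().split()}
    if words & VARIABILITY_INDICATORS:
        return "variation point"
    if words & VAGUENESS_INDICATORS:
        return "real ambiguity"
    return "false positive"

findings = [
    {"sentence": "The system shall optionally log every door command."},
    {"sentence": "The interface shall be user-friendly."},
    {"sentence": "The train shall stop within 200 m."},
]

for f in findings:
    print(triage(f), "-", f["sentence"])
```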

    Sentiment analysis on UTHM issues with big data

    Nowadays, social media platforms such as Twitter, WhatsApp, Facebook and its Messenger, as well as Instagram, play a very important role in society. Twitter is a micro-blogging platform that provides a remarkable amount of data which can be used in a number of sentiment analysis applications such as predictions, reviews, and elections. Sentiment analysis is the process of extracting information about issues or specific topics from an enormous amount of data and categorizing it into different classes. The main target of this project is to classify Twitter data collected on Universiti Tun Hussein Onn Malaysia (UTHM) issues into sentiment values: positive, neutral or negative. The sentiment was classified using a sentiment classifier, with the data trained on a Naïve Bayes classifier from the TextBlob Python library. Lastly, results were displayed to the user through a web application using Jupyter Notebook. This study found that the percentages of positive, neutral and negative tweets regarding UTHM issues were 74%, 26% and 0% for English tweets, and 17%, 82% and 1% for Bahasa Melayu tweets, respectively. The positive and neutral sentiment results show a positive perception of the university's products and services, thus promoting and branding UTHM worldwide.
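
    The abstract names TextBlob's Naïve Bayes classifier, so the sketch below shows one way such tweet-level classification could look. The rule for deriving the 'neutral' class and the handling of Bahasa Melayu tweets are not described, so the probability margin and the example tweets are assumptions; TextBlob's NaiveBayesAnalyzer itself only distinguishes 'pos' from 'neg'.

```python
# Minimal sketch of classifying tweets with TextBlob's Naive Bayes analyzer, as
# mentioned in the abstract. The neutral margin and example tweets are assumed.
# Requires TextBlob's corpora (python -m textblob.download_corpora).

from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

analyzer = NaiveBayesAnalyzer()

def classify(tweet, neutral_margin=0.1):
    """Return 'positive', 'negative' or 'neutral' for one tweet."""
    s = TextBlob(tweet, analyzer=analyzer).sentiment
    if abs(s.p_pos - s.p_neg) < neutral_margin:  # assumed neutral rule
        return "neutral"
    return "positive" if s.classification == "pos" else "negative"

tweets = [
    "Great facilities and helpful lecturers at UTHM!",
    "The registration system was down again today.",
]
for t in tweets:
    print(classify(t), "-", t)
```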

    Understanding Information and Knowledge Sharing in Online Communities: Emerging Research Approaches

    Social media have become an important component of contemporary information ecosystems. People use social media systems such as Twitter, Facebook, YouTube, and Tumblr to communicate ideas and information needs, seek advice and solve problems, and show appreciation for, or disagreement with, a person or issue. These tools facilitate the emergence of communities, often resembling the communities of practice that arise in workplaces and educational institutions, where a common interest, identity, and set of norms and structures for communicating develop through interaction. But while it seems easy to pull in data streams from social media to understand online communities, making sense of the resulting vast data sets has been challenging. The issues include not just the tools and methods for extracting and synthesizing large data sets like the Twitter Firehose, but also the ethical and responsible use and reporting of this data for academic and commercial purposes. This panel will focus on methodological approaches and research strategies for the study of social media communities, in particular web 2.0 tools that play an important role in the North American cultural landscape.