828 research outputs found
Handling uncertainty in information extraction
This position paper proposes an interactive approach for developing information extractors based on the ontology definition process, with knowledge about the possible (in)correctness of annotations. We discuss the problem of managing and manipulating probabilistic dependencies.
Information Extraction, Data Integration, and Uncertain Data Management: The State of the Art
Information extraction, data integration, and uncertain data management are different areas of research that have received considerable attention in the last two decades. Much research has tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration processes is an important issue for enhancing the quality of the data in such integrated systems. This article presents the state of the art of these areas of research, shows their common grounds, and discusses how to integrate information extraction and data integration under uncertainty management.
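As a toy illustration of that common ground, the sketch below carries extracted facts as probabilistic tuples so that integration can merge duplicates instead of discarding uncertainty. The tuple layout, the example data, and the noisy-OR combination rule are assumptions made for illustration, not the article's own method.

```python
def integrate(sources):
    """Merge (subject, attribute, value) tuples from several extractors,
    combining per-source probabilities with noisy-OR."""
    merged = {}
    for triples in sources:
        for (s, a, v), p in triples.items():
            q = merged.get((s, a, v), 0.0)
            merged[(s, a, v)] = 1 - (1 - q) * (1 - p)  # noisy-OR combination
    return merged

src1 = {("home42", "country", "France"): 0.7}
src2 = {("home42", "country", "France"): 0.6,
        ("home42", "country", "Spain"): 0.2}
print(integrate([src1, src2]))  # France: 1 - 0.3*0.4 = 0.88, Spain: 0.2
```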
Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams
Neogeography is the combination of user-generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user-generated text in wikis, forums, or SMS messages. The extracted information should be integrated to form collective knowledge about a certain domain. This structured information can then be used to help users from the same domain who want to obtain information through a simple question answering system. The project intends to help workers' communities in developing countries share their knowledge, providing a simple and cheap way to contribute and benefit using available communication technology.
Named Entity Extraction and Disambiguation: The Reinforcement Effect
Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the semantic web. Although these topics are highly interdependent, almost no existing work examines this dependency. The aim of this paper is to examine the dependency and show how one affects the other, and vice versa. We conducted experiments on a set of descriptions of holiday homes with the aim of extracting and disambiguating toponyms as a representative example of named entities. We experimented with three approaches to disambiguation with the purpose of inferring the country of the holiday home. We examined how the effectiveness of extraction influences the effectiveness of disambiguation and, reciprocally, how filtering out ambiguous names (an activity that depends on the disambiguation process) improves the effectiveness of extraction. Since this, in turn, may improve the effectiveness of disambiguation again, it shows that extraction and disambiguation may reinforce each other.
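As a rough illustration of this reinforcement loop, the Python sketch below alternates a gazetteer-based extraction step with a country-voting disambiguation step, feeding ambiguous names back into extraction as a blacklist. The gazetteer, the voting rule, and the example sentence are invented for illustration; the paper's actual models and data are not reproduced here.

```python
from collections import Counter

# Toy gazetteer: toponym -> countries it may refer to (assumed data).
GAZETTEER = {
    "Paris": {"France", "United States"},  # Paris, Texas makes this ambiguous
    "Nice": {"France"},
}

def extract(tokens, blacklist):
    """Extraction: match tokens against the gazetteer, skipping names that an
    earlier disambiguation round flagged as too ambiguous."""
    return [t for t in tokens if t in GAZETTEER and t not in blacklist]

def disambiguate(toponyms):
    """Disambiguation: vote for the country most consistent with all extracted
    toponyms, and report the names that remain ambiguous."""
    votes = Counter(c for t in toponyms for c in GAZETTEER[t])
    country = votes.most_common(1)[0][0] if votes else None
    ambiguous = {t for t in toponyms if len(GAZETTEER[t]) > 1}
    return country, ambiguous

tokens = "Cosy holiday home near Paris in sunny Nice".split()
blacklist = set()
for _ in range(2):  # one feedback iteration already shows the effect
    toponyms = extract(tokens, blacklist)
    country, ambiguous = disambiguate(toponyms)
    print(country, toponyms)
    blacklist |= ambiguous  # feed disambiguation clues back into extraction
```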
Unsupervised improvement of named entity extraction in short informal context using disambiguation clues
Short context messages (like tweets and SMS messages) are a potentially rich source of continuously and instantly updated information. The shortness and informality of such messages are challenges for Natural Language Processing tasks. Most efforts in this direction rely on machine learning techniques, which are expensive in terms of data collection and training. In this paper we present an unsupervised Semantic Web-driven approach that improves the extraction process by using clues from the disambiguation process. For extraction we use a simple knowledge-base matching technique combined with a clustering-based approach for disambiguation. Experimental results on a self-collected set of tweets (as an example of short context messages) show improvement in extraction results when using unsupervised feedback from the disambiguation process.
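The sketch below gives a rough, hypothetical picture of the two ingredients: a naive knowledge-base matching extractor, and a coherence check over candidate referents standing in for the clustering-based disambiguation. The KB contents, the tweet, and the topical-label heuristic are illustrative assumptions.

```python
from collections import Counter

KB = {  # surface form -> candidate entities, each with a topical label
    "jobs": [("Steve Jobs", "tech"), ("jobs (employment)", "economy")],
    "apple": [("Apple Inc.", "tech"), ("apple (fruit)", "food")],
    "iphone": [("iPhone", "tech")],
}

def kb_match(tweet):
    """Extraction: every token with a KB entry becomes a candidate mention."""
    return [t for t in tweet.lower().split() if t in KB]

def disambiguate(mentions):
    """Keep the topical label shared by most candidates, playing the role of
    the clustering step, and resolve each mention under that label."""
    labels = Counter(lbl for m in mentions for _, lbl in KB[m])
    if not labels:
        return {}
    top = labels.most_common(1)[0][0]
    return {m: e for m in mentions for e, lbl in KB[m] if lbl == top}

mentions = kb_match("Jobs unveils new Apple iPhone")
resolved = disambiguate(mentions)
# Feedback: mentions that resolve coherently are kept as extractions.
print(resolved)  # {'jobs': 'Steve Jobs', 'apple': 'Apple Inc.', 'iphone': 'iPhone'}
```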
Concept Extraction Challenge: University of Twente at #MSM2013
Twitter messages are a potentially rich source of continuously and instantly updated information. The shortness and informality of such messages are challenges for Natural Language Processing tasks. In this paper we present a hybrid approach to Named Entity Extraction (NEE) and Classification (NEC) for tweets. The system combines the power of Conditional Random Fields (CRF) and Support Vector Machines (SVM) in a hybrid way to achieve better results. For named entity type classification we used the AIDA disambiguation system [YosefHBSW11] to disambiguate the extracted named entities and hence find their types.
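A minimal sketch of the hybrid idea, using the sklearn-crfsuite and scikit-learn libraries: a CRF proposes entity tokens and an SVM classifies tokens independently, with the union taken as the final extraction. The toy corpus and features are assumptions, and the AIDA-based type classification is not reproduced here.

```python
import sklearn_crfsuite
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def tok_feats(sent, i):
    w = sent[i]
    return {"lower": w.lower(), "is_title": w.istitle(), "is_upper": w.isupper()}

# Tiny toy corpus of tokenised tweets with BIO labels (assumed data).
train = [
    (["Visiting", "Paris", "today"], ["O", "B", "O"]),
    (["I", "love", "New", "York"], ["O", "O", "B", "I"]),
]
X = [[tok_feats(s, i) for i in range(len(s))] for s, _ in train]
y = [labels for _, labels in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)

# SVM side: classify single tokens as entity / non-entity from the same features.
svm = make_pipeline(DictVectorizer(), SVC())
svm.fit([f for s in X for f in s], [l != "O" for ls in y for l in ls])

sent = ["Back", "in", "Paris"]
feats = [tok_feats(sent, i) for i in range(len(sent))]
crf_hits = {i for i, l in enumerate(crf.predict([feats])[0]) if l != "O"}
svm_hits = {i for i, f in enumerate(feats) if svm.predict([f])[0]}
print(sorted(crf_hits | svm_hits))  # hybrid: union of both extractors
```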
Information extraction for social media
The rapid growth of IT in the last two decades has led to a growth in the amount of information available online. Social media is a new style of sharing information: a continuously and instantly updated source. In this position paper, we propose a framework for Information Extraction (IE) from unstructured user-generated content on social media. The framework proposes solutions to overcome the IE challenges in this domain, such as short context, noisy sparse content, and uncertain content. To overcome these challenges, state-of-the-art approaches need to be adapted to suit the nature of social media posts. The key components of our proposed framework are noisy text filtering, named entity extraction, named entity disambiguation, feedback loops, and uncertainty handling.
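The structural sketch below shows one way those components could be wired together. Every function body is a placeholder invented for illustration, and the per-mention confidence scores stand in for the uncertainty handling.

```python
def filter_noise(post):
    """Noisy text filtering: drop obvious non-content (here: just URLs)."""
    return " ".join(t for t in post.split() if not t.startswith("http"))

def extract(text, blacklist=frozenset()):
    """NEE stub: capitalised tokens become mentions with a naive confidence."""
    return {t: 0.6 for t in text.split() if t.istitle() and t not in blacklist}

def disambiguate(mentions):
    """NED stub: pretend to link mentions; return those that failed to link."""
    return {m for m in mentions if len(m) < 4}  # placeholder criterion

def run(post, iterations=2):
    text = filter_noise(post)
    blacklist = set()
    for _ in range(iterations):  # feedback loop
        mentions = extract(text, blacklist)
        unlinked = disambiguate(mentions)
        blacklist |= unlinked    # feed NED results back into NEE
    return {m: p for m, p in mentions.items() if m not in unlinked}

print(run("Fab weekend in Amsterdam http://t.co/x with Bob"))
```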
TAUOLA the library for tau lepton decay, and KKMC/KORALB/KORALZ/... status report
The status of the Monte Carlo programs for the simulation of tau lepton production and decay in high energy accelerator experiments is reviewed. In particular, the status of the following packages is discussed: (i) TAUOLA for tau-lepton decay, (ii) PHOTOS for radiative corrections in decays, (iii) the KORALB, KORALZ, and KKMC packages for tau-pair production in e+e- collisions, and (iv) the universal interface of TAUOLA for the decay of tau-leptons produced by "any" generator. Special emphasis is given to requirements from new and future experiments. Some considerations about the software organization necessary to keep simultaneously distinct physics initializations for TAUOLA are also included. Comment: LaTeX, 7 pages, including 1 table and 5 figure files, all 6 in PostScript format. Presented at the Sixth International Workshop on Tau Lepton Physics, Victoria, Canada, September 2000.
Named Entity Extraction and Linking Challenge: University of Twente at #Microposts2014
Twitter is a potentially rich source of continuously and instantly updated information. The shortness and informality of tweets are challenges for Natural Language Processing (NLP) tasks. In this paper, we present a hybrid approach to Named Entity Extraction (NEE) and Linking (NEL) for tweets. Although NEE and NEL are two well-studied topics in the literature, almost all approaches treat the two problems separately. We believe that disambiguation (linking) can help improve the extraction process. We call this potential for mutual improvement the reinforcement effect. It mimics the way humans understand natural language. Furthermore, our proposed approach handles the uncertainties involved in the two processes by considering possible alternatives.
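One way to read "considering possible alternatives" is sketched below: rather than committing to a single extraction, scored alternatives are kept and linking evidence re-ranks them. The candidate list, the linkability scores, and the re-ranking rule are illustrative assumptions, not the paper's method.

```python
CANDIDATES = [                 # alternative readings of one tweet span
    ("New York", 0.5),         # (surface form, extraction probability)
    ("York", 0.3),
    ("New", 0.2),
]
KB = {"New York": 0.9, "York": 0.6}  # assumed linkability score per entity

def rerank(cands):
    """Combine extraction probability with a linking score; unlinkable
    alternatives keep only a small fraction of their extraction mass."""
    scored = [(s, p * KB.get(s, 0.05)) for s, p in cands]
    z = sum(p for _, p in scored)
    return sorted(((s, p / z) for s, p in scored), key=lambda x: -x[1])

print(rerank(CANDIDATES))  # linking evidence promotes "New York"
```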
Toponym extraction and disambiguation enhancement using loops of feedback
Toponym extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the semantic web. This paper addresses two problems with toponym extraction and disambiguation. First, almost no existing work examines the interdependency of extraction and disambiguation. Second, existing disambiguation techniques mostly take extracted named entities as input without considering the uncertainty and imperfection of the extraction process. In this paper we investigate both avenues and show that explicit handling of the uncertainty of annotation has much potential for making both extraction and disambiguation more robust. We conducted experiments on a set of holiday home descriptions with the aim of extracting and disambiguating toponyms. We show that the extraction confidence probabilities are useful for enhancing the effectiveness of disambiguation. Reciprocally, retraining the extraction models with information automatically derived from the disambiguation results improves the extraction models. This mutual reinforcement is shown to have an effect even after several automatic iterations.
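The toy sketch below illustrates the two feedback directions under stated assumptions: disambiguation votes weighted by extraction confidence, and disambiguation-consistent mentions collected as retraining examples. The gazetteer and scores are invented, and the paper's actual extraction models are not reproduced.

```python
from collections import defaultdict

GAZ = {"Victoria": {"Canada", "Australia"}, "Ottawa": {"Canada"}}

def disambiguate(mentions):
    """Weight each candidate country by the mention's extraction confidence."""
    votes = defaultdict(float)
    for name, conf in mentions:
        for country in GAZ.get(name, ()):
            votes[country] += conf
    return max(votes, key=votes.get) if votes else None

mentions = [("Victoria", 0.4), ("Ottawa", 0.9)]  # (toponym, extraction confidence)
country = disambiguate(mentions)                 # -> "Canada"

# Feedback: mentions compatible with the inferred country become new positive
# examples for retraining the extraction model (retraining itself stubbed out).
retrain_set = [n for n, _ in mentions if country in GAZ.get(n, ())]
print(country, retrain_set)
```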
- …
