943 research outputs found

    Importance and significance of information sharing in terrorism field

    The years following 11 September 2001 have led researchers to restructure intelligence and counter-terrorism through information technology in order to overcome problems and issues related to terrorism. This work provides an updated review of Information and Communication Technology (ICT) research related to the restructuring of intelligence and counter-terrorism. To this end, the objective of this work is to survey the conceptual views of the researchers who have developed tools for electronic information sharing in intelligence and counter-terrorism, and to summarise their contributions to this emerging field. The work discusses the different visions and views of information sharing, critical infrastructure, tools and key resources put forward by these researchers. It also describes experiences in countries considered international references on the subject, including some information-sharing issues. In addition, the work reviews current tools, software applications and modelling techniques for anti-terrorism according to their information-sharing functionality. It emphasises identifying counter-terrorism work with direct relevance to transportation research, and advocates security-informatics studies that are closely integrated with transportation research and information technologies, in line with the recommendations of the 2004 9/11 Commission report. The importance of this study is that it gives a unified view of existing approaches to electronic information sharing, in order to help develop tools for intelligence and counter-terrorism and to support future coordination and collaboration in national security applications.

    Pembentukan Tesaurus pada Cross-Lingual Text dengan Pendekatan Constraint Satisfaction Problem

    Final-project and thesis documents are often provided in two languages, Indonesian and English. When searching, each student tends to look for documents using keywords in one particular language. The aim of this research is to build an Indonesian-English cross-lingual thesaurus using a Constraint Satisfaction Problem (CSP) approach. The study uses final-project and thesis documents of students at Institut Teknologi Sepuluh Nopember. Document processing consists of several steps: parallel-corpus construction, word extraction, term weighting, and extraction of co-occurrence information, after which the Constraint Satisfaction Problem is solved with backtracking search. Weighting uses TF-IDF (term frequency-inverse document frequency). The thesaurus built with CSP achieves a precision of 91.38%, whereas the thesaurus built without CSP achieves a precision of 45.23%. Document search using the thesaurus yields 86.67% recall, 100% precision and 86.67% accuracy.
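    The TF-IDF weighting step used in building the thesaurus can be sketched in a few lines. This is a minimal illustration of the standard scheme, not the authors' implementation, and the toy documents are invented:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for each term in each document.

    docs: list of token lists. Returns one dict (term -> weight) per document.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            t: (tf[t] / len(doc)) * math.log(n / df[t])
            for t in tf
        })
    return weights

docs = [
    ["thesaurus", "cross", "lingual"],
    ["thesaurus", "search", "document"],
    ["document", "search", "recall"],
]
w = tf_idf(docs)
# "cross" occurs in only one document, so in document 0 it outweighs
# "thesaurus", which occurs in two documents.
print(w[0]["cross"] > w[0]["thesaurus"])
```

Terms that are concentrated in few documents receive high weights, which is what makes TF-IDF useful for picking candidate thesaurus terms.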

    Category tree integration by exploiting hierarchical structure.

    Lin, Jianfeng. Thesis (M.Phil.), Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 79-83). Abstracts in English and Chinese. Contents: Chapter 1, Introduction; Chapter 2, Related Work (Ontology Integration; Schema Matching; Taxonomy Integration as Text Categorization; Cross-lingual Text Categorization and Cross-lingual Information Retrieval); Chapter 3, Problem Definition (Mono-lingual Category Tree Integration; Integration Operators; Cross-lingual Category Tree Integration); Chapter 4, Mono-lingual Category Tree Integration Techniques (Category Relationships; Decision Rules; Mapping Algorithm); Chapter 5, Experiment of Mono-lingual Category Tree Integration (Dataset; Automated Text Classifier; Evaluation Metrics, covering integration accuracy, precision/recall/F1 of the three operators, and precision and recall of "Split"; Parameter Tuning; Experiment Results); Chapter 6, Cross-lingual Category Tree Integration (Parallel Corpus; Cross-lingual Concept Space Construction, via phrase extraction, co-occurrence analysis and an associate constraint network for concept generation; Document Translation; Experiment Setting; Experiment Results); Chapter 7, Conclusion and Future Work.

    A history and theory of textual event detection and recognition


    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    The Data Web has undergone tremendous growth. It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as the life sciences, government or geography, with over 89 billion facts. Likewise, the Document Web has grown to the point where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook and 3.5 billion Google searches are performed on an average day. However, there is a gap between the Document Web and the Data Web: knowledge bases on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, forum discussions, etc. As a result, data on the Data Web not only misses a significant fragment of information but also suffers from a lack of currency, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and is therefore lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs; with machine-readable knowledge bases available, lay users could be empowered to issue more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges in closing the gap between the Document Web and the Data Web, in four ways. First, we present a distant-supervision approach for finding multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of these relations between two entities. Second, we address data currency by presenting a real-time RDF extraction framework for data streams and use it to extract RDF from RSS news feeds. Third, we present a novel fact-validation algorithm, based on natural language representations, that can not only verify or falsify a given triple, but also find trustworthy sources for it on the Web and estimate the time scope in which the triple holds true. The features this algorithm uses to decide whether a website is trustworthy serve as provenance information and thereby help create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to exploit the large amounts of data available on the Data Web to satisfy their information needs.
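    The distant-supervision idea in the first contribution (using known subject-object pairs of a relation to harvest natural language patterns from sentences) can be sketched as follows. The helper name, the toy sentences and the capital-of relation are illustrative, not taken from the thesis:

```python
import re

def extract_patterns(sentences, pairs):
    """Collect the text between known (subject, object) entity pairs as
    candidate natural language patterns for the relation."""
    patterns = []
    for s in sentences:
        for subj, obj in pairs:
            # Non-greedy match of whatever lies between the two entities.
            m = re.search(re.escape(subj) + r"(.+?)" + re.escape(obj), s)
            if m:
                patterns.append(m.group(1).strip())
    return patterns

sents = ["Berlin is the capital of Germany.",
         "Paris is the capital of France."]
pairs = [("Berlin", "Germany"), ("Paris", "France")]
print(extract_patterns(sents, pairs))
# → ['is the capital of', 'is the capital of']
```

Recurring patterns can then be applied to new sentences to propose unseen instances of the relation; the thesis works at a far larger scale and multilingually.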

    Pembentukan Tesaurus pada Cross-lingual Text dengan Pendekatan Constraint Satisfaction Problem

    Document search is essential in text mining. Final-project and thesis documents are often provided in two languages, Indonesian and English. When searching, each student tends to look for documents using keywords in one particular language. The purpose of this final project is to build an Indonesian-English cross-lingual thesaurus using a Constraint Satisfaction Problem (CSP) approach. The study uses final-project and thesis documents from the FTIF, Intelligent Multimedia Networks, Statistics and Multimedia Network Engineering departments at Institut Teknologi Sepuluh Nopember. Document processing consists of several steps: document alignment, word extraction, term weighting, and extraction of co-occurrence information, after which the Constraint Satisfaction Problem is solved with backtracking search, an improvement over the brute-force method. Weighting uses TF-IDF (term frequency-inverse document frequency). In building the thesaurus, document alignment takes the longest time, 10,745 seconds, while the fastest step is the relevance-weight calculation at 10 seconds. The thesaurus built with CSP achieves a precision of 91.38%, whereas the thesaurus built without CSP achieves a precision of 45.23%. Document search using the thesaurus yields 86.67% recall, 100% precision and 86.67% accuracy.
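    The backtracking search that replaces brute-force enumeration follows the generic CSP pattern: extend a partial assignment one variable at a time and back up as soon as a constraint is violated. This is a minimal generic sketch with an invented toy instance, not the thesis implementation:

```python
def backtrack(assignment, variables, domains, consistent):
    """Generic CSP solver: assign variables in order, backtracking
    whenever the partial assignment violates a constraint."""
    if len(assignment) == len(variables):
        return assignment
    var = variables[len(assignment)]
    for value in domains[var]:
        trial = {**assignment, var: value}
        if consistent(trial):
            result = backtrack(trial, variables, domains, consistent)
            if result is not None:
                return result
    return None  # no value works: caller backtracks

# Toy instance: pick one translation per term such that no two terms
# share the same translation (an all-different constraint).
variables = ["pencarian", "dokumen"]
domains = {"pencarian": ["search", "document"],
           "dokumen": ["document"]}
ok = lambda a: len(set(a.values())) == len(a)
print(backtrack({}, variables, domains, ok))
# → {'pencarian': 'search', 'dokumen': 'document'}
```

Unlike brute force, which enumerates every complete assignment, the consistency check prunes dead branches as soon as they appear.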

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data; phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system, not by adding more parallel data but by using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string-similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge-expansion framework is generic and could be used to add different types of information to the model. JRC.G.2 - Global Security and Crisis Management
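    The generation of new phrase associations via string similarity can be sketched roughly as follows. `SequenceMatcher`, the 0.8 threshold and the toy phrase table are illustrative assumptions; the paper's actual score additionally uses morphosyntactic information:

```python
from difflib import SequenceMatcher

def expand_phrase_table(phrase_table, variants):
    """Add entries for morphological variants that are string-similar
    to an existing source phrase, reusing that phrase's translation."""
    new_entries = dict(phrase_table)
    for variant in variants:
        for src, tgt in phrase_table.items():
            if variant not in new_entries and \
               SequenceMatcher(None, variant, src).ratio() > 0.8:
                # Variant inherits the translation of its base phrase.
                new_entries[variant] = tgt
    return new_entries

table = {"translated": "traduit"}
expanded = expand_phrase_table(table, ["translates", "apple"])
print(expanded)
```

Here "translates" is close enough to "translated" to inherit its translation, while "apple" is rejected, which is the OOV-reduction effect the paper measures.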

    Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

    We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models on key applications such as zero-shot classification, image retrieval and image captioning. We categorize desired behaviors along three axes: (i) whether the task concerns humans; (ii) how subjective the task is (i.e., how likely it is that people from a diverse range of backgrounds would agree on a labeling); and (iii) the intended purpose of the task, and whether fairness is better served by impartiality (i.e., making decisions independent of the protected attributes) or representation (i.e., making decisions that maximize diversity). Finally, we provide quantitative fairness evaluations for both binary-valued and multi-valued protected attributes over ten diverse datasets. We find that fair PCA, a post-processing method for fair representations, works very well for debiasing in most of the aforementioned tasks while incurring only a minor loss of performance. However, debiasing approaches vary in effectiveness depending on the task, so the debiasing approach should be chosen to fit the specific use case. Comment: Accepted at AIES'2
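    As a rough illustration of post-processing in the spirit of the fair-PCA method evaluated here, one can project embeddings onto the complement of a protected-attribute direction. The mean-difference direction, the synthetic data and the function name are all assumptions for this sketch; the paper's method and data differ:

```python
import numpy as np

def debias(embeddings, protected):
    """Remove the protected-attribute direction from the embeddings by
    projecting onto its orthogonal complement."""
    # Direction separating the two groups: difference of group means.
    v = embeddings[protected == 1].mean(0) - embeddings[protected == 0].mean(0)
    v = v / np.linalg.norm(v)
    # Subtract each embedding's component along v.
    return embeddings - np.outer(embeddings @ v, v)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
a = (rng.random(100) > 0.5).astype(int)   # synthetic protected attribute
X[a == 1] += 2.0                          # inject a group-dependent shift
Xd = debias(X, a)
# After projection the group means coincide, since the mean shift was
# exactly along the removed direction.
gap = np.linalg.norm(Xd[a == 1].mean(0) - Xd[a == 0].mean(0))
print(gap < 1e-8)
```

Downstream labeling then cannot rely on that direction, at the cost of whatever task signal it carried; this trade-off is what the paper quantifies per task.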