943 research outputs found

    Importance and significance of information sharing in terrorism field

    The years following 11 September 2001 have led researchers to restructure intelligence and counter-terrorism through information technology in order to overcome problems and issues related to terrorism. This work provides an updated review of Information and Communication Technology (ICT) research related to the restructuring of intelligence and counter-terrorism. To this end, the objective of this work is to survey the conceptual views of the researchers who have developed tools for electronic information sharing in intelligence and counter-terrorism, and to summarise their contributions to this emerging field. The work discusses the different visions and views of information sharing, critical infrastructure, tools and key resources put forward by these researchers. It also describes experiences in countries considered international references on the subject, including some information-sharing issues. In addition, the work reviews current tools, software applications and modelling techniques for anti-terrorism according to their information-sharing functionality. It emphasises identifying counter-terrorism work with direct relevance to transportation research, and advocates security-informatics studies that are closely integrated with transportation research and information technologies, in line with the recommendations of the 2004 9/11 Commission report. The importance of this study is that it gives a unified view of existing approaches to electronic information sharing, in order to help develop tools for intelligence and counter-terrorism and to support future coordination and collaboration in national security applications.

    Pembentukan Tesaurus pada Cross-Lingual Text dengan Pendekatan Constraint Satisfaction Problem

    Final-project and thesis documents are often provided in two languages, Indonesian and English. When searching, each student tends to look for documents using keywords in one particular language. The aim of this research is to build an Indonesian-English cross-lingual thesaurus using a Constraint Satisfaction Problem (CSP) approach. The study uses final-project and thesis documents of students at Institut Teknologi Sepuluh Nopember. Document processing consists of several steps: parallel-corpus construction, word extraction, term weighting, and extraction of co-occurrence information, after which the Constraint Satisfaction Problem is solved with backtracking search. Weighting uses TF-IDF (term frequency-inverse document frequency). The thesaurus built with CSP achieves a precision of 91.38%, whereas the thesaurus built without CSP achieves a precision of 45.23%. Document search using the thesaurus yields 86.67% recall, 100% precision and 86.67% accuracy.
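    The TF-IDF weighting step used in building the thesaurus can be sketched in a few lines. This is a minimal illustration of the standard scheme, not the authors' implementation, and the toy documents are invented:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for each term in each document.

    docs: list of token lists. Returns one dict (term -> weight) per document.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            t: (tf[t] / len(doc)) * math.log(n / df[t])
            for t in tf
        })
    return weights

docs = [
    ["thesaurus", "cross", "lingual"],
    ["thesaurus", "search", "document"],
    ["document", "search", "recall"],
]
w = tf_idf(docs)
# "cross" occurs in only one document, so in document 0 it outweighs
# "thesaurus", which occurs in two documents.
print(w[0]["cross"] > w[0]["thesaurus"])
```

Terms that are concentrated in few documents receive high weights, which is what makes TF-IDF useful for picking candidate thesaurus terms.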

    Category tree integration by exploiting hierarchical structure.

    Lin, Jianfeng. Thesis (M.Phil.), Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 79-83). Abstracts in English and Chinese. Contents: Chapter 1, Introduction; Chapter 2, Related Work (Ontology Integration; Schema Matching; Taxonomy Integration as Text Categorization; Cross-lingual Text Categorization and Cross-lingual Information Retrieval); Chapter 3, Problem Definition (Mono-lingual Category Tree Integration; Integration Operators; Cross-lingual Category Tree Integration); Chapter 4, Mono-lingual Category Tree Integration Techniques (Category Relationships; Decision Rules; Mapping Algorithm); Chapter 5, Experiment of Mono-lingual Category Tree Integration (Dataset; Automated Text Classifier; Evaluation Metrics, covering integration accuracy, precision/recall/F1 of the three operators, and precision and recall of "Split"; Parameter Tuning; Experiment Results); Chapter 6, Cross-lingual Category Tree Integration (Parallel Corpus; Cross-lingual Concept Space Construction, via phrase extraction, co-occurrence analysis and an associate constraint network for concept generation; Document Translation; Experiment Setting; Experiment Results); Chapter 7, Conclusion and Future Work.

    A history and theory of textual event detection and recognition


    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    The Data Web has undergone tremendous growth. It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as the life sciences, government or geography, with over 89 billion facts. Likewise, the Document Web has grown to the point where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook and 3.5 billion Google searches are performed on an average day. However, there is a gap between the Document Web and the Data Web: knowledge bases on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, forum discussions, etc. As a result, data on the Data Web not only misses a significant fragment of information but also suffers from a lack of currency, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and is therefore lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs; with machine-readable knowledge bases available, lay users could be empowered to issue more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges in closing the gap between the Document Web and the Data Web, in four ways. First, we present a distant-supervision approach for finding multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of these relations between two entities. Second, we address data currency by presenting a real-time RDF extraction framework for data streams and use it to extract RDF from RSS news feeds. Third, we present a novel fact-validation algorithm, based on natural language representations, that can not only verify or falsify a given triple, but also find trustworthy sources for it on the Web and estimate the time scope in which the triple holds true. The features this algorithm uses to decide whether a website is trustworthy serve as provenance information and thereby help create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to exploit the large amounts of data available on the Data Web to satisfy their information needs.
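    The distant-supervision idea in the first contribution (using known subject-object pairs of a relation to harvest natural language patterns from sentences) can be sketched as follows. The helper name, the toy sentences and the capital-of relation are illustrative, not taken from the thesis:

```python
import re

def extract_patterns(sentences, pairs):
    """Collect the text between known (subject, object) entity pairs as
    candidate natural language patterns for the relation."""
    patterns = []
    for s in sentences:
        for subj, obj in pairs:
            # Non-greedy match of whatever lies between the two entities.
            m = re.search(re.escape(subj) + r"(.+?)" + re.escape(obj), s)
            if m:
                patterns.append(m.group(1).strip())
    return patterns

sents = ["Berlin is the capital of Germany.",
         "Paris is the capital of France."]
pairs = [("Berlin", "Germany"), ("Paris", "France")]
print(extract_patterns(sents, pairs))
# → ['is the capital of', 'is the capital of']
```

Recurring patterns can then be applied to new sentences to propose unseen instances of the relation; the thesis works at a far larger scale and multilingually.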

    Pembentukan Tesaurus pada Cross-lingual Text dengan Pendekatan Constraint Satisfaction Problem

    Document search is essential in text mining. Final-project and thesis documents are often provided in two languages, Indonesian and English. When searching, each student tends to look for documents using keywords in one particular language. The purpose of this final project is to build an Indonesian-English cross-lingual thesaurus using a Constraint Satisfaction Problem (CSP) approach. The study uses final-project and thesis documents from the FTIF, Intelligent Multimedia Networks, Statistics and Multimedia Network Engineering departments at Institut Teknologi Sepuluh Nopember. Document processing consists of several steps: document alignment, word extraction, term weighting, and extraction of co-occurrence information, after which the Constraint Satisfaction Problem is solved with backtracking search, an improvement over the brute-force method. Weighting uses TF-IDF (term frequency-inverse document frequency). In building the thesaurus, document alignment takes the longest time, 10,745 seconds, while the fastest step is the relevance-weight calculation at 10 seconds. The thesaurus built with CSP achieves a precision of 91.38%, whereas the thesaurus built without CSP achieves a precision of 45.23%. Document search using the thesaurus yields 86.67% recall, 100% precision and 86.67% accuracy.
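    The backtracking search that replaces brute-force enumeration follows the generic CSP pattern: extend a partial assignment one variable at a time and back up as soon as a constraint is violated. This is a minimal generic sketch with an invented toy instance, not the thesis implementation:

```python
def backtrack(assignment, variables, domains, consistent):
    """Generic CSP solver: assign variables in order, backtracking
    whenever the partial assignment violates a constraint."""
    if len(assignment) == len(variables):
        return assignment
    var = variables[len(assignment)]
    for value in domains[var]:
        trial = {**assignment, var: value}
        if consistent(trial):
            result = backtrack(trial, variables, domains, consistent)
            if result is not None:
                return result
    return None  # no value works: caller backtracks

# Toy instance: pick one translation per term such that no two terms
# share the same translation (an all-different constraint).
variables = ["pencarian", "dokumen"]
domains = {"pencarian": ["search", "document"],
           "dokumen": ["document"]}
ok = lambda a: len(set(a.values())) == len(a)
print(backtrack({}, variables, domains, ok))
# → {'pencarian': 'search', 'dokumen': 'document'}
```

Unlike brute force, which enumerates every complete assignment, the consistency check prunes dead branches as soon as they appear.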

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data; phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system, not by adding more parallel data but by using external morphological resources. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string-similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge-expansion framework is generic and could be used to add different types of information to the model. JRC.G.2 - Global Security and Crisis Management
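    The generation of new phrase associations via string similarity can be sketched roughly as follows. `SequenceMatcher`, the 0.8 threshold and the toy phrase table are illustrative assumptions; the paper's actual score additionally uses morphosyntactic information:

```python
from difflib import SequenceMatcher

def expand_phrase_table(phrase_table, variants):
    """Add entries for morphological variants that are string-similar
    to an existing source phrase, reusing that phrase's translation."""
    new_entries = dict(phrase_table)
    for variant in variants:
        for src, tgt in phrase_table.items():
            if variant not in new_entries and \
               SequenceMatcher(None, variant, src).ratio() > 0.8:
                # Variant inherits the translation of its base phrase.
                new_entries[variant] = tgt
    return new_entries

table = {"translated": "traduit"}
expanded = expand_phrase_table(table, ["translates", "apple"])
print(expanded)
```

Here "translates" is close enough to "translated" to inherit its translation, while "apple" is rejected, which is the OOV-reduction effect the paper measures.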

    Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

    We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models on key applications such as zero-shot classification, image retrieval and image captioning. We categorize desired behaviors along three axes: (i) whether the task concerns humans; (ii) how subjective the task is (i.e., how likely it is that people from a diverse range of backgrounds would agree on a labeling); and (iii) the intended purpose of the task, and whether fairness is better served by impartiality (i.e., making decisions independent of the protected attributes) or representation (i.e., making decisions that maximize diversity). Finally, we provide quantitative fairness evaluations for both binary-valued and multi-valued protected attributes over ten diverse datasets. We find that fair PCA, a post-processing method for fair representations, works very well for debiasing in most of the aforementioned tasks while incurring only a minor loss of performance. However, debiasing approaches vary in effectiveness depending on the task, so the debiasing approach should be chosen to fit the specific use case. Comment: Accepted at AIES'2
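    As a rough illustration of post-processing in the spirit of the fair-PCA method evaluated here, one can project embeddings onto the complement of a protected-attribute direction. The mean-difference direction, the synthetic data and the function name are all assumptions for this sketch; the paper's method and data differ:

```python
import numpy as np

def debias(embeddings, protected):
    """Remove the protected-attribute direction from the embeddings by
    projecting onto its orthogonal complement."""
    # Direction separating the two groups: difference of group means.
    v = embeddings[protected == 1].mean(0) - embeddings[protected == 0].mean(0)
    v = v / np.linalg.norm(v)
    # Subtract each embedding's component along v.
    return embeddings - np.outer(embeddings @ v, v)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
a = (rng.random(100) > 0.5).astype(int)   # synthetic protected attribute
X[a == 1] += 2.0                          # inject a group-dependent shift
Xd = debias(X, a)
# After projection the group means coincide, since the mean shift was
# exactly along the removed direction.
gap = np.linalg.norm(Xd[a == 1].mean(0) - Xd[a == 0].mean(0))
print(gap < 1e-8)
```

Downstream labeling then cannot rely on that direction, at the cost of whatever task signal it carried; this trade-off is what the paper quantifies per task.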