DETEKSI SIMILARITAS ARTIKEL ILMIAH DENGAN TEKNIK PENCOCOKAN STRING BOYER MOORE

Author: Amardeep Amardeep
Publication venue: 'Gunadarma University'
Publication date: 14/09/2020

Plagiarism occurs frequently, particularly in writing, both in scientific articles and in journals. One control that can minimize plagiarism is to compare documents and compute their degree of similarity. This study analyzes the use of the Boyer-Moore algorithm with string matching on documents in the form of scientific journals. The study uses crawling, with Python's Beautiful Soup library on the Google search engine, to compare a test document in the form of a scientific journal with Google search results, so that the comparison can be broadened and the accuracy of the document similarity increased. The study tests document similarity on Indonesian and English text within a scientific journal, with stemming carried out separately for the two languages. For Indonesian sentences, stemming uses the Nazief-Adriani stemmer; for English sentences, the Porter algorithm is used. The analysis of string matching with the Boyer-Moore algorithm shows that the bigram step separates words into groups of two words arranged in one list per sentence, that the search succeeded, and that the score and similarity level obtained through crawling successfully calculate the percentage similarity of a scientific article. The results of this study are expected to determine the degree of similarity between two documents and thereby help minimize plagiarism, especially in documents in the form of scientific journals.
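
The bigram matching step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it uses Horspool's simplification of the Boyer-Moore bad-character rule rather than the full algorithm, it omits crawling and stemming, and the function names and the scoring formula (share of bigrams found in the reference text) are assumptions made for the example.

```python
def bigrams(tokens):
    """Return the adjacent word pairs of a token list as strings."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def horspool_find(text, pattern):
    """Return the first index of pattern in text, or -1 if absent.

    Horspool's simplification of Boyer-Moore: only the bad-character
    shift is used, keyed on the text character under the pattern's end.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift for every pattern character except the last one;
    # later occurrences overwrite earlier ones with a smaller shift.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        pos += shift.get(text[pos + m - 1], m)
    return -1


def bigram_similarity(test_sentence, reference_text):
    """Percentage of the test sentence's word bigrams found in reference_text.

    Hypothetical scoring rule, chosen for illustration only.
    """
    grams = bigrams(test_sentence.lower().split())
    if not grams:
        return 0.0
    reference = reference_text.lower()
    hits = sum(1 for gram in grams if horspool_find(reference, gram) != -1)
    return 100.0 * hits / len(grams)
```

In a fuller pipeline, each sentence would presumably be stemmed (Nazief-Adriani or Porter, depending on language) before the bigrams are formed, and `reference_text` would come from the crawled search results.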

Gunadarma University: Ejournal UG

An effective named entity similarity metric for comparing data from multiple sources with varying syntax

Author: Brown Stephen
Coupland Simon
Croft David
Publication venue: 'Oxford University Press (OUP)'
Publication date: 26/08/2016

Crossref

Coventry University Pure Portal

FORMING OF THE SEMISTRUCTURED DATA DYNAMIC INTEGRATION MASH-UP SYSTEM CONTENT

Author: Irina KUSHNIRETSKA
Publication venue: Polish Association for Knowledge Promotion
Publication date: 01/01/2015

This paper describes a method of forming a unified dynamic data set that has a general structure and only content. A procedure is proposed for forming "subject-predicate-object" triplets from the descriptions of the received input information resources. A formula is presented for calculating the similarity factor between a user query and the semantic metadata of an information resource. The structure of a dynamic integration system for semistructured data that uses “Mash-Up” technology has been designed.

Biblioteka Nauki - repozytorium artykułów

Directory of Open Access Journals

Few-shot entity linking of food names

Author: Bagshaw James
Batista-Navarro Riza
Cheng Zhuyan
Feher Darius
Ibrahim Faridz
Maidment Tom
Schlegel Viktor
Publication date: 01/09/2023

Entity linking (EL), the task of automatically matching mentions in text to concepts in a target knowledge base, remains under-explored when it comes to the food domain, despite its many potential applications, e.g., finding the nutritional value of ingredients in databases. In this paper, we describe the creation of new resources supporting the development of EL methods applied to the food domain: the E.Care Knowledge Base (E.Care KB) which contains 664 food concepts and the E.Care dataset, a corpus of 468 cooking recipes where ingredient names have been manually linked to corresponding concepts in the E.Care KB. We developed and evaluated different methods for EL, namely, deep learning-based approaches underpinned by Siamese networks trained under a few-shot learning setting, traditional machine learning-based approaches underpinned by support vector machines (SVMs) and unsupervised approaches based on string matching algorithms. Combining the strengths of each of these approaches, we built a hybrid model for food EL that balances the trade-offs between performance and inference speed. Specifically, our hybrid model obtains 89.40% accuracy and links mentions at an average speed of 0.24 seconds per mention, whereas our best deep learning-based model, SVM model and unsupervised model obtain accuracies of 86.99%, 87.19% and 87.43% at inference speeds of 0.007, 0.66 and 0.02 seconds per mention, respectively

The University of Manchester - Institutional Repository

Geocoding methods and their usage in locating social media posts

Author: Grannabba Hanna
Publication date: 12/06/2017

Many datasets contain addresses and place names. To use them in spatial analysis, they must be georeferenced, that is, given coordinates. Geocoding is the process of determining the location of an address or place name using a reference dataset. Social media is now an important part of people's lives, which has created a need in many fields to determine where social media posts were sent from or which place they refer to. The purpose of this master's thesis was to examine geocoding, the methods used for it, and their features. The theoretical part presents the geocoding process and how address geocoding is done with different types of reference datasets. It then examines the geocoding of tweets and the additional challenges it poses compared with address geocoding. In the empirical part, tools for geocoding tweets were implemented and tested in practice. Geocoding was found to be a multi-phase process that requires compromises between accuracy, completeness, and execution time. Three matching algorithms that link approximately similar strings were compared: Levenshtein distance, longest common subsequence, and n-gram. Of these, the comparison based on n-grams gave the most accurate results. The biggest challenge was found to be geoparsing, that is, recognizing place names among ordinary words. Many ordinary words also occur as place names somewhere in the world, which causes false matches in the geocoding results unless they can be detected.
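
The three string measures named in this abstract can be illustrated with a short sketch. It is not taken from the thesis: the function names are hypothetical, and the n-gram score is assumed here to be a Dice coefficient over sets of character bigrams, which is one common choice among several.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ngram_similarity(a, b, n=2):
    """Dice coefficient over the sets of character n-grams (assumed variant)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

When matching place names, the two distance-style measures would typically be normalized by string length before being compared against a threshold; the thesis abstract does not say which normalization, if any, was used.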

Aaltodoc Publication Archive

A comparison of string similarity measures for toponym matching

Author: Louwerse M.M.
Recchia G.
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2013

Tilburg University Repository

Mining social structures from genealogical data

Author: Efremova I.
Publication venue: Technische Universiteit Eindhoven
Publication date: 20/04/2016

Pure OAI Repository

Extraktion und Auswertung von Geodaten aus Sozialen Netzwerken als Element der Bürgerbeteiligung in kommunalen Belangen der Hansestadt Rostock

Author: Vettermann Ferdinand (gnd: 1122206429)
Publication venue: Universität Rostock Rostock

Within this work, a method has been developed that can geolocate tweets at a local scale, assign them to predefined topics, and analyze them for moods and trends. It works both in a region with few Twitter messages, such as the Hanseatic and University City of Rostock, and in the German-speaking area more broadly. Through the connection to the analogue world, valuable information can be generated and transferred from Twitter. From the spatiotemporal combination of the data, real-time recommendations for action during extreme events or public events can be derived directly.

Rostocker Dokumentenserver