
21 research outputs found

Web classification using Support Vector Machine

Author: LIM Ee Peng
SUN Aixin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2002

Institutional Knowledge at Singapore Management University

Using the Support Vector Machine Method to Classify and Predict Air Transport by Flight Type (Domestic and International) in Banda Aceh

Author: Burhanuddin Burhanuddin
Fachrurrazi Sayed
Publication venue: 'LPPM Universitas Malikussaleh'
Publication date: 02/11/2018

This study presents a performance analysis of the Support Vector Machine (SVM) with 11 independent variables and 1 dependent variable. The SVM method, with training data (75%) and testing data (25%), was used to classify domestic and international flight data in order to find the best hyperplane separating the two classes. The results show that 4 support vectors provide the information needed to confirm that the SVM method can serve as a classifier and can predict model accuracy, using the Receiver Operating Characteristic (ROC) curve to identify the best model accuracy, which reached 84.31%. Keywords: Classification, Support Vector Machine (SVM) Method, Receiver Operating Characteristic (ROC)

Open Journal Unimal (e-Jurnal Universitas Malikussaleh)

A Classifier to Detect Profit and Non Profit Websites Upon Textual Metrics for Security Purposes

Author: Alsinglawi Belal
Darweesh Dirar
Darwish Omar
Obeidat Rasha
Tashtoush Yahya
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 17/05/2022

Currently, most organizations have a defense system to protect their digital communication network against cyberattacks. However, these defense systems deal with all network traffic regardless of whether it comes from profit or non-profit websites. This leads to enforcing more security policies, which negatively affects network speed. Since the most dangerous cyberattacks are aimed at commercial websites, because they contain more critical data such as credit card numbers, it is better to set the defense system's priorities towards actual attacks that come from profit websites. This study evaluated the effect of textual website metrics in determining the type of website as profit or non-profit for security purposes. Classifiers were built to predict the type of website as profit or non-profit by applying machine learning techniques on a dataset. The corpus used for this research included profit and non-profit websites. Both traditional and deep machine learning techniques were applied. The results showed that J48 performed best in terms of accuracy according to its outcomes in all cases. The newly built models can be a significant tool for the defense systems of organizations, as they will help them to implement the necessary security policies associated with attacks that come from both profit and non-profit websites. This will have a positive impact on the security and efficiency of the network.

Journal of ICT Research and Applications

ITB Journal

Intelligent Fusion of Structural and Citation-Based Evidence for Text Classification

Author: Calado Pavel
Chen Yuxin
Cristo Marco
Fan Weiguo
Fox Edward A.
Goncalves Marcos Andre
Zhang Baoping
Publication date: 01/01/2004

This paper investigates how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity, five derived from the citation structure of the collection, and three measures derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our empirical experiments using documents from the ACM digital library and the ACM classification scheme show that we can discover similarity functions that work better than any evidence in isolation and whose combined performance through a simple majority voting is comparable to that of Support Vector Machine classifiers

Computer Science Technical Reports @Virginia Tech

Information Gathering and Classification for Collaborative Logistics Decision Making

Author: Alfaro Rodrigo
Ceroni José
Publication venue: 'IntechOpen'
Publication date: 29/08/2011

IntechOpen

Crossref

An unsupervised perplexity-based method for boilerplate removal

Author: Fernández Pichel Marcos
Gamallo Otero Pablo
Losada Carril David Enrique
Pichel Campos Juan Carlos
Prada Corral Manuel de
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2023

The availability of large web-based corpora has led to significant advances in a wide range of technologies, including massive retrieval systems or deep neural networks. However, leveraging this data is challenging, since web content is plagued by so-called boilerplate: ads, incomplete or noisy text, and remnants of the navigation structure, such as menus or navigation bars. In this work, we present a novel and efficient approach to extract useful and well-formed content from web-scraped data. Our approach takes advantage of Language Models and their implicit knowledge about correctly formed text, and we demonstrate here that perplexity is a valuable artefact that can contribute in terms of effectiveness and efficiency. As a matter of fact, the removal of noisy parts leads to lighter AI or search solutions that are effective and entail important reductions in resources spent. We exemplify here the usefulness of our method with two downstream tasks, search and classification, and a cleaning task. We also provide a Python package with pre-trained models and a web demo demonstrating the capabilities of our approach.

Repositorio Institucional da Universidade de Santiago de Compostela

Possibilistic clustering for crisis prediction: Systemic risk states and membership degrees

Author: Mezei József
Sarlin Peter
Publication venue: Arcada University of Applied Sciences
Publication date: 01/01/2015

Helsingin yliopiston digitaalinen arkisto

Web Unit Mining: Finding and classifying subgraphs of web pages

Author: LIM Ee Peng
SUN Aixin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2003

In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, the assumption is too restrictive since a web page itself may not always correspond to a concept instance of some semantic concept (or category) given to the classification task. In this paper, we want to relax this assumption and allow a concept instance to be represented by a subgraph of web pages or a set of web pages. We identify several new issues to be addressed when the assumption is removed, and formulate the web unit mining problem. We also propose an iterative web unit mining (iWUM) method that first finds subgraphs of web pages using some knowledge about web site structure. From these web subgraphs, web units are constructed and classified into semantic concepts (or categories) in an iterative manner. Our experiments using the WebKB dataset showed that iWUM improves the overall classification performance and works very well on the more structured parts of a web site.

CiteSeerX

Institutional Knowledge at Singapore Management University

Online Hate Speech Detection Using Machine Learning

Author: Shepherd Arévalo Ela Katherine
Publication date: 16/09/2022

Bachelor's thesis (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2021/2022. Link to the project's public repository: https://github.com/NILGroup/TFG-2122HateSpeechDetection

Hate speech directed towards marginalized people is a very common problem online, especially in social media such as Twitter or Reddit. Automatically detecting hate speech in such spaces can help mend the Internet and transform it into a safer environment for everybody. Hate speech detection fits into text classification, a series of tasks where text is organized into categories. This project proposes using Machine Learning algorithms to detect hate speech in online text in four languages: English, Spanish, Italian and Portuguese. The data to train the models was obtained from online, publicly available datasets. Three different algorithms with varying parameters have been used in order to compare their performance. The experiments show that the best results reach 82.51% accuracy and an F1-score of around 83%, for Italian text. Results differ for each language depending on distinct factors.

Docta Complutense