1,910 research outputs found

    Towards More Usable Dataset Search: From Query Characterization to Snippet Generation

    Full text link
    Reusing published datasets on the Web is of great interest to researchers and developers. Their data needs may be met by submitting queries to a dataset search engine to retrieve relevant datasets. In this ongoing work towards developing a more usable dataset search engine, we characterize real data needs by annotating the semantics of 1,947 queries using a novel fine-grained scheme, to provide implications for enhancing dataset search. Based on the findings, we present a query-centered framework for dataset search, and explore the implementation of snippet generation and evaluate it with a preliminary user study.Comment: 4 pages, The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019
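    As an illustration of the snippet-generation idea mentioned in this abstract (not the authors' actual method), the minimal sketch below scores sentences from a dataset's metadata description by overlap with the query terms and keeps the top-scoring ones in document order; the function name, fields, and scoring rule are assumptions made for the example.

```python
import re

def generate_snippet(query: str, description: str, max_sentences: int = 2) -> str:
    """Query-biased extractive baseline: score each description sentence by the
    number of distinct query terms it contains, then return the top sentences
    in their original order."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())

    # Score each sentence by how many distinct query terms it covers.
    scored = [
        (sum(term in sentence.lower() for term in query_terms), idx, sentence)
        for idx, sentence in enumerate(sentences)
    ]
    top = sorted(scored, reverse=True)[:max_sentences]

    # Restore document order so the snippet reads naturally.
    return " ".join(sentence for _, _, sentence in sorted(top, key=lambda t: t[1]))


print(generate_snippet(
    "air quality sensor measurements",
    "This dataset contains hourly sensor measurements of urban air quality. "
    "It was collected between 2015 and 2018. Records include PM2.5 and NO2 levels.",
))
```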

    Addressing the new generation of spam (Spam 2.0) through Web usage models

    Get PDF
    New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile on a social networking website, a promotional review, a response to a thread in an online forum with unsolicited content, or a manipulated Wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications. The current literature does not address Spam 2.0 in depth, and the outcomes of efforts to date are inadequate. The aim of this research is to formalise a definition of Spam 2.0 and provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness and adaptability are key factors in the design of the proposed method. This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering. This dissertation proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) an Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) an On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods. This work has coined the term ‘Spam 2.0’, provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem
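    To make the web-usage-model idea concrete, here is a minimal illustrative sketch, not the dissertation's EDSF or OFSF implementation, that flags a form submission as likely Spam 2.0 from simple behavioural features; the feature names and thresholds are assumptions chosen only for the example.

```python
from dataclasses import dataclass

@dataclass
class SubmissionBehaviour:
    """Behavioural signals gathered while a visitor fills in a web form."""
    seconds_on_page: float      # time between page load and submit
    keystrokes: int             # keyboard events observed on the form
    mouse_moves: int            # mouse-move events observed on the page
    pages_visited_before: int   # navigation depth before reaching the form


def looks_like_spam_2_0(b: SubmissionBehaviour) -> bool:
    """Crude early-detection rule: automated spam bots tend to submit almost
    instantly, with no human-like keyboard or mouse activity and no prior
    navigation. Thresholds are illustrative, not tuned values."""
    too_fast = b.seconds_on_page < 2.0
    no_interaction = b.keystrokes == 0 and b.mouse_moves == 0
    no_navigation = b.pages_visited_before == 0
    return (too_fast and no_interaction) or (no_interaction and no_navigation)


print(looks_like_spam_2_0(SubmissionBehaviour(0.4, 0, 0, 0)))      # True: bot-like
print(looks_like_spam_2_0(SubmissionBehaviour(45.0, 230, 80, 3)))  # False: human-like
```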

    What Scanners do at L7? Exploring Horizontal Honeypots for Security Monitoring

    Get PDF
    Honeypots are a common means to collect data useful for threat intelligence. Most efforts in this area rely on vertical systems and target a specific scenario or service to analyse the data collected in such a deployment. We here extend the analysis of the visibility of honeypots by revisiting the problem from a horizontal perspective. We deploy a flexible honeypot system hosting multiple services, relying on the T-Pot project. We collect data for 5 months, recording millions of application requests from tens of thousands of sources. We compare whether and how attackers interact with multiple services. We observe attackers that always focus on one or a few services, and others that target tens of services simultaneously. We dig further into the dataset, providing an initial horizontal analysis of brute-force attacks against multiple services. We show, for example, clear groups of attackers that rely on different password lists on different services. All in all, this work is our initial effort to build a horizontal system that can provide insights into attacks
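    A minimal sketch of the kind of horizontal view described above, assuming honeypot events have already been parsed into (source IP, service) pairs; the field names and thresholds are assumptions, and this is not the T-Pot pipeline itself. It groups sources by how many distinct services they touched, separating narrowly focused scanners from broad ones.

```python
from collections import defaultdict

# Each event is (source_ip, service); in practice these would be parsed
# from the honeypot logs (e.g. SSH, Telnet, SMTP, HTTP probes).
events = [
    ("203.0.113.5", "ssh"), ("203.0.113.5", "ssh"),
    ("198.51.100.7", "ssh"), ("198.51.100.7", "telnet"),
    ("198.51.100.7", "smtp"), ("198.51.100.7", "http"),
]

# Collect the set of distinct services each source interacted with.
services_per_source: dict[str, set[str]] = defaultdict(set)
for src, service in events:
    services_per_source[src].add(service)

# Split sources into narrowly focused scanners and broad, horizontal ones.
focused = {s for s, svcs in services_per_source.items() if len(svcs) <= 2}
broad = {s for s, svcs in services_per_source.items() if len(svcs) > 2}

print("focused:", focused)  # sources probing one or two services
print("broad:", broad)      # sources sweeping many services at once
```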

    A Review – Clustering and Preprocessing For Web Log Mining

    Get PDF
    The World Wide Web contains a huge amount of information and serves many kinds of users, and the number of users logging on to the Internet grows every day. User accesses are recorded in web logs, and these log files grow rapidly into huge stores. Web Usage Mining, an application of data mining, works on such user logs. It consists of several steps, including data cleaning (for example, removing robot entries), user identification, session identification and clustering. Earlier data preprocessing approaches for web usage mining have been used, but those algorithms suffer from scalability problems. This work proposes a session identification process, transaction building and data cleaning using efficient data mining algorithms. The experimental results may show considerable performance of the proposed algorithm. DOI: 10.17762/ijritcc2321-8169.160415
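    The preprocessing steps listed in this abstract (data cleaning, user identification, session identification) can be illustrated with a minimal sketch; the record fields and the 30-minute session timeout are common conventions assumed for the example, not details taken from the review.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # a common sessionization heuristic

def clean(records):
    """Data cleaning: drop robot hits and requests for static resources."""
    return [
        r for r in records
        if "bot" not in r["user_agent"].lower()
        and not r["url"].endswith((".css", ".js", ".png", ".jpg"))
    ]

def sessionize(records):
    """User identification by (IP, user agent); a new session starts after
    a gap longer than SESSION_TIMEOUT."""
    sessions = {}
    for r in sorted(records, key=lambda r: r["time"]):
        user = (r["ip"], r["user_agent"])
        user_sessions = sessions.setdefault(user, [])
        if not user_sessions or r["time"] - user_sessions[-1][-1]["time"] > SESSION_TIMEOUT:
            user_sessions.append([])  # open a new session for this user
        user_sessions[-1].append(r)
    return sessions

log = [
    {"ip": "10.0.0.1", "user_agent": "Mozilla/5.0", "url": "/home",
     "time": datetime(2015, 4, 1, 9, 0)},
    {"ip": "10.0.0.1", "user_agent": "Mozilla/5.0", "url": "/style.css",
     "time": datetime(2015, 4, 1, 9, 0, 1)},
    {"ip": "10.0.0.1", "user_agent": "Mozilla/5.0", "url": "/products",
     "time": datetime(2015, 4, 1, 10, 0)},
    {"ip": "10.0.0.2", "user_agent": "Googlebot/2.1", "url": "/home",
     "time": datetime(2015, 4, 1, 9, 5)},
]

print(sessionize(clean(log)))
```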

    BlogForever: D3.1 Preservation Strategy Report

    Get PDF
    This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design

    User behaviour and task characteristics: A field study of daily information behaviour

    Get PDF
    Previous studies investigating task based search often take the form of lab studies or large scale log analysis. In lab studies, users typically perform a designed task under a controlled environment, which may not reflect their natural behaviour. While log analysis allows the observation of users' natural search behaviour, often strong assumptions need to be made in order to associate the unobserved underlying user tasks with log signals. We describe a field study during which we log participants' daily search and browsing activities for 5 days, and users are asked to self-annotate their search logs with the tasks they conducted as well as to describe the task characteristics according to a conceptual task classification scheme. This provides u