270 research outputs found

    Browsing Large Image Datasets through Voronoi Diagrams

    Conventional browsing of image collections uses mechanisms such as thumbnails arranged on a regular grid or on a line, often mounted on a scrollable panel. However, this approach does not scale well with the size of the dataset (the number of images). In this paper, we propose a new thumbnail-based interface for browsing large collections of images. Our approach is based on weighted centroidal anisotropic Voronoi diagrams. A dynamically changing subset of images is represented by thumbnails and shown on the screen. Thumbnails are shaped like general polygons, to better cover screen space, while still reflecting the original aspect ratios or orientations of the represented images. During the browsing process, thumbnails are dynamically rearranged, reshaped and rescaled. The objective is to devote more screen space (more numerous and larger thumbnails) to the parts of the dataset closer to the current region of interest, and progressively less to the parts farther from it, while still keeping the dataset visible as a whole. Temporal coherence is maintained throughout the process. A GPU implementation easily guarantees the frame rates needed for fully smooth interactivity.
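The idea of giving more screen space to items near the region of interest can be illustrated with a much simpler relative of the diagrams named in the abstract. The sketch below is not the paper's method: it assumes a plain multiplicatively weighted Voronoi diagram on a pixel grid (no centroidal relaxation, no anisotropy, no GPU), and the functions `focus_weights`, `weighted_voronoi_labels` and `cell_areas`, as well as the `falloff` and `floor` parameters, are hypothetical names introduced only for illustration.

```python
import math

def focus_weights(sites, focus, falloff=0.05, floor=0.2):
    """Assumed weighting: decays with distance from the focus, never below `floor`."""
    fx, fy = focus
    return [max(floor, math.exp(-falloff * math.hypot(sx - fx, sy - fy)))
            for sx, sy in sites]

def weighted_voronoi_labels(sites, weights, width, height):
    """Label each pixel with the index of the site minimising distance / weight."""
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [math.hypot(x - sx, y - sy) / w
                      for (sx, sy), w in zip(sites, weights)]
            row.append(scores.index(min(scores)))
        labels.append(row)
    return labels

def cell_areas(labels, n_sites):
    """Count the pixels owned by each site."""
    areas = [0] * n_sites
    for row in labels:
        for k in row:
            areas[k] += 1
    return areas

# Two thumbnails on a 64x64 screen; the focus sits on the first one.
sites = [(16, 32), (48, 32)]
weights = focus_weights(sites, focus=(16, 32))
labels = weighted_voronoi_labels(sites, weights, 64, 64)
areas = cell_areas(labels, len(sites))
print(areas)  # the site at the focus owns the larger cell
```

Dividing the distance by the weight means a site with a larger weight wins pixels farther away, so its cell grows, while a low-weight site still keeps a small cell around its own position. That matches the stated goal of keeping the whole dataset visible, though only in this simplified setting.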

    Content-aware photo collage using circle packing


    Using Visual Analytics to Discover Bot Traffic

    With the advance of technology, the Internet has become a medium for many malicious activities. Bot traffic has increased greatly and causes significant problems for businesses and organisations; examples include spam bots, scraper bots, distributed denial of service bots and adaptive bots that aim to exploit the vulnerabilities of a website. Discriminating bot traffic from legitimate flash crowds remains an open challenge. To address these issues and enhance security awareness, this thesis proposes an interactive visual analytics system for discovering bot traffic. The system provides an interactive visualisation with details-on-demand capabilities, which enables knowledge discovery from very large datasets and lets an analyst understand comprehensive details without being constrained by dataset size. The system has a dashboard view that represents legitimate and bot traffic using a Quadtree data structure and Voronoi diagrams. The main contribution of this thesis is a novel visual analytics system that is capable of discovering bot traffic. The research began with a literature review to gain a systematic understanding of the research area, and then used experiment and simulation approaches. The experiment was conducted by capturing website traffic, identifying browser fingerprints, simulating bot attacks and analysing the mouse dynamics of participants, such as movements and events. Data were captured as the participants performed a list of tasks, such as responding to a banner. The data collection is transparent to the participants and only requires JavaScript to be activated on the client side. The study involved 10 participants who are familiar with the Internet. To analyse the data, Weka 3.6.10 was used to perform classification based on a training dataset. The test dataset of all participants was evaluated using a built-in decision tree algorithm. The results of classifying the test dataset were promising, and the model was able to identify ten participants and six simulated bot attacks with an accuracy of 86.67%. Finally, the visual analytics design was formulated to assist an analyst in discovering bot presence.

    Adaptive Algorithms for Automated Processing of Document Images

    Large-scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters, and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance-transform-based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach lies in its determination of the best approximation to the clutter-content boundary with text-like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts, which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multi-lingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that considers components' separation features combined with Docstrum-based [O'Gorman1993] angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model to segment and recognize characters for any complex syllabic or non-syllabic script, using font models. This concept is based on the fact that font files contain all the information necessary to render text and thus a model for how to decompose them. Instead of script-specific routines, this work is a step towards a generic character and recognition scheme for both Latin and non-Latin scripts.

    Multi-Modal Interfaces for Sensemaking of Graph-Connected Datasets

    Hypothesized evolutionary processes are often visualized through phylogenetic trees. Given evolutionary data presented in one of several widely accepted formats, software exists to render these data into a tree diagram. However, software packages commonly in use by biologists today often do not provide means to dynamically adjust and customize these diagrams for studying new hypothetical relationships, or for illustration and publication purposes. Even where these options are available, they can lack intuitiveness and ease of use. The goal of our research is, thus, to investigate more natural and effective means of sensemaking of the data with different user input modalities. To this end, we experimented with different input modalities, designing and running a series of prototype studies, ultimately focusing our attention on pen-and-touch. Through several iterations of feedback and revision with the help of biology experts and students, we developed a pen-and-touch phylogenetic tree browsing and editing application called PhyloPen. This application expands on the capabilities of existing software with visualization techniques such as overview+detail and linked data views, and with new interaction and manipulation techniques using pen-and-touch. To determine its impact on phylogenetic tree sensemaking, we conducted a within-subject comparative summative study against the most comparable and commonly used state-of-the-art mouse-based software system, Mesquite. The study was conducted with biology majors at the University of Central Florida, each of whom used both software systems on a set number of exercise tasks of the same type. With effectiveness determined by several dependent measures, the results show that PhyloPen was significantly better in terms of usefulness, satisfaction, ease of learning, ease of use, and cognitive load, and about the same in variation of completion time. These results support an interaction paradigm that is superior to classic mouse-based interaction and that could potentially be applied to other communities that employ graph-based representations of their problem domains.

    Methods and Distributed Software for Visualization of Cracks Propagating in Discrete Particle Systems

    Scientific visualization is becoming increasingly important in analyzing and interpreting numerical and experimental data sets. Parallel computations of discrete particle systems lead to large data sets that can be produced, stored and visualized on distributed IT infrastructures. However, this leads to very complicated environments that handle complex simulation and interactive visualization on remote heterogeneous architectures. In the micro-structure of a continuum, broken connections between neighbouring particles can form complex cracks of unknown geometrical shape. The complex disjoint surfaces of cracks with holes, and the unavailability of a suitable scalar field defining the crack surfaces, limit the application of common surface extraction methods. The main visualization task is to extract the surfaces of cracks according to the connectivity of the broken connections and the geometry of the neighbouring particles. The research aims at enhancing the visualization methods for discrete particle systems and increasing the speed of distributed visualization software. The dissertation consists of an introduction, three main chapters and general conclusions. The first chapter presents a literature review on visualization software, distributed environments, discrete element simulation of particle systems and crack visualization methods. The second chapter proposes novel visualization methods for extracting crack surfaces from monodispersed particle systems modelled by the discrete element method. The cell cut-based method, the Voronoi-based method and the cell centre-based method explicitly define the geometry of propagating cracks in fractured regions. The proposed visualization methods were implemented in the grid visualization e-service VizLitG and the distributed visualization software VisPartDEM. Partial data set transfer from the grid storage element was developed to reduce the data transfer and visualization time. The third chapter presents the results of experimental research. The performance of the e-service VizLitG was evaluated in a geographically distributed grid. Different types of software were employed for data transfer in order to present a quantitative comparison. The performance of the developed visualization methods was investigated. A quantitative comparison was made between the execution time of the local Voronoi-based method and that of global Voronoi diagrams generated by the Voro++ library. The accuracy of the developed methods was evaluated by computing the total depth of cuts made in particles by the extracted crack surfaces. The research confirmed that the proposed visualization methods and the developed distributed software are capable of visualizing crack propagation modelled by the discrete element method in monodispersed particulate media.

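    Personalization of News Web Portal Content Using Information Extraction Techniques and Weighted Voronoi Diagrams (Personalizacija sadržaja novinskih webskih portala pomoću tehnika izlučivanja informacija i težinskih Voronoievih dijagrama)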

    News web portals present information, within a previously defined topic taxonomy, in both multimedia and textual formats, covering all aspects of our daily lives. The information presented has a high refresh rate and as such offers a local as well as a global snapshot of the world. This thesis deals with the presentation of information extraction techniques (from web news portals) and their use in the standardization of categorization schemes and the automatic classification of newly published content. Weighted Voronoi diagrams are proposed as the personalization method. The aim of the study is to create a virtual profile based on the semantic value of the information in visited nodes (web pages formatted in HTML) at the individual level. The results can greatly contribute to the applicability of the personalization data to specific information sources, including various web news portals. In addition, creating a publicly available collection of prepared data enables future research in this domain. The scientific contributions of this doctoral thesis are therefore: a universal classification scheme based on the ODP taxonomy data is developed; a way of extracting information about user preferences, based on the analysis of user behavior data when using the Web browser, is defined; and a personalization system based on weighted Voronoi diagrams is implemented.
One way of addressing the problems caused by the overproduction of information is to personalize the information source, in our case the WWW environment, by creating virtual profiles based on an analysis of users' behavioral characteristics, with the aim of grading the importance of information on an individual basis. Personalization itself has been used most in the field of information retrieval. A review of previous research should note several different approaches that have been used to personalize available content: ontological approaches, contextual models, and data mining. These approaches are the most common in the reviewed literature. The literature analysis also revealed the lack of a uniform taxonomy of terms used to annotate information nodes. The prevailing approach to annotation is the use of tagging systems based on user input. The reviewed papers indicate that, for popular annotations, users on different systems attach the same annotations to the same and/or similar objects; that the synonym problem exists but is negligible given a sufficient amount of data; and that the annotations used by ordinary users and by domain experts overlap in 52% of cases. These findings point to the lack of a unified system for tagging information nodes. Tagging systems carry a large amount of "information noise" because tagging an information node is individual in nature and directly tied to the user's knowledge of the node's domain. As a potential solution to this shortcoming, the use of existing taxonomies defined by web directories is proposed. Of the several possible web directories, the reviewed literature most often mentions the ODP web directory as the highest-quality taxonomy for the hierarchical domain categorization of information nodes. The use of ODP as a taxonomy is reported in several papers studied during the preliminary research. Using the ODP taxonomy to classify information nodes makes it possible to determine their domain membership, which in turn allows a membership value to be assigned to an information node for each domain. Given the complex structure of the ODP taxonomy (12 hierarchical levels, 17 categories at the first level) and the large number of potential categories, the use of the ODP taxonomy to classify information nodes down to level 6 is proposed. Alongside this guidance on the number of hierarchical levels recommended when analysing the ODP structure, the need for deep classification of documents is also emphasized.
The literature analysis showed that personalization is addressed primarily in the domain of information retrieval through WWW interfaces, and that the personalization of information available through web portals is poorly researched. The numerous papers consulted during the preliminary research phase used various sources of data for analysis: server log files, personal browsing history from browser log files, applications that track the user's interaction with the system, cookies, and others. Data collected from one or more of these sources give insight into an individual user's movement within a defined informational and temporal frame. In the reviewed literature, data collected in this way are used to personalize information, not at the individual level but by grouping users into thematically similar groups. The aim of this work is to test existing methods identified as useful for further work and to improve them with weighted Voronoi diagrams in order to achieve personalization at the individual level. The use of weighted Voronoi diagrams has not previously been reported in the literature and therefore represents an innovation in the field of information personalization. Papers dealing primarily with recognizing usage patterns of information nodes will also help in this process; there are a considerable number of them, and not all can be mentioned. The existence of a behavioral pattern, linked to long-term and/or short-term data on the user's movement through the information space, enables better filtering and personalization of the available information. Since the aim of this work is to demonstrate the possibility of individual personalization, weighted Voronoi diagrams were recognized as having potential for building a virtual semantic profile and personalizing information.
