356 research outputs found

Website summarization: a topic hierarchy based approach.

Publication date: 01/01/2006

Liu Nan. Thesis (M.Phil.), Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 84-88). Abstracts in English and Chinese.

Contents:
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1: Introduction
Chapter 2: Related Work
  2.1 Web Structure Mining
    2.1.1 HITS Algorithm
    2.1.2 PageRank Algorithm
  2.2 Website Mining
    2.2.1 Website Classification
    2.2.2 Web Unit Mining
    2.2.3 Logical Domain Extraction
    2.2.4 Web Thesaurus Construction
Chapter 3: Website Topic Hierarchy Generation
  3.1 Problem Definition
  3.2 Graph Based Algorithms
    3.2.1 Breadth First Search
    3.2.2 Shortest Path Search
    3.2.3 Minimum Directed Spanning Tree
    3.2.4 Discussion
  3.3 Edge Weight Function
    3.3.1 Relevance Method
    3.3.2 Machine Learning Method
  3.4 Experiments
    3.4.1 Data Preparation
    3.4.2 Performances of Breadth-first Search
    3.4.3 Performances of Shortest-path Search
    3.4.4 Performances of Directed Minimum Spanning Tree
    3.4.5 Comparison of Different Algorithms
Chapter 4: Website Summarization Through Keyphrase Extraction
  4.1 Introduction
  4.2 Background
  4.3 Keyphrase Extraction
    4.3.1 Candidate Phrases Identification
    4.3.2 Feature Calculation without Topic Hierarchy
    4.3.3 Feature Calculation with Topic Hierarchy
    4.3.4 Extraction of Keyphrases
  4.4 Experiments
Chapter 5: Conclusion and Future Work
References

CUHK Digital Repository

mARC: Memory by Association and Reinforcement of Contexts

Author: Descourt Patrice
Rimoux Norbert
Publication date: 10/12/2013

This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose, incremental, and unsupervised data storage and retrieval system that can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information classification and retrieval problems, such as e-Discovery or contextual navigation. It can also be formulated in the artificial life framework, also known as Conway's "Game of Life" theory. In contrast to Conway's approach, the objects evolve in a massively multidimensional space. To start evaluating the potential of mARC, we have built a mARC-based Internet search engine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search in terms of both performance and relevance. In the study, we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries.

arXiv.org e-Print Archive

CiteSeerX

Multilayer Complex Network Descriptors for Color-Texture Characterization

Author: Bruno Odemir M
Condori Rayner H M
Gonçalves Wesley N
Scabini Leonardo F S
Publication date: 02/04/2018

A new method based on complex networks is proposed for color-texture analysis. The proposal consists of modeling the image as a multilayer complex network where each color channel is a layer, and each pixel (in each color channel) is represented as a network vertex. The network dynamic evolution is accessed using a set of modeling parameters (radii and thresholds), and new characterization techniques are introduced to capture information regarding within and between color channel spatial interaction. An automatic and adaptive approach for threshold selection is also proposed. We conduct classification experiments on 5 well-known datasets: Vistex, Usptex, Outex13, CURet and MBT. Results among various literature methods are compared, including deep convolutional neural networks with pre-trained architectures. The proposed method presented the highest overall performance over the 5 datasets, with a mean accuracy of 97.7 against 97.0 achieved by the ResNet convolutional neural network with 50 layers.
Comment: 20 pages, 7 figures and 4 tables
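The multilayer modeling described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The connection rule is an assumption made for the example: two vertices are linked when their pixels lie within a spatial radius and their normalized intensity difference is at most a threshold. The names build_multilayer_edges and vertex_degrees are likewise hypothetical.

```python
from collections import Counter
from itertools import product


def build_multilayer_edges(image, radius, threshold):
    """Build an undirected multilayer network from a small color image.

    image: list of rows, each row a list of per-channel tuples (values 0..255).
    Each vertex is (row, col, channel); each channel acts as one layer.
    ASSUMED rule: connect two vertices when their pixels are at most `radius`
    apart and their normalized intensity difference is at most `threshold`.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    reach = int(radius)
    # Pixel offsets inside the circular neighborhood (includes the pixel itself,
    # which yields between-layer links at the same spatial position).
    offsets = [
        (dy, dx)
        for dy in range(-reach, reach + 1)
        for dx in range(-reach, reach + 1)
        if dy * dy + dx * dx <= radius * radius
    ]
    edges = set()
    for y, x in product(range(height), range(width)):
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if not (0 <= ny < height and 0 <= nx < width):
                continue
            # c1 == c2 gives within-layer edges, c1 != c2 between-layer edges.
            for c1, c2 in product(range(channels), repeat=2):
                u, v = (y, x, c1), (ny, nx, c2)
                if u >= v:  # skip self-loops and keep each pair once
                    continue
                diff = abs(image[y][x][c1] - image[ny][nx][c2]) / 255
                if diff <= threshold:
                    edges.add((u, v))
    return edges


def vertex_degrees(edges):
    """Count how many edges touch each vertex (a simple network descriptor)."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees
```

Calling build_multilayer_edges for several (radius, threshold) pairs and summarizing vertex_degrees for each would give one simple family of descriptors. The characterization techniques and the adaptive threshold selection mentioned in the abstract are not reproduced here.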

arXiv.org e-Print Archive

Questions of science: chatting with ChatGPT about complex systems

Author: Cajueiro Daniel O.
Crokidakis Nuno
de Menezes Marcio Argollo
Publication date: 29/03/2023

We present an overview of the complex systems field using ChatGPT as a representation of the community's understanding. ChatGPT has learned language patterns and styles from a large dataset of internet texts, allowing it to provide answers that reflect common opinions, ideas, and language patterns found in the community. Our exploration covers teaching and learning as well as research topics. We recognize the value of ChatGPT as a source for the community's ideas.
Comment: This is a work in progress

arXiv.org e-Print Archive

Predictive Modeling of Breast Cancer Diagnosis Using Neural Networks: A Kaggle Dataset Analysis

Author: Abu Sultan Anas Bachir
Abu-Naser Samy S.
Publication date: 01/01/2023

Breast cancer remains a significant health concern worldwide, necessitating the development of effective diagnostic tools. In this study, we employ a neural network-based approach to analyze the Wisconsin Breast Cancer dataset, sourced from Kaggle, comprising 570 samples and 30 features. Our proposed model features six layers (1 input, 1 hidden, 1 output), and through rigorous training and validation, we achieve a remarkable accuracy rate of 99.57% and an average error of 0.000170 as shown in the image below. Furthermore, our investigation identifies the most influential features in breast cancer diagnosis, shedding light on the key determinants of malignancy. Notably, we find that factors such as fractal_dimension_se, symmetry_worst, compactness_worst, symmetry_se, and smoothness_se play pivotal roles in distinguishing between benign and malignant cases. This research contributes to the ongoing efforts to enhance breast cancer diagnosis, providing valuable insights into feature importance and showcasing the potential of neural networks in medical applications. Our findings have implications for improving early detection and treatment strategies, ultimately contributing to improved patient outcomes.

PhilPapers

A Literature Study On Video Retrieval Approaches

Author: S PADMAKALA
Publication venue: International Journal of Innovative Technology and Research
Publication date: 13/09/2019

A detailed survey has been carried out to identify the research articles available in the literature across all categories of video retrieval and to analyze their major contributions and advantages. The literature used to assess the state-of-the-art work on video retrieval follows. A large number of papers have been studied.

International Journal of Innovative Technology and Research (IJITR)

Media aesthetics based multimedia storytelling.

Author: Obrador Espinosa Pere
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011

Since the earliest of times, humans have been interested in recording their life experiences, for future reference and for storytelling purposes. This task of recording experiences --i.e., both image and video capture-- has never before in history been as easy as it is today. This is creating a digital information overload that is becoming a great concern for the people that are trying to preserve their life experiences. As high-resolution digital still and video cameras become increasingly pervasive, unprecedented amounts of multimedia are being downloaded to personal hard drives, and also uploaded to online social networks on a daily basis. The work presented in this dissertation is a contribution in the area of multimedia organization, as well as automatic selection of media for storytelling purposes, which eases the human task of summarizing a collection of images or videos in order to be shared with other people. As opposed to some prior art in this area, we have taken an approach in which neither user generated tags nor comments --that describe the photographs, either in their local or on-line repositories-- are taken into account, and also no user interaction with the algorithms is expected. We take an image analysis approach where both the context images --e.g. images from online social networks to which the image stories are going to be uploaded--, and the collection images --i.e., the collection of images or videos that needs to be summarized into a story--, are analyzed using image processing algorithms. This allows us to extract relevant metadata that can be used in the summarization process. Multimedia-storytellers usually follow three main steps when preparing their stories: first they choose the main story characters, then the main events to describe, and finally, from these media sub-groups, they choose the media based on their relevance to the story as well as on their aesthetic value.
Therefore, one of the main contributions of our work has been the design of computational models --both regression based, as well as classification based-- that correlate well with human perception of the aesthetic value of images and videos. These computational aesthetics models have been integrated into automatic selection algorithms for multimedia storytelling, which are another important contribution of our work. A human centric approach has been used in all experiments where it was feasible, and also in order to assess the final summarization results, i.e., humans are always the final judges of our algorithms, either by inspecting the aesthetic quality of the media, or by inspecting the final story generated by our algorithms. We are aware that a perfect automatically generated story summary is very hard to obtain, given the many subjective factors that play a role in such a creative process; rather, the presented approach should be seen as a first step in the storytelling creative process which removes some of the ground work that would be tedious and time consuming for the user. Overall, the main contributions of this work can be summarized in three: (1) new media aesthetics models for both images and videos that correlate with human perception, (2) new scalable multimedia collection structures that ease the process of media summarization, and finally, (3) new media selection algorithms that are optimized for multimedia storytelling purposes.
Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura