356 research outputs found

    Website summarization: a topic hierarchy based approach.

    Liu Nan. Thesis (M.Phil.), Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 84-88). Abstracts in English and Chinese.
    Table of contents:
        Abstract
        Acknowledgements
        Contents
        List of Figures
        List of Tables
        Chapter 1. Introduction
        Chapter 2. Related Work
            2.1 Web Structure Mining
                2.1.1 HITS Algorithm
                2.1.2 PageRank Algorithm
            2.2 Website Mining
                2.2.1 Website Classification
                2.2.2 Web Unit Mining
                2.2.3 Logical Domain Extraction
                2.2.4 Web Thesaurus Construction
        Chapter 3. Website Topic Hierarchy Generation
            3.1 Problem Definition
            3.2 Graph Based Algorithms
                3.2.1 Breadth First Search
                3.2.2 Shortest Path Search
                3.2.3 Minimum Directed Spanning Tree
                3.2.4 Discussion
            3.3 Edge Weight Function
                3.3.1 Relevance Method
                3.3.2 Machine Learning Method
            3.4 Experiments
                3.4.1 Data Preparation
                3.4.2 Performances of Breadth-first Search
                3.4.3 Performances of Shortest-path Search
                3.4.4 Performances of Directed Minimum Spanning Tree
                3.4.5 Comparison of Different Algorithms
        Chapter 4. Website Summarization Through Keyphrase Extraction
            4.1 Introduction
            4.2 Background
            4.3 Keyphrase Extraction
                4.3.1 Candidate Phrases Identification
                4.3.2 Feature Calculation without Topic Hierarchy
                4.3.3 Feature Calculation with Topic Hierarchy
                4.3.4 Extraction of Keyphrases
            4.4 Experiments
        Chapter 5. Conclusion and Future Work
        References

    mARC: Memory by Association and Reinforcement of Contexts

    This paper introduces Memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose, incremental, and unsupervised data storage and retrieval system that can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information classification and retrieval problems, such as e-Discovery or contextual navigation. It can also be formulated in the artificial life framework, a.k.a. Conway's "Game of Life" theory. In contrast to Conway's approach, the objects evolve in a massively multidimensional space. To start evaluating the potential of mARC, we have built a mARC-based Internet search engine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search in terms of both performance and relevance. In the study, we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries.

    Multilayer Complex Network Descriptors for Color-Texture Characterization

    A new method based on complex networks is proposed for color-texture analysis. The proposal consists of modeling the image as a multilayer complex network where each color channel is a layer, and each pixel (in each color channel) is represented as a network vertex. The network dynamic evolution is accessed using a set of modeling parameters (radii and thresholds), and new characterization techniques are introduced to capture information regarding within- and between-color-channel spatial interaction. An automatic and adaptive approach for threshold selection is also proposed. We conduct classification experiments on 5 well-known datasets: Vistex, Usptex, Outex13, CURet and MBT. We compare results against various methods from the literature, including deep convolutional neural networks with pre-trained architectures. The proposed method achieved the highest overall performance over the 5 datasets, with a mean accuracy of 97.7 against 97.0 achieved by the ResNet convolutional neural network with 50 layers. Comment: 20 pages, 7 figures and 4 tables
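As a rough illustration of the modeling idea in the abstract above, the sketch below treats every (pixel, channel) pair as a vertex, links vertices whose pixels lie within a given radius, and keeps a link only when the normalized intensity difference is at or below a threshold. The function name `multilayer_degrees`, the weight formula, and the split into within-channel and between-channel degree are assumptions made for this example; the abstract does not specify them, so this is not the authors' exact descriptor.

```python
from itertools import product
from math import hypot


def multilayer_degrees(image, radius, threshold, max_level=255):
    """Hypothetical sketch of a multilayer pixel network.

    image[y][x] is a tuple of channel intensities. Returns a dict mapping
    each vertex (x, y, channel) to (within_degree, between_degree).
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    reach = int(radius)
    degrees = {}
    for y, x, c in product(range(height), range(width), range(channels)):
        within = between = 0
        for dy, dx in product(range(-reach, reach + 1), repeat=2):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < height and 0 <= nx < width):
                continue  # neighbor falls outside the image
            if hypot(dx, dy) > radius:
                continue  # neighbor is outside the spatial radius
            for nc in range(channels):
                if (dy, dx) == (0, 0) and nc == c:
                    continue  # a vertex is not its own neighbor
                # Assumed edge weight: normalized intensity difference.
                weight = abs(image[y][x][c] - image[ny][nx][nc]) / max_level
                if weight <= threshold:
                    if nc == c:
                        within += 1   # edge inside one color layer
                    else:
                        between += 1  # edge linking two color layers
        degrees[(x, y, c)] = (within, between)
    return degrees


# Tiny 1x2 image with two channels per pixel.
tiny = [[(0, 0), (255, 0)]]
result = multilayer_degrees(tiny, radius=1, threshold=0.5)
```

Sweeping `radius` and `threshold` over several values and summarizing the resulting degree distributions would give one feature vector per image, which is the general shape of a descriptor of this kind.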

    Questions of science: chatting with ChatGPT about complex systems

    We present an overview of the complex systems field using ChatGPT as a representation of the community's understanding. ChatGPT has learned language patterns and styles from a large dataset of internet texts, allowing it to provide answers that reflect common opinions, ideas, and language patterns found in the community. Our exploration covers teaching, learning, and research topics. We recognize the value of ChatGPT as a source for the community's ideas. Comment: This is a work in progress

    Predictive Modeling of Breast Cancer Diagnosis Using Neural Networks: A Kaggle Dataset Analysis

    Breast cancer remains a significant health concern worldwide, necessitating the development of effective diagnostic tools. In this study, we employ a neural network-based approach to analyze the Wisconsin Breast Cancer dataset, sourced from Kaggle, comprising 570 samples and 30 features. Our proposed model features six layers (1 input, 1 hidden, 1 output), and through rigorous training and validation, we achieve an accuracy rate of 99.57% and an average error of 0.000170 as shown in the image below. Furthermore, our investigation identifies the most influential features in breast cancer diagnosis, shedding light on the key determinants of malignancy. Notably, we find that factors such as fractal_dimension_se, symmetry_worst, compactness_worst, symmetry_se, and smoothness_se play pivotal roles in distinguishing between benign and malignant cases. This research contributes to the ongoing efforts to enhance breast cancer diagnosis, providing insights into feature importance and showcasing the potential of neural networks in medical applications. Our findings have implications for improving early detection and treatment strategies, ultimately contributing to improved patient outcomes.

    A Literature Study On Video Retrieval Approaches

    A detailed survey has been carried out to identify the research articles available in the literature across all categories of video retrieval and to analyze their major contributions and advantages. The following literature was used to assess the state of the art in video retrieval. Here, a large number of papers have been studied.

    Media aesthetics based multimedia storytelling.

    Since the earliest of times, humans have been interested in recording their life experiences, for future reference and for storytelling purposes. This task of recording experiences --i.e., both image and video capture-- has never before in history been as easy as it is today. This is creating a digital information overload that is becoming a great concern for people who are trying to preserve their life experiences. As high-resolution digital still and video cameras become increasingly pervasive, unprecedented amounts of multimedia are being downloaded to personal hard drives, and also uploaded to online social networks, on a daily basis. The work presented in this dissertation is a contribution in the area of multimedia organization, as well as automatic selection of media for storytelling purposes, which eases the human task of summarizing a collection of images or videos in order to be shared with other people. As opposed to some prior art in this area, we have taken an approach in which neither user generated tags nor comments --that describe the photographs, either in their local or on-line repositories-- are taken into account, and no user interaction with the algorithms is expected. We take an image analysis approach where both the context images --e.g. images from online social networks to which the image stories are going to be uploaded--, and the collection images --i.e., the collection of images or videos that needs to be summarized into a story--, are analyzed using image processing algorithms. This allows us to extract relevant metadata that can be used in the summarization process. Multimedia-storytellers usually follow three main steps when preparing their stories: first they choose the main story characters, then the main events to describe, and finally, from these media sub-groups, they choose the media based on their relevance to the story as well as on their aesthetic value.
Therefore, one of the main contributions of our work has been the design of computational models --both regression based, as well as classification based-- that correlate well with human perception of the aesthetic value of images and videos. These computational aesthetics models have been integrated into automatic selection algorithms for multimedia storytelling, which are another important contribution of our work. A human centric approach has been used in all experiments where it was feasible, and also in order to assess the final summarization results, i.e., humans are always the final judges of our algorithms, either by inspecting the aesthetic quality of the media, or by inspecting the final story generated by our algorithms. We are aware that a perfect automatically generated story summary is very hard to obtain, given the many subjective factors that play a role in such a creative process; rather, the presented approach should be seen as a first step in the storytelling creative process which removes some of the groundwork that would be tedious and time consuming for the user. Overall, the main contributions of this work can be summarized in three points: (1) new media aesthetics models for both images and videos that correlate with human perception, (2) new scalable multimedia collection structures that ease the process of media summarization, and finally, (3) new media selection algorithms that are optimized for multimedia storytelling purposes. Postprint (published version)