20 research outputs found

    Eigenvector space model to capture features of documents

    Eigenvectors are a special set of vectors associated with a linear system of equations. Because of their special properties, eigenvectors are widely used in computer vision. When eigenvector analysis is applied to information retrieval, it can reveal properties of a document corpus. To capture the properties of a given set of documents, this paper conducts simple experiments showing that eigenvectors can also be used for document analysis. For the experiments, we use the short abstracts of Wikipedia articles provided by DBpedia as the document corpus. To build the original square matrix, the most popular weighting method, tf-idf, is used. After the eigenvectors of the original matrix are calculated, each vector is plotted in a 3D graph to explore what eigenvectors mean in document processing.
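    The pipeline the abstract describes — tf-idf weighting, a derived square matrix, then eigen-decomposition — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy corpus is invented, and forming the document-document similarity matrix `tfidf @ tfidf.T` is one assumed way to obtain the square matrix the abstract mentions.

```python
import numpy as np

# Toy corpus standing in for the DBpedia short abstracts (hypothetical data).
docs = [
    "eigenvector linear system equations",
    "eigenvector computer vision",
    "information retrieval document corpus",
    "document corpus tf idf matrix",
]

# Vocabulary and raw term-frequency matrix (documents x terms).
vocab = sorted({w for d in docs for w in d.split()})
tf = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Standard tf-idf weighting: idf = log(N / document frequency).
df = (tf > 0).sum(axis=0)
tfidf = tf * np.log(len(docs) / df)

# One way to obtain a square matrix from the rectangular tf-idf matrix
# is the document-document similarity (Gram) matrix.
sim = tfidf @ tfidf.T

# Eigen-decomposition of the symmetric similarity matrix; the leading
# eigenvectors could then be plotted (e.g., the top three axes for a 3D view).
eigvals, eigvecs = np.linalg.eigh(sim)
```

    Because `sim` is a Gram matrix, its eigenvalues are non-negative, and the eigenvectors give orthogonal "document space" axes of the kind the paper visualizes.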

    Automatic Image Annotation Using Semantic Text Analysis

    Part 2: Workshop. This paper proposes a method to find annotations for given CNN news documents in order to detect terrorism-related images or context information. Assigning keywords or annotations to images is an important task in enabling machines to understand web data written by humans. Many techniques for automatic image annotation have been suggested in recent years, and much research has focused on extracting candidate annotations from low-level image features. This basic, traditional approach has the limitation of being computationally expensive. To overcome this problem, we analyze images together with their co-occurring text data to generate candidate annotations. The text in a news document describes the core of the news story in accordance with the accompanying images and title. For this reason, this paper uses text data as a resource for assigning image annotations, combining TF (term frequency) values with WUP similarity values from WordNet. The proposed method shows that text analysis is another viable technique for annotating images automatically and for detecting unintended web documents.
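    The TF-plus-WUP scoring idea can be sketched as below. This is a self-contained toy, not the paper's system: the four-word taxonomy stands in for WordNet's hypernym hierarchy, the seed concept "weapon" stands in for the terrorism context, and the combination rule (TF weighted by WUP similarity) is an assumed reading of the abstract.

```python
from collections import Counter

# Toy taxonomy (hypothetical stand-in for WordNet hypernym paths).
# Each word maps to its path from the root, root first.
paths = {
    "weapon":   ["entity", "object", "weapon"],
    "gun":      ["entity", "object", "weapon", "gun"],
    "rifle":    ["entity", "object", "weapon", "rifle"],
    "building": ["entity", "object", "building"],
}

def wup(a, b):
    """Wu-Palmer-style similarity: 2*depth(LCS) / (depth(a) + depth(b))."""
    pa, pb = paths[a], paths[b]
    lcs = sum(1 for x, y in zip(pa, pb) if x == y)  # depth of common ancestor
    return 2.0 * lcs / (len(pa) + len(pb))

def score_annotations(news_text, candidates, seed="weapon"):
    """Rank candidate annotations by term frequency in the news text,
    weighted by WUP similarity to a seed concept."""
    tf = Counter(news_text.lower().split())
    return sorted(candidates, key=lambda w: tf[w] * wup(w, seed), reverse=True)

ranked = score_annotations(
    "the rifle and the gun were found near the building rifle",
    ["gun", "rifle", "building"])
# "rifle" ranks first: highest TF and high WUP similarity to "weapon".
```

    In the real system, `wup` would be replaced by WordNet's Wu-Palmer similarity over actual synsets, and the news text would come from the CNN documents.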

    Text Analysis for Monitoring Personal Information Leakage on Twitter

    Social networking services (SNSs) such as Twitter and Facebook can be considered new forms of media. Information spreads much faster through social media than through traditional news media because people can upload information without time or location constraints. For this reason, people have embraced SNSs and allowed them to become an integral part of their everyday lives. People express their emotional status to let others know how they feel about certain information or events. However, they are likely not only to share information with others but also to unintentionally expose personal information such as their place of residence, phone number, and date of birth. If such information reaches users with inappropriate intentions, there may be serious consequences such as online and offline stalking. To prevent information leakage and detect spam, many researchers have monitored e-mail systems and web blogs. This paper considers text messages on Twitter, one of the most popular SNSs in the world, to reveal hidden patterns using several coefficient approaches. It focuses on users who exchange Tweets and examines the types of information they reciprocate, by monitoring a sample of 50 million Tweets collected by Stanford University in November 2009. We selected active Twitter users based on a "happy birthday" rule, detected information related to their place of residence and personal names using the proposed coefficient method, and compared the results with those of other coefficient approaches. From this research, we conclude that the proposed coefficient method can detect and recommend standard English words for non-standard words under certain conditions. Ultimately, compared with using only a standard word-matching method, we detected 88,882 (24.287%) more name-related Tweets and 14,054 (3.84%) more location-related Tweets.
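    A "coefficient approach" for mapping non-standard Twitter spellings to standard words can be sketched with a character-bigram Dice coefficient. This is one common string-similarity coefficient chosen for illustration; the paper's own coefficient, lexicon, and threshold are not specified here, so all three below are assumptions.

```python
def bigrams(w):
    """Set of character bigrams of a word."""
    return {w[i:i + 2] for i in range(len(w) - 1)}

def dice(a, b):
    """Dice coefficient over character bigrams: 2|A∩B| / (|A|+|B|)."""
    A, B = bigrams(a), bigrams(b)
    return 2 * len(A & B) / (len(A) + len(B)) if A and B else 0.0

def normalize(token, lexicon, threshold=0.5):
    """Map a non-standard token to its closest standard word,
    or return it unchanged if nothing clears the threshold."""
    best = max(lexicon, key=lambda w: dice(token, w))
    return best if dice(token, best) >= threshold else token

# Hypothetical mini-lexicon of standard words.
lexicon = ["birthday", "happy", "london"]
normalize("birthdayyy", lexicon)  # elongated spelling maps to "birthday"
```

    Once non-standard tokens are normalized this way, name and location mentions that plain standard-word matching would miss become detectable, which is the effect the reported 24.287% and 3.84% gains quantify.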

    Least Slack Time Rate first: an Efficient Scheduling Algorithm for Pervasive Computing Environment

    Real-time systems such as pervasive computing systems must complete each task within a predetermined time while ensuring that the execution results are logically correct. Such systems require intelligent scheduling methods that can promptly and adequately distribute the given tasks to a processor or processors. In this paper, we propose LSTR (Least Slack Time Rate first), a new and simple scheduling algorithm for multi-processor environments, and demonstrate its efficient performance through various tests.
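    A dispatch rule in the spirit of the name can be sketched as follows. The abstract does not give the priority formula, so the definition below — slack as a fraction of the time remaining until the deadline — is only one plausible reading of "slack time rate", and the task fields are invented for illustration.

```python
def slack_rate(task, now):
    """Slack-time rate: remaining slack divided by time to deadline.
    (Assumed formula; the paper's exact definition may differ.)"""
    time_left = task["deadline"] - now
    slack = time_left - task["remaining"]
    return slack / time_left

def pick_next(tasks, now):
    """Dispatch the ready task with the least slack-time rate:
    the task with proportionally the least scheduling headroom."""
    return min(tasks, key=lambda t: slack_rate(t, now))

tasks = [
    {"name": "A", "deadline": 10, "remaining": 3},  # rate (10-3)/10 = 0.7
    {"name": "B", "deadline": 6,  "remaining": 4},  # rate (6-4)/6  ≈ 0.33
]
pick_next(tasks, 0)  # task B is more urgent despite having more slack... no:
# B has less slack *relative to its deadline*, so it is dispatched first
```

    In a multi-processor setting, each idle processor would repeatedly call `pick_next` on the ready queue; a pure least-slack rule would compare absolute slack (7 vs. 2) instead of the rate.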

    A Study on Webtoon Generation Using CLIP and Diffusion Models

    This study focuses on harnessing deep-learning-based text-to-image techniques to support webtoon creators' creative output. We converted publicly available datasets (e.g., MSCOCO) into a multimodal webtoon dataset using CartoonGAN. First, the dataset was used to train contrastive language-image pre-training (CLIP), a model composed of a multilingual BERT and a Vision Transformer that learns to associate text with images. Second, a pre-trained diffusion model was employed to generate webtoons from text and text-similar image input. The webtoon dataset comprised treatments (i.e., textual descriptions) paired with their corresponding webtoon illustrations. Through contrastive learning, CLIP extracted features from the different data modalities, pulling similar data closer together in a shared feature space while pushing dissimilar data apart, and thereby learned the relationships between the modalities in the multimodal data. To generate a webtoon with the diffusion model, the CLIP features of the desired webtoon's text, together with those of the most text-similar image, were provided to the pre-trained diffusion model. Experiments were conducted with both single-text and continuous-text inputs; with continuous-text inputs, the generated webtoons achieved an inception score of 7.14. The text-to-image technology developed here could streamline the webtoon creation process by enabling artists to generate webtoons efficiently from the provided text. However, the current approach cannot generate webtoons from multiple sentences or images while maintaining a consistent artistic style. Further research is therefore needed to develop a text-to-image model that can handle multi-sentence and multilingual input while ensuring stylistic coherence across the generated webtoon images.
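    The CLIP retrieval step — embedding treatments and panels into a shared space, then finding the most text-similar image — can be sketched with NumPy. Random unit vectors stand in for the real multilingual-BERT and Vision-Transformer encoders, and the paired image features are simulated as noisy copies of the text features; everything here is a hypothetical stand-in for the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(x):
    """L2-normalize rows so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical embeddings: 4 treatments (text) and their 4 webtoon panels.
# In the paper these come from multilingual BERT and a ViT inside CLIP.
text_feats = unit(rng.normal(size=(4, 32)))
image_feats = unit(text_feats + 0.05 * rng.normal(size=(4, 32)))

# Contrastive alignment pulls each pair together, so retrieval reduces to
# a cosine-similarity argmax: pick the most text-similar image per treatment.
sims = text_feats @ image_feats.T
best = sims.argmax(axis=1)

# The retrieved image's features, together with the text features, would then
# condition the pre-trained diffusion model to generate the webtoon panel.
```

    With a trained CLIP, the same argmax over cosine similarities selects the "most text-similar image" that the abstract feeds to the diffusion model alongside the text features.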

    Data Independent Acquisition Based Bi-Directional Deep Networks for Biometric ECG Authentication

    This report examines non-fiducial approaches to electrocardiogram (ECG) biometric authentication and proposes several techniques, with comparative experiments to evaluate the best approach for each classification task. Non-fiducial methods are designed to extract the discriminative information of a signal without annotating fiducial points. However, this process still requires peak detection to identify heartbeat signals: recent studies usually rely on heartbeat segmentation, which requires QRS detection, and this can be complicated for ECG signals in which the QRS complex is absent. Thus, many studies conduct biometric authentication only on ECG signals with QRS complexes and are hindered by this limitation. To overcome this issue, we propose a data-independent acquisition method that enables highly generalizable signal processing and feature learning. This is achieved by random segmentation, which avoids complicated fiducial feature extraction, combined with auto-correlation, which eliminates the phase differences introduced by random segmentation. Subsequently, a bidirectional recurrent neural network with long short-term memory (BLSTM) is used to automatically learn features from the signal and to perform the authentication task. The experimental results suggest that the proposed data-independent approach using a BLSTM network achieves relatively high classification accuracy on every dataset compared with the other techniques, and a significantly higher accuracy in experiments on ECG signals without the QRS complex.
    The results also reveal that data-dependent methods perform well only for specific data types and variations, whereas the presented approach may generalize to other quasi-periodic biometric signal classification tasks in future studies.
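    The preprocessing pair the abstract describes — random segmentation followed by auto-correlation to cancel the resulting phase offsets — can be sketched as below. A pure sine wave stands in for a real ECG recording, and the window length and segment count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_segment(signal, length, n_segments):
    """Random segmentation: crop windows at arbitrary offsets,
    avoiding any fiducial (e.g., QRS) detection."""
    starts = rng.integers(0, len(signal) - length, size=n_segments)
    return np.stack([signal[s:s + length] for s in starts])

def autocorr(seg):
    """Normalized autocorrelation: shifted crops of a periodic signal
    map to (nearly) the same autocorrelation shape, removing the
    phase difference introduced by random segmentation."""
    ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    return ac / ac[0]

# Synthetic quasi-periodic signal (hypothetical stand-in for an ECG,
# period 100 samples).
t = np.arange(2000)
signal = np.sin(2 * np.pi * t / 100)

segments = random_segment(signal, length=400, n_segments=5)
features = np.array([autocorr(s) for s in segments])
# Each row of `features` is a phase-invariant sequence that would be fed
# to the BLSTM for authentication.
```

    Because the crops start at random phases, the raw segments differ, but their normalized autocorrelations agree at full-period lags — which is exactly why this step makes the downstream BLSTM features data-independent.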