1,094 research outputs found

    Secure Identification in Social Wireless Networks

    Applications based on social networking have revolutionized social life and continue to gain popularity among Internet users. Owing to the advanced computational resources offered by innovative hardware and the nominal subscriber charges of network operators, most online social networks are moving into the mobile domain, offering applications and games designed exclusively for users on the go. Moreover, mobile devices are considered more personal than their desktop rivals, so mobile users tend to store sensitive data such as contacts, passwords, bank account details, calendar entries with key dates, and personal notes on their devices. The project Social Wireless Network Secure Identification (SWIN) is carried out at the Swedish Institute of Computer Science (SICS) to explore the practicality of providing a secure mobile social networking portal with advanced security features that tackle potential security threats by extending existing methods with more innovative security technologies. In addition to an extensive background study and the determination of marketable use cases with their corresponding security requirements, this thesis proposes a secure identification design that satisfies the security dimensions for both online and offline peers. We have implemented an initial prototype using PHP sockets and the OpenSSL library to simulate the secure identification procedure based on the proposed design. The design complies with 3GPP's Generic Authentication Architecture (GAA), and our implementation has demonstrated that the solution is flexible enough to be applied independently to applications requiring secure identification. Finally, the thesis provides a strong foundation for a more advanced implementation on a mobile platform in the future.

    Unsupervised Biomedical Named Entity Recognition

    Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine-learning-based systems have been the most successful on the NER task; however, they require correct annotations in large quantities for training. Annotating text manually is very labor intensive and also requires domain expertise. The purpose of this research is to reduce the human annotation effort and the cost of annotation for building NER systems in the biomedical domain. The method developed in this work leverages resources like UMLS (Unified Medical Language System), which contains a list of biomedical entities, together with a large unannotated corpus to build an unsupervised NER system that does not require any manual annotations. The method has two phases. In the first phase, a biomedical corpus is automatically annotated with some named entities using UMLS through unambiguous exact matching; we call the result weakly-labeled data. In this data, positive examples are the entities in the text that exactly match an entry in UMLS and have only one semantic type, which belongs to the desired entity class to be extracted (for example, diseases and disorders). Negative examples are the entities in the text that exactly match an entry in UMLS but are of semantic types other than those that belong to the desired entity class. These examples are then used to train a machine learning classifier using features that represent the contexts in which they appear in the text. The trained classifier is applied back to the text to gather more examples iteratively through self-training. The trained classifier is then capable of classifying mentions in unseen text as belonging to the desired entity class or not, based on the contexts in which they appear.
Although the trained named entity detector is good at detecting the presence of entities of the desired class in text, it cannot determine their correct boundaries. In the second phase of our method, called “Boundary Expansion”, the correct boundaries of the entities are determined. This phase is based on a novel idea that utilizes machine learning and UMLS. Training examples for boundary expansion are gathered directly from UMLS and do not require any manual annotations. We also developed a new WordNet-based approach for boundary expansion. Our method was evaluated on three datasets: the SemEval 2014 Task 7 dataset, which has diseases and disorders as the desired entity class; the GENIA dataset, which has proteins, DNAs, RNAs, cell types, and cell lines as the desired entity classes; and the i2b2 dataset, which has problems, tests, and treatments as the desired entity classes. Our method obtained performance close to that of supervised methods on the SemEval dataset. On the other datasets, it outperformed an existing unsupervised method on most entity classes. The only requirements for our method to work well are a list of entity names with their semantic types and a large unannotated corpus. Given these, our method generalizes across different types of entities and different types of biomedical text. Being unsupervised, the method can be easily applied to new NER tasks without needing costly annotations.
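The weak-labeling rule of the first phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the lexicon `TOY_LEXICON` is a hypothetical three-entry stand-in for UMLS, the function name `weak_label` is invented here, matching is a greedy longest-match over lowercased tokens, and terms whose semantic types include the target class plus other types are treated as ambiguous and skipped.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon standing in for UMLS: term -> set of semantic types.
TOY_LEXICON: Dict[str, Set[str]] = {
    "diabetes": {"Disease or Syndrome"},
    "aspirin": {"Pharmacologic Substance"},
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},
}

# Semantic types that make up the desired entity class (assumed for this sketch).
TARGET_TYPES: Set[str] = {"Disease or Syndrome"}


def weak_label(
    tokens: List[str],
    lexicon: Dict[str, Set[str]],
    target_types: Set[str],
    max_len: int = 3,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans found by exact lexicon matching.

    positive: the term has exactly one semantic type and it is a target type.
    negative: none of the term's semantic types is a target type.
    Anything else is ambiguous and produces no example.
    """
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        step = 1
        # Try the longest candidate phrase first (an assumption of this sketch).
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            types = lexicon.get(phrase)
            if types is None:
                continue
            if len(types) == 1 and types <= target_types:
                spans.append((i, i + n, "positive"))
            elif not (types & target_types):
                spans.append((i, i + n, "negative"))
            step = n  # skip past the matched phrase, labeled or not
            break
        i += step
    return spans


sentence = "Patient with diabetes took aspirin for a cold".split()
examples = weak_label(sentence, TOY_LEXICON, TARGET_TYPES)
```

Here "diabetes" yields a positive example, "aspirin" a negative one, and "cold" is skipped because its toy entry carries two semantic types. The resulting spans would then feed the context-based classifier and the self-training loop described in the abstract.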

    Extracting Web Information using Representation Patterns

    Feeding decision support systems with Web information typically requires sifting through an unwieldy amount of information that is available in human-friendly formats only. Our focus is on a proposal that extracts information from semi-structured documents in a structured format, with an emphasis on being scalable and open. By semi-structured, we mean that it must focus on information that is rendered using regular formats, not free text; by scalable, we mean that the system must require a minimum amount of human intervention and must not be targeted at extracting information from a particular domain or web site; by open, we mean that it must extract as much useful information as possible and not be subject to any pre-defined data model. In the literature, there is only one open proposal, and it is not scalable, since it requires human supervision on a per-domain basis. In this paper, we present a new proposal that relies on a number of heuristics to identify patterns that are typically used to represent the information in a web document. Our experimental results confirm that our proposal is very competitive in terms of effectiveness and efficiency. Funding: Ministerio de Economía y Competitividad TIN2016-75394-R; Ministerio de Economía y Competitividad TIN2013-40848-

    Privacy considerations for secure identification in social wireless networks

    This thesis focuses on privacy aspects of identification and key exchange schemes for mobile social networks. In particular, we consider identification schemes that combine wide-area mobile communication with short-range communication such as Bluetooth or WiFi. The goal of the thesis is to identify possible security threats to users' personal information and to define a framework of security and privacy requirements in the context of mobile social networking. The main focus of the work is on security in closed groups and on the procedures for secure registration, identification, and invitation of users in mobile social networks. The thesis includes an evaluation of the proposed identification and key exchange schemes and proposes a series of modifications that augment their privacy-preserving capabilities. The final design provides secure and effective identity management that protects the privacy of user identities in mobile social networks.

    APFA: Automated Product Feature Alignment for Duplicate Detection

    To keep up with the growing interest in using Web shops for product comparison, we have developed a method that targets the problem of product duplicate detection. If duplicates can be discovered correctly and quickly, customers can compare products efficiently. We build upon the state-of-the-art Multi-component Similarity Method (MSM) for product duplicate detection by developing an automated pre-processing phase that occurs before the similarities between products are calculated. Specifically, in this prior phase the features of products are aligned between Web shops, using metrics such as the data type, coverage, and diversity of each key, as well as the distribution and measurement units of their corresponding values. With this information, the values of these keys can be employed more meaningfully and efficiently when comparing products. Applying our method to a real-world dataset of 1629 TVs across 4 Web shops, we find that we increase the speed of the product similarity phase by roughly a factor of 3, owing to fewer meaningless comparisons, an improved brand analyzer, and a renewed title analyzer. Moreover, in terms of the quality of duplicate detection, we significantly outperform MSM with an F1-measure of 0.746 versus 0.525.
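As a rough illustration of the per-key statistics mentioned above, the following Python sketch computes coverage and diversity for one key within a single Web shop. The abstract does not define these metrics, so the definitions used here are assumptions: coverage is taken as the fraction of products that carry the key, and diversity as the ratio of distinct values to all values of that key. The function name `key_stats` and the sample data are hypothetical.

```python
from typing import Dict, List


def key_stats(products: List[Dict[str, str]], key: str) -> Dict[str, float]:
    """Compute assumed coverage and diversity of `key` over one shop's products."""
    values = [p[key] for p in products if key in p]
    # Assumed definition: share of products that list this key at all.
    coverage = len(values) / len(products) if products else 0.0
    # Assumed definition: distinct values divided by total values for the key.
    diversity = len(set(values)) / len(values) if values else 0.0
    return {"coverage": coverage, "diversity": diversity}


# Hypothetical product records from a single shop.
shop = [
    {"Brand": "Acme", "Screen Size": '40"'},
    {"Brand": "Acme", "Screen Size": '55"'},
    {"Brand": "Globex"},
    {"Brand": "Acme", "Screen Size": '40"'},
]

screen_stats = key_stats(shop, "Screen Size")
brand_stats = key_stats(shop, "Brand")
```

Under these assumed definitions, keys from two shops whose statistics are similar would be candidates for alignment before any product-level similarity is computed; how the method actually combines such metrics is not stated in the abstract.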