4,435 research outputs found

    Vector representation of Internet domain names using Word embedding techniques

    Get PDF
    Word embeddings is a well-known set of techniques widely used in natural language processing ( NLP ). This thesis explores the use of word embeddings in a new scenario. A vector space model ( VSM) for Internet domain names ( DNS) is created by taking core ideas from NLP techniques and applying them to real anonymized DNS log queries from a large Internet Service Provider ( ISP) . The main goal is to find semantically similar domains only using information of DNS queries without any other knowledge about the content of those domains. A set of transformations through a detailed preprocessing pipeline with eight specific steps is defined to move the original problem to a problem in the NLP field. Once the preprocessing pipeline is applied and the DNS log files are transformed to a standard text corpus, we show that state-of-the-art techniques for word embeddings can be successfully applied in order to build what we called a DNS-VSM (a vector space model for Internet domain names). Different word embeddings techniques are evaluated in this work: Word2Vec (with Skip-Gram and CBOW architectures), App2Vec (with a CBOW architecture and adding time gaps between DNS queries), and FastText (which includes sub-word information). The obtained results are compared using various metrics from Information Retrieval theory and the quality of the learned vectors is validated with a third party source, namely, similar sites service offered by Alexa Internet, Inc2 . Due to intrinsic characteristics of domain names, we found that FastText is the best option for building a vector space model for DNS. Furthermore, its performance (considering the top 3 most similar learned vectors to each domain) is compared against two baseline methods: Random Guessing (returning randomly any domain name from the dataset) and Zero Rule (returning always the same most popular domains), outperforming both of them considerably. The results presented in this work can be useful in many engineering activities, with practical application in many areas. Some examples include websites recommendations based on similar sites, competitive analysis, identification of fraudulent or risky sites, parental-control systems, UX improvements (based on recommendations, spell correction, etc.), click-stream analysis, representation and clustering of users navigation profiles, optimization of cache systems in recursive DNS resolvers (among others). Finally, as a contribution to the research community a set of vectors of the DNS-VSM trained on a similar dataset to the one used in this thesis is released and made available for download through the github page in [1]. With this we hope that further work and research can be done using these vectors.La vectorización de palabras es un conjunto de técnicas bien conocidas y ampliamente usadas en el procesamiento del lenguaje natural ( PLN ). Esta tesis explora el uso de vectorización de palabras en un nuevo escenario. Un modelo de espacio vectorial ( VSM) para nombres de dominios de Internet ( DNS ) es creado tomando ideas fundamentales de PLN, l as cuales son aplicadas a consultas reales anonimizadas de logs de DNS de un gran proveedor de servicios de Internet ( ISP) . El objetivo principal es encontrar dominios relacionados semánticamente solamente usando información de consultas DNS sin ningún otro conocimiento sobre el contenido de esos dominios. Un conjunto de transformaciones a través de un detallado pipeline de preprocesamiento con ocho pasos específicos es definido para llevar el problema original a un problema en el campo de PLN. Una vez aplicado el pipeline de preprocesamiento y los logs de DNS son transformados a un corpus de texto estándar, se muestra que es posible utilizar con éxito técnicas del estado del arte respecto a vectorización de palabras para construir lo que denominamos un DNS-VSM (un modelo de espacio vectorial para nombres de dominio de Internet). Diferentes técnicas de vectorización de palabras son evaluadas en este trabajo: Word2Vec (con arquitectura Skip-Gram y CBOW) , App2Vec (con arquitectura CBOW y agregando intervalos de tiempo entre consultas DNS ), y FastText (incluyendo información a nivel de sub-palabra). Los resultados obtenidos se comparan usando varias métricas de la teoría de Recuperación de Información y la calidad de los vectores aprendidos es validada por una fuente externa, un servicio para obtener sitios similares ofrecido por Alexa Internet, Inc . Debido a características intrínsecas de los nombres de dominio, encontramos que FastText es la mejor opción para construir un modelo de espacio vectorial para DNS . Además, su performance es comparada contra dos métodos de línea base: Random Guessing (devolviendo cualquier nombre de dominio del dataset de forma aleatoria) y Zero Rule (devolviendo siempre los mismos dominios más populares), superando a ambos de manera considerable. Los resultados presentados en este trabajo pueden ser útiles en muchas actividades de ingeniería, con aplicación práctica en muchas áreas. Algunos ejemplos incluyen recomendaciones de sitios web, análisis competitivo, identificación de sitios riesgosos o fraudulentos, sistemas de control parental, mejoras de UX (basada en recomendaciones, corrección ortográfica, etc.), análisis de flujo de clics, representación y clustering de perfiles de navegación de usuarios, optimización de sistemas de cache en resolutores de DNS recursivos (entre otros). Por último, como contribución a la comunidad académica, un conjunto de vectores del DNS-VSM entrenado sobre un juego de datos similar al utilizado en esta tesis es liberado y hecho disponible para descarga a través de la página github en [1]. Con esto esperamos a que más trabajos e investigaciones puedan realizarse usando estos vectores

    Performance assessment of an architecture with adaptative interfaces for people with special needs

    Get PDF
    People in industrial societies carry more and more portable electronic devices (e.g., smartphone or console) with some kind of wireles connectivity support. Interaction with auto-discovered target devices present in the environment (e.g., the air conditioning of a hotel) is not so easy since devices may provide inaccessible user interfaces (e.g., in a foreign language that the user cannot understand). Scalability for multiple concurrent users and response times are still problems in this domain. In this paper, we assess an interoperable architecture, which enables interaction between people with some kind of special need and their environment. The assessment, based on performance patterns and antipatterns, tries to detect performance issues and also tries to enhance the architecture design for improving system performance. As a result of the assessment, the initial design changed substantially. We refactorized the design according to the Fast Path pattern and The Ramp antipattern. Moreover, resources were correctly allocated. Finally, the required response time was fulfilled in all system scenarios. For a specific scenario, response time was reduced from 60 seconds to less than 6 seconds

    Behavior modeling for a beacon-based indoor location system

    Get PDF
    In this work we performed a comparison between two different approaches to track a person in indoor environments using a locating system based on BLE technology with a smartphone and a smartwatch as monitoring devices. To do so, we provide the system architecture we designed and describe how the different elements of the proposed system interact with each other. Moreover, we have evaluated the system’s performance by computing the mean percentage error in the detection of the indoor position. Finally, we present a novel location prediction system based on neural embeddings, and a soft-attention mechanism, which is able to predict user’s next location with 67% accuracy

    Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

    Full text link
    We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network agent. To guide the agent's exploration, we use reward shaping with different forms of supervision. Our approach does not require intermediate representations, planning procedures, or training different models. We evaluate in a simulated environment, and show significant improvements over supervised learning and common reinforcement learning variants.Comment: In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 201

    Detection of spam review on mobile app stores, evaluation of helpfulness of user reviews and extraction of quality aspects using machine learning techniques

    Get PDF
    As mobile devices have overtaken fixed Internet access, mobile applications and distribution platforms have gained in importance. App stores enable users to search and purchase mobile applications and then to give feedback in the form of reviews and ratings. A review might contain critical information about user experience, feature requests and bug reports. User reviews are valuable not only to developers and software organizations interested in learning the opinion of their customers but also to prospective users who would like to find out what others think about an app. Even though some surveys have inventoried techniques and methods in opinion mining and sentiment analysis, no systematic literature review (SLR) study had yet reported on mobile app store opinion mining and spam review detection problems. Mining opinions from app store reviews requires pre-processing at the text and content levels, including filtering-out nonopinionated content and evaluating trustworthiness and genuineness of the reviews. In addition, the relevance of the extracted features are not cross-validated with main software engineering concepts. This research project first conducted a systematic literature review (SLR) on the evaluation of mobile app store opinion mining studies. Next, to fill the identified gaps in the literature, we used a novel convolutional neural network to learn document representation for deceptive spam review detection by characterizing an app store review dataset which includes truthful and spam reviews for the first time in the literature. Our experiments reported that our neural network based method achieved 82.5% accuracy, while a baseline Support Vector Machine (SVM) classification model reached only 70% accuracy despite leveraging various feature combinations. We next compared four classification models to assess app store user review helpfulness and proposed a predictive model which makes use of review meta-data along with structural and lexical features for helpfulness prediction. In the last part of this research study, we constructed an annotated app store review dataset for the aspect extraction task, based on ISO 25010 - Systems and software Product Quality Requirements and Evaluation standard and two deep neural network models: Bi-directional Long-Short Term Memory and Conditional Random Field (Bi-LSTM+CRF) and Deep Convolutional Neural Networks and Conditional Random Field (CNN+CRF) for aspect extraction from app store user reviews. Both models achieved nearly 80% F1 score (the weighted average of precision and recall which takes both false positives and false negatives into account) in exact aspect matching and 86% F1 score in partial aspect matching

    Recommendation & mobile systems - a state of the art for tourism

    Get PDF
    Recommendation systems have been growing in number over the last fifteen years. To evolve and adapt to the demands of the actual society, many paradigms emerged giving birth to even more paradigms and hybrid approaches. These approaches contain strengths and weaknesses that need to be evaluated according to the knowledge area in which the system is going to be implemented. Mobile devices have also been under an incredible growth rate in every business area, and there are already lots of mobile based systems to assist tourists. This explosive growth gave birth to different mobile applications, each having their own advantages and disadvantages. Since recommendation and mobile systems might as well be integrated, this work intends to present the current state of the art in tourism mobile and recommendation systems, as well as to state their advantages and disadvantages

    Application of Artificial Intelligence and Data Science in Detecting the Impact of Usability from Evaluation of Mobile Health Applications

    Get PDF
    Mobile health (mHealth) applications have demonstrated immense potential for facilitating preventative care and disease management through intuitive platforms. However, realizing transformational health objectives relies on creating accessible tools optimized for different users. This research analyses mHealth app usability data sourced from online repositories to reveal the impact of usability (ease of use) from evaluating mobile health applications. Thoroughly examining interfaces with a utilization of statistical tests of significance, platform, integra-tions, and various application features shows complex relationships between usability and users experience. This work shows that applying random forest models can accurately classify the ease-of-use of mHealth applications. This work sheds light on the connections between design choices and their effects, guiding intentional improvements to expand the reach of mHealth. It does so by providing insights into the subtle ways that people interact with mHealth applications. The methodologies and findings provide actionable insights for developers and practitioners passionate about advancing digital healthcare

    Application of a Layered Hidden Markov Model in the Detection of Network Attacks

    Get PDF
    Network-based attacks against computer systems are a common and increasing problem. Attackers continue to increase the sophistication and complexity of their attacks with the goal of removing sensitive data or disrupting operations. Attack detection technology works very well for the detection of known attacks using a signature-based intrusion detection system. However, attackers can utilize attacks that are undetectable to those signature-based systems whether they are truly new attacks or modified versions of known attacks. Anomaly-based intrusion detection systems approach the problem of attack detection by detecting when traffic differs from a learned baseline. In the case of this research, the focus was on a relatively new area known as payload anomaly detection. In payload anomaly detection, the system focuses exclusively on the payload of packets and learns the normal contents of those payloads. When a payload\u27s contents differ from the norm, an anomaly is detected and may be a potential attack. A risk with anomaly-based detection mechanisms is they suffer from high false positive rates which reduce their effectiveness. This research built upon previous research in payload anomaly detection by combining multiple techniques of detection in a layered approach. The layers of the system included a high-level navigation layer, a request payload analysis layer, and a request-response analysis layer. The system was tested using the test data provided by some earlier payload anomaly detection systems as well as new data sets. The results of the experiments showed that by combining these layers of detection into a single system, there were higher detection rates and lower false positive rates

    Responsive and Minimalist App Based on Explainable AI to Assess Palliative Care Needs during Bedside Consultations on Older Patients

    Full text link
    [EN] Palliative care is an alternative to standard care for gravely ill patients that has demonstrated many clinical benefits in cost-effective interventions. It is expected to grow in demand soon, so it is necessary to detect those patients who may benefit from these programs using a personalised objective criterion at the correct time. Our goal was to develop a responsive and minimalist web application embedding a 1-year mortality explainable predictive model to assess palliative care at bedside consultation. A 1-year mortality predictive model has been trained. We ranked the input variables and evaluated models with an increasing number of variables. We selected the model with the seven most relevant variables. Finally, we created a responsive, minimalist and explainable app to support bedside decision making for older palliative care. The selected variables are age, medication, Charlson, Barthel, urea, RDW-SD and metastatic tumour. The predictive model achieved an AUC ROC of 0.83 [CI: 0.82, 0.84]. A Shapley value graph was used for explainability. The app allows identifying patients in need of palliative care using the bad prognosis criterion, which can be a useful, easy and quick tool to support healthcare professionals in obtaining a fast recommendation in order to allocate health resources efficiently.This work was supported by the InAdvance project (H2020-SC1-BHC-2018-2020 grant agreement number 825750.) and the CANCERLEss project (H2020-SC1-2020-Single-Stage-RTD grant agreement number 965351), both funded by the European Union's Horizon 2020 research and innovation programme.Blanes-Selva, V.; Doñate-Martínez, A.; Linklater, G.; Garcés-Ferrer, J.; Garcia-Gomez, JM. (2021). Responsive and Minimalist App Based on Explainable AI to Assess Palliative Care Needs during Bedside Consultations on Older Patients. Sustainability. 13(17):1-11. https://doi.org/10.3390/su13179844111131
    corecore