9 research outputs found

    AI Deep Learning with Convolutional Neural Networks on Google Cloud Platform

    Get PDF
    Big data can help established firms drastically transform themselves, bring forth whole new industries, and enable companies of any size to innovate, gain competitive advantage, and enhance business performance. The question, however, is how the valuable information hidden in vast volumes of data can be used to solve difficult practical problems such as computer vision and natural language processing, to name a few. Deep learning, a subset of machine learning based on the concepts of artificial neural networks, offers powerful methods for solving these problems.
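
    As a concrete illustration of the kind of model the abstract refers to, below is a minimal convolutional neural network sketch in Python, using the Keras API bundled with TensorFlow. The input shape, layer sizes, and ten-class output are illustrative assumptions rather than the architecture used in the work; such a model could be trained locally or on a managed cloud service.

```python
# A minimal CNN sketch (assumed architecture, not the paper's model).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),               # e.g. 28x28 grayscale images
    layers.Conv2D(32, (3, 3), activation="relu"),  # learn local visual features
    layers.MaxPooling2D((2, 2)),                   # downsample feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),        # assumed 10-class output
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5)            # data loading omitted
```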

    Do people communicate about their whereabouts? Investigating the relation between user-generated text messages and Foursquare check-in places

    Get PDF
    The social functionality of places (e.g. school, restaurant) partly determines human behaviors and reflects a region’s functional configuration. Semantic descriptions of places are thus valuable to a range of studies of humans and geographic spaces. Assuming their potential impact on human verbalization behaviors, one possibility is to link the functions of places to verbal representations such as users’ postings in location-based social networks (LBSNs). In this study, we examine whether the heterogeneous user-generated text snippets found in LBSNs reliably reflect the semantic concepts attached to check-in places. We investigate Foursquare because its categorization hierarchy provides rich a-priori semantic knowledge about its check-in places, which enables reliable verification of the semantic concepts identified from user-generated text snippets. A latent semantic analysis is conducted on a large Foursquare check-in dataset. The results confirm that attached text messages can represent semantic concepts, demonstrating their strong correspondence to the official Foursquare venue categorization. To further elaborate on the representativeness of text messages, this work also examines textual terms to quantify their ability to represent semantic concepts (i.e., representativeness), and examines semantic concepts to quantify how well they can be represented by text messages (i.e., representability). The results shed light on featured terms with strong locational characteristics, as well as on distinctive semantic concepts with potentially strong impacts on human verbalizations.
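
    The latent semantic analysis pipeline described above can be sketched with off-the-shelf tools: build a TF-IDF term-document matrix over the text snippets, then factor it with a truncated SVD to obtain latent concepts. The toy check-in messages and the choice of two components below are hypothetical stand-ins for the actual Foursquare dataset.

```python
# LSA sketch: TF-IDF followed by truncated SVD (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

snippets = [                                   # hypothetical check-in texts
    "great espresso and croissants this morning",
    "late night ramen with friends downtown",
    "studying for finals at the library again",
    "leg day at the gym, new squat record",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(snippets)              # term-document matrix

svd = TruncatedSVD(n_components=2, random_state=0)
doc_concepts = svd.fit_transform(X)            # documents in concept space
print(doc_concepts.round(2))

terms = tfidf.get_feature_names_out()
for i, comp in enumerate(svd.components_):     # top terms per latent concept
    top = comp.argsort()[::-1][:3]
    print(f"concept {i}:", [terms[t] for t in top])
```

    Comparing the learned concepts against Foursquare's venue categories, as the study does, would then quantify how well the snippets recover the official semantics.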

    BRHIM - Base de Registros Hospitalares para Informações e Metadados (Hospital Records Base for Information and Metadata)

    Get PDF
    The risk of re-identification in hospital data is high, yet such data are in demand for projects that develop and validate Artificial Intelligence (AI). This work addresses the main methods of preparing hospital records for observational studies, with the specific aim of assessing the risk of re-identification and the impact that the information loss produced by anonymization has on AI results. A review of the subject is presented first, followed by two articles, both considering the use of hospital records in epidemiological studies. The first article proposes a domain ontology that defines a scope for approaching anonymization: it presents the types of attacks, the types of data and attributes, the privacy models, the types of AI use, and the different study designs. An example instance of the ontology was built in Web Protégé, the ontology-building tool made available by Stanford University, which allows the ontology to be replicated. The second article defines a five-step recipe for preparing hospital records that implements pseudonymization, de-identification, and anonymization, and compares the effects of these steps on an AI application. To this end, a Datathon event was held to develop an AI predictor of in-hospital mortality. Comparing the AI results obtained with the original data against those obtained with the anonymized data showed a difference of less than 1% in AUC-ROC, while the risk of a patient being identified was reduced by 95%, demonstrating that record preparation can be systematized to add privacy while quantifying the information loss, in order to make both transparent.
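
    To make the re-identification-risk side of this evaluation concrete, here is a hedged sketch of one common way to estimate it: counting equivalence classes over quasi-identifiers before and after generalization. The column names, toy records, and generalization rules below are hypothetical and do not reproduce the thesis's actual five-step pipeline.

```python
# Equivalence-class (k-anonymity style) re-identification risk sketch.
import pandas as pd

def reidentification_risk(df, quasi_ids):
    # Average "prosecutor" risk: the mean over records of
    # 1 / (size of the record's equivalence class), which simplifies to
    # (number of distinct quasi-identifier combinations) / n.
    n_classes = df[quasi_ids].drop_duplicates().shape[0]
    return n_classes / len(df)

records = pd.DataFrame({                       # hypothetical patient rows
    "age": [34, 36, 71, 74, 52],
    "sex": ["F", "F", "M", "M", "F"],
    "zip": ["30100", "30109", "30190", "30195", "30155"],
})

risk_raw = reidentification_risk(records, ["age", "sex", "zip"])

generalized = records.assign(                  # a typical de-identification step
    age=(records["age"] // 10) * 10,           # 10-year age bands
    zip=records["zip"].str[:3] + "**",         # truncated ZIP codes
)
risk_anon = reidentification_risk(generalized, ["age", "sex", "zip"])
print(f"avg risk: raw={risk_raw:.2f}, anonymized={risk_anon:.2f}")
```

    On this toy data the average risk drops from 1.0 (every record unique) to 0.6 after generalization; a real pipeline would weigh such risk metrics against utility measures such as the AUC-ROC comparison reported above.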

    The implications and impact of artificial intelligence, big data and HR analytics in HRM: A critical analysis of EU enterprises

    Get PDF
    This study offers a critical evaluation of HR analytics. Specifically, it examines the ideas and concepts surrounding HR analytics: what HR analytics is, how it develops in organizations, and how it may impact organizational performance. To advance and answer these research questions, the study relies on systematic reviews, logistic regression, interaction-effect analysis, and interviews, together with the European Company Survey (ECS), to assess the interrelationship between HR analytics and organizational factors. Based on the findings, certain key areas are addressed. First, research question 1 yields a more systematic and coherent definition of HR analytics and of artificial intelligence in HR, and identifies factors that influence the use of HR analytics in organizations. In particular, the results of study two show that factors such as firm age, firm size, the complexity of firm processes, and the type of variable-pay system are key indicators of why certain companies use HR analytics while others do not. Furthermore, the results of study three provide a bigger picture of how organizational factors might explain firms’ financial returns when examining the relationships between variables. In particular, factors such as employee motivation, the use of HR analytics, and variable-pay systems are believed to be critical in determining a company’s financial returns. In addition, the study provides additional knowledge in five specific areas of analytics and artificial intelligence in HR, namely firm characteristics, challenges, key reasons to adopt HR software, new trends, and user traits.
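
    The logistic-regression-with-interaction analysis mentioned above might look roughly like the following statsmodels sketch. The variable names (uses_hr_analytics, firm_size, firm_age, variable_pay) and the synthetic data are assumptions for illustration; they are not the ECS variables or the study's actual model.

```python
# Hedged sketch: logistic regression with an interaction term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "firm_size": rng.integers(10, 5000, n),    # employees (synthetic)
    "firm_age": rng.integers(1, 80, n),        # years (synthetic)
    "variable_pay": rng.integers(0, 2, n),     # has variable pay? (synthetic)
})
# Synthetic outcome: adoption more likely for large, variable-pay firms.
logit_p = -4 + 0.0008 * df["firm_size"] + 1.2 * df["variable_pay"]
df["uses_hr_analytics"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# Main effects plus a firm_size x variable_pay interaction.
model = smf.logit(
    "uses_hr_analytics ~ firm_size + firm_age + variable_pay"
    " + firm_size:variable_pay",
    data=df,
).fit()
print(model.summary())
```

    A significant interaction coefficient would indicate that the effect of firm size on adoption differs between firms with and without variable-pay systems.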

    Big-Data Analytics and Cloud Computing: Theory, Algorithms and Applications

    No full text
    Discusses and explores theoretical concepts, principles, tools, techniques and deployment models in the context of Big Data; focuses on the latest developments in Data Science (aka Analytics) and, especially, their applications to real-world challenges; includes numerous case studies for in-class analysis and assignment.

    Big-Data analytics and cloud computing: Theory, algorithms and applications

    No full text
    This book reviews the theoretical concepts, leading-edge techniques and practical tools involved in the latest multi-disciplinary approaches addressing the challenges of big data. Illuminating perspectives from both academia and industry are presented by an international selection of experts in big data science. Topics and features: describes the innovative advances in theoretical aspects of big data, predictive analytics and cloud-based architectures; examines the applications and implementations that utilize big data in cloud architectures; surveys the state of the art in architectural approaches to the provision of cloud-based big data analytics functions; identifies potential research directions and technologies to facilitate the realization of emerging business models through big data approaches; provides relevant theoretical frameworks, empirical research findings, and numerous case studies; discusses real-world applications of algorithms and techniques to address the challenges of big datasets.

    Reducing data deduplication costs using heuristics and cloud computing

    Get PDF
    In the era of Big Data, in which the scale of the data poses many challenges for classical algorithms, the task of assessing data quality can become costly and exhibit long execution times. For this reason, business managers may opt to outsource the monitoring of database quality to a dedicated service, usually based on cloud computing. In this context, this work proposes approaches for reducing the costs of the data deduplication task, which aims to detect duplicate entities in databases, in the context of a cloud data quality service. The work focuses on data deduplication because of its importance in many contexts and its high complexity. It proposes a high-level architecture for a data quality monitoring service that employs dynamic provisioning of computational resources through heuristics and machine learning techniques. Furthermore, it proposes approaches for adopting incremental data deduplication algorithms and for controlling the size of the blocks generated in the indexing phase of the investigated problem. Four experiments were conducted to evaluate the effectiveness of the proposed resource provisioning algorithms and of the heuristics employed for incremental data deduplication and block-size control. The results present a range of options covering different cost-benefit trade-offs, mainly involving the infrastructure cost of the service and the number of SLA violations over time. The empirical evaluation of the heuristics proposed for incremental data deduplication likewise revealed a number of patterns in the results, mainly involving trade-offs between the running time of the heuristics and the efficacy achieved. Finally, several heuristics for controlling the size of the blocks produced in a deduplication task were evaluated; their efficacy results are strongly influenced by the parameter values employed, and their efficiency varies significantly depending on the block-pruning strategy adopted. Taken together, the results of the four experiments show that the different strategies (for provisioning computational resources and for the data quality algorithms) adopted by a data quality service can significantly influence the service's costs and, consequently, the costs passed on to its users.
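
    The indexing-phase block-size control discussed above can be illustrated with standard blocking plus a simple capping heuristic: oversized blocks are recursively split on a longer key, trading some recall for far fewer candidate comparisons. The blocking key (a lowercase name prefix), the cap, and the depth bound are illustrative assumptions, not the thesis's actual heuristics.

```python
# Standard blocking with a block-size-capping heuristic (sketch).
from collections import defaultdict
from itertools import combinations

records = [                                    # hypothetical records
    {"id": 1, "name": "Silva, Ana"},
    {"id": 2, "name": "Silva, Anna"},
    {"id": 3, "name": "Souza, Bruno"},
    {"id": 4, "name": "Silveira, Carla"},
    {"id": 5, "name": "Silva, Ana Maria"},
]

MAX_BLOCK = 2   # cap: blocks above this size get split on a finer key
MAX_KEY = 8     # depth bound so the recursion always terminates

def block_key(rec, length=3):
    return rec["name"].lower()[:length]

def split_oversized(blocks, length=3):
    final = {}
    for key, recs in blocks.items():
        if len(recs) <= MAX_BLOCK or length >= MAX_KEY:
            final[key] = recs
        else:                                  # extend the key and re-block
            sub = defaultdict(list)
            for rec in recs:
                sub[block_key(rec, length + 1)].append(rec)
            final.update(split_oversized(sub, length + 1))
    return final

blocks = defaultdict(list)
for rec in records:
    blocks[block_key(rec)].append(rec)

for key, recs in split_oversized(blocks).items():
    for a, b in combinations(recs, 2):         # candidate duplicate pairs
        print(key, a["id"], b["id"])
```

    Only records sharing a (possibly extended) key are compared, so the number of pairwise comparisons, and hence the cloud cost, shrinks from O(n²) over the whole dataset to the sum of the per-block pair counts.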

    BIG DATA and Advanced Analytics

    Get PDF
    This collection presents the results of research and development in the field of BIG DATA and Advanced Analytics for optimizing IT and business solutions, as well as case studies in medicine, education, and ecology.