4,894 research outputs found

    Security in Data Mining- A Comprehensive Survey

    Data mining techniques allow individuals to extract hidden knowledge, but they also introduce a number of privacy threats. In this paper, we study some of these issues along with a detailed discussion of the applications of various data mining techniques for providing security. An efficient classification technique, when used properly, allows a user to differentiate between a phishing website and a normal website, to classify users as normal users or criminals based on their activities on social networks (crime profiling), and to prevent users from executing malicious code by labelling it as malicious. One of the most important applications of data mining is intrusion detection, where different data mining techniques can be applied to detect an intrusion effectively and report it in real time, so that the necessary actions can be taken to thwart the intruder's attempts. Privacy preservation, outlier detection, anomaly detection and phishing website classification are discussed in this paper.

    An Overview of the Use of Neural Networks for Data Mining Tasks

    In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular, biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been used successfully in many areas, including science, engineering, medicine, business, banking, and telecommunication. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.

    Cloud-based machine learning for the detection of anonymous web proxies


    Back to the past to charter the vinyl electronic market

    Over the past decades, the astounding pace of technological evolution and the mass adoption of advanced digital devices have forced several companies and entire market sectors to choose between reinventing themselves and becoming obsolete. A great example is the entertainment industry, where entire sectors were replaced by digital content platforms. This thesis focuses on perhaps the most iconic media format of all time, the vinyl record. Vinyl was one of the first formats for audio reproduction, created around 1920, and grew increasingly popular until the 1980s, when the Compact Disc finally replaced it. This was mainly due to the CD’s lower production costs, as well as its smaller size and lower maintenance needs, which made it easier to distribute. Interestingly, vinyl made a small comeback, mainly due to avid collectors who unknowingly created a community that kept the format alive. The goal of this study is to understand which factors involved in the buying and selling of vinyl influence its price, with the initial hypothesis considering record labels and popularity rankings to be among the most influential variables. To evaluate this, four datasets were created to represent recent and past records of two different genres, Rock and Jazz, by extracting data from Discogs’ marketplace and Billboard’s Hot 100 chart. The chosen work methodology was CRISP-DM, and the software used for data analysis was SAS Enterprise Guide and SAS Enterprise Miner. This approach revealed that an artist’s presence in the charts and a label’s membership in the ‘Big Three’ do not always mean that their records command the highest prices. The results also showed that features measuring popularity become more relevant in the ‘era’ in which the record’s genre is most popular, and that big record labels have been losing market share to an increasing number of independent labels.

    Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop


    Predictive analytics applied to Alzheimer’s disease : a data visualisation framework for understanding current research and future challenges

    Dissertation as a partial requirement for obtaining a master’s degree in information management, with a specialisation in Business Intelligence and Knowledge Management. Big Data is nowadays regarded as a tool for improving the healthcare sector in many areas, such as its economic side, by searching for operational efficiency gaps, and personalised treatment, for instance by selecting the best drug for the patient. Data science can play a key role in identifying diseases at an early stage, or even before any signs appear, tracking their progress, quickly assessing the efficacy of treatments, and suggesting alternatives. The preventive side of healthcare can therefore be enhanced with state-of-the-art predictive big data analytics and machine learning methods, which integrate the available complex, heterogeneous, yet sparse data from multiple sources towards better identification of disease and pathology patterns. These methods can be applied to neurodegenerative disorders, which remain challenging to diagnose; identifying the patterns that trigger those disorders can make it possible to identify more risk factors and biomarkers in every human being. With that, we can improve the effectiveness of medical interventions, helping people to stay healthy and active for a longer period. This work reviews the state of the art in predictive big data analytics as applied to the early diagnosis of Alzheimer’s Disease. It does so by searching and summarising scientific articles published in reputable online sources, bringing together information that is spread across the World Wide Web, with the goal of enhancing knowledge management and collaboration practices on the topic. Furthermore, an interactive data visualisation tool for better managing and identifying the scientific articles is developed, delivering a holistic visual overview of the developments in the important field of Alzheimer’s Disease diagnosis.

    Advanced Information Systems and Technologies

    This book comprises the proceedings of the V International Scientific Conference "Advanced Information Systems and Technologies, AIST-2017". The papers cover issues related to system analysis and modeling, project management, information system engineering, intelligent data processing, computer networking and telecommunications. They will be useful for students, graduate students, and researchers who are interested in computer science.
