1,776 research outputs found

SEGSys: A mapping system for segmentation analysis in energy

Author: Li Rongling
Liu Xiufeng
Nielsen Per Sieverts
Wang Yi
Publication date: 01/01/2019

Customer segmentation analysis can give valuable insights into the energy efficiency of residential buildings. This paper presents a mapping system, SEGSys, that enables segmentation analysis at the individual and neighborhood levels. SEGSys supports online and offline classification of customers based on their daily consumption patterns and consumption intensity. It also supports segmentation analysis according to the social characteristics of customers, for individual households or neighborhoods, as well as spatial geometries. SEGSys uses a three-layer architecture to model the segmentation system: a data layer, a service layer, and a presentation layer. The data layer models data as a star schema within a data warehouse, the service layer provides data services through a RESTful interface, and the presentation layer interacts with users through a visual map. This paper showcases the system through a segmentation analysis of an electricity consumption data set and validates its effectiveness.
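As a rough illustration of the data layer described above, the following sketch builds a tiny star schema in an in-memory SQLite database and aggregates consumption per neighborhood. The table names, column names, and sample rows (dim_customer, dim_date, fact_consumption) are assumptions made for this example; the abstract does not specify the actual SEGSys schema or storage engine.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_id  INTEGER PRIMARY KEY,
            neighborhood TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            day     TEXT NOT NULL
        );
        CREATE TABLE fact_consumption (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            date_id     INTEGER REFERENCES dim_date(date_id),
            kwh         REAL NOT NULL
        );
    """)
    # Hypothetical sample data, for illustration only.
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "North"), (2, "North"), (3, "South")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2019-01-01"), (2, "2019-01-02")])
    conn.executemany("INSERT INTO fact_consumption VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (1, 2, 12.0),
                      (2, 1, 8.0), (2, 2, 6.0),
                      (3, 1, 20.0), (3, 2, 22.0)])
    conn.commit()
    return conn


def mean_kwh_by_neighborhood(conn):
    """Join the fact table to the customer dimension and average per neighborhood."""
    rows = conn.execute("""
        SELECT c.neighborhood, AVG(f.kwh)
        FROM fact_consumption AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.neighborhood
        ORDER BY c.neighborhood
    """).fetchall()
    return dict(rows)
```

In a layered design like the one described, a service layer would typically expose a query of this kind behind an HTTP endpoint, and the presentation layer would render the result on the map.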

arXiv.org e-Print Archive

Online Research Database In Technology

Customer Segmentation with Subscription-based Online Media Customers

Author: Haatanen Henri
Publication venue: Helsingfors universitet
Publication date: 01/01/2022

In the modern era, personalization when reaching out to potential or current customers is essential for businesses to compete in their field. With large customer bases, personalization becomes more difficult, so segmenting the customer base into smaller groups helps businesses focus on personalization and targeted business decisions. These groups can be straightforward, such as segments based solely on age, or more complex, taking into account geographic, demographic, behavioral, and psychographic differences among customers. In the latter case, customer segmentation should be performed with Machine Learning, which can help find hidden patterns within the data. Often, the number of features in the customer data set is so large that some form of dimensionality reduction is needed. That is also the case in this thesis, where 12802 unique article tags are to be included in the segmentation. A form of dimensionality reduction called feature hashing is selected for the tags because it allows new tags to be introduced in the future. Using hashed features in customer segmentation is a balancing act. With more hashed features, the evaluation metrics might give better results and the hashed features more closely resemble the unhashed article tag data, but with fewer hashed features the clustering process is faster and more memory-efficient, and the resulting clusters are more interpretable to the business. Three clustering algorithms, K-means, DBSCAN, and BIRCH, are each tested with eight feature hashing bin sizes, with promising results for K-means and BIRCH.
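The feature hashing idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function name hash_tags, the use of MD5 as the hash function, and the unsigned count vector are all assumptions chosen for clarity (library implementations often use a different hash and signed updates).

```python
import hashlib


def hash_tags(tags, n_bins):
    """Map a list of tag strings to a fixed-length count vector (the hashing trick)."""
    vector = [0] * n_bins
    for tag in tags:
        # A stable hash keeps the tag-to-bin mapping identical across runs.
        digest = hashlib.md5(tag.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_bins
        vector[index] += 1
    return vector


# Hypothetical customer history; an unseen tag needs no vocabulary update.
customer_vector = hash_tags(["politics", "sports", "politics"], 8)
```

The bin count n_bins is the trade-off described in the abstract: more bins mean fewer collisions between unrelated tags, while fewer bins give a smaller and faster clustering input.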

Helsingin yliopiston digitaalinen arkisto

Autoencoder Based Iterative Modeling and Multivariate Time-Series Subsequence Clustering Algorithm

Author: Gühmann Clemens
Henning Lars
Köhne Jonas
Publication date: 23/09/2022

This paper introduces an algorithm for the detection of change-points and the identification of the corresponding subsequences in transient multivariate time-series data (MTSD). The analysis of such data has become increasingly important as it has become more widely available in many industrial fields. Labeling, sorting or filtering highly transient measurement data for training condition based maintenance (CbM) models is cumbersome and error-prone. For some applications it can be sufficient to filter measurements by simple thresholds or to find change-points based on changes in mean value and variation. But for a robust diagnosis of, for example, a component within a component group with complex non-linear correlations between multiple sensor values, such a simple approach would not be feasible: no meaningful and coherent measurement data that could be used for training a CbM model would emerge. Therefore, we introduce an algorithm which uses a recurrent neural network (RNN) based Autoencoder (AE) that is iteratively trained on incoming data. The scoring function uses the reconstruction error and latent space information. A model of the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline clustering. For evaluation, we propose a new similarity measure based on the curvature for a more intuitive time-series subsequence clustering metric. A comparison with seven other state-of-the-art algorithms and eight datasets shows the capability and the increased performance of our algorithm to cluster MTSD online and offline in conjunction with mechatronic systems. Comment: 26 pages, 11 figures, for associated python code repositories see https://github.com/Jokonu/mt3scm and https://github.com/Jokonu/abimca; Minor spelling and grammar corrections, fixed wrong bibtex entry for SOStream, some improvements and corrections in formulas of section
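For contrast with the autoencoder approach, the simple baseline the abstract mentions (finding change-points from changes in mean value) might look like the sketch below. This is not the paper's algorithm; the function name, the two-window comparison, and the window and threshold parameters are assumptions made for illustration on a univariate series.

```python
from statistics import mean


def mean_shift_change_points(series, window, threshold):
    """Return indices where the mean of the next `window` samples differs from
    the mean of the previous `window` samples by more than `threshold`.

    Each run of consecutive candidate indices is collapsed to the index
    with the largest shift.
    """
    change_points = []
    best_index, best_shift = None, 0.0
    for i in range(window, len(series) - window + 1):
        shift = abs(mean(series[i:i + window]) - mean(series[i - window:i]))
        if shift > threshold:
            if best_index is None or shift > best_shift:
                best_index, best_shift = i, shift
        elif best_index is not None:
            # The run of candidates ended; keep its strongest index.
            change_points.append(best_index)
            best_index, best_shift = None, 0.0
    if best_index is not None:
        change_points.append(best_index)
    return change_points


# Hypothetical signal with a single step at index 10.
signal = [0.0] * 10 + [5.0] * 10
points = mean_shift_change_points(signal, window=3, threshold=2.0)
```

A detector of this kind looks at one channel at a time, which is why it struggles with the non-linear correlations between sensors that motivate the learned model.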

arXiv.org e-Print Archive

Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images

Author: Cheng Lin
Deng Huanxi
Liu Zhaocong
Yang Xiaoyan
Zhang Fa
Zhang Zhenyu
Zhou Chichun
Publication date: 15/11/2023

High-quality labeled datasets are essential for deep learning. Traditional manual annotation methods are not only costly and inefficient but also pose challenges in specialized domains where expert knowledge is needed. Self-supervised methods, despite leveraging unlabeled data for feature extraction, still require hundreds or thousands of labeled instances to guide the model for effective specialized image classification. Current unsupervised learning methods offer automatic classification without prior annotation but often compromise on accuracy. As a result, efficiently procuring high-quality labeled datasets remains a pressing challenge for specialized domain images devoid of annotated data. To address this, an unsupervised classification method with three key ideas is introduced: 1) dual-step feature dimensionality reduction using a pre-trained model and manifold learning, 2) a voting mechanism across multiple clustering algorithms, and 3) post-hoc instead of prior manual annotation. This approach outperforms supervised methods in classification accuracy, as demonstrated with fungal image data, achieving 94.1% and 96.7% on public and private datasets respectively. The proposed unsupervised classification method reduces dependency on pre-annotated datasets, enabling a closed loop for data classification. The simplicity and ease of use of this method will also help researchers in various fields build datasets, promoting AI applications for images in specialized domains.
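One way a voting mechanism across clustering algorithms could work is sketched below. The abstract does not describe the paper's actual voting rule, so everything here is an assumption: cluster labels from each algorithm are first aligned to the first labeling by maximum overlap (cluster IDs are arbitrary across algorithms), then each sample takes the majority label, and samples without enough agreement are left unlabeled.

```python
from collections import Counter


def align_labels(reference, labels):
    """Relabel `labels` so each cluster takes the reference label it overlaps most.

    The mapping is greedy and may send two clusters to the same reference label.
    """
    mapping = {}
    for cluster in set(labels):
        overlap = Counter(r for r, l in zip(reference, labels) if l == cluster)
        mapping[cluster] = overlap.most_common(1)[0][0]
    return [mapping[l] for l in labels]


def vote(labelings, min_agreement):
    """Majority vote per sample; samples with fewer than `min_agreement`
    matching votes are returned as None."""
    reference = labelings[0]
    aligned = [reference] + [align_labels(reference, l) for l in labelings[1:]]
    result = []
    for votes in zip(*aligned):
        label, count = Counter(votes).most_common(1)[0]
        result.append(label if count >= min_agreement else None)
    return result


# Hypothetical outputs of three clustering algorithms on six images.
run_a = [0, 0, 0, 1, 1, 1]
run_b = [1, 1, 1, 0, 0, 0]   # same partition as run_a, IDs swapped
run_c = [0, 0, 1, 1, 1, 1]   # disagrees on the third image
consensus = vote([run_a, run_b, run_c], min_agreement=3)
```

Requiring unanimous agreement trades coverage for purity: fewer samples receive a label, but those that do are the ones every algorithm grouped the same way.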

arXiv.org e-Print Archive

Modelling the head and neck region for microwave imaging of cervical lymph nodes

Author: Pelicano Ana Catarina Domingos
Publication date: 01/01/2019

Integrated master's thesis, Biomedical Engineering and Biophysics (Radiation in Diagnosis and Therapy), Universidade de Lisboa, Faculdade de Ciências, 2020. The term "head and neck cancer" refers to any type of cancer that begins in the epithelial cells of the oral and nasal cavities, paranasal sinuses, salivary glands, pharynx, and larynx. In 2018, these malignant tumours had a worldwide incidence of about 887,659 new cases and a mortality rate above 51%. Approximately 80% of the new cases diagnosed that year showed cancer cells spreading from the tumours to other regions of the body through nearby blood and lymphatic vessels. To determine the stage of the cancer and the therapies to be followed, it is essential to assess the first lymph nodes that receive drainage from the primary tumour, the sentinel nodes, which are therefore the most likely to become the first targets of tumour cells. Healthy sentinel nodes imply a lower probability of metastases, that is, new tumour sites arising from the spread of the cancer to other organs. The standard procedure for diagnosing the cervical lymph nodes (the lymph nodes located in the head and neck region) and staging the cancer consists of surgically removing these nodes and performing subsequent histopathology. Besides being invasive, surgical excision of the lymph nodes poses risks to patients' mental and physical health and to their quality of life. Pain, disfigured physical appearance (due to scarring), and loss of speech or of the ability to swallow are some of the repercussions that may result from removing lymph nodes from the head and neck region. 
In addition, the risk of infection and lymphedema (the accumulation of lymph in interstitial tissues) increases significantly when a large number of healthy lymph nodes are removed. The burden on healthcare systems is also high, because these patients need monitoring and subsequent therapies and care associated with morbidity, such as manual lymphatic drainage and physiotherapy. Developing new imaging technologies for the head and neck requires realistic models that simulate the behaviour and properties of biological tissues. Medical microwave imaging is a promising, non-invasive technique that uses non-ionizing radiation, that is, signals at microwave frequencies whose behaviour depends on the dielectric contrast between the different tissues they pass through, which makes it possible to identify regions or structures of interest and thus complement the diagnosis. However, owing to its characteristics, this modality can only be used to assess shallow anatomical regions. Studies indicate that lymph nodes containing tumour cells have dielectric properties distinct from those of healthy lymph nodes. For this reason, and because of their shallow location, we consider the lymph nodes of the head and neck region excellent candidates for microwave radar medical imaging as a diagnostic tool. To date, no studies have developed models of the head and neck region focused on realistically representing the cervical lymph nodes. 
For this reason, this project consisted of developing two generators of three-dimensional phantoms of the head and neck region: a generator of simple numerical phantoms (generator I) and a generator of more complex, anatomically realistic numerical phantoms, derived from magnetic resonance images and including realistic dielectric properties of the biological tissues (generator II). Both generators produce phantoms with different levels of complexity and can thus support different phases of the development of medical microwave imaging equipment. All generated phantoms, and especially the anatomically realistic ones, can later be printed in three dimensions. Building generator I involved modelling the head and neck region in accordance with human anatomy and the distribution of the main tissues, and creating an interface for customizing the models (for example, whether certain tissues are included or removed depends on the purpose for which each model is generated). A detailed study of this region led to the inclusion of bone, muscle, and adipose tissue, skin, and lymph nodes in the models. Although these phantoms are quite simple, they are essential for starting the development of medical microwave imaging devices dedicated to diagnosing the cervical lymph nodes. Building generator II was divided into three major stages because of its high degree of complexity. The first stage consisted of creating a pipeline for processing the magnetic resonance images. 
This pipeline included: data normalization; background subtraction using manually constructed binary masks; image processing with linear filters (for example, ideal, Gaussian, and Butterworth low-pass filters) and non-linear filters (for example, the median filter); and unsupervised machine learning algorithms for segmenting the various biological tissues present in the cervical region, such as K-means, Agglomerative Hierarchical Clustering, DBSCAN, and BIRCH. Because each of these unsupervised machine learning algorithms requires different hyperparameters, a detailed study is needed to understand how each algorithm works individually and how it interacts and performs with the type of data handled in this project (that is, magnetic resonance scan data), in order to choose empirically the range of values to consider for each hyperparameter and the combinations to test. After this phase comes the evaluation of which hyperparameter combination yields the best segmentation of the anatomical structures. Two methodologies were combined for this evaluation: metrics that assess clustering quality (for example, the Silhouette Coefficient, the Davies-Bouldin index, and the Calinski-Harabasz index) and visual inspection. The second stage was dedicated to the manual introduction of some structures, such as the skin and the lymph nodes, which the machine learning algorithms did not segment because of their thinness and small size, respectively. Finally, the last stage consisted of assigning dielectric properties, for a predefined frequency, to the biological tissues using the four-pole Cole-Cole model. 
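For reference, a four-pole Cole-Cole model is commonly written in the parametric form below. This is the widely used general expression, given here as an assumption about the form meant; the abstract states neither the exact expression nor any parameter values, and the symbols are the conventional ones rather than the thesis's own notation.

```latex
\hat{\varepsilon}(\omega) = \varepsilon_\infty
  + \sum_{n=1}^{4} \frac{\Delta\varepsilon_n}{1 + (j\omega\tau_n)^{1-\alpha_n}}
  + \frac{\sigma_i}{j\omega\varepsilon_0}
```

Here $\omega$ is the angular frequency, $\varepsilon_\infty$ the high-frequency permittivity, $\Delta\varepsilon_n$, $\tau_n$ and $\alpha_n$ the magnitude, relaxation time and broadening parameter of the $n$-th dispersion, $\sigma_i$ the static ionic conductivity, and $\varepsilon_0$ the permittivity of free space. Writing $\hat{\varepsilon} = \varepsilon' - j\varepsilon''$, the relative permittivity at the chosen frequency is $\varepsilon'$ and the conductivity is $\sigma = \omega\varepsilon_0\varepsilon''$.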
As with generator I, an interface was created that lets the user decide which features to include in the phantom, such as: the tissues to include (adipose tissue, muscle tissue, skin, and/or lymph nodes); for the lymph nodes, their number, dimensions, location by level, and clinical status (healthy or metastasized); and finally, the frequency for which to obtain the dielectric properties (relative permittivity and conductivity) of each biological tissue. This project resulted in a generator of realistic models of the head and neck region focused on the cervical lymph nodes, which allows the insertion of biological tissues such as muscle and adipose tissue, skin, and lymph nodes, and assigns them dielectric properties for a given frequency in the microwave range. The computational models produced by generator II, which can later be 3D printed, may have a great impact on the development of medical microwave imaging devices aimed at diagnosing cervical lymph nodes and, consequently, contribute to a non-invasive process for staging head and neck cancer.
Head and neck cancer is a broad term referring to any epithelial malignancies arising in the paranasal sinuses, nasal and oral cavities, salivary glands, pharynx, and larynx. In 2018, approximately 80% of the newly diagnosed head and neck cancer cases resulted in tumour cells spreading to neighbouring lymph and blood vessels. In order to determine cancer staging and decide which follow-up exams and therapy to follow, physicians excise and assess the Lymph Nodes (LNs) closest to the primary site of the head and neck tumour, the sentinel nodes, which are the ones with the highest probability of being targeted by cancer cells. The standard procedure to diagnose the Cervical Lymph Nodes (CLNs), i.e. lymph nodes within the head and neck region, and determine the cancer staging frequently involves their surgical removal and subsequent histopathology. Besides being invasive, the removal of the lymph nodes has a negative impact on patients' quality of life, can be health threatening, and is costly to healthcare systems because of the patients' need for follow-up treatment and care. Anatomically realistic phantoms are required to develop novel technologies tailored to imaging the head and neck region. Medical MicroWave Imaging (MWI) is a promising non-invasive approach which uses non-ionizing radiation to screen shallow body regions; cervical lymph nodes are therefore excellent candidates for this imaging modality. In this project, a three-dimensional (3D) numerical phantom generator (generator I) and a Magnetic Resonance Imaging (MRI)-derived anthropomorphic phantom generator (generator II) of the head and neck region were developed to create phantoms with different levels of complexity and realism, which can later be 3D printed to test medical MWI devices. The process of designing the numerical phantom generator included the modelling of the head and neck region according to its anatomy and the distribution of its main tissues, and the creation of an interface which allowed users to personalise the model (e.g. include or remove certain tissues, depending on the purpose of each generated model). To build the anthropomorphic phantom generator, the modelling process included the creation of a pipeline of data processing steps to be applied to MRIs of the head and neck, followed by the development of algorithms to introduce additional tissues to the models, such as skin and lymph nodes, and finally, the assignment of the dielectric properties to the biological tissues. Similarly, this generator allowed users to decide the features they wish to include in the phantoms. 
This project resulted in the creation of a generator of 3D anatomically realistic head and neck phantoms which allows the inclusion of biological tissues such as skin, muscle tissue, adipose tissue, and LNs, and assigns state-of-the-art dielectric properties to the tissues. These phantoms may have a great impact on the development process of MWI devices aimed at screening and diagnosing CLNs and, consequently, contribute to a non-invasive staging of head and neck cancer.

Universidade de Lisboa: Repositório.UL

A survey of outlier detection methodologies

Author: Austin J.
Hodge V.J.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
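As a small example of the statistical end of this spectrum, the sketch below flags outliers with the interquartile range rule (Tukey's fences). It is one classic technique chosen here for illustration, not a method singled out by the abstract; the function name, the default factor of 1.5, and the sample readings are assumptions.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one obvious instrument error.
readings = [10, 12, 11, 13, 12, 11, 100]
flagged = iqr_outliers(readings)
```

Because the fences are built from quartiles rather than the mean and standard deviation, a single extreme value does not inflate the threshold that is used to detect it.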

CiteSeerX

Crossref

White Rose Research Online

Psychographic And Behavioral Segmentation Of Food Delivery Application Customers To Increase Intention To Use

Author: Zárate Jorge Luis Bocanegra
Publication date: 02/02/2022

Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. This study presents a framework for segmenting Food Delivery Application (FDA) customers based on psychographic and behavioral variables as an alternative to existing segmentation. Customer segments are proposed by applying clustering methods to primary data from an electronic survey. Psychographic and behavioral constructs are formulated as hypotheses based on existing literature, and then evaluated as segmentation variables with regard to their discriminatory power for customer segmentation. The variables found to be relevant are used in clustering techniques to find adequate boundaries between customer groupings for segmentation purposes. The customer segments are then characterized, and the findings are enriched with their implications for FDA marketing strategies. This paper contributes to theory by providing new findings on segmentation that are relevant for an online context. In addition, it contributes to practice by detailing the implications of customer segments for an online sales strategy, allowing marketing managers and FDA businesses to capitalize on this knowledge in their conversion funnel designs.

Repositório da Universidade Nova de Lisboa