
    Multi-Objective Genetic Algorithm for Multi-View Feature Selection

    Multi-view datasets offer diverse forms of data that can enhance prediction models by providing complementary information. However, multi-view data also increase dimensionality, and the resulting high-dimensional feature space poses significant challenges for prediction models and can lead to poor generalization. Selecting relevant features from multi-view datasets is therefore important: it not only counters poor generalization but also improves the interpretability of the models. Despite their success, traditional feature selection methods have limitations: they make limited use of intrinsic information across modalities, they lack generalizability, and they are tailored to specific classification tasks. We propose a novel genetic algorithm strategy to overcome these limitations for multi-view data. Our approach, the multi-view multi-objective feature selection genetic algorithm (MMFS-GA), simultaneously selects the optimal subset of features within each view and across views in a unified framework. MMFS-GA demonstrates superior performance and interpretability for feature selection on multi-view datasets in both binary and multiclass classification tasks. Evaluations on three benchmark datasets, including synthetic and real data, show improvement over the best baseline methods. This work provides a promising solution for multi-view feature selection and opens up new possibilities for further research on multi-view datasets.
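    The abstract does not specify how MMFS-GA encodes candidate solutions or which objectives it optimizes, so the following is only a minimal sketch of multi-objective feature selection across views, under stated assumptions. It assumes a hypothetical two-level genome (one bit per view plus one bit per feature, where a feature counts only if its view is switched on), toy objectives (maximize a made-up per-feature relevance score while minimizing subset size), bit-flip mutation only, and a simplified survival rule that ranks individuals by how many others dominate them rather than a full NSGA-II style sort. None of these names or choices come from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical two-level genome: one bit per view plus one bit per feature.
Genome = Tuple[List[int], List[int]]


def decode(genome: Genome, view_sizes: Sequence[int]) -> List[int]:
    """Return global indices of selected features; a feature counts only if its view is on."""
    view_bits, feat_bits = genome
    selected: List[int] = []
    offset = 0
    for v, size in enumerate(view_sizes):
        if view_bits[v]:
            selected.extend(i for i in range(offset, offset + size) if feat_bits[i])
        offset += size
    return selected


def objectives(genome: Genome, view_sizes: Sequence[int],
               relevance: Sequence[float]) -> Tuple[float, int]:
    """Toy objectives, both minimised: negative total relevance and subset size."""
    sel = decode(genome, view_sizes)
    return (-sum(relevance[i] for i in sel), len(sel))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimisation."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Independent bit-flip mutation on both the view bits and the feature bits."""
    def flip(bits: List[int]) -> List[int]:
        return [1 - b if rng.random() < rate else b for b in bits]
    return (flip(genome[0]), flip(genome[1]))


def evolve(view_sizes: Sequence[int], relevance: Sequence[float],
           pop_size: int = 20, generations: int = 30, rate: float = 0.1, seed: int = 0):
    """Return the non-dominated (neg_relevance, size) pairs of the final population."""
    rng = random.Random(seed)
    n = sum(view_sizes)
    pop: List[Genome] = [
        ([rng.randint(0, 1) for _ in view_sizes], [rng.randint(0, 1) for _ in range(n)])
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        merged = pop + [mutate(g, rate, rng) for g in pop]  # (mu + lambda) pool
        scores = [objectives(g, view_sizes, relevance) for g in merged]
        # Simplified ranking: how many individuals dominate each one (0 = non-dominated).
        rank = [sum(dominates(t, s) for t in scores) for s in scores]
        order = sorted(range(len(merged)), key=lambda i: rank[i])
        pop = [merged[i] for i in order[:pop_size]]
    final = {objectives(g, view_sizes, relevance) for g in pop}
    return sorted(s for s in final if not any(dominates(t, s) for t in final))
```

    In a realistic setting the relevance term would be replaced by something like a cross-validated classification error, which is far more expensive to evaluate than this toy score.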

    Autoencoder-assisted latent representation learning for survival prediction and multi-view clustering on multi-omics cancer subtyping

    Cancer subtyping (or cancer subtype identification) based on multi-omics data has played an important role in advancing diagnosis, prognosis and treatment, which has driven the development of advanced multi-view clustering algorithms. However, the high dimensionality and heterogeneity of multi-omics data strongly affect the performance of these methods. In this paper, we propose learning an informative latent representation with an autoencoder (AE) to capture nonlinear omic features in lower dimensions, which helps identify similarities between patients. Moreover, to take advantage of survival or clinical information, a multi-omic survival analysis approach is embedded when integrating the similarity graphs of the heterogeneous data at the multi-omics level. A clustering method is then applied to the integrated similarity to generate subtype groups. In the experiments, the effectiveness of the proposed framework is confirmed by evaluation on five different multi-omics datasets from The Cancer Genome Atlas. The results show that the AE-assisted multi-omics clustering method can identify clinically significant cancer subtypes.
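    As a rough illustration of the similarity-integration step, the sketch below assumes each omic has already been encoded into latent vectors (for example by an autoencoder, not shown), builds a Gaussian-kernel patient similarity matrix per omic, fuses the matrices by a weighted average, and groups patients by connected components of the thresholded fused graph. The function names, the kernel width `sigma`, the optional `weights` (a hypothetical stand-in for survival-informed weighting) and the threshold `tau` are all assumptions; the paper's survival-guided integration and its actual clustering method are not reproduced here.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def rbf_similarity(X: Sequence[Sequence[float]], sigma: float = 1.0) -> Matrix:
    """Gaussian-kernel similarity between every pair of patients in one omic's latent space."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            S[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return S


def fuse(mats: Sequence[Matrix], weights: Optional[Sequence[float]] = None) -> Matrix:
    """Weighted average of per-omic similarity matrices (uniform weights by default)."""
    k = len(mats)
    w = list(weights) if weights else [1.0 / k] * k
    n = len(mats[0])
    return [[sum(wt * M[i][j] for wt, M in zip(w, mats)) for j in range(n)] for i in range(n)]


def threshold_clusters(S: Matrix, tau: float) -> List[int]:
    """Label connected components of the graph with an edge wherever S[i][j] >= tau."""
    n = len(S)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and S[i][j] >= tau:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

    Connected components on a thresholded graph is chosen here only because it is short and dependency-free; spectral clustering on the fused matrix would be a more typical choice in practice.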

    Computational strategies for single-cell multi-omics integration

    Single-cell omics technologies are currently solving biological and medical problems that had previously remained elusive, such as the discovery of new cell types, cellular differentiation trajectories, and communication networks across cells and tissues. Current advances, especially in single-cell multi-omics, hold high potential for breakthroughs through the integration of multiple omics layers. To keep pace with recent biotechnological developments, many computational approaches for processing and analyzing single-cell multi-omics data have been proposed. In this review, we first introduce recent developments in single-cell multi-omics in general and then focus on the available data integration strategies. The integration approaches are divided into three categories: early, intermediate, and late data integration. For each category, we describe the underlying conceptual principles and main characteristics, and provide examples of currently available tools and how they have been applied to analyze single-cell multi-omics data. Finally, we explore the challenges and future directions of single-cell multi-omics data integration, including examples of adapting multi-view analysis approaches used in other disciplines to single-cell multi-omics.
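    To make the distinction between integration categories concrete, here is a toy sketch of the two simplest ones, assuming each modality is given either as a list of per-cell feature vectors (early integration) or as per-cell class scores from a separate per-modality model (late integration). Intermediate integration, which learns a joint representation, needs a trained model and is omitted. The function names and the cell-type labels used in the test are hypothetical.

```python
from typing import Dict, List, Sequence


def early_integration(modalities: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Early integration: concatenate each cell's features across all modalities
    into one vector, which a single downstream model then consumes."""
    n_cells = len(modalities[0])
    return [[x for m in modalities for x in m[c]] for c in range(n_cells)]


def late_integration(per_modality_scores: Sequence[Sequence[Dict[str, float]]]) -> List[str]:
    """Late integration: each modality is analysed separately; the per-cell class
    scores are averaged across modalities and the highest-scoring label is kept."""
    n_mod = len(per_modality_scores)
    n_cells = len(per_modality_scores[0])
    labels: List[str] = []
    for c in range(n_cells):
        totals: Dict[str, float] = {}
        for scores in per_modality_scores:
            for label, s in scores[c].items():
                totals[label] = totals.get(label, 0.0) + s / n_mod
        labels.append(max(totals, key=totals.get))
    return labels
```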

    An inter-domain supervision framework for collaborative clustering of data with mixed types

    We propose an Inter-Domain Supervision (IDS) clustering framework to discover clusters within diverse data formats, mixed-type attributes and different sources of data. This approach can be used for combined clustering of diverse representations of the data, in particular where data come from different sources, some of which may be unreliable or uncertain, or for exploiting optional external concept-set labels to guide the clustering of the main data set in its original domain. We additionally account for possible incompatibilities in the data through an automated inter-domain compatibility analysis. Our results in clustering real data sets with mixed numerical, categorical, visual and text attributes show that the proposed IDS clustering framework gives improved clustering results compared to conventional methods, over a wide range of parameters. Thus, the automatically extracted knowledge, in the form of seeds or constraints, obtained from clustering one domain can provide additional knowledge to guide the clustering in another domain. Additional empirical evaluations further show that our approach, especially when using selective mutual guidance between domains, outperforms common baselines such as clustering either domain on its own or clustering all domains converted to a single target domain. Our approach also outperforms other specialized multiple-clustering methods, such as fully independent ensemble clustering and tightly coupled multiview clustering, after they were adapted to the task of clustering mixed data. Finally, we present a real-life application of our IDS approach to cluster-based automated image annotation and report evaluation results on a benchmark data set of images described by their visual content along with noisy text descriptions generated by users of the social media sharing website Flickr.
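    One simple reading of "knowledge in the form of seeds" is seeded k-means: cluster labels found in one domain initialize the centroids in another, and ordinary k-means then refines them. The sketch below assumes that reading; it is not the IDS algorithm itself and ignores constraints, the compatibility analysis and mutual guidance. `seeded_kmeans` and its parameters are hypothetical names.

```python
from typing import List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def seeded_kmeans(X: Sequence[Sequence[float]], seed_labels: Sequence[int],
                  k: int, iters: int = 20) -> List[int]:
    """k-means in the target domain, initialised from labels found in another domain."""
    centroids: List[List[float]] = []
    for c in range(k):
        members = [x for x, lab in zip(X, seed_labels) if lab == c]
        if not members:
            raise ValueError(f"seed cluster {c} is empty")
        centroids.append(_centroid(members))
    labels = list(seed_labels)
    for _ in range(iters):
        new = [min(range(k), key=lambda c: _dist2(x, centroids[c])) for x in X]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = _centroid(members)
    return labels
```

    For instance, seeding the points `[[0, 0], [0.2, 0.1], [5, 5], [5.1, 4.9], [0.1, 0.3]]` with labels `[0, 0, 1, 1, 1]` taken from the other domain yields `[0, 0, 1, 1, 0]`: the last point, mis-seeded by the other domain, is moved to the cluster it is actually closest to.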

    Multi-objective optimization in machine learning

    Advisor: Fernando José Von Zuben. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
Abstract: Regularized multinomial logistic regression, multi-label classification, and multi-task learning are examples of machine learning problems in which conflicting objectives, such as losses and regularization penalties, should be simultaneously minimized.
Therefore, the narrow perspective of looking for the learning model with the best performance should be replaced by the proposal and further exploration of multiple efficient learning models, each one characterized by a distinct trade-off among the conflicting objectives. Committee machines and a posteriori preferences of the decision-maker may be implemented to properly explore this diverse set of efficient learning models toward performance improvement. The whole multi-objective framework for machine learning is supported by three stages: (1) multi-objective modelling of each learning problem, explicitly highlighting the conflicting objectives involved; (2) given the multi-objective formulation of the learning problem, for instance, considering loss functions and penalty terms as conflicting objective functions, efficient solutions well distributed along the Pareto front are obtained by a deterministic and exact solver named NISE (Non-Inferior Set Estimation); (3) those efficient learning models are then subject to a posteriori model selection, or to ensemble filtering and aggregation. Given that NISE is restricted to two objective functions, an extension for many objectives, named MONISE (Many-Objective NISE), is also proposed here as an additional contribution that expands the applicability of the proposed framework. To properly assess the merit of our multi-objective approach, more specific investigations were conducted, restricted to regularized linear learning models: (1) What is the relative merit of the a posteriori selection of a single learning model, among the ones produced by our proposal, when compared with other single-model approaches in the literature? (2) Is the diversity level of the learning models produced by our proposal higher than the diversity level achieved by alternative approaches devoted to generating multiple learning models?
(3) What about the prediction quality of ensemble filtering and aggregation of the learning models produced by our proposal on: (i) multi-class classification, (ii) imbalanced classification, (iii) multi-label classification, (iv) multi-task learning, (v) multi-view learning? The deterministic nature of NISE and MONISE, their ability to properly deal with the shape of the Pareto front in each learning problem, and the guarantee of always obtaining efficient learning models are advocated here as being responsible for the promising results achieved in all three specific investigations.
Doctorate; Computer Engineering; Doctor in Electrical Engineering; 2014/13533-0; FAPES
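    NISE builds on weighted-sum scalarization: as it is commonly described, it starts from the minimizers of each objective alone, then repeatedly solves a weighted problem whose weights are normal to the segment joining two adjacent efficient points, and stops refining a segment once the new point lies within a tolerance of it. The sketch below follows that two-objective recursion on a toy problem with f1(x) = (x - 1)^2 and f2(x) = x^2, whose weighted-sum minimizer is x = w in closed form. The toy objectives, the tolerance and the function names are assumptions; this is not the thesis's implementation, and it does not cover MONISE.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (f1, f2) objective values


def toy_solver(w: float) -> Point:
    """Minimise w*f1(x) + (1 - w)*f2(x) for the toy objectives
    f1(x) = (x - 1)**2 and f2(x) = x**2; the closed-form minimiser is x = w."""
    x = w
    return ((x - 1.0) ** 2, x ** 2)


def nise(solve: Callable[[float], Point], tol: float = 1e-3) -> List[Point]:
    """Weighted-sum NISE sketch for two objectives; returns efficient points sorted by f1."""
    a = solve(1.0)  # minimiser of f1 alone
    b = solve(0.0)  # minimiser of f2 alone
    front = [a, b]

    def refine(p: Point, q: Point) -> None:
        # Weights (l1, l2) normal to segment p-q, where p has smaller f1 and larger f2.
        l1, l2 = p[1] - q[1], q[0] - p[0]
        if l1 <= 0 or l2 <= 0:
            return
        c = solve(l1 / (l1 + l2))
        # Distance of the new point below segment p-q, measured along the weight vector.
        err = (l1 * p[0] + l2 * p[1] - (l1 * c[0] + l2 * c[1])) / (l1 ** 2 + l2 ** 2) ** 0.5
        if err > tol:
            front.append(c)
            refine(p, c)
            refine(c, q)

    refine(a, b)
    return sorted(front)
```

    Because each new point comes from an exact weighted-sum solve, every point returned is efficient for a convex problem like this one; the tolerance only controls how densely the front is sampled.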

    Integrating Deep Learning into CAD/CAE System: Generative Design and Evaluation of 3D Conceptual Wheel

    Engineering design research that integrates artificial intelligence (AI) into computer-aided design (CAD) and computer-aided engineering (CAE) is an active area. This study proposes a deep learning-based CAD/CAE framework for the conceptual design phase that automatically generates 3D CAD designs and evaluates their engineering performance. The proposed framework comprises seven stages: (1) 2D generative design, (2) dimensionality reduction, (3) design of experiments in the latent space, (4) CAD automation, (5) CAE automation, (6) transfer learning, and (7) visualization and analysis. The framework is demonstrated through a road wheel design case study, which indicates that AI can be practically incorporated into an end-use product design project. Using this framework, engineers and industrial designers can jointly review a large number of generated 3D CAD models together with the AI-estimated engineering performance results, and identify conceptual design candidates for the subsequent detailed design stage.
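    Stage (3) samples points in the learned latent space before they are decoded into designs. The abstract does not say which design-of-experiments scheme is used; Latin hypercube sampling is a common choice, so the sketch below assumes it. The latent dimensionality and bounds are hypothetical, and the decoder that would turn each sample into a 2D design is not shown.

```python
import random
from typing import List, Sequence, Tuple


def latin_hypercube(n: int, bounds: Sequence[Tuple[float, float]],
                    seed: int = 0) -> List[List[float]]:
    """Draw n points so that, in every dimension, each of the n equal-width strata
    of [lo, hi) contains exactly one point."""
    rng = random.Random(seed)
    samples = [[0.0] * len(bounds) for _ in range(n)]
    for d, (lo, hi) in enumerate(bounds):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        for i, s in enumerate(strata):
            u = (s + rng.random()) / n  # uniform position inside stratum s of [0, 1)
            samples[i][d] = lo + u * (hi - lo)
    return samples
```

    Compared with plain uniform sampling, this guarantees coverage of each latent axis even with few samples, which matters when every sample later triggers a costly CAD and CAE run.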

    Semi-Automated Image Analysis for the Assessment of Megafaunal Densities at the Arctic Deep-Sea Observatory HAUSGARTEN

    Megafauna play an important role in benthic ecosystem function and are sensitive indicators of environmental change. Non-invasive monitoring of benthic communities can be accomplished by seafloor imaging. However, manual quantification of megafauna in images is labor-intensive, and therefore this organism size class is often neglected in ecosystem studies. Automated image analysis has been proposed as a possible approach, but the heterogeneity of megafaunal communities poses a non-trivial challenge for such techniques. Here, the potential of a generalized object detection architecture, referred to as iSIS (intelligent Screening of underwater Image Sequences), for quantifying a heterogeneous group of megafauna taxa is investigated. The iSIS system is tuned for a particular image sequence (i.e. a transect) using a small subset of the images, in which megafauna taxa positions were previously marked by an expert. To investigate the potential of iSIS and compare its results with those obtained from human experts, a group of eight different taxa from one camera transect of seafloor images taken at the Arctic deep-sea observatory HAUSGARTEN is used. The results show that inter- and intra-observer agreement among human experts varies considerably between species, with a similar degree of variation in the results obtained automatically by iSIS. Whilst some taxa (e.g. Bathycrinus stalks, Kolga hyalina, small white sea anemone) were well detected by iSIS (i.e. overall sensitivity: 87%, overall positive predictive value: 67%), some taxa, such as the small sea cucumber Elpidia heckeri, remain challenging for both human observers and iSIS.
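    Sensitivity and positive predictive value follow the standard definitions, TP / (TP + FN) and TP / (TP + FP). Computing them requires deciding which automatic detections correspond to expert markings; the sketch below assumes a greedy one-to-one matching within a fixed pixel radius, which is an illustrative choice rather than the protocol used in the study. The function names and the radius are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def match_detections(detections: Sequence[Point], annotations: Sequence[Point],
                     radius: float) -> Tuple[int, int, int]:
    """Greedy one-to-one matching, closest pairs first; returns (tp, fp, fn)."""
    pairs = sorted(
        (math.dist(d, a), i, j)
        for i, d in enumerate(detections)
        for j, a in enumerate(annotations)
        if math.dist(d, a) <= radius
    )
    used_d, used_a = set(), set()
    for _, i, j in pairs:
        if i not in used_d and j not in used_a:  # each marking matches at most once
            used_d.add(i)
            used_a.add(j)
    tp = len(used_d)
    return tp, len(detections) - tp, len(annotations) - tp


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of expert-marked animals that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def ppv(tp: int, fp: int) -> float:
    """Fraction of detections that correspond to an expert-marked animal."""
    return tp / (tp + fp) if tp + fp else 0.0
```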