Multi-Objective Genetic Algorithm for Multi-View Feature Selection
Multi-view datasets offer diverse forms of data that can enhance prediction
models by providing complementary information. However, using multi-view
data increases dimensionality, which poses significant challenges for
prediction models and can lead to poor generalization.
Therefore, relevant feature selection from multi-view datasets is important as
it not only addresses the poor generalization but also enhances the
interpretability of the models. Despite the success of traditional feature
selection methods, they struggle to leverage intrinsic information across
modalities, lack generalizability, and are tailored to specific
classification tasks. We propose a novel genetic algorithm strategy to overcome
these limitations of traditional feature selection methods for multi-view data.
Our proposed approach, called the multi-view multi-objective feature selection
genetic algorithm (MMFS-GA), simultaneously selects the optimal subset of
features within a view and between views under a unified framework. The MMFS-GA
framework demonstrates superior performance and interpretability for feature
selection on multi-view datasets in both binary and multiclass classification
tasks. The results of our evaluations on three benchmark datasets, including
synthetic and real data, show improvement over the best baseline methods. This
work provides a promising solution for multi-view feature selection and opens
up new possibilities for further research in multi-view datasets.
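The abstract does not give the details of MMFS-GA, so the following is only a minimal sketch of the general idea of multi-objective feature selection with a genetic algorithm. It assumes a chromosome that is a flat 0/1 mask over the concatenated features of all views, objectives that are all minimized, and a simple keep-the-front survival rule. The names `dominates`, `pareto_front`, `evolve`, and the toy objective are hypothetical and are not taken from the paper.

```python
import random

def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and
    strictly better somewhere (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population, objectives):
    """Return the individuals whose objective vectors are not dominated."""
    scored = [(ind, objectives(ind)) for ind in population]
    return [ind for ind, f in scored
            if not any(dominates(g, f) for _, g in scored)]

def crossover(a, b, rng):
    """One-point crossover of two equal-length masks (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return tuple(bit ^ (rng.random() < rate) for bit in ind)

def evolve(n_features, objectives, generations=30, pop_size=20, seed=0):
    """Toy evolutionary loop: breed children, keep the non-dominated
    individuals, and fill the remaining slots at random (an assumption)."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(crossover(rng.choice(pop), rng.choice(pop), rng), 0.1, rng)
                    for _ in range(pop_size)]
        merged = list(set(pop + children))
        front = pareto_front(merged, objectives)
        rest = [ind for ind in merged if ind not in front]
        rng.shuffle(rest)
        pop = (front + rest)[:pop_size]
    return pareto_front(pop, objectives)

# Hypothetical toy objective: features 0-2 belong to view A, 3-5 to view B;
# only features 0 and 4 are informative. Minimize (missed relevant, subset size).
RELEVANT = {0, 4}

def toy_objectives(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    return (len(RELEVANT - chosen), len(chosen))

if __name__ == "__main__":
    for mask in evolve(6, toy_objectives):
        print(mask, toy_objectives(mask))
```

In a real setting the first objective would be a cross-validated error of a classifier trained on the selected features, which is far more expensive than the toy function used here.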
Metallurgical Process Simulation and Optimization
Metallurgy involves the art and science of extracting metals from their ores and modifying the metals for use. With thousands of years of development, many interdisciplinary technologies have been introduced into this traditional and large-scale industry. In modern metallurgical practices, modelling and simulation are widely used to provide solutions in the areas of design, control, optimization, and visualization, and are becoming increasingly significant in the progress of digital transformation and intelligent metallurgy. This Special Issue (SI), entitled “Metallurgical Process Simulation and Optimization”, has been organized as a platform to present recent advances in the modelling and optimization of metallurgical processes, covering electric/oxygen steel-making, secondary metallurgy, (continuous) casting, and processing. Eighteen articles have been included that concern various aspects of the topic.
Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations
The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov
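Políticas de Copyright de Publicações Científicas em Repositórios Institucionais: O Caso do INESC TEC [Copyright Policies of Scientific Publications in Institutional Repositories: The Case of INESC TEC]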
The progressive transformation of scientific practices, driven by the development of new Information and Communication Technologies (ICT), has made it possible to increase access to information, gradually moving towards an opening of the research cycle. In the long term, this will help resolve a difficulty that researchers have faced: the existence of barriers, whether geographical or financial, that limit access. Although scientific production is largely dominated by large commercial publishers and subject to the rules they impose, the Open Access movement, whose first public declaration, the Budapest Declaration (BOAI), dates from 2002, proposes significant changes that benefit authors and readers. This movement has been gaining importance in Portugal since 2003, with the creation of the first institutional repository at the national level. Institutional repositories emerged as a tool for disseminating the scientific production of an institution, with the aim of opening up access to research results, both before publication and peer review (preprint) and after (postprint), and consequently increasing the visibility of the work carried out by a researcher and the respective institution. The study presented here, which analysed the copyright policies of the most relevant scientific publications of INESC TEC, showed not only that publishers are increasingly adopting policies that allow self-archiving of publications in institutional repositories, but also that there is still awareness-raising work to be done, not only among researchers but also for the institution and society as a whole.
The production of a set of recommendations, which include implementing an institutional policy that encourages the self-archiving in the repository of publications developed within the institution, serves as a starting point for a greater appreciation of the scientific production of INESC TEC.
Flexible Job-shop Scheduling Problem with Sequencing Flexibility: Mathematical Models and Solution Algorithms
Marketing strategists usually advocate increased product variety to better meet market demand. Furthermore, companies increasingly acquire more advanced manufacturing systems to handle the increased product mix. Manufacturing resources with different capabilities give a competitive advantage to the industry. Proper management of the current production resources is crucial for a thriving industry. The flexible job-shop scheduling problem (FJSP) is an extension of the classical job-shop scheduling problem (JSP) in which each operation can be performed by any machine from a set of capable candidate machines. An extended version of the FJSP, entitled FJSP with sequencing flexibility (FJSPS), is studied in this work. The extension expresses precedence between operations as a directed acyclic graph instead of a sequential order. In this work, a mixed integer linear programming (MILP) formulation is presented. A single-objective formulation to minimize the weighted tardiness for the FJSP with sequencing flexibility is proposed. A different objective, minimizing the makespan, is also considered. Due to the NP-hardness of the problem, a novel hybrid bacterial foraging optimization algorithm (HBFOA) is developed to tackle the FJSP with sequencing flexibility. It is inspired by the behaviour of E. coli bacteria and mimics their search for food. The HBFOA is enhanced with simulated annealing (SA) and has been packaged in the form of a decision support system (DSS). A case study of a small and medium-sized enterprise (SME) in the manufacturing industry is presented to validate the proposed HBFOA and MILP. Additional numerical experiments with instances from the literature are considered. The results demonstrate that the HBFOA outperformed the classical dispatching rules and the best integer solution of the MILP when minimizing the weighted tardiness, and offered comparable results for the makespan instances. In this dissertation, another critical aspect has been studied.
In industry, skilled workers are usually able to operate only a specific set of machines. Hence, managers need to decide the best assignment of operations to machines and workers, while also balancing the workload between workers and meeting the due dates. In this research, a multi-objective mathematical model that minimizes makespan, maximal worker workload, and weighted tardiness is developed. This model is entitled dual-resource FJSP with sequencing flexibility (DRFJSPS). It covers both the machine assignment and the worker selection. Due to the intractability of the DRFJSPS, an elitist non-dominated sorting genetic algorithm (NSGA-II) is developed to solve this problem efficiently. The algorithm provides a set of Pareto-optimal solutions that decision makers can use to evaluate the trade-offs between the conflicting objectives. New instances are introduced to demonstrate the applicability of the model and algorithm. A multi-random-start local search algorithm has been developed to assess the effectiveness of the adapted NSGA-II. The comparison of the solutions demonstrates that the modified NSGA-II provides a non-dominated efficient set in a reasonable time. Finally, a situation where multiple process plans are available for a specific job is considered. This scenario makes it possible to react to the current status of the shop, so that unpredictable circumstances (machine breakdown, current product mix, due dates, demand, etc.) can be handled. The choice of process plan also depends on its cost, so a balance between cost and meeting the due dates is required. A multi-objective mathematical model that minimizes makespan, total processing cost, and weighted tardiness is proposed to determine the sequence and the process plan to be used. This model is entitled flexible job-shop scheduling problem with sequencing and process plan flexibility (FJSP-2F).
New instances are generated to show the applicability of the model.
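To make the objectives named above concrete, here is a small sketch of how makespan and weighted tardiness could be evaluated for a given schedule whose precedence constraints form a directed acyclic graph. It is an illustration under assumptions, not the dissertation's MILP or HBFOA: the data layout (`schedule`, `precedence`, `job_ops`, `due`, `weight`) and the function names are hypothetical, and machine and worker capacity constraints are not checked.

```python
def respects_precedence(schedule, precedence):
    """schedule maps operation -> (start, end); precedence is a list of
    (before, after) edges of the DAG. An edge holds when `before`
    finishes no later than `after` starts."""
    return all(schedule[a][1] <= schedule[b][0] for a, b in precedence)

def makespan(schedule):
    """Completion time of the last operation in the shop."""
    return max(end for _, end in schedule.values())

def weighted_tardiness(schedule, job_ops, due, weight):
    """Sum over jobs of weight * max(0, completion - due date), where a
    job completes when its last operation ends."""
    total = 0
    for job, ops in job_ops.items():
        completion = max(schedule[op][1] for op in ops)
        total += weight[job] * max(0, completion - due[job])
    return total

if __name__ == "__main__":
    # Job J1: operations a and b may run in either order, both before c.
    precedence = [("a", "c"), ("b", "c")]
    job_ops = {"J1": ["a", "b", "c"], "J2": ["d"]}
    schedule = {"a": (0, 3), "b": (3, 5), "c": (5, 9), "d": (0, 4)}
    due = {"J1": 8, "J2": 5}
    weight = {"J1": 2, "J2": 1}
    print(respects_precedence(schedule, precedence))           # True
    print(makespan(schedule))                                  # 9
    print(weighted_tardiness(schedule, job_ops, due, weight))  # 2
```

In the example, J1 finishes one time unit late with weight 2 and J2 finishes early, so the weighted tardiness is 2. Swapping the order of operations a and b would also satisfy the DAG, which is the sequencing flexibility the abstract refers to.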
Active learning of link specifications using decision tree learning
In this work we presented an implementation that uses decision trees to learn highly accurate link specifications. We compared our approach with three state-of-the-art classifiers on nine datasets and showed that our approach gives comparable results in a reasonable amount of time. We also showed that we outperform the state of the art on four datasets by up to 30%, but are still slightly behind on average. The effect of user feedback on the active learning variant was examined with respect to the number of iterations needed to deliver good results. We showed that we can achieve F-scores above 0.8 on most datasets after 14 iterations.
Solving hard subgraph problems in parallel
This thesis improves the state of the art in exact, practical algorithms for finding subgraphs. We study maximum clique, subgraph isomorphism, and maximum common subgraph problems. These are widely applicable: within computing science, subgraph problems arise in document clustering, computer vision, the design of communication protocols, model checking, compiler code generation, malware detection, cryptography, and robotics; beyond, applications occur in biochemistry, electrical engineering, mathematics, law enforcement, fraud detection, fault diagnosis, manufacturing, and sociology. We therefore consider both the "pure" forms of these problems, and variants with labels and other domain-specific constraints.
Although subgraph-finding should theoretically be hard, the constraint-based search algorithms we discuss can easily solve real-world instances involving graphs with thousands of vertices and millions of edges. We therefore ask: is it possible to generate "really hard" instances for these problems, and if so, what can we learn? By extending research into combinatorial phase transition phenomena, we develop a better understanding of branching heuristics, as well as highlighting a serious flaw in the design of graph database systems.
This thesis also demonstrates how to exploit two of the kinds of parallelism offered by current computer hardware. Bit parallelism allows us to carry out operations on whole sets of vertices in a single instruction; this is largely routine. Thread parallelism, to make use of the multiple cores offered by all modern processors, is more complex. We suggest three desirable performance characteristics that we would like when introducing thread parallelism: lack of risk (parallel cannot be exponentially slower than sequential), scalability (adding more processing cores cannot make runtimes worse), and reproducibility (the same instance on the same hardware will take roughly the same time every time it is run). We then detail the difficulties in guaranteeing these characteristics when using modern algorithmic techniques.
Besides ensuring that parallelism cannot make things worse, we also increase the likelihood of it making things better. We compare randomised work stealing to new tailored strategies, and perform experiments to identify the factors contributing to good speedups. We show that whilst load balancing is difficult, the primary factor influencing the results is the interaction between branching heuristics and parallelism. By using parallelism to explicitly offset the commitment made to weak early branching choices, we obtain parallel subgraph solvers which are substantially and consistently better than the best sequential algorithms.
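The bit-parallel idea mentioned above can be illustrated with a small branch-and-bound maximum clique search in which each vertex set is stored as an integer bitmask. This is only a sketch and not the thesis's algorithm: it uses a plain cardinality bound rather than a colouring bound, the function name `max_clique` and the input layout are assumptions, and Python integers merely emulate machine-word bitsets.

```python
def max_clique(adj):
    """adj[v] is an int whose bit u is set when u and v are adjacent.
    Returns (size, bitmask) of one maximum clique."""
    best = [0, 0]  # incumbent size and vertex bitmask

    def popcount(x):
        return bin(x).count("1")

    def expand(current, size, candidates):
        if candidates == 0:
            if size > best[0]:
                best[0], best[1] = size, current
            return
        while candidates:
            # Bound: even taking every remaining candidate cannot
            # beat the incumbent, so stop exploring this branch.
            if size + popcount(candidates) <= best[0]:
                return
            v = candidates.bit_length() - 1   # highest-numbered candidate
            bit = 1 << v
            candidates ^= bit                 # remove v from the set
            # A single AND filters the whole candidate set down to
            # the neighbours of v: this is the bit-parallel step.
            expand(current | bit, size + 1, candidates & adj[v])

    expand(0, 0, (1 << len(adj)) - 1)
    return best[0], best[1]

def adjacency_from_edges(n, edges):
    """Build the bitmask adjacency list for an undirected graph."""
    adj = [0] * n
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u
    return adj

if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
    size, mask = max_clique(adjacency_from_edges(5, edges))
    print(size, [v for v in range(5) if mask >> v & 1])  # 3 [0, 1, 2]
```

The order in which candidates are tried acts as the branching heuristic; here it is simply highest vertex number first, which is an arbitrary choice made for brevity.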
Feature Selection using Tabu Search with Learning Memory: Learning Tabu Search
Feature selection in classification can be modeled as a combinatorial optimization problem. One of the main particularities of this problem is the large amount of time that may be needed to evaluate the quality of a subset of features. In this paper, we propose to solve this problem with a tabu search algorithm integrating a learning mechanism. To do so, we adapt to the feature selection problem a learning tabu search algorithm originally designed for a railway network problem in which the evaluation of a solution is time-consuming. Experiments are conducted and show the benefit of using a learning mechanism to solve hard instances from the literature.
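As an illustration of the basic technique, here is a minimal tabu search over feature subsets using single-feature flip moves. It does not reproduce the learning mechanism of the paper: the only concession to expensive evaluations is a cache of already-scored subsets, and the function name, the tenure value, the aspiration rule, and the toy scoring function are all assumptions.

```python
def tabu_search(n_features, evaluate, iterations=50, tenure=3):
    """Maximize evaluate(subset) over subsets of range(n_features)."""
    cache = {}

    def score(subset):
        # Evaluations are assumed expensive, so never repeat one.
        if subset not in cache:
            cache[subset] = evaluate(subset)
        return cache[subset]

    current = frozenset()
    best, best_score = current, score(current)
    tabu_until = {}  # feature -> last iteration at which flipping it is forbidden

    for it in range(iterations):
        moves = []
        for f in range(n_features):
            neighbour = current ^ {f}   # add f if absent, drop it if present
            s = score(neighbour)
            is_tabu = tabu_until.get(f, -1) >= it
            # Aspiration: a tabu move is allowed only if it beats the best.
            if is_tabu and s <= best_score:
                continue
            moves.append((s, f, neighbour))
        if not moves:
            break
        # Take the best admissible move even if it worsens the current
        # solution; ties are broken by the lowest feature index.
        s, f, current = max(moves, key=lambda m: (m[0], -m[1]))
        tabu_until[f] = it + tenure
        if s > best_score:
            best, best_score = current, s
    return best, best_score

if __name__ == "__main__":
    relevant = frozenset({1, 3})

    def toy_score(subset):
        # Hypothetical stand-in for classifier accuracy: reward relevant
        # features and penalize irrelevant ones.
        return len(subset & relevant) - 0.5 * len(subset - relevant)

    print(tabu_search(5, toy_score))  # (frozenset({1, 3}), 2.0)
```

In practice `evaluate` would train and validate a classifier on the selected features, which is why avoiding repeated evaluations matters.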
Tune your brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone largely unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyperparameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the interplay between the input corpus size, the chosen number of classes, and the quality of the resulting clusters, which affects any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
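For readers unfamiliar with the quantity involved, the sketch below computes the average mutual information between the classes of adjacent tokens, which is the quantity Brown clustering tries to keep high as it merges classes. It is not the paper's utility model; the function name, the toy corpus, and the hand-written class assignments are assumptions for illustration.

```python
import math
from collections import Counter

def class_bigram_mi(tokens, word_to_class):
    """Average mutual information (in bits) between the class of a token
    and the class of the next token, for a fixed word -> class mapping."""
    classes = [word_to_class[w] for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (a, b), n in bigrams.items():
        left[a] += n    # how often class a appears as the first element
        right[b] += n   # how often class b appears as the second element
    mi = 0.0
    for (a, b), n in bigrams.items():
        p_ab = n / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi

if __name__ == "__main__":
    corpus = "the cat sat the dog sat the cat ran".split()
    # Two hypothetical clusterings with different numbers of classes.
    coarse = {w: 0 for w in corpus}
    fine = {"the": 0, "cat": 1, "dog": 1, "sat": 2, "ran": 2}
    print(class_bigram_mi(corpus, coarse))  # one class carries no information
    print(class_bigram_mi(corpus, fine))
```

Comparing the value across different numbers of classes on the same corpus gives a rough feel for the trade-off the abstract discusses between class count and cluster quality.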
Procesamiento de información mediante Redes NeuroGliales Artificiales en clasificación y predicción [Information processing using Artificial NeuroGlial Networks in classification and prediction]
Recent research shows that astrocytes of the glial system play an essential role in information
processing in the brain, with bidirectional communication between neurons and astrocytes
(Tripartite Synapse). Since Connectionist Systems (CS) only take into account interconnected
artificial neurons, this multidisciplinary doctoral thesis investigates for the first time the
consequences of adding artificial astrocytes to them. To this end, an exhaustive analysis has been
carried out of the performance of new Artificial NeuroGlial Networks (ANGN) versus classic
multilayer Artificial Neural Networks (ANN). The results indicate that artificial astrocytes
improve the performance of ANN both when they potentiate and when they depress the synaptic
connections; that this improvement cannot be attributed to the increased number of processing
elements in the network, but rather to the properties of the astrocytes; that the efficacy of ANGN
with potentiation vs. ANN increases as the complexity of the network increases; and that the degree
of improvement induced by the astrocytes depends on the problem addressed and on the intrinsic
properties of the astrocytes.
It can be concluded that artificial astrocytes allow ANGN to be proposed as a possible new paradigm
in CS. Furthermore, they have made it possible to study brain phenomena not yet demonstrated,
collaborating with Neuroscience in the understanding of the nervous system.