33 research outputs found
Biclustering Performance Evaluation of Cheng and Church Algorithm and Iterative Signature Algorithm
Biclustering has been widely applied in recent years, and various algorithms have been developed to perform it in a range of applications. However, only a few studies have evaluated the performance of biclustering algorithms. Therefore, this study evaluates two biclustering algorithms: the Cheng and Church algorithm (CC algorithm) and the Iterative Signature Algorithm (ISA). The evaluation takes the form of a comparative study of biclustering results in terms of membership, characteristics, distribution of the results, and performance. The performance evaluation uses two evaluation functions: intra-bicluster and inter-bicluster. The results show that, from an intra-bicluster perspective, the optimal bicluster group of the CC algorithm tends to produce better bicluster quality than the ISA. In the inter-bicluster evaluation, the results of the two algorithms show a low level of similarity (20-31 percent), as indicated by differences in regional membership and in the characteristics of the identifying variables. The biclustering results of the CC algorithm tend to be homogeneous with local characteristics, whereas the ISA results tend to be heterogeneous with global characteristics. In addition, the ISA results are also robust.
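The abstract does not specify the inter-bicluster evaluation function used; a common choice for comparing bicluster memberships is the Jaccard index over the element sets of two biclusters. The sketch below is illustrative only, with hypothetical biclusters given as (row set, column set) pairs.

```python
def jaccard(a, b):
    """Jaccard index between two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def bicluster_similarity(bic1, bic2):
    """Similarity of two biclusters, each given as (row_set, col_set).

    The element set of a bicluster is the Cartesian product of its rows
    and columns; the two element sets are compared with the Jaccard index.
    """
    elems1 = {(r, c) for r in bic1[0] for c in bic1[1]}
    elems2 = {(r, c) for r in bic2[0] for c in bic2[1]}
    return jaccard(elems1, elems2)

# Two hypothetical biclusters sharing some rows and columns
b1 = ({1, 2, 3}, {"x", "y"})
b2 = ({2, 3, 4}, {"y", "z"})
print(bicluster_similarity(b1, b2))  # → 0.2
```

A value of 0.2 here would fall in the "low similarity" range (20-31 percent) that the study reports between the CC algorithm and ISA results.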
Tweet categorization by combining content and structural knowledge
Twitter is a worldwide social media platform where millions of people frequently express ideas and opinions
about any topic. This widespread success makes the analysis of tweets an interesting and
potentially lucrative task, since tweets are rarely objective and are a natural target for large-scale analysis. In
this paper, we explore the idea of integrating two fundamental aspects of a tweet, the proper textual
content and its underlying structural information, when addressing the tweet categorization task. Thus,
we not only analyze the textual content of tweets but also the structural information provided by the
relationship between tweets and users, and we propose different methods for effectively combining both
kinds of feature models extracted from the different knowledge sources. In order to test our approach, we
address the specific task of determining the political opinion of Twitter users within their political context,
observing that our most refined knowledge integration approach performs remarkably better (about 5 points higher) than the classic text-based model.
Ministerio de Economía y Competitividad TIN2012-38536-C03-02; Junta de Andalucía P11-TIC-7684
Statistical Techniques for Exploratory Analysis of Structured Three-Way and Dynamic Network Data.
In this thesis, I develop different techniques for the pattern
extraction and visual exploration of a collection of data matrices.
Specifically, I present methods to help home in on and visualize an
underlying structure and its evolution over ordered (e.g., time) or
unordered (e.g., experimental conditions) index sets. The first part
of the thesis introduces a biclustering technique for such three
dimensional data arrays. This technique is capable of discovering
potentially overlapping groups of samples and variables that evolve
similarly with respect to a subset of conditions. To facilitate and
enhance visual exploration, I introduce a framework that utilizes
kernel smoothing to guide the estimation of bicluster responses over
the array. In the second part of the thesis, I introduce two matrix
factorization models. The first is a data integration model that
decomposes the data into two factors: a basis common to all data
matrices, and a coefficient matrix that varies for each data matrix.
The second model is meant for visual clustering of nodes in dynamic
network data, which often contains complex evolving structure. Hence,
this approach is more flexible and additionally lets the basis evolve
for each matrix in the array. Both models utilize a regularization
within the framework of non-negative matrix factorization to encourage
local smoothness of the basis and coefficient matrices, which improves
interpretability and highlights the structural patterns underlying the
data, while mitigating noise effects. I also address computational
aspects of applying regularized non-negative matrix factorization
models to large data arrays by presenting multiple algorithms,
including an approximation algorithm based on alternating least
squares.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/99838/1/smankad_1.pd
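The factorization idea underlying both models can be illustrated with plain non-negative matrix factorization. The sketch below uses the classic Lee-Seung multiplicative updates on a single small matrix; the thesis's smoothness regularization and the ALS-based approximation algorithm are omitted, so this is only a minimal baseline, not the thesis's method.

```python
# Minimal NMF sketch: factor a non-negative matrix V as W @ H using
# multiplicative updates. Pure Python for clarity; small eps avoids
# division by zero.
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def nmf(V, rank, iters=500, eps=1e-9):
    """Factor V (m x n, non-negative) into W (m x rank) and H (rank x n)."""
    random.seed(0)
    m, n = len(V), len(V[0])
    W = [[random.random() for _ in range(rank)] for _ in range(m)]
    H = [[random.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(matmul(W, H), transpose(H))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H

V = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # exactly rank-1 input
W, H = nmf(V, rank=1)
R = matmul(W, H)
err = max(abs(R[i][j] - V[i][j]) for i in range(3) for j in range(2))
print(round(err, 4))  # reconstruction error is small for a rank-1 input
```

The regularized variants in the thesis add penalty terms to this objective to encourage local smoothness of W and H across the array of matrices.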
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of over
and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate over and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
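Finding (i), that algorithms produce widely varying partitions on the same input, is typically quantified with a partition-similarity measure. The paper does not name one here, so the sketch below uses the plain Rand index (the fraction of node pairs on which two partitions agree) as an illustrative choice.

```python
from itertools import combinations

def rand_index(part_a, part_b):
    """Rand index between two partitions given as {node: community} maps.

    A pair of nodes counts as an agreement when both partitions place
    them together, or both place them apart.
    """
    nodes = sorted(part_a)
    agree, total = 0, 0
    for u, v in combinations(nodes, 2):
        same_a = part_a[u] == part_a[v]
        same_b = part_b[u] == part_b[v]
        agree += same_a == same_b
        total += 1
    return agree / total

# Two hypothetical algorithm outputs on six nodes: the second algorithm
# splits one of the first algorithm's communities further.
p1 = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
p2 = {1: "a", 2: "a", 3: "c", 4: "b", 5: "b", 6: "b"}
print(rand_index(p1, p2))  # ≈ 0.867 (13 of 15 pairs agree)
```

Measures of this kind underlie the clustering of algorithms into high-level groups by the similarity of their outputs.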
Optimizing Information Gathering for Environmental Monitoring Applications
The goal of environmental monitoring is to collect information from the environment and to generate an accurate model for a specific phenomenon of interest. Environmental monitoring applications can be divided into two macro areas with different strategies for acquiring data from the environment. On one hand, fixed sensors deployed in the environment allow constant monitoring and a steady flow of information from a predetermined set of locations in space. On the other hand, mobile platforms make it possible to adaptively and rapidly choose the sensing locations based on need. For some applications (e.g. water monitoring) this can significantly reduce the costs of monitoring compared with classical analysis performed by human operators. However, both cases share a common problem: the data collection process must account for limited resources, and the key problem is to choose where to perform observations (measurements) in order to most effectively acquire information from the environment and decrease the uncertainty about the analyzed phenomena. We can generalize this concept under the name of information gathering. In general, maximizing the information obtained from the environment is an NP-hard problem, so optimizing the selection of the sampling locations is crucial in this context. For example, in the case of mobile sensors, the problem of reducing uncertainty about a physical process requires computing sensing trajectories constrained by the limited resources available, such as the battery lifetime of the platform or the computation power available on board. This problem is usually referred to as Informative Path Planning (IPP). In the other case, observation with a network of fixed sensors requires deciding beforehand the specific locations where the sensors have to be deployed.
Usually, the process of selecting a limited set of informative locations is performed by solving a combinatorial optimization problem that models the information gathering process. This thesis focuses on the above-mentioned scenario. Specifically, we investigate diverse problems and propose innovative algorithms and heuristics related to the optimization of information gathering techniques for environmental monitoring applications, for the deployment of both mobile and fixed sensors. Moreover, we also investigate the possibility of using a quantum computation approach in the context of information gathering optimization.
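A standard baseline for the fixed-sensor placement problem described above (hypothetical here, not one of the thesis's algorithms) is greedy selection of the location with the largest marginal information gain. For submodular objectives such as coverage, the greedy solution is guaranteed to be within a factor (1 - 1/e) of optimal.

```python
def greedy_placement(candidates, coverage, budget):
    """Greedily pick sensing locations that maximize marginal coverage.

    candidates: iterable of location ids
    coverage:   maps each location to the set of points it monitors
    budget:     number of sensors to place
    """
    chosen, covered = [], set()
    remaining = set(candidates)
    for _ in range(budget):
        best = max(remaining, key=lambda c: len(coverage[c] - covered))
        if not coverage[best] - covered:
            break  # no remaining location adds information
        chosen.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return chosen, covered

# Hypothetical water-monitoring example: each candidate location
# observes a subset of the points of interest along a river.
cov = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6, 7},
    "D": {1, 7},
}
sites, covered = greedy_placement(cov, cov, budget=2)
print(sites, sorted(covered))  # → ['C', 'A'] [1, 2, 3, 4, 5, 6, 7]
```

With a budget of two sensors, the greedy rule first picks "C" (four new points), then "A" (three more), covering every point of interest.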
A Comprehensive Survey on Enterprise Financial Risk Analysis: Problems, Methods, Spotlights and Applications
Enterprise financial risk analysis aims at predicting enterprises' future financial risk. Due to its wide application, enterprise financial risk analysis has always been a core research issue in finance. Although there are already some valuable and impressive surveys on risk management, these surveys introduce approaches in a relatively isolated way and lack the recent advances in enterprise financial risk analysis. Due to the rapid expansion of enterprise financial risk analysis, especially from the computer science and big data perspective, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing enterprise financial risk research, as well as to summarize and interpret the mechanisms and strategies of enterprise financial risk analysis in a comprehensive way, which may help readers gain a better understanding of the current research status and ideas. This paper provides a systematic literature review of over 300 articles published on enterprise risk analysis modelling over a 50-year period, 1968 to 2022. We first introduce the formal definition of enterprise risk as well as the related concepts. Then, we categorize the representative works in terms of risk type and summarize the three aspects of risk analysis. Finally, we compare the analysis methods used to model enterprise financial risk. Our goal is to clarify current cutting-edge research and its possible future directions, aiming to fully understand the mechanisms of enterprise risk communication and influence and their application to corporate governance, financial institutions, and government regulation.
Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement
Volume measurement plays an important role in the production and processing of food products. Various methods have been proposed to measure the volume of irregularly shaped food products based on 3D reconstruction. However, 3D reconstruction comes at a high computational cost, and some volume measurement methods based on it have low accuracy. Another approach measures the volume of objects with the Monte Carlo method, which performs volume measurements using random points: it only requires information on whether random points fall inside or outside an object and does not require a 3D reconstruction. This paper proposes volume measurement for irregularly shaped food products using a computer vision system, without 3D reconstruction, based on the Monte Carlo method with heuristic adjustment. Five images of a food product were captured using five cameras and processed to produce binary images. Monte Carlo integration with heuristic adjustment was performed to measure the volume based on the information extracted from the binary images. The experimental results show that the proposed method provides high accuracy and precision compared to the water displacement method. In addition, the proposed method is more accurate and faster than the space carving method.
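The inside/outside principle described above can be illustrated with a plain Monte Carlo volume estimate. The sketch below uses a unit sphere instead of a food product, with the inside test given analytically rather than from binary camera images, and omits the paper's heuristic adjustment.

```python
import random

def monte_carlo_volume(inside, bounds, n=200_000, seed=0):
    """Estimate the volume of a 3-D object from random points.

    inside: predicate telling whether a point lies inside the object
    bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)) bounding box
    The estimate is box_volume * (fraction of random points inside).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        hits += inside(p)
    box_volume = 1.0
    for lo, hi in bounds:
        box_volume *= hi - lo
    return box_volume * hits / n

# Unit sphere inside a [-1, 1]^3 box; true volume is 4*pi/3 ≈ 4.18879
est = monte_carlo_volume(lambda p: sum(c * c for c in p) <= 1.0,
                         ((-1, 1), (-1, 1), (-1, 1)))
print(round(est, 3))
```

In the paper's setting, the analytic predicate is replaced by checking whether the projection of a random point falls inside the object's silhouette in each of the five binary images.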
AIRO 2016. 46th Annual Conference of the Italian Operational Research Society. Emerging Advances in Logistics Systems Trieste, September 6-9, 2016 - Abstracts Book
The AIRO 2016 book of abstracts collects the contributions from the conference participants.
The AIRO 2016 Conference is a special occasion for the Italian Operations Research community, as the AIRO annual conferences reach their 46th edition in 2016. To reflect this special occasion, the Programme and Organizing Committee, chaired by Walter Ukovich, prepared a high-quality Scientific Programme, including the first initiative of AIRO Young, the new AIRO poster section that aims to promote the work of students, PhD students, and postdocs with an interest in Operations Research.
The Scientific Programme of the Conference offers a broad spectrum of contributions covering the variety of OR topics and research areas with an emphasis on “Emerging Advances in Logistics Systems”.
The event aims at stimulating integration of existing methods and systems, fostering communication amongst different research groups, and laying the foundations for OR integrated research projects in the next decade.
Distinct thematic sections run through the AIRO 2016 days, starting with an initial presentation of the objectives and features of the Conference. In addition, three invited, internationally known speakers, Gianni Di Pillo, Frédéric Semet and Stefan Nickel, will present Plenary Lectures, gathering AIRO 2016 participants together for key presentations on the latest advances and developments in OR research.
The synergistic effect of operational research and big data analytics in greening container terminal operations: a review and future directions
Container Terminals (CTs) are continuously presented with highly interrelated, complex, and uncertain planning tasks. The ever-increasing intensity of operations at CTs in recent years has also resulted in increasing environmental concerns, and they are experiencing unprecedented pressure to lower their emissions. Operational Research (OR), as a key player in the optimisation of the complex decision problems that arise from the quay- and land-side operations at CTs, has therefore been presented with new challenges and opportunities to incorporate environmental considerations into decision making and to better utilise the 'big data' that is continuously generated by the never-stopping operations at CTs. The state-of-the-art literature on OR's incorporation of environmental considerations and its interplay with Big Data Analytics (BDA) is, however, still very much underdeveloped, fragmented, and divergent, and a guiding framework is completely missing. This paper presents a review of the most relevant developments in the field and sheds light on promising research opportunities for better exploiting the synergistic effect of the two disciplines in addressing CT operational problems, while incorporating uncertainty and environmental concerns efficiently. The paper finds that while OR has thus far contributed to improving the environmental performance of CTs (rather implicitly), this can be stepped up much further with more explicit incorporation of environmental considerations and better exploitation of BDA predictive modelling capabilities. New interdisciplinary research at the intersection of conventional CT optimisation problems, energy management and sizing, and the adoption of net-zero technology and energy vectors is also presented as a prominent line of future research.
From metaheuristics to learnheuristics: Applications to logistics, finance, and computing
A large number of decision-making processes in strategic sectors such as transport and production involve NP-hard problems, which are frequently characterized by high levels of uncertainty and dynamism. Metaheuristics have become the predominant method for solving challenging optimization problems in reasonable computing times. However, they frequently assume that inputs, objective functions and constraints are deterministic and known in advance. These strong assumptions lead to work on oversimplified problems, and the solutions may demonstrate poor performance when implemented. Simheuristics, in turn, integrate simulation into metaheuristics as a way to naturally solve stochastic problems, and, in a similar fashion, learnheuristics combine statistical learning and metaheuristics to tackle problems in dynamic environments, where inputs may depend on the structure of the solution. The main contributions of this thesis include (i) a design for learnheuristics; (ii) a classification of works that hybridize statistical and machine learning with metaheuristics; and (iii) several applications in the fields of transport, production, finance and computing.
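The simheuristic idea, a metaheuristic that searches while a simulation evaluates each candidate under uncertainty, can be sketched minimally. The example below is hypothetical: simulated annealing over permutations, where the objective is a deterministic cost perturbed by multiplicative noise and each candidate is scored by averaging several simulation runs. It is an illustration of the scheme, not any algorithm from the thesis.

```python
import math
import random

def simulate_cost(solution, rng, runs=30):
    """Estimate the expected cost of a permutation under noise.

    The deterministic cost (sum of adjacent gaps) is perturbed by
    Gaussian multiplicative noise, as in a stochastic travel-time
    setting; averaging runs gives the simulation-based estimate.
    """
    base = sum(abs(a - b) for a, b in zip(solution, solution[1:]))
    return sum(base * (1 + rng.gauss(0, 0.1)) for _ in range(runs)) / runs

def simheuristic_anneal(n=8, iters=2000, seed=1):
    """Simulated annealing with simulation-based evaluation."""
    rng = random.Random(seed)
    current = list(range(n))
    rng.shuffle(current)
    cur_cost = simulate_cost(current, rng)
    best, best_cost = current[:], cur_cost
    temp = 10.0
    for _ in range(iters):
        cand = current[:]
        i, j = rng.sample(range(n), 2)
        cand[i], cand[j] = cand[j], cand[i]          # swap move
        cost = simulate_cost(cand, rng)
        if cost < cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand[:], cost
        temp *= 0.998                                 # geometric cooling
    return best, best_cost

best, cost = simheuristic_anneal()
print(best, round(cost, 2))
```

The minimum deterministic cost for this toy objective is 7 (the sorted order), so the search should end well below the cost of a random permutation (around 19 on average); in a full simheuristic, the final candidates would be re-evaluated with many more simulation runs to correct for the optimistic bias of noisy estimates.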