6 research outputs found

    An integer programming approach for the 2-class single-group classification problem

    Two sets X_B, X_R ⊆ R^d are linearly separable if their convex hulls are disjoint, implying that a hyperplane separating X_B from X_R exists. Such a hyperplane provides a method for classifying new points, according to the side of the hyperplane on which the new points lie. In this work we consider a particular case of the 2-class classification problem, which asks to select the maximum number of points from X_B and X_R in such a way that the selected points are linearly separable. We present an integer programming formulation for this problem, explore valid inequalities for the associated polytope, and develop a cutting plane approach coupled with a lazy-constraints scheme.
    Affiliation: Corrêa, Ricardo C.; Universidade Federal Rural Do Rio de Janeiro; Brazil
    Affiliation: Blaum, Manuela; Universidad Nacional de General Sarmiento; Argentina
    Affiliation: Marenco, Javier Leonardo; Universidad Nacional de General Sarmiento; Argentina
    Affiliation: Koch, Ivo Valerio; Universidad Nacional de General Sarmiento; Argentina
    Affiliation: Mydlarz, Marcelo; Universidad Nacional de General Sarmiento; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Latin-American Algorithms, Graphs and Optimization Symposium (LAGOS 2019), Belo Horizonte, Brazil
    Coordenação de Aperfeiçoamento de Pessoal de Nivel Superior; Conselho Nacional de Desenvolvimento Científico e Técnologico do Brasil; Universidade Federal de Minas Gerai
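    The separability notion in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' integer programming formulation: it only checks whether a given hyperplane w·x + b = 0 separates two point sets, and collects the points that a fixed hyperplane classifies correctly (such a selection is linearly separable by construction). The function names and the sample data are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def signed_side(w: Point, b: float, x: Point) -> float:
    """Return w.x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def separates(w: Point, b: float, blue: List[Point], red: List[Point]) -> bool:
    """True if every blue point is strictly positive and every red point strictly negative."""
    return (all(signed_side(w, b, x) > 0 for x in blue)
            and all(signed_side(w, b, x) < 0 for x in red))


def kept_points(w: Point, b: float, blue: List[Point],
                red: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Points lying on their own side of a fixed hyperplane.

    The returned selection is linearly separable (the hyperplane itself is a
    witness). Maximising its size over all hyperplanes is the hard part that
    the abstract addresses; this sketch does not attempt it.
    """
    kept_blue = [x for x in blue if signed_side(w, b, x) > 0]
    kept_red = [x for x in red if signed_side(w, b, x) < 0]
    return kept_blue, kept_red


# Hypothetical sample data in R^2.
blue_pts = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0)]
red_pts = [(-2.0, -1.0), (-1.0, -3.0)]
w_example, b_example = (1.0, 1.0), 0.0

all_separated = separates(w_example, b_example, blue_pts, red_pts)
sel_blue, sel_red = kept_points(w_example, b_example, blue_pts, red_pts)
print(all_separated, len(sel_blue) + len(sel_red))
```

    With this data the hyperplane x1 + x2 = 0 does not separate the full sets, because one blue point lies on the red side; dropping that point leaves a separable selection.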

    Towards a unified methodology for supporting the integration of data sources for use in web applications

    Organisations increasingly use web applications and web-based systems as an integral part of providing services. Examples include personalised dynamic user content on a website, social media plug-ins and web-based mapping tools. To be fully functional and of maximum use to the user, these applications require the integration of data from multiple sources. This thesis focuses on improving that integration process for web applications with multiple data sources. Integrating data from multiple sources is problematic for many reasons. Current integration methods tend to be domain-specific and application-specific. They are often complex, have compatibility issues with different technologies, lack maturity, are difficult to reuse, and do not accommodate new and emerging models and integration technologies. Technologies to achieve integration, such as brokers and translators, do exist, but because of their domain specificity they cannot serve as a generic solution for developing web applications that achieve the required integration outcomes. Because of these difficulties, and the wide variety of integration approaches, developers need assistance in selecting the integration approach most appropriate to their needs. This thesis proposes GIWeb, a unified top-down data integration methodology instantiated with a framework that will aid developers in their integration process. The framework acts as a conceptual structure to support the chosen technical approach and assists in the integration of data sources to support web application builders. The thesis presents the rationale for the framework, based on an examination of the range of applications, associated data sources and potential solutions. The framework is evaluated using four case studies.

    Conceptual modelling for integrated decision-making in process systems

    This Thesis addresses the systematic construction of Decision Making Models (DMMs) from the conceptualization stage to their application in specific situations, with special emphasis on the treatment of scenarios where there is a hierarchy of decision levels, common in Process Systems (PS). Although the methodologies developed are generic, the scope of this Thesis is limited to the perspective of Process Engineering. The central component required to construct a DMM is the conceptual description of the reality, which supports the systematisation of management procedures. During this description, two different domains can be identified: the PS Domain, useful to describe the structure of the process as such (physical reality and the way in which its elements are related), and the Management Domain, identified in this Thesis as associated with the Conceptual Constraints (CC) that describe the restrictions associated with the management of the process. In this way, the PS Domain includes concepts and relationships that appear in the control standards of the process followed by the company: the description of the process to be developed, the description of the physical equipment in which it is developed, and that of its interactions, giving rise to the control of the execution of the procedures; this domain should allow managing the construction, design, operation and control of any manufacturing system. On the other hand, the CC Domain contains the information associated with the concepts and relationships that must be fulfilled to ensure a coherent set of decisions, with the purpose of identifying and representing the systematics to follow during the decision-making process, giving rise to the conceptual representation of this system and, finally, the construction of the corresponding DMM. 
The first challenge addressed in this thesis is associated with the systematisation of conceptual modelling from semantic information, for the construction of ontologies from textual sources, and with a procedure to verify the internal coherence of these sources. This methodology has been applied to identify the essential concepts and relationships in the PS Domain, allowing the creation of a generic, common and shared model, unlike the existing models. In the next step, this PS Domain has been used to solve management problems in systems that comprise multi-level hierarchies. The resulting decision-making process allows integrating the decisions made at each level, ensuring their consistency from an approach that simultaneously considers the management of all available information (data and knowledge). On the other hand, the introduction of the necessary concepts and relationships to ensure the feasibility of the process management decisions, through the CC Domain, allows the development of systematic DMM creation procedures: this domain classifies the constraints (balances, sequence, etc.), adds abstract elements to them (e.g. produced and consumed amounts) and allows generalising the relation of its components with the information associated with the PS Domain. The last part of this Thesis deals with the integration of the PS and CC Domains, and their application for the generation of new decision-making systems. For this, algorithms have been designed that, starting from the previously identified and classified restrictions, and patterns of DMMs also previously identified from existing cases, exploit the information available through the instances in the PS Domain, to generate new DMMs according to the user's specifications. 
Its use is illustrated through cases from different environments, demonstrating the generalisation capacity of the created systematics.
Postprint (published version)

    Optimisation approaches for data mining in biological systems

    Advances in data acquisition technologies have generated massive amounts of data that present a considerable challenge for analysis. How to efficiently and automatically mine the data and extract the maximum value by identifying hidden patterns is an active research area, called data mining. This thesis tackles several problems in data mining, including data classification, regression analysis and community detection in complex networks, with considerable applications in various biological systems. First, the problem of data classification is investigated. An existing classifier has been adopted from the literature and two novel solution procedures have been proposed, which are shown to improve the predictive accuracy of the original method and significantly reduce the computational time. Disease classification using high-throughput genomic data is also addressed. To tackle the problem of analysing a large number of genes against a small number of samples, a new approach that incorporates extra biological knowledge and constructs higher-level composite features for classification has been proposed. A novel model has been introduced to optimise the construction of composite features. Subsequently, regression analysis is considered, where two piece-wise linear regression methods have been presented. The first method partitions one feature into multiple complementary intervals and fits each with a distinct linear function. The other method is a more generalised variant of the previous one and performs recursive binary partitioning that permits partitioning of multiple features. Lastly, community detection in complex networks is investigated, where a new optimisation framework is introduced to identify the modular structure hidden in directed networks via optimisation of modularity. A non-linear model is first proposed before its linearised variant is presented. 
The optimisation framework consists of two major steps: solving the non-linear model to identify a coarse initial partition, and then repeatedly solving the linearised models to refine the network partition.
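    The quantity being optimised in the community detection step can be made concrete with a small Python sketch. The abstract does not state the exact objective used in the thesis, so this assumes the commonly used directed modularity, Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j), where m is the number of edges. It only evaluates Q for a given partition and does not reproduce the non-linear or linearised optimisation models. The function name and the toy graph are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def directed_modularity(edges: List[Edge], community: Dict[Hashable, int]) -> float:
    """Evaluate directed modularity of a partition (assumed objective, see lead-in).

    edges: list of (source, target) pairs, one entry per directed edge.
    community: maps every node to its community label.
    """
    m = len(edges)
    out_deg: Dict[Hashable, int] = {}
    in_deg: Dict[Hashable, int] = {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1

    # Observed number of edges whose endpoints share a community.
    inside = sum(1 for u, v in edges if community[u] == community[v])

    # Expected number of such edges under the degree-preserving null model.
    nodes = list(community)
    expected = sum(
        out_deg.get(i, 0) * in_deg.get(j, 0)
        for i in nodes for j in nodes
        if community[i] == community[j]
    ) / m

    return (inside - expected) / m


# Hypothetical toy graph: two disjoint directed 2-cycles.
toy_edges = [(0, 1), (1, 0), (2, 3), (3, 2)]
q_split = directed_modularity(toy_edges, {0: 0, 1: 0, 2: 1, 3: 1})
q_merged = directed_modularity(toy_edges, {0: 0, 1: 0, 2: 0, 3: 0})
print(q_split, q_merged)
```

    On the toy graph, the partition that keeps each cycle in its own community scores higher than the single-community partition, which is the behaviour a modularity-maximising method exploits.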