
    A new mixed-integer programming model for irregular strip packing based on vertical slices with a reproducible survey

    The irregular strip-packing problem, also known as nesting or marker making, is defined as the automatic computation of a non-overlapping placement of a set of non-convex polygons onto a rectangular strip of fixed width and unbounded length, such that the strip length is minimized. Nesting methods based on heuristics are a mature technology and currently the only practical solution to this problem. However, recent performance gains of Mixed-Integer Programming (MIP) solvers, together with the known limitations of heuristic methods, have encouraged the exploration of exact optimization models for nesting during the last decade. Despite this research effort, the current family of exact MIP models for nesting cannot efficiently solve either large problem instances or instances containing polygons with complex geometries. In order to improve the efficiency of the current MIP models, this work introduces a new family of continuous MIP models based on a novel formulation of the No-Fit-Polygon Covering Model (NFP-CM), called NFP-CM based on Vertical Slices (NFP-CM-VS). Our new family of MIP models is based on a new convex decomposition of the feasible space of relative placements between pieces into vertical slices, together with a new family of valid inequalities, symmetry-breaking constraints, and variable eliminations derived from this decomposition. Our experiments show that our new NFP-CM-VS models outperform the current state-of-the-art MIP models. Finally, we provide a detailed reproducibility protocol and dataset, based on our Java software library, as supplementary material to allow the exact replication of our models, experiments, and results.
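
The core idea of a vertical-slice decomposition can be illustrated with a small sketch: the feasible space of relative placements between two pieces is covered by vertical slices, each a convex region described by an x-interval and linear lower/upper boundaries, and the MIP selects one active slice per pair of pieces via binary variables. The names and the toy two-slice region below are invented for illustration only; this is not the authors' NFP-CM-VS formulation.

```python
from dataclasses import dataclass

@dataclass
class Slice:
    # One convex vertical slice: x0 <= x <= x1, and
    # lo[0]*x + lo[1] <= y <= hi[0]*x + hi[1] (linear bounds).
    x0: float
    x1: float
    lo: tuple
    hi: tuple

def in_slice(s, x, y, eps=1e-9):
    return (s.x0 - eps <= x <= s.x1 + eps
            and s.lo[0] * x + s.lo[1] - eps <= y
            and y <= s.hi[0] * x + s.hi[1] + eps)

def placement_feasible(slices, x, y):
    # In the MIP, a binary variable per slice selects the active
    # disjunct of the convex decomposition; here we only test
    # membership in the union of slices.
    return any(in_slice(s, x, y) for s in slices)

# Toy feasible region: everything left or right of a unit-square
# no-fit polygon occupying 0 <= x <= 1 (illustration only).
left = Slice(-10.0, 0.0, (0.0, -10.0), (0.0, 10.0))
right = Slice(1.0, 10.0, (0.0, -10.0), (0.0, 10.0))

print(placement_feasible([left, right], -2.0, 0.5))  # non-overlapping
print(placement_feasible([left, right], 0.5, 0.5))   # overlap: infeasible
```

In the actual models, each slice contributes linear constraints activated by its binary variable, which is what makes the disjunction expressible in a continuous MIP.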

    HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset

    This work is a detailed companion reproducibility paper for the methods and experiments proposed by Lastra-Díaz and García-Serrano in (2015, 2016) [56–58], and introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML), based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity, based on HESML and ReproZip, with the aim of exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks of the current semantic measures libraries, especially their performance and scalability, as well as the difficulty of evaluating new methods and replicating most previous ones. The reproducible experiments introduced herein are encouraged by the lack of a set of large, self-contained and easily reproducible experiments aimed at replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and difficulties in reproducing previously reported methods and experiments. PosetHERep proposes a memory-efficient representation for taxonomies which scales linearly with the size of the taxonomy and provides an efficient implementation of most taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML provides an open framework to aid research in the area through a simpler and more efficient software architecture than the current software libraries. Finally, we show that HESML outperforms the state-of-the-art libraries, and that their performance and scalability can be significantly improved, without caching, by using PosetHERep.
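
The half-edge idea behind PosetHERep can be sketched, under heavy simplification, as storing each child-parent link as a pair of oriented half-edges, so that both upward and downward traversals are available and memory grows linearly with the number of edges. The class below is a hypothetical illustration of that principle only, not the actual PosetHERep data layout used by HESML.

```python
class Taxonomy:
    """Toy taxonomy storing each child-parent link as two oriented
    half-edges (child->parent and its parent->child twin); a sketch
    of the linear-memory idea, not the PosetHERep layout."""

    def __init__(self):
        self.up = {}     # node -> parents (one half-edge direction)
        self.down = {}   # node -> children (the twin half-edges)

    def add_node(self, node):
        self.up.setdefault(node, [])
        self.down.setdefault(node, [])

    def add_edge(self, child, parent):
        self.add_node(child)
        self.add_node(parent)
        self.up[child].append(parent)
        self.down[parent].append(child)

    def ancestors(self, node):
        # Iterative upward traversal; visits each ancestor once.
        seen, stack = set(), [node]
        while stack:
            for p in self.up[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

t = Taxonomy()
for child, parent in [("dog", "animal"), ("cat", "animal"),
                      ("animal", "entity")]:
    t.add_edge(child, parent)
print(sorted(t.ancestors("dog")))  # ['animal', 'entity']
```

Ancestor sets like this one are the basic primitive behind many taxonomy-based similarity measures and IC models.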

    Traditional knowledge of wild edible plants used in the northwest of the Iberian Peninsula (Spain and Portugal): a comparative study

    Background: We compare traditional knowledge and use of wild edible plants in six rural regions of the northwest of the Iberian Peninsula: Campoo, Picos de Europa, Piloña, Sanabria and Caurel in Spain, and Parque Natural de Montesinho in Portugal. Methods: Data on the use of 97 species were collected through semi-structured interviews with local informants, conducted under informed consent. A semi-quantitative approach was used to document the relative importance of each species and to indicate differences in the selection criteria for consuming wild food species in the regions studied. Results and discussion: The most significant species include many wild berries and nuts (e.g. Castanea sativa, Rubus ulmifolius, Fragaria vesca) and the most popular species in each food category (e.g. fruits or herbs used to prepare liqueurs, such as Prunus spinosa; vegetables, such as Rumex acetosa; condiments, such as Origanum vulgare; or plants used to prepare herbal teas, such as Chamaemelum nobile). The most important species in the study area as a whole are consumed at five or all six of the survey sites. Conclusion: Social, economic and cultural factors, such as poor communications, fads and direct contact with nature in everyday life, should be taken into account in determining why some wild foods and traditional vegetables have remained in use while others have not; these factors may be even more important than biological factors such as the richness and abundance of the wild edible flora. Although most are no longer consumed, demand is growing for those regarded as local specialties that reflect regional identity.

    Protocol for a reproducible experimental survey on biomedical sentence similarity.

    Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, the proposal of sentence similarity methods for the biomedical domain has attracted a lot of attention in recent years. However, most sentence similarity methods and experimental results reported in the biomedical domain cannot be reproduced, for reasons that include the copying of previous results without confirmation, the lack of source code and data to replicate both methods and experiments, and the lack of a detailed definition of the experimental setup, among others. As a consequence of this reproducibility gap, the state of the problem can neither be elucidated nor new lines of research be soundly established. On the other hand, there are other significant gaps in the literature on biomedical sentence similarity: (1) the evaluation of several unexplored sentence similarity methods that deserve to be studied; (2) the evaluation of an unexplored benchmark on biomedical sentence similarity, called Corpus-Transcriptional-Regulation (CTR); (3) a study of the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of sentence similarity methods; and finally, (4) the lack of software and data resources for the reproducibility of methods and experiments in this line of research. Having identified these open problems, this registered report introduces a detailed experimental setup, together with a categorization of the literature, to develop the largest, most up-to-date, and, for the first time, reproducible experimental survey on biomedical sentence similarity.
    Our experimental survey will be based on our own software replication and on the evaluation of all the methods being studied on the same software platform, which will be specially developed for this work and will become the first publicly available software library for biomedical sentence similarity. Finally, we will provide a very detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
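
As a concrete, deliberately simple example of the kind of method such a survey evaluates, the sketch below implements a generic bag-of-words similarity based on the normalized L1 (city-block) distance. It is an illustrative stand-in for a string-based measure, not a replication of any specific method from the survey, and the trivial tokenizer takes the place of the pre-processing and NER stages discussed above.

```python
import re
from collections import Counter

def tokens(sentence):
    # Minimal pre-processing: lowercasing + alphanumeric tokens.  A
    # biomedical pipeline would typically add NER-based normalization.
    return re.findall(r"[a-z0-9]+", sentence.lower())

def block_similarity(s1, s2):
    # 1 - normalized L1 (city-block) distance between bag-of-words
    # vectors; returns a score in [0, 1].
    c1, c2 = Counter(tokens(s1)), Counter(tokens(s2))
    l1 = sum(abs(c1[t] - c2[t]) for t in set(c1) | set(c2))
    total = sum(c1.values()) + sum(c2.values())
    return 1.0 - l1 / total if total else 1.0

print(block_similarity("BRCA1 regulates DNA repair",
                       "DNA repair is regulated by BRCA1"))  # 0.6
```

Because every evaluated method reduces to a function from two sentences to a score, running all methods on one shared platform, as proposed above, is what makes their results directly comparable.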

    Pearson (r) and Spearman (ρ) correlation values, harmonic score (h), and harmonic average (AVG) score obtained by the LiBlock method in combination with each NER tool, using the best pre-processing configuration detailed in Table 7.

    In addition, the last column (p-val) shows the p-values for the comparison of the LiBlock method with cTAKES against the remaining NER combinations.

    The statistical significance results.

    We provide a series of tables reporting the p-values for each pair of methods evaluated in this work as supplementary material. (PDF)

    Detailed setup for the ontology-based sentence similarity measures evaluated in this work.

    The evaluation of the methods using the Rada [69], coswJ&C [46], and Cai [68] word similarity measures uses a reformulation of the original path-based measures based on the new Ancestors-based Shortest-Path Length (AncSPL) algorithm [42].
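
The path-based idea behind such a reformulation can be sketched as follows: in a tree-shaped taxonomy, the shortest path between two concepts passes through one of their common ancestors, so it can be computed on the two ancestor sets alone. The function names below are invented, and the real AncSPL algorithm [42] differs in its details; this is only an illustration of the underlying idea.

```python
from collections import deque

def up_distances(parents, node):
    # BFS upward from `node`: minimal edge counts to every ancestor
    # (the node itself included at distance 0).
    dist = {node: 0}
    queue = deque([node])
    while queue:
        n = queue.popleft()
        for p in parents.get(n, []):
            if p not in dist:
                dist[p] = dist[n] + 1
                queue.append(p)
    return dist

def ancestral_path_length(parents, a, b):
    # Shortest a-b path routed through a common ancestor, computed
    # only on the two ancestor sets (in the spirit of AncSPL).
    da, db = up_distances(parents, a), up_distances(parents, b)
    return min(da[c] + db[c] for c in set(da) & set(db))

taxonomy = {"dog": ["animal"], "cat": ["animal"],
            "animal": ["root"], "plant": ["root"], "root": []}
print(ancestral_path_length(taxonomy, "dog", "cat"))    # 2
print(ancestral_path_length(taxonomy, "dog", "plant"))  # 3
```

Restricting the search to ancestor sets is what lets path-based measures like Rada's be evaluated without a shortest-path search over the whole taxonomy graph.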

    Pearson (r), Spearman (ρ) and harmonic (h) values obtained in our experiments from the evaluation of the ontology-based similarity methods detailed below on the MedSTSfull [52] dataset for each NER tool.


    Detail of the pre-processing configurations that are evaluated in this work.

    (*) WordPieceTokenizer [91] is used only for BERT-based methods [30, 31, 34, 62, 91–94, 99].

    Detailed setup for the sentence similarity methods based on pre-trained character, word (WE) and sentence (SE) embedding models evaluated herein.
