1,227 research outputs found

    Learning Relatedness Measures for Entity Linking

    Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating entity mentions. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
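
    As a rough illustration of the learning-to-rank formulation mentioned in the abstract (not the authors' actual pipeline), the sketch below trains a ranker over synthetic relatedness features for candidate entity pairs grouped by source entity; the feature set, labels and the choice of LightGBM are assumptions made for the example.

        # Hypothetical sketch: learning an entity-relatedness function as a
        # learning-to-rank problem. Features and data are illustrative only.
        import numpy as np
        import lightgbm as lgb

        rng = np.random.default_rng(0)

        # Each training "query" is a source entity; each candidate row holds features
        # describing its relatedness to one target entity (e.g. link overlap, shared
        # categories, text similarity). Labels mark pairs judged related (1) or not (0).
        n_groups, cands_per_group, n_features = 200, 10, 6
        X = rng.random((n_groups * cands_per_group, n_features))
        y = rng.integers(0, 2, size=n_groups * cands_per_group)
        group_sizes = [cands_per_group] * n_groups

        ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100, learning_rate=0.1)
        ranker.fit(X, y, group=group_sizes)

        # The learned model scores candidate target entities for a new source entity;
        # higher scores mean "more related" and can feed an entity-linking algorithm.
        scores = ranker.predict(rng.random((cands_per_group, n_features)))
        print(scores.argsort()[::-1])  # candidates ranked by predicted relatedness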

    SEL: A unified algorithm for entity linking and saliency detection

    The Entity Linking task consists in automatically identifying and linking the entities mentioned in a text to their URIs in a given Knowledge Base, e.g., Wikipedia. Entity Linking has a large impact on several text analysis and information retrieval tasks. The task is very challenging due to natural language ambiguity. However, not all the entities mentioned in a document have the same relevance and utility in understanding the topics being discussed. Thus, the related problem of identifying the most relevant entities present in a document, also known as Salient Entities, is attracting increasing interest. In this paper we propose SEL, a novel supervised two-step algorithm that comprehensively addresses both entity linking and saliency detection. The first step is based on a classifier aimed at identifying a set of candidate entities that are likely to be mentioned in the document, thus maximizing the precision of the method without hindering its recall. The second step is also based on machine learning, and aims at choosing, from the previous set, the entities that actually occur in the document. Indeed, we tested two different versions of the second step, one aimed at solving only the entity linking task, and the other that, besides detecting linked entities, also scores them according to their saliency. Experiments conducted on two different datasets show that the proposed algorithm outperforms state-of-the-art competitors, and is able to detect salient entities with high accuracy.
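
    A minimal sketch of a two-step candidate-filter-then-score pipeline in the spirit of the description above; the synthetic features, the 0.3 threshold, and the choice of gradient-boosted models are illustrative assumptions, not SEL's actual design.

        # Illustrative two-step pipeline: step 1 filters candidate entities with a
        # classifier, step 2 scores the survivors for saliency with a regressor.
        # Features and data are synthetic placeholders, not SEL's actual features.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

        rng = np.random.default_rng(42)
        n_candidates, n_features = 1000, 8

        X = rng.random((n_candidates, n_features))          # per-candidate features
        is_linked = rng.integers(0, 2, size=n_candidates)   # gold: entity occurs in doc
        saliency = is_linked * rng.random(n_candidates)     # gold: saliency score

        # Step 1: binary classifier keeps only candidates likely to be mentioned.
        step1 = GradientBoostingClassifier().fit(X, is_linked)
        keep = step1.predict_proba(X)[:, 1] > 0.3           # permissive cut-off favours recall

        # Step 2: regressor ranks the surviving candidates by saliency.
        step2 = GradientBoostingRegressor().fit(X[keep], saliency[keep])
        ranked = np.argsort(step2.predict(X[keep]))[::-1]
        print("top salient candidates (indices into kept set):", ranked[:5])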

    Role of F-18-FDG-PET/CT in restaging of patients affected by gastrointestinal stromal tumours (GIST)

    BACKGROUND: Gastrointestinal stromal tumours (GISTs) are a subset of mesenchymal tumours that represent the most common mesenchymal neoplasms of the gastrointestinal (GI) tract, yet account for less than 1% of all gastrointestinal tumours. MATERIAL AND METHODS: We retrospectively evaluated 19 patients (6 females and 13 males; median age: 61 years ± 15 standard deviation) affected by GIST histologically documented after surgical intervention or biopsy. RESULTS: F-18-FDG-PET/CT identified pathologic uptakes and was considered positive for neoplastic tissue in 10 patients (53%) and negative in 9 (47%), in concordance with radiological findings. CONCLUSIONS: F-18-FDG-PET/CT is a feasible, reliable, and accurate method to restage patients affected by previously histologically confirmed GIST, even in the absence of a staging study. Nuclear Med Rev 2010; 13, 2: 76–8

    ILMART: Interpretable Ranking with Constrained LambdaMART

    Interpretable Learning to Rank (LtR) is an emerging field within the research area of explainable AI, aiming at developing intelligible and accurate predictive models. While most of the previous research efforts focus on creating post-hoc explanations, in this paper we investigate how to train effective and intrinsically interpretable ranking models. Developing these models is particularly challenging, as it requires finding a trade-off between ranking quality and model complexity. State-of-the-art rankers, made of either large ensembles of trees or several neural layers, in fact exploit an unlimited number of feature interactions, making them black boxes. Previous approaches to intrinsically interpretable ranking models address this issue by avoiding interactions between features, thus incurring a significant performance drop with respect to full-complexity models. Conversely, ILMART, our novel and interpretable LtR solution based on LambdaMART, is able to train effective and intelligible models by exploiting a limited and controlled number of pairwise feature interactions. Exhaustive and reproducible experiments conducted on three publicly available LtR datasets show that ILMART outperforms the current state-of-the-art solution for interpretable ranking by a large margin, with a gain in nDCG of up to 8%.
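
    The idea of restricting feature interactions can be prototyped with off-the-shelf gradient boosting; the sketch below is a rough stand-in that uses LightGBM's lambdarank objective with interaction constraints on synthetic data. It is not ILMART itself, which selects which interactions to allow in a more principled, learned way.

        # Rough prototype of interaction-constrained ranking with a lambdarank
        # objective. The allowed feature pairs below are arbitrary choices made
        # for the example, not the interactions ILMART would learn to keep.
        import numpy as np
        import lightgbm as lgb

        rng = np.random.default_rng(7)
        n_queries, docs_per_query, n_features = 300, 20, 10
        X = rng.random((n_queries * docs_per_query, n_features))
        y = rng.integers(0, 5, size=n_queries * docs_per_query)   # graded relevance
        groups = [docs_per_query] * n_queries

        ranker = lgb.LGBMRanker(
            objective="lambdarank",
            n_estimators=200,
            # Trees may combine features only within these groups, so the model
            # uses a small, controlled set of pairwise interactions and stays
            # easier to inspect than an unconstrained ensemble.
            interaction_constraints=[[0, 1], [2, 3], [4], [5], [6], [7], [8], [9]],
        )
        ranker.fit(X, y, group=groups)
        print(ranker.predict(X[:docs_per_query]))   # scores for the first query's documents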

    Primary breast non-Hodgkin lymphoma. A report of an unusual case

    Although lymphomas are generally considered tumors of lymph nodes, about 25–40% arise at extranodal sites. We report a case of a 60-year-old female who developed a diffuse large B-cell non-Hodgkin lymphoma of the right breast in 2005, treated by chemo/radiotherapy, which relapsed in the same breast in 2007 and in the other breast in 2010. The patient underwent both radiologic and nuclear medicine studies.

    The modelling of particle resuspension in a turbulent boundary layer

    The work presented concerns the way small particles attached to a surface are resuspended when exposed to a turbulent flow. Of particular concern to this work is the remobilization of radioactive particles as a consequence of potential nuclear accidents. In this particular case the focus is on small particles, < 5 microns in diameter, where the principal force holding such particles onto a surface arises from van der Waals inter-molecular forces. Given its suitable treatment of the microphysics of small particles, the aim here was to develop improved versions of the Rock'n'Roll (R'n'R) model; the R'n'R model is based on a statistical approach to resuspension involving the rocking and rolling of a particle about surface asperities induced by the moments of the fluctuating drag forces acting on the particle close to the surface.

    Firstly, a force (moment) balance model has been modified by including the distribution of the aerodynamic force instead of considering only its mean value. It was also possible to improve the representation of the adhesive-force distribution, where it is customary to include a substantial reduction factor to take account of surface roughness. The R'n'R model is significantly improved by using realistic statistical fluctuations of both the stream-wise fluid velocity and acceleration close to the wall, obtained from Large Eddy Simulation (LES) and Direct Numerical Simulation (DNS) of turbulent channel flow; in the standard model a major assumption is that these obey a Gaussian distribution. The flow conditions are translated into the moments of the drag force acting on the particle attached to the surface (using O'Neill's formula for the aerodynamic drag forces in terms of the local flow velocities). In so doing, the influence of highly non-Gaussian forces (associated with the sweeping and ejection events in a turbulent boundary layer) on the resuspension rate has been examined, along with the sensitivity of the fluctuation statistics to LES and DNS. We have found, most importantly, that the statistics of both the fluctuating forces and their derivatives (normalized by their rms values) are noticeably independent of the normalized distance from the wall, y+, within the viscous sublayer (y+ < 6); if this were not the case, modelling fluctuations for different particle sizes would be far more complex. In particular, as a result of the analysis of our DNS/LES data, three distinct features of the modified R'n'R model have emerged as playing an important part in the resuspension. The first is the typical forcing frequency ω due to the turbulent (fluctuating) aerodynamic drag forces acting on the particle attached to a surface (in the modified R'n'R model based on the DNS results (y+ = 0.1) it is a factor of 4 greater than the value in the original model based on Hall's measurements of the lift force). This naturally has the significant effect of increasing the fraction resuspended at very short times (ωt ≲ 1) and is the controlling influence over the entire range of times from short- to long-term resuspension. The second is the value of the ratio of the root-mean-square (rms) drag force to its mean value, which in the modified model is nearly twice (1.8 times) that in the original. This feature of the model is largely responsible for the greater fraction resuspended after times ~ 1 s (times which are sufficient to include the transition period from short-term resuspension to long-term resuspension rates (~t⁻¹)).

    The third feature introduces changes in the resuspension because the distribution of aerodynamic drag forces in the modified model is distinctly non-Gaussian, behaving more like a Rayleigh distribution. This means that the distribution of the drag force decays much more slowly in the wings than the equivalent Gaussian (with the same rms), so that for very large values of the adhesive force / rms drag force ~ 8 (at the extreme end of the DNS measurements), the resuspension rate constant is a factor of 30 larger than that for an equivalent Gaussian model. Thus, although the fraction of particles resuspended is very small in these instances, the differences between the modified and original models can be very large. This is particularly important when we consider resuspension from multilayer deposits. When we consider these influences in the context of a broad range of adhesive forces due to surface roughness, we find that in general the modified model gives a resuspension fraction around 10% higher than that of the original R'n'R model (for an initial log-normal distribution of adhesive forces); however, the difference can become significant (3 to 7 times greater, depending on the range of values of the adhesive-force spread factor) when the friction velocity is small (i.e., when the resuspension fraction is smaller). As for the short-term resuspension rate, the difference between the modified and original models becomes significant when this is dominated by the typical forcing frequency (ω+ is 0.0413 for the original model, 0.08553 for the LES approach and 0.127143 for DNS at y+ = 6). The sensitivity to the adhesive-force spread factor has also been studied, and the results indicate that the modified model removes particles much more easily than the original model in conditions of small friction velocity and a smoother surface (i.e., small spread factor). Finally, in this phase of the work, the correlation between the distribution of the fluctuating force and its derivative has been checked for both LES and DNS statistics. The results demonstrate that this correlation has a very slight effect on particle resuspension compared with the result from the uncorrelated curve-fitted model. In view of recent numerical data for lift and drag forces in turbulent boundary layers (Lee & Balachandar), the lift and drag forces we have considered, and the impact of these data on predictions made by the non-Gaussian R'n'R model, are compared with those based on the O'Neill formula. The results indicate that, in terms of the long-term resuspension fraction, the difference is minor. It is concluded that as the particle size decreases, the L&B method will lead to progressively less long-term resuspension.

    Finally, the ultimate model developed in this work is a hybrid version of the R'n'R model adapted for application to multilayer deposits, based on the Friess and Yadigaroglu multilayer approach. The deposit is modelled as several overlying layers and the coverage effect (masking) of the deposit layers has been studied; in the first instance a monodisperse deposit with a coverage ratio factor was modelled, which was subsequently replaced by the more general case of a polydisperse deposit with a particle size distribution. The results indicate that, in general, as the number of modelled layers increases, the resuspension fraction of the whole deposit after a certain time decreases significantly. In other words, it takes a much longer time to resuspend a thicker deposit. Taking account of the particle size distribution slightly increases the short-term resuspension. However, this change decreases the long-term resuspension significantly. The model results have been compared with data from the STORM SR11 test (ISP-40) and the BISE experiments. In general, both comparisons indicate that with a smaller spread of the adhesive-force distribution (i.e., a narrower range of adhesive forces) the new multilayer model agrees very well with the experimental data. It can be inferred that multilayer deposits lead to much narrower distributions of adhesive force.
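
    The heavy-tail argument above can be illustrated numerically. This is an illustration only: the comparison below is a generic Rayleigh-versus-Gaussian exceedance calculation with matched mean and rms, not the R'n'R model, so it is not expected to reproduce the factor of ~30 quoted for the resuspension rate constant (which also involves the forcing frequency and force-derivative statistics).

        # Illustration only: a Rayleigh-like (heavy-tailed) drag-force distribution
        # exceeds a large adhesive force far more often than a Gaussian matched to
        # the same mean and rms fluctuation. The mean/rms values are arbitrary.
        import numpy as np
        from scipy import stats

        mean, rms = 1.0, 0.5          # assumed mean drag force and rms fluctuation

        gauss = stats.norm(loc=mean, scale=rms)
        # Choose the Rayleigh scale/offset so its mean and std match the Gaussian's.
        s = rms / np.sqrt(2.0 - np.pi / 2.0)
        rayl = stats.rayleigh(loc=mean - s * np.sqrt(np.pi / 2.0), scale=s)

        for k in (2, 4, 6, 8):        # adhesive force, in rms units above the mean
            f_adh = mean + k * rms
            pg, pr = gauss.sf(f_adh), rayl.sf(f_adh)
            print(f"k={k}: P_gauss={pg:.2e}  P_rayleigh={pr:.2e}  ratio={pr/pg:.1e}")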

    Load-Sensitive Selective Pruning for Distributed Search

    A search engine infrastructure must be able to provide the same quality of service to all queries received during a day. During normal operating conditions, the demand for resources is considerably lower than under peak conditions, yet an oversized infrastructure would result in an unnecessary waste of computing power. A possible solution adopted in this situation might consist of defining a maximum threshold processing time for each query and dropping queries for which this threshold is exceeded, leading to disappointed users. In this paper, we propose and evaluate a different approach where, given a set of query processing strategies with differing efficiency, each query is handled by a framework that sets a maximum query processing time and selects the best processing strategy for that query, such that the processing time for all queries is kept below the threshold. The processing time estimates used by the scheduler are learned from past queries. We experimentally validate our approach on 10,000 queries from a standard TREC dataset with over 50 million documents, and we compare it with several baselines. These experiments encompass testing the system under different query loads and different maximum tolerated query response times. Our results show that, at the cost of a marginal loss in terms of response quality, our search system is able to answer 90% of queries within half a second during times of high query volume.
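
    A toy sketch of the per-query selection logic described above (not the paper's actual framework): given processing-time predictions learned from past queries, pick the most effective strategy whose predicted time fits the current budget. Strategy names, quality scores and times are invented for the example.

        # Toy version of load-sensitive strategy selection: for each query, choose
        # the highest-quality processing strategy predicted to finish in budget.
        from dataclasses import dataclass

        @dataclass
        class Strategy:
            name: str
            quality: float        # expected effectiveness (higher is better)
            predicted_ms: float   # processing time predicted from past queries

        def select_strategy(strategies, budget_ms):
            """Return the best-quality strategy predicted to finish within budget_ms,
            falling back to the fastest one if none fits."""
            feasible = [s for s in strategies if s.predicted_ms <= budget_ms]
            if feasible:
                return max(feasible, key=lambda s: s.quality)
            return min(strategies, key=lambda s: s.predicted_ms)

        strategies = [
            Strategy("exhaustive", quality=1.00, predicted_ms=900),
            Strategy("dynamic-pruning", quality=0.97, predicted_ms=300),
            Strategy("aggressive-pruning", quality=0.90, predicted_ms=80),
        ]

        for budget in (1000, 500, 100):   # tighter budgets under heavier query load
            print(budget, "->", select_strategy(strategies, budget).name)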

    Landscapes, Art, Parks and Cultural Change

    A critical examination of a series of cultural interventions in the Matese mountains north of Naples, deploying artistic practices to re-signify the complexities and memories of the landscape.

    Method to rank documents by a computer, using additive ensembles of regression trees and cache optimisation, and search engine using such a method

    The present invention concerns a novel method to efficiently score documents (texts, images, audios, videos, and any other information file) by using a machine-learned ranking function modeled by an additive ensemble of regression trees. A main contribution is a new representation of the tree ensemble based on bitvectors, where the tree traversal, aimed to detect the leaves that contribute to the final scoring of a document, is performed through efficient logical bitwise operations. In addition, the traversal is not performed one tree after another, as one would expect, but it is interleaved, feature by feature, over the whole tree ensemble. Tests conducted on publicly available LtR datasets confirm unprecedented speedups (up to 6.5×) over the best state-of-the-art methods.
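
    A much-simplified sketch of the bitvector idea described above, for a single toy tree: every internal node stores a mask that clears the leaves of its left subtree, nodes whose test fails AND their mask into a per-tree bitvector of reachable leaves, and the exit leaf is the lowest surviving bit. The feature-by-feature interleaving over the whole ensemble, which is key to the reported speedups, is omitted; this is an illustration, not the patented implementation.

        # Much-simplified sketch of scoring one regression tree with bitvectors.
        # Leaves are numbered left to right; bit i of a bitvector stands for leaf i.
        from dataclasses import dataclass

        @dataclass
        class Node:
            feature: int      # feature tested by this internal node
            threshold: float  # test is "x[feature] <= threshold"
            mask: int         # bitvector with 0s on the leaves of this node's left subtree

        # Toy tree: the root tests f0; its children test f1 (left) and f2 (right).
        # Leaves (left to right): L0, L1, L2, L3 with the outputs below.
        nodes = [
            Node(feature=0, threshold=0.5, mask=0b1100),  # false -> L0, L1 unreachable
            Node(feature=1, threshold=0.3, mask=0b1110),  # false -> L0 unreachable
            Node(feature=2, threshold=0.7, mask=0b1011),  # false -> L2 unreachable
        ]
        leaf_outputs = [0.1, -0.4, 0.9, 0.3]

        def score(x):
            live = 0b1111                        # all leaves reachable at the start
            for n in nodes:                      # each failing node ANDs in its mask
                if x[n.feature] > n.threshold:
                    live &= n.mask
            exit_leaf = (live & -live).bit_length() - 1   # index of the lowest set bit
            return leaf_outputs[exit_leaf]

        print(score([0.9, 0.1, 0.2]))   # root test fails, lands in leaf L2 -> 0.9
        print(score([0.2, 0.9, 0.9]))   # lands in leaf L1 -> -0.4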