139 research outputs found

    Summarization of emergency news articles driven by relevance feedback

    Many articles covering the same news story are published daily by online newspapers and various social media. To ease news exploration, sentence-based summarization algorithms aim at automatically generating, for each news story, a summary consisting of the most salient sentences in the original articles. However, since sentence selection is error-prone, the automatically generated summaries are still subject to manual validation by domain experts. If the validation step focuses not only on pruning less relevant content but also on enriching summaries with missing yet relevant sentences, this activity may become extremely time-consuming. This paper focuses on summarizing news articles by means of an itemset-based technique. To tune summarizer performance, relevance feedback given on sentences is exploited to drive the generation of a new, more targeted summary. The feedback indicates the pertinence of the sentences already in the summary. Among the words and word combinations selected by the summarization model, those occurring in sentences with high feedback scores represent concepts that may be deemed particularly relevant. Therefore, they are exploited to drive the new sentence selection process. The proposed approach was tested on collections of news articles reporting emergency situations. The results show the effectiveness of the proposed approach.
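    As a rough illustration of the feedback loop described above (not the authors' actual itemset-based system; the function names and the single-word weighting are simplifications), higher-feedback summary sentences can boost the terms they contain when re-ranking candidate sentences:

```python
from collections import Counter

def rescore_sentences(candidates, summary_feedback, top_k=3):
    """Re-rank candidate sentences using relevance feedback.

    candidates: list of candidate sentence strings.
    summary_feedback: dict mapping each current summary sentence to a
        feedback score in [0, 1] assigned by the validator.
    """
    # Weight each word by the feedback of the summary sentences it occurs in
    word_weights = Counter()
    for sent, score in summary_feedback.items():
        for word in set(sent.lower().split()):
            word_weights[word] += score

    # Score candidates by the feedback-weighted words they contain
    def score(sent):
        return sum(word_weights[w] for w in set(sent.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:top_k]
```

    In the paper the relevant units are itemsets (word combinations) rather than single words; the same weighting idea applies.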

    Data mining by means of generalized patterns

    The thesis is mainly focused on the study and application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections, and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining process.
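    A toy example of the core idea, assuming a simple two-level taxonomy (all item names are invented): a pattern that is infrequent at the leaf level can become frequent once items are generalized to a higher abstraction level.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy mapping leaf items to a higher abstraction level
taxonomy = {"latte": "coffee", "espresso": "coffee",
            "croissant": "pastry", "muffin": "pastry"}

transactions = [["latte", "croissant"], ["espresso", "muffin"],
                ["latte", "muffin"]]

# Generalize each transaction, then count pair support at the new level
generalized = [sorted({taxonomy.get(i, i) for i in t}) for t in transactions]
support = Counter(pair for t in generalized for pair in combinations(t, 2))
print(support)  # ('coffee', 'pastry'): 3 -- a pattern hidden at the leaf level
```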

    Mining SpatioTemporally Invariant Patterns


    Density-based Clustering by Means of Bridge Point Identification

    Density-based clustering focuses on defining clusters consisting of contiguous regions characterized by similar densities of points. Traditional approaches identify core points first, whereas more recent ones initially identify the cluster borders and then propagate cluster labels within the delimited regions. Both strategies encounter issues in the presence of multi-density regions or when clusters have noisy borders. To overcome these issues, we present a new clustering algorithm that relies on the concept of bridge point. A bridge point is a point whose neighborhood includes points of different clusters. The key idea is to use bridge points, rather than border points, to partition points into clusters. We prove that correct bridge point identification yields a cluster separation consistent with the expectation. To correctly identify bridge points in the absence of a priori cluster information, we leverage an established unsupervised outlier detection algorithm. Specifically, we empirically show that, in most cases, the detected outliers are actually a superset of the bridge point set. Therefore, to define clusters, we spread cluster labels like wildfire until an outlier, acting as a candidate bridge point, is reached. The proposed algorithm performs statistically better than state-of-the-art methods on a large set of benchmark datasets and is particularly robust to the presence of intra-cluster multiple densities and noisy borders.
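    A minimal sketch of the strategy the abstract outlines, assuming scikit-learn's Local Outlier Factor as the off-the-shelf detector (the function name and parameter choices are illustrative, not the published algorithm):

```python
import numpy as np
from collections import deque
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def bridge_point_clustering(X, k=10):
    """Flag likely bridge points with an unsupervised outlier detector,
    then spread cluster labels through the k-NN graph without crossing
    the flagged points."""
    is_candidate = LocalOutlierFactor(n_neighbors=k).fit_predict(X) == -1
    _, knn = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)

    labels = np.full(len(X), -1)
    cluster = 0
    for seed in range(len(X)):
        if labels[seed] != -1 or is_candidate[seed]:
            continue
        # Breadth-first label spreading, stopped by candidate bridge points
        queue = deque([seed])
        labels[seed] = cluster
        while queue:
            p = queue.popleft()
            for q in knn[p]:
                if labels[q] == -1 and not is_candidate[q]:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels
```

    In this sketch the candidate bridge points keep label -1; a complete implementation would reassign them afterwards, e.g., to the nearest labeled cluster.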

    Inferring multilingual domain-specific word embeddings from large document corpora

    The use of distributed vector representations of words in Natural Language Processing has become established. To tailor general-purpose vector spaces to the context under analysis, several domain adaptation techniques have been proposed. They all require document corpora that are sufficiently large and tailored to the target domains. However, in several cross-lingual NLP domains, both large enough domain-specific document corpora and pre-trained domain-specific word vectors are hard to find for languages other than English. This paper tackles the aforementioned issue. It proposes a new methodology to automatically infer aligned domain-specific word embeddings for a target language on the basis of the general-purpose and domain-specific models available for a source language (typically, English). The proposed inference method relies on a two-step process, which first automatically identifies domain-specific words and then opportunistically reuses the non-linear space transformations applied to the word vectors of the source language to learn how to tailor the vector space of the target language to the domain of interest. The performance of the proposed method was validated via extrinsic evaluation on the established word retrieval task. To this end, a new benchmark multilingual dataset, derived from Wikipedia, has been released. The results confirm the effectiveness and usability of the proposed approach.
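    To make the transfer step concrete, here is a deliberately simplified sketch: the paper learns a non-linear transformation, whereas this version fits a linear least-squares map on the shared vocabulary (all argument names are hypothetical):

```python
import numpy as np

def transfer_domain_shift(src_general, src_domain, tgt_general):
    """Learn a map W from the source general-purpose space to the source
    domain-specific space, then apply it to the target language vectors.

    Each argument is a dict mapping word -> np.ndarray vector."""
    shared = sorted(set(src_general) & set(src_domain))
    A = np.stack([src_general[w] for w in shared])  # general vectors
    B = np.stack([src_domain[w] for w in shared])   # domain vectors
    W, *_ = np.linalg.lstsq(A, B, rcond=None)       # A @ W approximates B
    # Reuse the learned shift on the target-language general-purpose space
    return {w: v @ W for w, v in tgt_general.items()}
```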

    Automatic slides generation in the absence of training data

    Disseminating research findings is one of the key requirements for becoming a successful researcher, and presentation slides are the most common way to present paper content. To support researchers in slide preparation, the NLP research community has explored the use of summarization techniques to automatically generate a draft of the slides consisting of the most salient sentences or phrases. State-of-the-art methods adopt a supervised approach, which first estimates global content relevance using a set of training papers and slides, then performs content selection by also optimizing section-level coverage. However, in several domains and contexts there is a lack of training data, which hinders the use of supervised models. This paper addresses the above issue by applying unsupervised summarization methods. They are exploited to generate sentence-level summaries of the paper sections, which are then refined by applying an optimization step. Furthermore, the quality of the output slides is evaluated by taking the original paper structure into account as well. The results, achieved on a benchmark collection of papers and slides, show that unsupervised models performed better than supervised ones on specific paper facets, while remaining competitive in terms of overall quality score.
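    A minimal unsupervised baseline in the spirit of this approach, assuming one slide per section and TF-IDF centrality as the sentence scorer (the paper evaluates more sophisticated summarizers plus an additional optimization step):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def draft_slides(sections, sents_per_slide=3):
    """Draft one slide per section by ranking sentences on centrality.

    sections: dict mapping section title -> list of sentence strings."""
    slides = {}
    for title, sentences in sections.items():
        tfidf = TfidfVectorizer().fit_transform(sentences)
        # TF-IDF rows are L2-normalized, so dot products are cosine similarities
        sim = (tfidf @ tfidf.T).toarray()
        centrality = sim.sum(axis=1)  # how representative each sentence is
        top = np.argsort(-centrality)[:sents_per_slide]
        slides[title] = [sentences[i] for i in sorted(top)]  # keep paper order
    return slides
```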

    End-to-end Training For Financial Report Summarization

    Quoted companies are required to periodically publish financial reports in textual form. The annual financial reports typically include detailed financial and business information, thus giving relevant insights into company outlooks. However, a manual exploration of these financial reports can be very time-consuming, since most of the available information can be deemed non-informative or redundant by expert readers. Hence, increasing research interest has been devoted to automatically extracting domain-specific summaries that include only the most relevant information. This paper describes the SumTO system architecture, which addresses the Shared Task of the Financial Narrative Summarisation (FNS) 2020 contest. The main task objective is to automatically extract the most informative, domain-specific textual content from financial, English-written documents, creating a summary of each company report that covers all the business-relevant key points. To address this goal, we propose an end-to-end training method relying on Deep NLP techniques. The idea behind the system is to exploit the syntactic overlap between input sentences and ground-truth summaries to fine-tune pre-trained BERT embedding models, thus tailoring them to the specific context. The achieved results confirm the effectiveness of the proposed method, especially when the goal is to select relatively long text snippets.
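    The construction of the training signal can be sketched as follows, under the assumption that plain token recall approximates the syntactic overlap used in the paper (the threshold and names are illustrative):

```python
def overlap_labels(sentences, gold_summary, threshold=0.5):
    """Weakly label report sentences by lexical overlap with the
    ground-truth summary, producing targets for a sentence scorer."""
    gold_tokens = set(gold_summary.lower().split())
    labels = []
    for sent in sentences:
        tokens = set(sent.lower().split())
        # Fraction of the sentence's tokens covered by the gold summary
        recall = len(tokens & gold_tokens) / max(len(tokens), 1)
        labels.append(1 if recall >= threshold else 0)
    return labels
```

    The resulting binary labels can then be used to fine-tune a BERT-style model to score unseen sentences for inclusion in the summary.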

    Extractive Conversation Summarization Driven by Textual Entailment Prediction

    Summarizing conversations such as meetings, email threads, or discussion forums poses significant challenges in modeling the dialogue structure. Existing approaches mainly focus on premise-claim entailment relationships while neglecting contrasting or uncertain assertions. Furthermore, existing techniques are abstractive, thus requiring a training set consisting of human-generated summaries. With the twofold aim of enriching the dialogue representation and addressing conversation summarization in the absence of training data, we present an extractive conversation summarization pipeline. We explore the use of contradictions and neutral premise-claim relations, both within the same document and across different documents. The results achieved on four datasets covering different domains show that applying unsupervised methods on top of a refined premise-claim selection achieves competitive performance in most domains.
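    A small sketch of the premise-claim scoring step, assuming an off-the-shelf MNLI model from Hugging Face transformers stands in for the entailment predictor (the checkpoint below is a common public choice, not necessarily the one used in the paper):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"  # any MNLI-style checkpoint works here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def relation(premise, claim):
    """Score a premise-claim pair as contradiction / neutral / entailment."""
    inputs = tok(premise, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()
    labels = [model.config.id2label[i] for i in range(len(probs))]
    return dict(zip(labels, probs.tolist()))
```

    Contradiction and neutral scores, not just entailment, then feed the sentence selection step, matching the enriched dialogue representation described above.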

    Leveraging the explainability of associative classifiers to support quantitative stock trading

    Forecasting the stock market is particularly challenging due to the presence of a variety of interrelated economic and political factors. In recent years, the application of Machine Learning algorithms in quantitative stock trading systems has become established, as it enables a data-driven approach to investing in the financial markets. However, most professional traders still look for an explanation of automatically generated signals to verify their adherence to technical and fundamental rules. This paper presents an explainable approach to stock trading. It investigates the use of classification rules, which represent reliable associations between a set of discrete indicator values and the target class, to address next-day stock price prediction. Adopting associative classifiers in short-term stock trading not only provides reliable signals but also allows domain experts to understand the rationale behind signal generation. The backtesting of a state-of-the-art associative classifier, relying on a lazy pruning strategy, has shown promising performance in terms of equity appreciation and robustness of the trading system to market drawdowns.
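    A toy version of the rule-mining step, assuming the technical indicators have already been discretized (indicator names, thresholds, and the rule lengths are illustrative):

```python
from collections import Counter
from itertools import combinations

def mine_rules(days, min_support=5, min_conf=0.7):
    """Mine one- and two-item classification rules of the form
    {indicator=value, ...} -> next_day_label from discretized data.

    days: list of (indicator_dict, label) pairs, e.g.
          ({"RSI": "low", "MACD": "bearish"}, "UP")."""
    body_counts, rule_counts = Counter(), Counter()
    for indicators, label in days:
        items = tuple(sorted(indicators.items()))
        for r in (1, 2):
            for body in combinations(items, r):
                body_counts[body] += 1
                rule_counts[(body, label)] += 1

    # Keep rules meeting the support and confidence thresholds
    return {(body, label): cnt / body_counts[body]
            for (body, label), cnt in rule_counts.items()
            if cnt >= min_support and cnt / body_counts[body] >= min_conf}
```

    Each surviving rule pairs a readable antecedent, e.g. {RSI=low, MACD=bearish}, with a next-day class and a confidence value, which is what makes the generated signals explainable to domain experts.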