99 research outputs found

Leveraging full-text article exploration for citation analysis

Authors: Baralis E.; Cagliero L.; La Quatra M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Scientific articles often include in-text citations quoting from external sources. When the cited source is an article, the citation context can be analyzed by exploring the full text of that article. To quickly access the key information, researchers are often interested in identifying the sections of the cited article that are most pertinent to the text surrounding the citation in the citing article. This paper first performs a data-driven analysis of the correlation between the textual content of the sections of the cited article and the text snippet where the citation is placed. The results of the correlation analysis show that the title and abstract of the cited article are likely to include content highly similar to the citing snippet. However, the subsequent sections of the paper often include cited text snippets as well. Hence, there is a need to understand the extent to which exploring the full text of the cited article would help to gain insights into the citing snippet, considering also that full-text access could be restricted. To this end, we then propose a classification approach to automatically predict whether the cited snippets in the full text of the paper contain a significant amount of new content beyond the abstract and title. The proposed approach could support researchers in leveraging full-text article exploration for citation analysis. The experiments conducted on real scientific articles show promising results: the classifier has a 90% chance of correctly distinguishing between the full-text exploration and the title-and-abstract-only cases.
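The abstract does not say which similarity measure the authors use. As a minimal sketch of the general idea (scoring each section of a cited article against the citing snippet), the example below assumes TF-IDF vectors and cosine similarity; the function names and the toy texts are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF, so terms shared by every document keep a positive weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sections(citing_snippet, sections):
    """Return (section name, similarity) pairs, most similar first."""
    names = list(sections)
    vectors = tfidf_vectors([citing_snippet] + [sections[n] for n in names])
    snippet_vec, section_vecs = vectors[0], vectors[1:]
    scored = [(n, cosine(snippet_vec, vec)) for n, vec in zip(names, section_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
sections = {
    "abstract": "graph based summarization of documents using association rules",
    "methods": "frequent itemsets are mined from the term transaction dataset",
    "results": "evaluation on benchmark collections with rouge scores",
}
snippet = "prior work applies graph based summarization with association rules"
ranking = rank_sections(snippet, sections)
```

In this toy case the abstract ranks first, which mirrors the reported tendency of titles and abstracts to resemble the citing snippet; a classifier such as the one the paper proposes would then need features that capture when later sections add new content.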

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Open Access Repository

Predicting student academic performance by means of associative classification

Authors: Baralis E.; Cagliero L.; Canale L.; Farinetti L.; Venuto E.
Publication venue: MDPI AG
Publication date: 01/01/2021

The Learning Analytics community has recently paid particular attention to the early prediction of learners’ performance. An established approach entails training classification models on past learner-related data in order to predict the exam success rate of a student well before the end of the course. Early predictions allow teachers to put in place targeted actions, e.g., supporting at-risk students to avoid exam failures or course dropouts. Although several machine learning and data mining solutions have been proposed to learn accurate predictors from past data, the interpretability and explainability of the best-performing models are often limited. Therefore, in most cases, the reasons behind classifiers’ decisions remain unclear. This paper proposes an Explainable Learning Analytics solution to analyze learner-generated data acquired by our technical university, which relies on a blended learning model. It adopts classification techniques to predict early the success rate of about 5000 students who were enrolled in the first-year courses of our university. It proposes to apply associative classifiers at different time points and to explore the characteristics of the models that led to the assignment of pass or fail success rates. Thanks to their inherent interpretability, associative models can be manually explored by domain experts with the twofold aim of validating classifier outcomes through local rule-based explanations and identifying at-risk/successful student profiles by interpreting the global rule-based model. The results of an in-depth empirical evaluation demonstrate that associative models (i) perform as well as the best-performing classification models, and (ii) give relevant insights into the per-student success rate assignments.
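The abstract does not name the associative classifier that was used. The sketch below assumes a generic scheme in the spirit of class association rules: mine rules of the form itemset -> label that meet minimum support and confidence, order them, and classify with the first rule that matches. The thresholds, item names, and student records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, labels, min_support=0.2, min_confidence=0.7, max_len=2):
    """Mine class association rules (itemset -> label).

    records: list of sets of "attribute=value" items
    labels:  list of class labels aligned with records
    """
    n = len(records)
    antecedent_counts = Counter()
    rule_counts = Counter()
    for items, label in zip(records, labels):
        for size in range(1, max_len + 1):
            for itemset in combinations(sorted(items), size):
                antecedent_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1
    rules = []
    for (itemset, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    # Highest confidence first, then highest support, then shortest antecedent.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), r[0]))
    return rules


def classify(rules, items, default_label):
    """Return the predicted label and the rule that fired (a local explanation)."""
    for rule in rules:
        if set(rule[0]) <= items:
            return rule[1], rule
    return default_label, None


# Toy learner records, invented for illustration only.
records = [
    {"videos=high", "exercises=high"},
    {"videos=high", "exercises=low"},
    {"videos=low", "exercises=high"},
    {"videos=low", "exercises=low"},
    {"videos=low", "exercises=low"},
    {"videos=high", "exercises=high"},
]
labels = ["pass", "pass", "pass", "fail", "fail", "pass"]

rules = mine_rules(records, labels)
at_risk_label, at_risk_rule = classify(rules, {"videos=low", "exercises=low"}, "pass")
engaged_label, engaged_rule = classify(rules, {"videos=high", "exercises=low"}, "pass")
```

The rule returned alongside each prediction plays the role of the local explanation mentioned in the abstract, while the full ordered rule list is the global model a domain expert could inspect.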

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

GraphSum: discovering correlations among multiple terms for graph-based summarization

Authors: Baralis E.; Cagliero L.; Fiori A.; Mahoto N. A.
Publication venue: Elsevier BV
Publication date: 01/01/2013

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Generalized association rule mining with constraints

Authors: BARALIS E.; CAGLIERO L.; CERQUITELLI T.; GARZA P.
Publication venue: Elsevier BV

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Multi-document summarization based on the Yago ontology

Authors: Baralis E.; Cagliero L.; Fiori A.; Jabeen S.; Shah S.
Publication venue: Elsevier BV
Publication date: 01/01/2013

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Speech Analysis of Language Varieties in Italy

Authors: Baralis E.; Koudounas A.; Quatra M. L.; Siniscalchi S. M.
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2024

Italy exhibits rich linguistic diversity across its territory due to the distinct regional languages spoken in different areas. Recent advances in self-supervised learning provide new opportunities to analyze Italy's linguistic varieties using speech data alone. This includes the potential to leverage representations learned from large amounts of data to better examine nuances between closely related linguistic varieties. In this study, we focus on automatically identifying the geographic region of origin of speech samples drawn from Italy's diverse language varieties. We leverage self-supervised learning models to tackle this task and analyze differences and similarities between Italy's regional languages. In doing so, we also seek to uncover new insights into the relationships among these diverse yet closely related varieties, which may help linguists understand their interconnected evolution and regional development over time and space. To improve the discriminative ability of learned representations, we evaluate several supervised contrastive learning objectives, both as pre-training steps and as additional fine-tuning objectives. Experimental evidence shows that pre-trained self-supervised models can effectively identify regions from speech recordings. Additionally, incorporating contrastive objectives during fine-tuning improves classification accuracy and yields embeddings that distinctly separate regional varieties, demonstrating the value of combining self-supervised pre-training and contrastive learning for this task.
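The abstract mentions "several supervised contrastive learning objectives" without listing them. As an illustration only, the sketch below implements one widely used formulation of a supervised contrastive loss over L2-normalized embeddings, where samples sharing a region label act as positives. The temperature value, the two-dimensional embeddings, and the region labels are assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    For each anchor i, every other sample with the same label is a positive,
    and the softmax denominator runs over all samples except i itself.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        # Numerically stable log-sum-exp for the denominator.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings and hypothetical region labels, invented for illustration.
regions = ["north", "north", "south", "south"]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_clustered = supervised_contrastive_loss(clustered, regions)
loss_mixed = supervised_contrastive_loss(mixed, regions)
```

The loss is lower when same-region embeddings sit close together, which is the property that would encourage the well-separated regional clusters the abstract reports.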

Archivio istituzionale della ricerca - Università di Palermo

Occupational exposure to vibrations: some considerations with reference to the recently issued regulations

Authors: BARALIS L; CIGNA C; PATRUCCO M.; SAVOCA D
Publication venue: Fiordo s.r.l.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Machine learning supported next-maintenance prediction for industrial vehicles

Authors: Baralis E.; Cagliero L.; Loti R.; Mellia M.; Mishra S.; Salvatori L.; Vassio L.
Publication venue: CEUR-WS

Industrial and construction vehicles require tight periodic maintenance operations. Their schedule depends on vehicle characteristics and usage. The latter can be accurately monitored through various on-board devices, enabling the application of Machine Learning techniques to analyze vehicle usage patterns and design predictive analytics. This paper presents a data-driven application to automatically schedule the periodic maintenance operations of industrial vehicles. It aims to predict, for each vehicle and date, the actual number of days remaining until the next maintenance is due. Our Machine Learning solution is designed to address the following challenges: (i) the non-stationarity of the per-vehicle utilization time series, which limits the effectiveness of classic scheduling policies, and (ii) the potential lack of historical data for those vehicles that have recently been added to the fleet, which hinders the learning of accurate predictors from past data. Preliminary results collected in a real industrial scenario demonstrate the effectiveness of the proposed solution on heterogeneous vehicles. The system we propose here is currently under deployment, enabling further testing and tuning.
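To make the prediction target concrete, the sketch below derives "remaining days until the next maintenance" from a daily utilization series. It assumes, purely for illustration, that maintenance becomes due each time a fixed number of usage hours has accumulated since the previous service; the abstract does not state the actual scheduling rule, and the function name and numbers are hypothetical.

```python
def remaining_days_until_maintenance(daily_hours, service_interval_hours):
    """Label each day with the number of days until maintenance is next due.

    Days after the last due date in the series get None, because the series
    ends before the next maintenance is reached.
    """
    # Step 1: find the days on which maintenance becomes due.
    due_days = []
    accumulated = 0.0
    for day, hours in enumerate(daily_hours):
        accumulated += hours
        if accumulated >= service_interval_hours:
            due_days.append(day)
            accumulated = 0.0  # simplification: the usage counter restarts at zero

    # Step 2: walk backwards and record the distance to the next due day.
    targets = [None] * len(daily_hours)
    next_due = None
    for day in range(len(daily_hours) - 1, -1, -1):
        if due_days and due_days[-1] == day:
            next_due = due_days.pop()
        if next_due is not None:
            targets[day] = next_due - day
    return targets


# Toy utilization series (hours per day), invented for illustration only.
usage = [4, 0, 6, 2, 8, 1, 0, 9, 3]
targets = remaining_days_until_maintenance(usage, service_interval_hours=10)
```

Because daily usage is irregular, the gaps between due dates vary, which illustrates why a fixed calendar policy fits such series poorly. A regressor trained on usage features would try to predict these target values ahead of time.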

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

NEMICO: Mining network data through cloud-based data mining techniques

Authors: Baralis E.; Cagliero L.; Cerquitelli T.; Chiusano S.; Garza P.; Grimaudo L.; Pulvirenti F.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Thanks to the rapid advances in Internet-based applications and in data acquisition and storage technologies, petabyte-sized network data collections are becoming more and more common, thus prompting the need for scalable data analysis solutions. By leveraging today’s ubiquitous many-core computer architectures and the increasingly popular cloud computing paradigm, the applicability of data mining algorithms to these large volumes of network data can be scaled up to gain interesting insights. This paper proposes NEMICO, a comprehensive Big Data mining system targeted at network traffic flow analyses (e.g., traffic flow characterization, anomaly detection, multiple-level pattern mining). NEMICO comprises new approaches that contribute to a paradigm shift in distributed data mining by addressing the most challenging issues related to Big Data, such as data sparsity, horizontal scaling, and parallel computation.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Heterogeneous industrial vehicle usage predictions: A real case

Authors: Amparore E.; Baralis E.; Cagliero L.; Loti R.; Markudova D.; Mellia M.; Salvatori L.; Vassio L.
Publication venue: CEUR-WS

Predicting future vehicle usage based on the analysis of CAN bus data is a popular data mining application. Many of the usage indicators, like the utilization hours, are non-stationary time series. To predict their values, recent approaches based on Machine Learning combine multiple data features describing engine status, travels, and roads. While most of the proposed solutions address car and truck usage prediction, a smaller body of work has been devoted to industrial and construction vehicles, which are usually characterized by more complex and heterogeneous usage patterns. This paper describes a real case study performed on a 4-year CAN bus dataset collecting usage data about 2 250 construction vehicles of various types and models. We apply a statistics-based approach to select the most discriminating data features. Separately for each vehicle, we train regression algorithms on historical data enriched with contextual information. The achieved results demonstrate the effectiveness of the proposed solution.
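The abstract names neither the statistical test nor the regression algorithms. The sketch below stands in for that two-step pipeline under explicit assumptions: features are ranked by absolute Pearson correlation with the target, and a one-variable least-squares line is fitted separately for each vehicle. The vehicle identifiers, feature names, and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, top_k=1):
    """Rank feature names by absolute correlation with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:top_k]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line to a constant feature")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def train_per_vehicle(fleet):
    """Train one model per vehicle: (selected feature, slope, intercept)."""
    models = {}
    for vehicle_id, data in fleet.items():
        best = select_features(data["features"], data["target"])[0]
        models[vehicle_id] = (best, *fit_line(data["features"][best], data["target"]))
    return models


# Toy fleet, invented for illustration only.
fleet = {
    "excavator_1": {
        "features": {"engine_hours_prev_day": [1, 2, 3, 4], "ambient_temp": [20, 18, 21, 19]},
        "target": [2, 4, 6, 8],
    },
    "loader_7": {
        "features": {"engine_hours_prev_day": [5, 5, 6, 5], "ambient_temp": [10, 20, 30, 40]},
        "target": [1, 2, 3, 4],
    },
}
models = train_per_vehicle(fleet)
```

Training per vehicle lets each model pick up a different feature, which is one simple way to accommodate the heterogeneous usage patterns the abstract describes.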

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)