7 research outputs found
Abstractive Summarization as Augmentation for Document-Level Event Detection
Transformer-based models have consistently produced substantial performance
gains across a variety of NLP tasks, compared to shallow models. However, deep
models are orders of magnitude more computationally expensive than shallow
models, especially on tasks with large sequence lengths, such as document-level
event detection. In this work, we attempt to bridge the performance gap between
shallow and deep models on document-level event detection by using abstractive
text summarization as an augmentation method. We augment the DocEE dataset by
generating abstractive summaries of examples from low-resource classes. For
classification, we use linear SVM with TF-IDF representations and RoBERTa-base.
We use BART for zero-shot abstractive summarization, making our augmentation
setup less resource-intensive compared to supervised fine-tuning. We experiment
with four decoding methods for text generation, namely beam search, top-k
sampling, top-p sampling, and contrastive search. Furthermore, we investigate
the impact of using document titles as additional input for classification. Our
results show that using the document title offers 2.04% and 3.19% absolute
improvement in macro F1-score for linear SVM and RoBERTa, respectively.
Augmentation via summarization further improves the performance of linear SVM
by about 0.5%, varying slightly across decoding methods. Overall, our
augmentation setup yields insufficient improvements for linear SVM compared to
RoBERTa.
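The macro F1-score reported in the abstract above weights every class equally, which is why performance on low-resource classes matters for it. A minimal sketch of the metric, assuming single-label classification; the function name `macro_f1` and the toy labels are illustrative and not taken from the paper:

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a frequent class "a" and a rare class "b".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10  # the classifier never predicts the rare class
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy case accuracy is 0.8, yet the macro F1-score is only about 0.44 because the rare class contributes an F1 of zero.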
Extraction of Articles from News Portals Using Machine Learning
Rapid adoption of new technologies, primarily smart mobile devices such as
smartphones and tablets, has enabled continuous and uninterrupted
communication worldwide. Communicating through various media, such as text
and images, creates a continuous stream of data that needs to be processed
and analyzed. Information extraction is an important field concerned with
extracting structured information from raw, unstructured textual sources;
it draws its methods from natural language processing and artificial
intelligence. One standard data source is articles from online news
portals. Standard approaches to extracting articles from news portals rely
on heuristics and hand-crafted rules. This thesis explores combining
traditional algorithms with computer vision to improve article extraction
from news portals.