SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling
For over a decade, machine learning has been used to extract
opinion-holder-target structures from text to answer the question "Who
expressed what kind of sentiment towards what?". Recent neural approaches do
not outperform the state-of-the-art feature-based models for Opinion Role
Labeling (ORL). We suspect this is due to the scarcity of labeled training data
and address this issue using different multi-task learning (MTL) techniques
with a related task which has substantially more data, i.e. Semantic Role
Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling both holders and targets on the development and test sets. We find that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs best. With a deeper analysis we determine what works and what might be done to make further improvements for ORL.
Comment: Published in NAACL 2018
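A minimal sketch of the vanilla hard-parameter-sharing setup described above: one shared encoder whose features serve both tasks, with only the output layers kept task-specific. It is written in PyTorch for illustration; the layer sizes, label counts, and alternating-batch training scheme are assumptions, not the paper's configuration.

```python
# Illustrative sketch of "vanilla" multi-task learning for ORL and SRL:
# a shared BiLSTM encoder with two task-specific tagging heads.
import torch
import torch.nn as nn

class SharedTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=200, n_orl_labels=5, n_srl_labels=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared encoder: both tasks read the same contextual features.
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        # Only the output layers are task-specific.
        self.orl_head = nn.Linear(2 * hidden, n_orl_labels)
        self.srl_head = nn.Linear(2 * hidden, n_srl_labels)

    def forward(self, token_ids, task):
        states, _ = self.encoder(self.embed(token_ids))
        head = self.orl_head if task == "orl" else self.srl_head
        return head(states)  # per-token label scores

# Training alternates batches from the two tasks; SRL batches dominate because
# SRL has far more labeled data (the sampling ratio is an assumption here).
model = SharedTagger(vocab_size=10000)
loss_fn = nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randint(0, 10000, (8, 30))   # fake token ids
gold = torch.randint(0, 20, (8, 30))       # fake SRL labels
scores = model(batch, task="srl")
loss = loss_fn(scores.view(-1, scores.size(-1)), gold.view(-1))
loss.backward()
optim.step()
```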
A Mention-Ranking Model for Abstract Anaphora Resolution
Resolving abstract anaphora is an important, but difficult task for text
understanding. Yet, with recent advances in representation learning this task
becomes a more tangible aim. A central property of abstract anaphora is that it
establishes a relation between the anaphor embedded in the anaphoric sentence
and its (typically non-nominal) antecedent. We propose a mention-ranking model
that learns how abstract anaphors relate to their antecedents with an
LSTM-Siamese Net. We overcome the lack of training data by generating
artificial anaphoric sentence--antecedent pairs. Our model outperforms
state-of-the-art results on shell noun resolution. We also report first
benchmark results on an abstract anaphora subset of the ARRAU corpus. This
corpus presents a greater challenge due to a mixture of nominal and pronominal
anaphors and a greater range of confounders. We found model variants that
outperform the baselines for nominal anaphors, without training on individual
anaphor data, but still lag behind for pronominal anaphors. Our model selects
syntactically plausible candidates and -- if disregarding syntax --
discriminates candidates using deeper features.
Comment: In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, Denmark
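A minimal sketch of a mention-ranking setup with a Siamese LSTM in the spirit of the abstract: the anaphoric sentence and every candidate antecedent are encoded by the same LSTM, each pair is scored, and a margin ranking loss pushes the true antecedent above the other candidates. The dimensions, the bilinear scorer, and the loss are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative Siamese-LSTM mention ranker: shared encoder, pairwise scores,
# max-margin ranking over candidate antecedents.
import torch
import torch.nn as nn

class SiameseRanker(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)  # shared (Siamese) weights
        self.score = nn.Bilinear(hidden, hidden, 1)                # pair scorer (assumption)

    def encode(self, ids):
        _, (h, _) = self.encoder(self.embed(ids))
        return h[-1]                                               # final hidden state

    def forward(self, anaphor_ids, candidate_ids):
        a = self.encode(anaphor_ids)                               # (1, hidden)
        c = self.encode(candidate_ids)                             # (n_candidates, hidden)
        return self.score(a.expand_as(c), c).squeeze(-1)           # one score per candidate

model = SiameseRanker(vocab_size=5000)
anaphor = torch.randint(0, 5000, (1, 20))        # fake anaphoric sentence
candidates = torch.randint(0, 5000, (4, 15))     # fake candidate antecedents
scores = model(anaphor, candidates)
# Max-margin ranking: the gold antecedent (index 0 here) should outscore the rest.
loss = torch.clamp(1.0 + scores[1:] - scores[0], min=0).sum()
loss.backward()
```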
How Much Consistency Is Your Accuracy Worth?
Contrast set consistency is a robustness measurement that evaluates the rate
at which a model correctly responds to all instances in a bundle of minimally
different examples relying on the same knowledge. To draw additional insights,
we propose to complement consistency with relative consistency -- the
probability that an equally accurate model would surpass the consistency of the
proposed model, given a distribution over possible consistencies. Models with
100% relative consistency have reached a consistency peak for their accuracy.
We reflect on prior work that reports consistency in contrast sets and observe
that relative consistency can alter the assessment of a model's consistency compared to that of another model. We anticipate that our proposed measurement and insights
will influence future studies aiming to promote consistent behavior in models.
Comment: BlackboxNLP 2023 accepted paper, camera-ready version; 6 pages main, 3 pages appendix
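A Monte Carlo sketch of one way to estimate relative consistency: equally accurate hypothetical models are obtained here by shuffling the correct answers uniformly over instances (an assumed null distribution, not necessarily the one defined in the paper), and the sketch reports the complement of the surpass probability, so that 1.0 corresponds to the consistency peak mentioned above.

```python
# Illustrative estimate of relative consistency on a toy contrast set.
import random

def consistency(correct, bundles):
    """Fraction of bundles whose instances are all answered correctly."""
    return sum(all(correct[i] for i in b) for b in bundles) / len(bundles)

def relative_consistency(correct, bundles, trials=10_000, seed=0):
    """Share of equally accurate random models that do NOT beat the observed consistency."""
    rng = random.Random(seed)
    observed = consistency(correct, bundles)
    n_correct = sum(correct)
    surpassed = 0
    for _ in range(trials):
        # An "equally accurate" model: same number of correct answers,
        # assigned uniformly at random to instances (assumption).
        shuffled = [True] * n_correct + [False] * (len(correct) - n_correct)
        rng.shuffle(shuffled)
        if consistency(shuffled, bundles) > observed:
            surpassed += 1
    return 1.0 - surpassed / trials

# Toy contrast set: 4 bundles of 2 minimally different instances each.
bundles = [(0, 1), (2, 3), (4, 5), (6, 7)]
correct = [True, True, True, False, True, True, False, True]   # 6/8 accuracy, consistency 0.5
print(relative_consistency(correct, bundles))
```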
Latentna semantička analiza, varijante i primjene (Latent Semantic Analysis, Variants and Applications)
Computers are increasingly expected to carry out, as quickly and efficiently as a person, tasks that people perform routinely. One such task is finding the few documents in a collection that are most relevant to a user's query. The first step in solving this problem is to represent the collection as a term-document matrix whose elements are the tf-idf weights of words in documents; each document then becomes a vector in the space of terms. If the query is represented as a vector as well, standard similarity measures, such as cosine similarity, can be used to compare the query with the documents. In such a space, however, synonyms are orthogonal, and a polysemous word is represented by a single vector regardless of the context in which it occurs. Motivated by this fact, and by the large dimension of the term-document matrix, we approximate it with a matrix of lower rank obtained via the singular value decomposition (SVD). We show that this approximation takes the context of words into account.
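A compact sketch of this pipeline on a toy corpus, using scikit-learn; the corpus, the number of latent dimensions, and the library choice are illustrative assumptions rather than part of the thesis.

```python
# Term-document matrix with tf-idf weights, then a rank-k approximation via SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the car is driven on the road",
    "the automobile is parked near the road",
    "a recipe for apple pie and cake",
    "baking a cake with apples",
]

# 1) Each document becomes a tf-idf vector in term space.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# In raw term space, "car" and "automobile" are orthogonal dimensions,
# so the similarity of the first two documents comes only from shared words like "road".
print(cosine_similarity(X[0], X[1]))

# 2) Low-rank approximation: the latent semantic space.
svd = TruncatedSVD(n_components=2, random_state=0)
X_latent = svd.fit_transform(X)

# In the latent space, the two "vehicle" documents typically move much closer together.
print(cosine_similarity(X_latent[:1], X_latent[1:2]))
```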
The query is transformed into the new space as well, so that it can be compared with the document vectors in this lower-dimensional space, and we show how new documents and terms can be added to an existing latent space when the collection is dynamic. While this method, latent semantic analysis (LSA), alleviates the problem of synonyms to some extent, the problem of polysemy remains. In addition, LSA assumes that the noise in the data (arising from language variability) has a Gaussian distribution, which is not a natural assumption. The next method, pLSA, assumes that each document is produced by a generative probabilistic process whose parameters are estimated by maximizing the likelihood: each document is a mixture of latent concepts, and we seek the posterior probabilities of these concepts given the observations. However, pLSA treats these probabilities as model parameters, which leads to overfitting. We therefore present another model, LDA, which treats them as a distribution governed by a parameter. As in pLSA, documents are represented as mixtures of latent topics, but the topics are now distributions over the words of the vocabulary, so a distribution over distributions is needed; the Dirichlet distribution is the natural choice. Finally, we briefly present topic modeling on a collection of Wikipedia articles.
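Continuing in the same spirit, a self-contained sketch of folding a query into the latent space and of the LDA view of documents as topic mixtures; again the toy corpus and all settings are assumptions made for illustration.

```python
# Query folding into the LSA space, followed by an LDA topic model
# (documents as topic mixtures, topics as word distributions with a Dirichlet prior).
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the car is driven on the road",
    "the automobile is parked near the road",
    "a recipe for apple pie and cake",
    "baking a cake with apples",
]

# LSA space as before: tf-idf followed by a rank-2 SVD.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0)
X_latent = svd.fit_transform(X)

# Query folding: the query is mapped with the *fitted* vectorizer and SVD,
# then compared to the document vectors in the latent space.
query_latent = svd.transform(vectorizer.transform(["automobile on the road"]))
print(cosine_similarity(query_latent, X_latent))   # vehicle documents should rank highest

# LDA works on raw counts rather than tf-idf weights.
count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)             # per-document topic proportions
print(doc_topics)

# Each topic is a distribution over the vocabulary; list its most probable words.
vocab = count_vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-3:][::-1]]
    print(f"topic {k}:", top_words)
```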