A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education
Machine reading comprehension has been an interesting and challenging task in
recent years, with the purpose of extracting useful information from texts. To
enable computers to understand a reading text and answer related
questions, we introduce ViMMRC 2.0, an extension of the previous ViMMRC, for
the task of multiple-choice reading comprehension on Vietnamese textbooks, which
contain reading articles for students from Grade 1 to Grade 12. This
dataset has 699 reading passages, comprising prose and poems, and 5,273
questions. Unlike the previous version, the questions in the new dataset are
not fixed at four options. Moreover, the questions are more difficult,
which makes it harder for models to find the correct choice. The computer must
understand the whole context of the reading passage, the question, and the
content of each choice to select the right answer. Hence, we propose a
multi-stage approach that combines the multi-step attention network (MAN) with
the natural language inference (NLI) task to enhance the performance of the
reading comprehension model. Then, we compare the proposed methodology with the
baseline BERTology models on the new dataset and on ViMMRC 1.0. Our
multi-stage models achieved 58.81% accuracy on the test set, which is 5.34%
higher than the best BERTology model. From the results of the error
analysis, we found that the main challenge for reading comprehension models is
understanding implicit context in texts and linking it together in order
to find the correct answers. Finally, we hope our new dataset will motivate
further research in enhancing the language understanding ability of computers
in the Vietnamese language.
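The accuracy metric reported above can be illustrated with a minimal sketch for questions that carry a variable number of options. This is not the authors' evaluation code: the function names `predict_choice` and `accuracy` are hypothetical, the per-option scores stand in for whatever a model outputs, and the toy numbers are invented for illustration.

```python
from typing import Sequence


def predict_choice(option_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring option (first one wins on ties)."""
    if not option_scores:
        raise ValueError("a question needs at least one option")
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def accuracy(all_scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of questions whose predicted option matches the gold index."""
    if len(all_scores) != len(gold):
        raise ValueError("scores and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(predict_choice(s) == g for s, g in zip(all_scores, gold))
    return correct / len(gold)


# Toy data (assumed, not from the paper): questions have 2 to 4 options each.
scores = [
    [0.1, 0.7, 0.2],
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.8],
    [0.1, 0.2, 0.3, 0.4],
]
gold = [1, 0, 0, 3]

print(f"accuracy = {accuracy(scores, gold):.2%}")
```

Because each question is scored independently over its own option list, nothing in this sketch depends on every question having exactly four choices.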
Comparison Between Traditional Machine Learning Models And Neural Network Models For Vietnamese Hate Speech Detection
Hate speech detection on social networks has recently become a major
research field due to the spread of social networks such as
Facebook and Twitter. In Vietnam, offensive content and harassment have a
negative impact on online users. The VLSP shared task on Hate Speech Detection
on social networks produced many approaches for detecting whether a
comment is clean or not. However, this problem still needs further research.
Consequently, we compare traditional machine learning and deep learning models
on a large dataset of Vietnamese user comments from social networks and
identify the advantages and disadvantages of each model by comparing
their performance in terms of F1-score. We then pick the best-performing model
from the traditional machine learning models and from the deep neural models,
respectively. Next, we compare how well these two models predict the right
label by examining their confusion matrices and considering the advantages
and disadvantages of each model. Finally, based on the comparison results, we
propose an ensemble method that combines the strengths of traditional
methods and deep learning methods.
Comment: Published in The 2020 RIVF International Conference on Computing and
Communication Technologies (RIVF
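The comparison of two models by confusion matrix and F1-score described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say which F1 averaging was used, so macro-averaging is assumed here, the helper names `per_class_f1` and `macro_f1` are hypothetical, and both matrices are invented toy values rather than results from the paper.

```python
from typing import List, Sequence


def per_class_f1(cm: Sequence[Sequence[int]]) -> List[float]:
    """Per-class F1 from a confusion matrix (rows = true label, columns = predicted)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true label differs
        fn = sum(cm[k]) - tp                       # true k, predicted something else
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


# Toy two-class matrices (assumed values): class 0 = clean, class 1 = hate speech.
cm_traditional = [[90, 10], [20, 30]]
cm_neural = [[85, 15], [10, 40]]

for name, cm in [("traditional", cm_traditional), ("neural", cm_neural)]:
    f1s = ", ".join(f"{s:.3f}" for s in per_class_f1(cm))
    print(f"{name}: per-class F1 = [{f1s}], macro-F1 = {macro_f1(cm):.3f}")
```

Looking at per-class scores rather than a single aggregate shows which model handles the minority class better, which is the kind of information a confusion-matrix comparison is meant to surface.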
Lensing in the Blue. II. Estimating the Sensitivity of Stratospheric Balloons to Weak Gravitational Lensing
The Superpressure Balloon-borne Imaging Telescope (SuperBIT) is a diffraction-limited, wide-field, 0.5 m, near-infrared to near-ultraviolet observatory designed to exploit the stratosphere's space-like conditions. SuperBIT's 2023 science flight will deliver deep, blue imaging of galaxy clusters for gravitational lensing analysis. In preparation, we have developed a weak-lensing measurement pipeline with modern algorithms for PSF characterization, shape measurement, and shear calibration. We validate our pipeline and forecast SuperBIT survey properties with simulated galaxy cluster observations in SuperBIT's near-UV and blue bandpasses. We predict imaging depth, galaxy number (source) density, and redshift distribution for observations in SuperBIT's three bluest filters; the effect of lensing sample selections is also considered. We find that, in three hours of on-sky integration, SuperBIT can attain a depth of b = 26 mag and a total source density exceeding 40 galaxies per square arcminute. Even with the application of lensing-analysis catalog selections, we find b-band source densities between 25 and 30 galaxies per square arcminute with a median redshift of z = 1.1. Our analysis confirms SuperBIT's capability for weak gravitational lensing measurements in the blue.