research article
Developing a robust question and answer system for the Turkish language utilizing deep learning techniques
Abstract
Research on open-ended question and answer systems faces several complex challenges within natural language processing, one of which is limited data availability. Although numerous studies have been conducted on widely spoken languages such as English, research on languages such as Turkish remains scarce. In our preliminary investigation, we addressed the data shortage in Turkish by translating the English SQuAD dataset into Turkish. Another challenge is the success of deep learning models that use large language models. We developed various baseline models using deep learning techniques on this newly created dataset and analyzed them from multiple perspectives. Deep learning models and large language models often present an architectural enigma for many researchers, so analyzing both the questions and the corresponding answer-bearing data is of utmost importance. We show that structuring and enriching the data contributes significantly to the success of the model. In our research, we devised an architecture that applies a structural transformation to the data before it is used in model training. This approach enabled us to enhance the learning capacity of the system without altering the underlying closed-box architecture of the large language models and deep learning systems employed.
- Deep learning
- Large language models
- Knowledge base data
- SQuAD
- Turkish question answer dataset
- Machine reading comprehension
- Question answering
- Natural language processing
- Transformers
- Analytical models
- Accuracy
- Bidirectional control
- Encoding
- Data models
- Question answering (information retrieval)
- Translation
- Knowledge based systems