Context-based Bengali Next Word Prediction: A Comparative Study of Different Embedding Methods
Keywords: Context-based next word prediction, word embedding, sequence model, word2vec, fastText
Next word prediction is a helpful feature for various typing subsystems, and good suggestions during typing speed up the writing of digital documents. Researchers have therefore long sought to improve the capability of such prediction systems. Knowledge of the inner meaning of words, together with a contextual understanding of the sequence, can enhance next word prediction. With the advancement of Natural Language Processing (NLP), these ideas have proven applicable in real scenarios. NLP techniques such as word embedding and sequential contextual modeling provide the necessary tools: word embeddings capture various relations among words and encode their inner meaning, while sequence modeling captures contextual information. In this paper, we investigate which embedding method works better for Bengali next word prediction. The embeddings compared are word2vec skip-gram, word2vec CBOW, fastText skip-gram, and fastText CBOW. We apply each of them in an LSTM-based deep learning sequential model trained on a large corpus of Bengali texts. The results reveal useful insights about contextual and sequential information gathering that will help in implementing a context-based Bengali next word prediction system.
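As an illustration of the difference between the two training objectives compared above, the following minimal sketch (not code from the paper; the example sentence and window size are hypothetical) shows how CBOW and skip-gram generate their (input, target) training pairs from a token sequence: CBOW uses the surrounding context to predict the center word, while skip-gram uses the center word to predict each context word.

```python
# Illustrative sketch only: how CBOW and skip-gram differ in the
# (input, target) pairs they derive from a sentence. The tokens and
# window size below are hypothetical examples, not from the paper.

def cbow_pairs(tokens, window=2):
    """CBOW: the surrounding context words predict the center word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: the center word predicts each surrounding context word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window),
                       min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = ["আমি", "বাংলায়", "গান", "গাই"]  # hypothetical Bengali sentence
print(cbow_pairs(sentence, window=1))
print(skipgram_pairs(sentence, window=1))
```

fastText follows the same two schemes but represents each word as a bag of character n-grams, which is why it can embed out-of-vocabulary Bengali word forms that plain word2vec cannot.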
DUJASE Vol. 7 (2) 8-15, 2022 (July)
Copyright (c) 2022 Dhaka University Journal of Applied Science and Engineering
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.