Context-based Bengali Next Word Prediction: A Comparative Study of Different Embedding Methods

Mahir Mahbub; Suravi Akhter; Ahmedul Kabir; Zerina Begum

doi:10.3329/dujase.v7i2.65088

Authors

Mahir Mahbub, Institute of Information Technology, University of Dhaka, Dhaka-1000, Bangladesh
Suravi Akhter, Institute of Information Technology, University of Dhaka, Dhaka-1000, Bangladesh
Ahmedul Kabir, Institute of Information Technology, University of Dhaka, Dhaka-1000, Bangladesh
Zerina Begum, Institute of Information Technology, University of Dhaka, Dhaka-1000, Bangladesh

DOI:

https://doi.org/10.3329/dujase.v7i2.65088

Keywords:

Context-based next word prediction, word embedding, sequence model, word2vec, fastText

Abstract

Next word prediction is a helpful feature in many typing systems, since suggestions offered while typing speed up the writing of digital documents. Researchers have therefore long sought to improve the capability of such prediction systems. Knowledge of the meaning of individual words, together with a contextual understanding of the sequence, can improve next word prediction. With advances in Natural Language Processing (NLP), this reasoning, promising in theory, has proved applicable in real scenarios. NLP techniques such as word embedding and sequential contextual modeling address these two needs: word embedding captures various relations among words and encodes their meaning, while sequence modeling captures contextual information. In this paper, we investigate which embedding method works better for Bengali next word prediction. The embeddings compared are word2vec skip-gram, word2vec CBOW, fastText skip-gram and fastText CBOW. We applied them in an LSTM-based deep learning sequence model trained on a large corpus of Bengali texts. The results reveal useful insights about the gathering of contextual and sequential information that will help in implementing a context-based Bengali next word prediction system.
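
As a rough illustration of how the four compared embeddings differ, the following minimal Python sketch shows the training examples each objective is built from. It is not the authors' implementation: the function names, the window size and the sample sentence are assumptions made for illustration. Skip-gram predicts each context word from the center word, CBOW predicts the center word from its pooled context, and fastText additionally represents a word through character n-grams taken from the word wrapped in `<` and `>` boundary markers (the 3 to 6 range below follows the original fastText description).

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs: skip-gram predicts context from center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context_words, target) examples: CBOW predicts target from context."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:  # skip one-word sentences, which have no context
            examples.append((context, target))
    return examples


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return fastText-style character n-grams of a word with boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for k in range(len(marked) - n + 1):
            grams.append(marked[k:k + n])
    return grams


if __name__ == "__main__":
    # Hypothetical sample sentence ("I eat rice"), already split into words.
    sentence = ["আমি", "ভাত", "খাই"]
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
    print(char_ngrams("ভাত", n_min=3, n_max=3))
```

Because the n-grams are shared across words, a fastText model can compose a vector for a word it never saw during training, which is one reason subword embeddings are often considered for morphologically rich languages such as Bengali. In practice, the resulting vectors would be fed to a sequence model such as the LSTM described above.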

DUJASE Vol. 7 (2) 8-15, 2022 (July)
