Enhancing Language Model Performance with a Novel Text Preprocessing Method

A. Jalili
H. Tabrizchi
A. Mosavi
A.R. Varkonyi-Koczy

Abstract

Advances in natural language processing highlight the importance of text data preparation for machine learning. Traditional preprocessing methods are reported to often fail to handle the complexity of language, which degrades model performance. This paper therefore proposes an approach that combines tokenization, noise reduction, and normalization to improve text quality.
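
As a rough illustration of the three steps named in the abstract, the following Python sketch chains noise reduction, normalization, and tokenization. It is not the authors' method: the function names (`reduce_noise`, `normalize`, `tokenize`, `preprocess`), the choice of what counts as noise (HTML tags and URLs), the use of NFKC plus lowercasing for normalization, and the regex-based tokenizer are all assumptions made for this example.

```python
import re
import unicodedata
from typing import List


def reduce_noise(text: str) -> str:
    """Remove HTML tags and URLs, then collapse runs of whitespace.

    Treating tags and URLs as noise is an assumption of this sketch.
    """
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML-like tags
    text = re.sub(r"https?://\S+", " ", text)  # drop http(s) URLs
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and lowercase the text."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text: str) -> List[str]:
    """Split into word tokens, keeping simple contractions such as "it's"."""
    return re.findall(r"\w+(?:'\w+)?", text)


def preprocess(text: str) -> List[str]:
    """Hypothetical pipeline: noise reduction, normalization, tokenization."""
    return tokenize(normalize(reduce_noise(text)))


if __name__ == "__main__":
    sample = "<p>Visit https://example.com NOW!  It's   GREAT.</p>"
    print(preprocess(sample))
```

The order matters in this sketch: noise is removed before tokenization so that markup and URLs do not break into spurious tokens, and normalization runs before tokenization so that case and compatibility variants of the same word map to one token.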

Article Details

How to Cite
A. Jalili, H. Tabrizchi, A. Mosavi, and A.R. Varkonyi-Koczy, "Enhancing Language Model Performance with a Novel Text Preprocessing Method", Acta Phys. Pol. A, vol. 146, no. 4, p. 542, Nov. 2024, doi: 10.12693/APhysPolA.146.542.
Section
Special segment
