Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays

Henry Sanmi Makinde; Akindeji Ibrahim Makinde; Mutiyat Adeola Usman; Hope Adegoke

Research Article

Communications on Applied Electronics

Foundation of Computer Science (FCS), NY, USA

Volume 8 - Issue 1

Published: January 2026

10.5120/cae2026652920

Henry Sanmi Makinde, Akindeji Ibrahim Makinde, Mutiyat Adeola Usman, Hope Adegoke. Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays. Communications on Applied Electronics. 8, 1 (January 2026), 73-85. DOI=10.5120/cae2026652920

                        @article{ 10.5120/cae2026652920,
                        author  = { Henry Sanmi Makinde and Akindeji Ibrahim Makinde and Mutiyat Adeola Usman and Hope Adegoke },
                        title   = { Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays },
                        journal = { Communications on Applied Electronics },
                        year    = { 2026 },
                        volume  = { 8 },
                        number  = { 1 },
                        pages   = { 73-85 },
                        doi     = { 10.5120/cae2026652920 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2026
                        %A Henry Sanmi Makinde
                        %A Akindeji Ibrahim Makinde
                        %A Mutiyat Adeola Usman
                        %A Hope Adegoke
                        %T Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays
                        %J Communications on Applied Electronics
                        %V 8
                        %N 1
                        %P 73-85
                        %R 10.5120/cae2026652920
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

Artificial Intelligence (AI) has transformed many aspects of human life and activity, including the composition of essays and other texts. AI technologies now enable computers to generate text that closely resembles human writing, raising concerns for academic integrity, creative authenticity, and professional communication. This study investigates the linguistic characteristics and predictive mechanisms underlying AI-generated essays, aiming to identify markers that distinguish them from human-authored texts. A total of 1,000 essays on diverse topics and in diverse writing styles were generated using ChatGPT, DeepSeek, and Gemini, and a comparable corpus of human-written essays was collected from publicly available sources. Natural language processing (NLP) techniques and machine learning models were used to analyze word frequency, next-word prediction patterns, and stylistic elements in the combined corpus. The results show that the temperature setting of an AI model significantly influences word selection: higher temperatures increase randomness and reduce the likelihood of predictable word choices. Support Vector Machine (SVM) and Random Forest classifiers achieved accuracies of 98% and 95.75%, respectively, in differentiating AI-generated from human-written essays, highlighting the effectiveness of linguistic features for automated detection. The study concludes that AI-generated content can be reliably distinguished from human writing using stylistic and lexical features, contributing to more reliable AI assessment tools and a better understanding of NLP model behavior.
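
The effect of temperature on word selection described in the abstract can be illustrated with a minimal sketch of temperature-scaled softmax, the standard way a language model's raw next-word scores are turned into sampling probabilities. This is not the authors' code: the function names, the four candidate words, and their scores are hypothetical values chosen only for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw next-word scores into probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in bits; higher means less predictable word choices."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical scores for four candidate next words (illustrative values only).
candidates = ["the", "a", "this", "every"]
logits = [4.0, 2.5, 1.0, 0.5]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-word p={max(probs):.3f}, entropy={entropy(probs):.3f} bits")
```

Under these assumed scores, raising the temperature lowers the probability of the most likely word and raises the entropy of the distribution, which is the sense in which higher temperatures make word choices less predictable.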

Index Terms

Computer Science

Information Sciences

Keywords

Predictive patterns, AI-generated essays, DeepSeek, ChatGPT, Gemini, Machine learning, Analyzing word frequency