Research Article

Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays

by Henry Sanmi Makinde, Akindeji Ibrahim Makinde, Mutiyat Adeola Usman, Hope Adegoke
Communications on Applied Electronics
Foundation of Computer Science (FCS), NY, USA
Volume 8 - Issue 1
Published: January 2026
Authors: Henry Sanmi Makinde, Akindeji Ibrahim Makinde, Mutiyat Adeola Usman, Hope Adegoke
10.5120/cae2026652920

Henry Sanmi Makinde, Akindeji Ibrahim Makinde, Mutiyat Adeola Usman, Hope Adegoke. Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays. Communications on Applied Electronics. 8, 1 (January 2026), 73-85. DOI=10.5120/cae2026652920

@article{ 10.5120/cae2026652920,
  author    = { Henry Sanmi Makinde and Akindeji Ibrahim Makinde and Mutiyat Adeola Usman and Hope Adegoke },
  title     = { Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays },
  journal   = { Communications on Applied Electronics },
  year      = { 2026 },
  volume    = { 8 },
  number    = { 1 },
  pages     = { 73-85 },
  doi       = { 10.5120/cae2026652920 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2026
%A Henry Sanmi Makinde
%A Akindeji Ibrahim Makinde
%A Mutiyat Adeola Usman
%A Hope Adegoke
%T Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays
%J Communications on Applied Electronics
%V 8
%N 1
%P 73-85
%R 10.5120/cae2026652920
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Artificial Intelligence (AI) has transformed many aspects of human life and activity, including the composition of essays and other texts. AI technologies now enable computers to generate text that closely resembles human writing, which raises concerns for academic integrity, creative authenticity, and professional communication. This study investigates the linguistic characteristics and predictive mechanisms underlying AI-generated essays in order to identify markers that distinguish them from human-authored texts. A total of 1,000 essays covering diverse topics and writing styles were generated using ChatGPT, DeepSeek, and Gemini, and a comparable corpus of human-written essays was collected from publicly available sources. Natural language processing (NLP) techniques and machine learning models were used to analyze word frequency, next-word prediction patterns, and stylistic elements in both corpora. The results show that the temperature settings in AI models significantly influence word selection, with higher temperatures increasing randomness and reducing the likelihood of predictable word choices. Machine learning classifiers achieved high accuracy in differentiating AI-generated from human-written essays, with Support Vector Machines (SVM) reaching 98% and Random Forests 95.75%, highlighting the effectiveness of linguistic features for automated detection. The study concludes that AI-generated content can be reliably distinguished from human writing using stylistic and lexical features, contributing to the development of more reliable AI assessment tools and a better understanding of NLP model behavior.
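
The effect of temperature on word selection described in the abstract can be illustrated with a temperature-scaled softmax over next-word scores. The sketch below is not taken from the study: the candidate words, the logit values, and the function names `softmax_with_temperature` and `entropy` are hypothetical and chosen only for illustration. It assumes the common formulation in which each score is divided by the temperature before normalization, so a higher temperature flattens the distribution and makes the most likely word less dominant.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw next-word scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in bits; higher values mean less predictable choices."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical scores for four candidate next words (illustrative values only).
candidates = ["the", "a", "this", "every"]
logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-word probability={max(probs):.3f}, "
          f"entropy={entropy(probs):.3f} bits")
```

Under these assumptions, the probability of the top-ranked word falls and the entropy rises as the temperature increases, which is consistent with the reduced predictability of word choices that the abstract reports.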

Index Terms
Computer Science
Information Sciences
Keywords

Predictive Patterns, AI-generated essays, DeepSeek, ChatGPT, Gemini, Machine learning, Analyzing word frequency
