Classic models vs. Transformers in detecting generated text


César Espin-Riofrio
Verónica Mendoza-Morán
Oswaldo Vergara-Bello
Leonardo Bazurto-Velasco
Yuan Guim-Echeverria

Abstract

The growing sophistication of generative language models poses significant challenges for the automatic detection of artificially generated content in academic and educational contexts. This research systematically compares traditional models (SVM, Random Forest, MLP, XGBoost, Voting Classifier) against Transformer architectures (BERT, DeBERTa, RoBERTa), employing phraseological, syntactic, and semantic features, style vectors, and TF-IDF representations on the PAN 2025 English dataset. The impact of using the complete feature set (186 attributes) versus an optimal subset of 33 attributes obtained through feature selection was evaluated. The Voting Classifier with feature selection achieved the best performance (F1-score: 0.992101, accuracy: 0.991930), surpassing DeBERTa (F1-score: 0.959864) by 3.2 percentage points. The results demonstrate that feature engineering combined with traditional ensembles can outperform deep architectures while maintaining interpretability and efficiency, contributing to the development of robust tools for academic integrity and automated text verification.
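The pipeline the abstract describes can be sketched with scikit-learn. This is a minimal illustration, not the authors' released code: the toy corpus, hyperparameters, and the reduced selection size (k=10 instead of the 33 attributes the study retained, since the toy vocabulary is small) are all assumptions; the structure — TF-IDF features, univariate feature selection, and a voting ensemble of traditional classifiers — follows the abstract.

```python
# Sketch: TF-IDF -> feature selection -> hard-voting ensemble (SVM, RF, MLP),
# mirroring the traditional pipeline described in the abstract. XGBoost is
# omitted here to keep the example dependency-free beyond scikit-learn.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy stand-in for the PAN 2025 data: label 0 = human-written, 1 = generated.
texts = [
    "honestly the plot dragged but the ending caught me off guard",
    "my grandmother's soup recipe never turns out the same twice",
    "we missed the bus so we walked the whole way in the rain",
    "the referee's call in the final minute felt completely unfair",
    "i scribbled the idea on a napkin before i forgot it",
    "her laugh echoed down the hallway long after she left",
    "the old engine coughed twice and finally rolled over",
    "nothing beats cold pizza at two in the morning",
    "in conclusion, the aforementioned factors collectively demonstrate the outcome",
    "it is important to note that several key aspects warrant consideration",
    "this comprehensive overview highlights the significance of the topic",
    "furthermore, the analysis underscores the relevance of these findings",
    "overall, the evidence suggests a multifaceted and nuanced picture",
    "to summarize, the discussion above outlines the principal considerations",
    "additionally, it should be emphasized that context plays a crucial role",
    "in summary, these elements contribute meaningfully to the overall result",
]
labels = [0] * 8 + [1] * 8

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # The study selected 33 of 186 engineered features; k=10 fits this toy data.
    ("select", SelectKBest(chi2, k=10)),
    ("vote", VotingClassifier(
        estimators=[
            ("svm", SVC(kernel="linear", random_state=0)),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
        ],
        voting="hard",  # majority vote over the three base models
    )),
])

pipeline.fit(texts, labels)
predictions = pipeline.predict(texts)
```

Hard voting keeps the ensemble interpretable (each base model's vote can be inspected), which is one of the advantages over deep architectures that the abstract highlights.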


Article Details

Section

Articles

How to Cite

Espin-Riofrio, C., Mendoza-Morán, V., Vergara-Bello, O., Bazurto-Velasco, L., & Guim-Echeverria, Y. (2026). Classic models vs. Transformers in detecting generated text. Scientific Journal Science and Method, 4(1), 460-476. https://doi.org/10.55813/gaea/rcym/v4/n1/163
