Classical Models vs. Transformers in the Detection of Generated Text
Abstract
The sophistication of generative language models poses significant challenges for automatically detecting artificially generated content in academic and educational settings. This study systematically compares traditional models (SVM, Random Forest, MLP, XGBoost, Voting Classifier) with Transformer architectures (BERT, DeBERTa, RoBERTa), using phraseological, syntactic, and semantic features, style vectors, and TF-IDF representations on the English-language PAN 2025 dataset. It evaluates the impact of the full feature set (186 features) against an optimal subset obtained through feature selection (33 features). The Voting Classifier with feature selection achieved the best performance (F1-score: 0.992101, accuracy: 0.991930), outperforming DeBERTa (F1-score: 0.959864) by 3.2 percentage points in F1-score. The results show that feature engineering combined with traditional ensembles can outperform deep architectures while retaining interpretability and efficiency, contributing to the development of robust tools for academic integrity and automated text verification.
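As a rough illustration of the two ideas the abstract relies on, the sketch below combines the labels of several classifiers by majority vote and scores the result with the F1-score. It is a minimal sketch under stated assumptions, not the study's implementation: the abstract does not say whether the Voting Classifier uses hard or soft voting, so hard (majority) voting is assumed here, and the labels and per-model predictions are invented toy data. The function names `majority_vote` and `f1_score` are hypothetical helpers written for this example. The last lines only recompute the reported gap between the two F1-scores quoted in the abstract.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label list by hard voting.

    `predictions` is a list of equal-length label lists, one per model.
    With an odd number of models and binary labels there are no ties.
    """
    combined = []
    for labels in zip(*predictions):
        # Pick the label predicted by the largest number of models.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumption): 1 = machine-generated, 0 = human-written.
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 0]
model_b = [1, 1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
f1_single = f1_score(y_true, model_a)
f1_ensemble = f1_score(y_true, ensemble)

# Gap between the F1-scores reported in the abstract, in percentage points.
reported_gap_pp = round((0.992101 - 0.959864) * 100, 1)

print(ensemble, f1_single, f1_ensemble, reported_gap_pp)
```

In this toy case each model makes a different mistake, so the vote cancels the individual errors. That is the general motivation for ensembles, not evidence about the study's own models.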
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.