Systems and Means of Informatics
2026, Volume 36, Issue 2, pp 116-132
AN ALGORITHM FOR GENERATING SYNTHETIC DATA FOR TECHNICAL VISION SYSTEMS BASED ON STACKING OF GENERATIVE-ADVERSARIAL AND DIFFUSION MODELS
Abstract
The paper addresses the problem of generating synthetic images for computer vision systems under limited availability of representative real-world datasets. A hybrid algorithm based on stacking generative adversarial and diffusion models is proposed. The key contribution is the modification of the Diffusion-GAN architecture, in which the forward diffusion process is replaced by the mechanism from Stable Diffusion, combining the computational efficiency of diffusion models with the training stability of adversarial approaches. The algorithm implements a three-stage pipeline: training the modified generative model, generating synthetic images, and postprocessing to improve visual quality. Experimental validation was performed on the vehicle detection and classification task using the Vehicle Classification SGCUM dataset. The results demonstrate that the YOLOv8 model trained exclusively on synthetic data achieves accuracy metrics comparable to those of a model trained on real data, confirming the suitability of the generated data for training deep neural networks.
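The forward diffusion process mentioned above can be illustrated with a minimal sketch. It assumes the standard closed-form noising step of denoising diffusion models, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, together with a scaled-linear beta schedule of the kind commonly paired with latent diffusion models. The schedule endpoints (0.00085 and 0.012), the 1000 steps, and all function names are illustrative assumptions, not values or code taken from the paper.

```python
import math
import random


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt-space (assumed schedule, not from the paper)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) for a flat list of pixel values x0."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(scaled_linear_betas())
    image = [0.5] * 16  # hypothetical 4x4 image, flattened
    for t in (0, 499, 999):
        noisy = diffuse(image, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy[:3]])
```

In Diffusion-GAN-style training, a step like `diffuse` would be applied to both real and generated images before they reach the discriminator. How the paper's modified architecture chooses the timestep and the schedule is not stated in the abstract, so those details are left open here.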
About this article
Cover Date
2026-06-05
DOI
10.14357/08696527260207
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
deep neural networks; generative-adversarial models; diffusion models; synthetic data; technical vision systems
Authors
I. S. Reutov
Author Affiliations
 Department of Mathematics and Computer Science, Cherepovets State University, 5 Lunacharskogo Prosp., Cherepovets 162602, Russian Federation