Semantics-Preserving Data Migration from Relational to Non-Relational Databases Based on Source Code Analysis

V.R. Gerasimov, V.V. Dusheba

Èlektron. model. 2026, 48(3):53-65

ABSTRACT

A method is proposed for semantics-preserving migration of data from relational databases to non-relational databases that store information in JSON or similar formats, based on intelligent analysis of the application's source code. Unlike schema-oriented approaches, the method uses a semantics-preserving transformation implemented with large language models to identify hidden aggregate boundaries and access patterns at the level of ORM metadata, in cases where the data structure does not satisfy third normal form (3NF) or exhibits redundancy. Building on model-driven engineering with formal verification of data consistency, the method enables a deterministic choice between embedding and referencing strategies. Experiments showed that taking application logic into account preserves data consistency within a given domain and increases query execution speed by 60-85% compared with insufficiently normalized data in relational databases. The method also covers the complex stages of structural refactoring required when moving to document-oriented data models.
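The abstract does not state the concrete decision rules. What follows is a minimal Python sketch, not the authors' method, of what a deterministic embed-versus-reference choice followed by a consistency check could look like. It assumes a hypothetical `Relationship` record that summarises one ORM association (cardinality, ownership, a size bound, and a co-access ratio), illustrative thresholds `max_embedded` and `min_co_access`, and hypothetical document field names `items` and `item_ids`. The LLM-based extraction stage is omitted. The consistency check is a simple round trip: embedded documents are flattened back into rows and compared with the source tables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(Enum):
    EMBED = "embed"
    REFERENCE = "reference"


@dataclass(frozen=True)
class Relationship:
    """Hypothetical summary of one ORM association, e.g. read from mapping metadata."""
    parent: str
    child: str
    cardinality: str              # "one-to-one", "one-to-many" or "many-to-many"
    owned: bool                   # child lifecycle bound to the parent (cascade/orphan removal)
    max_children: Optional[int]   # declared or observed upper bound; None means unbounded
    co_access_ratio: float        # share of parent reads that also load the child, 0..1


def choose_strategy(rel: Relationship, max_embedded: int = 100,
                    min_co_access: float = 0.7) -> Strategy:
    """Deterministic rule (assumed): embed only owned, bounded, mostly co-accessed children."""
    if rel.cardinality == "many-to-many" or not rel.owned:
        return Strategy.REFERENCE  # shared children would be duplicated across documents
    if rel.max_children is None or rel.max_children > max_embedded:
        return Strategy.REFERENCE  # unbounded arrays let documents grow without limit
    if rel.co_access_ratio < min_co_access:
        return Strategy.REFERENCE  # child is mostly read on its own
    return Strategy.EMBED


def migrate(parents, children, fk, strategy):
    """Build one JSON-like document per parent row ('items'/'item_ids' are hypothetical names)."""
    docs = []
    for p in parents:
        kids = [c for c in children if c[fk] == p["id"]]
        doc = dict(p)
        if strategy is Strategy.EMBED:
            # The foreign key becomes implicit in the nesting, so it is dropped.
            doc["items"] = [{k: v for k, v in c.items() if k != fk} for c in kids]
        else:
            # Children stay in their own collection; the parent keeps only their ids.
            doc["item_ids"] = [c["id"] for c in kids]
        docs.append(doc)
    return docs


def _as_set(rows):
    # Order-insensitive comparison of rows (dicts with hashable values).
    return {tuple(sorted(r.items())) for r in rows}


def round_trip_ok(parents, children, docs, fk):
    """Check embedded documents: flatten them back and compare with the source rows."""
    back_parents = [{k: v for k, v in d.items() if k != "items"} for d in docs]
    back_children = [{**item, fk: d["id"]} for d in docs for item in d["items"]]
    return (_as_set(back_parents) == _as_set(parents)
            and _as_set(back_children) == _as_set(children))


orders = [{"id": 1, "customer": "A"}, {"id": 2, "customer": "B"}]
lines = [{"id": 10, "order_id": 1, "sku": "X", "qty": 2},
         {"id": 11, "order_id": 1, "sku": "Y", "qty": 1},
         {"id": 12, "order_id": 2, "sku": "X", "qty": 5}]
rel = Relationship("Order", "OrderLine", "one-to-many", owned=True,
                   max_children=50, co_access_ratio=0.9)
strategy = choose_strategy(rel)
docs = migrate(orders, lines, "order_id", strategy)
print(strategy, round_trip_ok(orders, lines, docs, "order_id"))
```

In this toy order/order-line example the association is owned, bounded, and read together with its parent, so the rule selects embedding and the round trip reproduces both source tables exactly. Making the rule a pure function of the extracted metadata is what keeps the choice deterministic: the same ORM facts always yield the same document shape, whatever component produced those facts.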


KEYWORDS

semantics-preserving data migration; source code analysis; non-relational databases; model-driven engineering; large language models; object-relational mapping.

Received 10.02.2026;
review received 17.02.2026