Proactive Auto-Scaling of Kubernetes Services Based on Machine Learning

A.S. Pazinin
ORCID: https://orcid.org/0009-0002-9506-953

Èlektron. model. 2026, 48(2):69-86

ABSTRACT

To optimize the use of computing resources, two approaches to service autoscaling in Kubernetes are compared: the standard reactive Horizontal Pod Autoscaler (HPA) and a proactive autoscaler based on machine learning with an LSTM model. A controller is developed that collects CPU metrics from Prometheus, trains and periodically updates the model, predicts short-term load dynamics, and adjusts the number of replicas via the Kubernetes API. Prediction and decision metrics are sent to Pushgateway and visualized in Grafana. Experiments in an Azure Kubernetes Service cluster under controlled container load showed a 30 % reduction in total vCPU usage compared to HPA at the same service level, lower scaling latency (scale-up in 30-60 s versus 75-90 s; scale-down in 60-90 s versus 90-150 s), and the elimination of replica-count “jitter.” The results confirm the effectiveness of proactive, machine-learning-based autoscaling for Kubernetes services with stable or seasonal traffic patterns.
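The control loop described in the abstract (query metrics, forecast load, resize the Deployment, publish decisions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 500-millicore per-pod target, and the replica bounds are all assumptions, and the Prometheus query, LSTM forecast, Kubernetes scale call, and Pushgateway push are passed in as stubs.

```python
import math

def desired_replicas(predicted_cpu_millicores: float,
                     target_per_pod_millicores: float = 500.0,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Smallest replica count that keeps the forecast load at or below
    the per-pod CPU target, clamped to the allowed range. (Illustrative
    policy; the paper's actual decision rule may differ.)"""
    needed = math.ceil(predicted_cpu_millicores / target_per_pod_millicores)
    return max(min_replicas, min(max_replicas, needed))

def reconcile(history, predict, scale, push):
    """One iteration of a proactive scaling loop, external calls stubbed:
    predict = LSTM forecast over the CPU history (millicores),
    scale   = e.g. a PATCH of the Deployment's scale subresource,
    push    = e.g. a POST of the decision metrics to Pushgateway."""
    predicted = predict(history)
    replicas = desired_replicas(predicted)
    scale(replicas)
    push({"predicted_cpu": predicted, "replicas": replicas})
    return replicas
```

In a real controller, `predict` would be the trained LSTM, `scale` a Kubernetes API client call, and `push` an HTTP POST to Pushgateway; the sketch keeps only the decision logic concrete.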

Full text: PDF

KEYWORDS

Kubernetes, autoscaling, HPA, LSTM, Prometheus, Pushgateway, Grafana.

REFERENCES

  1. Horizontal Pod autoscaling. (n.d.). Kubernetes. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
  2. Lorido-Botran, T., Miguel-Alonso, J., & Lozano, J.A. (2014). A review of auto-scaling techniques for elastic applications in cloud environments. Journal of Grid Computing, 12(4), 559- https://doi.org/10.1007/s10723-014-9314-7
  3. Tesauro, G., Das, R., Chan, H., Kephart, J., Levine, D., Rawson, F., & Lefurgy, C. (2007). Managing power consumption and performance of computing systems using reinforcement learning. Advances in Neural Information Processing Systems, 1497- https://papers.nips.cc/paper/3251-managing-power-consumption-and-performance-of-computing-systems-using-reinforcement-learning
  4. Horizontal Pod Autoscaler walkthrough. (n.d.). Kubernetes. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
  5. Greff, K., Srivastava, R., Koutník, J., Steunebrink, B., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222- https://doi.org/10.1109/TNNLS.2016.2582924
  6. Dang-Quang, N.-M., & Yoo, M. (2021). Deep learning-based autoscaling using bidirectional long short-term memory for Kubernetes. Applied Sciences, 11(9), Article 3835. https://www.mdpi.com/2076-3417/11/9/3835
  7. Imdoukh, M., Ahmad, I., & Alfailakawi, M. (2020). Machine learning-based auto-scaling for containerized applications. Neural Computing and Applications, 32, 9745- https://link.springer.com/article/10.1007/s00521-019-04507-z
  8. Rolik, O., & Volkov, V. (2024). Method of horizontal pod scaling in Kubernetes to omit overregulation. Information, Computing and Intelligent Systems, (5), 55- https://doi.org/10.20535/2786-8729.5.2024.315877
  9. Boyarchuk, S., & Tyshchenko, I. (2025). ARIMA and LSTM time series forecasting models in economics and finance. Computer Design Systems. Theory and Practice, 7(1), 172-180. https://doi.org/10.23939/cds2025.01.172
  10. What is Azure Kubernetes Service (AKS)? (n.d.). Microsoft Learn. https://learn.microsoft.com/azure/aks/what-is-aks
  11. Data model. (n.d.). Prometheus. https://prometheus.io/docs/concepts/data_model/
  12. Metrics for Kubernetes object states. (n.d.). Kubernetes. https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
  13. Pushing metrics. (n.d.). Prometheus. https://prometheus.io/docs/instrumenting/pushing/
  14. Time series. (n.d.). Grafana Labs. https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/time-series/
  15. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735- https://doi.org/10.1162/neco.1997.9.8.1735
  16. Fedoryshyn, B., & Krasko, O. (2024). Migration of services in a Kubernetes cluster based on load forecasting. Information and Communication Technologies and Electronic Engineering, 4(2), 82-92. https://doi.org/10.23939/ictee2024.02.082
  17. Majevsky, Ya., & Pravorska, N. (2022). Increasing the efficiency of microservices scaling automation in the Kubernetes containerized application management system. Bulletin of the Khmelnytskyi National University. Series: Technical Sciences, 313(5), 260-264. https://doi.org/10.31891/2307-5732-2022-313-5-260-264
  18. Islam, S., Keung, J., Lee, K., & Liu, A. (2012). Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems, (1), 155- https://doi.org/10.1016/j.future.2011.05.027
  19. Resource metrics pipeline. (n.d.). Kubernetes. https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/
  20. Gutman, D., & Sirota, O. (2023). Proactive automatic scaling up for Kubernetes. Adaptive Automatic Control Systems, 1(42), 32-38. https://doi.org/10.20535/1560-8956.42.2023.278925