AI-Driven Optimization System for Large-Scale Kubernetes Clusters: Enhancing Cloud Infrastructure Availability, Security, and Disaster Recovery

Haoran Li
Jun Sun
Ke Xiong

Abstract

This paper presents an AI-driven optimization system for large-scale Kubernetes clusters that addresses critical availability, security, and disaster recovery challenges in cloud infrastructure. The design integrates advanced machine learning techniques with Kubernetes' native capabilities to improve cluster management across multiple cloud and edge environments. Its key components are data collection and preprocessing, AI/ML models for predictive analytics, a decision engine, and seamless integration with the Kubernetes control plane. The system combines performance metrics, security policy management, and disaster recovery planning to improve resource utilization, threat detection, and resilience. Experimental results show a 23% improvement in cluster utilization, 97.8% decision-making accuracy, and a 78% reduction in security response time compared with standard approaches. Case studies in the e-commerce, financial services, and IoT industries confirm the system's effectiveness in real-world deployments, with improvements in operating cost, security, and reliability. This research contributes to the evolution of intelligent cloud management by providing solutions for optimizing Kubernetes deployments in complex, distributed environments.
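To make the flow from data collection to decision concrete, the following is a minimal sketch of one pass through such a pipeline, not the authors' implementation. It assumes a single metric (average CPU utilization per workload) and uses a simple least-squares trend forecast as a hypothetical stand-in for the paper's ML models. The decision step borrows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`, but feeds it a predicted value instead of the current one. All function names, the target of 0.6, and the replica bounds are illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One preprocessed observation: average CPU utilization (0-1) for a workload."""
    timestamp: float
    cpu_utilization: float


def preprocess(raw: list[tuple[float, float | None]]) -> list[Sample]:
    """Data collection/preprocessing stage (hypothetical): drop missing readings
    and clamp utilization into [0, 1]."""
    return [Sample(t, min(max(u, 0.0), 1.0)) for t, u in raw if u is not None]


def predict_next(samples: list[Sample], window: int = 5) -> float:
    """Prediction stage (hypothetical stand-in for an ML model): fit a least-squares
    line to the last `window` samples and extrapolate one step ahead."""
    if not samples:
        raise ValueError("need at least one sample")
    recent = samples[-window:]
    n = len(recent)
    if n < 2:
        return recent[-1].cpu_utilization
    xs = list(range(n))
    ys = [s.cpu_utilization for s in recent]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    forecast = y_bar + slope * (n - x_bar)  # evaluate the fitted line at the next index
    return min(max(forecast, 0.0), 1.0)


def decide_replicas(
    current_replicas: int,
    predicted_utilization: float,
    target_utilization: float = 0.6,  # assumed target, not from the paper
    min_replicas: int = 2,
    max_replicas: int = 50,
) -> int:
    """Decision stage: HPA-style proportional scaling on the *predicted* metric,
    clamped to assumed replica bounds."""
    desired = math.ceil(current_replicas * predicted_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # (timestamp in seconds, utilization); None marks a dropped scrape
    raw = [(0, 0.40), (60, 0.45), (120, None), (180, 0.50), (240, 0.55), (300, 0.60)]
    samples = preprocess(raw)
    print(decide_replicas(4, predict_next(samples)))  # → 5
```

In a full system the returned replica count would be handed to the control-plane integration layer (for example, by patching a Deployment's scale), which this sketch deliberately leaves out. Forecasting ahead of the current metric is what lets a predictive engine scale before load arrives rather than after.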

How to Cite
Li, H., Sun, J., & Xiong, K. (2024). AI-Driven Optimization System for Large-Scale Kubernetes Clusters: Enhancing Cloud Infrastructure Availability, Security, and Disaster Recovery. Journal of Artificial Intelligence General Science (JAIGS), ISSN: 3006-4023, 2(1), 281–306. https://doi.org/10.60087/jaigs.v2i1.244