AI-Driven Optimization System for Large-Scale Kubernetes Clusters: Enhancing Cloud Infrastructure Availability, Security, and Disaster Recovery
Abstract
This paper presents an AI-driven optimization system for large-scale Kubernetes clusters that addresses critical issues in cloud availability, security, and disaster recovery. The design integrates advanced machine learning techniques with Kubernetes' native capabilities to improve cluster management across multiple cloud and edge environments. Key components include data collection and preprocessing, AI/ML models for predictive analytics, a decision engine, and seamless integration with the Kubernetes control plane. The system uses performance metrics, security policy management, and disaster recovery planning to improve resource utilization, threat detection, and recovery capability. Test results show a 23% improvement in cluster utilization, 97.8% accuracy in decision-making, and a 78% reduction in security response time compared with the standard baseline. Case studies across the e-commerce, financial services, and IoT industries confirm the system's performance in real-world deployments, showing improvements in operating cost, security, and reliability. This research contributes to the evolution of intelligent cloud management, providing solutions for optimizing Kubernetes deployments in complex, distributed environments.
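To make the flow from collected metrics through a predictive model to a decision engine more concrete, the following is a minimal sketch under stated assumptions, not the system described in the paper. The abstract does not specify the models or decision rules, so an exponentially weighted moving average stands in for the predictive model, and the replica rule borrows the documented Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current replicas × current metric / target metric). The names `forecast_utilization`, `decide_replicas`, and `Decision`, and all default values, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    """Output of the hypothetical decision engine."""
    forecast: float        # predicted utilization, as a fraction (0.75 = 75%)
    desired_replicas: int  # replica count to request from the control plane


def forecast_utilization(samples: Sequence[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average over utilization samples.

    Assumption: this simple smoother stands in for the paper's AI/ML
    models, which the abstract does not describe.
    """
    if not samples:
        raise ValueError("need at least one utilization sample")
    estimate = samples[0]
    for value in samples[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def decide_replicas(
    current_replicas: int,
    samples: Sequence[float],
    target: float = 0.6,      # assumed target utilization
    min_replicas: int = 1,    # assumed lower bound
    max_replicas: int = 20,   # assumed upper bound
) -> Decision:
    """Turn a utilization forecast into a bounded replica recommendation."""
    forecast = forecast_utilization(samples)
    # Same shape as the Horizontal Pod Autoscaler rule, applied to the forecast.
    desired = math.ceil(current_replicas * forecast / target)
    # Clamp so one noisy forecast cannot scale the workload without limit.
    desired = max(min_replicas, min(max_replicas, desired))
    return Decision(forecast=forecast, desired_replicas=desired)
```

In a real deployment the recommendation would be applied through the Kubernetes API rather than returned as a value. Scaling on a forecast rather than the latest sample is one plausible way to act before load arrives, at the cost of depending on forecast quality.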
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
©2024 All rights reserved by the respective authors and JAIGC