Background: I work for a global high technology leader investing in digital and "deep tech" innovations – connectivity, big data, artificial intelligence, cybersecurity, and quantum technology. The company provides solutions, services, and products that help its customers – businesses, organizations, and states – in the defense, aeronautics, space, transportation, and digital identity and security markets.
Jenkins is, of course, used by us to manage CI/CD in our software factories. But the story I'll tell you is quite different because it's about production event management. Operators use ansible to build and deploy infrastructure and/or middleware. They have hundreds of playbooks and roles. The first challenge we faced was providing playbook access to developers working on virtual machines and not with containers. The second challenge is how to use playbooks to automatically deal with production events.
Goals: Automatically manage production events for our customers.
Solution & Results: Because Jenkins is a good orchestrator, it can be used for things other than just CI/CD. One way to help developers and operators work together better is to share the same tools. Developers have already used Jenkins as a CI/CD orchestrator. That's the reason why we invited operators to use this tool as a production orchestrator.
Orchestration is needed in CI/CD and in production, as well. This project rocks because Jenkins is the best tool for orchestrating things.
We first ask them to publish their automation scripts (mostly Ansible playbooks) through a Jenkinsfile and share this job with developers. This way, developers can use exactly the same Ansible playbooks and Jenkinsfile as the operators in their own private dev or integration environment. The good thing is, it also drastically reduced the gap between dev/integration environment and production.
Because of this success, operators choose to extend the Jenkins usage to manage the production environment. Because all their infrastructure automation scripts were accessible from Jenkins (Building and configuring a server), operators decided to use these Jenkins jobs to manage production events.
Observability in production is made by Prometheus, and we can easily create observability rules that trigger a Webhook event. By combining these, we could easily implement a full self-healing stack. For instance, because the deployment of a reverse proxy Nginx is automated via Ansible and can be triggered as a Jenkins job, it was easy to trigger the deployment of a new virtual machine with an Nginx reverse proxy in case of an incident or even a load increase.
Top results we experienced with Jenkins: