Copello Global are looking for a Site Reliability Engineer (Software Engineer) to work with our client in Warwickshire, who support critical infrastructure for some of the UK’s incident response organisations.
The business are specialists in data, voice, video and analytics and, as such, the services and platform they provide need to be kept continually operational. This role is designed to ensure this is the case.
The platform enables those using it to make effective, and often critical, decisions in key moments. Decisions that can dramatically affect the outcomes of incidents.
As Site Reliability Engineer you will:
- Get involved in exciting technical challenges around overall system health (including monitoring of infrastructure, network and applications): analysing, troubleshooting and designing the vital services, platforms and infrastructure of a key communications solution.
- Oversee site reliability through maintenance of hardware & software processes, network facilities, controls and security systems.
- Make effective use of predictive and other non-destructive methodologies designed to identify and isolate inherent reliability problems.
- As a team, ensure 24/7 uptime and availability of the system.
- Monitor and respond to notifications and alerts.
- Conduct system analysis and configuration management, and develop improvements for the system.
- Look after software performance, availability and reliability.
- Troubleshoot complicated, cross-platform issues spanning OS, networking and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
- Work closely with software engineers and testers to ensure the system meets non-functional requirements such as performance, security, and availability
- Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
- Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually.
Essential skills and experience
- Strong project management capability.
- Experience with ticketing tools such as ServiceDesk and Jira.
- Ability to provide advice, best practices and recommendations for the operation and deployment of Microsoft Azure.
- Familiarity with Linux and UNIX systems (e.g. CentOS, Red Hat) and command-line system administration tools such as Bash, Vim and SSH.
- Demonstrable experience in containerisation (Docker), orchestration (Kubernetes) and microservices.
- Network routing, load balancing and networking protocols: a base knowledge of TCP/IP, with an understanding of HTTP and DNS.
- Basic programming and scripting skills (preferably Bash/shell, Perl, Python, Java, etc.).
Non-essential but would be useful to have
- Demonstrated understanding of SRE, Agile and ITIL methodologies; ITIL v3 or v4 certification.
- Azure DevOps certifications.
- Demonstrable experience in CI/CD tools.
- Demonstrable experience with databases such as SQL, Hadoop, Couchbase, GridGain, etc.
- Hands-on experience in configuration management of server farms (using tools such as Puppet, Chef, Ansible, etc.).