Site Reliability Engineer-London/Hybrid-Linux(Ubuntu/RHEL)/Puppet/ELK/Zabbix-Media Giant-£80,000+ Bens
Our client, a global media giant, is seeking a Site Reliability Engineer to join their Systems Team. The Systems team are responsible for all infrastructure underpinning the various company outlets, the software dependencies of our bespoke applications, the MailOnline deployment pipeline, and various aspects of cyber security.
You will be part of a permanent team of multi-disciplined engineers working alongside various internal and external teams to deliver key projects and strategy. The role will report into the Head of Site Reliability Engineering and Infrastructure.
This is a fantastic opportunity to join a fast-paced company and advanced technology department that blends hands-on technology support, strategic roadmaps and new technologies.
The candidate will have a proven record in supporting production environments in a very fast-paced company with tech spread across geographic locations and cloud providers. We are looking for someone who is an excellent communicator, methodical, and well-motivated. They will possess the ability to ensure set objectives are delivered within tight timescales where key production services need to always remain stable and efficient. The candidate should have an in-depth understanding of many infrastructure-related technologies. You will be part of a team of various disciplines, not only supporting existing infrastructure and applications but also introducing Site Reliability practices across the wider business.
Be confident in managing virtual platforms and the underpinning services such as Enterprise Storage and SD-Networking.
Be confident in managing peripheral infrastructure services such as Backup/Restore, non-native monitoring and hardware monitoring.
Provide technology insight and support to key management staff and peers.
Have the ability to manage tasks to tight deadlines and provide regular updates upwards.
Have a good understanding of VMware, EMC and/or AWS cloud and other IaaS solutions.
Be keen to develop scripting capability and API integration in one or more popular languages (Puppet/Python/Shell).
Act as a technical escalation point for the application owners to resolve issues swiftly, find the root cause, and prevent recurrence in order to improve the service(s) we deliver.
Understand and work towards a strategy set out by senior management ensuring we adhere to direction and execute tasks based on priority to meet strategy deadlines.
Have the drive to constantly improve and try out new technology offerings to improve operational efficiency and execution.
Constantly improve functional monitoring and non-functional monitoring of the infrastructure, to head off any issue that might occur.
Be part of a 24/7 on-call rota.
Be flexible when it comes to out-of-hours support. You will be required to be on call on evenings and weekends as part of an on-call system for escalations outside core working hours, and will be asked to carry out change controls in agreed business maintenance windows.
Technical Experience Required
A good understanding of Linux OS (Ubuntu/RHEL).
A good understanding of configuration management and automation with Puppet.
Experience using and configuring monitoring technologies, specifically ELK and Zabbix.
Experience building and configuring Elasticsearch clusters at scale.
A good understanding of VMware and virtualisation.
A good understanding of caching both at the CDN layer and data centre tier (Akamai/Varnish).
Programming and scripting experience – Bash, Python, Perl etc.
Understanding of high availability, backup, and DR approaches/processes.
Cloud experience with AWS, GCP, and Azure, from compute all the way to scalable automated deployments.
Networking skills – A good understanding of Firewalls, Load balancing, DNS, VLAN, Cloud VPN/VPC.
Exposure to various common technologies such as Redis and RabbitMQ.
Knowledge and experience of working in the publishing/media industry.
Experience in or supporting development in other languages such as Node.js, Clojure, and Java.
Good understanding of Database and Storage technologies (Oracle, MySQL, SAN FC, NAS, NFS, iSCSI).
The IT team is required to work flexible hours to ensure systems are operational at all times required by the business and be part of an on-call rota.
How do you apply?
If you are interested in applying for the Site Reliability Engineer role, please do so via the link on this page or contact Digital Republic by phone or email.
Contact us to find out more:
Get in contact with Digital Republic by sending a mail to [email protected]
or call the office on 00447823530190. Check out the website at www.digitalrepublictalent.com. You can also find out more on LinkedIn, Instagram or Facebook.