You are a Site Reliability Engineer with a real passion for building robust, scalable, automated systems. Your team can count on you to deliver creative and inventive solutions to hard problems in distributed, highly available environments. You’re experienced working remotely as part of a globally distributed team. You live and breathe DevOps values and understand why DevOps is more than just tools or a job title. You embrace the idea of “immutable infrastructure” when you design systems. You’re comfortable working with developers, senior leadership, and non-technical colleagues to help deliver value to the larger organization. You take opportunities to fix problems, mentor less experienced colleagues, and step outside your comfort zone to develop your own skill set. You hold yourself and others on the team to a high standard of quality when working with our production environments.
The Apptio TechOps Engineering Services team is a group of people who continuously challenge themselves to build robust and reliable services that enhance our internal engineering capabilities.
Our mission is to deliver reliable, scalable, simplified solutions through guidance, visibility, and a robust automation fabric.
What we want you to do:
We are looking for a talented Site Reliability Engineer to join our team to help us design and build the next generation platform that will support Apptio’s production services.
In this role, you will contribute during EMEA business hours and also work on longer-term projects.
Our team is responsible for the reliability of components at different levels of the stack: operating systems, monitoring, metrics, service discovery, schedulers, and logging, to name just a few.
You may be a good fit if you have:
Demonstrated experience in a large-scale, distributed Linux/Unix environment
Knowledge of configuration management tools (e.g., Puppet)
Demonstrated experience with high-level programming languages such as Python, Ruby, or Go
Familiarity with RESTful systems and their APIs, and a high level of comfort working with JSON
Experience with cloud providers such as AWS, Azure, or Google Cloud Platform
Experience in identifying and resolving high-severity, time-sensitive issues and outages in a customer-facing environment
Metrics, metrics, metrics. A deep understanding of the importance of observability: what to measure, when to measure it, and how to measure it
This position requires that you be a resident of Denmark, France, Germany, Italy, the Netherlands, Sweden, Spain, or the UK. We will also consider candidates who are willing to relocate, but we cannot provide visa sponsorship for this role.
Experience with, and/or a serious interest in learning, one or more of the following is very valuable:
2+ years of senior-level responsibility in a Red Hat/CentOS-based Linux environment
High-level understanding of container/workload scheduling systems like Mesos or Kubernetes
Knowledge of database technologies: experience with an RDBMS such as MySQL is a definite plus, and NoSQL experience is nice to have as well
Experience with CI/CD (Continuous Integration/Continuous Delivery) pipeline architecture
Familiarity and/or experience deploying serverless architectures
Familiarity with Infrastructure as Code tools, such as Terraform or CloudFormation
Monitoring tools such as Sensu, Splunk, or Grafana; experience with Prometheus is definitely a plus
Experience working remotely