Site Reliability Engineer (SRE) Posted Mar 14
Synergis, Atlanta, GA
Synergis is seeking a Site Reliability Engineer (SRE). As a member of our Reliability Engineering team, you will be responsible for scaling some of the largest software products in Retail by automating the application infrastructure, deployment, and monitoring of those products in production. You will also be part of a 24×7 on-call team that leads incident triage for your products, using your expertise to mitigate problems as quickly as possible. Our "own what you build" mentality empowers you to make decisions quickly and deliver reliability improvements without the red tape that typically surrounds enterprise environments.


70% - Delivery & Execution:

Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable software solutions

Documents, reviews, and ensures that all quality and change control standards are met

Works with the Product Team to ensure that user stories are developer-ready, easy to understand, and testable

Writes custom code or scripts to automate infrastructure, monitoring services, and test cases

Writes custom code or scripts to perform destructive testing that verifies adequate resiliency in production

Configures commercial off-the-shelf solutions to align with evolving business needs

Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively

20% - Support & Enablement:

Fields questions from other product teams or support teams

Monitors tools and participates in conversations to encourage collaboration across product teams

Provides application support for software running in production

Proactively monitors production Service Level Objectives for products

Proactively reviews the performance and capacity of all aspects of production: code, infrastructure, data, and message processing

10% - Learning:

Participates in learning activities around modern software design and development core practices (communities of practice)

Proactively views articles, tutorials, and videos to learn about new technologies and best practices being used within other technology organizations


Typically reports to the Software Engineering Manager or Sr. Manager.

Preferred Qualifications:

Proficient in production monitoring concepts and implementation, including synthetic, real-user, application performance, system, log, and time-series monitoring, as well as dashboarding. Includes tools like AppDynamics, Dynatrace, New Relic, Splunk, Grafana, ELK, etc.

Proficient in production systems design, including High Availability, Disaster Recovery, Performance, Efficiency, and Security

Proficient in a modern scripting language (preferably Python)

Proficient in a modern infrastructure automation toolkit such as Puppet or Chef

Proficient in a Linux- or Unix-based environment

Deep understanding of modern microservice-based architectures and operations

Experience in destructive testing methodologies and tools such as Chaos Monkey

Experience in CI/CD automation

Experience in a version control system such as Git or SVN

Experience in a cloud computing platform and the associated automation patterns it provides

Experience in defensive coding practices and patterns for high-availability

Exposure to a modern object-oriented programming language (preferably Java)

Please send resumes to Randall Layman (see below)
