Cloud Engineer - Monitoring Operations

This job posting is no longer active.


KBX Technology Solutions is looking for a Cloud Engineer to join our team! This role will be responsible for delivering and supporting full-stack monitoring focused on the user experience. This role will be involved in crafting and implementing the Monitoring Strategy and tool usage documentation for application monitoring using enterprise monitoring toolsets (Datadog, AWS CloudWatch, etc.). This includes monitoring areas such as integrations, external services, infrastructure components, transactions, and business activities from applications that are in the cloud. This individual will work with senior members to gather monitoring requirements from stakeholders and to deliver and maintain solutions that reduce incident occurrences.

What You Will Do In Your Role

  • Provide production monitoring expertise and implement automation components, including tools, platforms, processes, and policies.
  • Roll out best practices to product and support teams for setting up alerts, monitoring queues, reviewing logs, etc.
  • Define and enable Datadog governance for the application product and infrastructure teams to better manage the logging, monitoring, and notification for events and issues.
  • Design and build new features for infrastructure and services observability. Dive into new technologies and figure out how to best monitor them.
  • Collaborate across application development, product, and production management to establish and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key production services.
  • Develop solutions to implement the SLO/SLI requirements, including visualization of the monitoring dashboard.
  • Collaborate with business and technology to design and implement performance benchmarks for each application, and provide dashboards and periodic reporting.
  • Implement required telemetry and observability to monitor and measure the quality of service in real-time against the established SLA.
  • Write code within the monitoring solution or extend existing application code to improve application performance monitoring and to ensure application SLAs are met.
  • Build and drive adoption for greater self-healing and resiliency patterns.
  • Apply hands-on experience with cloud-based technologies and tools in configuration management, deployment, monitoring, and operations.
  • Participate in incident, problem, and change management processes and tools.
  • Troubleshoot key technical issues or escalate and work with appropriate technology teams to provide solutions.
  • Work with development teams throughout the software life cycle ensuring sustainable software releases.
  • Perform analysis on logs and use problem solving techniques.
  • Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions.

The Experience You Will Bring


  • At least 3 years’ experience in application performance monitoring/logging and synthetic monitoring of large systems using tools such as Datadog, AWS CloudWatch, AppDynamics, Prometheus, or Splunk.
  • At least 3 years’ experience monitoring Kubernetes environments.
  • At least 3 years’ experience programming in .NET (Framework or Core) to design and improve software.
  • At least 3 years’ experience with source code management using Git.
  • Experience with web services and messaging protocols such as SOAP and REST
  • Experience with scripting languages
  • Experience with one or more infrastructure components (e.g., networking, cloud services, orchestration tools, containerization, compute, and storage systems)
  • Experience working in Agile/Scrum teams and proficient in Continuous Integration and Continuous Delivery

What Will Put You Ahead

  • BS/BA degree
  • Experience deploying to and operating in AWS cloud.
  • Experience with Datadog for monitoring
  • Experience integrating monitoring/alerting tools with external capabilities like Zendesk, Salesforce, etc.
  • Experience with Gloo Edge for cloud-native API gateway functionality
  • Experience with Microsoft Azure DevOps for source control and Agile/Scrum process management
  • Experience with relational databases (PostgreSQL, MSSQL)
  • Experience with Python

Salary and Benefits Commensurate with Experience.
Equal Opportunity Employer.
Except where prohibited by state law, all offers of employment are conditioned upon successfully passing a drug test.

This employer uses E-Verify. Please visit the following website for additional information: www.kochcareers.com/doc/Everify.pdf

This role is eligible for variable pay based on performance and other related factors. Variable pay may be issued as a monetary bonus or in another form. 

Employees may be eligible to participate in our benefits programs, which include: medical, dental, vision, flexible spending accounts and health savings accounts, life insurance, AD&D, disability, retirement, paid vacation, paid parental leave, and educational assistance. Specific eligibility criteria are set by the applicable Summary Plan Description, policy, or guideline.

For this position we anticipate paying $135,000 to $180,000 per year.
