Sr. Site Reliability Engineer
**Open to Remote/Hybrid**
Full Time Opportunity (W2 only)
Pay Range: $60 - $70.42/hr.
Required Skills:
Cloud, AWS, Elasticsearch, Terraform, Scripting
- Measure and monitor availability and overall system and environment health. Make recommendations to improve service.
- Build and implement monitoring and recovery tools to provide optimum delivery and resilience.
- Works with partner groups to establish Service Level Objectives, Service Level Indicators and Error Budgets.
- Provides advanced operational support and engineering for multiple large distributed software applications.
- Plays a principal role in the development and implementation of departmental workflow automation processes and procedures.
- Serves as a point of contact for internal and external customers, providing technical guidance and support for application and service delivery environments.
- Provides guidance to development and engineering teams to automate and optimize service availability, scalability, performance, monitoring and alerting.
- Key contributor to technical evaluations and "proof of concept" programs for evaluating and implementing new technologies and tools.
- Required to provide on-call support during off-duty hours on weekdays, weekends and holidays on a scheduled/rotating basis.
- Required to perform duties outside of normal work hours based on business needs.
Cloud Platform:
- Strong working knowledge of cloud services and architecture. (AWS, Azure)
- Strong working knowledge of distributed systems (architectures, microservices, high availability), with proficiency in installation, maintenance and operational support in large-scale enterprise environments.
- Strong working knowledge of container computing. (Docker, Kubernetes, Service Mesh)
- Can build and configure Azure and AWS services. (AWS Lambda, Azure Functions)
- Understands proxies and load balancing. (Nginx, HAProxy, Envoy)
Monitoring and Tools:
- Expert in ServiceNow integrations.
- Expert in log event aggregation, metric collection, application monitoring and event handling. (Elastic, SCOM, AppD, Uptrends, AppInsights, CloudWatch)
- Strong knowledge of Windows and UNIX/Linux technologies.
- Strong knowledge of network triaging, packet loss and routing.
- Ability to create Service Level Objectives (SLO), Service Level Indicators (SLI), Error Budgeting and Burn Rates.
Development:
- Develops "everything as code" methodologies across configuration, infrastructure and orchestration.
- Can easily work with programming languages. (.NET, C#, C++, Python)
- Develops solutions with continuous integration tools. (Chef, Ansible, Jenkins, Stash/Git)
- Develops solutions with configuration management tools. (Puppet, Hiera, Terraform, Terragrunt, Ansible)
- Uses scripting languages or other tools to enable workflow automation.
Related:
- Strong analytical and problem-solving skills to troubleshoot infrastructure problems within their area of technical expertise and, potentially, across technical disciplines.
- Partners with development or engineering teams to automate and optimize service availability.
Administrative/Interpersonal Skills (All):
- Good organization skills to balance and prioritize work assignments.
- Good verbal and written communication skills.
- Ability to work as a member of a multi-cultural, multi-location team.
Copyright © 2023 Infinity Consulting Solutions, Inc. All rights reserved.