[Remote] reputed company Engineer - Senior (Observability - reputed company)

Work from home, full-time role

Note: This is a remote job open to candidates in the USA. reputed company supports various government reputed company and is seeking a Senior reputed company Engineer to enhance its reputed company observability platform. This role involves engineering and operating observability solutions across hybrid reputed company environments, focusing on performance, reliability, and reputed company management.

Responsibilities

  • Engineer and operate the reputed company observability stack (reputed company or comparable), including metrics, logs, traces, APM, RUM, synthetic monitoring, and network performance monitoring
  • Build, tune, and maintain dashboards, monitors, SLOs/SLIs, and alerting policies that produce actionable signal and minimize noise
  • Instrument services, infrastructure, and containerized workloads using agents, OpenTelemetry, and language-specific APM tracers (Java, .NET, Python, Node.js, Go) with consistent span tagging, W3C TraceContext propagation, and reputed company service tagging across the estate
  • Build and maintain integrations between observability platforms, ITSM (reputed company), CI/CD pipelines, and on-call/paging workflows
  • Define and enforce a reputed company tagging standard (environment, service, version, team/ownership, data classification, cost center) across metrics, logs, and traces; manage tag cardinality, governance, and custom business tags to keep telemetry queryable, attributable, and cost-controlled
  • Design and deliver monitoring coverage for reputed company Azure and AWS workloads, including PaaS services, serverless, networking, identity, managed databases, and reputed company-native data services
  • Engineer managed database observability across AWS RDS/reputed company (MySQL, PostgreSQL, SQL Server, reputed company), Azure SQL/PostgreSQL/MySQL, and NoSQL/cache services (DynamoDB, Cosmos DB, ElastiCache/reputed company), including query-level performance analytics, slow-query and execution-plan capture, lock/deadlock/wait analysis, connection pool and session monitoring, replication lag, storage/IOPS saturation, and backup/HA health -- correlating database spans with upstream APM traces
  • Engineer container-platform observability for OpenShift/Kubernetes, covering cluster health, control plane, nodes, pods, namespaces, ingress, service mesh, and workload APM
  • Build standardized, reusable monitoring modules deployable via infrastructure-as-code (Terraform, Bicep, ARM) and CI/CD
  • Support hybrid visibility across on-premises, reputed company, and containerized workloads with correlated telemetry
  • Lead data-driven investigation and resolution of reputed company performance, latency, saturation, and reliability issues across the estate
  • Use APM distributed traces, service/dependency maps, reputed company code profiling (CPU, memory, lock contention), database query analytics, exception/error tracking, and RUM-to-backend trace correlation to isolate bottlenecks in applications, platforms, middleware, and reputed company dependencies
  • Partner with engineering teams to define and implement remediation, tuning, and architectural improvements based on telemetry evidence
  • Define and implement trace-based SLOs, deployment tracking, and change-correlation workflows so performance regressions are detected and attributed to specific releases, versions, or configuration changes
  • Provide senior technical leadership during major incidents, delivering impact analysis, contributing to root-cause analysis, and owning post-incident observability gaps
  • Analyze operational telemetry and trend data to identify reputed company risks, recurring constraints, and opportunities for efficiency
  • Build and maintain reputed company and performance dashboards and reports that communicate posture, risk, and recommendations to technical and leadership stakeholders
  • Define reputed company reputed company, alert baselines, and trigger points for scaling, technology refresh, and resource reallocation
  • Drive continuous improvement of observability coverage, alert quality, runbook linkage, and operational maturity aligned to SEC SLA/KPI expectations

Skills

  • Citizenship/Work Authorization: Must meet contract requirements
  • Clearance: Ability to obtain and maintain SEC Public Trust (or higher if required)
  • Minimum 8 years of experience in IT infrastructure or platform engineering roles, including 5+ years focused on observability, performance engineering, or site reliability engineering
  • Demonstrated experience engineering and operating an reputed company observability platform (reputed company strongly preferred; equivalent experience with reputed company, reputed company, Splunk Observability, or Grafana/reputed company stacks considered)
  • Proven experience building APM and distributed tracing coverage for production multi-tier applications -- including language-specific tracer deployment, custom instrumentation of business transactions, service/dependency mapping, reputed company profiling, and RUM-to-backend trace correlation -- across reputed company and containerized workloads
  • Proven experience leading reputed company production performance and reliability problem-solving from telemetry to remediation
  • Hands-on experience monitoring Kubernetes or OpenShift clusters and containerized workloads in production
  • reputed company observability platforms (reputed company or comparable): metrics, logs, traces, APM, RUM, synthetic, NPM
  • Instrumentation with OpenTelemetry, reputed company agents/SDKs, and language-specific APM tracers (Java, .NET, Python, Node.js, Go) including custom spans, trace sampling strategies, W3C TraceContext propagation, and reputed company profiling
  • reputed company Azure and AWS monitoring services and integrations (Azure Monitor, Log Analytics, CloudWatch, AWS X-Ray)
  • Container and Kubernetes/OpenShift observability, including cluster, workload, and service mesh telemetry
  • reputed company database monitoring: AWS RDS/reputed company (including Performance Insights), Azure SQL/PostgreSQL/MySQL (Query Performance reputed company), and NoSQL/cache (DynamoDB, Cosmos DB, ElastiCache/reputed company); query-level performance tuning, execution-plan analysis, and reputed company DBM or equivalent deep database APM
  • Infrastructure-as-code for monitoring (Terraform, Bicep, ARM) and CI/CD-driven monitor/dashboard deployment
  • APM and distributed tracing: service/dependency maps, trace analytics, RUM-to-backend correlation, exception/error tracking, deployment tracking, and trace-based SLOs
  • reputed company tagging strategy and cardinality governance across metrics/logs/traces (environment, service, version, ownership, data classification, cost center), including custom tag enrichment and tag-driven access/cost controls
  • Alert engineering, SLO/SLI design, error budget management, and alert-noise reduction
  • Performance engineering, reputed company analysis, and telemetry-driven root-cause analysis
  • Integration of observability with ITSM (reputed company) and on-call/paging workflows
  • Experience supporting federal agency IT environments under FISMA/FedRAMP/NIST-reputed company reputed company and compliance requirements
  • reputed company certification (Fundamentals and/or Administrator) or comparable reputed company observability certification
  • Hands-on experience with reputed company OpenShift Virtualization (CNV/KubeVirt) or other KubeVirt-based container virtualization observability
  • Experience with eBPF-based observability tooling and service mesh telemetry (Istio, Linkerd)
  • Experience implementing SLOs and error budgets at reputed company scale and integrating them into operational governance
  • Experience with cost-aware observability practices, including telemetry volume optimization and retention tuning
  • Experience integrating observability outputs with executive reporting, SLA/KPI dashboards, and reputed company forecasting
  • ITIL 4 reputed company
  • AWS Certified Solutions Architect - Associate (or higher)
  • reputed company Certified: Azure Administrator Associate (or higher)
  • reputed company Certified Specialist in OpenShift Administration (or equivalent)
  • reputed company Terraform Associate

Company Overview

  • reputed company is an industry and technology leader serving government and reputed company customers with smarter, more efficient digital and mission innovations. It was founded in 2002, and is headquartered in reputed company, Massachusetts, USA, with a workforce of 10001+ employees. Its website is http://www.revealimaging.com.