How to Conduct DevOps Due Diligence
A comprehensive guide for acquirers on conducting DevOps and SRE technical due diligence, covering key metrics like release frequency, MTTR, and IaC.
Written by The Beyond M&A team
Practitioners across Tech DD, integration, and AI-native deal tooling
Last reviewed 20 May 2026
Executive summary
DevOps due diligence assesses the target's software delivery lifecycle, stability, and scalability. Key areas include release velocity, system reliability (MTTR, SLOs), infrastructure-as-code maturity, and the team's on-call process. Strong automation, clear observability, and tested disaster recovery plans are positive signals. Manual processes and frequent sev-1 incidents are red flags.
- 01. High release frequency is meaningless without corresponding stability metrics.
- 02. Mature on-call processes focus on SLOs and blameless post-mortems, not heroics.
- 03. 100% Infrastructure-as-Code (IaC) coverage is the goal; assess Terraform/CloudFormation realistically.
- 04. Observability is more than just logging; look for distributed tracing and mature monitoring.
- 05. A tested Business Continuity and Disaster Recovery (BC/DR) plan is non-negotiable.
An acquisition's success hinges on integrating not just the product, but also the engine that builds and runs it. DevOps and Site Reliability Engineering (SRE) practices are at the heart of that engine. A rigorous DevOps due diligence process reveals the target's true operational maturity, scalability, and the hidden costs of potential integration.
Our Technology Due Diligence practice focuses on moving beyond surface-level metrics to assess the genuine health of an engineering organisation. This guide outlines the key areas we investigate.
Release Frequency vs. Stability
High deployment frequency is often cited as a sign of a mature DevOps culture. While the ability to deploy code to production multiple times a day is a positive indicator, it is a vanity metric without the context of stability.
What Good Looks Like:
- Consistent Cadence: Predictable release cycles, whether daily, weekly, or bi-weekly.
- Low Change Failure Rate (CFR): A low percentage of deployments that result in a service degradation or require a hotfix. Mature teams have a CFR below 15%.
- Automated Pipelines: CI/CD pipelines that automate testing, security scanning, and deployment, with manual approvals for key gates (e.g., production deploy).
Red Flags:
- High CFR: Frequent rollbacks or hotfixes immediately following a release signal inadequate testing or a complex, brittle architecture.
- Manual Releases: Reliance on manual steps for deployment, often documented in a wiki, which are prone to human error.
- "Big Bang" Deployments: Infrequent, large, and high-risk releases that suggest a lack of confidence in the deployment process.
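To make the pairing of velocity and stability concrete, the following is a minimal sketch of how a reviewer might compute deployment frequency and change failure rate side by side from a deployment log. The `Deployment` record, its fields, and the sample data are hypothetical; a real target's export from its CI/CD tooling will look different.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deployment:
    day: date             # calendar day the deployment reached production
    caused_failure: bool  # True if it led to a degradation, rollback, or hotfix


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure (0.0 when there are none)."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.caused_failure)
    return failures / len(deployments)


def deployments_per_week(deployments: list[Deployment]) -> float:
    """Average weekly deployment count over the observed date range."""
    if not deployments:
        return 0.0
    days = [d.day for d in deployments]
    span_days = (max(days) - min(days)).days + 1
    return len(deployments) / (span_days / 7)


# Hypothetical sample: 20 deployments spread over 28 days, every fifth one failing.
sample = [
    Deployment(day=date(2026, 1, 1 + (i * 27) // 19), caused_failure=(i % 5 == 0))
    for i in range(20)
]

cfr = change_failure_rate(sample)
weekly = deployments_per_week(sample)
print(f"{weekly:.1f} deployments/week, CFR {cfr:.0%}")
```

In this made-up sample the cadence looks healthy at five deployments a week, yet one in five releases causes a failure, which is above the 15% guide figure. Reading either number alone would give the wrong impression.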
Mean Time To Recovery (MTTR) & SLOs
Failures are inevitable in complex systems. The most critical metric is not Mean Time Between Failures (MTBF), but Mean Time To Recovery (MTTR), which measures how quickly the team can restore service after a failure.
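As a rough illustration, MTTR can be derived from an incident log as the average of recovery time minus detection time. The sketch below assumes each incident is available as a (detected, resolved) pair of timestamps; the sample incidents are invented for the example.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - detected) across all supplied incidents."""
    if not incidents:
        raise ValueError("no incidents supplied")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log: two quick recoveries and one three-hour outage.
incidents = [
    (datetime(2026, 3, 1, 9, 0), datetime(2026, 3, 1, 9, 20)),    # 20 minutes
    (datetime(2026, 3, 9, 14, 0), datetime(2026, 3, 9, 14, 10)),  # 10 minutes
    (datetime(2026, 3, 20, 2, 0), datetime(2026, 3, 20, 5, 0)),   # 180 minutes
]

mttr = mean_time_to_recovery(incidents)
longest = max(resolved - detected for detected, resolved in incidents)
print(f"MTTR: {mttr}, longest outage: {longest}")
```

A mean can be dominated by a single long outage, as it is here, so it is worth asking for the underlying incident list rather than only the headline figure.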
What Good Looks Like:
- Low MTTR: For critical services, this should be measured in minutes, not hours. This indicates effective monitoring, automated rollback capabilities, and well-rehearsed incident response procedures.
- Defined SLOs/SLIs: Clearly defined Service Level Objectives (SLOs) and Service Level Indicators (SLIs) that are understood by both engineering and the business.
- Error Budgets: The use of error budgets gives teams the autonomy to balance innovation with reliability. If the error budget is depleted, the focus shifts to stability work.
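The arithmetic behind an error budget is simple enough to sketch. The example below assumes a time-based availability SLO over a rolling 30-day window; request-based SLOs are calculated differently, and the function names are illustrative only.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


# A 99.9% SLO over 30 days allows roughly 43 minutes of downtime.
budget = error_budget_minutes(0.999)
remaining = budget_remaining(0.999, downtime_minutes=21.6)
print(f"budget: {budget:.1f} min, remaining: {remaining:.0%}")
```

When `budget_remaining` goes negative, a team that genuinely operates on error budgets should be able to show that feature work paused in favour of stability work.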
Red Flags:
- High MTTR: Prolonged outages suggest a lack of observability, poor incident response, or a "hero-driven" culture where only one or two key engineers can fix problems.
- No SLOs: If the target cannot define and track success for their services, they cannot make data-driven decisions about reliability.
On-Call Maturity
An on-call rotation is a window into the soul of an engineering culture. An immature on-call process leads to burnout, attrition, and poor system reliability.
What Good Looks Like:
- Blameless Post-mortems: A focus on systemic causes, not individual errors. The output should be actionable improvements to tooling, process, or architecture.
- Sustainable Rotations: Well-managed schedules with clear escalation paths. On-call engineers should not be constantly firefighting.
- Actionable Alerts: Alerts are specific, tied to symptoms (not causes), and link to runbooks for initial diagnosis.
Red Flags:
- "Alert Fatigue": A high volume of non-actionable alerts that get ignored, leading to genuine issues being missed.
- Blame Culture: Post-mortems that seek to identify a person or team at fault. This discourages transparency and learning.
- Hero Worship: Over-reliance on a small number of individuals to handle all critical incidents.
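Alert fatigue and hero dependence can both be estimated from paging history. The following sketch assumes an export in which each page records who responded and whether any action was required; the `Page` record and the sample data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    engineer: str     # who was paged
    actionable: bool  # True if the responder actually had to do something


def actionable_ratio(pages: list[Page]) -> float:
    """Share of pages that required action; low values suggest alert fatigue."""
    if not pages:
        return 0.0
    return sum(1 for p in pages if p.actionable) / len(pages)


def top_responder_share(pages: list[Page]) -> float:
    """Share of all pages handled by the single busiest engineer."""
    if not pages:
        return 0.0
    counts = Counter(p.engineer for p in pages)
    return counts.most_common(1)[0][1] / len(pages)


# Hypothetical month of pages for a three-person rotation.
pages = (
    [Page("alice", False)] * 5
    + [Page("alice", True)] * 2
    + [Page("bob", False)] * 2
    + [Page("carol", True)]
)

print(f"actionable: {actionable_ratio(pages):.0%}")
print(f"busiest engineer handles: {top_responder_share(pages):.0%}")
```

In this invented sample only 30% of pages were actionable and one engineer absorbed 70% of them, which would point to both noisy alerting and key-person risk.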
The Observability Stack
Observability goes beyond traditional monitoring. It is the ability to ask arbitrary questions about the system's state without having to ship new code. This requires a modern toolchain.
What Good Looks Like:
- The Three Pillars: Solid tooling for metrics (e.g., Prometheus, Datadog), structured logs (e.g., ELK Stack, Splunk), and distributed tracing (e.g., Jaeger, OpenTelemetry).
- Business-Level Metrics: Dashboards that correlate application performance with business KPIs (e.g., user signups, transaction volume).
Red Flags:
- Logging Only: Reliance on SSHing into boxes and grep-ing log files to debug issues.
- Siloed Data: No unified view across different parts of the stack, making it difficult to trace a request from the user to the database.
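For contrast with grep-ing free-text logs, this is a minimal sketch of structured logging using only the Python standard library. Each record is emitted as one JSON object carrying a trace identifier, so a log platform can index the fields and join them to a trace. The `trace_id` field name and the `checkout` logger are assumptions made for the example, not a convention the target is expected to follow.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object instead of free text."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical correlation field, supplied by the caller via `extra=`.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("checkout")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("payment authorised", extra={"trace_id": "abc123"})

# The emitted line parses back into fields that can be queried directly.
entry = json.loads(buffer.getvalue())
print(entry)
```

During diligence, a quick look at raw log samples shows whether entries are machine-parseable and whether a request identifier is carried consistently across services.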
Infrastructure-as-Code (IaC) Coverage
Modern infrastructure is ephemeral and manageable through code. IaC tools like Terraform, CloudFormation, or Pulumi are essential for scalable, repeatable, and secure infrastructure management.
What Good Looks Like:
- High Coverage: >90% of infrastructure is defined in code and stored in version control.
- Code Review for Infra: Infrastructure changes go through the same peer review process as application code.
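- State File Management: Terraform state is held in a remote, access-controlled backend with locking and encryption (state can contain sensitive values), and CloudFormation stacks are managed with equivalent care.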
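One way to sanity-check a claimed coverage figure is to compare the resources actually deployed with those tracked by the IaC tooling. The sketch below assumes both lists have already been reduced to sets of resource identifiers (for example from a cloud inventory export and from IaC state); how those sets are obtained depends on the provider and tooling, and the identifiers shown are made up.

```python
def iac_coverage(deployed_ids: set[str], managed_ids: set[str]) -> float:
    """Fraction of deployed resources that are also tracked in IaC."""
    if not deployed_ids:
        return 1.0
    return len(deployed_ids & managed_ids) / len(deployed_ids)


def unmanaged(deployed_ids: set[str], managed_ids: set[str]) -> list[str]:
    """Deployed resources with no IaC definition (likely created by hand)."""
    return sorted(deployed_ids - managed_ids)


def stale(deployed_ids: set[str], managed_ids: set[str]) -> list[str]:
    """Resources IaC believes it manages but that no longer exist."""
    return sorted(managed_ids - deployed_ids)


# Hypothetical inventory: ten live resources, eight of them under IaC,
# plus one stale entry that exists only in the IaC state.
deployed = {f"res-{i}" for i in range(10)}
managed = {f"res-{i}" for i in range(8)} | {"res-old"}

print(f"coverage: {iac_coverage(deployed, managed):.0%}")
print("unmanaged:", unmanaged(deployed, managed))
print("stale:", stale(deployed, managed))
```

Counting resources treats a test bucket and a production database as equal, so the unmanaged list usually deserves more attention than the percentage itself.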
Red Flags:
- "ClickOps": Manual provisioning of infrastructure via a cloud provider's console. This leads to configuration drift and is impossible to audit or replicate consistently.
- Zero or Partial IaC: No infrastructure code at all, or code used for initial setup only, with subsequent changes made manually.
- Sensitive Data in Code: Committing secrets, keys, or passwords directly into version control.
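A quick first pass for committed secrets can be done with a handful of regular expressions. The sketch below is illustrative only: the three patterns are a small, assumed starter set and will miss many secret formats, so it is no substitute for a dedicated scanner run across the full commit history. The sample text uses the placeholder access key from AWS's public documentation.

```python
import re

# Small, illustrative pattern set; real scanners ship far more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*=\s*[\"'][^\"']+[\"']"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Hypothetical config file contents.
sample_text = "\n".join([
    'region = "eu-west-1"',
    'access_key = "AKIAIOSFODNN7EXAMPLE"',
    'password = "hunter2"',
])

findings = scan_text(sample_text)
print(findings)
```

Any hit in the current tree implies the secret is probably also in history, so the follow-up question is whether the credential was rotated, not only whether the line was deleted.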
Disaster Recovery Posture
Many companies have a disaster recovery (DR) plan, but few have actually tested it. Diligence must verify the practical viability of the target's business continuity plan (BCP) and DR strategy.
What Good Looks Like:
- Documented and Tested Plan: A clear, up-to-date DR plan that is tested regularly (e.g., annually or semi-annually) through "game day" exercises.
- Defined RTO/RPO: Clear Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) that are aligned with business requirements.
- Automated Backups & Restore: Regular, automated backups of all critical data stores, with tested and proven restoration procedures.
Red Flags:
- "Shelfware" DR Plan: A plan exists on paper (likely in a virtual data room like Lens) but has never been tested.
- No Automation: Reliance on manual processes to fail over to a secondary region.
- Unrealistic RTO/RPO: Claims of near-zero downtime without the architecture (e.g., active-active multi-region) to support it.
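Stated RTO and RPO figures can be checked against evidence. The sketch below assumes the reviewer has a list of backup completion times and the durations of any restore tests; it treats the largest gap between backups as the worst-case data loss and the slowest tested restore as the achievable recovery time. All names and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times: list[datetime]) -> timedelta:
    """Largest gap between consecutive backups.

    A failure just before the next backup would lose this much data.
    """
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        raise ValueError("need at least two backups to measure a gap")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def meets_objectives(backup_times: list[datetime],
                     restore_durations: list[timedelta],
                     rpo: timedelta, rto: timedelta) -> dict[str, bool]:
    """Compare observed evidence with the stated RPO and RTO."""
    return {
        "rpo_met": worst_case_data_loss(backup_times) <= rpo,
        # With no restore tests on record, the RTO is unproven, so report False.
        "rto_met": bool(restore_durations) and max(restore_durations) <= rto,
    }


# Hypothetical evidence: six-hourly backups with one missed run,
# and two restore tests taking 45 and 70 minutes.
backups = [
    datetime(2026, 4, 1, 0, 0),
    datetime(2026, 4, 1, 6, 0),
    datetime(2026, 4, 1, 12, 0),
    datetime(2026, 4, 2, 0, 0),  # the 18:00 backup did not run
]
restores = [timedelta(minutes=45), timedelta(minutes=70)]

result = meets_objectives(backups, restores,
                          rpo=timedelta(hours=8), rto=timedelta(hours=2))
print(result)
```

In this invented case the restore tests support the RTO, but a single missed backup opens a 12-hour gap that breaks an 8-hour RPO. Passing an empty restore list makes the RTO check fail, mirroring the "shelfware" situation where a plan has never been exercised.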
Frequently asked
What is MTTR and why is it important in due diligence?
Mean Time To Recovery (MTTR) measures the average time it takes to recover from a production failure. It's a critical indicator of a team's operational maturity and the resilience of their systems. A low MTTR (minutes, not hours) suggests effective monitoring, well-practised incident response, and reliable deployment pipelines.
What level of Infrastructure-as-Code (IaC) is considered 'good'?
Ideally, all production infrastructure is managed via IaC tools like Terraform or CloudFormation. A 'good' state is >90% coverage, with clear processes for code review and deployment. Red flags include manual infrastructure changes, low coverage, or using IaC for initial setup only and making subsequent changes manually, which causes configuration drift.
Why is 'release frequency' on its own a vanity metric?
High release frequency (e.g., multiple deployments per day) seems impressive but is meaningless without stability data. A team deploying 10 times a day but causing frequent outages is less mature than a team deploying once a week with near-perfect stability. Always evaluate change failure rate and MTTR alongside deployment velocity.
What's the difference between monitoring and observability?
Monitoring tells you when something is wrong (e.g., CPU is at 95%). Observability helps you understand *why* it's wrong. It involves richer data sources like distributed tracing, structured logs, and application-level metrics, allowing engineers to debug complex, distributed systems without needing to ship new code to inspect them.