

Stack: NestJS, Next.js 14, TypeScript, PostgreSQL, MongoDB, Prisma, WebSockets, JWT, RBAC, Tailwind CSS
Role: Full-Stack Software Engineer
Year: 2026
Status: Completed
[Figure: OpsFlow architecture diagram (hero preview), by Ancel Ajanga]

OpsFlow - Incident & Operations Management Platform

Building a production-ready incident and workflow management platform for engineering teams

Written by Ancel — Software Engineer
Duration: 4 months (2026)

OpsFlow is a production-ready incident and operations management platform for engineering teams. It combines incident response, workflow management, task coordination, and knowledge sharing in a single, secure, team-based system. The platform pairs a dual-database architecture (PostgreSQL + MongoDB) with real-time WebSocket updates and team-based multi-tenancy, exposed through 50+ API endpoints.

The Problem

Engineering and operations teams rely on multiple disconnected tools to manage incidents, workflows, and documentation, leading to poor incident visibility, delayed response times, fragmented operational knowledge, weak accountability, and difficulty scaling workflows as teams grow. Existing solutions are either too expensive or lack the integration needed for unified operations management.

To unify incidents, workflows, and knowledge in one place without paid infrastructure, I built OpsFlow with team-scoped data and real-time updates, and kept its components isolated so that a failure in one area does not take down the rest.

The Solution

I architected OpsFlow as a unified operations platform that lets teams manage the full incident lifecycle with SLA tracking, coordinate response workflows and tasks in real time, maintain a searchable team-scoped knowledge base, enforce role-based and team-based access control, and monitor operational health through a centralized dashboard. The system uses a dual-database architecture (PostgreSQL for transactions, MongoDB for event logs) and WebSocket-powered real-time updates, without requiring paid infrastructure.
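Role-based and team-based access control combine into two checks on every request: is the user a member of the resource's team, and does their role in that team meet the requirement? The following is a minimal sketch under stated assumptions: the three roles (`viewer`, `responder`, `admin`), their ranking, and the `User`/`Membership` shapes are hypothetical, since the actual OpsFlow roles and data model are not described here. In NestJS, a check like this would typically sit inside a guard.

```typescript
// Hypothetical role names; OpsFlow's actual roles are not specified.
type Role = "viewer" | "responder" | "admin";

// Assumed hierarchy: a higher rank grants every permission of a lower rank.
const ROLE_RANK: Record<Role, number> = { viewer: 0, responder: 1, admin: 2 };

interface Membership {
  teamId: string;
  role: Role;
}

interface User {
  id: string;
  memberships: Membership[];
}

// Grants access only if the user belongs to the resource's team (team-based check)
// AND holds at least the required role within that team (role-based check).
function canAccess(user: User, resourceTeamId: string, required: Role): boolean {
  const membership = user.memberships.find((m) => m.teamId === resourceTeamId);
  if (!membership) return false; // not a member of this team: no access at all
  return ROLE_RANK[membership.role] >= ROLE_RANK[required];
}

// Usage
const alice: User = { id: "u1", memberships: [{ teamId: "team-a", role: "responder" }] };
console.log(canAccess(alice, "team-a", "viewer")); // true
console.log(canAccess(alice, "team-a", "admin")); // false
console.log(canAccess(alice, "team-b", "viewer")); // false
```

Checking team membership first means a high role in one team never leaks access to another team's data.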

Key Technical Terms

  • Dual-database architecture: OpsFlow uses PostgreSQL for ACID-compliant incident and workflow data and MongoDB for high-volume event logs. This serves the project goal of reliable state plus cheap, flexible event history, without a single database becoming the bottleneck.
  • On-demand SLA calculation: SLA calculation and breach detection run during API requests instead of in background jobs, so the project could ship without a queue or workers. For this project, that traded some efficiency for simpler, free-tier-friendly infrastructure.
  • Team-scoped multi-tenancy: every query is filtered by team, so one team never sees another team's incidents or docs. This supports secure, multi-tenant operations management on a single deployment.
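On-demand SLA calculation means the breach status is derived from timestamps whenever an incident is read, rather than stored and updated by a scheduled worker. A minimal sketch, assuming hypothetical severity levels and resolution targets (the real OpsFlow SLA values are not given):

```typescript
// Hypothetical severity levels and time-to-resolve targets, in minutes.
type Severity = "critical" | "high" | "medium" | "low";

const SLA_MINUTES: Record<Severity, number> = {
  critical: 60,
  high: 240,
  medium: 1440,
  low: 4320,
};

interface Incident {
  id: string;
  severity: Severity;
  createdAt: Date;
  resolvedAt: Date | null;
}

interface SlaStatus {
  deadline: Date;
  breached: boolean;
  minutesRemaining: number; // negative once the deadline has passed
}

// Computed per request: no stored "breached" flag and no background job.
function computeSla(incident: Incident, now: Date = new Date()): SlaStatus {
  const deadline = new Date(
    incident.createdAt.getTime() + SLA_MINUTES[incident.severity] * 60_000,
  );
  // A resolved incident is judged at its resolution time; an open one at "now".
  const checkpoint = incident.resolvedAt ?? now;
  const msRemaining = deadline.getTime() - checkpoint.getTime();
  return {
    deadline,
    breached: msRemaining < 0,
    minutesRemaining: Math.floor(msRemaining / 60_000),
  };
}

// Usage: a critical incident opened at 10:00 UTC has a 11:00 UTC deadline.
const inc: Incident = {
  id: "INC-1",
  severity: "critical",
  createdAt: new Date("2026-01-01T10:00:00Z"),
  resolvedAt: null,
};
console.log(computeSla(inc, new Date("2026-01-01T10:45:00Z")).breached); // false
console.log(computeSla(inc, new Date("2026-01-01T11:30:00Z")).minutesRemaining); // -30
```

The cost of this design is that every read recomputes the status, and nothing fires at the moment of breach; a notification on breach would still need a job or a check piggybacked on some other request.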

The Impact

OpsFlow is a production-ready system with 50+ API endpoints and real-time incident updates that runs without paid infrastructure. It provides a secure, team-based multi-tenant architecture with complete data isolation, fully documented setup and deployment guides, and a responsive UI across mobile, tablet, and desktop. The project demonstrates enterprise-grade backend and system design skills through a real-world SaaS architecture, security awareness, and the ability to build production-ready systems under infrastructure constraints.

  • 50+ API endpoints: RESTful API with comprehensive features
  • 2 databases: PostgreSQL + MongoDB architecture
  • Real-time updates: live incident activity over WebSockets, without polling
  • Access control: RBAC + teams, with role- and team-based permissions

Outcomes

  • Production-ready system with 50+ API endpoints
  • Real-time incident updates without paid infrastructure
  • Secure, team-based multi-tenant architecture
  • Fully documented setup and deployment guides
  • Responsive UI across mobile, tablet, and desktop

Architecture Deep-Dive

OpsFlow follows a dual-database architecture: PostgreSQL (via Prisma ORM) for structured relational data (incidents, workflows, teams) and MongoDB for append-only event logs and timelines. The backend uses NestJS's modular architecture with feature-based modules, and the frontend uses Next.js 14 with TypeScript for type safety. Real-time updates are pushed over WebSocket connections (Socket.IO). There are no background workers: SLA calculations and breach detection are computed on demand during API requests. Multi-tenant isolation is enforced at the database query level with team-scoped filtering.
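Query-level isolation is easiest to keep correct when every read passes through one helper that forces the caller's team into the filter. A minimal sketch, assuming a `teamId` column on scoped tables: the filter object mirrors the shape of a Prisma `where` clause, but the code does not import Prisma, and `findMany` here is an in-memory stand-in. The `prisma.incident` call in the comment assumes a hypothetical `Incident` model.

```typescript
type Where = Record<string, unknown>;

// Caller filters are spread first and teamId is set last,
// so a request can never widen the scope to another team.
function teamScoped(teamId: string, where: Where = {}): Where {
  return { ...where, teamId };
}

// In OpsFlow this would be a Prisma query along the lines of
// prisma.incident.findMany({ where: teamScoped(teamId, { status: "OPEN" }) }).
type IncidentRow = { id: string; teamId: string; status: string };

// In-memory stand-in: keeps rows whose fields equal every key in `where`.
function findMany(rows: IncidentRow[], where: Where): IncidentRow[] {
  return rows.filter((row) => {
    const record: Record<string, unknown> = row; // plain-object view of the row
    return Object.entries(where).every(([key, value]) => record[key] === value);
  });
}

// Usage: a filter that tries to read team-b's data is overridden to team-a.
const rows: IncidentRow[] = [
  { id: "1", teamId: "team-a", status: "OPEN" },
  { id: "2", teamId: "team-b", status: "OPEN" },
  { id: "3", teamId: "team-a", status: "RESOLVED" },
];
const result = findMany(rows, teamScoped("team-a", { status: "OPEN", teamId: "team-b" }));
console.log(result.map((r) => r.id)); // [ '1' ]
```

Putting the override in one function means a forgotten filter shows up as a missing `teamScoped` call in review, rather than as a subtle cross-tenant leak.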

Key Engineering Decisions

I chose a dual-database architecture (PostgreSQL + MongoDB) over a single database because we needed ACID for incidents and workflows and a flexible store for high-volume event logs. I implemented on-demand SLA calculations over scheduled jobs to avoid paid workers on the free tier, trading some efficiency for cost. I used WebSockets for real-time updates without Redis pub-sub so we could ship with one server and add scale later. I selected Prisma for type safety and migrations over raw SQL. I deferred automated incident routing and advanced analytics to focus on core operations management.
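Running WebSockets without Redis pub-sub works on one server because the mapping from teams to connected sockets can live in process memory. Socket.IO offers this idea through rooms (`socket.join(room)` and `io.to(room).emit(...)`); the sketch below avoids the library so it stays self-contained, and its `ClientSocket` interface is a hypothetical stand-in for a real connection.

```typescript
// Hypothetical minimal connection interface.
interface ClientSocket {
  id: string;
  send(message: string): void;
}

// Per-team rooms held in this process's memory only.
class TeamRooms {
  private rooms = new Map<string, Set<ClientSocket>>();

  join(teamId: string, socket: ClientSocket): void {
    let room = this.rooms.get(teamId);
    if (!room) {
      room = new Set<ClientSocket>();
      this.rooms.set(teamId, room);
    }
    room.add(socket);
  }

  leave(teamId: string, socket: ClientSocket): void {
    const room = this.rooms.get(teamId);
    if (!room) return;
    room.delete(socket);
    if (room.size === 0) this.rooms.delete(teamId); // drop empty rooms
  }

  // Sends the event to every socket in the team's room; returns the recipient count.
  broadcast(teamId: string, event: string, payload: unknown): number {
    const room = this.rooms.get(teamId);
    if (!room) return 0;
    const message = JSON.stringify({ event, payload });
    for (const socket of room) socket.send(message);
    return room.size;
  }
}

// Usage: only team-a's socket receives team-a's incident update.
const received: string[] = [];
const a: ClientSocket = { id: "a", send: (m) => received.push(`a:${m}`) };
const b: ClientSocket = { id: "b", send: (m) => received.push(`b:${m}`) };
const rooms = new TeamRooms();
rooms.join("team-a", a);
rooms.join("team-b", b);
rooms.broadcast("team-a", "incident.updated", { id: "INC-1" });
console.log(received.length); // 1
```

Because the map exists in a single process, a second server would not see these rooms; that is the gap the Redis pub-sub on the roadmap would close.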

Failure Modes & Resilience

  • PostgreSQL or MongoDB down: affected operations fail clearly; WebSocket and API handling are isolated, so a database blip does not take down the whole process.
  • WebSocket disconnect: clients reconnect and refetch, so they do not miss updates.
  • Long-running report or export: request timeouts and async patterns prevent one heavy operation from blocking others.
  • Unauthorized access: RBAC and team-scoped queries enforce isolation, and audit trails record who did what.
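The "reconnect and refetch" recovery can be sketched on the client as exponential backoff between reconnect attempts plus a backfill of events missed while offline. This is a minimal sketch under assumptions not stated in the original: events carry a monotonically increasing numeric `id`, and the API can return events after a given id (a hypothetical endpoint). `fetchSince` is synchronous here only to keep the example short; a real client would await an HTTP call.

```typescript
interface IncidentEvent {
  id: number;
  incidentId: string;
  type: string;
}

// Exponential backoff between reconnect attempts, capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

class IncidentFeed {
  private lastSeenId = 0;
  readonly events: IncidentEvent[] = [];
  private readonly fetchSince: (afterId: number) => IncidentEvent[];

  constructor(fetchSince: (afterId: number) => IncidentEvent[]) {
    this.fetchSince = fetchSince;
  }

  // Called for every live WebSocket message; ignores events at or below the last applied id.
  onLiveEvent(event: IncidentEvent): void {
    if (event.id <= this.lastSeenId) return;
    this.events.push(event);
    this.lastSeenId = event.id;
  }

  // Called after the socket reconnects: backfill everything missed while offline.
  onReconnect(): void {
    for (const event of this.fetchSince(this.lastSeenId)) this.onLiveEvent(event);
  }
}

// Usage: event 1 arrives live, then the socket drops while events 2 and 3 happen.
const server: IncidentEvent[] = [
  { id: 1, incidentId: "INC-1", type: "created" },
  { id: 2, incidentId: "INC-1", type: "acknowledged" },
  { id: 3, incidentId: "INC-1", type: "resolved" },
];
const feed = new IncidentFeed((afterId) => server.filter((e) => e.id > afterId));
feed.onLiveEvent(server[0]);
feed.onReconnect();
console.log(feed.events.map((e) => e.type)); // [ 'created', 'acknowledged', 'resolved' ]
console.log(reconnectDelayMs(0), reconnectDelayMs(3), reconnectDelayMs(10)); // 500 4000 30000
```

Tracking the last applied id makes the backfill idempotent: a live event that also arrives in the refetch is applied only once.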


Roadmap & Expansion

The vision includes scaling to 1,000+ teams through PostgreSQL read replicas, Redis pub-sub for WebSocket message distribution, and extraction of analytics and notifications into microservices. Containerization with Docker and orchestration with Kubernetes are planned. Advanced features include AI-powered incident prediction, automated response workflows, PagerDuty and Slack integrations, and comprehensive operational dashboards. Enterprise features include SSO integration, advanced permissions, and white-label customization.


Project Gallery

[Gallery image 1: OpsFlow architecture diagram, by Ancel Ajanga]