| Stack | Role | Year | Status |
|---|---|---|---|
| NestJS, Next.js 14, TypeScript, PostgreSQL, MongoDB, Prisma, WebSockets, JWT, RBAC, Tailwind CSS | Full-Stack Software Engineer | 2026 | Completed |

OpsFlow - Incident & Operations Management Platform
Building a production-ready incident and workflow management platform for engineering teams
OpsFlow is a production-ready incident and operations management platform architected for engineering teams, combining incident response, workflow management, task coordination, and knowledge sharing into a single secure team-based system. The platform features dual-database architecture (PostgreSQL + MongoDB), real-time WebSocket updates, and team-based multi-tenancy with 50+ API endpoints.
The Problem
Engineering and operations teams rely on multiple disconnected tools to manage incidents, workflows, and documentation, leading to poor incident visibility, delayed response times, fragmented operational knowledge, weak accountability, and difficulty scaling workflows as teams grow. Existing solutions are either too expensive or lack the integration needed for unified operations management.
To unify incidents, workflows, and knowledge in one place without expensive infrastructure, I built OpsFlow with team-scoped data and real-time updates, and kept its components isolated so a failure in one area does not take down the rest.
The Solution
I architected OpsFlow as a unified operations platform enabling teams to manage the full incident lifecycle with SLA tracking, coordinate response workflows and tasks in real-time, maintain a searchable team-scoped knowledge base, enforce role-based and team-based access control, and monitor operational health through a centralized dashboard. The system uses dual-database architecture (PostgreSQL for transactions, MongoDB for event logs) and WebSocket-powered real-time updates without requiring paid infrastructure.
Key Technical Terms
- Dual-database architecture: OpsFlow uses PostgreSQL for ACID incident and workflow data and MongoDB for high-volume event logs. This gives the project reliable transactional state alongside a cheap, flexible event history, without forcing both workloads through one store.
- On-demand SLA calculation: SLA status and breach detection are computed during API requests instead of in background jobs, so the project could ship without a queue or workers. This traded some efficiency for simpler, free-tier-friendly infrastructure.
- Team-scoped multi-tenancy: Every query is filtered by team so one team never sees another's incidents or docs, which allows secure multi-tenant operations management on a single deployment.
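The on-demand SLA idea above can be illustrated with a small pure function that an API handler could call while building its response. This is a minimal sketch, not the project's actual code: the `Incident` shape, the `SLA_TARGET_MINUTES` values, and the name `computeSlaStatus` are all assumptions made for illustration.

```typescript
// Hypothetical shapes: OpsFlow's real schema is not shown in this write-up.
type Severity = "critical" | "high" | "medium" | "low";

interface Incident {
  id: string;
  teamId: string;
  severity: Severity;
  createdAt: Date;
  resolvedAt: Date | null;
}

interface SlaStatus {
  deadline: Date;
  breached: boolean;
  remainingMs: number; // 0 once the incident is resolved or the SLA is breached
}

// Assumed resolution targets in minutes; the actual SLA policy is not stated.
const SLA_TARGET_MINUTES: Record<Severity, number> = {
  critical: 60,
  high: 240,
  medium: 1440,
  low: 4320,
};

// Pure function evaluated per request: no queue, cron job, or worker involved.
function computeSlaStatus(incident: Incident, now: Date): SlaStatus {
  const deadlineMs =
    incident.createdAt.getTime() + SLA_TARGET_MINUTES[incident.severity] * 60000;
  // A resolved incident is judged at its resolution time, an open one at "now".
  const judgedAtMs = (incident.resolvedAt !== null ? incident.resolvedAt : now).getTime();
  const breached = judgedAtMs > deadlineMs;
  const remainingMs =
    incident.resolvedAt !== null || breached ? 0 : deadlineMs - judgedAtMs;
  return { deadline: new Date(deadlineMs), breached, remainingMs };
}
```

Because the function takes `now` as a parameter and touches no I/O, the same incident can be re-evaluated on every request, which is where the efficiency cost mentioned above comes from.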
The Impact
Production-ready system with 50+ API endpoints and real-time incident updates without paid infrastructure. Secure team-based multi-tenant architecture with complete data isolation, fully documented setup and deployment guides, and responsive UI across mobile, tablet, and desktop. Demonstrates enterprise-grade backend and system design skills with real-world SaaS architecture, security awareness, and proven ability to build production-ready systems under infrastructure constraints.
Outcomes
- Production-ready system with 50+ API endpoints
- Real-time incident updates without paid infrastructure
- Secure, team-based multi-tenant architecture
- Fully documented setup and deployment guides
- Responsive UI across mobile, tablet, and desktop
Architecture Deep-Dive
OpsFlow follows a dual-database architecture: PostgreSQL (via Prisma ORM) for structured relational data (incidents, workflows, teams) and MongoDB for append-only event logs and timelines. The backend uses NestJS modular architecture with feature-based modules. The frontend uses Next.js 14 with TypeScript for type safety. Real-time updates are delivered over WebSocket connections (Socket.IO). There are no background workers: SLA calculations and breach detection run on demand during API requests. Multi-tenant isolation is enforced at the database query level with team-scoped filtering.
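Team-scoped filtering at the query level might look roughly like the following. It is a sketch under stated assumptions: `IncidentRow`, `scopeToTeam`, and `findIncidents` are hypothetical names, and an in-memory array stands in for the real database so the example runs on its own. In the real system the scoped filter would presumably be passed to the ORM instead.

```typescript
// Hypothetical row shape standing in for the real incident model.
interface IncidentRow {
  id: string;
  teamId: string;
  title: string;
  status: "open" | "resolved";
}

type IncidentFilter = Partial<IncidentRow>;

// Builds the final filter. The authenticated team id is spread last, so it
// always overrides any teamId the caller tried to supply in the filter.
function scopeToTeam(filter: IncidentFilter, teamId: string): IncidentFilter {
  return { ...filter, teamId };
}

// In-memory stand-in for a database query: every lookup goes through
// scopeToTeam, so there is no code path that reads rows unscoped.
function findIncidents(
  rows: IncidentRow[],
  filter: IncidentFilter,
  teamId: string
): IncidentRow[] {
  const scoped = scopeToTeam(filter, teamId);
  const keys = Object.keys(scoped) as (keyof IncidentRow)[];
  return rows.filter((row) => keys.every((key) => row[key] === scoped[key]));
}
```

The design point is that the team id comes from the authenticated session rather than from request input, so a crafted filter such as `{ teamId: "team-b" }` cannot widen the scope.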
Key Engineering Decisions
I chose a dual-database architecture (PostgreSQL + MongoDB) over a single database because we needed ACID for incidents and workflows and a flexible store for high-volume event logs. I implemented on-demand SLA calculations over scheduled jobs to avoid paid workers on the free tier, trading some efficiency for cost. I used WebSockets for real-time updates without Redis pub-sub so we could ship with one server and add scale later. I selected Prisma for type safety and migrations over raw SQL. I deferred automated incident routing and advanced analytics to focus on core operations management.
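The single-server real-time decision can be pictured as an in-process, per-team fan-out that a WebSocket gateway would sit on top of. The sketch below is an assumption about the general shape, not the project's code: `TeamBroadcaster`, `IncidentEvent`, and the listener callbacks are hypothetical, and plain functions stand in for socket connections.

```typescript
// Hypothetical event shape; the real payloads are not described in the write-up.
interface IncidentEvent {
  teamId: string;
  incidentId: string;
  type: "created" | "updated" | "resolved";
}

type Listener = (event: IncidentEvent) => void;

// Keeps one "room" of listeners per team inside a single process.
// With one server there is no need for Redis pub-sub between instances.
class TeamBroadcaster {
  private readonly rooms = new Map<string, Set<Listener>>();

  // Registers a listener and returns a function that removes it again.
  subscribe(teamId: string, listener: Listener): () => void {
    const room = this.rooms.get(teamId) ?? new Set<Listener>();
    this.rooms.set(teamId, room);
    room.add(listener);
    return () => {
      room.delete(listener);
      // Only drop the room if it is still the one registered for this team.
      if (room.size === 0 && this.rooms.get(teamId) === room) {
        this.rooms.delete(teamId);
      }
    };
  }

  // Delivers the event to listeners of the event's team only.
  // Returns how many listeners received it.
  publish(event: IncidentEvent): number {
    const room = this.rooms.get(event.teamId);
    if (!room) return 0;
    room.forEach((listener) => listener(event));
    return room.size;
  }
}
```

Scaling past one server is where this breaks down: listeners connected to another instance would never see the event, which is the gap a shared pub-sub layer would fill later.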
Failure Modes & Resilience
- PostgreSQL or MongoDB down: the app fails clearly for affected operations; WebSocket and API are isolated so a DB blip does not take down the whole process.
- WebSocket disconnect: clients reconnect and refetch so they do not miss updates.
- Long-running report or export: request timeouts and async patterns prevent one heavy operation from blocking others.
- Unauthorized access: RBAC and team-scoped queries enforce isolation; audit trails record who did what.
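For the unauthorized-access case, a combined role and team check with an audit entry could be sketched as below. Everything here is assumed for illustration: the role names, the `ROLE_PERMISSIONS` matrix, the `authorize` function, and the choice to log denied attempts as well as allowed ones are not taken from the project. An array stands in for the audit store.

```typescript
// Hypothetical roles and actions; the project's real permission model is not listed.
type Role = "admin" | "responder" | "viewer";
type Action = "incident:read" | "incident:update" | "team:manage";

const ROLE_PERMISSIONS: Record<Role, ReadonlyArray<Action>> = {
  admin: ["incident:read", "incident:update", "team:manage"],
  responder: ["incident:read", "incident:update"],
  viewer: ["incident:read"],
};

interface Actor {
  userId: string;
  teamId: string;
  role: Role;
}

interface AuditEntry {
  userId: string;
  action: Action;
  resourceTeamId: string;
  allowed: boolean;
  at: Date;
}

// Allows an action only if the actor belongs to the resource's team AND the
// actor's role grants the action. Every decision is appended to the audit log
// (assumption: denials are recorded too, so probing attempts are visible).
function authorize(
  actor: Actor,
  action: Action,
  resourceTeamId: string,
  auditLog: AuditEntry[],
  now: Date = new Date()
): boolean {
  const sameTeam = actor.teamId === resourceTeamId;
  const roleAllows = ROLE_PERMISSIONS[actor.role].indexOf(action) !== -1;
  const allowed = sameTeam && roleAllows;
  auditLog.push({ userId: actor.userId, action, resourceTeamId, allowed, at: now });
  return allowed;
}
```

Checking team membership separately from the role means an admin of one team still has no rights over another team's resources.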
Outcome & Future Potential
Roadmap & Expansion
Vision includes scaling to 1,000+ teams through PostgreSQL read replicas, Redis pub-sub for WebSocket message distribution, and microservices extraction for analytics and notifications. Planned containerization with Docker and Kubernetes for orchestration. Advanced features include AI-powered incident prediction, automated response workflows, integration with PagerDuty/Slack, and comprehensive operational dashboards. Enterprise features include SSO integration, advanced permissions, and white-label customization.
- 50+ API endpoints: RESTful API with comprehensive features
- 2 databases: PostgreSQL + MongoDB architecture
- WebSockets: Live incident activity without polling
- RBAC + Teams: Role and team-based permissions
