DeepSee delivers an open and flexible agentic platform to accelerate AI adoption for financial services in front-, middle-, and back-office operations. Our cloud-based platform integrates with existing bank architectures, whether a bank is just starting its AI transformation journey or looking to enhance existing in-house capabilities with agentic AI solutions. With DeepSee’s pre-trained and pre-configured agents, banking and capital markets firms can automate and orchestrate manual, repetitive tasks, freeing domain experts for strategic work, reducing risk, and streamlining operations.
We are looking for a Senior Director of Platform Engineering to lead our backend, frontend, infrastructure, and MLOps/DevOps/CI/CD teams. You’ll scale our Kubernetes platform across AKS, EKS, and on-prem, ensure high availability and performance, and evolve our agentic AI and MCP-based integrations for bank-grade reliability. You’ll partner closely with the Chief Architect and our Product team to deliver a secure, observable, auditable platform for regulated clients.
Job Responsibilities:
- Own and drive the platform roadmap and strategy for multi-cloud/on-prem Kubernetes (AKS, EKS, vanilla K8s), compute, data, networking, ML serving, and high availability/performance.
- Lead, build, and develop multiple teams (Backend, Frontend, Infrastructure, MLOps/DevOps), including leadership development, career ladders, and operational rhythms.
- Scale Kubernetes reliably: capacity planning, autoscaling (HPA/VPA/Cluster Autoscaler/KEDA), and cost controls for mixed CPU/GPU workloads.
- Advance and mature GitOps, IaC, and observability practices (Argo CD, Terraform, Helm, OpenTelemetry, Datadog, Prometheus), including rollout strategies, standardization, monitoring, incident response, and post-mortems.
- Advance MLOps for LLMs/SLMs/ML/DL (KServe, MLflow pipelines, model governance, inference patterns, GPU scheduling, canary rollouts).
- Evolve and operate eventing and stateful architecture at scale (Kafka/ZooKeeper/KRaft, Postgres, S3/Blob, protobuf, schema evolution/versioning, resilient data planes).
- Contribute hands-on through coding, code reviews, and debugging distributed systems.
- Partner closely with the Chief Architect, Principal AI, Product, and other leads to deliver secure, observable, auditable solutions for regulated banking clients, supporting agentic AI and workflow automation.
Must Haves:
- Significant leadership experience: 10 years on distributed platforms and 5 years leading multi-disciplinary platform teams.
- Deep, hands-on Kubernetes expertise (networking, security, tenancy, upgrades; AKS/EKS operations).
- Proven hands-on expertise with GitOps, IaC, change management, rollout safety, and production observability (Argo CD, Terraform, Helm, OpenTelemetry, Datadog/Prometheus, SLOs/on-call).
- Advanced MLOps experience (KServe, MLflow, model registry/governance, GPU scheduling, cost tuning, canary and other safe rollout practices).
- Experience designing and operating event streaming, stateful data, and resilient architecture at scale (Kafka/ZooKeeper/KRaft, Postgres, S3/Blob, protobuf, schema/versioning).
- Deep proficiency in core languages (Java, Python, Go) and cloud SDKs, plus strong architectural communication with executive-level stakeholders and clients.
- Regulated FinServ experience (SOC 2/ISO 27001, SR 11-7, SEC/FINRA, model governance, OpenTelemetry, trace-driven performance analysis, KServe ModelMesh or similar tools).
Nice to Haves:
- Hands-on skills with most listed technologies: Kubernetes (vanilla, AKS, EKS), Docker, Argo CD, Helm, Terraform, Kafka/ZooKeeper/KRaft, KServe, MLflow, OpenTelemetry, Datadog, Prometheus, protobuf, HPA, VPA, Karpenter or Cluster Autoscaler, LightRAG, Milvus, Postgres, S3/Blob, Redis, Airflow/dbt, Java, Python, Go.
- Experience working alongside a variety of engineering leaders and principal engineers (Chief Architect, CISO, Principal Knowledge Graph Engineer, AI Engineer, Lead BE, Principal FE, Product).
- Platform-as-a-product advocacy, a developer-experience focus, and familiarity with CNCF platform engineering guidance.
Finally, it is important that you align with our Stuff That Matters.
- Knowledge Over Noise: We prioritize actionable insights.
- One Team, One Dream: We collaborate seamlessly across functions.
- Be a Seeker: We constantly pursue innovation and learning.
- Stay Human: We keep our solutions people-centric.
- Act Boldly: We take calculated risks to drive progress.
- Believe: We’re passionate about our mission.
- Own It: We take responsibility for our work and its impact.
Why DeepSee.ai?
- Competitive compensation package including equity, with remote work options.
- 100% company-paid premiums on health, dental, and vision insurance.
- Opportunity to work on cutting-edge AI technology with real impact.
- Collaborative and innovative work environment.
Join us in shaping the future of AI-powered automation and make a significant impact in a rapidly growing startup. If you’re a hands-on problem solver who thrives in fast-paced environments and is excited about leveraging AI to solve complex problems, we want to hear from you!