Technical Specification

Porpoise AI Training Pipeline Platform

No-code model fine-tuning with interactive knowledge capture

Technical Summary

Porpoise democratizes AI model training through three intuitive interfaces: an AI Interviewer with video avatars, a visual workflow builder, and a quick-train wizard. Organizations can fine-tune domain-specific models without data science expertise, reducing time-to-model from weeks to hours.

Core Capabilities

  • AI Interviewer: Interactive video avatars conduct knowledge capture interviews with 67% completion rate
  • Multi-Channel Invitations: Reach subject matter experts via Slack, Teams, Email, SMS, and audio calls
  • LoRA/QLoRA Fine-Tuning: Parameter-efficient training for 7B-70B models
  • Multi-Cloud Optimization: Automatic GPU selection across AWS, GCP, Azure for 40% cost savings
  • No-Code Workflows: Visual pipeline builder and 4-step wizard for business users
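The LoRA/QLoRA capability above rests on one idea: instead of updating all entries of a frozen weight matrix W, train two small low-rank factors B and A and apply W' = W + (alpha / r) * B @ A. A minimal pure-Python sketch of that arithmetic (matrix sizes are illustrative, not Porpoise defaults):

```python
# Illustration of the LoRA update rule: W' = W + (alpha / r) * B @ A.
# LoRA trains only B (d x r) and A (r x k), so trainable parameters
# drop from d*k to r*(d + k). All sizes here are toy examples.

def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d, k, r = 4, 4, 1                 # frozen weight is d x k, adapter rank r
W = [[0.0] * k for _ in range(d)]
B = [[1.0] for _ in range(d)]     # d x r
A = [[0.5, 0.5, 0.5, 0.5]]        # r x k
merged = lora_merge(W, A, B, alpha=2, r=r)
# Trainable params: r*(d + k) = 8, versus d*k = 16 for full fine-tuning.
```

At 7B-70B scale the same ratio is why adapter training fits on commodity GPUs: the frozen base weights are never updated, only the tiny factors.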

Training Efficiency

Model size      | Method           | Training cost | Time
7B parameters   | LoRA (3 epochs)  | $0.75         | 1-2 hours
13B parameters  | QLoRA (5 epochs) | $5.10         | 3-5 hours
70B parameters  | LoRA (3 epochs)  | $25.20        | 8-12 hours
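Costs in the table above reduce to GPU-hours times the spot hourly rate. A hedged sketch of that arithmetic (the rate and job shape below are hypothetical placeholders, not quoted prices):

```python
# Rough training-cost arithmetic behind tables like the one above:
# cost = spot hourly rate * number of GPUs * wall-clock hours.
# The rate and duration here are hypothetical examples.

def estimate_cost(hourly_rate_usd, num_gpus, hours):
    """Return estimated job cost in USD, rounded to cents."""
    return round(hourly_rate_usd * num_gpus * hours, 2)

# e.g. a small LoRA job on one spot GPU at $0.50/hr for 1.5 hours:
print(estimate_cost(0.50, 1, 1.5))   # 0.75
```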

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│               PORPOISE TRAINING PLATFORM                │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
│  │    AI    │  │  Visual  │  │  Quick   │             │
│  │Interview │  │ Workflow │  │  Wizard  │             │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘             │
│       │             │             │                     │
│       └─────────────┼─────────────┘                     │
│                     │                                   │
│  ┌──────────┐  ┌────▼─────┐  ┌──────────┐             │
│  │  Data    │  │ Training │  │  Multi   │             │
│  │ Pipeline │  │   Job    │  │  Cloud   │             │
│  └──────────┘  └──────────┘  └──────────┘             │
│                                                         │
└─────────────────────────────────────────────────────────┘
                       ▲
                       │
        ┌──────────────┴──────────────┐
        │    INTEGRATIONS             │
        ├─────────────────────────────┤
        │ • HeyGen (Avatars)          │
        │ • Twilio (SMS/Voice)        │
        │ • Slack/Teams (Echo)        │
        │ • MLflow (Tracking)         │
        └─────────────────────────────┘

Core Components

AI Interviewer

Interactive knowledge capture via video avatars

  • HeyGen avatar integration
  • Echo RAG pre-training
  • Natural conversation flow
  • 67% completion rate

Multi-Channel Invites

Reach experts through preferred channels

  • Slack/Teams via Echo
  • Email with calendar sync
  • SMS via Twilio
  • Audio calls (Phase 2)

Visual Workflow Builder

Drag-and-drop training pipeline configuration

  • Node-based interface
  • Data preprocessing
  • Template sharing
  • Real-time validation
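The core check behind real-time validation in a node-based builder is that the connected pipeline is a DAG, so every run has a defined execution order. A minimal sketch using Kahn's algorithm (node names are hypothetical examples, not Porpoise node types):

```python
# Minimal sketch of pipeline-graph validation: compute a topological
# execution order, or reject the graph if it contains a cycle.
# Node names are hypothetical examples.

def topo_order(nodes, edges):
    """Kahn's algorithm: return an execution order, or None on a cycle."""
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for src, dst in edges:
            if src == n:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order if len(order) == len(nodes) else None

nodes = ["upload", "clean", "tokenize", "train"]
edges = [("upload", "clean"), ("clean", "tokenize"), ("tokenize", "train")]
print(topo_order(nodes, edges))            # ['upload', 'clean', 'tokenize', 'train']
print(topo_order(["a", "b"], [("a", "b"), ("b", "a")]))   # None: cycle rejected
```

Running this on every edit is what lets the builder flag an invalid pipeline before the user hits "train" rather than at job submission.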

Quick Train Wizard

4-step model training for business users

  • Upload CSV/JSON data
  • Model recommendations
  • Auto-tuned parameters
  • 27-minute median time
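A 4-step wizard like this is essentially a linear state machine: each step must complete before the next unlocks. A minimal sketch (step names and ordering are hypothetical, inferred from the list above):

```python
# Sketch of a linear 4-step wizard: each step unlocks only after the
# previous ones complete. Step names are hypothetical examples.

STEPS = ["upload_data", "pick_model", "review_params", "launch"]

def next_step(completed):
    """Return the first step not yet completed, or None when done."""
    for step in STEPS:
        if step not in completed:
            return step
    return None

done = set()
step = next_step(done)
while step is not None:
    # ...run the step's own validation here before marking it done...
    done.add(step)
    step = next_step(done)
print(len(done))   # 4: every wizard step completed
```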

Training Orchestration

Multi-cloud GPU job scheduling

  • Spot pricing optimization
  • Queue management
  • Auto-scaling GPUs
  • Cost estimation
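Spot-pricing optimization across clouds comes down to picking the cheapest offer that satisfies the job's constraints, chiefly VRAM. A hedged sketch (the GPU specs and prices below are hypothetical, not live quotes):

```python
# Sketch of multi-cloud spot selection: choose the cheapest GPU offer
# that meets the job's VRAM requirement. All specs and prices below
# are hypothetical placeholders, not live market data.

OFFERS = [
    {"cloud": "aws",   "gpu": "A10G", "vram_gb": 24, "spot_usd_hr": 0.55},
    {"cloud": "gcp",   "gpu": "L4",   "vram_gb": 24, "spot_usd_hr": 0.45},
    {"cloud": "azure", "gpu": "A100", "vram_gb": 80, "spot_usd_hr": 1.90},
]

def cheapest_offer(vram_needed_gb):
    """Return the lowest-priced offer with enough VRAM, or None."""
    viable = [o for o in OFFERS if o["vram_gb"] >= vram_needed_gb]
    return min(viable, key=lambda o: o["spot_usd_hr"]) if viable else None

print(cheapest_offer(24)["cloud"])   # gcp  (cheapest 24 GB option)
print(cheapest_offer(48)["gpu"])     # A100 (only offer with >= 48 GB)
```

In a real scheduler the same selection would also weigh queue depth, preemption risk, and data locality, but price-under-constraint is the core of the cost savings.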

Experiment Tracking

Complete training lifecycle management

  • MLflow integration
  • Hyperparameter logging
  • Model comparison
  • Artifact versioning
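The model-comparison step an experiment tracker enables is simple once hyperparameters and metrics are logged per run: rank runs by an eval metric and pick the best. A minimal sketch in plain Python (the run records below are fabricated for illustration; in the platform they would come from MLflow's run store):

```python
# Sketch of experiment comparison: given logged runs (hyperparameters
# plus an eval metric), select the best one. Run data is fabricated
# for illustration only.

runs = [
    {"run_id": "r1", "lr": 2e-4, "rank": 8,  "eval_loss": 1.42},
    {"run_id": "r2", "lr": 1e-4, "rank": 16, "eval_loss": 1.31},
    {"run_id": "r3", "lr": 5e-5, "rank": 16, "eval_loss": 1.38},
]

def best_run(runs, metric="eval_loss"):
    """Return the run with the lowest value of `metric`."""
    return min(runs, key=lambda r: r[metric])

print(best_run(runs)["run_id"])   # r2
```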

Technology Stack

Training

  • PyTorch 2.0+
  • Hugging Face PEFT
  • LoRA/QLoRA adapters
  • DeepSpeed optimization
  • bitsandbytes quantization
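For the QLoRA path, the bitsandbytes quantization listed above is typically wired in through a Hugging Face `BitsAndBytesConfig`. A hedged configuration sketch; the specific values (NF4, bf16 compute, double quantization) are common community defaults, not confirmed Porpoise settings:

```python
# Hedged sketch of a 4-bit QLoRA quantization config using the
# transformers / bitsandbytes APIs named in this stack. The chosen
# values are common defaults, not confirmed Porpoise settings.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quant constants
)
# Passed to model loading, e.g.:
# AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
```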

Infrastructure

  • Kubernetes orchestration
  • AWS/GCP/Azure GPUs
  • Ray for distributed training
  • MLflow tracking server
  • S3/GCS storage

Integrations

  • HeyGen API
  • Twilio SDK
  • Echo RAG platform
  • Slack/Teams OAuth
  • Blue Whale deployment