SkyPortal Documentation

The AI-native platform that simplifies how teams build, train, and manage machine learning models.


The AI Agent Platform

The AI agent that sets up and runs your ML jobs across any cloud

Any AI use case accelerated 10X


What is SkyPortal?

SkyPortal unifies your entire ML workflow in one platform. Instead of juggling separate tools such as Weights & Biases, GitHub, cloud consoles, and observability platforms, you work in a single place that brings them together.

  • Multi-Cloud Management


    Connect to every host from any cloud provider. Track server health, job runtime, and experiment progress with convenient tagging and a single terminal interface.

  • Job Orchestration


    Launch and manage training jobs on cloud or on-prem GPUs, with automatic scheduling and resource optimization.

  • Real-Time Observability


    Monitor accuracy, loss, epochs, GPU usage, and budget spend in real time with comprehensive dashboards.

  • Team Collaboration


    Centralized logging, experiment tracking, and reproducibility for your entire team.

  • AI Agents


    Smart copilots that debug, optimize, and automate workflows for ML engineers and their teams.

  • ML-Specific Code Editor


    Lint code locally or on any host. Diff files across repo, host, and local environments.


Turn One ML Engineer into Ten

ML teams today juggle separate environments, repos, and results, which makes collaboration slow and error-prone. SkyPortal gives you an immediate 10X improvement in productivity and scale.

  • 10X Productivity Improvement
  • 10X Scale
  • 100% Reproducibility

Key Features

Experiment Tracking

Automatically log hyperparameters, code versions, datasets, and metrics for complete reproducibility.

Usage Monitoring

Track GPU hours, storage consumption, and job status across all your infrastructure.

Budget Control

Early stopping and overage alerts prevent runaway costs and optimize resource usage.
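
As a rough illustration of the two ideas named above, the sketch below shows a patience-based early-stopping check and a simple budget guard. It is not SkyPortal's API: `BudgetGuard`, `should_stop_early`, the flat hourly GPU rate, and the 80% alert threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BudgetGuard:
    """Hypothetical helper for illustration; not part of SkyPortal."""

    budget_usd: float             # hard spending cap for the job
    hourly_rate_usd: float        # assumed flat GPU price per hour
    alert_fraction: float = 0.8   # warn once this share of the budget is spent

    def spend(self, gpu_hours: float) -> float:
        """Cost so far under the flat-rate assumption."""
        return gpu_hours * self.hourly_rate_usd

    def status(self, gpu_hours: float) -> str:
        """Return "ok", "alert" (near the cap), or "stop" (cap reached)."""
        spent = self.spend(gpu_hours)
        if spent >= self.budget_usd:
            return "stop"
        if spent >= self.alert_fraction * self.budget_usd:
            return "alert"
        return "ok"


def should_stop_early(
    val_losses: Sequence[float], patience: int = 3, min_delta: float = 0.0
) -> bool:
    """True when the last `patience` epochs did not improve on the best
    earlier validation loss by more than `min_delta`."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best >= best_before - min_delta


if __name__ == "__main__":
    guard = BudgetGuard(budget_usd=100.0, hourly_rate_usd=2.5)
    losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71]
    print(guard.status(gpu_hours=36), should_stop_early(losses))
```

A training loop would call both checks once per epoch and end the job when either one says to stop, which is how a stalled run is kept from consuming the rest of the budget.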

Multi-Cloud Training

Run training jobs across AWS, GCP, Azure, and independent GPU providers seamlessly.

Advanced Metrics

MAE, MSE, accuracy, throughput, and infrastructure usage, all in one dashboard.
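
For readers unfamiliar with the abbreviations, the sketch below computes mean absolute error (MAE), mean squared error (MSE), and accuracy from their standard definitions. The function names are illustrative and say nothing about how SkyPortal computes or stores these metrics.

```python
from typing import Sequence


def _check_lengths(a: Sequence, b: Sequence) -> None:
    """Both inputs must be non-empty and the same length."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: average of |target - prediction|."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MSE: average of (target - prediction) squared."""
    _check_lengths(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that exactly match their label."""
    _check_lengths(labels, predictions)
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


if __name__ == "__main__":
    y_true = [3.0, 5.0, 2.0, 7.0]
    y_pred = [2.0, 5.0, 4.0, 8.0]
    print(mean_absolute_error(y_true, y_pred))   # 1.0
    print(mean_squared_error(y_true, y_pred))    # 1.5
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

MSE penalizes large errors more heavily than MAE because each error is squared, so the two are usually read side by side.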

Enterprise Ready

Private cloud deployment, role-based access control, and compliance features.


Who Uses SkyPortal?

  • ML Engineers

    Manage multiple cloud hosts in one place with unified observability and control.

  • Data Scientists

    Seamless training and easy experiment comparison without infrastructure headaches.

  • Product Managers

    Visibility into training progress, costs, and performance metrics in real time.

  • Engineering Leaders

    Cost transparency, reproducibility, and compliance for AI initiatives at scale.


Integration Ecosystem

SkyPortal works with your existing tools:

  • ML Frameworks: PyTorch, TensorFlow, Hugging Face
  • Cloud Providers: AWS, GCP, Azure, RunPod, Vast.ai
  • Data Storage: S3, MinIO
  • Version Control: GitHub
  • Experiment Tracking: Weights & Biases
  • Infrastructure: Kubernetes

Get Started

  • Get Started with SkyPortal


    Visit our main website, SkyPortal.ai, to sign up and get started.

  • User Guide


    Comprehensive guides on using all SkyPortal features

  • API Reference


    Complete API documentation for programmatic access

  • Release Notes


    Latest features, improvements, and bug fixes


Ready to accelerate your ML workflow?