Section 3 — Launch Your First ML Job
SkyPortal simplifies training job launches across any connected infrastructure.
Steps to Launch
- Select a Host
  Choose from your connected hosts or clusters.
- Choose Environment
  Pick the appropriate Python environment and container if needed.
- Define the Job
  - Training script
  - Dataset location
  - Hyperparameters
  - Resource requirements (GPUs, CPUs)
- Run
  Click Start Job or run the equivalent CLI/agent command.
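To make the four steps concrete, here is a minimal sketch of how the information collected before launch could be gathered into one validated record. The `JobSpec` class, its field names, and the example values are hypothetical and chosen for illustration only; they are not SkyPortal's actual job schema or CLI format.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional
import json


@dataclass
class JobSpec:
    # Hypothetical schema for illustration; not SkyPortal's actual job format.
    host: str                     # step 1: connected host or cluster
    environment: str              # step 2: Python environment name
    script: str                   # step 3: training script
    dataset: str                  # step 3: dataset location
    hyperparameters: dict = field(default_factory=dict)  # step 3
    gpus: int = 0                 # step 3: resource requirements
    cpus: int = 1
    container: Optional[str] = None  # step 2: optional container image

    def validate(self) -> None:
        """Raise ValueError if a required field is empty or a resource count is invalid."""
        for name in ("host", "environment", "script", "dataset"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        if self.gpus < 0:
            raise ValueError("gpus must be >= 0")
        if self.cpus < 1:
            raise ValueError("cpus must be >= 1")

    def to_json(self) -> str:
        """Serialize the spec with sorted keys so identical specs compare equal as text."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values are placeholders, not real hosts or buckets.
spec = JobSpec(
    host="gpu-node-1",
    environment="py311-torch",
    script="train.py",
    dataset="s3://example-bucket/train/",
    hyperparameters={"lr": 1e-3, "epochs": 5},
    gpus=1,
    cpus=4,
)
spec.validate()
print(spec.to_json())
```

Validating the record before step 4 catches an empty host or a negative GPU count locally, before anything is scheduled.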
What Happens Next
- The job is scheduled on your chosen host
- Logs begin streaming live
- Metrics such as loss, accuracy, and resource usage begin populating dashboards
- Budget tracking starts automatically
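As a rough illustration of what log streaming and budget tracking involve, the sketch below extracts numeric metrics from a log line and estimates spend from elapsed time. It assumes a `key=value` log format and a flat per-GPU hourly rate; both are assumptions made for this example, and the function names are hypothetical rather than part of SkyPortal.

```python
import re

# Matches pairs such as "loss=0.52" or "lr=1e-3" (assumed log format).
_PAIR = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_metrics(line: str) -> dict:
    """Return every numeric key=value pair found in one log line as floats."""
    return {key: float(value) for key, value in _PAIR.findall(line)}


def estimated_cost(elapsed_seconds: float, gpus: int, usd_per_gpu_hour: float) -> float:
    """Estimate spend so far, assuming a flat hourly rate per GPU."""
    return elapsed_seconds / 3600 * gpus * usd_per_gpu_hour


metrics = parse_metrics("step=10 loss=0.52 accuracy=0.81")
cost = estimated_cost(elapsed_seconds=1800, gpus=2, usd_per_gpu_hour=3.0)
print(metrics, cost)
```

Real pricing usually depends on instance type and billing granularity, so treat the cost function as a back-of-the-envelope model only.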
Best Practices
- Attach datasets via supported storage backends (e.g., S3, MinIO)
- Track experiments for reproducibility
- Use tagging to organize runs
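The practices above can be sketched as a small run record that stores the dataset location, a fingerprint of the hyperparameters, and a normalized tag list. The helper names and the record layout are hypothetical; only the `s3://bucket/key` URI form is assumed from the storage backends mentioned above.

```python
import hashlib
import json
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def config_fingerprint(config: dict) -> str:
    """Short, order-independent hash of a config, used to spot identical runs."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def make_run_record(dataset_uri: str, config: dict, tags: list) -> dict:
    """Bundle what is needed to find and reproduce a run later."""
    bucket, key = split_s3_uri(dataset_uri)
    return {
        "dataset": {"bucket": bucket, "key": key},
        "config": config,
        "config_id": config_fingerprint(config),
        "tags": sorted(set(tags)),  # deduplicate and sort for stable filtering
    }


# Placeholder bucket, config, and tags for illustration.
record = make_run_record(
    "s3://example-bucket/data/train.parquet",
    {"lr": 0.001, "epochs": 5},
    ["baseline", "resnet", "baseline"],
)
print(record["config_id"], record["tags"])
```

Because the fingerprint is computed from a canonical, key-sorted JSON form, two runs with the same hyperparameters get the same `config_id` regardless of the order in which the values were written.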