Build and scale any AI workload on your own infra – document parsing, LLM finetuning, video embeddings, or serving a 70B Llama. Production in hours, not months.
Transform your code, models, and data sources into AI pipelines via simple configuration
Write the pipeline with your I/O data sources and Axolotl config. A100s spin up and run training in your infra automatically.
All the infra complexity, from GPU driver version mismatches to disk creation and firewall ports, is taken care of by the platform.
name = "qlora_finetuning"description = "Finetuning Llama-7B with Axolotl"job = { task_name="axolotl", profile="node_a100", mode="train" params = { config_text="config/lora8b-instruct.yml", },}input = 'gs://bucket/input'output = 'gs://bucket.output'Add the LLM models, notebook code and I/O into the pipeline. The VM with SGLANG run the models and code automatically.
You don't have to deal with SGLang installation, port access, or other setup details. Cut your turnaround time significantly.
```toml
name = 'generate_qa_llm'
description = 'Generate data with GPT OSS 20B'
cmd = 'python augmented.ipynb'
startup = [{ name = "sglang", model = "openai/gpt-oss-20b" }]
input = 'gs://bucket/input'
output = 'gs://bucket/output'
```
Write an inference pipeline by defining your models, job, and compute resources. A GPU Kubernetes node is created and runs your inference.
No vLLM installation, no manual Kubernetes creation. Everything is done automatically.
name = "host_llm_inference"description = "Inferencing custom Llama-70B"job = { task_name="vllm", params = { model="llama-70b-finetuned", profile="medium_gpu" }}Write pipeline with your I/O, code and docker image. The instance will be provisioned and run your code inside the docker automatically.
No Docker setup, no volume mounting, no GCS integration. Everything is done by the platform.
```toml
name = 'extract_pdf'
description = 'Extract PDF to Markdown with Docling'
image = 'docling:latest'
cmd = "chmod +x run.sh && ./run.sh"
input = 'gs://bucket/input'
output = 'gs://bucket/output'
compute = 'single_a100'
```
Each pipeline supports parallel and chained dependencies across dedicated or shared compute resources, as sketched below.
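As a rough illustration of chaining, a downstream stage could consume the Docling output above and declare its dependency explicitly. The `depends_on` key and the stage name here are hypothetical, for illustration only; consult the platform docs for the actual syntax:

```toml
# Hypothetical sketch of a chained stage (keys assumed, not from the docs).
name = 'generate_embeddings'
description = 'Embed the Markdown produced by extract_pdf'
depends_on = ['extract_pdf']     # assumed: run only after extract_pdf finishes
input = 'gs://bucket/output'     # consumes extract_pdf's output location
output = 'gs://bucket/embeddings'
compute = 'single_a100'          # dedicated, or shared with other stages
```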
Automate provisioning, from a single VM to multi-regional K8s clusters, through a single YAML file.
Spin up a development server for AI workload development inside your organization's private network. Securing access via IAP protects internal data while you work from the public internet.
Features
- Real-time online editor collaboration
- Pipeline building with high-compute resources
- Produce test environments
```sh
dax project deploy
gcloud compute ssh deploy --tunnel-through-iap
```
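Since the server is reached only through the IAP tunnel, local port forwarding can expose an editor or notebook running on it without opening anything to the internet. The port number below (8888) is just an example, not a platform default:

```sh
# Forward a hypothetical notebook/editor port over the IAP tunnel;
# everything after "--" is passed straight to ssh.
gcloud compute ssh deploy --tunnel-through-iap -- -L 8888:localhost:8888
```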
Designed for repeatable deployment across teams and environments, translating complex compute topologies into YAML-based components. It simplifies complex infrastructure for LLM and data science pipelines with consistency and precision.
Features
- VM and cluster support
- Spot / Preemptible options for cost savings
- Configuration overrides for more advanced usage
```yaml
gcp_vm_g2_16:
  machineType: g2-standard-16
  gpu: 1
  osImage: projects/cos-cloud/global/images/family/cos-121-lts
  preemptible: "true"
  provisioningModel: SPOT
  imageSize: 50
  bootSize: 30
  alternativeZones:
    - us-east1-b
    - us-central1-b
```
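A pipeline would then select this profile by name, in the same way the Docling example above uses `compute = 'single_a100'`. The exact pairing shown here is an assumption about how profile keys are resolved, not documented behavior:

```toml
# Assumed usage: reference the compute profile defined above by its key.
name = 'extract_pdf_spot'
description = 'Same Docling job, on the cheaper g2 spot profile'
image = 'docling:latest'
cmd = "chmod +x run.sh && ./run.sh"
compute = 'gcp_vm_g2_16'   # assumed to resolve to the YAML profile above
```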
Fully compatible with existing Kubernetes environments or deployable on demand through DAX. Support for Ray and native Kubernetes jobs provides flexibility for a wide range of workloads. Integrated gang scheduling ensures efficient GPU allocation for high-intensity AI tasks. Operational across GKE, on-premises deployments, and any standard Kubernetes cluster.
Features
- Cloud and on-premises cluster integration.
- Jobs via Ray, AppWrapper and Kubernetes.
- Gang-scheduling for GPU compute.
- More advanced features.
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "ephemeral-storage"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 10000     # Infinite quota.
      - name: "memory"
        nominalQuota: 10000Gi   # Infinite quota.
      - name: "ephemeral-storage"
        nominalQuota: 10000Gi   # Infinite quota.
```
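To route workloads into that queue, the standard Kueue pattern is a namespaced LocalQueue pointing at the ClusterQueue, plus a `kueue.x-k8s.io/queue-name` label on the Job. A minimal sketch of that pattern, with the namespace, names, and image as placeholders rather than platform defaults:

```yaml
# A LocalQueue in the workload namespace, backed by the ClusterQueue above.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: user-queue
  namespace: default
spec:
  clusterQueue: cluster-queue
---
# A plain Kubernetes Job submitted to that queue; Kueue admits it
# (unsuspends it) once quota in the ClusterQueue is available.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  suspend: true               # Kueue flips this to false on admission
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox        # placeholder image
        command: ["sh", "-c", "echo hello from the queue"]
        resources:
          requests:
            cpu: "1"
            memory: 200Mi
```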