Senior Platform Engineer at Pluralis Research (Expired)

AI Jobs Australia

Senior Platform Engineer

Expired

This role has expired and is no longer accepting applications.

Pluralis Research

Sydney NSW

remote

Full Time

Posted 5 months ago

Overview

Pluralis Research is pioneering Protocol Learning, a fully decentralised way to train and deploy AI models that opens this layer to individuals rather than well-resourced corporations. By pooling compute from many participants, incentivising their efforts, and preventing any single party from controlling a model’s full weights, we’re creating a genuinely open, collaborative path to frontier-scale AI.

We’re looking for a Senior Platform Engineer with startup experience, or senior DevOps experience in big tech, and a passion for ML. You’ll help scale and own our systems infrastructure, orchestration, and services integration.

Responsibilities

Multi-Cloud Infrastructure: Design resource management systems that provision and orchestrate compute across AWS, GCP, and Azure using infrastructure-as-code (Pulumi/Terraform). Handle dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes.
Distributed Training Systems: Architect fault-tolerant infrastructure for distributed ML, covering GPU clusters, NVIDIA runtime, S3 checkpointing, large dataset management and streaming, health monitoring, and resilient retry strategies.
Real-World Networking: Build systems that simulate and handle real-world network conditions (bandwidth shaping, latency injection, packet loss) while managing dynamic node churn and ensuring efficient data flow across workers with heterogeneous connectivity. Our training runs on consumer nodes and non-co-located infrastructure, not in a datacenter.

What You’ll Bring

Ideally, you’ll have 5+ years of professional experience, with depth in:

Infrastructure-as-Code: Production Pulumi/Terraform/CloudFormation managing multi-cloud deployments. Lifecycle orchestration, automated provisioning, self-healing systems at scale.
Python Engineering: Idiomatic async Python with error handling, retry logic, concurrent execution. Asyncio, SSH libraries, cloud SDKs, CLI tools.
Container & GPU: Docker, Kubernetes/EKS, GPU workloads, heterogeneous clusters. Multi-GPU optimization, resource scheduling.
Networking: Decentralized topologies and routing, NAT hole punching, P2P multi-address coordination, traffic shaping, real-world bandwidth constraints.
ML Infrastructure: Distributed training workflows, checkpoint management, data sharding, model versioning, long-running job operations.
Observability & SRE: Monitoring systems (Prometheus/Grafana), logging, SLOs, incident response, bottleneck profiling, performance optimization.
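
As a rough illustration of the "async Python with error handling, retry logic, concurrent execution" item above, here is a minimal sketch using only the standard library. It is not Pluralis code: the names retry_async, run_bounded and make_flaky, and all parameter defaults, are hypothetical and chosen purely for illustration.

```python
import asyncio
import random


async def retry_async(op, *, attempts=4, base_delay=0.01, max_delay=1.0):
    """Run the coroutine function `op`, retrying failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))


async def run_bounded(ops, *, limit=8):
    """Run coroutine functions concurrently, at most `limit` at a time.

    Results come back in input order. An operation that still fails after its
    retries is returned as an exception object rather than cancelling the rest.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(op):
        async with sem:
            return await retry_async(op)

    return await asyncio.gather(*(guarded(op) for op in ops), return_exceptions=True)


def make_flaky(name, failures):
    """Hypothetical stand-in for a node operation that fails `failures` times first."""
    state = {"left": failures}

    async def op():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError(f"{name}: transient failure")
        return f"{name}: ok"

    return op


if __name__ == "__main__":
    ops = [make_flaky("node-a", 0), make_flaky("node-b", 2)]
    print(asyncio.run(run_bounded(ops, limit=2)))
```

The semaphore caps how many operations are in flight at once, and the jittered backoff keeps many workers from retrying in lockstep after a shared outage. A production version would likely catch only known-transient exception types instead of every Exception.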

What we’re looking for

Experience in a startup environment with an emphasis on microservices orchestration, or a big tech background
Deep understanding of multi-cloud infra & distributed training systems
A team player with high attention to detail
A strong passion to work at the intersection of AI and decentralized systems

FYIs

We only hire in Australia and the United States. Visa sponsorship is limited to these countries.
Applicants must have professional-level English proficiency (written and spoken).
Pluralis is a remote team across Australia and the US. You’ll need to be comfortable working across timezones and collaborating with a diverse, distributed group.
Recruiters: we aren’t looking for agency support at this time. We’ll reach out if we need help.

Backed by Union Square Ventures and other tier-1 investors, we’re a world-class, deeply technical team of ML researchers. Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible approach to preventing a handful of massive corporations from monopolising model development, access and release, and from achieving massive economic capture. If this resonates, please apply.