DevOps Lead

This role has expired and is no longer accepting applications.
Seeing Machines
Canberra, ACT
Full Time

Posted 3 months ago

Main Purpose of Role

The DevOps Lead ensures the reliability, scalability, and performance of critical systems and services. This role bridges development and operations, fostering a culture of automation, resilience, and continuous improvement. The DevOps Lead manages a team of site reliability engineers (SREs), applying best practices, managing incidents, and driving operational excellence.

Qualifications, Skills and Experience: 

  • Bachelor’s degree in Computer Science, Engineering, or related field

  • Proven experience in SRE or DevOps leadership roles

  • Strong knowledge of:

    • Cloud platforms (AWS, Azure, GCP)

    • Container orchestration (Kubernetes, Docker)

    • Infrastructure automation (Terraform, Ansible, Jenkins, Lava)

  • Expertise in programming languages (Python, Java)

  • Proficiency with source control systems (GitHub Enterprise)

  • Familiarity with monitoring tools: Prometheus, Grafana, PRTG

  • Excellent communication and stakeholder management skills

  • Experience with distributed systems and high-availability architectures

  • Knowledge of security and compliance frameworks (ISO 27001, SOC 2)

  • Certifications in cloud technologies or ITIL

  • Experience with Agile, Scrum, and Atlassian Jira

  • Familiarity with Google Cloud AI & ML services, including:

    • Vertex AI (end-to-end ML platform)

    • AutoML (custom model training)

    • BigQuery ML (machine learning in SQL)

    • Cloud AI APIs (Vision, Natural Language, Translation)

    • TensorFlow on Google Cloud

Other Attributes

  • Strategic thinker with strong problem-solving skills

  • Ability to thrive in a fast-paced, evolving environment

  • Collaborative and empathetic leadership style

Key Elements & Activities of the Role

Leadership & Team Development

  • Be a hands-on leader who connects with direct reports, peers, and partners both operationally and strategically

  • Provide technical leadership and coaching, maintaining credibility in systems engineering, tools, and DevOps

  • Promote a culture of learning, collaboration, and continuous improvement through Agile and Scrum

  • Ensure the team has development pathways, meaningful objectives, and KPIs aligned to a clear technology roadmap

Reliability & Performance

  • Manage, optimise, and deliver Systems, DevOps, and MLOps as a service to internal stakeholders

  • Define, publish, and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs)

  • Oversee incident response, service request fulfilment, change management, optimisation backlogs, and post‑implementation/incident reviews

  • Deliver efficiencies through problem management, release management, and continuous improvement

  • Leverage Google Cloud AI and other tools for predictive analytics and anomaly detection

  • Focus on consumption and cost-to-serve via demand shaping, capacity planning, and environment governance

Automation & Efficiency

  • Develop and propagate frameworks, pipelines, and system engineering templates across platforms

  • Evangelise engineering practices, microservices, CI/CD, infrastructure-as-code, and security-by-design

  • Partner with Technology, Delivery, and Support teams to ensure alignment between software development and platform engineering

  • Drive automation initiatives to promote self-help and self-enablement, reducing manual effort

Cross-Functional Collaboration

  • Build strong relationships with stakeholders across Technology, Engineering, Architecture, and Seeing Machines support services

  • Work closely with development teams to design scalable and resilient systems

  • Align priorities across engineering, product, and operations teams

Governance & Compliance

  • Influence architecture and governance standards to balance innovation, scalability, and compliance

  • Establish cloud governance policies, access controls, and compliance standards

  • Ensure systems are aligned to Seeing Machines DR and BCP expectations

Monitoring & Observability

  • Enable monitoring systems, standards, and services that support predictive and reactive responses

  • Develop and publish dashboards showing system health across Seeing Machines

  • Deliver information and reporting according to an agreed cadence

Key Liaisons

Internal

  • Technology Division

  • Enterprise Systems & Services Department

  • Project Leads

  • All Seeing Machines senior stakeholders

External

  • Product Vendors

  • Service Providers
