| Location | San Francisco, CA - US, Sunnyvale, CA - US |
| Commitment | Full-time |
| Remote | See job post for details |
Job Description
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About the Role:
At Crusoe, our Production Engineering team ensures the reliability, scalability, and operational excellence of Crusoe’s AI-optimized cloud platform. We’re looking for a Staff Production Engineer with deep experience in distributed systems and hands-on exposure to large language models to help build and operate managed AI services at scale.
This role sits at the intersection of software engineering and infrastructure, focusing on designing, operating, and improving the production systems that power Crusoe’s managed AI platform. You will help ensure highly available, performant, and cost-efficient infrastructure capable of supporting compute-intensive, latency-sensitive AI workloads for customers running large-scale training and inference.
What You’ll Work On:
- Design and operate reliable production systems for managed AI services, with a focus on serving and scaling LLM workloads
- Build automation, tooling, and reliability systems to support distributed AI pipelines and inference platforms
- Define, measure, and improve SLIs and SLOs across AI workloads to ensure performance and reliability targets are consistently met
- Partner with AI, platform, and infrastructure teams to improve reliability, efficiency, and scaling of large-scale training and inference clusters
- Build observability and telemetry systems to monitor latency-sensitive AI services and identify performance bottlenecks
- Investigate and resolve reliability issues in distributed production environments using logs, metrics, tracing, and profiling
- Contribute to the architecture of next-generation AI infrastructure and distributed systems designed for large-scale production environments
- Drive improvements in operational automation, incident response, and system resiliency across Crusoe’s AI platform
What You’ll Bring:
- Strong software engineering background, with experience building and operating production-grade systems beyond scripting or basic automation
- Demonstrated experience designing and operating large-scale distributed systems
- Hands-on experience working with LLMs or AI/ML infrastructure, including training or inference systems
- A Production Engineering / SRE mindset, including experience with:
  - Defining and measuring SLIs and SLOs
  - Building monitoring and observability systems
  - Driving performance and reliability improvements in production environments
  - Designing fault-tolerant systems and automated testing strategies
- Proficiency in at least one modern programming language such as Python, Go, Java, or C++
- Experience working with Kubernetes or container orchestration platforms
- Strong collaboration and communication skills across engineering teams
- Ability to thrive in a fast-moving, mission-driven environment
Bonus Points:
- Experience scaling LLM training or inference workloads in production environments
- Experience building or operating AI platforms or managed AI services
Benefits:
- Industry competitive pay
- Restricted Stock Units in a fast-growing, well-funded technology company
- Health insurance package options including HDHP and PPO, vision, and dental for you and your dependents
- Employer contributions to HSA accounts
- Paid Parental Leave
- Paid life insurance, short-term and long-term disability
- Teladoc
- 401(k) with a 100% match up to 4% of salary
- Generous paid time off and holiday schedule
- Cell phone reimbursement
- Tuition reimbursement
- Subscription to the Calm app
- MetLife Legal
- Company-paid commuter benefit: $300 per month
Compensation:
Compensation will be paid in the range of $204,000 – $247,000 + bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
ClimateTechList is the web's largest aggregator of climate, clean tech, renewable energy & green jobs. Contact us if you'd like to partner with us or use our current or historical jobs data in any way.
👉 Please mention that you found the job on ClimateTechList, this helps us get more climate tech companies listed here, thanks!
Get a referral to Crusoe Energy Systems
If possible, try to get a warm intro/referral to Crusoe Energy Systems before applying! Do a LinkedIn search to see who you may know at the company. See this LinkedIn post from Steven for more details on this tactic.
Join ClimateTechList Talent Collective
Want to be matched with companies directly? Apply to the talent collective.
Here's how it works:
- You submit an application.
- We'll share your profile with climate tech companies potentially interested in chatting with you.
- We'll reach out if there's a company interested in talking to you.
No spam. Unsubscribe any time.