Hardware Systems Engineer


Job Details

About the department

Cloudflare's Infrastructure group is responsible for building our global network. Our Hardware Engineering team helps research, develop, test, and deploy new equipment enabling 20% of the world's Internet traffic to be served smoothly. Deployed across 285 cities in 100+ countries, the hardware we select helps improve the security, reliability, and performance of the Internet.

About the Role

We need to make thoughtful infrastructure choices affecting a significant portion of the Internet. Hardware we work with includes servers, routers, switches, optical equipment, power distribution units, cables, optics, and more. As a Hardware Systems Engineer, you will work with colleagues on the Hardware Engineering, Product, and Hardware Sourcing teams to troubleshoot and maintain Cloudflare's worldwide fleet of storage and compute servers.

What you'll do

  • Develop and maintain automation tools to update firmware on servers and components in Cloudflare's fleet
  • Work with software teams to validate bug fixes and performance of new firmware revisions
  • Test and deploy firmware updates to the fleet, monitoring the progress of the rollout for compliance and reliability
  • Work with server and component vendors to obtain, debug, and maintain the latest updates
  • Work with our Site Reliability Engineering teams to triage bug reports
  • Support our Data Centre Engineering teams in resolving hardware issues
  • Communicate your results and updates through blog posts, internal talks, and tickets

Examples of desirable skills, knowledge and experience

  • Bachelor's degree in Computer Engineering, Electrical Engineering, or Computer Science
  • Desire to learn about the Cloudflare hardware used by almost 20% of all websites
  • Desire to learn how a diverse server fleet is managed at scale
  • Desire to learn the tools Cloudflare uses to maintain and monitor our hardware
  • Knowledge of PXE booting
  • Knowledge of configuration management; in particular, we use Salt to manage our fleet
  • Knowledge of Redfish, IPMI, and other server remote management protocols
  • Knowledge of running mission-critical production systems

Bonus Points

  • Familiarity with server hardware architecture
  • Knowledge of debugging server hardware faults and the ability to engage with our sourcing team and vendors to improve quality
  • Experience managing large fleets comprising thousands of servers
  • Experience with observability and monitoring tools such as Prometheus and Grafana, and the ability to observe trends over time
  • Experience scripting and programming, in particular in Python and Bash
  • Experience with software development tools and processes such as Git, Bitbucket, TeamCity, and Jira

 Cloudflare

 05/29/2024

 All cities, CA