Company Overview:

Our client is a research-driven organisation led by passionate mathematicians and computer scientists. The Research Technology team lies at the heart of the company, managing one of the largest HPC clusters in the world. This team is critical to the firm's success, facilitating trades with daily volumes exceeding $250 billion globally.

 

Team Overview:

The Research Technology team is a full-stack team that collaborates closely with researchers to develop a highly performant, reliable, and transparent system. The team builds custom software to support an exascale filesystem, a job scheduler, and zero-touch platforms for seamless integration with data centre operations. They are also responsible for developing custom file formats, compression algorithms, GPU tooling, and network management software to optimise performance.

 

Key Responsibilities:

  • Design and build software for the HPC cluster, focusing on performance, reliability, and scalability.
  • Mentor junior team members and push the boundaries of the team’s capabilities.
  • Engage constructively with researchers to find novel and scalable solutions.
  • Promote and implement radical changes and alternative ways of thinking while maintaining a pragmatic approach to minimise operational risks.
  • Manage and maintain a complex live system 24/7, delivering changes on short notice or tight deadlines.

 

What You Will Be Working On:

  • Developing an exascale filesystem handling billions of directories, a trillion files, and a million clients with complete resiliency against hardware failure.
  • Enhancing a dynamic job scheduler managing over 10 million entries and 100,000 concurrent tasks.
  • Building zero-touch platforms for monitoring, operating, and upgrading tens of thousands of machines.
  • Creating custom file formats, compression algorithms, and GPU tooling to optimise performance from 20,000 high-end GPUs.
  • Expanding the HPC cluster to provide access to more teams and multiple data centres.
  • Improving measurement and optimisation of resource usage across the entire cluster.

 

Essential Attributes:

  • Strong academic grounding in computer science fundamentals, including algorithms and data structures.
  • Proficiency in at least one statically typed language; experience with Golang and Rust is beneficial but not required. Scripting is primarily in Python.
  • Approximately 5-10 years of experience designing and building large-scale, highly scalable distributed systems.
  • Excellent problem-solving and analytical skills.
  • Familiarity with the Linux operating system, particularly in diagnosing performance and scalability issues.
  • Ability to multitask, manage multiple projects simultaneously, and prioritise effectively.
  • High self-motivation and the ability to work independently without supervision.
  • An understanding of machine learning frameworks and compute offload devices, such as GPUs, is an advantage.

 

This role offers the opportunity to work in a fast-paced, research-driven environment where you can significantly impact the firm’s HPC infrastructure and overall success. We encourage you to apply if you are a self-starter passionate about developing cutting-edge technology.

  • Location: London, United Kingdom
  • Job type: Permanent
  • Salary: Competitive

Research Technology Developer

