Epoch AI

Position ID:

3427-MATH [#24802]

Position Title:

Project Lead, Mathematics Reasoning Benchmark

Position Type:

Other

Position Location:

San Francisco, California 94117, United States of America

Appl Deadline:

2024/08/18 23:59:59 finished (2024/07/10, finished 2024/11/19, listed until 2025/01/10)

Position Description:

*** this position has been closed. ***

Project Lead, Mathematics Reasoning Benchmark

We’re looking for someone to lead the creation of a new mathematics reasoning benchmark for AI systems that will consist of around 1000 original and difficult questions. We expect this person's main responsibility to be finding and managing collaborators who wish to submit their questions to the benchmark. These collaborating contractors will likely be senior experts in a specific area of math. This is a temporary role: the expected completion time for the benchmark is around 6 months, and the project lead position will only last until the benchmark is completed.

The format of the questions in the benchmark will be similar to Project Euler: to solve the questions, it will often be necessary to combine coding with math knowledge, and correctness will be judged only on the basis of a final answer that’s designed to be difficult to guess in advance. In addition, each question will come with verification scripts written in Python to ensure that benchmark evaluation on new models can be done in an entirely automated fashion. For these reasons, the ideal project lead should have a solid background in math, programming, and management, as well as connections in the world of academic math.
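
To give a rough sense of what such a verification script might look like, here is a minimal sketch. It is an illustration only and not Epoch AI's actual tooling: the function names `reference_answer` and `verify` are hypothetical, the example question (the sum of the multiples of 3 or 5 below 1000) is a stand-in far easier than anything intended for the benchmark, and the choice to compare numeric answers as exact rationals is an assumption.

```python
from fractions import Fraction


def reference_answer() -> str:
    """Hypothetical stand-in question: sum of the multiples of 3 or 5 below 1000."""
    return str(sum(n for n in range(1000) if n % 3 == 0 or n % 5 == 0))


def verify(submitted: str, expected: str) -> bool:
    """Return True only if the submitted final answer matches the expected one.

    Assumption: numeric answers are compared as exact rationals, so '0.5'
    and '1/2' agree. Anything that cannot be parsed as a number falls back
    to a whitespace-insensitive string comparison.
    """
    submitted, expected = submitted.strip(), expected.strip()
    try:
        return Fraction(submitted) == Fraction(expected)
    except (ValueError, ZeroDivisionError):
        return submitted == expected


if __name__ == "__main__":
    # Only the final answer is checked, never the reasoning that produced it.
    print(verify("233168", reference_answer()))
```

Because the check looks only at the final answer, a script of this kind can be run against a new model's output without any human grading.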

The successful candidate will report to Ege Erdil, senior researcher at Epoch, and work closely with the rest of Epoch AI’s research team. This role is fully remote, and we are able to hire in many countries. This role is only open to full-time candidates. We will evaluate applications on a rolling basis and will close applications on August 18th, or earlier if we make a hire.

Key Responsibilities

Finding math experts who can write difficult problems in their specific areas of expertise.
Providing guidance to question authors on what kinds of questions they are expected to write.
Maintaining good coordination across the entire team working on the project while keeping the desired end state and completion date of the benchmark in mind. An example of this would be keeping track of the overall distribution of question difficulty and directing question authors to write easier or harder questions if the question output has deviated too far from the target.
Ensuring that adequate quality control protocols are implemented to reduce the occurrence rate of errors in the question statements or their answers.
Occasionally reviewing a sample of the questions submitted to the benchmark to ensure their correctness and their compliance with the question-writing guidelines.

What We Are Looking For

General mathematics expertise at least at the level of a Ph.D. in mathematics, and likely above this level.
Competence with the Python programming language, including its main packages designed for mathematical computation (e.g. SymPy).
Experience with solving Project Euler questions or similar problems combining coding skills with math knowledge is a plus, and so is prior math competition experience.
Ideally, a network of math experts and academics to rely upon to find question authors.
Ideally, experience with coordinating small teams working on projects of a similar scope to this one.
Interest in ML benchmarking in general and this project in particular.

What We Offer

Compensation

The baseline salary for this role is $200,000 USD, prorated according to how long the benchmark takes to complete. As the role is only expected to last six months, we expect to pay the project lead around $100,000 in total over the project's lifetime.
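
As a quick check of the arithmetic above, the sketch below prorates the annual baseline by project length. The function name `prorated_pay` is hypothetical, and prorating by whole months is an assumption; the posting does not say at what granularity proration is calculated.

```python
def prorated_pay(annual_salary: float, months: float) -> float:
    """Prorate an annual salary over a number of months (assumed granularity)."""
    return annual_salary * months / 12


# Baseline of $200,000 USD over the expected six-month project.
print(prorated_pay(200_000, 6))  # 100000.0
```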

Contracts are set in local currencies.

Other Benefits

Comprehensive global benefits package

While benefits vary by country, we make every effort to ensure that our benefits package is equitable and high-quality for all staff. For most countries, the package includes medical insurance, life insurance, and a pension plan.

Generous paid time off, including:

Unlimited vacation (within reason)
Unlimited (within reason) personal and sick leave

Equipment stipend of the equivalent of $2000 USD every 3 years to cover costs of purchasing work and office equipment, prorated by contract length.
Paid work trips, including staff retreats and relevant conferences.
Additional in-person co-working stipend of the equivalent of $2000 USD annually to work in the same location as other staff, prorated by contract length.
Professional development stipend of the equivalent of $2000 USD annually to spend on learning or development opportunities, prorated by contract length.
Opportunity to contribute to a high-impact non-profit organization — our research is trusted by key decision makers globally.
Other benefits as allowed at the discretion of Epoch’s leadership and local availability.

About Epoch AI

Epoch AI is a research institute that investigates trends in machine learning and the economic consequences of AI. Our work informs policy-making at key government institutes and governance at leading industry AI labs.

You can learn more about our work in this summary dashboard, on our blog, or in this profile by Time magazine.

Additional Information

Please email careers@epochai.org if you have any questions about this role, accessibility requests, or if you want to request an extension to the deadline.
While we welcome applicants from all time zones, you may be expected to attend meetings during working hours between UTC-8 and UTC+3 time zones, where most of our staff are based.
Please submit all of your application materials in English and note that we require professional level English proficiency.
Travel is not a requirement for this position. However, a majority of our staff travel a few times per year for conferences, retreats, and other work-related purposes.
Epoch AI is committed to building an inclusive, equitable, and supportive community for you to thrive and do your best work. We’re committed to finding the best people for our team, so please don’t hesitate to apply for a role regardless of your age, gender identity/expression, political identity, personal preferences, physical abilities, veteran status, neurodiversity or any other background.
Epoch AI is fiscally sponsored by Rethink Priorities.

We are not accepting applications for this job through MathJobs.Org right now. Please apply at https://careers.rethinkpriorities.org/postings/7a97539d-2ee3-499d-aa0b-e2d28308090e .

Contact: Maria de la Lama

Postal Mail:

530 Divisadero St. PMB #796
San Francisco, CA 94117