Scientists are creating “humanity’s last exam” to test AI and see when it has reached expert-level intelligence.
The Center for AI Safety (CAIS) and Scale AI are asking people to submit questions to help create “the world’s most difficult artificial intelligence test”.
“Existing tests now have become too easy and we can no longer track AI developments well, or how far they are from becoming expert-level,” said the quiz creators in a statement about the test.
A few years ago, AI was giving almost random answers to questions on exams – that’s no longer the case.
Last week, OpenAI’s newest model, known as OpenAI o1, “destroyed the most popular reasoning benchmarks”, according to Dan Hendrycks, executive director of CAIS.
However, AI still can’t answer difficult research questions or other intellectual questions.
It also appears to score poorly on tests involving planning and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April.
Consequently, “humanity’s last exam” will require abstract reasoning to test how clever AI really is.
The submissions shouldn’t be any ordinary quiz questions.
“We found questions written by undergraduates tend to be too easy for the models,” the creators of the quiz said.
Instead, they recommend that question writers have five or more years of experience in a technical industry job at a company like SpaceX, or be a PhD student or above.
The submissions should be difficult for non-experts to answer and “not easily answerable via a quick online search”, and trick questions should be avoided.
“As a rule of thumb, if a randomly selected undergraduate can understand what is being asked, it is likely too easy for the frontier LLMs of today and tomorrow,” said the quiz creators.
People who submit successful questions will be invited as co-authors on the paper and have a chance to win money from a $500,000 (£378,400) prize pool, with the writers of the best questions earning $5,000 (£3,780) each.
Questions should be submitted by 1 November.