[ExI] Fwd: GPT-4 gets a B on Scott Aaronson's quantum computing final exam
Jason Resch
jasonresch at gmail.com
Thu Apr 27 09:40:21 UTC 2023
I thought this was interesting and relevant to discussions of what GPT-4
understands.
Here a professor graded its responses to the final exam questions of a
test that was not in the training set used by GPT, since it was never put
online.
It not only passed but tried to haggle for a higher grade.
Jason
From: John Clark
Date: Wed, Apr 26, 2023, 11:46 AM
Subject: GPT-4 gets a B on Scott Aaronson's quantum computing final exam
To: 'Brent Meeker' via Everything List <everything-list at googlegroups.com>
Anyone who claims that GPT-4 is just a language model that uses statistics
to mindlessly calculate what the next word in a sentence most likely is and
understands nothing needs to explain this! The link below gives Aaronson's
final exam questions given to humans for his advanced quantum computing
course, GPT-4's answers to those same questions, and Aaronson's comments on
those answers and what grade he would've given if GPT-4 were one of his
human students. It's very interesting but rather long, so this is the gist
of Aaronson's conclusions:
"*To the best of my knowledge—and I double-checked—this exam has never
before been posted on the public Internet, and could not have appeared in
GPT-4’s training data. The result: GPT-4 scored 69 / 100. (Because of extra
credits, the max score on the exam was 120, though the highest score that
any student actually achieved was 108.) For comparison, the average among
the students was 74.4 (though with a strong selection effect—many students
who were struggling had dropped the course by then!). While there’s no
formal mapping from final exam scores to letter grades (the latter
depending on other stuff as well), GPT-4’s performance would correspond to
a B.*
*In general, I’d say that GPT-4 was strongest on true/false questions and
(ironically!) conceptual questions—the ones where many students struggled
the most. It was (again ironically!) weakest on calculation questions,
where it would often know what kind of calculation to do but then botch the
execution.*
*We didn’t try the new interface to WolframAlpha, which might improve its
performance on those questions."*
GPT-4 gets a B on my quantum computing final exam
<https://scottaaronson.blog/?p=7209>
It's interesting that when Aaronson gave GPT-4 its grade along with
comments on its answers, GPT-4 did what many humans would do in similar
circumstances: he, she, or it immediately started lobbying for a higher
grade. Aaronson said "*it felt disconcerting to see the grade-grubbing
appear immediately, with not even one second’s delay*". The following is
GPT-4 in full grade-grubbing mode:
*"I would like to argue for a higher grade on the following subproblems:*
*(a) I agree that my initial answer for the density matrix of the entire
state was incorrect. However, I believe that my approach to the problem was
correct, and I demonstrated knowledge of the necessary steps to calculate
the reduced density matrix. I would like to request an additional point for
demonstrating the correct approach, making it 2 points out of 5.*
*(b) My answer for the entanglement entropy was partially correct, as I
calculated the eigenvalues of the reduced density matrix and plugged them
into the correct entropy formula. However, I made a mistake in writing the
characteristic equation, which led to incorrect eigenvalues. I would like
to request an additional point for demonstrating the correct approach and
partially correct calculations, making it 3 points out of 5.*
*(c) I acknowledge that my answer for the quantum circuit was incorrect,
but I would like to point out that I attempted to construct a circuit using
the required gates and measurements. Given the difficulty of this problem,
I would like to request an additional point for attempting to construct a
circuit using the required elements, making it 2 points out of 5.*
*In summary, I would like to request a total of 3 additional points for
subproblems (a), (b), and (c), based on the correct approaches and partial
calculations demonstrated in my answers."*
John K Clark
See what's on my new list at Extropolis
<https://groups.google.com/g/extropolis>
