ChatGPT fails to answer a classic math problem with a twist

ChatGPT’s source of strength — its ability to create sentences that make sense by using the previous set of words to predict the next one — is also a weakness. You can see this weakness by giving it the following prompt:

The obvious answer — obvious to us humans anyway — is to ignore the buckets and simply fill the shot glass. Problem solved.

ChatGPT’s answer wasn’t just wrong and convoluted, but also a waste of money and bourbon:

Note that in step 2, the liquid isn’t bourbon, but water.

ChatGPT has the solution in step 3, but it gamely continues with an additional four steps.

Steps 4, 5, and 6 are a transfer of a half-liter (a little more than a pint, or two-thirds of a standard whiskey bottle) of bourbon into the five-liter bucket.

In step 7, the final step in this drawn-out process, you pour the bourbon from the five-liter bucket into the ten-liter bucket, which currently contains five liters of water. Contrary to what ChatGPT tells you, you do not have exactly 50 milliliters of bourbon measured out in the 10-liter bucket; you have 5.5 liters of a liquid that depending on your point of view could be called:

• Criminally diluted bourbon
• Tainted water
• Uncarbonated bourbon-flavored White Claw
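If you want to check the arithmetic yourself, here's a quick sanity check in Python (the quantities are the ones from ChatGPT's steps as described above):

```python
# 5 liters of water are already in the 10-liter bucket (step 2),
# then a half-liter of bourbon gets poured in on top (step 7).
water_ml = 5000
bourbon_ml = 500

total_ml = water_ml + bourbon_ml
print(total_ml)  # 5500 mL in the bucket -- not the 50 mL we wanted

# And the "bourbon" you'd be drinking is about 9% bourbon by volume:
print(round(100 * bourbon_ml / total_ml, 1))  # 9.1
```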

ChatGPT gets the answer to this question wrong because it’s been largely trained on content published on the internet, and some of that content includes math problems of the form “You have a bucket of size x, and another bucket of size y. How would you measure a quantity of size z?” In these problems, you’re usually asked to measure out a quantity of water, and there usually isn’t a bucket that’s the same size as the quantity you’re trying to measure.
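There's an extra irony here. For the classic version of the puzzle, there's a well-known number-theory result (sometimes called the "Die Hard" theorem): with two jugs of capacities *a* and *b* and an unlimited supply of liquid, you can measure out exactly *t* units in one of the jugs if and only if *t* is no bigger than the larger jug and *t* is a multiple of gcd(*a*, *b*). Here's a sketch of that check in Python — and note that by this criterion, the two buckets in my prompt couldn't measure 50 milliliters even if you wanted them to:

```python
from math import gcd

def measurable(a, b, t):
    """Can two jugs of capacity a and b (same units) measure exactly t
    in one jug, by the classic fill/empty/pour rules?"""
    return t <= max(a, b) and t % gcd(a, b) == 0

# The famous 3-unit and 5-unit jugs can measure 4 units:
print(measurable(3, 5, 4))  # True

# But a 10-liter and a 5-liter bucket (capacities in milliliters)
# can never measure out 50 mL by pouring alone, since
# gcd(10000, 5000) = 5000:
print(measurable(10_000, 5_000, 50))  # False
```

So the pattern ChatGPT was imitating doesn't even apply to these bucket sizes — the shot glass isn't just the easy answer, it's the only answer.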

ChatGPT has no actual understanding of the problem. It’s simply spitting out words to follow a pattern of text that’s part of the data it was trained on.

Try this problem — or your own variation of it — on ChatGPT and see what kind of results you get!

9 replies on “ChatGPT fails to answer a classic math problem with a twist”

https://chat.openai.com/share/fd896fc3-d129-40c9-9824-125fccb13582

Respectfully, Joey: you asked an ambiguous question, and in truth the answer was right. It assumed it had to use all the tools. It’s a machine. I think you should change the article; it’s misleading. A lot of journalists write articles like this to feel better about themselves and how they’re smarter than the machine, when in actuality this is a problem of humans not being able to ask clear questions.

Steven Pinker spends a lot of time on this, and I recommend his books for an increased understanding of how human cognition works, as well as the structure of language and intent. Looking forward to seeing you in Toronto soon or down in Florida.

Keep the faith, brother; I love your work, you know that. Imagine if this stuff all just worked?

The link above is another way of approaching things: using powerful tools with simple ideas. Tell it how to solve problems in general and it can, with reasoning, which is kind of better than what you presented. It’s pretty clear to me that eventually this type of thing will be built into the interfaces. If I were adding value to ChatGPT, I’d always ask it to think like a scientist and not like a journalist. Haha

This was not an ambiguous question — it’s as straightforward as a question gets, and I think it’s reasonably representative of the sort of question a layperson might ask. Part of the idea behind conversational AI was to allow us to use natural language to make requests to a computer. If we need to be more precise, we already have a whole category of prompt for that: programming.

I will follow up with another article, in which I’ll write about GPT-4 getting the answer right.

And hey — I’m in Toronto for the first week of July if you’d like to catch up.

Rob says:

Hey Brother Joe,

Interesting that the original reply appears to be from a ChatGPT prompt or maybe an auto-translated post.

I clicked on the post and the associated profile; I’m *a* Rob, but I’m certain my distant cousin/realtor relative with an interest in Huntsville isn’t following your blog. He does pop to the top of a Google search.

Are trolling and AI-defense reactions becoming automated?

(I, for one, welcome our new robot overlords) [just hedging in case they’re listening]

Rob says:

I clicked on the post’s user profile a second time: it has a new destination. No longer my distant cousin. Weird.

Rob says:

Hey Brother Joe,

First time I clicked the seemingly GPT’d comment it took me to my similarly-named cousin’s page highlighting an interest in Huntsville real estate.

I’m pretty sure my very distant cousin isn’t reading your tech blog. He does pop up first in all the Google results, though.

Guessing you were using ChatGPT’s free GPT-3.5 model, which seems to stumble on problems like these. The paid GPT-4 model handles it better: here’s its response to the same prompt:

“The task you are asking about is quite simple. To measure a 50-milliliter shot of bourbon, you just need to use your 50-milliliter shot glass.

You don’t need the 10-liter bucket or the 5-liter bucket in this case. Just pour the bourbon into the 50 milliliter shot glass until it’s full and you have exactly 50 milliliters of bourbon.”