I really don’t care about IQ tests; ChatGPT does not perform at a human level. I’ve spent hours with it. Sometimes it does come off like a human with an IQ of about 83, all concentrated in verbal skills. Sometimes it sounds like a human with a much higher IQ than that (and a bunch of naive prejudices). But if you take it out of its comfort zone and try to get it to think, it sounds more like a human with profound brain damage. You can take it step by step through a chain of simple inferences, and still have it give an obviously wrong, pattern-matched answer at the end. I wish I’d saved what it told me about cooking and neutrons. Let’s just say it became clear that it was not using an actual model of the physical world to generate its answers.
Other examples are cherry picked. Having prompted DALL-E and Stable Diffusion quite a bit, I’m pretty convinced those drawings are heavily cherry picked; normally you get a few that match your prompt, plus a bunch of stuff that doesn’t really meet the specs, not to mention a bit of eldritch horror. That doesn’t happen if you ask a human to draw something, not even if it’s a small child. And you don’t have to iterate on the prompt so much with a human, either.
Competitive coding is a cherry-picked problem, as easy as a coding challenge gets… the tasks are tightly bounded, described in terms that almost amount to code themselves, and come with comprehensive test cases. On the other hand, “coding assistants” are out there annoying people by throwing really dumb bugs into their output (which is just close enough to right that you might miss those bugs on a quick glance and really get yourself into trouble).