Because the goal is to produce output that looks like native English text, the models are trained to assign high probabilities to existing text samples, and evaluated on how well they predict other (previously unseen) samples. For a language model, that's a fine objective function. It will favor models that produce syntactically correct text, use common idioms over semantically similar but uncommon phrases, don't shift topics too often, etc. Some level of actual understanding does exist in these models2, but it's on the level of knowing that two words or phrases have similar meanings, or that certain parts of a paragraph relate to each other. There is understanding, but no capacity for reasoning.
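To make "assign high probabilities to existing text" concrete, here's a toy sketch of that objective using a bigram model with add-one smoothing. (A real language model is a neural network conditioning on far longer contexts; the function names and the tiny corpus here are made up purely for illustration.)

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigram and bigram frequencies over a list of token lists."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent          # sentence-start marker
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def prob(unigrams, bigrams, prev, word, alpha=1.0):
    """Add-alpha smoothed P(word | prev), so unseen pairs get a small
    nonzero probability instead of zero."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

def score(unigrams, bigrams, sent):
    """Probability the model assigns to a whole sentence: the product
    of each word's conditional probability given the previous word."""
    p = 1.0
    toks = ["<s>"] + sent
    for prev, word in zip(toks, toks[1:]):
        p *= prob(unigrams, bigrams, prev, word)
    return p

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)

# Training rewards exactly what the text above describes: a sentence built
# from common word sequences scores higher than the same words shuffled.
print(score(uni, bi, ["the", "cat", "sat"]) >
      score(uni, bi, ["sat", "the", "cat"]))
```

Note what this objective never checks: whether the sentence is *true*, or whether any reasoning behind it is sound; only whether the word sequence is statistically plausible.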
Trying to use a language model to generate code is like trying to use a submarine to fly to the moon. That's not what it's for; why are you trying to use it for that? Stop doing that. But at the same time, arguing that the submarine is bad at flying is rather missing the point. Nobody who actually understands NLP is claiming otherwise.3
There do exist systems that are designed to produce code and trained to optimize correctness (e.g. Genetic Programming). That's a bit too far outside my area of expertise for me to make any claims about where the state of the art is, so I'm not sure whether answers generated by such systems should be allowed or not. But if you were going to use an AI tool to generate code, that's the sort of thing you should be looking at; they're designed for the task. Similarly, you could ask whether language models could be used as a tool to edit posts you've written by hand, perhaps to check the grammar or suggest new phrasings that flow better. They'd be fairly good at that sort of thing. (Probably. I haven't used any of those tools myself, as the rambling, stream-of-consciousness answer might have given away, but the math supports the idea that they should work4.) Translation is another task where similar systems work fairly well; machine translations still aren't perfect, but they're much better than they were 10 years ago, and improvements in language models are a big part of that. Just always be aware of what tool you're using, and whether it's the right one for the job.
2 Where "understands" is shorthand for "encodes the information in such a way that it can condition its decisions (i.e. probability distribution functions) upon it"
3 Well, not many. There'll always be a few who get caught up in the hype. They shouldn't.
4 If trained on well-written text