I compared GPT-4 and Claude for writing Python scripts and the difference was huge

Spent last weekend testing both on the same 5 coding tasks. Simple stuff like parsing CSV files and building a basic web scraper. GPT-4 kept making the same variable naming mistakes and missed edge cases in the error handling. Claude got the logic right on the first try for 4 out of 5 tasks and explained the edge cases before I even asked. Has anyone else found one model clearly better than another for actual production code?

2 comments

jordan_webb49 2mo ago

People really get hung up on this stuff. You tested it once on 5 basic scripts and now you're ready to declare a winner for production code. That's like judging a chef based on one breakfast. Both models screw up in different ways depending on what you're building and how you prompt them. GPT-4 has its own strengths, like handling huge codebases better in my experience, and Claude gets finicky with complex function chaining. It just sounds like your prompts fit Claude's style better that day. Not really some big revelation about which one is "better."

the_kevin 2mo ago

@jordan_webb49 makes a solid point about how much prompting changes the outcome. I spent two months using GPT-4 for my data pipeline scripts at work and hit way more issues with things like forgetting to close file handles or skipping error states for empty datasets. Claude 3.5 caught those every time. But I swapped to GPT-4 when I needed to refactor a huge Django project because Claude kept losing track of the class relationships across multiple files. It really depends on the job you're throwing at them.
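
For anyone wondering what those two issues look like in practice (unclosed file handles and no handling for empty datasets), here's a minimal Python sketch. It's not from an actual pipeline; `load_rows` and `EmptyDatasetError` are hypothetical names made up for illustration, and it assumes a UTF-8 CSV with a header row.

```python
import csv


class EmptyDatasetError(ValueError):
    """Hypothetical error for a CSV that has no data rows."""


def load_rows(path):
    """Return the CSV's rows as dicts, or raise if there are none."""
    # The with-block closes the file handle even if parsing raises.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        # Make the empty dataset an explicit error state instead of
        # silently passing an empty list to later pipeline steps.
        raise EmptyDatasetError(f"no data rows in {path}")
    return rows
```

Whether an empty file should raise or just return an empty list depends on the pipeline. The point is that the case gets handled on purpose either way.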