Most people don’t know anything beyond ChatGPT and Copilot.
If we're talking programmers, maybe include Claude, Gemini, DeepSeek, and Perplexity search, though even that isn't always true.
…Point being, OpenAI does have a short term ‘default’ and known brand advantage, unfortunately.
That being said, there's absolutely manipulation of LLMs, though not quite what OP is thinking, per se. I see more of:
Benchmaxxing with a huge sycophancy bias (which works particularly well in LM Arena).
Benchmaxxing with massive thinking blocks, which is what OP is getting at. I’ve found Qwen is particularly prone to this, and it does drive up costs.
Token laziness from some of OpenAI’s older models, as if they were trained to give short responses to save GPU time.
“Deep Frying” models for narrow tasks (coding, GPQA style trivia, math, things like that) but making them worse outside of that, especially at long context.
…Straight up cheating by training on benchmark test sets.
Safety training taken to a ridiculous extent (Microsoft's Phi, OpenAI's models, Claude, and such), for political reasons and to avoid bad PR.
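To make the thinking-block cost point concrete, here's a back-of-envelope sketch. The prices below are made-up placeholders, not any vendor's actual rates, but the mechanism is real: reasoning/thinking tokens are typically billed as output tokens even though the user never reads them, so a model that pads its thinking block can multiply the cost of an otherwise identical answer.

```python
def request_cost(prompt_tokens: int, answer_tokens: int, thinking_tokens: int,
                 in_price: float = 0.5e-6, out_price: float = 2.0e-6) -> float:
    """Cost in dollars. Prices are hypothetical placeholders for illustration."""
    # Thinking tokens ride on the output rate, which is usually the pricier one.
    return prompt_tokens * in_price + (answer_tokens + thinking_tokens) * out_price

terse = request_cost(1_000, 300, 0)        # model answers directly
padded = request_cost(1_000, 300, 5_000)   # same answer after a huge thinking block
print(f"{padded / terse:.1f}x")            # the padded request costs ~10x more here
```

Same prompt, same visible answer; the only difference is the hidden reasoning, and it dominates the bill.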
In addition, 'free' chat UIs are geared toward gathering data the vendor can train on.
You’re right that there isn’t much like ad injection or deliberate token padding yet, but still.