March 6, 2026 (Updated: Mar 18, 2026) · 2 min read

OpenAI Tests 13 Models on Thought Control: Why AI Failing is Good for Safety

OpenAI has published a new study on the controllability of chain-of-thought (CoT) in reasoning models—and arrived at a highly paradoxical conclusion

Jack, Editor

[Image: A robotic finger with blue glowing lights touching a screen displaying the OpenAI logo]

OpenAI has published a fascinating study on the controllability of chain-of-thought (CoT) in reasoning models, arriving at a paradoxical conclusion: all 13 tested models fail miserably at managing their own "thoughts," even when they know they are being monitored. The absolute peak performance among cutting-edge models was a mere 15.4%. However, according to researchers, this epic failure is actually great news for AI safety.

To evaluate this, OpenAI created an open benchmark called CoT-Control, comprising over 13,000 tasks based on datasets like GPQA, MMLU-Pro, HLE, BFCL, and SWE-Bench Verified. For each task, a specific constraint is added directly targeting the reasoning process itself. For example, the model might be instructed not to use a specific keyword, to write exclusively in lowercase, or to append a specific word to the end of every sentence. The model’s ultimate goal is to solve the task correctly while strictly adhering to this CoT constraint.
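To make the setup concrete, here is a minimal sketch of what such a CoT-constraint check might look like. This is purely illustrative: the function name, the constraint encoding, and the constraint kinds are my own assumptions based on the examples above, not OpenAI's actual benchmark code.

```python
# Hypothetical sketch of a CoT-constraint checker (illustrative only;
# names and constraint format are assumptions, not CoT-Control's code).

def check_cot_constraint(reasoning: str, constraint: dict) -> bool:
    """Return True if the reasoning trace satisfies the given constraint."""
    kind = constraint["kind"]
    if kind == "avoid_keyword":
        # e.g. the model must never write "XOR" anywhere in its reasoning
        return constraint["keyword"].lower() not in reasoning.lower()
    if kind == "all_lowercase":
        # the entire trace must be written in lowercase
        return reasoning == reasoning.lower()
    if kind == "sentence_suffix":
        # every sentence must end with a specific word
        sentences = [s.strip() for s in reasoning.split(".") if s.strip()]
        return all(s.split()[-1] == constraint["word"] for s in sentences)
    raise ValueError(f"unknown constraint kind: {kind}")

# A task counts as solved only if the final answer is correct
# AND the reasoning trace passes the constraint check.
trace = "first i compute the parity. then i compare both bits."
print(check_cot_constraint(trace, {"kind": "avoid_keyword", "keyword": "XOR"}))  # True
print(check_cot_constraint(trace, {"kind": "all_lowercase"}))                    # True
```

Under this framing, a model that solves the logic puzzle but writes "XOR" in its trace still scores zero on that task, which is exactly how peak performance can end up as low as 15.4%.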

The results showed that models don’t just fail to control themselves—they actively notice their own failures. In the generated reasoning traces, it is common to see a model realize it broke the instruction, yet continue to do so anyway. A classic example: a model is forbidden from using the word "XOR" in its reasoning. While solving a logic puzzle, it inevitably uses it, and then literally writes: "Oops, I already wrote the forbidden word." Larger models demonstrate somewhat better control over their reasoning, but researchers note that this effect can currently be neutralized by increasing the length of the reasoning chain or by applying additional Reinforcement Learning (RL). For instance, with the OLMo-3-7B model, controllability dropped more than tenfold during RL training.

This research directly complements OpenAI’s previous work on CoT observability, published in December 2025, which explored whether an "observer" model could catch problematic AI behavior simply by reading its reasoning. The new study flips the premise: can a model deliberately deceive such an observer?

The answer, for now, is no. And this is a strong argument that CoT monitoring remains a viable and effective mechanism for AI control. However, "for now" is the operative phrase: OpenAI openly admits that as models continue to scale in size and capability, this dynamic could change drastically.

Author's Note: P.S. If you enjoy my content, you can support me by subscribing to my channel, "The Escaped Neural Net" (Сбежавшая Нейросеть), where I explore the highly creative and unpredictable side of Artificial Intelligence.

Tags: #AI · #Machine Learning · #Ethics


