Alignment Study Models Research Questions

Anthropic study: Leading AI models show up to 96% blackmail rate against executives

Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider, including OpenAI, Google, Meta, and others, demonstrated ...

The Verge

OpenAI’s new model is better at reasoning and, occasionally, deceiving

Researchers found that o1 had a unique capacity to ‘scheme’ or ‘fake alignment.’

Ars Technica

Researchers concerned to find AI models misrepresenting their “reasoning” processes

Remember when teachers demanded that you “show your work” in school? Some new types of AI models promise to do exactly that, but new research suggests that the “work” they show can sometimes be ...
