Good read on model safety, but it sits less comfortably when you put it side by side with Anthropic's report on CoT faithfulness. That's not just because CoT monitorability is fragile, but also because efforts to make CoT more faithful didn't really move the needle. And then there's Coconut (continuous latent space reasoning), which doesn't produce human-readable CoT at all. It seems like some reductionist approaches, like the deeper behavioral analysis Goodfire does, are still essential.