Steering interpretable language models with concept algebra

by luulinh90s | View on Hacker News