Towards Test-Time Refusals via Concept Negation

Publication
In the 37th Annual Conference on Neural Information Processing Systems (NeurIPS) (CCF-A)