Paper List

Tag: activation_steering

2 items with this tag.

  • Apr 17, 2026

    ActAdd: Steering Language Models With Activation Engineering

    • activation_steering
    • actadd
    • linear_representations

  • Apr 15, 2026

    Universal Steering & Monitoring: Toward universal steering and monitoring of AI models

    • activation_steering
    • monitoring
    • concept_vectors
