Second Brain

Tag: interpretability

3 items with this tag.

  • Feb 04, 2025

    Layer by Layer - Uncovering Hidden Representations in Language Models

    • interpretability
  • Jun 17, 2024

    Refusal in Language Models Is Mediated by a Single Direction

    • transformers
    • mechinterp
    • interpretability
  • Jan 12, 2023

    Progress measures for grokking via mechanistic interpretability

    • interpretability
    • mechinterp
