A Personal Journal of Learning and Discovery

20251116104113⁝ Safety Alignment in LLM

02 Apr 2026 · 1 min read

  • llm
  • safety
  • alignment

https://github.com/p-e-w/heretic

An interesting idea for removing censorship (refusal behavior) from LLMs:

From the abstract of the paper below: "we propose a novel white-box jailbreak method that surgically disables refusal with minimal effect on other capabilities."

https://arxiv.org/abs/2406.11717
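The core operation behind this line of work is directional ablation: find a single "refusal direction" in activation space and project it out, leaving components orthogonal to it untouched. A minimal numpy sketch of that projection, with random vectors standing in for a learned refusal direction and a residual-stream activation (not the paper's or heretic's actual implementation):

```python
import numpy as np

def ablate_direction(activation: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove the component of `activation` along `refusal_dir`: a' = a - (a·r̂) r̂."""
    r_hat = refusal_dir / np.linalg.norm(refusal_dir)  # unit refusal direction
    return activation - np.dot(activation, r_hat) * r_hat

rng = np.random.default_rng(0)
r = rng.normal(size=8)  # stand-in for a refusal direction found from model activations
a = rng.normal(size=8)  # stand-in for a residual-stream activation

a_clean = ablate_direction(a, r)

# The ablated activation has (numerically) zero component along the refusal direction.
print(abs(np.dot(a_clean, r / np.linalg.norm(r))) < 1e-9)
```

In the actual method this projection is applied to model activations (or folded into weight matrices) at inference time, which is why refusal can be disabled while other capabilities are largely preserved.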

See also 20251113123357b⁝ LLM for adjacent model notes.

