A Personal Journal of Learning and Discovery


20251116104113⁝ Safety Alignment in LLM

16 Nov 2025 · 1 min read

  • llm
  • safety
  • alignment

https://github.com/p-e-w/heretic

An interesting idea for removing censorship (refusal behavior) from a model.

From the paper's abstract: "we propose a novel white-box jailbreak method that surgically disables refusal with minimal effect on other capabilities."

https://arxiv.org/abs/2406.11717
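The mechanism behind this line of work can be sketched as directional ablation: estimate a "refusal direction" in the residual stream as the difference of mean activations between harmful and harmless prompts, then project that direction out of activations. The snippet below is a minimal numpy illustration with synthetic data, not the paper's or Heretic's actual implementation; all names and shapes are assumptions.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate the refusal direction as the normalized difference-in-means
    of residual-stream activations. Inputs: (n_prompts, d_model) arrays."""
    r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return r / np.linalg.norm(r)

def ablate(x, r):
    """Project the (unit-norm) refusal direction out of an activation:
    x <- x - (r . x) r, so x has no component along r afterwards."""
    return x - np.dot(r, x) * r

# Toy demonstration: plant a known direction into "harmful" activations.
rng = np.random.default_rng(0)
d = 8
true_dir = np.zeros(d)
true_dir[0] = 1.0
harmful = rng.normal(size=(100, d)) + 3.0 * true_dir   # shifted along true_dir
harmless = rng.normal(size=(100, d))

r = refusal_direction(harmful, harmless)
x = rng.normal(size=d)
x_abl = ablate(x, r)
print(np.dot(r, x_abl))  # component along r is ~0 after ablation
```

In practice this ablation is applied at every layer (or folded directly into the weight matrices that write to the residual stream), which is why the intervention is "surgical": only the one direction is removed, leaving other capabilities largely intact.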

See also 20251113123357b⁝ LLM for adjacent model notes.

