Document Details


Title: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Authors: Tomer Ashuach, Martin Tutek, Yonatan Belinkov
Affiliation: Technion – Israel Institute of Technology
Contact: {tomerashuach,martin.tutek,belinkov}@campus.technion.ac.il

Abstract: Large language models (LLMs) risk inadvertently memorizing and divulging sensitive or personally identifiable information (PII) seen in training data, raising privacy concerns. Current approaches to this problem involve costly dataset scrubbing, or model filtering through unlearning and model editing, which can be bypassed through extraction attacks. We propose REVS, a novel model editing method for unlearning sensitive information from LLMs. REVS identifies and modifies a small subset of neurons relevant to each piece of sensitive information. By projecting these neurons onto the vocabulary space (unembedding), we pinpoint the components driving its generation. We then compute a model edit based on the
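The abstract describes projecting individual neurons onto the vocabulary space through the unembedding matrix to see which tokens each neuron promotes. A minimal sketch of that projection idea, using toy random weights (the dimensions, variable names, and data are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration only).
d_model, vocab_size = 16, 100
W_U = rng.normal(size=(d_model, vocab_size))  # unembedding matrix

# Treat a "neuron" as the direction it writes into the residual stream
# (e.g. one column of an MLP output projection).
neuron_direction = rng.normal(size=(d_model,))

# Project the neuron onto the vocabulary space: its logit contribution
# to every token in the vocabulary.
logits = neuron_direction @ W_U

# Rank tokens by how strongly this neuron promotes them; if a sensitive
# token ranks highly, the neuron is a candidate for editing.
top_tokens = np.argsort(logits)[::-1][:5]
print(top_tokens)
```

Editing would then lower the rank of the sensitive token in this projection for the selected neurons, per the method's name.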
Filename: 2406.09325v1
Filetype: application/pdf
Size: 1476779 bytes
Uploaded On: 2024-06-16
Abstract:
Summary:
Tags:
Notes:
Visible: 1
Status: Parsed
Author: Tomer Ashuach, Martin Tutek, Yonatan Belinkov
CreationDate: 2024-06-14T00:59:09+00:00
Creator: LaTeX with hyperref
Keywords:
ModDate: 2024-06-14T00:59:09+00:00
PTEX.Fullbanner: This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5
Producer: pdfTeX-1.40.25
Subject:
Title: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Trapped: False
Pages: 18
