Document Details
Clip:
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
Pedro Domingos (pedrod@cs.washington.edu)
Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195-2350, USA

Abstract
Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel.
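For concreteness, below is a minimal sketch of the kernel-machine prediction rule the clipped abstract describes in words: store the training examples and predict on a new input as a weighted sum of similarities to them, y(x) = sum_i a_i K(x, x_i) + b. The Gaussian (RBF) kernel and the coefficients here are illustrative placeholders; the paper's equivalence involves a kernel tied to the gradient-descent path, not this fixed choice.

    import numpy as np

    def rbf_kernel(x, x_prime, gamma=1.0):
        # Similarity between two inputs: exp(-gamma * ||x - x'||^2).
        # Illustrative choice only; not the kernel from the paper.
        return np.exp(-gamma * np.sum((x - x_prime) ** 2))

    def kernel_machine_predict(x, X_train, a, b=0.0, gamma=1.0):
        # Predict as a weighted sum of kernel similarities to the
        # memorized training examples: y(x) = sum_i a_i K(x, x_i) + b.
        return sum(a_i * rbf_kernel(x, x_i, gamma)
                   for a_i, x_i in zip(a, X_train)) + b

    # Hypothetical usage: the machine "memorizes" X_train and predicts any
    # new point by comparing it against every stored example.
    X_train = np.array([[0.0], [1.0], [2.0]])
    a = np.array([0.5, -1.0, 0.3])  # illustrative coefficients, not learned
    print(kernel_machine_predict(np.array([0.5]), X_train, a))

Note that the prediction never uses a learned feature representation directly: every output is expressed through comparisons to the stored training data, which is the sense in which the abstract calls the weights a superposition of the training examples.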
Filename:
2012.00152.pdf
Filetype:
application/pdf
Size:
383653 bytes
Uploaded On:
2024-03-03
Abstract:
Summary:
Tags:
Notes:
Visible:
1
Status:
Parsed
Author:
CreationDate:
2020-12-02T01:34:51+00:00
Creator:
LaTeX with hyperref
Keywords:
ModDate:
2020-12-02T01:34:51+00:00
PTEX.Fullbanner:
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2
Producer:
pdfTeX-1.40.21
Subject:
Title:
Trapped:
False
Pages:
12