What are the differences between sparse coding and autoencoder?



Sparse coding is defined as learning an over-complete set of basis vectors to represent input vectors (<-- why do we want this?). What are the differences between sparse coding and an autoencoder? When would we use sparse coding, and when an autoencoder?


Sparse coding is actually a specific type of autoencoder known as a sparse autoencoder. So you can regard sparse coding as a subset of autoencoders, if you like.
Hello, Goodbye

Answers:



Finding the differences can be done by looking at the models. Let's look at sparse coding first.

Sparse coding

Sparse coding minimises the objective

$$L_{sc} = \underbrace{\|WH - X\|_2^2}_{\text{reconstruction term}} + \underbrace{\lambda \|H\|_1}_{\text{sparsity term}},$$

where $W$ is the matrix of basis vectors, $H$ is the matrix of codes and $X$ is the matrix of the data we want to represent. $\lambda$ trades off sparsity against reconstruction. Note that if we are given $H$, the estimation of $W$ is easy via least squares.

In the beginning, we do not have H however. Yet, many algorithms exist that can solve the objective above with respect to H. Actually, this is how we do inference: we need to solve an optimisation problem if we want to know the h belonging to an unseen x.
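To make that inference step concrete, here is a minimal numpy sketch (a toy illustration under assumed sizes and settings, not a reference implementation): it takes a fixed over-complete dictionary W and recovers the code h for a single x by ISTA, i.e. gradient steps on the reconstruction term followed by soft-thresholding for the L1 term.

    import numpy as np

    def soft_threshold(z, t):
        # proximal operator of t * ||.||_1
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def sparse_code(W, x, lam=0.1, n_iter=200):
        # minimise ||W h - x||_2^2 + lam * ||h||_1 with respect to h (ISTA)
        L = 2.0 * np.linalg.norm(W, 2) ** 2      # Lipschitz constant of the gradient 2 W^T (W h - x)
        h = np.zeros(W.shape[1])
        for _ in range(n_iter):
            grad = 2.0 * W.T @ (W @ h - x)       # gradient of the reconstruction term
            h = soft_threshold(h - grad / L, lam / L)
        return h

    # toy usage with an over-complete dictionary (more atoms than input dimensions)
    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 64))            # 16-dimensional inputs, 64 basis vectors
    x = rng.standard_normal(16)
    h = sparse_code(W, x)
    print("non-zero codes:", np.count_nonzero(h), "of", h.size)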

Auto encoders

Auto encoders are a family of models; their general form minimises something like

$$D\bigl(d(e(x; \theta_r); \theta_d),\, x\bigr),$$

but we will go along with a much simpler one for now:

$$L_{ae} = \|W\sigma(W^TX) - X\|^2,$$

where $\sigma$ is a nonlinear function such as the logistic sigmoid $\sigma(x) = \frac{1}{1 + \exp(-x)}$.
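For comparison, a similarly rough sketch of this tied-weight objective (again a toy illustration; the sizes and learning rate are made up): a forward pass H = σ(W^T X), the reconstruction W H, and a manual gradient step, which has two parts because W appears in both the encoder and the decoder.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def tied_autoencoder_step(W, X, lr=1e-4):
        # one gradient step on L_ae = ||W sigmoid(W^T X) - X||^2 with tied weights W (d x k)
        H = sigmoid(W.T @ X)                           # codes, k x n
        R = W @ H                                      # reconstruction, d x n
        E = 2.0 * (R - X)                              # dL/dR
        grad_dec = E @ H.T                             # contribution of the decoder W H
        grad_enc = X @ ((W.T @ E) * H * (1.0 - H)).T   # contribution of the encoder sigmoid(W^T X)
        return W - lr * (grad_dec + grad_enc)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((16, 500))                 # 16-dimensional data, 500 samples
    W = 0.1 * rng.standard_normal((16, 32))            # 32 hidden units
    for _ in range(200):
        W = tied_autoencoder_step(W, X)
    print("reconstruction error:", np.linalg.norm(W @ sigmoid(W.T @ X) - X))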

Similarities

Note that $L_{sc}$ looks almost like $L_{ae}$ once we set $H = \sigma(W^TX)$. The differences are that i) auto encoders do not encourage sparsity in their general form, and ii) an auto encoder uses a model for finding the codes, while sparse coding does so by means of optimisation.

For natural image data, regularized auto encoders and sparse coding tend to yield a very similar $W$. However, auto encoders are much more efficient and are easily generalized to much more complicated models, e.g. a highly nonlinear decoder such as a deep neural network. Furthermore, one is not tied to the squared loss (on which the estimation of $W$ for $L_{sc}$ depends).

Also, the different methods of regularisation yield representations with different characteristics. Denoising auto encoders have also been shown to be equivalent to a certain form of RBMs, etc.

But why?

If you want to solve a prediction problem, you will not need auto encoders unless you have only a little labeled data and a lot of unlabeled data. In that case you will generally be better off training a deep auto encoder and putting a linear SVM on top instead of training a deep neural net.
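A sketch of that recipe, with everything hypothetical: the encode function below is only a stand-in random projection, standing in for the trained encoder of a deep autoencoder fitted on the large unlabeled set; the codes of the few labeled examples are then fed to a linear SVM.

    import numpy as np
    from sklearn.svm import LinearSVC

    def encode(X):
        # placeholder for the learned deep encoder; a fixed random projection plus a
        # sigmoid, just so the sketch runs end to end
        rng = np.random.default_rng(0)
        W = rng.standard_normal((X.shape[1], 32))
        return 1.0 / (1.0 + np.exp(-(X @ W)))

    rng = np.random.default_rng(1)
    X_labeled = rng.standard_normal((50, 64))     # only a little labeled data
    y_labeled = rng.integers(0, 2, size=50)

    clf = LinearSVC()                             # linear SVM on top of the codes
    clf.fit(encode(X_labeled), y_labeled)
    print(clf.predict(encode(X_labeled[:5])))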

However, they are very powerful models for capturing characteristics of distributions. This is vague, but research turning it into hard statistical facts is currently being conducted. Deep latent Gaussian models, a.k.a. variational auto encoders, and generative stochastic networks are pretty interesting ways of obtaining auto encoders which provably estimate the underlying data distribution.


Thanks for your answer! So does that mean sparse coding should never be used, and an autoencoder should be used instead? Also, shouldn't there be an extra term in sparse coding which regularises W?
RockTheStar

There is no general rule like that. SC has one benefit over AEs: encoding via optimisation can be very powerful.
bayerj

Sorry, can you elaborate on this?
RockTheStar

Having a fixed map that has been estimated in order to follow some constraint (here: sparse result) is less powerful than having an optimiser that tries to find a solution like that over possibly many iterations.
bayerj

Sorry to bring this question up again. I think an autoencoder can encourage sparsity as well, i.e. a sparse autoencoder.
RockTheStar


In neuroscience the term neural coding is used to refer to the patterns of electrical activity of neurons induced by a stimulus. Sparse coding, in turn, is one kind of such pattern. A code is said to be sparse when a stimulus (like an image) provokes the activation of just a relatively small number of neurons, which combined represent it in a sparse way. In machine learning, the same optimisation constraint used to create a sparse-code model can be used to implement sparse autoencoders, which are regular autoencoders trained with a sparsity constraint. Below, more detailed explanations for each of your questions are given.

Sparse coding is defined as learning an over-complete set of basis vectors to represent input vectors (<-- why do we want this)

First, at least since (Hubel & Wiesel, 1968) it has been known that in the V1 region there are specific cells which respond maximally to edge-like stimuli (besides having other "useful" properties). Sparse coding is a model which explains many of the observed characteristics of this system well. See (Olshausen & Field, 1996) for more details.

Second, it has been shown that the model which describes sparse coding is a useful technique for feature extraction in machine learning and yields good results in transfer-learning tasks. Raina et al. (2007) showed that a set of "basis vectors" (features, such as pen strokes and edges) learned on a training set of hand-written characters improves classification in a hand-written digit recognition task. Later, sparse-coding-based models were used to train "deep" networks, stacking layers of sparse feature detectors to create a "sparse deep belief net" (Lee et al., 2007). More recently, astonishing results in image recognition were achieved using sparse-coding-based models to construct a network with several layers (the famous "Google Brain"), which was capable of recognising an image of a cat in a purely unsupervised manner (Le et al., 2013).

Third, it is probably possible to use the learned basis to perform compression. I haven't seen anyone really doing it, though.

What are the differences between sparse coding and autoencoder?

An autoencoder is a model which tries to reconstruct its input, usually using some sort of constraint. According to Wikipedia, it "is an artificial neural network used for learning efficient codings". There is nothing in the autoencoder's definition requiring sparsity. Sparse-coding-based constraints are one of the available techniques, but there are others, for example denoising autoencoders, contractive autoencoders and RBMs. All of them make the network learn good representations of the input (which are also commonly "sparse").
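As a tiny illustration of one of those alternatives (an assumption-laden sketch, not a full model): a denoising autoencoder simply corrupts the input before encoding and keeps the clean input as the reconstruction target.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)                   # clean input
    mask = rng.random(64) > 0.3                   # masking noise: drop roughly 30% of the entries
    x_noisy = x * mask                            # corrupted input fed to the encoder
    # the training target stays the clean x: loss = ||decode(encode(x_noisy)) - x||^2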

When will we use sparse coding and autoencoder?

You're probably interested in using an auto-encoder for feature extraction and/or pre-training of deep networks. If you implement an autoencoder with the sparsity constraint, you'll be using both.
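For concreteness, here is a rough numpy sketch of such a sparse autoencoder objective, under assumed names and sizes; it uses one common variant of the sparsity constraint, an L1 penalty on the hidden activations (a KL penalty on their mean activation is another popular choice).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sparse_autoencoder_loss(W_enc, W_dec, X, lam=0.1):
        H = sigmoid(W_enc @ X)                    # codes produced by the encoder
        R = W_dec @ H                             # reconstruction
        reconstruction = np.sum((R - X) ** 2)
        sparsity = lam * np.sum(np.abs(H))        # pushes most activations toward zero
        return reconstruction + sparsity

    rng = np.random.default_rng(0)
    X = rng.standard_normal((16, 100))
    W_enc = 0.1 * rng.standard_normal((32, 16))
    W_dec = 0.1 * rng.standard_normal((16, 32))
    print(sparse_autoencoder_loss(W_enc, W_dec, X))
    # in practice this loss would be minimised with (stochastic) gradient descent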


This answer has many interesting points and important references. However, the first paragraph is wrong. Sparse coding and sparse auto encoders are different beasts.
bayerj

Where is it stated that they are the same thing? Please, tell me and I'll correct the answer.
Saul Berardo

In the first sentence.
bayerj

"Sparse Coding is just one of the available techniques for training autoencoders". This sentence doesn't define "Sparse Coding" as the same "beast" as autoencoder. It says that, between all available techniques for training autoencoders, one of them is "Sparse Coding". I agree that the sentence has indeed some ambiguity, which I believe is clarified by the rest of the answer.
Saul Berardo

You say that sparse coding is a method to train auto encoders. That is clearly not the case, as auto encoders have an explicit decoder that is not implemented with an optimisation algorithm. Given an arbitrary auto encoder you cannot train it with sparse coding.
bayerj


A sparse coder is kind of like half an auto-encoder. An auto-encoder works like:

input  =>  neural net layer  =>  hidden outputs => neural net layer => output

For back-propagation, the error signal, the loss, is: input - output

If we apply a sparsity constraint on the hidden outputs, then most will be zeros and a few will be 1s. The second layer is then essentially a set of linear basis functions that are added together according to which of the hidden outputs are 1s.

In sparse coding, we only have the second half of this:

                                codes => neural net layer => output

The 'codes' are a bunch of real numbers selecting the basis functions represented by the weights in the neural net layer. Since in Olshausen's paper they apply a sparsity constraint to the codes, the codes are, just as in the sparse auto-encoder, sparse: mostly zeros with a few non-zero values.

Now we can see the difference clearly: in sparse coding there is no first half of the network, so the codes are not provided for us automatically by a neural net.

How do we get the codes in sparse coding? We have to optimise for them ourselves, using gradient descent or similar, to find the set of codes that best produces an output matching the input image. We have to do this for every image, including every test image, each time.
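A small sketch of that contrast (toy numpy only, all names and sizes assumed): the autoencoder produces codes with a single forward pass, whereas sparse coding runs a fresh optimisation for every image, training or test.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def codes_autoencoder(W, x):
        return sigmoid(W.T @ x)                   # one matrix multiply and we are done

    def codes_sparse_coding(W, x, lam=0.1, n_iter=300):
        # iterative optimisation of the sparse-coding objective for this one image
        L = 2.0 * np.linalg.norm(W, 2) ** 2
        h = np.zeros(W.shape[1])
        for _ in range(n_iter):
            h = soft_threshold(h - 2.0 * W.T @ (W @ h - x) / L, lam / L)
        return h

    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 64))
    for x in rng.standard_normal((5, 16)):        # every test image needs its own run
        h_fast = codes_autoencoder(W, x)
        h_slow = codes_sparse_coding(W, x)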



You might want to read this recent paper, https://arxiv.org/abs/1708.03735v2 on precisely this same topic. In this paper the authors show that indeed one can set up an autoencoder such that the ground truth dictionary is a critical point of that autoencoder's squared loss function.

Licensed under cc by-sa 3.0 with attribution required.