We can find the differences by looking at the models themselves. Let's look at sparse coding first.
Sparse coding
Sparse coding minimizes the objective

$$L_{sc} = \underbrace{||WH - X||_2^2}_{\text{reconstruction term}} + \lambda \underbrace{||H||_1}_{\text{sparsity term}},$$

where $W$ is a matrix of bases, $H$ is a matrix of codes, and $X$ is a matrix of the data we wish to represent. $\lambda$ implements a trade-off between sparsity and reconstruction. Note that if we are given $H$, estimating $W$ is easy via least squares.
In the beginning, however, we do not have $H$. Yet, many algorithms exist that can solve the objective above with respect to $H$. In fact, this is how we do inference: we need to solve an optimisation problem if we want to know the $h$ belonging to an unseen $x$.
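To make the "inference as optimisation" point concrete, here is a minimal sketch of sparse-coding inference via ISTA (iterative shrinkage-thresholding). ISTA is just one of many possible algorithms, and the matrices, $\lambda$, and step count below are illustrative placeholders rather than anything taken from the text above.

```python
import numpy as np

def soft_threshold(Z, t):
    """Element-wise proximal operator of t * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def infer_codes(W, X, lam=0.1, n_steps=200):
    """Minimise ||W H - X||_2^2 + lam * ||H||_1 with respect to H (ISTA)."""
    H = np.zeros((W.shape[1], X.shape[1]))
    step = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)   # 1 / Lipschitz constant of the quadratic part
    for _ in range(n_steps):
        grad = 2.0 * W.T @ (W @ H - X)               # gradient of the reconstruction term
        H = soft_threshold(H - step * grad, step * lam)
    return H

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 50))       # 50 basis vectors in 20 dimensions (placeholder)
X = rng.normal(size=(20, 5))        # 5 data points to encode (placeholder)
H = infer_codes(W, X)
print(H.shape, (H != 0).mean())     # code matrix and its fraction of non-zero entries
```

Note that the codes for new data are only obtained after running this inner loop; there is no fixed mapping from $x$ to $h$.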
Auto encoders
In their most abstract form, auto encoders minimize a loss of the form

$$D(d(e(x; \theta_r); \theta_d), x),$$

where $e$ is an encoder, $d$ is a decoder, and $D$ is a distance measure between the reconstruction and the input,
but we will go along with a much simpler one for now:
$$L_{ae} = ||W\sigma(W^TX) - X||^2,$$
where $\sigma$ is a nonlinear function such as the logistic sigmoid $\sigma(x) = \frac{1}{1 + \exp(-x)}$.
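As a sanity check on the formula, here is a minimal sketch of this tied-weight auto encoder trained by plain gradient descent on $L_{ae}$. The data, the number of hidden units, and the step size are illustrative placeholders, and a real implementation would use an autodiff library rather than hand-written gradients.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def loss_and_grad(W, X):
    """L_ae = ||W sigmoid(W^T X) - X||^2 and its gradient with respect to W."""
    H = sigmoid(W.T @ X)                 # codes: a single forward pass through the model
    R = W @ H - X                        # reconstruction error
    loss = np.sum(R ** 2)
    dH = 2.0 * W.T @ R                                 # back-prop through the decoder W H ...
    dW = 2.0 * R @ H.T + X @ (dH * H * (1.0 - H)).T    # ... and the encoder sigmoid(W^T X)
    return loss, dW

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))           # 100 data points in 20 dimensions (placeholder)
W = 0.1 * rng.normal(size=(20, 50))      # 50 hidden units, tied weights
for _ in range(1000):
    loss, dW = loss_and_grad(W, X)
    W -= 1e-4 * dW                       # small fixed step size
print(loss)                              # the reconstruction error shrinks over training
```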
Similarities
Note that $L_{sc}$ looks almost like $L_{ae}$ once we set $H = \sigma(W^TX)$. The differences are that i) auto encoders do not encourage sparsity in their general form, and ii) an auto encoder uses a model for finding the codes, while sparse coding does so by means of optimisation.
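To illustrate point ii), here is how the two approaches produce a code for a new data point, reusing the hypothetical `sigmoid` and `infer_codes` helpers from the sketches above (this snippet assumes those definitions are in scope):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(20, 50))            # a placeholder learned basis / weight matrix
x = rng.normal(size=(20, 1))             # one unseen data point

h_ae = sigmoid(W.T @ x)                  # auto encoder: the model gives the code directly
h_sc = infer_codes(W, x, lam=0.1)        # sparse coding: solve an optimisation problem
```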
For natural image data, regularized auto encoders and sparse coding tend to yield very similar $W$. However, auto encoders are much more efficient and are easily generalized to much more complicated models; for example, the decoder can be highly nonlinear, e.g. a deep neural network. Furthermore, one is not tied to the squared loss (on which the estimation of $W$ for $L_{sc}$ depends).
Also, the different methods of regularisation yield representations with different characteristics; for example, denoising auto encoders have been shown to be equivalent to a certain form of RBM.
But why?
If you want to solve a prediction problem, you will not need auto encoders unless you have only little labeled data and a lot of unlabeled data. Then you will generally be better off training a deep auto encoder and putting a linear SVM on top, rather than training a deep neural net.
However, auto encoders are very powerful models for capturing characteristics of distributions. This is vague, but research turning it into hard statistical facts is currently being conducted. Deep latent Gaussian models, a.k.a. variational auto encoders, and generative stochastic networks are quite interesting ways of obtaining auto encoders which provably estimate the underlying data distribution.