LDA vs. word2vec

39

I am trying to understand the similarities between latent Dirichlet allocation (LDA) and word2vec for computing word similarity.

As I understand it, LDA maps words to a vector of probabilities over latent topics, while word2vec maps them to a vector of real numbers (related to the singular value decomposition of pointwise mutual information; see O. Levy, Y. Goldberg, "Neural Word Embedding as Implicit Matrix Factorization"; see also How does word2vec work?).
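To make the PMI connection concrete, here is a minimal sketch of the quantity involved: pointwise mutual information computed from co-occurrence counts in a symmetric window. Levy and Goldberg show that skip-gram with negative sampling implicitly factorizes a matrix of these values shifted by a constant. The toy corpus, the `pmi_table` name and the window size are all made up for illustration; this is not how word2vec itself is trained.

```python
import math
from collections import Counter

def pmi_table(sentences, window=2):
    """Return {(word, context): PMI} from symmetric-window co-occurrence counts."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1

    total = sum(pair_counts.values())
    word_counts, ctx_counts = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        ctx_counts[c] += n

    # PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), estimated from the counts
    return {
        (w, c): math.log(n * total / (word_counts[w] * ctx_counts[c]))
        for (w, c), n in pair_counts.items()
    }

# Hypothetical toy corpus (assumption, for illustration only)
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "stocks fell on the market".split(),
]
pmi = pmi_table(corpus)
print(pmi[("cat", "sat")])
```

Pairs that never co-occur inside the window, such as ("cat", "stocks"), simply have no entry; in a full matrix their PMI would be minus infinity, which is one reason practical variants clip negative values to zero.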

I am interested in both the theoretical relationship (can one be considered a generalization or variation of the other?) and the practical one (when to use one rather than the other).

Related:

What are some standard ways of computing the distance between documents? - DataScience.SE

— Piotr Migdal

I found this presentation to be on point: slideshare.net/ChristopherMoody3/…

— Piotr Migdal

You should look at doc2vec (a.k.a. paragraph2vec). Document vectors summarize the document instead of individual words.

— sachinruk

19

An answer to Topic models and word co-occurrence methods covers the difference (skip-gram word2vec is a compression of pointwise mutual information (PMI)).

So:

neither method is a generalization of the other,
word2vec allows us to use vector geometry (for example word analogies such as $v_{king} - v_{man} + v_{woman} \approx v_{queen}$; I wrote an overview of word2vec),
LDA captures higher-order correlations than pairwise (two-element) ones,
LDA gives interpretable topics.
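As a rough illustration of the vector-geometry point, the analogy can be sketched with hand-made vectors. The 3-d embeddings below are hypothetical (real word2vec vectors are learned and typically have a hundred or more dimensions), and the `analogy` helper is just a name chosen for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; axes loosely mean (royalty, maleness, femaleness)
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def analogy(a, b, c, vectors):
    """Return the word closest to v_a - v_b + v_c, excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman", vectors))  # prints "queen"
```

Nothing comparable is available for LDA's topic-probability vectors, since they live on a simplex where adding and subtracting vectors has no natural meaning.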

Some of the differences are discussed in the slides word2vec, LDA, and introducing a new hybrid algorithm: lda2vec - Christopher Moody.

— Piotr Migdal

1

I'd hedge the statement "LDA gives interpretable topics" to say that LDA's topics are potentially interpretable. LDA's idea of a "topic" is a purely mathematical construct that doesn't always map to what a human would think of as a topic.

— Wayne

A key concept you've left out is that LDA uses a bag-of-words approach, so it only knows about co-occurrences within a document, while word2vec (or, more comparably, doc2vec) takes a word's context into account.

— Wayne

13

The two algorithms differ somewhat in their purpose.

LDA is aimed mostly at describing documents and document collections by assigning topic distributions to them, and the topics in turn have word distributions assigned, as you mention.

word2vec looks to embed words in a latent-factor vector space, an idea originating from the distributed representations of Bengio et al. It can also be used to describe documents, but it is not designed for that task.

— Bar

1

Theoretically, you could get something analogous to word2vec's vector embeddings by computing P(topic | word) from LDA, but as @Bar said, these models were designed for different tasks. If you compared LDA's P(topic | word) distributions with word2vec's vector embeddings, I doubt they would be very similar. LDA captures document-level associations, while word2vec captures very local ones.
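For what it's worth, P(topic | word) follows from the fitted topic-word distributions by Bayes' rule: P(topic | word) is proportional to P(word | topic) P(topic). A minimal sketch, assuming a hypothetical two-topic model whose numbers are invented for illustration (`phi`, `topic_prior` and `topic_given_word` are names chosen here, not any library's API):

```python
def topic_given_word(word, phi, topic_prior):
    """Bayes' rule: P(topic | word) is proportional to P(word | topic) * P(topic)."""
    joint = [phi[k].get(word, 0.0) * topic_prior[k] for k in range(len(phi))]
    norm = sum(joint)
    if norm == 0.0:
        raise ValueError(f"word {word!r} has zero probability under every topic")
    return [j / norm for j in joint]

# Hypothetical fitted model: phi[k][w] = P(w | topic k)
phi = [
    {"goal": 0.4, "team": 0.4, "bank": 0.1, "loan": 0.1},  # sports-like topic
    {"goal": 0.1, "team": 0.1, "bank": 0.4, "loan": 0.4},  # finance-like topic
]
topic_prior = [0.5, 0.5]  # P(topic), assumed uniform here

for w in ("goal", "team", "bank"):
    print(w, topic_given_word(w, phi, topic_prior))
```

In this toy model "goal" and "team" get identical vectors because they appear in the same documents with the same weights, regardless of how they are used within a sentence. That is the document-level association mentioned above.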

— Zubin

4

There is a relation between LDA and Topic2Vec, a model used for learning distributed topic representations together with word representations. LDA is used to construct a log-likelihood for CBOW and Skip-gram. The following explanation is from Section 3 of the work Topic2Vec: Learning Distributed Representations of Topics:

"When training, given a word-topic sequence of a document $D=\{w_1 : z_1, ...,w_M : z_M \}$, where $z_i$ is the word $w_i$'s topic inferred from LDA, the learning objective functions can be defined to maximize the following log-likelihoods, based on CBOW and Skip-gram, respectively."

$\mathcal{L}_{\mathrm{CBOW}}(D) = \frac{1}{M} \sum_{i=1}^{M} \left( \log p(w_i | w_{ext}) + \log p(z_i | w_{ext}) \right)$

$\mathcal{L}_{\text{Skip-gram}}(D) = \frac{1}{M} \sum_{i=1}^{M} \sum_{-k \le c \le k,\, c \neq 0} \left( \log p(w_{i+c} | w_i) + \log p(w_{i+c} | z_i) \right)$

In section 4.2, the authors explain: "topics and words are equally represented as the low-dimensional vectors, we can immediately calculate the cosine similarity between words and topics. For each topic, we select higher similarity words".

Moreover, you will find inside that work some phrases like:

"probability is not the best choice for feature representation"

and

"LDA prefers to describe the statistical relationship of occurrences rather than real semantic information embedded in words, topics and documents"

which will help you better understand the different models.

— Ricardo S.

2

Other answers here cover the technical differences between the two algorithms; however, I think the core difference is their purpose: they were designed to do different things.

word2vec ultimately yields a mapping from words to fixed-length vectors. If we were to compare it with another well-known approach, it would make more sense to pick a tool designed for the same purpose, such as the bag-of-words (BOW) model. BOW does the same job but lacks some desirable features of word2vec, such as using the order of words and assigning semantic meaning to the distances between word representations.

LDA, on the other hand, creates a mapping from a variable-length document to a vector. This document can be a sentence, a paragraph or a full text file, but it is not a single word. It would make more sense to compare it with doc2vec, which does the same job and is introduced by Tomas Mikolov here (the author uses the term paragraph vectors). Or with LSI, for that matter.

So to directly answer your two questions:

Neither of them is a generalization or variation of the other.
Use LDA to map a document to a fixed-length vector. You can then use this vector in a traditional ML algorithm, for example a classifier that accepts a document and predicts a sentiment label.
Use word2vec to map a word to a fixed-length vector. You can similarly use these vectors to feed ML models where the inputs are words, for example when developing an auto-completer that feeds on previous words and attempts to predict the next one.

— pilu

1

From a practical standpoint...

LDA starts with a bag-of-words input which considers what words co-occur in documents, but does not pay attention to the immediate context of words. This means the words can appear anywhere in the document and in any order, which strips out a certain level of information. By contrast word2vec is all about the context in which a word is used -- though perhaps not exact order.
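A small sketch of that difference, using a made-up phrase: shuffling the words leaves the bag-of-words counts (what LDA sees) unchanged, while the set of word/neighbour pairs inside a context window (roughly what word2vec trains on) changes. The helper names and the window size are assumptions for illustration.

```python
from collections import Counter

def bag_of_words(tokens):
    """Word counts only; positions are discarded."""
    return Counter(tokens)

def context_pairs(tokens, window=1):
    """Set of (word, neighbour) pairs within `window` positions of each other."""
    pairs = set()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.add((word, tokens[j]))
    return pairs

original = "new york stock exchange".split()
shuffled = "stock new exchange york".split()
reversed_ = list(reversed(original))

print(bag_of_words(original) == bag_of_words(shuffled))    # True: same bag
print(context_pairs(original) == context_pairs(shuffled))  # False: contexts differ
print(context_pairs(original) == context_pairs(reversed_)) # True: a symmetric window ignores direction
```

The last line is the "perhaps not exact order" caveat: with a symmetric window, the reversed phrase produces exactly the same pairs.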

LDA's "topics" are a mathematical construct and you shouldn't confuse them with actual human topics. You can end up with topics that have no human interpretation -- they're more like artifacts of the process than actual topics -- and you can end up with topics at different levels of abstraction, including topics that basically cover the same human topic. It's a bit like reading tea leaves.

I've found LDA useful to explore data, but not so useful for providing a solution, but your mileage may vary.

Word2vec doesn't create topics directly at all. It projects words into a high-dimensional space based on similar usage, so it can have its own surprises: words that you think of as distinct -- or even opposite -- may be near each other in space.

You can use either to determine if words are "similar". With LDA: do the words have similar weights in the same topics? With word2vec: are they close (by some measure) in the embedding space?

You can use either to determine if documents are similar. With LDA, you would look for a similar mixture of topics, and with word2vec you would do something like adding up the vectors of the words of the document. ("Document" could be a sentence, paragraph, page, or an entire document.) Doc2vec is a modified version of word2vec that allows the direct comparison of documents.
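Both routes can be sketched in a few lines. For LDA, the example compares topic mixtures with the Hellinger distance (one common choice for probability vectors, picked here as an assumption); for word2vec, it averages word vectors and compares the averages by cosine similarity. All numbers, document names and embeddings are hypothetical.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_vector(tokens, vectors):
    """Average the embeddings of the tokens that have one."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has an embedding")
    return [sum(v[d] for v in known) / len(known) for d in range(len(known[0]))]

# LDA route: hypothetical per-document topic mixtures (sports, finance)
theta = {
    "match report":  [0.9, 0.1],
    "transfer news": [0.8, 0.2],
    "earnings call": [0.1, 0.9],
}
print(hellinger(theta["match report"], theta["transfer news"]))  # small distance
print(hellinger(theta["match report"], theta["earnings call"]))  # large distance

# word2vec route: hypothetical 2-d word embeddings, averaged per document
vectors = {
    "goal": [1.0, 0.1], "striker": [0.9, 0.2], "keeper": [0.8, 0.1],
    "profit": [0.1, 1.0], "revenue": [0.2, 0.9],
}
doc_a = mean_vector(["goal", "striker"], vectors)
doc_b = mean_vector(["keeper", "goal"], vectors)
doc_c = mean_vector(["profit", "revenue"], vectors)
print(cosine(doc_a, doc_b))  # close to 1
print(cosine(doc_a, doc_c))  # much lower
```

Averaging is the crudest way to get a document vector out of word2vec; doc2vec learns the document vector directly instead.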

While LDA throws away some contextual information with its bag-of-words approach, it does have topics (or "topics"), which word2vec doesn't have. So it's straightforward to use doc2vec to say, "Show me documents that are similar to this one", while with LDA it's straightforward to say, "Show me documents where topic A is prominent." (Again, knowing that "topic A" emerges from a mathematical process on your documents and you then figure out what human topic(s) it mostly corresponds to.)
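The "topic A is prominent" query is just a filter over the per-document topic mixtures. A minimal sketch, with a hypothetical `doc_topics` table and an arbitrary 0.5 threshold:

```python
def docs_with_prominent_topic(doc_topics, topic, threshold=0.5):
    """Return (document, weight) pairs whose weight for `topic` meets the threshold, highest first."""
    hits = [(name, mix[topic]) for name, mix in doc_topics.items() if mix[topic] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical per-document topic mixtures; index 0 is "topic A"
doc_topics = {
    "transfer news": [0.8, 0.2],
    "earnings call": [0.1, 0.9],
    "match report":  [0.9, 0.1],
}
print(docs_with_prominent_topic(doc_topics, topic=0))
```

Which human topic index 0 corresponds to still has to be worked out by inspecting its top words.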

— Wayne