What is Bayes' theorem all about?

36

What are the main ideas, that is, the concepts, related to Bayes' theorem? I am not asking for any derivations or complex mathematical notation.

probability bayesian theory

— user333

4

Related: stats.stackexchange.com/questions/22/…

— Shane

3

I would also like to suggest this link as a kind of low-level explanation: yudkowsky.net/rational/bayes

— steffen

1

Bayes' theorem can be confusing without a visual representation, as is often the case in mathematics. Why not use probability squares or probability trees for Bayesian probabilities? When new data arrive, they rule out part of the sample space (for example, a positive test result for a disease rules out the negative-test branch). The sample space then shrinks to the subset of outcomes that could have tested positive, and one considers only that subset. The difficulty I have run into is applying Bayes' theorem to probability distributions instead of discrete probabilities. The math is awfully hard!

22

Bayes' theorem is a relatively simple but fundamental result of probability theory that allows certain conditional probabilities to be calculated. Conditional probabilities are exactly those probabilities that reflect the influence of one event on the probability of another.

Simply put, in its most famous form, it states that the probability of a hypothesis given new data ($P(H|D)$; called the posterior probability) equals the probability of the observed data given the hypothesis ($P(D|H)$; called the conditional probability), times the probability that the theory was true before the new evidence arrived ($P(H)$; called the prior probability of $H$), divided by the probability of seeing those data, period ($P(D)$; called the marginal probability of $D$).

Formally, the equation looks like this:

$P(H|D) = \frac{P(D|H)P(H)}{P(D)}$

The significance of Bayes' theorem is largely due to the fact that its proper use is a point of contention between schools of thought on probability. To a subjective Bayesian (who interprets probability as subjective degrees of belief), Bayes' theorem provides the cornerstone for theory testing, theory selection and other practices: they plug their subjective probability judgments into the equation and run with it. To a frequentist (who interprets probability as limiting relative frequencies), this use of Bayes' theorem is an abuse, and frequentists strive instead to use meaningful (non-subjective) priors (as do objective Bayesians, under yet another interpretation of probability).

— John L. Taylor

1

Good answer. I have a minor quibble: the words "subjective" and "objective" are not quite appropriate, because no method is "objective". I would say that frequentists and "objective" Bayesians simply derive probability distributions using certain rules or standards. So rather than tailoring to the specific case at hand, the frequentist/objective Bayesian applies a "default" choice (thereby hiding their subjectivity).

— probabilityislogic

If you are measuring something real (say, the heights of 6-year-old children), then what is $P(D)$? Is it the PDF of the data? In that case, do you just compute the posterior like this:

$P(x|H|D) = \frac{P(x|D|H)P(x|H)}{P(x|D)}$?

— naught101

13

Sorry, but there seems to be some confusion here: Bayes' theorem is not up for discussion in the never-ending Bayesian vs. frequentist debate. It is a theorem that is consistent with both schools of thought (given that it is consistent with Kolmogorov's probability axioms).

Of course, Bayes' theorem is the core of Bayesian statistics, but the theorem itself is universal. The clash between frequentists and Bayesians mostly concerns how prior distributions can be defined, or whether they can be defined at all.

So, if the question is about Bayes' theorem (and not Bayesian statistics):

Bayes' theorem defines how one can calculate specific conditional probabilities. Imagine, for instance, that you know: the probability of somebody having symptom A given that they have disease X, $p(A|X)$; the probability of somebody in general having disease X, $p(X)$; and the probability of somebody in general having symptom A, $p(A)$. With these 3 pieces of information you can calculate the probability of somebody having disease X given that they have symptom A, $p(X|A)$.
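To make the three-ingredient recipe above concrete, here is a minimal Python sketch. The function name `bayes` and all three input numbers are hypothetical, chosen only for illustration; they do not come from any real disease.

```python
def bayes(p_a_given_x: float, p_x: float, p_a: float) -> float:
    """Return p(X|A) = p(A|X) * p(X) / p(A)."""
    if p_a <= 0:
        raise ValueError("p(A) must be positive")
    return p_a_given_x * p_x / p_a


# Hypothetical inputs (assumptions for illustration only).
p_a_given_x = 0.9   # p(A|X): symptom A is common among people with disease X
p_x = 0.01          # p(X): disease X is rare in general
p_a = 0.1           # p(A): symptom A is fairly common in general

p_x_given_a = bayes(p_a_given_x, p_x, p_a)
print(f"p(X|A) = {p_x_given_a:.3f}")
```

With these made-up numbers the symptom raises the probability of the disease from 1% to roughly 9%, which is still far from certain because the symptom is much more common than the disease.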

— Dave Kellen

1

I disagree in part with your initial paragraph because the question asks about the concept of Bayes' theorem. The frequentist-Bayesian debate is relevant to this part of the question. The Kolmogorov axioms do not give Bayes' theorem the same conceptual importance as the "probability as extended logic" axioms do.

— probabilityislogic

8

Bayes' theorem is a way to rotate a conditional probability $P(A|B)$ to another conditional probability $P(B|A)$ .

A stumbling block for some is the meaning of $P(B|A)$ . This is a way to reduce the space of possible events by considering only those events where $A$ definitely happens (or is true). So for instance the probability that a thrown, fair, dice lands showing six, $P(\mbox{dice lands six})$ , is 1/6, however the probability that a dice lands six given that it landed an even number, $P(\mbox{dice lands six}|\mbox{dice lands even})$ , is 1/3.
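The "reduce the space of possible events" idea can be sketched in a few lines of Python by listing the outcomes and throwing away those that contradict the condition. The helper name `prob` is my own, not anything standard.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of one fair six-sided die


def prob(event, given=lambda o: True):
    """P(event | given), found by counting equally likely outcomes."""
    space = [o for o in outcomes if given(o)]      # reduced sample space
    favourable = [o for o in space if event(o)]    # outcomes where event holds
    return Fraction(len(favourable), len(space))


p_six = prob(lambda o: o == 6)
p_six_given_even = prob(lambda o: o == 6, given=lambda o: o % 2 == 0)
print(p_six, p_six_given_even)  # 1/6 1/3
```

Conditioning on "even" shrinks the space from six outcomes to three, which is why the answer moves from 1/6 to 1/3.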

You can derive Bayes' theorem yourself as follows. Start with the ratio definition of a conditional probability:

$P(B|A) = \frac{P(AB)}{P(A)}$

where $P(AB)$ is the joint probability of $A$ and $B$ and $P(A)$ is the marginal probability of $A$ .

Currently the formula makes no reference to $P(A|B)$ , so let's write down the definition of this too:

$P(A|B) = \frac{P(BA)}{P(B)}$

The little trick for making this work is seeing that $P(AB) = P(BA)$ (since a Boolean algebra is underneath all of this, you can easily prove this with a truth table by showing $AB = BA$ ), so we can write:

$P(A|B) = \frac{P(AB)}{P(B)}$

Now to slot this into the formula for $P(B|A)$ , just rewrite the formula above so $P(AB)$ is on the left:

$P(AB) = P(A|B)P(B)$

and hey presto:

$P(B|A) = \frac{P(A|B)P(B)}{P(A)}$
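If you want to convince yourself that the derivation holds, you can check it on a small sample space. The sketch below uses one roll of a fair die, with $A$ = "lands even" and $B$ = "lands above three"; these two events are my own choice for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                # one roll of a fair six-sided die
A = {o for o in outcomes if o % 2 == 0}    # "lands even"
B = {o for o in outcomes if o > 3}         # "lands above three"


def p(event):
    """Probability of an event, by counting equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


def p_given(event, condition):
    """Ratio definition: P(event | condition) = P(event and condition) / P(condition)."""
    return p(event & condition) / p(condition)


lhs = p_given(B, A)                  # P(B|A) straight from the definition
rhs = p_given(A, B) * p(B) / p(A)    # P(A|B) P(B) / P(A), i.e. Bayes' theorem
print(lhs, rhs)  # 2/3 2/3
```

Both sides agree exactly because the set intersection `event & condition` is symmetric, which is the $P(AB) = P(BA)$ step in the derivation.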

As for the point of rotating a conditional probability in this way, consider the common example of trying to infer the probability that someone has a disease given that they have a symptom, i.e., we know that they have a symptom (we can just see it) but we cannot be certain whether they have a disease and have to infer it. I'll start with the formula and work back.

$P(\mbox{disease}|\mbox{symptom}) = \frac{P(\mbox{symptom}|\mbox{disease})P(\mbox{disease})}{P(\mbox{symptom})}$

So to work it out, you need to know the prior probability of the symptom, the prior probability of the disease (i.e., how common or rare are the symptom and disease) and also the probability that someone has a symptom given we know someone has a disease (e.g., via expensive time consuming lab tests).

It can get a lot more complicated than this, e.g., if you have multiple diseases and symptoms, but the idea is the same. Even more generally, Bayes' theorem often makes an appearance if you have a probability theory of relationships between causes (e.g., diseases) and effects (e.g., symptoms) and you need to reason backwards (e.g., you see some symptoms from which you want to infer the underlying disease).

— AndyF

5

There are two main schools of thought in statistics: frequentist and Bayesian.

Bayes theorem is to do with the latter and can be seen as a way of understanding how the probability that a theory is true is affected by a new piece of evidence. This is known as conditional probability. You might want to look at this to get a handle on the math.

— Tony Breyal

4

Let me give you a very intuitive insight. Suppose you toss a coin 10 times and get 8 heads and 2 tails. The question that would come to your mind is whether this coin is biased towards heads or not.

Now if you go by conventional definitions or the frequentist approach to probability, you might say that the coin is unbiased and this is an exceptional occurrence. Hence you would conclude that the probability of getting heads on the next toss is also 50%.

But suppose you are a Bayesian. You would actually think that since you have got an exceptionally high number of heads, the coin has a bias towards heads. There are methods to calculate this possible bias. You would calculate it, and then when you toss the coin next time, you would definitely call heads.
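One such method, sketched here under an assumption the answer itself does not make, is to start from a uniform Beta(1, 1) prior on the coin's heads probability. The posterior after 8 heads and 2 tails is then Beta(9, 3), and its mean is the probability you would assign to heads on the next toss.

```python
from fractions import Fraction

heads, tails = 8, 2

# Assumption: uniform Beta(1, 1) prior over the heads probability.
prior_a, prior_b = 1, 1

# Conjugate update: add observed heads and tails to the prior counts.
post_a = prior_a + heads
post_b = prior_b + tails

# Posterior mean = predictive probability of heads on the next toss.
p_next_heads = Fraction(post_a, post_a + post_b)
print(p_next_heads)  # 3/4
```

A stronger prior belief in fairness (say Beta(50, 50)) would pull the estimate much closer to 1/2, so the conclusion depends on the prior as well as on the data.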

So, Bayesian probability is about the belief that you develop based on the data you observe. I hope that was simple enough.

— htrahdis

Of course, there is more data in a coin toss than just the result. A sensible Bayesian will still probably bet even, because of the weight of past data, and because the coin and the coin flip look fair. Unless, perhaps, you can't see the coin, or the coin being flipped. In which case you don't even know if the data isn't just forged to start with, and you may as well toss your priors out the window...

— naught101

3

Bayes' theorem relates two ideas: probability and likelihood. Probability says: given this model, these are the outcomes. So: given a fair coin, I'll get heads 50% of the time. Likelihood says: given these outcomes, this is what we can say about the model. So: if you toss a coin 100 times and get 88 heads (to pick up on a previous example and make it more extreme), then the likelihood that the fair coin model is correct is not so high.
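To put a rough number on "not so high", here is a small Python sketch that evaluates the binomial likelihood of 88 heads in 100 tosses under the fair-coin model and, for comparison, under a model with heads probability 0.88. The comparison model and the function name are my own additions.

```python
from math import comb


def binomial_likelihood(p: float, heads: int, tosses: int) -> float:
    """Probability of exactly `heads` heads in `tosses` tosses when P(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


heads, tosses = 88, 100
lik_fair = binomial_likelihood(0.5, heads, tosses)     # fair-coin model
lik_biased = binomial_likelihood(0.88, heads, tosses)  # comparison model (assumption)

print(f"fair coin: {lik_fair:.3e}")
print(f"p = 0.88:  {lik_biased:.3e}")
print(f"ratio:     {lik_biased / lik_fair:.3e}")
```

The likelihood under the fair coin is many orders of magnitude smaller than under the biased model, which is what "the fair coin model is not so likely" means here.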

One of the standard examples used to illustrate Bayes' theorem is the idea of testing for a disease: if you take a test that's 95% accurate for a disease that 1 in 10000 of the population have, and you test positive, what are the chances that you have the disease?

The naive answer is 95%, but this ignores the issue that 5% of the tests on 9999 out of 10000 people will give a false positive. So your odds of having the disease are far lower than 95%.
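Here is the calculation spelled out as a Python sketch. It assumes "95% accurate" means both that 95% of people with the disease test positive and that 95% of people without it test negative; the example above does not say which, so treat that reading as an assumption.

```python
prevalence = 1 / 10_000        # P(disease)
sensitivity = 0.95             # P(positive | disease), assumed
false_positive_rate = 0.05     # P(positive | no disease), assumed

# Total probability of testing positive (true positives + false positives).
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: P(disease | positive).
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.4%}")
```

Under these assumptions the answer comes out at roughly 0.19%, i.e. about 1 in 500, because false positives from the healthy majority swamp the true positives.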

My use of the vague phrase "what are the chances" is deliberate. To use the probability/likelihood language: the probability that the test is accurate is 95%, but what you want to know is the likelihood that you have the disease.

Slightly off topic: The other classic example which Bayes theorem is used to solve in all the textbooks is the Monty Hall problem: You're on a quiz show. There is a prize behind one of three doors. You choose door one. The host opens door three to reveal no prize. Should you change to door two given the chance?

I like the rewording of the question (courtesy of the reference below): you're on a quiz show. There is a prize behind one of a million doors. You choose door one. The host opens all the other doors except door 104632 to reveal no prize. Should you change to door 104632?
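For the three-door version, the Bayes' theorem calculation can be sketched as follows. It relies on the usual assumptions, which the puzzle statement leaves implicit: the host knows where the prize is, never opens your door or the prize door, and picks at random when two doors are available to open.

```python
from fractions import Fraction

doors = [1, 2, 3]

# Prior: the prize is equally likely to be behind each door.
prior = {d: Fraction(1, 3) for d in doors}

# Likelihood: P(host opens door 3 | prize behind d), given you chose door 1.
likelihood = {
    1: Fraction(1, 2),  # host may open door 2 or 3, picks at random
    2: Fraction(1),     # host is forced to open door 3
    3: Fraction(0),     # host never reveals the prize
}

evidence = sum(prior[d] * likelihood[d] for d in doors)
posterior = {d: prior[d] * likelihood[d] / evidence for d in doors}
print(posterior[1], posterior[2])  # 1/3 2/3
```

Switching to door two wins with probability 2/3 under these assumptions; the million-door rewording makes the same asymmetry much easier to feel.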

My favourite book which discusses Bayes' theorem, very much from the Bayesian perspective, is "Information Theory, Inference and Learning Algorithms", by David J. C. MacKay. It's a Cambridge University Press book, ISBN-13: 9780521642989. My answer is (I hope) a distillation of the kind of discussions made in the book. (Usual rules apply: I have no affiliations with the author, I just like the book.)

3

Bayes theorem in its most obvious form is simply a re-statement of two things:

the joint probability is symmetric in its arguments $P(HD|I)=P(DH|I)$
the product rule $P(HD|I)=P(H|I)P(D|HI)$

So by using the symmetry:

$P(HD|I)=P(H|I)P(D|HI)=P(D|I)P(H|DI)$

Now if $P(D|I) \neq 0$ you can divide both sides by $P(D|I)$ to get:

$P(H|DI)=P(H|I)\frac{P(D|HI)}{P(D|I)}$

So this is it? How can something so simple be so awesome? As with most things, "it's the journey that's more important than the destination". Bayes' theorem rocks because of the arguments that lead to it.

What is missing from this is that the product rule and sum rule $P(H|I)=1-P(\overline{H}|I)$ , can be derived using deductive logic based on axioms of consistent reasoning.

Now the "rule" in deductive logic is that if you have a relationship "A implies B" then you also have "Not B implies Not A". So we have "consistent reasoning implies Bayes theorem". This means "Not Bayes theorem implies Not consistent reasoning". i.e. if your result isn't equivalent to a Bayesian result for some prior and likelihood then you are reasoning inconsistently.

This result is called Cox's theorem and was proved in "The Algebra of Probable Inference" in the 1940's. A more recent derivation is given in "Probability Theory: The Logic of Science".

— probabilityislogic

2

I really like Kevin Murphy's intro to Bayes' theorem: http://www.cs.ubc.ca/~murphyk/Bayes/bayesrule.html

The quote here is from an Economist article:

http://www.cs.ubc.ca/~murphyk/Bayes/economist.html

The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise. The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child's degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.
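The marble bookkeeping in the quote is easy to reproduce. This short Python sketch just follows the story as told: one white and one black marble to start, plus one white marble per observed sunrise. The number of days simulated is an arbitrary choice of mine.

```python
from fractions import Fraction

white, black = 1, 1  # equal prior belief: one marble for each outcome
beliefs = [Fraction(white, white + black)]

for _ in range(3):   # three observed sunrises (arbitrary number of days)
    white += 1       # each sunrise adds a white marble
    beliefs.append(Fraction(white, white + black))

print(", ".join(str(b) for b in beliefs))  # 1/2, 2/3, 3/4, 4/5
```

After n sunrises the belief is (n + 1) / (n + 2), which creeps towards 1 without ever reaching it.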

— kgarten