How to prove that a grammar is unambiguous?

25

My problem is: how can I prove that a grammar is unambiguous? I have the following grammar:

$S → statement ∣ \mbox{if } expression \mbox{ then } S ∣ \mbox{if } expression \mbox{ then } S \mbox{ else } S$

and to turn it into an unambiguous grammar, I think this is correct:

$S → S_1 ∣ S_2$
$S_1 → \mbox{if } expression \mbox{ then } S ∣ \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_1$
$S_2 → \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_2 ∣ statement$

I know that an unambiguous grammar has exactly one parse tree for every word it generates.

— user1594

20

There is (at least) one way to prove unambiguity of a grammar $G = (N,T,\delta,S)$ for language $L$ . It consists of two steps:

1. Prove $L \subseteq \mathcal{L}(G)$.
2. Prove $[z^n]S_G(z) = |L_n|$.

The first step is pretty clear: show that the grammar generates (at least) the words you want, that is correctness.

The second step shows that $G$ has as many syntax trees for words of length $n$ as $L$ has words of length $n$ -- together with step 1, this implies unambiguity: every word of $L$ has at least one tree, so if the totals agree, no word can have two. It uses the structure function of $G$, which goes back to Chomsky and Schützenberger [1], namely

$\qquad \displaystyle S_G(z) = \sum_{n=0}^\infty t_nz^n$

with $t_n = [z^n]S_G(z)$ the number of syntax trees $G$ has for words of length $n$ . Of course you need to have $|L_n|$ for this to work.

The nice thing is that $S_G$ is (usually) easy to obtain for context-free languages, although finding a closed form for $t_n$ can be difficult. Transform $G$ into an equation system of functions with one variable per nonterminal:

$\qquad \displaystyle \left[ A(z) = \sum\limits_{(A, a_0 \dots a_k) \in \delta} \ \prod\limits_{i=0}^{k} \ \tau(a_i)\ : A \in N \right] \text{ with } \tau(a) = \begin{cases} a(z) &, a \in N \\ z &, a \in T \\ \end{cases}.$

This may look daunting but is really only a syntactical transformation as will become clear in the example. The idea is that generated terminal symbols are counted in the exponent of $z$ and because the system has the same form as $G$ , $z^n$ occurs as often in the sum as $n$ terminals can be generated by $G$ . Check Kuich [2] for details.

Solving this equation system (computer algebra!) yields $S(z) = S_G(z)$ ; now you "only" have to pull the coefficient (in closed, general form). The TCS Cheat Sheet and computer algebra can often do so.

Example

Consider the simple grammar $G$ with rules

$\qquad \displaystyle S \to aSa \mid bSb \mid \varepsilon$ .

It is clear that $\mathcal{L}(G) = \{ww^R \mid w \in \{a,b\}^*\}$ (step 1, proof by induction). There are $2^{\frac{n}{2}}$ palindromes of length $n$ if $n$ is even, $0$ otherwise.

Setting up the equation system yields

$\qquad \displaystyle S(z) = 2z^2S(z) + 1$

whose solution is

$\qquad \displaystyle S_G(z) = \frac{1}{1-2z^2}$ .
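
To read off the coefficients, one can expand the solution as a geometric series (a standard expansion, spelled out here as a sketch):

$\qquad \displaystyle \frac{1}{1-2z^2} = \sum_{k=0}^\infty (2z^2)^k = \sum_{k=0}^\infty 2^k z^{2k}$ ,

so $[z^n]S_G(z) = 2^{\frac{n}{2}}$ for even $n$ and $0$ for odd $n$.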

The coefficients of $S_G$ coincide with the numbers of palindromes, so $G$ is unambiguous.
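
The comparison can also be sanity-checked numerically for small $n$. The following is only a sketch, not part of the method above: it iterates the equation system on truncated power series until the coefficients stop changing, and counts the words of $L$ by brute force. The names `tree_counts`, `palindrome_counts` and `PALINDROMES`, and the dictionary encoding of the grammar, are made up for this illustration. Agreement up to a bound is of course no proof; it is just a quick way to catch a mismatch.

```python
from itertools import product


def tree_counts(grammar, start, max_len, max_iterations=1000):
    """Return [t_0, ..., t_max_len], the number of syntax trees per word length.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols); every symbol that is not a key of `grammar` is a terminal.
    """
    size = max_len + 1
    one = [1] + [0] * max_len      # the constant series 1 (empty product)
    z = [0] * size                 # the series z (one terminal symbol)
    if max_len >= 1:
        z[1] = 1

    def multiply(p, q):
        """Product of two power series, truncated after z^max_len."""
        r = [0] * size
        for i, p_i in enumerate(p):
            if p_i:
                for j in range(size - i):
                    r[i + j] += p_i * q[j]
        return r

    # Start from the zero series and apply the equation system repeatedly.
    series = {A: [0] * size for A in grammar}
    for _ in range(max_iterations):
        updated = {}
        for A, rules in grammar.items():
            total = [0] * size
            for rhs in rules:
                term = one
                for symbol in rhs:
                    factor = series[symbol] if symbol in grammar else z
                    term = multiply(term, factor)
                total = [x + y for x, y in zip(total, term)]
            updated[A] = total
        if updated == series:      # coefficients up to z^max_len are stable
            return series[start]
        series = updated
    raise ValueError("no fixed point reached; some word may have infinitely many trees")


def palindrome_counts(max_len):
    """Brute-force |L_n| for L = { w w^R : w in {a,b}* }."""
    return [
        sum(1 for w in product("ab", repeat=n) if n % 2 == 0 and w == w[::-1])
        for n in range(max_len + 1)
    ]


PALINDROMES = {"S": [("a", "S", "a"), ("b", "S", "b"), ()]}
print(tree_counts(PALINDROMES, "S", 6))   # [1, 0, 2, 0, 4, 0, 8]
print(palindrome_counts(6))               # [1, 0, 2, 0, 4, 0, 8]
```

For an ambiguous grammar such as $S \to SS \mid a$ the same function returns `[0, 1, 1, 2, 5]` for `max_len=4`, although there is only one word of each length $n \geq 1$.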

[1] The Algebraic Theory of Context-Free Languages by Chomsky, Schützenberger (1963)
[2] On the entropy of context-free languages by Kuich (1970)

— Raphael

3

As you know @Raphael, ambiguity is not decidable, so at least one of your steps cannot be mechanised. Any idea which ones? Getting a closed form for $t_n$?

— Martin Berger

2

The equation system may not be solvable algorithmically if the degree is too high, and pulling the exact coefficients out of the generating functions can be (too) hard. In "practice", though, one often deals with grammars of small "degree" -- note that, say, Chomsky normal form leads to equation systems of small degree -- and there are methods to get at least $\sim$-asymptotics for the coefficients; this may be sufficient to establish ambiguity. Note that in order to prove unambiguity, showing $S_L(z) = S_G(z)$ without pulling coefficients is enough; proving this identity may be hard, though.

— Raphael

Thank you @Raphael. Do you know of any texts that develop in detail how undecidability comes into play even if one uses e.g. Chomsky normal form? (I can't get hold of Kuich.)

— Martin Berger

@MartinBerger I just rediscovered your comment in my todo list; sorry for the long silence. There are three steps which (I think) are not computable in general: 1) Determine $S_G$. 2) Compute $|L_n|$. 3) Determine $[z^n]S_G(z)$. In particular, what representation of $L$ to use for 2)?

— Raphael

Why is representation of $L$ a problem? We can use any of the multiple ways of representing CFGs for compilers for example. Maybe you mean how to represent $L_n$?

— Martin Berger

6

This is a good question, but some Googling would have told you that there is no general method for deciding ambiguity, so you need to make your question more specific.

— reinierpost

2

The OP asks for proof techniques, not algorithms.

— Raphael

I think so, too; it might be mentioned in the question.

— reinierpost

1

Google is not an oracle of truth, because knowledge is not democratic, and Google results are. I wouldn't count on Google in this case, because people often copy from one another without checking the correctness of what they copy. Without showing a proof, they might be wrong.

— SasQ

5

@SasQ: You read my words too literally. What Google gives me is the URLs to articles that explain things.

— reinierpost

4

For some grammars, a proof by induction (over word length) is possible.

Consider for example a grammar $G$ over $\Sigma = \{a,b\}$ given by the following rules:

$\qquad \displaystyle S \to aSa \mid bSb \mid \varepsilon$

All words of length $\leq 1$ in $L(G)$ -- there's only $\varepsilon$ -- have only one left-derivation.

Assume that all words of length $< n$, for some $n > 0$, have only one left-derivation.

Now consider arbitrary $w = w_1 w' w_n \in L(G) \cap \Sigma^n$. Clearly, $w_1 \in \Sigma$. If $w_1 = a$, we know that the first rule in every left-derivation has to be $S \to aSa$; if $w_1 = b$, it has to be $S \to bSb$. This covers all cases. Since $w' \in L(G)$ has length $n - 2 < n$, the induction hypothesis tells us that there is exactly one left-derivation for $w'$. In combination, we conclude that there is exactly one left-derivation for $w$ as well.

This becomes harder if

there are multiple non-terminals,
the grammar is not linear, and/or
the grammar is left-recursive.

It may help to strengthen the claim to all sentential forms (if the grammar has no unproductive non-terminals) and "root" non-terminals.

I think the conversion to Greibach normal form maintains (un)ambiguity, so applying this step first may take care of left-recursion nicely.

The key is to identify one feature of every word that fixes (at least) one derivation step. The rest follows inductively.
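
Before attempting such an induction, it can help to check a few words mechanically. Below is a minimal sketch (an illustration only, not a proof technique) that counts the syntax trees of one given word; since syntax trees correspond one-to-one to left-derivations, a count of 1 is what the induction would have to establish for every word. The function `count_trees`, the token names `expr` and `stmt`, and the dictionary encoding are assumptions of this example. It also assumes the grammar has no left recursion and no cyclic unit or $\varepsilon$ derivations, otherwise the recursion does not terminate.

```python
from functools import lru_cache


def count_trees(grammar, start, tokens):
    """Count the syntax trees that `grammar` has for the token sequence.

    `grammar` maps nonterminals to lists of right-hand sides (tuples);
    symbols that are not keys of `grammar` are terminals.
    Assumption: no left recursion and no cyclic unit/epsilon derivations.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbols, i, j):
        """Number of ways the symbol sequence derives tokens[i:j]."""
        if not symbols:
            return 1 if i == j else 0
        head, rest = symbols[0], symbols[1:]
        if head not in grammar:                 # terminal: must match next token
            if i < j and tokens[i] == head:
                return derive(rest, i + 1, j)
            return 0
        total = 0
        for k in range(i, j + 1):               # head covers tokens[i:k]
            left = nonterminal(head, i, k)
            if left:
                total += left * derive(rest, k, j)
        return total

    @lru_cache(maxsize=None)
    def nonterminal(symbol, i, k):
        """Number of trees rooted at `symbol` with yield tokens[i:k]."""
        return sum(derive(rhs, i, k) for rhs in grammar[symbol])

    return nonterminal(start, 0, len(tokens))


# The two grammars from the question, with abbreviated token names.
DANGLING_ELSE = {
    "S": [("stmt",),
          ("if", "expr", "then", "S"),
          ("if", "expr", "then", "S", "else", "S")],
}
PROPOSED = {
    "S": [("S1",), ("S2",)],
    "S1": [("if", "expr", "then", "S"),
           ("if", "expr", "then", "S2", "else", "S1")],
    "S2": [("if", "expr", "then", "S2", "else", "S2"),
           ("stmt",)],
}

word = "if expr then if expr then stmt else stmt".split()
print(count_trees(DANGLING_ELSE, "S", word))   # 2
print(count_trees(PROPOSED, "S", word))        # 1
```

The first count shows that the original grammar is ambiguous (one witness suffices); the second only shows that this particular word is not a counterexample for the proposed grammar.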

— Raphael

3

Basically, it's a child generation problem. Start with the first expression, and generate its children. Keep doing it recursively (DFS), and after quite a few iterations, see if you can generate the same expanded expression from two different children. If you are able to do that, it's ambiguous. There is no way to determine the running time of this algorithm though. Assume it's safe, after maybe generating 30 levels of children :) (Of course it could bomb on the 31st)
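
A rough sketch of such a bounded search, under assumptions of my own: it explores left-derivations breadth-first rather than depth-first, it assumes the grammar has no $\varepsilon$-rules and no cycles of unit rules (so that pruning by length terminates), and the name `find_ambiguous_word` and the token names are made up for this illustration. A result of `None` only means that no witness exists up to the bound; it says nothing about longer words.

```python
from collections import Counter, deque


def find_ambiguous_word(grammar, start, max_len):
    """Return a word of length <= max_len with two left-derivations, or None.

    `grammar` maps nonterminals to lists of right-hand sides (tuples).
    Assumption: no epsilon rules and no cycles of unit rules, so a sentential
    form longer than max_len can never shrink back below the bound.
    """
    derivations = Counter()          # word -> number of left-derivations seen
    queue = deque([(start,)])        # each entry is one left-derivation so far
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:            # only terminals left: a finished word
            derivations[form] += 1
            if derivations[form] > 1:
                return form
            continue
        for rhs in grammar[form[index]]:
            child = form[:index] + tuple(rhs) + form[index + 1:]
            if len(child) <= max_len:
                queue.append(child)
    return None


DANGLING_ELSE = {
    "S": [("stmt",),
          ("if", "expr", "then", "S"),
          ("if", "expr", "then", "S", "else", "S")],
}
witness = find_ambiguous_word(DANGLING_ELSE, "S", 9)
print(" ".join(witness))   # if expr then if expr then stmt else stmt
```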

— Karthik Kumar Viswanathan

1

The OP asks for proof techniques, not algorithms.

— Raphael

2

That can't possibly be a way to prove whether a grammar is ambiguous or not. As a matter of fact, when that bombing happens is undecidable.

— Sнаđошƒаӽ