Una introducción a los algoritmos del Parsing
Pregunta inicial… ¿Cómo se puede determinar si un código escrito en un lenguaje de programación tiene sintaxis correcta?
Leftmost Derivations Una derivación a izquierda de una cadena sólo permite resolver en cada paso la variable más a la izquierda aaBA => aaBa NO hace parte de una derivación a izquiereda. Si w está en L(G) entonces w admite una derivación a izquierda (Teorema 4.1.1).
Ambiguedad Una gramática G es ambigua si existe w en L(G) que admite dos derivaciones a izquierda. s aS | Sa | a Es ambigua porque aa admite dos derivaciones a izquierda: S => aS =>aaS => Sa =>aa Un Lenguaje es inherentemente ambiguo, si todas las gramáticas que lo generan son ambiguas.
Ejemplo s bS | Sb | a Genera el lenguaje b*ab*. Es ambigua porque bab admite dos derivaciones a izquierda: S => bS =>bSb=>babS => Sb =>bSb=>bab b*ab* se puede generar por las gramáticas no ambiguas: S bS | aA A bA | S bS | A A Ab | a Existe una correspondencia biyectiva entre los árboles de derivación y las derivaciones a izquierda (a derecha).
Grafo de una gramática. S aS | bB | B aB | bS | bC C aC | S aS bB aaS abB abaB bbS bbC aaaS aabB aa abaB abbS abbC baaB babSbabC bbaS bbbB bb bbaC bb
Recorrido transversal descendente
EJEMPLO EJEMPLO Dada la gramática: AE: V = {S, A, T} Σ = {b, +, (, )} Σ = {b, +, (, )} P: 1. S → A P: 1. S → A 2. A → T 2. A → T 3. A → A + T 3. A → A + T 4. T → b 4. T → b 5. T → (A) 5. T → (A) analizar la cadena (b + b)
(b) (b) b (T)((A)) b (T)((A)) (b+T) ( b+b) (b+T) ( b+b) T(T+T)((A)+T) T(T+T)((A)+T) (A) (A+T) (A+T+T)(T+T+T) S A (A+T+T+T) S A (A+T+T+T) b+T b+T(b)+T T+T(A)+T(T)+T((A))+T A+T (A+T)+T (T+T)+T A+T (A+T)+T (T+T)+T(A+T+T)+Tb+T+T A+T+TT+T+T(A)+T+T(T)+T+T (A+T)+T+Tb+T+T+T A+T+T+TT+T+T+T(A)+T+T+T 1A+T+T+T+TT+T+T+T+T A+T+T+T+T+T (A),T+T,A+T+T T+T,A+T+T,(T) T+T,A+T+T,(T),(A+T)
Breadth-First Top-down Parsing Algorithm input: context-free grammar G = (V, Σ, P, S) string p Σ* string p Σ* queue Q queue Q 1. initialize T with root S INSERT(S, Q) INSERT(S, Q) 2. repeat 2.1. q ≔ REMOVE(Q) 2.1. q ≔ REMOVE(Q) 2.2. i ≔ i ≔ done ≔ false 2.3. done ≔ false Let q = uAv where A is the leftmost variable in q. Let q = uAv where A is the leftmost variable in q repeat 2.4. repeat if there is no A rule numbered greater than i then done ≔ true if there is no A rule numbered greater than i then done ≔ true if not done then if not done then Let A → w be the first A rule with number grater than i. Let j Let A → w be the first A rule with number grater than i. Let j be the number of this rule. be the number of this rule if uwv ∉ Σ* and the terminal prefix or uwv matches a if uwv ∉ Σ* and the terminal prefix or uwv matches a prefix of p then prefix of p then INSERT(uwv, Q) INSERT(uwv, Q) Add node uwv to T. Set a pointer from Add node uwv to T. Set a pointer from uwv to q. uwv to q. end if end if i ≔ j i ≔ j until done or p = uwv until done or p = uwv until EMPTY( Q) or p = uwv until EMPTY( Q) or p = uwv 3. if p = uwv then accept else reject
Recorrido descendente profundo
AE: V = {S, A, T} Σ = {b, +, (, )} Σ = {b, +, (, )} P: 1. S → A P: 1. S → A 2. A → T 2. A → T 3. A → A + T 3. A → A + T 4. T → b 4. T → b 5. T → (A) 5. T → (A) p= (b + b) S A [S,1] T [A,2] b [T,4] (A) [T,5] (T) [(A),2] [(T),4] (b) ((A)) [(T),5] (A+T) [(A),3] [(A+T),2] (T+T) (b+T) [(T+T),4] (b+b) [(b+T),4] (b+b)
Depth-First Top-down Algorithm input: context-free grammar G = (V, Σ, P, S) string p Σ* string p Σ* stack S 1.PUSH( [S, 0], S) 2.repeat 2.1 [q, i] = POP(S) 2.1 [q, i] = POP(S) 2.2dead-end = false 2.3repeat Let q = uAv where A is the leftmost variable in q. Let q = uAv where A is the leftmost variable in q if u is not a prefix of p then dead-end = true if there are no A rules numbered greater i then dead-end = true if not dead-end then Let A → w be the first A rule with number greater than i. Let A → w be the first A rule with number greater than i. Let j be the number of this rule PUSH( [q, j], S) q = uwv i = 0 end if until dead-end or q Σ * until q = p or EMPTY(S) 3.if q = p then accept else reject
AE: V = {S, A, T} Σ = {b, +, (, )} Σ = {b, +, (, )} P: 1. S → A P: 1. S → A 2. A → T 2. A → T 3. A → A + T 3. A → A + T 4. T → b 4. T → b 5. T → (A) 5. T → (A) p= (b )+ b S A [S,1] T [A,2] b [T,4] (A) [T,5] (T) [(A),2] [(T),4] (b) ((A)) [(T),5] (A+T) [(A),3] [(A+T),2] (T+T) (b+T) [(T+T),4] ((A)+T) [(T+T),5] [(A+T),3] (A+T+T) [(A+T+T),2]
Algoritmos Ascendentes Reducción: Dado w encontrar las w’ tales que w’=>w. En este caso w’ es una reducción de w. Pattern Matching Scheme: Se descompone w en w=uv, los sufijos de u se comparan con los lados derechos de las reglas. Un “matching” se obtiene cuando se encuentra u=u 1 q y una regla A q entonces w se reduce a u 1 Av.
Reducción de (A+T) u v Regla Reducción (A+T) ( A +T) S A (S+T) ( A+ T) ( A+T ) A A+T (A) (A+T ) A T (A+A) ( A+T)
Algoritmo ascendente transversal AE: V = {S, A, T} Σ = {b, +, (, )} Σ = {b, +, (, )} P: 1. S → A P: 1. S → A 2. A → T 2. A → T 3. A → A + T 3. A → A + T 4. T → b 4. T → b 5. T → (A) 5. T → (A) p= (b + b) (b+b) (b+T) (T+b) (A+b) (T+T) (b+A) (A+b), (T+T), (b+A) (S+b) (T+T), (b+A), (S+b) (A+T) (T+T), (b+A), (S+b), (A+T) (T+A) (b+A), (S+b), (A+T), (T+A) (b+S) (S+b), (A+T), (T+A), (b+S) (A+T), (T+A), (b+S), (S+T) (A+A) (T+A), (b+S), (S+T), (A+A),(A) (A) (T+S) (b+S), (S+T), (A+A),(A), (T+S) (S+T), (A+A),(A), (T+S) (A+S) (A+A),(A), (T+S), (S+A) (S+T) (S+A) (A), (T+S), (S+A),(A+S) (S) T (T+S), (S+A), (A+S), (S), T (S+A), (A+S), (S), T (S+S) (A+S), (S), T, (S+S) T, (S+S) A (S+S), A A S S
Breadth-First Bottom-up Parser Input: context-free grammar G = (V, Σ, P, S) string p Σ* queue Q 1.Initialize T with root p INSERT(p,Q) 2.repeat q ≔ REMOVE(Q) q ≔ REMOVE(Q) 2.1. for each rule A → w in P do 2.1. for each rule A → w in P do if q = uwv with v Σ* then if q = uwv with v Σ* then INSERT(uAv, Q) Add node uAv to T. Set a pointer from uAv to q. end if end for until q = S or EMPTY(Q) 3.If q = S then accept else reject
(T+b) AE: V = {S, A, T} Σ = {b, +, (, )} Σ = {b, +, (, )} P: 1. S → A P: 1. S → A 2. A → T 2. A → T 3. A → A + T 3. A → A + T 4. T → b 4. T → b 5. T → (A) 5. T → (A) (A+b) (T+T) [ (T, 2, +b) ] [ (T+b, 4, ) ] (T+A) (A+T) [ (A+b, 2,) ] (T+A) (A+b)
Depth-Bottom-up Parsing Algorithm input: context-free grammar G = (V, Σ, P, S) with nonrecursive start symbol string p Σ* stack S 1. PUSH([λ, 0, p], S) 2. repeat 2.1 [u, i, v] ≔ POP(S) 2.1 [u, i, v] ≔ POP(S) 2.2 dead-end ≔ false 2.2 dead-end ≔ false 2.3 repeat Find the first j > i with rule number j that satisfies Find the first j > i with rule number j that satisfies i) A → w with u = qw and A ≠ S or i) A → w with u = qw and A ≠ S or ii) S → w with u = w and v = λ ii) S → w with u = w and v = λ if there is such a j then if there is such a j then PUSH ([u, j, v], S) PUSH ([u, j, v], S) u ≔ qA u ≔ qA i ≔ i ≔ 0 end if end if if there is no such j and v ≠ λ then if there is no such j and v ≠ λ then shift(u, v) shift(u, v) i ≔ i ≔ 0 end if end if if there is no such j and v = λ then dead-end ≔ true if there is no such j and v = λ then dead-end ≔ true until (u = S) or dead-end until (u = S) or dead-end until (u = S) or EMPTY(S) until (u = S) or EMPTY(S) 3. if EMPTY(S) then reject else accept
AE: V = {S, A, T} Σ = {b, +, (, )} Σ = {b, +, (, )} P: 1. S → A P: 1. S → A 2. A → T 2. A → T 3. A → A + T 3. A → A + T 4. T → b 4. T → b 5. T → (A) 5. T → (A) u i v 0 (b+b) [, 0, (b+b) ] ( 0 b+b) (b 0 +b) [ (b, 4, +b) ] (T 0 +b) [ (T, 2, +b) ] (A 0 +b) (A+ 0 b) (A+b 0 ) [ (A+b, 4, ) ] (A+T 0 ) [ (A+T, 2, ) ] (A+A 0 ) (A+A) 0 (A+T 2 ) [ (A+T, 3, ) ] (A 0 ) (A ) 0 [ (A), 5, ] T 0 [ T, 2, ] A 0 [ A, 2, ] S 0
Notas Bibliográficas Ambigüedad: Floyd[1962], Cantor[1962], Chomsky and Schutzenberger [1963]. Lenguajes Inherentemente ambiguos: Harrison[1978], Ginsburg and Ullian[1966]. Depth-first: Dennig, Dennis and Qualitz[1978]. Referencia Clásica: Knuth: “The Art of Computer programing: Vol I Fundamental Algorithms”