La descarga está en progreso. Por favor, espere

La descarga está en progreso. Por favor, espere

An algorithm for suffix stripping

Presentaciones similares


Presentación del tema: "An algorithm for suffix stripping"— Transcripción de la presentación:

1 An algorithm for suffix stripping
Algoritmo de Porter An algorithm for suffix stripping Escuela Nacional de Antropología e Historia (ENAH) Agosto – diciembre de 2015

2 Objetivo Mejores representaciones de documentos CONNECT CONNECTED
CONNECTING CONNECT CONNECTION CONNECTIONS

3 Objetivo Mejores representaciones de documentos CONNECT CONNECTED
CONNECTING CONNECTION CONNECTIONS D1 2 5 4 CONNECT D1 15

4 Consideraciones Se busca mejorar los programas de computadora y no hacer análisis lingüístico CONNECTION vs CONNECTIONS RELATE vs RELATIVITY

5 Algoritmo Lista de sufijos Lista de reglas

6 Algoritmo c = consonante = {A, E, I, O, U, YC} TOY  T, Y
SYZYGY  S, Z, G v = vocal = ~consonante V = {vvv…} C = {ccc…}

7 Algoritmo (VC) {m} .- VC repetido m veces
m .- medida (measure) de una palabra [C](VC){m}[V] Ejemplos: m=0 TR, EE, TREE, Y, BY. m=1 TROUBLE, OATS, TREES, IVY. m=2 TROUBLES, PRIVATE, OATEN, ORRERY.

8 Algoritmo Reglas (condition) S1 -> S2
This means that if a word ends with the suffix S1, and the stem before S1 satisfies the given condition, S1 is replaced by S2. Ejemplo: (m > 1) EMENT -> Here S1 is `EMENT' and S2 is null. This would map REPLACEMENT to REPLAC, since REPLAC is a word part for which m = 2.

9 Algoritmo Condiciones
*S - the stem ends with S (and similarly for the other letters). *v* - the stem contains a vowel. *d - the stem ends with a double consonant (e.g. - TT, -SS). *o - the stem ends cvc, where the second c is not W, X or Y (e.g. -WIL, -HOP).

10 Algoritmo Condiciones
(m>1 and (*S or *T)) : stem with m>1 ending in S or T (*d and not (*L or *S or *Z)): stem ending witha double consonant other than L, S or Z

11 Algoritmo Secuencias SSES -> SS IES -> I SS -> SS S ->
CARESSES maps to CARESS since SSES is the longest match for S1. Equally CARESS maps to CARESS (S1=`SS') and CARES to CARE (S1=`S').

12 Fin


Descargar ppt "An algorithm for suffix stripping"

Presentaciones similares


Anuncios Google