1
Preliminary speaker verification experiments with a realistic database
José Antonio Rubio García, José Manuel Pardo Muñoz, Ricardo de Córdoba Herralde, Javier Macías Guarasa. Grupo de Tecnología del Habla, Dpto. de Ingeniería Electrónica, E.T.S. de Ingenieros de Telecomunicación, U. Politécnica de Madrid
2
Outline: Introduction, Database, Baseline system, Improvements, Conclusions
3
Introduction. Objectives: Develop a speaker verification/identification system for realistic conditions. The tests will be text-independent. Evaluate the system on speech recorded under different conditions. Optimize the system for both computational cost and error rate.
4
Database: 26 speakers. Each speaker records the following:
5 minutes of normal conversation (T1 and T2); 5 sentences read at a normal pace (FL1 and FL2); the same 5 sentences read quickly (FR1 and FR2). Everything is recorded twice, separated in time (first and second recording sessions, denoted 1 and 2).
5
Database: The conversation is divided into nine text blocks to allow text-independent tests. The read sentences differ in text both from one another and from the conversation segments.
6
Baseline system: multi-Gaussian mixture models, with 64 mixtures per speaker.
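The deck states only the model type and size; as a minimal sketch of such a baseline (not the authors' implementation; the precomputed feature frames, diagonal covariances, scikit-learn, and the decision threshold are all assumptions), each speaker can be modeled by a 64-mixture GMM and a test utterance verified by its average log-likelihood:

```python
# Minimal GMM speaker-verification baseline sketch.
# Assumptions: feature frames (e.g. MFCCs) are precomputed as
# (n_frames x n_dims) arrays; diagonal covariances; threshold is illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_model(features: np.ndarray) -> GaussianMixture:
    """Fit a 64-mixture GMM on one speaker's training frames."""
    gmm = GaussianMixture(n_components=64, covariance_type="diag", random_state=0)
    gmm.fit(features)
    return gmm

def verify(test_features: np.ndarray, claimed_model: GaussianMixture,
           threshold: float) -> bool:
    """Accept the claimed identity if the mean per-frame log-likelihood
    of the test utterance exceeds a (hypothetical) decision threshold."""
    score = claimed_model.score(test_features)  # mean log-likelihood per frame
    return score > threshold
```

Diagonal covariances keep both training cost and model size modest, which matters for the computational-cost objective stated in the introduction.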
7
Results
8
Results (FL2)
9
Results II: The best results are obtained with sentences read at a normal pace. Data from the same session (with different text) yield better results than data from the other session, even when the speaking mode differs. Speaking mode becomes more relevant as error rates decrease.
10
Improvement 1: normalization of distance measures
where Sc is the test speaker and the Si are all the speakers in the database. Finally: [equations from the original slide were not preserved in the transcript]
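A plausible reconstruction of this normalization (an assumption based on standard cohort normalization in speaker verification, since the slide's equations did not survive extraction) is:

```latex
% Cohort-style score normalization (assumed form): S_c is the claimed
% speaker's model and S_1..S_N are the models of all N speakers.
\[
\tilde{L}(X \mid S_c) \;=\; \log p(X \mid S_c)
\;-\; \frac{1}{N} \sum_{i=1}^{N} \log p(X \mid S_i)
\]
```

Under this form, the claim is accepted when the normalized score exceeds a decision threshold.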
11
Results
12
Results
13
Improvement 2: normalization of distance measures with a global model
where Sb is a single model trained on all the speakers in the database.
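Again the slide's equation was not preserved; a standard form consistent with this description (an assumed reconstruction, equivalent to world-model or universal-background-model normalization) is:

```latex
% World-model normalization (assumed form): S_b is one background model
% trained on speech pooled from every speaker in the database.
\[
\tilde{L}(X \mid S_c) \;=\; \log p(X \mid S_c) \;-\; \log p(X \mid S_b)
\]
```

Compared with Improvement 1, this replaces the N per-speaker cohort scores with a single background score, reducing the test-time cost.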
14
Model 3: each speaker is modeled by a covariance matrix of their feature vectors. The score is a distance between covariance matrices.
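The slide does not name the distance; one classical choice for comparing covariance speaker models is the arithmetic-harmonic sphericity measure, sketched here under that assumption:

```python
# Covariance-matrix speaker model sketch. Assumption: the specific distance
# is not stated on the slide; the arithmetic-harmonic sphericity measure is
# one classical option for second-order speaker models.
import numpy as np

def covariance_model(features: np.ndarray) -> np.ndarray:
    """Model a speaker as the covariance matrix of their (n_frames x n_dims) frames."""
    return np.cov(features, rowvar=False)

def ahs_distance(cov_test: np.ndarray, cov_ref: np.ndarray) -> float:
    """Arithmetic-harmonic sphericity: 0 when the matrices are equal,
    increasing as they diverge (d is the feature dimension)."""
    d = cov_ref.shape[0]
    ratio = np.trace(cov_test @ np.linalg.inv(cov_ref)) \
          * np.trace(cov_ref @ np.linalg.inv(cov_test))
    return float(np.log(ratio / d ** 2))
```

Training here reduces to a single covariance estimate per speaker, which is consistent with the training-time savings reported in the conclusions.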
15
Model 3
16
Comparison of training times
17
Conclusions: Improvement 1 yields substantial gains over the baseline (in 17 of the 36 tests). Improvement 2 improves the average error rate in every case, significantly in 12 of 36 cases. Model 3 does not improve the error rate, but it does reduce training and test time.
18
Conclusions: Experiments with less training time degrade the system significantly. The next experiments will use speaker-dependent thresholds.