INTEL CONFIDENTIAL Parallel Programming Methodology Intel Software College

Intel® Software College
Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Objective
By the end of this module, you will be able to prototype and estimate the effort required to parallelize time-consuming regions.

Agenda
A generic development cycle
Case study: prime number generation
Some common performance problems

What Is Parallelism?
Two or more processes or threads execute at the same time
Parallelism for multi-core architectures
Multiple processes
–Communication through IPC (inter-process communication)
A single process, multiple threads
–Communication through shared memory

Amdahl's Law
Describes the upper bound on speedup from parallel execution
Serial code limits the speedup

    n          = number of processors
    P          = fraction of the serial run time that can be parallelized
    T_parallel = {(1 - P) + P/n} * T_serial
    Speedup    = T_serial / T_parallel

Example with P = 0.5 and T_serial = 1.0:

    n      (1 - P) + P/n      Speedup
    2      0.5 + 0.25         1.0/0.75 = 1.33
    ∞      0.5 + 0.0          1.0/0.5  = 2.0
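The formula above can be checked with a few lines of code. This is a minimal sketch, not part of the course material; the function names `AmdahlSpeedup`, `AmdahlLimit`, and `PrintAmdahlExample` are hypothetical, and `P` is assumed to be strictly less than 1.

```cpp
#include <cstdio>

// Amdahl's Law: speedup for parallel fraction P (0 <= P < 1) on n processors.
double AmdahlSpeedup(double P, double n)
{
    return 1.0 / ((1.0 - P) + P / n);
}

// Upper bound as n grows without limit: only the serial part remains.
double AmdahlLimit(double P)
{
    return 1.0 / (1.0 - P);
}

// Prints the two data points from the slide (P = 0.5).
void PrintAmdahlExample()
{
    std::printf("n = 2:    %.2fX\n", AmdahlSpeedup(0.5, 2.0));
    std::printf("n -> inf: %.2fX\n", AmdahlLimit(0.5));
}
```

Calling `AmdahlSpeedup(0.5, 2.0)` reproduces the 1.33 entry, and `AmdahlLimit(0.5)` the 2.0 bound.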

Processes and Threads
Modern operating systems load programs as processes
–Resource holder
–Execution
A process starts executing at its entry point as a thread
Threads can create other threads within the process
Each thread gets its own stack
All threads within a process share the code and data segments
[Figure: a process with a shared code segment and data segment; thread main() and the other threads each have their own stack]

Threads: Benefits and Risks
Benefits
Higher performance and better resource utilization
–Even on single-processor systems, to hide latency and improve responsiveness
Intra-process communication through shared memory is more efficient
Risks
Increases application complexity
Difficult to debug (race conditions, deadlocks, etc.)

Common Questions When Parallelizing Applications
Where should I parallelize?
How long will it take to parallelize?
How much redesign effort is required?
Is it worthwhile to parallelize a specific region?
How much speedup can I expect?
Does the performance meet my expectations?
Will it scale as more threads or data are added?
Which threading model should I use?

Prime Number Generation

    bool TestForPrime(int val)
    {   // let's start checking from 3
        int limit, factor = 3;
        limit = (long)(sqrtf((float)val) + 0.5f);
        while ((factor <= limit) && (val % factor))
            factor += 2;
        return (factor > limit);
    }

    void FindPrimes(int start, int end)
    {
        int range = end - start + 1;
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
                globalPrimes[gPrimesFound++] = i;
            ShowProgress(i, range);
        }
    }

Factors tested for each value of i:

    i     factor
    61    3 5 7
    63    3
    65    3 5
    67    3 5 7
    69    3
    71    3 5 7
    73    3 5 7 9
    75    3
    77    3 5 7
    79    3 5 7 9
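The table can be reproduced by recording each trial divisor that the `while` loop in `TestForPrime` evaluates. The helper below is a hypothetical illustration (the name `FactorsTested` does not appear in the course code); it mirrors the loop's stopping rule, so a composite value stops at its first odd divisor.

```cpp
#include <cmath>
#include <vector>

// Returns the trial divisors that TestForPrime(val) would evaluate, in order.
// The list ends at the first divisor of val, or at the rounded square root.
std::vector<int> FactorsTested(int val)
{
    std::vector<int> tested;
    int limit = (int)(std::sqrt((float)val) + 0.5f);
    for (int factor = 3; factor <= limit; factor += 2) {
        tested.push_back(factor);
        if (val % factor == 0)
            break;  // composite: the original loop exits on this factor
    }
    return tested;
}
```

The amount of work per value is uneven: primes run the loop all the way to the limit, while multiples of 3 stop after a single test.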

Activity 1
Run the serial version of the prime number code
Locate the PrimeSingle directory
Build with Intel® Parallel Composer in Visual Studio
Run it a few times with different ranges

Development Methodology
Analysis
–Find the code where intensive computation takes place
Design (introduce threads)
–Determine how to implement a parallel solution
Debug
–Detect any problems that result from using threads
Tune for performance
–Achieve the best parallel performance

Development Cycle
Analysis
–Intel® Parallel Amplifier
Design (introduce threads)
–Intel® Performance Libraries: IPP and MKL
–OpenMP* (Intel® Parallel Composer)
–Explicit thread creation (Win32*, Pthreads*)
Debug
–Intel® Parallel Inspector
–Intel Debugger
Tune for performance
–Intel® Parallel Amplifier

Hotspot Analysis
We will use the PrimeSingle project for the analysis
Example: ./PrimeSingle 1 1000000
Use Parallel Amplifier to find the hotspots in the application

    bool TestForPrime(int val)
    {   // let's start checking from 3
        int limit, factor = 3;
        limit = (long)(sqrtf((float)val) + 0.5f);
        while ((factor <= limit) && (val % factor))
            factor++;
        return (factor > limit);
    }

    void FindPrimes(int start, int end)
    {
        // start is always odd
        int range = end - start + 1;
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
                globalPrimes[gPrimesFound++] = i;
            ShowProgress(i, range);
        }
    }

Identify the time-consuming regions

Analysis: Call Stack
Inspect the code starting from the node (TestForPrime)
Look for a loop to parallelize
If none is found, move up the call stack until you find a suitable loop or function call to parallelize
This is the level in the call tree where we need to parallelize
The call stack is used to find the appropriate level in the call tree to parallelize

Activity 2
Run the code with the range '1 5000000' to get the baseline measurement
Note it down for future reference
Run the Hotspots analysis on the serial code
Which function takes most of the time?

Foster's Design Methodology
From "Designing and Building Parallel Programs" by Ian Foster
Four steps:
Partitioning
–Divide the computation and the data
Communication
–Exchange data between computations
Agglomeration
–Group tasks to improve performance
Mapping
–Assign tasks to processors/threads

Designing Parallel Programs
Partition
–Divide the problem into tasks
Communicate
–Determine the amount and pattern of communication
Agglomerate
–Combine tasks
Map
–Assign the agglomerated tasks to the created threads
[Figure: Problem, Initial tasks, Communication, Combined tasks, Final program]

Parallel Programming Models
Functional decomposition
–Task parallelism
–Divide the computation, then associate the data with it
–Independent tasks of the same problem
Data decomposition
–The same operation performed on different data
–Divide the data into pieces, then associate the computation with them
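As a rough illustration of the two models, the sketch below expresses each one with OpenMP. It is an assumption about how the ideas could be coded, not an example from the course; the function names `ScaleAll` and `MinAndSum` are hypothetical. If the compiler is run without OpenMP support, the pragmas are ignored and the code runs serially with the same results.

```cpp
#include <vector>

// Data decomposition: the same operation applied to different pieces of data.
void ScaleAll(std::vector<double>& data, double factor)
{
    const int n = (int)data.size();
#pragma omp parallel for
    for (int i = 0; i < n; ++i)
        data[i] *= factor;
}

// Functional decomposition: two independent tasks of the same problem,
// each a different computation over the same read-only data.
void MinAndSum(const std::vector<double>& data, double& minOut, double& sumOut)
{
    double minVal = data.empty() ? 0.0 : data[0];
    double sum = 0.0;
#pragma omp parallel sections
    {
#pragma omp section
        {
            for (double v : data)
                if (v < minVal) minVal = v;   // only this section writes minVal
        }
#pragma omp section
        {
            for (double v : data)
                sum += v;                     // only this section writes sum
        }
    }
    minOut = minVal;
    sumOut = sum;
}
```

Each section writes a different variable, so no synchronization is needed between the two tasks.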

Decomposition Methods
Functional decomposition
–Focusing on the computation can reveal structure in a problem
[Figure: Atmosphere model, Ocean model, Land surface model, Hydrology model]
Domain decomposition
–Focus on the largest or most frequently accessed data structure
–Data parallelism: the same operation applied to all the data
[Figure: grid reprinted with permission of Dr. Phu V. Luong, Coastal and Hydraulics Laboratory, ERDC]

Pipelined Decomposition
The computation is done in independent stages
Functional decomposition
–Each thread is assigned a stage to compute
–Automobile assembly line
Data decomposition
–Each thread processes all stages of a single instance
–One worker builds an entire car

LAME Encoder Example
LAME MP3 encoder
–Open source project
–Educational tool
The goal of this project is to
–Improve the quality
–Improve the speed of MP3 encoding

LAME Pipeline Strategy
Frame stages: Prelude, Acoustics, Encoding, Other
Steps performed for each frame, in order:
–Fetch the next frame
–Characterize the frame
–Set the encoder parameters
–Analysis
–FFT long/short
–Assemble the filter
–Apply the filters
–Suppress noise
–Quantize and count bits
–Add the frame header
–Verify correctness
–Write to disk
[Figure: timeline in which threads T1 to T4 overlap the Prelude, Acoustics, Encoding, and Other stages of frames N, N+1, N+2, and N+3, synchronized by a hierarchical barrier]

Design
What is the expected benefit?
How do we achieve it with the least effort?
How long does it take to parallelize?
How much redesign effort is required?
Rapid prototyping with OpenMP
Speedup(2P) = 100/(96/2 + 4) ≈ 1.92X
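The estimate can be read as Amdahl's Law applied to the hotspot measurement. The worked form below assumes that 96 is the percentage of serial run time spent in the parallelizable region and 4 is the serial remainder; the slide does not label the two numbers, so this interpretation is an assumption.

```latex
\[
\text{Speedup}(n) = \frac{T_{\text{serial}}}{T_{\text{parallel}}}
                  = \frac{100}{4 + 96/n},
\qquad
\text{Speedup}(2) = \frac{100}{4 + 48} = \frac{100}{52} \approx 1.92
\]
```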

OpenMP
Fork-join parallelism:
–The master thread spawns a team of threads as needed
–Parallelism is added incrementally: a sequential program evolves into a parallel program
[Figure: the master thread forks into parallel regions and joins again after each one]

Design

    #pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i))
            globalPrimes[gPrimesFound++] = i;
        ShowProgress(i, range);
    }

OpenMP
–parallel: creates threads here for this parallel region
–for: divides the iterations of the for loop among the threads

Activity 3
Run the OpenMP version of the code
Locate the PrimeOpenMP directory and solution
Build the code
Run with '1 5000000' for comparison
–What is the speedup?

Design
What is the expected benefit?
How do you achieve it with the least effort?
How long did it take to parallelize?
How much redesign effort is required?
Is this the best possible speedup?
Speedup of 1.40X (less than 1.92X)

Debugging
Is this a correct parallel implementation?
No! The results are different on every run…

Debugging
Intel® Parallel Inspector pinpoints notorious threading errors such as race conditions and deadlocks
[Figure: threading error analysis; Intel® Parallel Inspector uses a runtime data collector to report where the deadlocks or race conditions are]

Intel® Parallel Inspector (Threading Errors)
Select the information related to race conditions and deadlocks
View the overview of threading errors (Ti3)
Select the error and inspect the code

Activity 4
Use Parallel Inspector to analyze the threaded application
Use Intel Parallel Inspector to find the race condition that makes the prime number computation incorrect
Run the application
Are any errors reported?

Debugging
How much redesign effort is required?
How long will it take to parallelize?
Parallel Inspector reported only 3 dependencies, so there are no major complications

Debugging

    #pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i))
    #pragma omp critical
            globalPrimes[gPrimesFound++] = i;  // critical section for this reference
        ShowProgress(i, range);
    }

    // Inside ShowProgress():
    #pragma omp critical
    {   // critical section for both references
        gProgress++;
        percentDone = (int)((float)gProgress / (float)range * 200.0f + 0.5f);
    }

The first pragma creates a critical section for the single reference that follows it
The second pragma creates a critical section for both references inside the block
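The fix can be tried in isolation with a self-contained sketch. This version is an illustration under stated assumptions rather than the lab code: it stores results in a `std::vector` instead of the `globalPrimes` array, omits `ShowProgress`, and assumes `start` is odd and at least 3. The name `FindPrimesCritical` is hypothetical.

```cpp
#include <cmath>
#include <vector>

static bool TestForPrime(int val)
{
    int limit = (int)(std::sqrt((float)val) + 0.5f);
    int factor = 3;
    while ((factor <= limit) && (val % factor))
        factor += 2;
    return factor > limit;
}

// Collects the odd primes in [start, end]; start is assumed odd and >= 3.
// The order of the result depends on thread timing, so sort it if order matters.
std::vector<int> FindPrimesCritical(int start, int end)
{
    std::vector<int> primes;
#pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (TestForPrime(i)) {
#pragma omp critical
            primes.push_back(i);  // one thread at a time updates the shared vector
        }
    }
    return primes;
}
```

With the critical section in place the number of primes found is the same on every run; without it, concurrent updates of the shared container are a data race.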

Activity 5
Modify and run the OpenMP version of the code
Add critical-region pragmas to the code
Build the code
Run Parallel Inspector (Threading Errors)
–If errors are still present, make the appropriate corrections to the code and run Parallel Inspector again
Run with '1 5000000' for comparison
–Build and run without debugging
–What is the speedup?

Debugging
The answer is correct, but the speedup dropped to ~1.33X
Is this the best we can expect from this algorithm?
No! According to Amdahl's Law, we can expect a speedup close to 1.9X

Common Performance Problems
Parallel overhead
–Caused by thread creation, scheduling…
Synchronization
–Excessive global data, contention for the same synchronization objects
Load imbalance
–Improper distribution of the parallel work
Granularity
–Not enough parallel work

Tuning for Performance
Parallel Amplifier (Locks & Waits) pinpoints performance bottlenecks in threaded applications
[Figure: Intel® Parallel Amplifier, Locks & Waits]

Parallel Amplifier: Locks & Waits
The graph shows a significant portion of time spent idle as a result of the critical section
FindPrimes() and ShowProgress() are both significantly impacted by the idle time spent in the critical section

Parallel Amplifier: Locks & Waits
ShowProgress() spends 558/657 (85%) of the time idle in a critical section
Double-click ShowProgress() in the largest critical section to view the source code

Parallel Amplifier: Summary
The elapsed time is 0.571 sec
Wait time / cores = 1.87/4 = 0.47 sec
The application waits in a critical section for 82% of the elapsed time
Most of the time 1 core is busy; occasionally 2 are
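The 82% figure follows from the two measurements on the slide. Written out, and assuming the wait time is spread evenly over the 4 cores:

```latex
\[
\frac{\text{wait time}}{\text{cores}} = \frac{1.87\ \text{s}}{4} \approx 0.47\ \text{s},
\qquad
\frac{0.47\ \text{s}}{0.571\ \text{s}} \approx 0.82 = 82\%
\]
```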

Parallel Amplifier: Concurrency
Concurrency (Function – Caller Function Tree)
ShowProgress is called from FindPrimes and accounts for most of the poor concurrency

Parallel Amplifier: Concurrency
Concurrency (Thread – Function – Caller Function Tree)
This view shows how each thread contributes to the concurrency problem
Expanding any thread shows the functions that contribute the most

Performance
Double-click ShowProgress in the second-largest critical section
This implementation contains implicit synchronization calls: printf
This limits the performance improvement because of the resulting context switches
Back to the design stage

Activity 6
Use Parallel Amplifier to analyze the threaded application
Use the Parallel Amplifier tool (Locks & Waits analysis)
Identify the waits and locks that contribute the most

Performance
Is this much contention expected?
The algorithm performs many more updates than the 10 needed to show progress

Original version:

    void ShowProgress(int val, int range)
    {
        int percentDone;
        gProgress++;
        percentDone = (int)((float)gProgress / (float)range * 200.0f + 0.5f);
        if (percentDone % 10 == 0)
            printf("\b\b\b\b%3d%%", percentDone);
    }

Modified version:

    void ShowProgress(int val, int range)
    {
        int percentDone;
        static int lastPercentDone = 0;
    #pragma omp critical
        {
            gProgress++;
            percentDone = (int)((float)gProgress / (float)range * 200.0f + 0.5f);
        }
        if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10) {
            printf("\b\b\b\b%3d%%", percentDone);
            lastPercentDone++;
        }
    }

This change should fix the contention problem

Design
Goal
–Eliminate the contention caused by implicit synchronization
The speedup is 2.32X! Is that right?

Performance
Our original baseline measurement included the "flawed" progress-update algorithm
Is this the best we can expect from this algorithm?
The actual speedup is 1.40X (<< 1.9X)!

Activity 7
Modify the ShowProgress function (in both the serial and OpenMP versions) to print only the necessary output
Rebuild and run the program
–Make sure no instrumentation flags are being used
What is the speedup relative to the serial version?

    if (percentDone % 10 == 0 && lastPercentDone < percentDone) {
        printf("\b\b\b\b%3d%%", percentDone);
        lastPercentDone += 10;
    }

Reviewing the Performance
Locks & Waits: before and after the change to the printf call
In the faster version, printf is called ~10x fewer times because of the test lastPercentDone < percentDone / 10
The Locks & Waits "self wait count" and "poor" values show a significant difference between the two versions

Reviewing the Performance
Let's look at the OpenMP locks…

    void FindPrimes(int start, int end)
    {
        // start is always odd
        int range = end - start + 1;
    #pragma omp parallel for
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
    #pragma omp critical
                globalPrimes[gPrimesFound++] = i;
            ShowProgress(i, range);
        }
    }

There is a lock inside a loop

    void FindPrimes(int start, int end)
    {
        // start is always odd
        int range = end - start + 1;
    #pragma omp parallel for
        for (int i = start; i <= end; i += 2) {
            if (TestForPrime(i))
                // InterlockedIncrement returns the incremented value, so
                // subtract 1 to get the same index that gPrimesFound++ used
                globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
            ShowProgress(i, range);
        }
    }
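`InterlockedIncrement` is a Win32 call. On other platforms the same idea can be sketched with the OpenMP `atomic capture` construct (OpenMP 3.1 or later). The code below is an assumption-laden illustration, not the lab solution: the names `FindPrimesAtomic`, `IsOddPrime`, `gPrimes`, `gCount`, and the capacity `kMaxPrimes` are all hypothetical, and `start` is assumed to be odd and at least 3.

```cpp
#include <cmath>

static const int kMaxPrimes = 1000;   // hypothetical capacity for this sketch
static int gPrimes[kMaxPrimes];
static int gCount = 0;

static bool IsOddPrime(int val)
{
    int limit = (int)(std::sqrt((float)val) + 0.5f);
    int factor = 3;
    while ((factor <= limit) && (val % factor))
        factor += 2;
    return factor > limit;
}

// Counts the odd primes in [start, end] and stores up to kMaxPrimes of them.
int FindPrimesAtomic(int start, int end)
{
    gCount = 0;
#pragma omp parallel for
    for (int i = start; i <= end; i += 2) {
        if (IsOddPrime(i)) {
            int slot;
#pragma omp atomic capture
            slot = gCount++;          // atomically reserve a unique index
            if (slot < kMaxPrimes)
                gPrimes[slot] = i;    // each thread writes its own slot
        }
    }
    return gCount;
}
```

The capture form returns the value before the increment, which matches the indexing of the original `gPrimesFound++` expression.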

Reviewing the Performance
Let's look at the second lock

    void ShowProgress(int val, int range)
    {
        int percentDone;
        static int lastPercentDone = 0;
    #pragma omp critical
        {
            gProgress++;
            percentDone = (int)((float)gProgress / (float)range * 200.0f + 0.5f);
        }
        if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10) {
            printf("\b\b\b\b%3d%%", percentDone);
            lastPercentDone++;
        }
    }

This is a lock that is called inside a loop

    void ShowProgress(int val, int range)
    {
        long percentDone, localProgress;
        static int lastPercentDone = 0;
        localProgress = InterlockedIncrement(&gProgress);
        percentDone = (int)((float)localProgress / (float)range * 200.0f + 0.5f);
        if (percentDone % 10 == 0 && lastPercentDone < percentDone / 10) {
            printf("\b\b\b\b%3d%%", percentDone);
            lastPercentDone++;
        }
    }

52 Activity 8

Modify the OpenMP critical regions to use InterlockedIncrement instead
Recompile and run the program
What is the speedup relative to the serial version?

53 Load Balance Analysis

Use the concurrency analysis in Parallel Amplifier
Select “Thread - Function - Caller Function Tree”
Observe that the 4 threads do unequal amounts of work

54 Fixing the Load Imbalance

Distribute the work more evenly

void FindPrimes(int start, int end)
{
    // start is always odd
    int range = end - start + 1;

    // hand out chunks of 8 iterations to the threads in round-robin order
    #pragma omp parallel for schedule(static,8)
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            // InterlockedIncrement returns the incremented value, hence the - 1
            globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
        ShowProgress(i, range);
    }
}

The speedup achieved is 1.68X
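To see why a small chunk size evens out the work, the sketch below models how `schedule(static, chunk)` deals chunks out to threads in round-robin order. The cost model, in which iteration k costs k units, is a hypothetical stand-in for the assumption that testing larger numbers for primality takes longer; the function names are also invented for this illustration.

```c
#include <assert.h>

#define MAX_THREADS 64

/* Thread that runs iteration k under schedule(static, chunk): chunks of
   `chunk` consecutive iterations are dealt out round-robin by thread number. */
int owner_static_chunk(long k, long chunk, int nthreads)
{
    assert(k >= 0 && chunk > 0 && nthreads > 0);
    return (int)((k / chunk) % nthreads);
}

/* Heaviest per-thread load for n iterations, under the hypothetical cost
   model "iteration k costs k units". */
long max_thread_load(long n, long chunk, int nthreads)
{
    assert(nthreads <= MAX_THREADS);
    long load[MAX_THREADS] = {0};
    for (long k = 0; k < n; k++)
        load[owner_static_chunk(k, chunk, nthreads)] += k;

    long worst = 0;
    for (int t = 0; t < nthreads; t++)
        if (load[t] > worst) worst = load[t];
    return worst;
}
```

For 64 iterations on 4 threads the total cost is 2016 units, so a perfect split is 504 per thread. With one contiguous block of 16 iterations per thread the busiest thread gets 888 units; with chunks of 8 it gets 696, and with chunks of 1 it gets 528. Smaller chunks interleave cheap and expensive iterations, which is the effect the schedule clause is after.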

55 Activity 9

Modify the code for better load balance
Add the schedule(static, 8) clause to the OpenMP parallel for pragma
Recompile and run the program
What is the speedup compared with the serial version?

56 Final Load Balance Analysis

The speedup achieved is 1.80X
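As a rough cross-check against Amdahl's law from the start of the module, the measured speedup can be turned back into an effective parallel fraction P. The sketch below assumes the 1.80X figure was measured with the 4 threads mentioned in the load balance analysis, which the slides do not state explicitly; the function names are hypothetical.

```c
#include <assert.h>

/* Parallel fraction P implied by a measured speedup on n processors,
   solving Amdahl's law 1/speedup = (1 - P) + P/n for P. */
double implied_parallel_fraction(double speedup, int n)
{
    assert(speedup > 0.0 && n > 1);
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n);
}

/* Speedup predicted by Amdahl's law for parallel fraction p on n processors. */
double amdahl_speedup(double p, int n)
{
    assert(n > 0);
    return 1.0 / ((1.0 - p) + p / n);
}
```

Under that assumption, `implied_parallel_fraction(1.80, 4)` gives P of about 0.59, which would cap the speedup at roughly 1/(1 - P), about 2.45X, however many cores are added. Synchronization and remaining imbalance are folded into this effective P, so it is an estimate of what the program achieves today rather than a property of the algorithm.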

57 Comparative Analysis

Parallelizing an application takes several iterations through the software development cycle

58 Parallelization Methodology: What Was Covered

The four steps of the development cycle for writing parallel applications starting from serial code, and the Intel® tools that support each step:
    Analysis
    Design (introduce threads)
    Debug for correctness
    Tune for performance
Parallel applications require multiple iterations of design, debugging, and performance tuning
Use the tools to improve productivity
