LISTS 2 AND CORPORA, DAY 12 - 2/6/15, SPAN 4350 Cultura computacional en español, Harry Howard, Tulane University


1 LISTS 2 AND CORPORA, DAY 12 - 2/6/15, SPAN 4350 Cultura computacional en español, Harry Howard, Tulane University

2 Course organization
http://www.tulane.edu/~howard/Span4350/
http://www.tulane.edu/~howard/CompCultES/
1. cultcomp
2. python
3. cadenas
4. unicode
5. exreg
6. listas
7. nltk_archives
6-feb-2015, CultCompES, Prof. Howard, Tulane University

3 Review
A list is a sequence of objects enclosed in square brackets.

4 Most string methods also work with lists
>>> len(L)
>>> sorted(L)
>>> len(sorted(L))
>>> set(L)
>>> sorted(set(L))
>>> len(set(L))
>>> L+'!'  # TypeError: a list can only be concatenated with another list
>>> len(L+'!')  # TypeError, for the same reason
>>> L*2
>>> len(L*2)
>>> L.count('mango')
>>> L.index('mango')
>>> L.rindex('mango')  # AttributeError: lists have no rindex() method
>>> L[2:]
>>> L[:2]
>>> L[-2:]
>>> L[:-2]
>>> L[2:-2]
>>> L[-2:2]
>>> L[:]
>>> L[:-1]+['!']
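A short sketch of a few of the calls above, written for Python 3. The slide never defines `L`, so the four-item list here is a hypothetical example. It also checks the two calls that work on strings but fail on lists.

```python
# Hypothetical example list; the slide does not say what L contains.
L = ['mango', 'pera', 'mango', 'uva']

length = len(L)                 # number of items
unique_sorted = sorted(set(L))  # duplicates removed, then alphabetized
doubled = L * 2                 # repetition, as with strings
first_mango = L.index('mango')  # position of the first 'mango'
mango_count = L.count('mango')  # how many times 'mango' occurs
tail = L[-2:]                   # slicing works as it does with strings
extended = L[:-1] + ['!']       # a list can only be concatenated with a list

# L + '!' fails because list + str is not defined.
try:
    L + '!'
    concat_error = None
except TypeError as err:
    concat_error = type(err).__name__

# Unlike strings, lists have no rindex() method.
has_rindex = hasattr(L, 'rindex')

print(length, unique_sorted, first_mango, mango_count)
print(tail, extended)
print(concat_error, has_rindex)
```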

5 split() vs. join()
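The contrast can be sketched as follows: `split()` turns a string into a list, and `join()` turns a list back into a string. The sentence is a hypothetical example, not taken from the slides.

```python
# Hypothetical example sentence, not from the slides.
sentence = 'En un lugar de la Mancha'

words = sentence.split()      # string -> list, splitting on whitespace
rejoined = ' '.join(words)    # list -> string, with a space between items
hyphenated = '-'.join(words)  # any separator string can be used

print(words)
print(rejoined == sentence)
print(hyphenated)
```

Note that `join()` is a method of the separator string, not of the list.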

6 §6. Lists

7 6.2.3. Which methods are allowed with a list but not with a string?
>>> L1 = ['Miguel', 'Cervantes']
>>> L1.append('de Saavedra')
>>> del L1[2]
>>> L1.insert(1, 'de Saavedra')
>>> L1.remove('de Saavedra')
>>> L1[0] = 'Miguelito'
>>> L1.append('de Saavedra')
>>> L1.pop(2)
>>> L1.reverse()
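The same sequence as a script, with the state of `L1` after each step shown in a comment. The final check on a string is an added illustration (the variable `name` is hypothetical): these operations change the object in place, which strings do not allow.

```python
L1 = ['Miguel', 'Cervantes']
L1.append('de Saavedra')      # ['Miguel', 'Cervantes', 'de Saavedra']
del L1[2]                     # ['Miguel', 'Cervantes']
L1.insert(1, 'de Saavedra')   # ['Miguel', 'de Saavedra', 'Cervantes']
L1.remove('de Saavedra')      # ['Miguel', 'Cervantes']
L1[0] = 'Miguelito'           # ['Miguelito', 'Cervantes']
L1.append('de Saavedra')      # ['Miguelito', 'Cervantes', 'de Saavedra']
popped = L1.pop(2)            # removes and returns 'de Saavedra'
L1.reverse()                  # ['Cervantes', 'Miguelito']

# Strings are immutable, so item assignment fails on them.
name = 'Miguel'
try:
    name[0] = 'm'
    string_error = None
except TypeError as err:
    string_error = type(err).__name__

print(L1, popped, string_error)
```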

8 7. NLTK and Internet corpora
http://www.tulane.edu/~howard/CompCultES/nltk_archives.html

9 Configuring the global working directory
• Create a folder named "pyScripts" in your Documents folder.
• In Spyder > Preferences > Global working directory:
• "At start-up, the global working directory is … the following directory" (navigate to "pyScripts" and click it)
• "Files are opened from: … the global working directory"
• "Files are created in: … the global working directory"

10 7.1.1. How to navigate folders with os
>>> import os
>>> os.getcwd()
'/Users/harryhow/Documents/pyScripts'
# if the path is not to your pyScripts folder, then change it:
>>> os.chdir('/Users/{your_user_name}/Documents/pyScripts/')
>>> os.getcwd()
'/Users/{your_user_name}/Documents/pyScripts'
13-Oct-2014, NLP, Prof. Howard, Tulane University
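A minimal sketch of the same navigation that runs on any machine. As an assumption for illustration, a temporary directory stands in for the Documents folder, and the script returns to the starting directory at the end.

```python
import os
import tempfile

start_dir = os.getcwd()  # remember where we began

# Stand-in for the Documents folder: a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'pyScripts')
    os.makedirs(target, exist_ok=True)
    os.chdir(target)
    # realpath() resolves symlinks so that both paths compare equal
    moved = os.path.realpath(os.getcwd()) == os.path.realpath(target)
    os.chdir(start_dir)  # leave before the temporary directory is deleted

restored = os.getcwd() == start_dir
print(moved, restored)
```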

11 7.1.2. Project Gutenberg
http://www.gutenberg.org/ebooks/28554

12 7.1.3. How to download a file with urllib and convert it to a string with read()
>>> from urllib import urlopen  # Python 2; in Python 3: from urllib.request import urlopen
>>> url = 'http://www.gutenberg.org/cache/epub/28554/pg28554.txt'
>>> download = urlopen(url)
>>> downloadString = download.read()
>>> type(downloadString)
>>> len(downloadString) # 35739?
>>> downloadString[:50]
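The slide uses the Python 2 import. A rough Python 3 equivalent is sketched below. So that it runs offline, it points `urlopen` at a small local file through a `file://` URL; the file and its contents are assumptions for illustration. In Python 3, `read()` returns bytes, so an explicit `decode()` step is needed to get a string.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen  # Python 3 location of urlopen

# Stand-in for the remote text: a small local file reached via a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_text('En un lugar de la Mancha', encoding='utf8')
    url = sample.as_uri()  # swap in the Gutenberg URL for a real download

    with urlopen(url) as download:
        raw = download.read()  # bytes in Python 3

download_string = raw.decode('utf8')  # bytes -> str
print(type(raw).__name__, type(download_string).__name__)
print(len(download_string))
print(download_string[:10])
```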

13 7.1.4. How to save a file to your drive with open(), write(), and close()
# it is assumed that Python is looking at your pyScripts folder
>>> tempFile = open('Cervantes.txt','w')
>>> tempFile.write(downloadString.encode('utf8'))
>>> tempFile.close()
# import os if you haven't already done so
>>> os.listdir('.')
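In Python 3 the same step might look like the sketch below. The encoding is passed to `open()` instead of calling `encode()`, and a `with` block replaces the explicit `close()`. The sample text and the temporary folder are stand-ins for the downloaded string and the pyScripts folder.

```python
import os
import tempfile

# Hypothetical stand-in for the downloaded text.
download_string = 'Texto de ejemplo con acentos: áéíóú\n'

with tempfile.TemporaryDirectory() as folder:  # stand-in for pyScripts
    path = os.path.join(folder, 'Cervantes.txt')
    # The with block closes the file even if write() raises an error.
    with open(path, 'w', encoding='utf8') as temp_file:
        temp_file.write(download_string)
    listing = os.listdir(folder)  # confirm that the file now exists

print(listing)
```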

14 7.1.5. How to look at a file with open() and read()
>>> tempFile = open('Cervantes.txt','r')
>>> text = tempFile.read()
>>> tempFile.close()
>>> type(text)
>>> len(text)
>>> text[:50]
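A Python 3 sketch of reading the file back. It first writes a small hypothetical file so that it is self-contained. Passing `encoding='utf8'` makes `read()` return a decoded string, so `len(text)` counts characters; the size on disk is larger because each accented letter takes two bytes in UTF-8.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:  # stand-in for pyScripts
    path = os.path.join(folder, 'Cervantes.txt')
    with open(path, 'w', encoding='utf8') as f:
        f.write('áéíóú y más texto')  # hypothetical contents
    with open(path, 'r', encoding='utf8') as temp_file:
        text = temp_file.read()
    size_on_disk = os.path.getsize(path)  # size in bytes, not characters

print(type(text).__name__, len(text), size_on_disk)
print(text[:5])
```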

15 7.1.6. How to slice away what you don’t need
>>> text.index('*** START OF THIS PROJECT GUTENBERG EBOOK')
499
>>> lineIndex = text.index('*** START OF THIS PROJECT GUTENBERG EBOOK')
>>> startIndex = text.index('\n',lineIndex)
>>> text[:startIndex]
>>> text.index('*** END OF THIS PROJECT GUTENBERG EBOOK')
>>> endIndex = text.index('*** END OF THIS PROJECT GUTENBERG EBOOK')
>>> story = text[startIndex:endIndex]
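The slicing logic can be tried on a miniature text. The five-line string below is a hypothetical stand-in for a Project Gutenberg file, and the final `strip()` is an addition that trims the leftover newlines around the story.

```python
# Hypothetical miniature of a Project Gutenberg file.
text = (
    'Header and license notes\n'
    '*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n'
    'The story itself.\n'
    '*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n'
    'Trailing license text\n'
)

start_marker = '*** START OF THIS PROJECT GUTENBERG EBOOK'
end_marker = '*** END OF THIS PROJECT GUTENBERG EBOOK'

line_index = text.index(start_marker)       # where the marker line begins
start_index = text.index('\n', line_index)  # end of the marker line
end_index = text.index(end_marker)          # where the end marker begins

story = text[start_index:end_index].strip()
print(story)
```

If a marker is missing, `index()` raises a `ValueError`, which is a useful signal that the file does not have the expected layout.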

16 Now save it as "Wub.txt"
# it is assumed that Python is looking at your pyScripts folder
>>> tempFile = open('Wub.txt','w')
>>> tempFile.write(story.encode('utf8'))
>>> tempFile.close()

17 Next time
P3 on unicode and lists
§7. Corpora

