
Slide 1: Probability Densities in Data Mining. Andrew W. Moore, Professor, School of Computer Science, Carnegie Mellon University. www.cs.cmu.edu/~awm, awm@cs.cmu.edu, 412-268-7599. Copyright © Andrew W. Moore.

Slide 2: Contents. Why they are important. Notation and fundamentals of continuous PDFs. Continuous multivariate PDFs. Combining discrete and continuous random variables.

Slide 3: Why are they important? Real numbers occur in at least 50% of database records. We can't always quantize them, so we need to understand how to describe where they come from. A great way of saying what's a reasonable range of values. A great way of saying how multiple attributes should reasonably co-occur.

Slide 4: Why are they important? They can immediately get us Bayes classifiers that are sensible with real-valued data. You'll need to intimately understand PDFs in order to do kernel methods, clustering with mixture models, analysis of variance, time series, and many other things. They will introduce us to linear and non-linear regression.

Slide 5: A PDF of American Ages in 2000.

Slide 6: Population of PR by age group

    group   freq     midpoint   rel. freq
    0-4     284593    2.5       0.0737516
    5-9     301424    7.5       0.0781133
    10-14   305025   12.5       0.0790465
    15-19   305577   17.5       0.0791895
    20-24   299362   22.5       0.0775789
    25-29   277415   27.5       0.0718914
    30-34   262959   32.5       0.0681452
    35-39   265154   37.5       0.0687140
    40-44   258211   42.5       0.0669147
    45-49   239965   47.5       0.0621863
    50-54   233597   52.5       0.0605361
    55-59   206552   57.5       0.0535274
    60-64   169796   62.5       0.0440022
    65-69   141869   67.5       0.0367650
    70-74   112416   72.5       0.0291323
    75-79    85137   77.5       0.0220630
    80-84    57953   82.5       0.0150184
    85+      51801   87.5       0.0134241
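The summary statistics quoted later (slide 21: E = 35.17, Var = 501.16, StdDev = 22.38) can be reproduced from this grouped table. A minimal sketch in Python (used here instead of the deck's R for illustration); the frequencies and midpoints are copied from the table, and the open-ended 85+ group is represented by the midpoint 87.5, so the results match the slide only up to that approximation:

```python
# Grouped-data statistics for the PR age table (frequencies and midpoints
# taken from slide 6; ages 85+ approximated by the midpoint 87.5).
freqs = [284593, 301424, 305025, 305577, 299362, 277415, 262959, 265154,
         258211, 239965, 233597, 206552, 169796, 141869, 112416, 85137,
         57953, 51801]
midpoints = [2.5 + 5 * i for i in range(18)]   # 2.5, 7.5, ..., 87.5

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n
sd = var ** 0.5

print(round(mean, 2), round(var, 2), round(sd, 2))
```

The values agree with slide 21 to within rounding of the midpoint approximation.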

Slide 7: (figure only).

Slide 8: A PDF of American Ages in 2000. Let X be a continuous random variable. If p(x) is a probability density function for X, then the probability that X lands in an interval is the area under p over that interval; for the interval shaded on the slide, this probability equals 0.36.

Slide 9: Properties of PDFs. That means the total probability is one: the integral of p(x) over (−∞, ∞) equals 1.

Slide 10: where x − h/2 < w < x + h/2. It follows that p(w) tends to p(x) as h tends to zero.

Slide 11: Note that p(w) tends to p(x) as h tends to 0. The cumulative distribution function is a non-decreasing function, and it has been shown that the derivative of the distribution function gives the density function.

Slide 12: Properties of PDFs. Therefore p(x) ≥ 0, since the derivative of a non-decreasing function is greater than or equal to zero.

Slide 13: What is the meaning of p(x)? If p(5.31) = 0.06 and p(5.92) = 0.03, then when a value of X is sampled from the distribution, X is twice as likely to fall very close to 5.31 as very close to 5.92.

Slide 14: Yet another way to view a PDF. A recipe for sampling a random age:
1. Generate a random dot from the rectangle surrounding the PDF curve. Call the dot (age, d).
2. If d < p(age), stop and return age.
3. Else try again: go to step 1.
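The three-step recipe above is rejection sampling. A self-contained sketch, using a made-up triangular density p(x) = 2x on [0, 1] (so the bounding rectangle has height 2) in place of the age PDF, which we don't have in tabular form:

```python
import random

def rejection_sample(pdf, lo, hi, pdf_max, rng):
    """Sample from pdf on [lo, hi] using the slide's recipe:
    throw dots into the bounding rectangle until one lands under the curve."""
    while True:
        x = rng.uniform(lo, hi)        # step 1: random dot (x, d)
        d = rng.uniform(0, pdf_max)
        if d < pdf(x):                 # step 2: accept if under the curve
            return x                   # step 3 (reject) is the loop itself

rng = random.Random(0)
samples = [rejection_sample(lambda x: 2 * x, 0.0, 1.0, 2.0, rng)
           for _ in range(20000)]
print(sum(samples) / len(samples))    # should be close to E[X] = 2/3
```

Accepted dots are uniform under the curve, so their x-coordinates follow the PDF exactly; the acceptance rate is the area under the curve divided by the rectangle's area.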

Slide 15: Test your understanding. True or False:

Slide 16: Expectations. E[X] = the expected value of random variable X = the average value we'd see if we took a very large number of random samples of X.

Slide 17: Expectations. E[X] = the expected value of random variable X = the average value we'd see if we took a very large number of random samples of X = the first moment of the shape formed by the axes and the blue curve = the best value to choose if you must guess an unknown person's age and you'll be fined the square of your error. E[age] = 35.897.

Slide 18: Expectation of a function. E[f(X)] = the expected value of f(x) where x is drawn from X's distribution = the average value we'd see if we took a very large number of random samples of f(X). Note that in general E[f(X)] ≠ f(E[X]).
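The warning at the end of the slide, that in general E[f(X)] ≠ f(E[X]), is easy to check numerically. A sketch with X uniform on [0, 1] and f(x) = x² (my choice of example, not from the slides), using a midpoint-rule integral so the result is deterministic:

```python
# Midpoint-rule approximation of E[f(X)] versus f(E[X]) for X ~ Uniform(0, 1).
N = 100000
xs = [(i + 0.5) / N for i in range(N)]   # midpoints; p(x) = 1 on [0, 1]

f = lambda x: x * x
E_X = sum(xs) / N                    # E[X]    -> 1/2
E_fX = sum(f(x) for x in xs) / N     # E[f(X)] -> 1/3
print(E_fX, f(E_X))                  # 1/3 vs 1/4: not equal
```

The gap E[X²] − (E[X])² is exactly Var[X], which previews the next two slides.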

Slide 19: Variance. σ² = Var[X] = the expected squared difference between x and E[X] = the amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, assuming you play optimally.

Slide 20: Standard Deviation. σ² = Var[X] = the expected squared difference between x and E[X] = the amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, assuming you play optimally. σ = standard deviation = the "typical" deviation of X from its mean.
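The "fined the square of your error" game on slides 17 to 20 amounts to the claim that u = E[X] minimizes E[(X − u)²], with minimum attainable loss Var[X]. A quick numeric check on an arbitrary made-up sample, scanning guesses on a grid:

```python
# For a finite sample, the expected squared loss at guess u is the sample
# average of (x - u)^2; scan u on a grid and confirm the minimizer is the mean.
data = [23, 31, 35, 35, 42, 58, 61]
mean = sum(data) / len(data)

def sq_loss(u):
    return sum((x - u) ** 2 for x in data) / len(data)

grid = [u / 10 for u in range(200, 801)]     # candidate guesses 20.0 .. 80.0
best = min(grid, key=sq_loss)
print(best, mean)                            # the best guess sits at the mean
```

The loss is a parabola in u with vertex at the mean, so the best grid point can differ from the mean by at most half the grid step.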

Slide 21: Statistics for PR. E(age) = 35.17, Var(age) = 501.16, StdDev(age) = 22.38.

Slide 22: The best-known density functions.

Slide 23: The best-known density functions: the uniform (rectangular) density, the triangular density, the exponential density, the Gamma and chi-square densities, the Beta density, the Normal (Gaussian) density, and the t and F densities.

Slide 24: The rectangular (uniform) distribution: constant height 1/w on the interval (−w/2, w/2).

Slide 25: The triangular distribution.

Slide 26: The exponential distribution.

Slide 27: The standard Normal distribution.

Slide 28: The general Normal distribution, with μ = 100 and σ = 15.

Slide 29: General Gaussian, μ = 100, σ = 15. Shorthand: we say X ~ N(μ, σ²) to mean "X is distributed as a Gaussian with parameters μ and σ²". In the figure above, X ~ N(100, 15²). Also known as the normal distribution or bell-shaped curve.

Slide 30: The Error Function. Assume X ~ N(0, 1). Define ERF(x) = P(X < x) = the cumulative distribution of X.

Slide 31: Using the Error Function. Assume X ~ N(μ, σ²). Then, by standardizing, P(X < x | μ, σ²) = ERF((x − μ)/σ).
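Slides 30 and 31 define ERF(x) as the N(0,1) CDF and apply it to a general N(μ, σ²) by standardizing. In Python that CDF can be built from math.erf; note that math.erf is the classical error function, related to but not identical to the slide's ERF:

```python
import math

def phi(x):
    """Standard normal CDF: the slide's ERF(x) = P(X < x) for X ~ N(0,1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), by standardizing as on slide 31."""
    return phi((x - mu) / sigma)

# For the deck's X ~ N(100, 15^2): probability of a value below 130.
print(phi(0.0))                   # 0.5 by symmetry
print(normal_cdf(130, 100, 15))   # equals phi(2), roughly 0.977
```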

Slide 32: The Central Limit Theorem. If (X_1, X_2, … X_n) are i.i.d. continuous random variables, then define z as on the slide. As n → ∞, p(z) → Gaussian with mean E[X_i] and variance Var[X_i]. Somewhat of a justification for assuming Gaussian noise is common.
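The CLT claim can be eyeballed numerically: standardized sums of i.i.d. draws from a strongly skewed density (exponential, my choice for illustration) fall within one standard deviation of their mean about 68% of the time, just as a Gaussian would. A sketch:

```python
import random

random.seed(1)
n, reps = 48, 20000
inside = 0
for _ in range(reps):
    # A sum of n Exponential(1) draws has mean n and variance n; standardize it.
    s = sum(random.expovariate(1.0) for _ in range(n))
    z = (s - n) / n ** 0.5
    if abs(z) < 1.0:
        inside += 1
frac = inside / reps
print(frac)   # close to the Gaussian value P(|Z| < 1) = 0.683
```

A single Exponential(1) draw lands within one standard deviation of its mean only about 86% of the time on one side and 0% below −1, so the agreement with 0.683 really is the averaging at work.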

Slide 33: Density function estimators: histograms (h is the bin width), k-nearest neighbors (d_k is the distance to the k-th nearest neighbor), and kernel density estimators.

Slide 34:
> x = c(7.3, 6.8, 7.1, 2.5, 7.9, 6.5, 4.2, 0.5, 5.6, 5.9)
> hist(x, freq=F, main="Estimacion de funcion de densidad - histograma")
> rug(x, col=2)

Slide 35: Density estimation by knn at 20 points with k = 1, 3, 5, 7.

Slide 36: Kernel estimation of a univariate density function. In the univariate case, the kernel estimator of the density function f(x) is obtained as follows. Suppose x_1, …, x_n is a random sample of a variable X with density function f(x), and define the empirical distribution function, which is an estimator of the cumulative distribution function F(x) of X. Since the density f(x) is the derivative of the distribution function F, using a finite-difference approximation to the derivative we get

Slide 37: where h is a small positive value close to zero. The expression above equals the proportion of points in the interval (x − h, x + h) divided by 2h. It can be rewritten as f_hat(x) = (1/(n·h)) Σ_i K((x − x_i)/h), where the weight function K is defined by K(z) = 1/2 if |z| ≤ 1 and K(z) = 0 if |z| > 1.

Slide 38: Sample: 6, 8, 9, 12, 20, 25, 18, 31.
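With this sample, the uniform-kernel estimator of slide 37 can be evaluated by hand or in code. A sketch (the evaluation point x = 10 and bandwidth h = 5 are my choices for illustration, not from the slide):

```python
def uniform_kernel_density(x, data, h):
    """f_hat(x) = (1/(2*h*n)) * #{x_i with |x - x_i| < h}  (uniform kernel)."""
    n = len(data)
    inside = sum(1 for xi in data if abs(x - xi) < h)
    return inside / (2.0 * h * n)

sample = [6, 8, 9, 12, 20, 25, 18, 31]
est = uniform_kernel_density(10, sample, 5)
print(est)   # 4 of the 8 points fall in (5, 15): 4 / (2*5*8) = 0.05
```

Counting points in (x − h, x + h) and dividing by 2hn is exactly the K(z) = 1/2 form of the estimator.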

Slide 39: This is called the uniform kernel, and h is called the bandwidth, a smoothing parameter that indicates how much each sample point contributes to the estimate at the point x. In general, K and h must satisfy certain regularity conditions, such as: K(z) must be bounded and absolutely integrable on (−∞, ∞). Usually, but not always, K(z) ≥ 0 and symmetric, so any symmetric density function can be used as a kernel.

Slide 40: Choice of the bandwidth h, where n is the number of data points and s is the sample standard deviation.
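The slide's bandwidth formula did not survive transcription, but its description (a function of n and s) matches the shape of the normal-reference rule h = 1.06 · s · n^(−1/5). A sketch under that assumption; the constant 1.06 and the exponent are the textbook normal-reference values, not necessarily what the slide showed:

```python
def normal_reference_bandwidth(data):
    """Normal-reference rule h = 1.06 * s * n**(-1/5).
    Assumed form: the slide only says h depends on n and the sample sd s."""
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5  # sample sd
    return 1.06 * s * n ** (-0.2)

sample = [6, 8, 9, 12, 20, 25, 18, 31]   # the slide-38 sample
print(normal_reference_bandwidth(sample))
```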

Slide 41: The Gaussian kernel. Here the kernel is a smoother weight function in which every sample point contributes to the estimate of f(x) at x. That is, K(z) = (1/√(2π)) e^(−z²/2), the N(0, 1) density.
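A Gaussian-kernel version of the univariate estimator, written in the standard form f_hat(x) = (1/(n·h)) Σ_i φ((x − x_i)/h) with φ the N(0,1) density (the slide's own formula was not preserved, so this is the textbook form):

```python
import math

def gaussian_kernel_density(x, data, h):
    """f_hat(x) = (1/(n*h)) * sum_i phi((x - x_i)/h), phi = N(0,1) density."""
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sum(phi((x - xi) / h) for xi in data) / (len(data) * h)

# Sanity check: with a single data point and h = 1, the estimate at the
# point itself is phi(0) = 1/sqrt(2*pi), about 0.3989.
print(gaussian_kernel_density(0.0, [0.0], 1.0))

sample = [6, 8, 9, 12, 20, 25, 18, 31]   # the slide-38 sample
print(gaussian_kernel_density(10, sample, 5))
```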

Slide 42: The triangular kernel: K(z) = 1 − |z| for |z| < 1, 0 otherwise. The "biweight" kernel: K(z) = (15/16)(1 − z²)² for |z| < 1, 0 otherwise.

Slide 43: The Epanechnikov kernel: K(z) = (3/4)(1 − z²) for |z| < 1, 0 otherwise (in its most common normalization).

Slide 44: Density estimation at 20 points using a Gaussian kernel with h = 0.5, "opt1", "opt2", 4.

Slide 45: Bivariate random variables. p(x, y) = probability density of random variables (X, Y) at location (x, y).

Slide 46: Bivariate density function estimators: histograms (A is the area of the class), k-nearest neighbors (A_k is the area including up to the k-th neighbor), and kernel density estimators.

Slide 47: Bivariate kernel estimation. Let x_i = (x, y) be the observed values and t = (t_1, t_2) a point of the plane where the joint density is to be estimated. If h_1 = h_2 = h, a single bandwidth is used in both coordinates.
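With equal bandwidths h_1 = h_2 = h, the standard product-kernel estimator is f_hat(t_1, t_2) = (1/(n·h²)) Σ_i K((t_1 − x_i)/h) · K((t_2 − y_i)/h). A sketch with a Gaussian product kernel; the product-kernel form is the usual textbook one, since the slide's exact expression was not preserved:

```python
import math

def kde2(t1, t2, data, h):
    """Bivariate product-kernel estimate with equal bandwidths h1 = h2 = h."""
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    n = len(data)
    return sum(phi((t1 - x) / h) * phi((t2 - y) / h)
               for x, y in data) / (n * h * h)

# Sanity check: one observation at the origin, h = 1, evaluated at the
# origin gives phi(0)^2 = 1/(2*pi).
print(kde2(0.0, 0.0, [(0.0, 0.0)], 1.0))
```

This is the from-scratch analogue of what R's kde2d does on slides 50 and 51.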

Slide 48: (equation: the bivariate kernel estimator written with a general bandwidth matrix H applied to (a_1, a_2)).

Slide 49: Density estimation with a bivariate Gaussian kernel.

Slide 50:
f1 = kde2d(autompg1$V1, autompg1$V5, n=100)
persp(f1$x, f1$y, f1$z)

Slide 51:
contour(f1, levels=c(8e-6, 2e-5, 2.8e-5), col=c(2,3,4), xlab="mpg", ylab="weight")

Slide 52: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

Slide 53: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space… P(20 < mpg < 30 and 2500 < weight < 3000) = the volume under the 2-d surface within the red rectangle.

Slide 54: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space… P([(mpg − 25)/10]² + [(weight − 3300)/1500]² < 1) = the volume under the 2-d surface within the red oval.

Slide 55: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space… Take the special case of region R = "everywhere". Remember that with probability 1, (X, Y) will be drawn from "somewhere". So…

Slide 56: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

Slide 57: In m dimensions. Let (X_1, X_2, … X_m) be an m-tuple of continuous random variables, and let R be some region of R^m…

Slide 58: Independence. If X and Y are independent, then knowing the value of X does not help predict the value of Y. mpg and weight: NOT independent.

Slide 59: Independence. If X and Y are independent, then knowing the value of X does not help predict the value of Y. The contours say that acceleration and weight are independent.

Slide 60: Multivariate Expectation. E[mpg, weight] = (24.5, 2600): the centroid of the cloud.

Slide 61: Multivariate Expectation
> f1 = kde2d(autompg1$mpg, autompg1$weight, n=100)
> dx = f1$x[2] - f1$x[1]
> dy = f1$y[2] - f1$y[1]
> dx
[1] 0.379798
> dy
[1] 35.62626
> meanmpg = sum(f1$x*f1$z)*dx*dy
[1] 22.48855
> meanweight = sum(f1$y*f1$z)*dx*dy
[1] 2848.638
> # estimated means above; compare with the sample means
> mean(autompg1$weight)
[1] 2977.584
> mean(autompg1$mpg)
[1] 23.44592

Slide 62: Multivariate Expectation.

Slide 63: Test your understanding. All the time? (answer: always) Only when X and Y are independent? It can fail even if X and Y are independent?

Slide 64: Bivariate Expectation.

Slide 65: Bivariate Covariance.

Slide 66: Bivariate Covariance.

Slide 67: Estimated covariance and standard deviations of mpg and weight
> cov(autompg1[,c(1,5)])
               mpg      weight
mpg       60.91814   -5517.441
weight -5517.44070  721484.709
> sd(autompg1$mpg)
[1] 7.805007
> sd(autompg1$weight)
[1] 849.4026

Slide 68: Covariance Intuition. E[mpg, weight] = (24.5, 2600).

Slide 69: Covariance Intuition. E[mpg, weight] = (24.5, 2600). Principal eigenvector of Σ.

Slide 70: Regression Line. Notice that the regression line passes through (μ_x, μ_y).

Slide 71: Regression Line
> l1 = lm(weight ~ mpg, data=autompg1)
> l1
Call:
lm(formula = weight ~ mpg, data = autompg1)
Coefficients:
(Intercept)          mpg
    5101.11       -90.57
> # slope of the regression line: cov(mpg, weight) / var(mpg)
> slope = -5517.44/60.918
[1] -90.571
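The arithmetic at the bottom of the slide (slope = −5517.44/60.918 = −90.57, using the covariance and variance from slide 67) is the identity b = Cov(X, Y)/Var(X) for the least-squares slope. A check on a small made-up dataset (Python here; the slides use R's lm):

```python
# Least-squares slope from the closed form cov/var; the fitted line must
# satisfy the normal equations and pass through (mean_x, mean_y).
xs = [10, 15, 20, 25, 30, 35]
ys = [4500, 3900, 3400, 2700, 2300, 1600]   # made-up data for illustration

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
var = sum((x - mx) ** 2 for x in xs) / (n - 1)

slope = cov / var
intercept = my - slope * mx   # forces the line through (mean_x, mean_y)
print(slope, intercept)
```

The residuals of this line sum to zero and are uncorrelated with x, which is exactly the least-squares optimality condition.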

Slide 72: First principal component
> a = cov(autompg1[,c(1,5)])
> eigen(a)
$values
[1] 721526.90386     18.72329
$vectors
             [,1]        [,2]
[1,] -0.007647317 0.999970759
[2,]  0.999970759 0.007647317
> # slope of the first principal component
> .99997/-.00764
[1] -130.8861

Slide 73: Covariance Fun Facts.
True or False: If σ_xy = 0 then X and Y are independent. False.
True or False: If X and Y are independent then σ_xy = 0. True.
True or False: If σ_xy = σ_x σ_y then X and Y are deterministically related. True.
True or False: If X and Y are deterministically related then σ_xy = σ_x σ_y. False.
How could you prove or disprove these?
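The first fun fact (zero covariance does not imply independence) has a classic counterexample: X symmetric around 0 and Y = X². A deterministic check on the three-point distribution X ∈ {−1, 0, 1} with equal probability (a standard example, not from the slide):

```python
# X takes -1, 0, 1 with probability 1/3 each; Y = X^2. Then Cov(X, Y) = 0
# even though Y is a deterministic function of X (maximally dependent).
xs = [-1, 0, 1]
p = 1 / 3
E_X = sum(p * x for x in xs)                # 0
E_Y = sum(p * x * x for x in xs)            # 2/3
E_XY = sum(p * x * (x * x) for x in xs)     # E[X^3] = 0 by symmetry
cov = E_XY - E_X * E_Y
print(cov)   # 0.0, yet P(Y = 1 | X = 1) = 1 while P(Y = 1) = 2/3
```

So covariance measures only the linear part of the relationship, which is why the quadratic dependence here is invisible to it.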

Slide 74: Test your understanding. All the time? Only when X and Y are independent? (answer: true) It can fail even if X and Y are independent?

Slide 75: Marginal Distributions.

Slide 76: Conditional Distributions.

Slide 77: Conditional Distributions. Why?

Slide 78: Independence Revisited. It's easy to prove that these statements are equivalent…

Slide 79: More useful stuff. Bayes Rule. (These can all be proved from the definitions on previous slides.)

Slide 80: Mixing discrete and continuous variables. Bayes Rule.

Slide 81: P(education, salary > 50k).

Slide 82: Estimation of the posterior P(Class | Education).

Slide 83: Mixing discrete and continuous variables. P(EduYears, Wealthy).

Slide 84: Mixing discrete and continuous variables. P(EduYears, Wealthy). P(Wealthy | EduYears).

Slide 85: Mixing discrete and continuous variables, renormalized axes. P(EduYears, Wealthy). P(Wealthy | EduYears). P(EduYears | Wealthy).

Slide 86: Exercises. Suppose X and Y are independent real-valued random variables distributed between 0 and 1. What is p[min(X, Y)]? What is E[min(X, Y)]? Prove that E[X] is the value u that minimizes E[(X − u)²]. What is the value u that minimizes E[|X − u|]?

