
Slide 1: Probability Densities in Data Mining. Andrew W. Moore, Professor, School of Computer Science, Carnegie Mellon University. www.cs.cmu.edu/~awm, awm@cs.cmu.edu, 412-268-7599. Copyright © Andrew W. Moore.

Slide 2: Contents. Why they are important. Notation and fundamentals of continuous PDFs. Continuous multivariate PDFs. Combining discrete and continuous random variables.

Slide 3: Why are they important? Real numbers occur in at least 50% of database records. We can't always quantize them, so we need to understand how to describe where they come from. A great way of saying what's a reasonable range of values. A great way of saying how multiple attributes should reasonably co-occur.

Slide 4: Why are they important? They can immediately get us Bayes classifiers that are sensible with real-valued data. You'll need to intimately understand PDFs in order to do kernel methods, clustering with mixture models, analysis of variance, time series, and many other things. They will introduce us to linear and non-linear regression.

Slide 5: A PDF of American Ages in 2000.

Slide 6: Population of PR by age group

    group   freq     midpoint   rel. freq
    0-4     284593    2.5       0.0737516
    5-9     301424    7.5       0.0781133
    10-14   305025   12.5       0.0790465
    15-19   305577   17.5       0.0791895
    20-24   299362   22.5       0.0775789
    25-29   277415   27.5       0.0718914
    30-34   262959   32.5       0.0681452
    35-39   265154   37.5       0.0687140
    40-44   258211   42.5       0.0669147
    45-49   239965   47.5       0.0621863
    50-54   233597   52.5       0.0605361
    55-59   206552   57.5       0.0535274
    60-64   169796   62.5       0.0440022
    65-69   141869   67.5       0.0367650
    70-74   112416   72.5       0.0291323
    75-79    85137   77.5       0.0220630
    80-84    57953   82.5       0.0150184
    85+      51801   87.5       0.0134241
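The summary statistics quoted later (slide 21: E = 35.17, Var = 501.16, StdDev = 22.38) can be reproduced from this grouped table. A minimal sketch in Python (used here instead of the deck's R for illustration); the frequencies and midpoints are copied from the table, and the open-ended 85+ group is represented by the midpoint 87.5, so the results match the slide only up to that approximation:

```python
# Grouped-data statistics for the PR age table (frequencies and midpoints
# taken from slide 6; ages 85+ approximated by the midpoint 87.5).
freqs = [284593, 301424, 305025, 305577, 299362, 277415, 262959, 265154,
         258211, 239965, 233597, 206552, 169796, 141869, 112416, 85137,
         57953, 51801]
midpoints = [2.5 + 5 * i for i in range(18)]   # 2.5, 7.5, ..., 87.5

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n
sd = var ** 0.5

print(round(mean, 2), round(var, 2), round(sd, 2))
```

The values agree with slide 21 to within rounding of the midpoint approximation.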

Slide 7: (figure only).

Slide 8: A PDF of American Ages in 2000. Let X be a continuous random variable. If p(x) is a probability density function for X, then the probability that X lands in an interval is the area under p over that interval; for the interval shaded on the slide, this probability equals 0.36.

Slide 9: Properties of PDFs. That means the total probability is one: the integral of p(x) over (−∞, ∞) equals 1.

Slide 10: where x − h/2 < w < x + h/2. It follows that p(w) tends to p(x) as h tends to zero.

Slide 11: Note that p(w) tends to p(x) as h tends to 0. The cumulative distribution function is a non-decreasing function, and it has been shown that the derivative of the distribution function gives the density function.

Slide 12: Properties of PDFs. Therefore p(x) ≥ 0, since the derivative of a non-decreasing function is greater than or equal to zero.

Slide 13: What is the meaning of p(x)? If p(5.31) = 0.06 and p(5.92) = 0.03, then when a value of X is sampled from the distribution, X is twice as likely to fall very close to 5.31 as very close to 5.92.

Slide 14: Yet another way to view a PDF. A recipe for sampling a random age:
1. Generate a random dot from the rectangle surrounding the PDF curve. Call the dot (age, d).
2. If d < p(age), stop and return age.
3. Else try again: go to step 1.
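The three-step recipe above is rejection sampling. A self-contained sketch, using a made-up triangular density p(x) = 2x on [0, 1] (so the bounding rectangle has height 2) in place of the age PDF, which we don't have in tabular form:

```python
import random

def rejection_sample(pdf, lo, hi, pdf_max, rng):
    """Sample from pdf on [lo, hi] using the slide's recipe:
    throw dots into the bounding rectangle until one lands under the curve."""
    while True:
        x = rng.uniform(lo, hi)        # step 1: random dot (x, d)
        d = rng.uniform(0, pdf_max)
        if d < pdf(x):                 # step 2: accept if under the curve
            return x                   # step 3 (reject) is the loop itself

rng = random.Random(0)
samples = [rejection_sample(lambda x: 2 * x, 0.0, 1.0, 2.0, rng)
           for _ in range(20000)]
print(sum(samples) / len(samples))    # should be close to E[X] = 2/3
```

Accepted dots are uniform under the curve, so their x-coordinates follow the PDF exactly; the acceptance rate is the area under the curve divided by the rectangle's area.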

Slide 15: Test your understanding. True or False:

Slide 16: Expectations. E[X] = the expected value of random variable X = the average value we'd see if we took a very large number of random samples of X.

Slide 17: Expectations. E[X] = the expected value of random variable X = the average value we'd see if we took a very large number of random samples of X = the first moment of the shape formed by the axes and the blue curve = the best value to choose if you must guess an unknown person's age and you'll be fined the square of your error. E[age] = 35.897.

Slide 18: Expectation of a function. E[f(X)] = the expected value of f(x) where x is drawn from X's distribution = the average value we'd see if we took a very large number of random samples of f(X). Note that in general E[f(X)] ≠ f(E[X]).
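The warning at the end of the slide, that in general E[f(X)] ≠ f(E[X]), is easy to check numerically. A sketch with X uniform on [0, 1] and f(x) = x² (my choice of example, not from the slides), using a midpoint-rule integral so the result is deterministic:

```python
# Midpoint-rule approximation of E[f(X)] versus f(E[X]) for X ~ Uniform(0, 1).
N = 100000
xs = [(i + 0.5) / N for i in range(N)]   # midpoints; p(x) = 1 on [0, 1]

f = lambda x: x * x
E_X = sum(xs) / N                    # E[X]    -> 1/2
E_fX = sum(f(x) for x in xs) / N     # E[f(X)] -> 1/3
print(E_fX, f(E_X))                  # 1/3 vs 1/4: not equal
```

The gap E[X²] − (E[X])² is exactly Var[X], which previews the next two slides.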

Slide 19: Variance. σ² = Var[X] = the expected squared difference between x and E[X] = the amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, assuming you play optimally.

Slide 20: Standard Deviation. σ² = Var[X] = the expected squared difference between x and E[X] = the amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, assuming you play optimally. σ = standard deviation = the "typical" deviation of X from its mean.
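The "fined the square of your error" game on slides 17 to 20 amounts to the claim that u = E[X] minimizes E[(X − u)²], with minimum attainable loss Var[X]. A quick numeric check on an arbitrary made-up sample, scanning guesses on a grid:

```python
# For a finite sample, the expected squared loss at guess u is the sample
# average of (x - u)^2; scan u on a grid and confirm the minimizer is the mean.
data = [23, 31, 35, 35, 42, 58, 61]
mean = sum(data) / len(data)

def sq_loss(u):
    return sum((x - u) ** 2 for x in data) / len(data)

grid = [u / 10 for u in range(200, 801)]     # candidate guesses 20.0 .. 80.0
best = min(grid, key=sq_loss)
print(best, mean)                            # the best guess sits at the mean
```

The loss is a parabola in u with vertex at the mean, so the best grid point can differ from the mean by at most half the grid step.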

Slide 21: Statistics for PR. E(age) = 35.17, Var(age) = 501.16, StdDev(age) = 22.38.

Slide 22: The best-known density functions.

Slide 23: The best-known density functions: the uniform (rectangular) density, the triangular density, the exponential density, the Gamma and chi-square densities, the Beta density, the Normal (Gaussian) density, and the t and F densities.

Slide 24: The rectangular (uniform) distribution: constant height 1/w on the interval (−w/2, w/2).

Slide 25: The triangular distribution.

Slide 26: The exponential distribution.

Slide 27: The standard Normal distribution.

Slide 28: The general Normal distribution, with μ = 100 and σ = 15.

Slide 29: General Gaussian, μ = 100, σ = 15. Shorthand: we say X ~ N(μ, σ²) to mean "X is distributed as a Gaussian with parameters μ and σ²". In the figure above, X ~ N(100, 15²). Also known as the normal distribution or bell-shaped curve.

Slide 30: The Error Function. Assume X ~ N(0, 1). Define ERF(x) = P(X < x) = the cumulative distribution of X.

Slide 31: Using the Error Function. Assume X ~ N(μ, σ²). Then, by standardizing, P(X < x | μ, σ²) = ERF((x − μ)/σ).
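Slides 30 and 31 define ERF(x) as the N(0,1) CDF and apply it to a general N(μ, σ²) by standardizing. In Python that CDF can be built from math.erf; note that math.erf is the classical error function, related to but not identical to the slide's ERF:

```python
import math

def phi(x):
    """Standard normal CDF: the slide's ERF(x) = P(X < x) for X ~ N(0,1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), by standardizing as on slide 31."""
    return phi((x - mu) / sigma)

# For the deck's X ~ N(100, 15^2): probability of a value below 130.
print(phi(0.0))                   # 0.5 by symmetry
print(normal_cdf(130, 100, 15))   # equals phi(2), roughly 0.977
```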

Slide 32: The Central Limit Theorem. If (X_1, X_2, … X_n) are i.i.d. continuous random variables, then define z as on the slide. As n → ∞, p(z) → Gaussian with mean E[X_i] and variance Var[X_i]. Somewhat of a justification for assuming Gaussian noise is common.
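The CLT claim can be eyeballed numerically: standardized sums of i.i.d. draws from a strongly skewed density (exponential, my choice for illustration) fall within one standard deviation of their mean about 68% of the time, just as a Gaussian would. A sketch:

```python
import random

random.seed(1)
n, reps = 48, 20000
inside = 0
for _ in range(reps):
    # A sum of n Exponential(1) draws has mean n and variance n; standardize it.
    s = sum(random.expovariate(1.0) for _ in range(n))
    z = (s - n) / n ** 0.5
    if abs(z) < 1.0:
        inside += 1
frac = inside / reps
print(frac)   # close to the Gaussian value P(|Z| < 1) = 0.683
```

A single Exponential(1) draw lands within one standard deviation of its mean only about 86% of the time on one side and 0% below −1, so the agreement with 0.683 really is the averaging at work.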

Slide 33: Density function estimators: histograms (h is the bin width), k-nearest neighbors (d_k is the distance to the k-th nearest neighbor), and kernel density estimators.

Slide 34:
> x = c(7.3, 6.8, 7.1, 2.5, 7.9, 6.5, 4.2, 0.5, 5.6, 5.9)
> hist(x, freq=F, main="Estimacion de funcion de densidad - histograma")
> rug(x, col=2)

Slide 35: Density estimation by knn at 20 points with k = 1, 3, 5, 7.

Slide 36: Kernel estimation of a univariate density function. In the univariate case, the kernel estimator of the density function f(x) is obtained as follows. Suppose x_1, …, x_n is a random sample of a variable X with density function f(x), and define the empirical distribution function, which is an estimator of the cumulative distribution function F(x) of X. Since the density f(x) is the derivative of the distribution function F, using a finite-difference approximation to the derivative we get

Slide 37: where h is a small positive value close to zero. The expression above equals the proportion of points in the interval (x − h, x + h) divided by 2h. It can be rewritten as f_hat(x) = (1/(n·h)) Σ_i K((x − x_i)/h), where the weight function K is defined by K(z) = 1/2 if |z| ≤ 1 and K(z) = 0 if |z| > 1.

Slide 38: Sample: 6, 8, 9, 12, 20, 25, 18, 31.
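With this sample, the uniform-kernel estimator of slide 37 can be evaluated by hand or in code. A sketch (the evaluation point x = 10 and bandwidth h = 5 are my choices for illustration, not from the slide):

```python
def uniform_kernel_density(x, data, h):
    """f_hat(x) = (1/(2*h*n)) * #{x_i with |x - x_i| < h}  (uniform kernel)."""
    n = len(data)
    inside = sum(1 for xi in data if abs(x - xi) < h)
    return inside / (2.0 * h * n)

sample = [6, 8, 9, 12, 20, 25, 18, 31]
est = uniform_kernel_density(10, sample, 5)
print(est)   # 4 of the 8 points fall in (5, 15): 4 / (2*5*8) = 0.05
```

Counting points in (x − h, x + h) and dividing by 2hn is exactly the K(z) = 1/2 form of the estimator.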

Slide 39: This is called the uniform kernel, and h is called the bandwidth, a smoothing parameter that indicates how much each sample point contributes to the estimate at the point x. In general, K and h must satisfy certain regularity conditions, such as: K(z) must be bounded and absolutely integrable on (−∞, ∞). Usually, but not always, K(z) ≥ 0 and symmetric, so any symmetric density function can be used as a kernel.

Slide 40: Choice of the bandwidth h, where n is the number of data points and s is the sample standard deviation.
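The slide's bandwidth formula did not survive transcription, but its description (a function of n and s) matches the shape of the normal-reference rule h = 1.06 · s · n^(−1/5). A sketch under that assumption; the constant 1.06 and the exponent are the textbook normal-reference values, not necessarily what the slide showed:

```python
def normal_reference_bandwidth(data):
    """Normal-reference rule h = 1.06 * s * n**(-1/5).
    Assumed form: the slide only says h depends on n and the sample sd s."""
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5  # sample sd
    return 1.06 * s * n ** (-0.2)

sample = [6, 8, 9, 12, 20, 25, 18, 31]   # the slide-38 sample
print(normal_reference_bandwidth(sample))
```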

Slide 41: The Gaussian kernel. Here the kernel is a smoother weight function in which every sample point contributes to the estimate of f(x) at x. That is, K(z) = (1/√(2π)) e^(−z²/2), the N(0, 1) density.
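A Gaussian-kernel version of the univariate estimator, written in the standard form f_hat(x) = (1/(n·h)) Σ_i φ((x − x_i)/h) with φ the N(0,1) density (the slide's own formula was not preserved, so this is the textbook form):

```python
import math

def gaussian_kernel_density(x, data, h):
    """f_hat(x) = (1/(n*h)) * sum_i phi((x - x_i)/h), phi = N(0,1) density."""
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sum(phi((x - xi) / h) for xi in data) / (len(data) * h)

# Sanity check: with a single data point and h = 1, the estimate at the
# point itself is phi(0) = 1/sqrt(2*pi), about 0.3989.
print(gaussian_kernel_density(0.0, [0.0], 1.0))

sample = [6, 8, 9, 12, 20, 25, 18, 31]   # the slide-38 sample
print(gaussian_kernel_density(10, sample, 5))
```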

Slide 42: The triangular kernel: K(z) = 1 − |z| for |z| < 1, 0 otherwise. The "biweight" kernel: K(z) = (15/16)(1 − z²)² for |z| < 1, 0 otherwise.

Slide 43: The Epanechnikov kernel: K(z) = (3/4)(1 − z²) for |z| < 1, 0 otherwise (in its most common normalization).

Slide 44: Density estimation at 20 points using a Gaussian kernel with h = 0.5, "opt1", "opt2", 4.

Slide 45: Bivariate random variables. p(x, y) = probability density of random variables (X, Y) at location (x, y).

Slide 46: Bivariate density function estimators: histograms (A is the area of the class), k-nearest neighbors (A_k is the area including up to the k-th neighbor), and kernel density estimators.

Slide 47: Bivariate kernel estimation. Let x_i = (x, y) be the observed values and t = (t_1, t_2) a point of the plane where the joint density is to be estimated. If h_1 = h_2 = h, a single bandwidth is used in both coordinates.
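With equal bandwidths h_1 = h_2 = h, the standard product-kernel estimator is f_hat(t_1, t_2) = (1/(n·h²)) Σ_i K((t_1 − x_i)/h) · K((t_2 − y_i)/h). A sketch with a Gaussian product kernel; the product-kernel form is the usual textbook one, since the slide's exact expression was not preserved:

```python
import math

def kde2(t1, t2, data, h):
    """Bivariate product-kernel estimate with equal bandwidths h1 = h2 = h."""
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    n = len(data)
    return sum(phi((t1 - x) / h) * phi((t2 - y) / h)
               for x, y in data) / (n * h * h)

# Sanity check: one observation at the origin, h = 1, evaluated at the
# origin gives phi(0)^2 = 1/(2*pi).
print(kde2(0.0, 0.0, [(0.0, 0.0)], 1.0))
```

This is the from-scratch analogue of what R's kde2d does on slides 50 and 51.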

Slide 48: (equation: the bivariate kernel estimator written with a general bandwidth matrix H applied to (a_1, a_2)).

Slide 49: Density estimation with a bivariate Gaussian kernel.

Slide 50:
f1 = kde2d(autompg1$V1, autompg1$V5, n=100)
persp(f1$x, f1$y, f1$z)

Slide 51:
contour(f1, levels=c(8e-6, 2e-5, 2.8e-5), col=c(2,3,4), xlab="mpg", ylab="weight")

Slide 52: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

Slide 53: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space… P(20 < mpg < 30 and 2500 < weight < 3000) = the volume under the 2-d surface within the red rectangle.

Slide 54: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space… P([(mpg − 25)/10]² + [(weight − 3300)/1500]² < 1) = the volume under the 2-d surface within the red oval.

Slide 55: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space… Take the special case of region R = "everywhere". Remember that with probability 1, (X, Y) will be drawn from "somewhere". So…

Slide 56: In 2 dimensions. Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

Slide 57: In m dimensions. Let (X_1, X_2, … X_m) be an m-tuple of continuous random variables, and let R be some region of R^m…

Slide 58: Independence. If X and Y are independent, then knowing the value of X does not help predict the value of Y. mpg and weight: NOT independent.

Slide 59: Independence. If X and Y are independent, then knowing the value of X does not help predict the value of Y. The contours say that acceleration and weight are independent.

Slide 60: Multivariate Expectation. E[mpg, weight] = (24.5, 2600): the centroid of the cloud.

Slide 61: Multivariate Expectation
> f1 = kde2d(autompg1$mpg, autompg1$weight, n=100)
> dx = f1$x[2] - f1$x[1]
> dy = f1$y[2] - f1$y[1]
> dx
[1] 0.379798
> dy
[1] 35.62626
> meanmpg = sum(f1$x*f1$z)*dx*dy
[1] 22.48855
> meanweight = sum(f1$y*f1$z)*dx*dy
[1] 2848.638
> # estimated means above; compare with the sample means
> mean(autompg1$weight)
[1] 2977.584
> mean(autompg1$mpg)
[1] 23.44592

Slide 62: Multivariate Expectation.

Slide 63: Test your understanding. All the time? (answer: always) Only when X and Y are independent? It can fail even if X and Y are independent?

Slide 64: Bivariate Expectation.

Slide 65: Bivariate Covariance.

Slide 66: Bivariate Covariance.

Slide 67: Estimated covariance and standard deviations of mpg and weight
> cov(autompg1[,c(1,5)])
               mpg      weight
mpg       60.91814   -5517.441
weight -5517.44070  721484.709
> sd(autompg1$mpg)
[1] 7.805007
> sd(autompg1$weight)
[1] 849.4026

Slide 68: Covariance Intuition. E[mpg, weight] = (24.5, 2600).

Slide 69: Covariance Intuition. E[mpg, weight] = (24.5, 2600). Principal eigenvector of Σ.

Slide 70: Regression Line. Notice that the regression line passes through (μ_x, μ_y).

Slide 71: Regression Line
> l1 = lm(weight ~ mpg, data=autompg1)
> l1
Call:
lm(formula = weight ~ mpg, data = autompg1)
Coefficients:
(Intercept)          mpg
    5101.11       -90.57
> # slope of the regression line: cov(mpg, weight) / var(mpg)
> slope = -5517.44/60.918
[1] -90.571
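The arithmetic at the bottom of the slide (slope = −5517.44/60.918 = −90.57, using the covariance and variance from slide 67) is the identity b = Cov(X, Y)/Var(X) for the least-squares slope. A check on a small made-up dataset (Python here; the slides use R's lm):

```python
# Least-squares slope from the closed form cov/var; the fitted line must
# satisfy the normal equations and pass through (mean_x, mean_y).
xs = [10, 15, 20, 25, 30, 35]
ys = [4500, 3900, 3400, 2700, 2300, 1600]   # made-up data for illustration

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
var = sum((x - mx) ** 2 for x in xs) / (n - 1)

slope = cov / var
intercept = my - slope * mx   # forces the line through (mean_x, mean_y)
print(slope, intercept)
```

The residuals of this line sum to zero and are uncorrelated with x, which is exactly the least-squares optimality condition.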

Slide 72: First principal component
> a = cov(autompg1[,c(1,5)])
> eigen(a)
$values
[1] 721526.90386     18.72329
$vectors
             [,1]        [,2]
[1,] -0.007647317 0.999970759
[2,]  0.999970759 0.007647317
> # slope of the first principal component
> .99997/-.00764
[1] -130.8861

Slide 73: Covariance Fun Facts.
True or False: If σ_xy = 0 then X and Y are independent. False.
True or False: If X and Y are independent then σ_xy = 0. True.
True or False: If σ_xy = σ_x σ_y then X and Y are deterministically related. True.
True or False: If X and Y are deterministically related then σ_xy = σ_x σ_y. False.
How could you prove or disprove these?
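The first fun fact (zero covariance does not imply independence) has a classic counterexample: X symmetric around 0 and Y = X². A deterministic check on the three-point distribution X ∈ {−1, 0, 1} with equal probability (a standard example, not from the slide):

```python
# X takes -1, 0, 1 with probability 1/3 each; Y = X^2. Then Cov(X, Y) = 0
# even though Y is a deterministic function of X (maximally dependent).
xs = [-1, 0, 1]
p = 1 / 3
E_X = sum(p * x for x in xs)                # 0
E_Y = sum(p * x * x for x in xs)            # 2/3
E_XY = sum(p * x * (x * x) for x in xs)     # E[X^3] = 0 by symmetry
cov = E_XY - E_X * E_Y
print(cov)   # 0.0, yet P(Y = 1 | X = 1) = 1 while P(Y = 1) = 2/3
```

So covariance measures only the linear part of the relationship, which is why the quadratic dependence here is invisible to it.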

Slide 74: Test your understanding. All the time? Only when X and Y are independent? (answer: true) It can fail even if X and Y are independent?

Slide 75: Marginal Distributions.

Slide 76: Conditional Distributions.

Slide 77: Conditional Distributions. Why?

Slide 78: Independence Revisited. It's easy to prove that these statements are equivalent…

Slide 79: More useful stuff. Bayes Rule. (These can all be proved from the definitions on previous slides.)

Slide 80: Mixing discrete and continuous variables. Bayes Rule.

Slide 81: P(education, salary > 50k).

Slide 82: Estimation of the posterior P(Class | Education).

Slide 83: Mixing discrete and continuous variables. P(EduYears, Wealthy).

Slide 84: Mixing discrete and continuous variables. P(EduYears, Wealthy). P(Wealthy | EduYears).

Slide 85: Mixing discrete and continuous variables, renormalized axes. P(EduYears, Wealthy). P(Wealthy | EduYears). P(EduYears | Wealthy).

Slide 86: Exercises. Suppose X and Y are independent real-valued random variables distributed between 0 and 1. What is p[min(X, Y)]? What is E[min(X, Y)]? Prove that E[X] is the value u that minimizes E[(X − u)²]. What is the value u that minimizes E[|X − u|]?

