Churn Norris – Retain your customers with ML Pau Sempere
About me Pau Sempere Working with SQL Server since 2009 Microsoft Machine Learning MCSA Mentor @ SolidQ Email: psempere@solidq.com Twitter: @pausempere LinkedIn: https://www.linkedin.com/in/pausempere/
Agenda What is Churn? In balance there is virtue KISS How did we get here?
What is Churn?
What is Churn? CUSTOMERS STAY ABANDON
What is Churn? CUSTOMERS 70% 30% STAY ABANDON
What is Churn? CUSTOMERS 95% 5% STAY ABANDON
What is Churn? https://www.invespcro.com/blog/customer-acquisition-retention/
What is Churn? CUSTOMERS WHO? WHY?
What is Churn? // Speaker note: open a discussion with the audience; each attendee will define churn in their own particular way. If they don't participate, prompt them by proposing different points of view.
Key concepts about Churn CLEAR DEFINITION HISTORICAL DATA CONTEXT // Speaker note (KEY POINTS): this applies to every ML problem in the universe, but especially to churn, since it deals with human behavior.
DEMO Basic Churn
Looking for a better model Random forest has: n_estimators max_depth max_leaf_nodes … We need to find a better model by exploring combinations of these hyperparameters
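One way to explore those hyperparameter combinations is a grid search with cross-validation. A minimal sketch with scikit-learn; the data and grid values are illustrative, not the talk's actual demo:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for a churn dataset (10% churners).
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Illustrative grid over the random forest hyperparameters named above.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "max_leaf_nodes": [15, 31],
}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)
```

ROC AUC is used as the scoring metric here because plain accuracy is misleading on imbalanced churn data.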
DEMO Tuned model
In balance there is virtue
In balance there is virtue Churn is a rare (minority-class) event
In balance there is virtue OVER SAMPLING UNDER SAMPLING HYBRID METHODS
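A hedged sketch of the simplest of these techniques, random over-sampling: minority rows are duplicated until the classes match. Synthetic data; dedicated libraries such as imbalanced-learn provide SMOTE and hybrid methods on top of this idea:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = np.array([0] * 95 + [1] * 5)  # only 5% churners

# Over-sample the minority class (with replacement) to match the majority.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=int((y == 0).sum()), random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))  # -> [95 95]
```

Under-sampling would instead drop majority rows; hybrid methods combine both directions.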
In balance there is virtue Extra techniques (algorithm based) WEIGHT VECTORS BALANCED EXPLORATION
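The weight-vector idea can be sketched with scikit-learn's `class_weight` parameter, which reweights each class inversely to its frequency so the rare churn class is not drowned out during training. Synthetic data, for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Imbalanced stand-in: roughly 5% churners.
X, y = make_classification(n_samples=400, weights=[0.95, 0.05],
                           random_state=0)

# "balanced" assigns each class a weight proportional to 1 / its frequency,
# so errors on the minority class cost more during tree construction.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```

Unlike resampling, this leaves the data untouched and pushes the balancing into the algorithm itself.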
DEMO Balancing Churn
KISS
KISS
Tree-based methods Decision Trees Random Forests Boosted Trees (LightGBM, XGBoost, rxFastTrees…) DISCARD THE LEAST USED FEATURES GAIN COVER FREQUENCY* // Speaker note: all tree-based methods work by splitting the data space to find "purer" subspaces. That is where "Gain" comes from (each algorithm names it differently): it represents the reduction in entropy, i.e. the information gain, from producing subspaces where the classification is more certain. Cover represents the relative number of observations related to a feature. If you have 100 observations, 4 features and 3 trees, and feature 1 was used to decide the leaf nodes for 10, 5 and 2 observations in trees 1, 2 and 3 respectively, its cover is 10+5+2 = 17. Frequency is the number of times the feature is used. *Not all algorithms expose this metric.
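A minimal sketch of ranking features this way with scikit-learn, whose impurity-based `feature_importances_` is the analogue of "gain"; low-ranked features become candidates for removal. Synthetic data, for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 6 features, only 2 of them actually informative.
X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=2, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Importances sum to 1; sort descending to see which features carry the model.
order = np.argsort(clf.feature_importances_)[::-1]
print(order)
```

XGBoost and LightGBM expose the same idea under `importance_type="gain"`, `"cover"` or `"frequency"`/`"split"`.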
Recursive Feature Elimination Different combinations of features are tried iteratively and performance data is gathered Requires an estimator (model)
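A hedged sketch of RFE with scikit-learn on synthetic data: the estimator is fit repeatedly, and the weakest feature is dropped each round until the requested number remains:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=3, random_state=0)

# RFE requires an estimator that exposes feature importances or coefficients.
selector = RFE(DecisionTreeClassifier(random_state=0),
               n_features_to_select=3).fit(X, y)
print(selector.support_)  # boolean mask of the surviving features
```

`RFECV` does the same while cross-validating to pick the number of features automatically.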
DEMO A simpler life
How did we get here?
How did we get here? Churn is a business problem Business is key If our model is ONLY accurate, that is not enough
Why do we need to interpret? To understand the results we are receiving from our models Debug and improve models Avoid bias Prescriptive analytics
What can we see? Returned values Measures Regression Classification Measures Can we rely ONLY on measures?
Clever Hans // Speaker note: the horse that seemed to do arithmetic but was actually reading its trainer's cues; a model can likewise be right for the wrong reasons.
Partial dependence plots
DEMO Show me the model!
QUESTIONS?
THANK YOU!