First Midterm -> Topic 1: Data Mining, Universidad del Cauca
There are many methods for the Instance Selection (IS) problem. The best known is a greedy algorithm called the Condensed Nearest Neighbor Rule (CNN). CNN builds a subset S of the training set T such that every example in T is closer to an example in S of its own class than to any example in S of a different class.
The algorithm starts by selecting one instance of each class from T and inserting them into S. Then, each instance of T is classified with 1-NN using only the instances currently in S. If an instance is misclassified, it is added to S, which guarantees it will be classified correctly from then on. This process repeats until no instance in T is misclassified.
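A minimal sketch of this procedure in Python (illustrative only; the function name cnn_select and the NumPy-based 1-NN search are my own choices, not from the original material):

```python
import numpy as np

def cnn_select(X, y, rng=None):
    """Condensed Nearest Neighbor (Hart, 1968): greedily build a
    subset S of the training set (X, y) that is consistent for 1-NN."""
    rng = np.random.default_rng(rng)
    # Seed S with one instance of each class, as described above.
    selected = [int(rng.choice(np.flatnonzero(y == c))) for c in np.unique(y)]

    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in selected:
                continue
            # 1-NN classification of instance i using only S.
            S = np.array(selected)
            nearest = S[np.argmin(np.linalg.norm(X[S] - X[i], axis=1))]
            if y[nearest] != y[i]:
                # Misclassified: add it to S so it is handled correctly.
                selected.append(i)
                changed = True
    return np.array(selected)
```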
Example: Designing a Classifier for Iris
A simple and well-known problem: iris classification.
Three classes of iris: setosa, versicolor, and virginica.
Four attributes: petal length and width, and sepal length and width.
150 examples, 50 per class.
Available at:
[Figure: sample flowers of the three classes: Setosa, Versicolor, Virginica]
Example of selected subsets on Iris: the full training set (reduction: 0%) achieves 95.33% test accuracy; the CNN-selected subset (reduction: 97.78%) achieves 93.33% test accuracy.
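A quick way to reproduce this kind of measurement, as a sketch: it assumes scikit-learn is available and reuses the cnn_select function above; the exact train/test split used on the slide is not stated, so the numbers will differ.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

kept = cnn_select(X_train, y_train, rng=0)
reduction = 100 * (1 - len(kept) / len(X_train))

# Evaluate a 1-NN classifier that uses only the condensed subset.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train[kept], y_train[kept])
accuracy = 100 * clf.score(X_test, y_test)
print(f"Reduction: {reduction:.2f}%  Test accuracy: {accuracy:.2f}%")
```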
Condensed Nearest Neighbour (CNN), Hart 1968
- Incremental; order dependent.
- Neither minimal nor decision-boundary consistent.
- O(n³) for the brute-force method.
- Can follow up with Reduced NN [Gates72]: remove a sample if doing so does not cause any incorrect classifications (sketched below).
Algorithm:
1. Initialize the subset with a single training example.
2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset.
3. Return to step 2 until no transfers occurred or the subset is full.
Produces a consistent set.
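A sketch of the Reduced NN follow-up [Gates72], under the same assumptions as the cnn_select sketch above; rnn_reduce is an illustrative name, not from the original material:

```python
import numpy as np

def rnn_reduce(X, y, selected):
    """Reduced Nearest Neighbor (Gates, 1972): try dropping each selected
    sample; keep the drop only if the remaining subset still classifies
    every instance in (X, y) correctly with 1-NN."""
    kept = list(selected)
    for idx in list(kept):
        candidate = [j for j in kept if j != idx]
        if not candidate:
            continue
        S = np.array(candidate)
        # Check 1-NN consistency of the reduced subset on the full set.
        consistent = all(
            y[S[np.argmin(np.linalg.norm(X[S] - X[i], axis=1))]] == y[i]
            for i in range(len(X))
        )
        if consistent:
            kept = candidate  # removal caused no misclassification
    return np.array(kept)
```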