Classification Models Reference

Complete reference for FormulaML classification models, including Logistic Regression, SVM, and Random Forest classifiers.

Functions for creating and training classification models to predict categorical outcomes.

ML.CLASSIFICATION Namespace

ML.CLASSIFICATION.LOGISTIC()

Creates a Logistic Regression classifier for binary and multi-class classification.

Syntax:

=ML.CLASSIFICATION.LOGISTIC(C, penalty, fit_intercept, max_iter, tol)

Parameters:

  • C (Number, Optional): Inverse regularization strength (default: 1.0)
    • Smaller values = stronger regularization
    • Must be positive
  • penalty (String, Optional): Regularization type (default: "l2")
    • "l1": Lasso regularization
    • "l2": Ridge regularization
    • "elasticnet": Combination of L1 and L2
    • "none": No regularization
  • fit_intercept (Boolean, Optional): Add intercept to decision function (default: TRUE)
  • max_iter (Integer, Optional): Maximum iterations for convergence (default: 100)
  • tol (Number, Optional): Tolerance for stopping criteria (default: 0.0001)

Returns: Logistic Regression classifier object

Use Case: Binary or multi-class classification with linear decision boundaries

Example:

# Basic logistic regression
Cell A1: =ML.CLASSIFICATION.LOGISTIC()
Result: <LogisticRegression>

# With L1 regularization
Cell A2: =ML.CLASSIFICATION.LOGISTIC(0.5, "l1")

# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)

# Make predictions
Cell C1: =ML.PREDICT(B1, X_test)
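
The result tag <LogisticRegression> and the matching parameter names and defaults suggest this function wraps scikit-learn's LogisticRegression. As a point of reference, here is a minimal Python sketch of the A2 example under that assumption (the dataset and solver choice are illustrative, not part of FormulaML):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Rough equivalent of =ML.CLASSIFICATION.LOGISTIC(0.5, "l1"); in scikit-learn
# the "l1" penalty needs a compatible solver such as liblinear or saga.
clf = LogisticRegression(C=0.5, penalty="l1", solver="liblinear",
                         fit_intercept=True, max_iter=100, tol=1e-4)
clf.fit(X_train, y_train)          # plays the role of =ML.FIT(A2, ...)
print(clf.predict(X_test)[:5])     # plays the role of =ML.PREDICT(...)
```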

ML.CLASSIFICATION.SVM()

Creates a Support Vector Machine (SVM) classifier with various kernel options.

Syntax:

=ML.CLASSIFICATION.SVM(C, kernel, degree, gamma, coef0)

Parameters:

  • C (Number, Optional): Regularization parameter (default: 1.0)
    • Larger values = less regularization
    • Must be positive
  • kernel (String, Optional): Kernel type (default: "rbf")
    • "linear": Linear kernel (for linearly separable data)
    • "poly": Polynomial kernel
    • "rbf": Radial basis function (most common)
    • "sigmoid": Sigmoid kernel
  • degree (Integer, Optional): Polynomial degree for "poly" kernel (default: 3)
  • gamma (String, Optional): Kernel coefficient (default: "scale")
    • "scale": 1 / (n_features * X.var())
    • "auto": 1 / n_features
  • coef0 (Number, Optional): Independent term for "poly"/"sigmoid" kernels (default: 0.0)

Returns: SVM classifier object

Use Case: Complex decision boundaries, high-dimensional data, kernel methods

Example:

# RBF kernel SVM (default)
Cell A1: =ML.CLASSIFICATION.SVM()
Result: <SVC>

# Linear SVM
Cell A2: =ML.CLASSIFICATION.SVM(1.0, "linear")

# Polynomial SVM
Cell A3: =ML.CLASSIFICATION.SVM(1.0, "poly", 3, "scale", 1.0)

# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
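
Like the logistic model, the <SVC> result tag and the parameter set mirror scikit-learn's SVC. A minimal sketch of the A3 polynomial example under that assumption (the dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Rough equivalent of =ML.CLASSIFICATION.SVM(1.0, "poly", 3, "scale", 1.0)
clf = SVC(C=1.0, kernel="poly", degree=3, gamma="scale", coef0=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy, as ML.EVAL.SCORE reports
```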

ML.CLASSIFICATION.RANDOM_FOREST_CLF() ⭐

Creates a Random Forest Classifier (Premium feature).

Syntax:

=ML.CLASSIFICATION.RANDOM_FOREST_CLF(n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, random_state)

Parameters:

  • n_estimators (Integer, Optional): Number of trees in forest (default: 100)
  • criterion (String, Optional): Split quality measure (default: "gini")
    • "gini": Gini impurity
    • "entropy": Information gain
    • "log_loss": Cross-entropy loss
  • max_depth (Integer, Optional): Maximum tree depth (default: None = unlimited)
  • min_samples_split (Integer, Optional): Min samples to split node (default: 2)
  • min_samples_leaf (Integer, Optional): Min samples at leaf (default: 1)
  • max_features (Number/String, Optional): Features per split (default: 1.0)
    • Integer: Exact number of features
    • Float: Fraction of features
    • "sqrt": Square root of total features
    • "log2": Log base 2 of total features
  • bootstrap (Boolean, Optional): Use bootstrap samples (default: TRUE)
  • random_state (Integer, Optional): Random seed for reproducibility

Returns: Random Forest Classifier object

Use Case: Complex patterns, feature importance, robust multi-class classification

Example:

# Basic Random Forest
Cell A1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF()
Result: <RandomForestClassifier>

# Optimized forest
Cell A2: =ML.CLASSIFICATION.RANDOM_FOREST_CLF(200, "entropy", 15, 5, 2, "sqrt", TRUE, 42)

# Train and predict
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
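
Again assuming a scikit-learn backend (the result tag is <RandomForestClassifier>), the "optimized forest" in A2 corresponds to the sketch below; feature_importances_ illustrates the feature-importance benefit cited in the use case:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Rough equivalent of
# =ML.CLASSIFICATION.RANDOM_FOREST_CLF(200, "entropy", 15, 5, 2, "sqrt", TRUE, 42)
clf = RandomForestClassifier(n_estimators=200, criterion="entropy",
                             max_depth=15, min_samples_split=5,
                             min_samples_leaf=2, max_features="sqrt",
                             bootstrap=True, random_state=42)
clf.fit(X_train, y_train)
print(clf.feature_importances_)    # per-feature importance scores
```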

Common Patterns

Binary Classification

# Load Iris dataset (for a strictly binary problem, first filter to two of its three classes)
Cell A1: =ML.DATASETS.IRIS()

# Separate features and target
Cell B1: =ML.DATA.SELECT_COLUMNS(A1, {0,1,2,3})
Cell C1: =ML.DATA.SELECT_COLUMNS(A1, 4)

# Split train/test
Cell D1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.3, 42, 0)  # Train X
Cell D2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.3, 42, 1)  # Test X
Cell E1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.3, 42, 0)  # Train y
Cell E2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.3, 42, 1)  # Test y

# Create and train model
Cell F1: =ML.CLASSIFICATION.LOGISTIC(1.0, "l2")
Cell G1: =ML.FIT(F1, D1, E1)

# Predict and evaluate
Cell H1: =ML.PREDICT(G1, D2)
Cell I1: =ML.EVAL.SCORE(G1, D2, E2)
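
Note that the same seed (42) and split ratio are passed to all four TRAIN_TEST_SPLIT calls so the X and y partitions line up; the final argument selects which partition each cell returns. In scikit-learn terms (a sketch, not FormulaML's implementation), the whole D/E block collapses to one call:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# One call yields all four partitions; the sheet's repeated seed (42)
# is what keeps its separate X and y splits aligned with each other.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(C=1.0, penalty="l2",
                         max_iter=1000)   # raised from 100 so lbfgs converges on raw features
clf.fit(X_train, y_train)                 # G1
print(clf.score(X_test, y_test))          # I1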

SVM with Preprocessing Pipeline

# Create preprocessing and model
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.CLASSIFICATION.SVM(1.0, "rbf")

# Create pipeline
Cell B1: =ML.PIPELINE(A1, A2)

# Train pipeline
Cell C1: =ML.FIT(B1, X_train, y_train)

# Predict
Cell D1: =ML.PREDICT(C1, X_test)

# Get accuracy
Cell E1: =ML.EVAL.SCORE(C1, X_test, y_test)
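
If ML.PIPELINE behaves like a scikit-learn Pipeline, which the scaler-then-model ordering suggests, the key property is that the scaler is fit on the training data only and then reused at prediction time. A sketch under that assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scaler + SVM chained; fit() learns the scaling from X_train only,
# and score()/predict() apply that same scaling to X_test.
pipe = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```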

Multi-Class Classification

# Load digits dataset (10 classes)
Cell A1: =ML.DATASETS.DIGITS()

# Prepare data
Cell B1: =ML.DATA.SELECT_COLUMNS(A1, "0:63")  # Features
Cell C1: =ML.DATA.SELECT_COLUMNS(A1, 64)      # Target

# Split data
Cell D1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 0)
Cell D2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 1)
Cell E1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 0)
Cell E2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 1)

# Create Random Forest for multi-class
Cell F1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF(100, "entropy", , , , "sqrt", TRUE, 42)
Cell G1: =ML.FIT(F1, D1, E1)

# Predict and evaluate
Cell H1: =ML.PREDICT(G1, D2)
Cell I1: =ML.EVAL.SCORE(G1, D2, E2)
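
For multi-class problems a single accuracy number can hide weak classes. Here is a hedged scikit-learn sketch of the same digits run, extended with a per-class report (classification_report is a scikit-learn utility, not a documented FormulaML function):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, criterion="entropy",
                             max_features="sqrt", random_state=42)
clf.fit(X_train, y_train)
# Precision/recall/F1 for each of the 10 digit classes.
print(classification_report(y_test, clf.predict(X_test)))
```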

Comparing Classification Models

# Create multiple classifiers
Cell A1: =ML.CLASSIFICATION.LOGISTIC()
Cell A2: =ML.CLASSIFICATION.SVM(1.0, "linear")
Cell A3: =ML.CLASSIFICATION.SVM(1.0, "rbf")
Cell A4: =ML.CLASSIFICATION.RANDOM_FOREST_CLF(100)

# Train all models
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell B2: =ML.FIT(A2, X_train, y_train)
Cell B3: =ML.FIT(A3, X_train, y_train)
Cell B4: =ML.FIT(A4, X_train, y_train)

# Compare accuracy scores
Cell C1: =ML.EVAL.SCORE(B1, X_test, y_test)  # Logistic
Cell C2: =ML.EVAL.SCORE(B2, X_test, y_test)  # Linear SVM
Cell C3: =ML.EVAL.SCORE(B3, X_test, y_test)  # RBF SVM
Cell C4: =ML.EVAL.SCORE(B4, X_test, y_test)  # Random Forest
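
The same comparison in Python form, assuming the scikit-learn mapping above; the point is that every model is scored on the identical train/test split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Logistic":      LogisticRegression(max_iter=1000),
    "Linear SVM":    SVC(C=1.0, kernel="linear"),
    "RBF SVM":       SVC(C=1.0, kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    # Identical split for every model, mirroring cells B1:C4.
    print(name, model.fit(X_train, y_train).score(X_test, y_test))
```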

Decision Boundary Visualization

# Train a classifier
Cell A1: =ML.CLASSIFICATION.SVM(1.0, "rbf")
Cell B1: =ML.FIT(A1, X_train, y_train)

# Extract decision boundary for first two features
Cell C1: =ML.INSPECT.DECISION_BOUNDARY(B1, X_train, "predict", 0.05, {0,1}, {0,1})

# Result is DataFrame with boundary coordinates
# Can be plotted in Excel scatter chart
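
The exact layout of ML.INSPECT.DECISION_BOUNDARY's DataFrame is not shown here, but the standard way to produce such data is to predict over a regular grid of the two chosen features. A sketch of that technique (the 0.05 step mirrors the formula's grid-resolution argument):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Train on the first two features only, so the boundary lives in 2-D.
X, y = load_iris(return_X_y=True)
X2 = X[:, :2]
clf = SVC(C=1.0, kernel="rbf").fit(X2, y)

# Predict the class at every point of a 0.05-spaced grid.
xx, yy = np.meshgrid(np.arange(X2[:, 0].min(), X2[:, 0].max(), 0.05),
                     np.arange(X2[:, 1].min(), X2[:, 1].max(), 0.05))
labels = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# (xx, yy, labels) are the coordinates a scatter or contour chart plots.
```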

Grid Search for Best Classifier

# Create SVM model
Cell A1: =ML.CLASSIFICATION.SVM()

# Parameter grid
# Model | Parameter | Value1 | Value2 | Value3
Cell B1: "model" | "C" | 0.1 | 1 | 10
Cell B2: "model" | "kernel" | "linear" | "rbf" | "poly"
Cell B3: "model" | "gamma" | "scale" | "auto" |

# Grid search with accuracy scoring
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:F3, "accuracy", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)

# Get best parameters and score
Cell E1: =ML.EVAL.BEST_PARAMS(D1)
Cell F1: =ML.EVAL.BEST_SCORE(D1)

# Get detailed results
Cell G1: =ML.EVAL.SEARCH_RESULTS(D1)
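
Assuming ML.EVAL.GRID_SEARCH wraps scikit-learn's GridSearchCV, the B1:F3 range corresponds to the param_grid below: 3 C values x 3 kernels x 2 gamma settings = 18 candidates, each cross-validated 5-fold:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

param_grid = {"C": [0.1, 1, 10],
              "kernel": ["linear", "rbf", "poly"],
              "gamma": ["scale", "auto"]}
search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

print(search.best_params_)   # cf. =ML.EVAL.BEST_PARAMS(D1)
print(search.best_score_)    # cf. =ML.EVAL.BEST_SCORE(D1)
```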

Tips and Best Practices

  1. Model Selection

    • Logistic Regression: Linear boundaries, interpretable
    • Linear SVM: Similar boundaries to logistic regression, but trained with a margin-based (hinge) loss
    • RBF SVM: Complex non-linear boundaries
    • Random Forest: Feature importance, robust to outliers
  2. Feature Scaling

    • Always scale for Logistic Regression and SVM (see the sketch after this list)
    • Not required for Random Forest
    • Use StandardScaler or MinMaxScaler
  3. SVM Kernel Selection

    • Start with RBF kernel (most versatile)
    • Use linear kernel for high-dimensional data
    • Polynomial for specific polynomial relationships
    • Tune C and gamma for RBF kernel
  4. Random Forest Optimization

    • More trees = better performance but slower
    • Limit max_depth to prevent overfitting
    • Use bootstrap=TRUE for better generalization
    • Set random_state for reproducibility
  5. Regularization

    • Higher C (Logistic/SVM) = less regularization
    • Lower C = more regularization, simpler model
    • Use cross-validation to find optimal C
  6. Evaluation Metrics

    • Use accuracy for balanced datasets
    • Consider precision/recall for imbalanced data
    • Compare multiple models on same test set
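
To make the scaling advice in tip 2 concrete, here is a small scikit-learn demonstration (the wine dataset is illustrative; its features span very different ranges, which is exactly when an unscaled SVM suffers):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same RBF SVM, with and without standardization of the features.
raw = SVC().fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
print("unscaled:", raw.score(X_test, y_test))
print("scaled:  ", scaled.score(X_test, y_test))
```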