Classification Models Reference
Functions for creating and training classification models to predict categorical outcomes.
ML.CLASSIFICATION Namespace
ML.CLASSIFICATION.LOGISTIC()
Creates a Logistic Regression classifier for binary and multi-class classification.
Syntax:
=ML.CLASSIFICATION.LOGISTIC(C, penalty, fit_intercept, max_iter, tol)
Parameters:
C
(Number, Optional): Inverse regularization strength (default: 1.0)
- Smaller values = stronger regularization
- Must be positive
penalty
(String, Optional): Regularization type (default: "l2")
- "l1": Lasso regularization
- "l2": Ridge regularization
- "elasticnet": Combination of L1 and L2
- "none": No regularization
fit_intercept
(Boolean, Optional): Add intercept to decision function (default: TRUE)
max_iter
(Integer, Optional): Maximum iterations for convergence (default: 100)
tol
(Number, Optional): Tolerance for stopping criteria (default: 0.0001)
Returns: Logistic Regression classifier object
Use Case: Binary or multi-class classification with linear decision boundaries
Example:
# Basic logistic regression
Cell A1: =ML.CLASSIFICATION.LOGISTIC()
Result: <LogisticRegression>
# With L1 regularization
Cell A2: =ML.CLASSIFICATION.LOGISTIC(0.5, "l1")
# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
# Make predictions
Cell C1: =ML.PREDICT(B1, X_test)
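The parameter names above match scikit-learn's LogisticRegression, and the result cell displays <LogisticRegression>, so a reasonable reading is that the function wraps that estimator. A minimal Python sketch under that assumption, useful for verifying behavior outside the spreadsheet (the solver choice is ours; liblinear is one solver that supports the L1 penalty):
# Assumed scikit-learn equivalent of the example above (not part of the add-in)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Keep two Iris classes so the problem is binary; liblinear supports "l1"
X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(C=0.5, penalty="l1", solver="liblinear",
                           fit_intercept=True, max_iter=100, tol=0.0001)
model.fit(X_train, y_train)   # corresponds to ML.FIT
print(model.predict(X_test))  # corresponds to ML.PREDICT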
ML.CLASSIFICATION.SVM()
Creates a Support Vector Machine (SVM) classifier with various kernel options.
Syntax:
=ML.CLASSIFICATION.SVM(C, kernel, degree, gamma, coef0)
Parameters:
C
(Number, Optional): Regularization parameter (default: 1.0)
- Larger values = less regularization
- Must be positive
kernel
(String, Optional): Kernel type (default: "rbf")
- "linear": Linear kernel (for linearly separable data)
- "poly": Polynomial kernel
- "rbf": Radial basis function (most common)
- "sigmoid": Sigmoid kernel
degree
(Integer, Optional): Polynomial degree for the "poly" kernel (default: 3)
gamma
(String, Optional): Kernel coefficient (default: "scale")
- "scale": 1 / (n_features * X.var())
- "auto": 1 / n_features
coef0
(Number, Optional): Independent term for the "poly" and "sigmoid" kernels (default: 0.0)
Returns: SVM classifier object
Use Case: Complex decision boundaries, high-dimensional data, kernel methods
Example:
# RBF kernel SVM (default)
Cell A1: =ML.CLASSIFICATION.SVM()
Result: <SVC>
# Linear SVM
Cell A2: =ML.CLASSIFICATION.SVM(1.0, "linear")
# Polynomial SVM
Cell A3: =ML.CLASSIFICATION.SVM(1.0, "poly", 3, "scale", 1.0)
# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
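The two gamma strings resolve to concrete numbers from the training data, using the formulas listed above. A short sketch, assuming the function wraps scikit-learn's SVC (the result cell displays <SVC>), that computes both values and mirrors the polynomial example:
# Assumed scikit-learn equivalent (not part of the add-in)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The documented gamma settings resolve to these coefficients:
gamma_scale = 1.0 / (X.shape[1] * X.var())  # "scale"
gamma_auto = 1.0 / X.shape[1]               # "auto"
print(gamma_scale, gamma_auto)

# Mirrors =ML.CLASSIFICATION.SVM(1.0, "poly", 3, "scale", 1.0)
model = SVC(C=1.0, kernel="poly", degree=3, gamma="scale", coef0=1.0)
model.fit(X, y)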
ML.CLASSIFICATION.RANDOM_FOREST_CLF() ⭐
Creates a Random Forest Classifier (Premium feature).
Syntax:
=ML.CLASSIFICATION.RANDOM_FOREST_CLF(n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, random_state)
Parameters:
n_estimators
(Integer, Optional): Number of trees in the forest (default: 100)
criterion
(String, Optional): Split quality measure (default: "gini")
- "gini": Gini impurity
- "entropy": Information gain
- "log_loss": Cross-entropy loss
max_depth
(Integer, Optional): Maximum tree depth (default: None = unlimited)
min_samples_split
(Integer, Optional): Minimum samples required to split a node (default: 2)
min_samples_leaf
(Integer, Optional): Minimum samples required at a leaf (default: 1)
max_features
(Number/String, Optional): Features considered per split (default: 1.0)
- Integer: Exact number of features
- Float: Fraction of features
- "sqrt": Square root of total features
- "log2": Log base 2 of total features
bootstrap
(Boolean, Optional): Use bootstrap samples (default: TRUE)
random_state
(Integer, Optional): Random seed for reproducibility
Returns: Random Forest Classifier object
Use Case: Complex patterns, feature importance, robust multi-class classification
Example:
# Basic Random Forest
Cell A1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF()
Result: <RandomForestClassifier>
# Optimized forest
Cell A2: =ML.CLASSIFICATION.RANDOM_FOREST_CLF(200, "entropy", 15, 5, 2, "sqrt", TRUE, 42)
# Train and predict
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
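The use case above mentions feature importance. Assuming the function wraps scikit-learn's RandomForestClassifier (the result cell displays <RandomForestClassifier>), a minimal sketch of the "optimized forest" example that also reads the fitted importances:
# Assumed scikit-learn equivalent (not part of the add-in)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Mirrors =ML.CLASSIFICATION.RANDOM_FOREST_CLF(200, "entropy", 15, 5, 2, "sqrt", TRUE, 42)
model = RandomForestClassifier(n_estimators=200, criterion="entropy", max_depth=15,
                               min_samples_split=5, min_samples_leaf=2,
                               max_features="sqrt", bootstrap=True, random_state=42)
model.fit(X, y)
print(model.feature_importances_)  # one importance value per feature column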
Common Patterns
Binary Classification
# Load Iris dataset (three classes; for a strictly binary problem, filter to two classes first)
Cell A1: =ML.DATASETS.IRIS()
# Separate features and target
Cell B1: =ML.DATA.SELECT_COLUMNS(A1, {0,1,2,3})
Cell C1: =ML.DATA.SELECT_COLUMNS(A1, 4)
# Split train/test
Cell D1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.3, 42, 0) # Train X
Cell D2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.3, 42, 1) # Test X
Cell E1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.3, 42, 0) # Train y
Cell E2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.3, 42, 1) # Test y
# Create and train model
Cell F1: =ML.CLASSIFICATION.LOGISTIC(1.0, "l2")
Cell G1: =ML.FIT(F1, D1, E1)
# Predict and evaluate
Cell H1: =ML.PREDICT(G1, D2)
Cell I1: =ML.EVAL.SCORE(G1, D2, E2)
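The four TRAIN_TEST_SPLIT calls above depend on the shared seed (42) to keep the X and y rows aligned across calls. Assuming the function wraps scikit-learn's train_test_split, the whole pattern condenses to one Python call that returns all four pieces at once:
# Assumed scikit-learn equivalent of the split-and-train pattern (not part of the add-in)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# One call returns train X, test X, train y, test y with rows aligned
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# max_iter raised beyond the documented default to ensure convergence on unscaled data
model = LogisticRegression(C=1.0, penalty="l2", max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # corresponds to ML.EVAL.SCORE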
SVM with Preprocessing Pipeline
# Create preprocessing and model
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.CLASSIFICATION.SVM(1.0, "rbf")
# Create pipeline
Cell B1: =ML.PIPELINE(A1, A2)
# Train pipeline
Cell C1: =ML.FIT(B1, X_train, y_train)
# Predict
Cell D1: =ML.PREDICT(C1, X_test)
# Get accuracy
Cell E1: =ML.EVAL.SCORE(C1, X_test, y_test)
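A pipeline fits the scaler on the training data and reuses those statistics at prediction time, which avoids leaking test-set information. A minimal sketch of the same chain, assuming ML.PIPELINE corresponds to a scikit-learn Pipeline:
# Assumed scikit-learn equivalent of the pipeline above (not part of the add-in)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# fit() scales then trains; predict() and score() scale then evaluate
pipe = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))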
Multi-Class Classification
# Load digits dataset (10 classes)
Cell A1: =ML.DATASETS.DIGITS()
# Prepare data
Cell B1: =ML.DATA.SELECT_COLUMNS(A1, "0:63") # Features
Cell C1: =ML.DATA.SELECT_COLUMNS(A1, 64) # Target
# Split data
Cell D1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 0)
Cell D2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 1)
Cell E1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 0)
Cell E2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 1)
# Create Random Forest for multi-class
Cell F1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF(100, "entropy", , , , "sqrt", TRUE, 42)
Cell G1: =ML.FIT(F1, D1, E1)
# Predict and evaluate
Cell H1: =ML.PREDICT(G1, D2)
Cell I1: =ML.EVAL.SCORE(G1, D2, E2)
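Nothing in the workflow changes for ten classes: the same FIT, PREDICT, and SCORE calls apply, and the Random Forest handles multi-class targets natively. A sketch of the equivalent Python, assuming ML.DATASETS.DIGITS() mirrors scikit-learn's load_digits:
# Assumed scikit-learn equivalent of the multi-class pattern (not part of the add-in)
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # 64 pixel features, 10 classes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# No one-vs-rest setup needed; the forest predicts all 10 classes directly
model = RandomForestClassifier(n_estimators=100, criterion="entropy",
                               max_features="sqrt", bootstrap=True, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))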
Comparing Classification Models
# Create multiple classifiers
Cell A1: =ML.CLASSIFICATION.LOGISTIC()
Cell A2: =ML.CLASSIFICATION.SVM(1.0, "linear")
Cell A3: =ML.CLASSIFICATION.SVM(1.0, "rbf")
Cell A4: =ML.CLASSIFICATION.RANDOM_FOREST_CLF(100)
# Train all models
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell B2: =ML.FIT(A2, X_train, y_train)
Cell B3: =ML.FIT(A3, X_train, y_train)
Cell B4: =ML.FIT(A4, X_train, y_train)
# Compare accuracy scores
Cell C1: =ML.EVAL.SCORE(B1, X_test, y_test) # Logistic
Cell C2: =ML.EVAL.SCORE(B2, X_test, y_test) # Linear SVM
Cell C3: =ML.EVAL.SCORE(B3, X_test, y_test) # RBF SVM
Cell C4: =ML.EVAL.SCORE(B4, X_test, y_test) # Random Forest
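The same comparison is a short loop in Python, assuming the scikit-learn mapping used throughout this page. The max_iter value for the logistic model is our addition, to avoid convergence warnings on unscaled data:
# Assumed scikit-learn equivalent of the comparison above (not part of the add-in)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Logistic": LogisticRegression(max_iter=1000),
    "Linear SVM": SVC(C=1.0, kernel="linear"),
    "RBF SVM": SVC(C=1.0, kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)  # train each on the same split
    print(f"{name}: {model.score(X_test, y_test):.3f}")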
Decision Boundary Visualization
# Train a classifier
Cell A1: =ML.CLASSIFICATION.SVM(1.0, "rbf")
Cell B1: =ML.FIT(A1, X_train, y_train)
# Extract decision boundary for first two features
Cell C1: =ML.INSPECT.DECISION_BOUNDARY(B1, X_train, "predict", 0.05, {0,1}, {0,1})
# The result is a DataFrame of boundary coordinates
# that can be plotted in an Excel scatter chart
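The exact signature of ML.INSPECT.DECISION_BOUNDARY is specific to this add-in, but the underlying idea is standard: evaluate the trained model on a fine grid over two features (here with the 0.05 step from the formula) and tabulate the predicted class at each grid point. A sketch of that computation:
# Sketch of the decision-boundary grid the formula plausibly computes (assumptions noted above)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X2 = X[:, :2]  # first two features, matching the {0,1} column selection
model = SVC(C=1.0, kernel="rbf").fit(X2, y)

# Build a grid over the feature ranges with step 0.05
xx, yy = np.meshgrid(np.arange(X2[:, 0].min(), X2[:, 0].max(), 0.05),
                     np.arange(X2[:, 1].min(), X2[:, 1].max(), 0.05))
grid = np.c_[xx.ravel(), yy.ravel()]
labels = model.predict(grid)  # one predicted class per grid point
# (x, y, class) triples like these are what a boundary DataFrame would hold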
Grid Search for Best Classifier
# Create SVM model
Cell A1: =ML.CLASSIFICATION.SVM()
# Parameter grid
# Model | Parameter | Value1 | Value2 | Value3
Cell B1: "model" | "C" | 0.1 | 1 | 10
Cell B2: "model" | "kernel" | "linear" | "rbf" | "poly"
Cell B3: "model" | "gamma" | "scale" | "auto" |
# Grid search with accuracy scoring
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:F3, "accuracy", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)
# Get best parameters and score
Cell E1: =ML.EVAL.BEST_PARAMS(D1)
Cell F1: =ML.EVAL.BEST_SCORE(D1)
# Get detailed results
Cell G1: =ML.EVAL.SEARCH_RESULTS(D1)
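The grid rows above enumerate 3 C values, 3 kernels, and 2 gamma settings, so the search fits every combination with 5-fold cross-validation. A sketch of the equivalent Python, assuming the function wraps scikit-learn's GridSearchCV:
# Assumed scikit-learn equivalent of the grid search above (not part of the add-in)
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same grid as rows B1:F3 (the linear kernel simply ignores gamma)
param_grid = {"C": [0.1, 1, 10],
              "kernel": ["linear", "rbf", "poly"],
              "gamma": ["scale", "auto"]}
search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)
search.fit(X, y)
print(search.best_params_)  # corresponds to ML.EVAL.BEST_PARAMS
print(search.best_score_)   # corresponds to ML.EVAL.BEST_SCORE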
Tips and Best Practices
- Model Selection
  - Logistic Regression: Linear boundaries, interpretable
  - Linear SVM: Similar to logistic regression, but with a different optimization objective
  - RBF SVM: Complex non-linear boundaries
  - Random Forest: Feature importance, robust to outliers
- Feature Scaling
  - Always scale features for Logistic Regression and SVM
  - Not required for Random Forest
  - Use StandardScaler or MinMaxScaler
- SVM Kernel Selection
  - Start with the RBF kernel (most versatile)
  - Use the linear kernel for high-dimensional data
  - Use the polynomial kernel for known polynomial relationships
  - Tune C and gamma for the RBF kernel
- Random Forest Optimization
  - More trees = better performance, but slower training
  - Limit max_depth to prevent overfitting
  - Use bootstrap=TRUE for better generalization
  - Set random_state for reproducibility
- Regularization
  - Higher C (Logistic/SVM) = less regularization
  - Lower C = more regularization, simpler model
  - Use cross-validation to find the optimal C (see the sketch after this list)
- Evaluation Metrics
  - Use accuracy for balanced datasets
  - Consider precision/recall for imbalanced data
  - Compare multiple models on the same test set
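The scaling and regularization tips combine naturally: put the scaler inside a pipeline so each cross-validation fold scales only on its own training portion, then score a few C values. A minimal sketch, again assuming the scikit-learn mapping used throughout this page:
# Cross-validated selection of C with scaling inside the pipeline (assumed mapping)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Smaller C = stronger regularization; pick the C with the best mean CV accuracy
for C in [0.01, 0.1, 1.0, 10.0]:
    pipe = make_pipeline(StandardScaler(), LogisticRegression(C=C))
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"C={C}: mean accuracy {np.mean(scores):.3f}")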
Related Functions
- ML.FIT() - Train classification models
- ML.PREDICT() - Make class predictions
- ML.EVAL.SCORE() - Get accuracy score
- ML.PREPROCESSING Functions - Scale and encode data
- ML.INSPECT.DECISION_BOUNDARY() - Visualize decision boundaries