Evaluation Functions Reference

Complete reference for FormulaML evaluation functions: scoring model performance, cross-validation, and hyperparameter tuning.

ML.EVAL Namespace

ML.EVAL.SCORE()

Evaluates model performance on test data.

Syntax:

=ML.EVAL.SCORE(model, X, y)

Parameters:

  • model (Object, Required): Trained model object
  • X (Object, Required): Test features
  • y (Object, Required): True target values

Returns: Float score value

  • Regression: R² score (coefficient of determination)
  • Classification: Mean accuracy

Use Case: Quick model performance evaluation

Example:

# Evaluate regression model
Cell A1: =ML.EVAL.SCORE(trained_regression, X_test, y_test)
Result: 0.85  # R² score

# Evaluate classifier
Cell B1: =ML.EVAL.SCORE(trained_classifier, X_test, y_test)
Result: 0.92  # Accuracy

ML.EVAL.CV_SCORE() ⭐

Performs cross-validation on a model (Premium feature).

Syntax:

=ML.EVAL.CV_SCORE(model, X, y, cv, scoring)

Parameters:

  • model (Object, Required): Unfitted model object
  • X (Object, Required): Training features
  • y (Object, Required): Training target
  • cv (Integer, Required): Number of cross-validation folds
  • scoring (String, Required): Scoring metric
    • Regression: "r2", "neg_mean_squared_error", "neg_mean_absolute_error"
    • Classification: "accuracy", "precision", "recall", "f1"

Returns: Array of scores (one per fold)

Use Case: Robust model evaluation, detect overfitting

Example:

# 5-fold cross-validation
Cell A1: =ML.EVAL.CV_SCORE(model, X_train, y_train, 5, "accuracy")
Result: [0.89, 0.91, 0.88, 0.90, 0.92]  # 5 scores

# Average CV score
Cell B1: =AVERAGE(A1#)
Result: 0.90

ML.EVAL.GRID_SEARCH() ⭐

Performs exhaustive hyperparameter search (Premium feature).

Syntax:

=ML.EVAL.GRID_SEARCH(model, param_grid, scoring, cv, refit)

Parameters:

  • model (Object, Required): Unfitted model or pipeline
  • param_grid (DataFrame, Required): Parameter combinations to test
    • Format: Model | Parameter | Value1 | Value2 | …
  • scoring (String/Array, Optional): Scoring metric(s)
  • cv (Integer, Optional): Cross-validation folds
  • refit (Boolean, Optional): Refit best model on full data (default: TRUE)

Returns: GridSearchCV object with best model

Use Case: Find optimal hyperparameters automatically

Example:

# Create parameter grid
# Cell B1:F3
# Model      | Parameter | Value1 | Value2 | Value3
# model      | C         | 0.1    | 1      | 10
# model      | kernel    | linear | rbf    |

Cell A1: =ML.CLASSIFICATION.SVM()
Cell A2: =ML.EVAL.GRID_SEARCH(A1, B1:F3, "accuracy", 5, TRUE)
Cell A3: =ML.FIT(A2, X_train, y_train)

# Now A3 contains best model
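
Because refit is TRUE, the fitted object in A3 behaves like an ordinary trained model, so it can be scored and used for predictions directly. A short, illustrative continuation of the example above using functions documented in this reference:

# Use the refitted best model like any other trained model
Cell A4: =ML.EVAL.SCORE(A3, X_test, y_test)
Cell A5: =ML.PREDICT(A3, X_test)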

ML.EVAL.BEST_PARAMS() ⭐

Extracts best parameters from grid search (Premium feature).

Syntax:

=ML.EVAL.BEST_PARAMS(grid_search_model)

Parameters:

  • grid_search_model (Object, Required): Fitted GridSearchCV object

Returns: DataFrame with best parameters

  • Columns: Model | Parameter | Value

Use Case: Identify optimal hyperparameters

Example:

# After grid search
Cell A1: =ML.EVAL.BEST_PARAMS(fitted_grid_search)
Result:
# Model | Parameter | Value
# model | C         | 10
# model | kernel    | rbf

ML.EVAL.BEST_SCORE() ⭐

Gets the best cross-validation score from grid search (Premium feature).

Syntax:

=ML.EVAL.BEST_SCORE(grid_search_model)

Parameters:

  • grid_search_model (Object, Required): Fitted GridSearchCV object

Returns: Float - best CV score achieved

Use Case: Compare grid search results

Example:

Cell A1: =ML.EVAL.BEST_SCORE(fitted_grid_search)
Result: 0.9456  # Best cross-validation score

ML.EVAL.SEARCH_RESULTS() ⭐

Returns detailed grid search results (Premium feature).

Syntax:

=ML.EVAL.SEARCH_RESULTS(grid_search_model)

Parameters:

  • grid_search_model (Object, Required): Fitted GridSearchCV object

Returns: DataFrame with all parameter combinations and scores

Use Case: Analyze all tested combinations, identify patterns

Example:

Cell A1: =ML.EVAL.SEARCH_RESULTS(fitted_grid_search)
# Returns table with all parameter combos and their scores
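
Because the results come back as a DataFrame, they can be inspected with the same helpers used elsewhere in this reference. A minimal sketch, assuming ML.DATA.SAMPLE accepts the results object the same way it accepts predictions; the exact column layout of the table is not specified here:

# A1 holds the results table from the example above
Cell B1: =ML.DATA.SAMPLE(A1, 10)

# Once the results spill into the sheet, ordinary spreadsheet functions
# (SORT, FILTER, MAX) can be used to rank the tested combinations by score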

Common Patterns

Basic Model Evaluation

# Train model
Cell A1: =ML.REGRESSION.LINEAR()
Cell B1: =ML.FIT(A1, X_train, y_train)

# Evaluate on test set
Cell C1: =ML.EVAL.SCORE(B1, X_test, y_test)
Result: 0.847  # R² score

# Check predictions
Cell D1: =ML.PREDICT(B1, X_test)
Cell E1: =ML.DATA.SAMPLE(D1, 10)

Cross-Validation Workflow

# Create model
Cell A1: =ML.CLASSIFICATION.SVM(1.0, "rbf")

# 10-fold cross-validation
Cell B1: =ML.EVAL.CV_SCORE(A1, X_train, y_train, 10, "accuracy")

# Calculate mean and std
Cell C1: =AVERAGE(B1#)  # Mean: 0.913
Cell C2: =STDEV(B1#)    # Std: 0.032

# Final training on full dataset
Cell D1: =ML.FIT(A1, X_train, y_train)
Cell E1: =ML.EVAL.SCORE(D1, X_test, y_test)

Complete Grid Search Workflow

# Create base model
Cell A1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF()

# Define parameter grid
# Model | Parameter        | V1  | V2  | V3
Cell B1: "model" | "n_estimators"    | 50  | 100 | 200
Cell B2: "model" | "max_depth"       | 5   | 10  | 20
Cell B3: "model" | "min_samples_split" | 2 | 5   | 10

# Grid search
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:F3, "accuracy", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)

# Get best parameters
Cell E1: =ML.EVAL.BEST_PARAMS(D1)
Cell E2: =ML.EVAL.BEST_SCORE(D1)

# Evaluate on test set
Cell F1: =ML.EVAL.SCORE(D1, X_test, y_test)

# Detailed results
Cell G1: =ML.EVAL.SEARCH_RESULTS(D1)

Grid Search with a Pipeline

# Create pipeline
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.CLASSIFICATION.SVM()
Cell B1: =ML.PIPELINE(A1, A2)

# Pipeline parameter grid (use step__param format)
# Model | Parameter | V1       | V2    | V3
Cell C1: "model" | "C"      | 0.1   | 1.0 | 10
Cell C2: "model" | "kernel" | "linear" | "rbf" |

# Grid search on pipeline
Cell D1: =ML.EVAL.GRID_SEARCH(B1, C1:G2, "accuracy", 5, TRUE)
Cell E1: =ML.FIT(D1, X_train, y_train)

# Best params and score
Cell F1: =ML.EVAL.BEST_PARAMS(E1)
Cell F2: =ML.EVAL.BEST_SCORE(E1)

Comparing Multiple Models

# Create different models
Cell A1: =ML.CLASSIFICATION.LOGISTIC()
Cell A2: =ML.CLASSIFICATION.SVM()
Cell A3: =ML.CLASSIFICATION.RANDOM_FOREST_CLF()

# Cross-validate each
Cell B1: =ML.EVAL.CV_SCORE(A1, X_train, y_train, 5, "accuracy")
Cell B2: =ML.EVAL.CV_SCORE(A2, X_train, y_train, 5, "accuracy")
Cell B3: =ML.EVAL.CV_SCORE(A3, X_train, y_train, 5, "accuracy")

# Compare mean scores
Cell C1: =AVERAGE(B1#)  # Logistic
Cell C2: =AVERAGE(B2#)  # SVM
Cell C3: =AVERAGE(B3#)  # Random Forest

# Select best and train on full data
Cell D1: =ML.FIT(A3, X_train, y_train)  # Assuming RF was best
Cell E1: =ML.EVAL.SCORE(D1, X_test, y_test)
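
Rather than hard-coding which model won (as the comment above assumes), the mean scores can be compared with standard spreadsheet lookups. A minimal sketch; MATCH only reports the winning row number (1 = Logistic, 2 = SVM, 3 = Random Forest), and the corresponding model cell is then passed to ML.FIT as before:

# Identify which row of C1:C3 holds the highest mean CV score
Cell C5: =MATCH(MAX(C1:C3), C1:C3, 0)
Result: 3  # Illustrative; here Random Forest (A3) scored highest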

Multi-Metric Grid Search

# Create model
Cell A1: =ML.CLASSIFICATION.LOGISTIC()

# Parameter grid
Cell B1: "model" | "C"       | 0.01 | 0.1 | 1.0 | 10
Cell B2: "model" | "penalty" | "l1" | "l2" |

# Grid search with multiple metrics
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:G2, {"accuracy","precision","recall"}, 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)

# Get results for all metrics
Cell E1: =ML.EVAL.SEARCH_RESULTS(D1)

Regression Model Tuning

# Create regression model
Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG()

# Parameter grid
Cell B1: "model" | "n_estimators" | 100 | 200 | 300
Cell B2: "model" | "max_depth"    | 10  | 20  | 30
Cell B3: "model" | "min_samples_leaf" | 1 | 2 | 5

# Grid search with R² scoring
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:F3, "r2", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)

# Best parameters
Cell E1: =ML.EVAL.BEST_PARAMS(D1)
Cell E2: =ML.EVAL.BEST_SCORE(D1)

# Test set performance
Cell F1: =ML.EVAL.SCORE(D1, X_test, y_test)

Tips and Best Practices

  1. Choosing Evaluation Metrics

    • Regression: R², MSE, MAE
      • R²: Overall fit quality (1.0 is a perfect fit; can be negative for very poor fits)
      • MSE: Penalizes large errors
      • MAE: Robust to outliers
    • Classification: Accuracy, Precision, Recall, F1
      • Accuracy: Best suited to balanced datasets
      • Precision: Minimize false positives
      • Recall: Minimize false negatives
      • F1: Balance precision and recall
  2. Cross-Validation Strategy

    • 5-fold: Good default
    • 10-fold: More reliable, slower
    • 3-fold: Quick testing
    • Always use same cv for fair comparison
    • Stratified CV for classification (automatic)
  3. Grid Search Best Practices

    • Start with wide range, then narrow
    • Use logarithmic scales for some params (e.g., C: 0.001, 0.01, 0.1, 1, 10)
    • Limit grid size (computation grows exponentially)
    • Use cross-validation (cv parameter)
  4. Parameter Ranges

    SVM C: [0.001, 0.01, 0.1, 1, 10, 100]
    SVM gamma: ['scale', 'auto', 0.001, 0.01, 0.1]
    Random Forest n_estimators: [50, 100, 200, 500]
    Random Forest max_depth: [5, 10, 20, None]
    Regularization alpha: [0.001, 0.01, 0.1, 1, 10]
    
  5. Interpreting Results

    • High train score, low test score: Overfitting (see the sketch after this list)
    • Low train and low test scores: Underfitting
    • CV standard deviation above ~0.05: Model is unstable across folds
    • Test score well below the CV mean: Possible data leakage or train/test mismatch
  6. Avoiding Common Mistakes

    • ❌ Fit on test set (data leakage)
    • ❌ Grid search on test set
    • ❌ No cross-validation
    • ❌ Too large parameter grids
    • ✅ Always use separate test set
    • ✅ Use CV for model selection
    • ✅ Report both CV and test scores
  7. Optimization Workflow

    1. Train baseline model
    2. Cross-validate to check stability
    3. Grid search for hyperparameters
    4. Retrain with best params
    5. Final evaluation on test set
    6. Report both CV and test scores
    
  8. Performance Tips

    • Reduce cv for faster experimentation
    • Limit grid size (a randomized search over the same ranges is a faster alternative for large grids)
    • Request only the scoring metric(s) you need
    • Cache results when possible
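
As referenced in tip 5, a quick overfitting check needs only two extra cells: score the fitted model on the data it was trained on and on the held-out test set, then compare. A minimal sketch using ML.EVAL.SCORE; the fitted_model name and the score values are illustrative only.

# fitted_model was trained on X_train / y_train
Cell A1: =ML.EVAL.SCORE(fitted_model, X_train, y_train)
Result: 0.97  # Illustrative training score

Cell A2: =ML.EVAL.SCORE(fitted_model, X_test, y_test)
Result: 0.81  # Illustrative test score

# A large gap between A1 and A2 suggests overfitting
Cell A3: =A1-A2
Result: 0.16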

Scoring Metrics Reference

Regression Metrics

  • r2: R² score (default)
  • neg_mean_squared_error: Negative MSE
  • neg_mean_absolute_error: Negative MAE
  • neg_root_mean_squared_error: Negative RMSE
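
These metric names follow scikit-learn's "higher is better" convention, which is why MSE and MAE appear negated. A minimal sketch of how that looks with ML.EVAL.CV_SCORE; the fold values are illustrative:

# 5-fold CV with negative MSE (values closer to 0 are better)
Cell A1: =ML.EVAL.CV_SCORE(model, X_train, y_train, 5, "neg_mean_squared_error")
Result: [-12.4, -10.9, -13.1, -11.6, -12.0]  # Illustrative fold scores

# Flip the sign to report a conventional mean MSE
Cell B1: =-AVERAGE(A1#)
Result: 12.0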

Classification Metrics

  • accuracy: Accuracy (default)
  • precision: Precision
  • recall: Recall (Sensitivity)
  • f1: F1 Score
  • roc_auc: ROC AUC
  • f1_weighted: Weighted F1 (multi-class)