Regression Models Reference

Complete reference for FormulaML regression models including Linear, Ridge, Lasso, Elastic Net, and Random Forest regression.

Functions for creating and training regression models to predict continuous values.

ML.REGRESSION Namespace

ML.REGRESSION.LINEAR()

Creates a Linear Regression model for predicting continuous values.

Syntax:

=ML.REGRESSION.LINEAR(fit_intercept)

Parameters:

  • fit_intercept (Boolean, Optional): Whether to calculate the intercept (default: TRUE)
    • TRUE: Include intercept in the model
    • FALSE: Force regression through the origin

Returns: Linear Regression model object

Use Case: Simple linear relationships between features and target

Example:

# Create model
Cell A1: =ML.REGRESSION.LINEAR()
Result: <LinearRegression>

# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)

# Make predictions
Cell C1: =ML.PREDICT(B1, X_test)
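
If the relationship is known to pass through the origin, the intercept can be dropped:

# Regression through the origin
Cell A2: =ML.REGRESSION.LINEAR(FALSE)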

ML.REGRESSION.RIDGE()

Creates a Ridge Regression model with L2 regularization to prevent overfitting.

Syntax:

=ML.REGRESSION.RIDGE(alpha, fit_intercept)

Parameters:

  • alpha (Number, Optional): Regularization strength (default: 1.0)
    • Larger values = stronger regularization
    • Must be positive
  • fit_intercept (Boolean, Optional): Whether to calculate intercept (default: TRUE)

Returns: Ridge Regression model object

Use Case: When features are correlated or to prevent overfitting

Example:

# Create Ridge model with alpha=0.5
Cell A1: =ML.REGRESSION.RIDGE(0.5)
Result: <Ridge>

# Strong regularization
Cell A2: =ML.REGRESSION.RIDGE(10)

# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
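
Prediction then follows the same pattern as the linear model:

# Make predictions
Cell C1: =ML.PREDICT(B1, X_test)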

ML.REGRESSION.LASSO()

Creates a Lasso Regression model with L1 regularization for feature selection.

Syntax:

=ML.REGRESSION.LASSO(alpha, fit_intercept)

Parameters:

  • alpha (Number, Optional): Regularization strength (default: 1.0)
    • Larger values drive more coefficients to exactly zero
    • Must be positive
  • fit_intercept (Boolean, Optional): Whether to calculate intercept (default: TRUE)

Returns: Lasso Regression model object

Use Case: Automatic feature selection, sparse models

Example:

# Create Lasso model
Cell A1: =ML.REGRESSION.LASSO(0.1)
Result: <Lasso>

# Train and use for feature selection
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
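
Raising alpha produces a sparser model; a sketch comparing two strengths:

# Mild regularization keeps most features
Cell A2: =ML.REGRESSION.LASSO(0.01)

# Strong regularization drives more coefficients to zero
Cell A3: =ML.REGRESSION.LASSO(1.0)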

ML.REGRESSION.ELASTIC_NET()

Creates an Elastic Net model combining L1 and L2 regularization.

Syntax:

=ML.REGRESSION.ELASTIC_NET(alpha, l1_ratio, fit_intercept)

Parameters:

  • alpha (Number, Optional): Regularization strength (default: 1.0)
  • l1_ratio (Number, Optional): L1/L2 mix ratio, 0 to 1 (default: 0.5)
    • 0 = Pure Ridge (L2)
    • 1 = Pure Lasso (L1)
    • 0.5 = Equal mix
  • fit_intercept (Boolean, Optional): Whether to calculate intercept (default: TRUE)

Returns: Elastic Net model object

Use Case: A balance between Ridge and Lasso; feature selection in the presence of correlated features

Example:

# Balanced Elastic Net
Cell A1: =ML.REGRESSION.ELASTIC_NET(1.0, 0.5)
Result: <ElasticNet>

# More L1 (Lasso-like)
Cell A2: =ML.REGRESSION.ELASTIC_NET(1.0, 0.8)

# More L2 (Ridge-like)
Cell A3: =ML.REGRESSION.ELASTIC_NET(1.0, 0.2)
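
Training and prediction follow the same pattern as the other models:

# Train and predict
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)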

ML.REGRESSION.RANDOM_FOREST_REG() ⭐

Creates a Random Forest Regression model (Premium feature).

Syntax:

=ML.REGRESSION.RANDOM_FOREST_REG(n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, random_state)

Parameters:

  • n_estimators (Integer, Optional): Number of trees (default: 100)
  • criterion (String, Optional): Split quality measure (default: "squared_error")
    • "squared_error": Mean squared error
    • "absolute_error": Mean absolute error
    • "friedman_mse": Mean squared error with Friedman's improvement score
    • "poisson": Reduction in Poisson deviance
  • max_depth (Integer, Optional): Maximum tree depth (default: None = unlimited)
  • min_samples_split (Integer, Optional): Min samples to split node (default: 2)
  • min_samples_leaf (Integer, Optional): Min samples at leaf (default: 1)
  • max_features (Number, Optional): Fraction of features considered at each split (default: 1.0 = all features)
  • bootstrap (Boolean, Optional): Use bootstrap samples (default: TRUE)
  • random_state (Integer, Optional): Random seed for reproducibility

Returns: Random Forest Regressor model object

Use Case: Non-linear relationships, feature importance, robust predictions

Example:

# Basic Random Forest
Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG()
Result: <RandomForestRegressor>

# Customized forest: 200 trees, squared-error criterion, max depth 10,
# min 5 samples to split a node, min 2 samples per leaf, 80% of features
# per split, bootstrap sampling, random seed 42
Cell A2: =ML.REGRESSION.RANDOM_FOREST_REG(200, "squared_error", 10, 5, 2, 0.8, TRUE, 42)

# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
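
The fitted forest can be scored like any other regressor:

# Evaluate on the test set
Cell D1: =ML.EVAL.SCORE(B1, X_test, y_test)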

Common Patterns

Simple Linear Regression

# Load data
Cell A1: =ML.DATASETS.DIABETES()

# Split features and target
Cell B1: =ML.DATA.SELECT_COLUMNS(A1, {0,1,2,3,4,5,6,7,8,9})
Cell C1: =ML.DATA.SELECT_COLUMNS(A1, 10)

# Split train/test
Cell D1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 0)  # X train
Cell D2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 1)  # X test
Cell E1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 0)  # y train
Cell E2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 1)  # y test

# Create and train model
Cell F1: =ML.REGRESSION.LINEAR()
Cell G1: =ML.FIT(F1, D1, E1)

# Predict and evaluate
Cell H1: =ML.PREDICT(G1, D2)
Cell I1: =ML.EVAL.SCORE(G1, D2, E2)

Regularized Regression with Scaling

# Create scaler and model
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.REGRESSION.RIDGE(1.0)

# Create pipeline
Cell B1: =ML.PIPELINE(A1, A2)

# Train pipeline
Cell C1: =ML.FIT(B1, X_train, y_train)

# Predict
Cell D1: =ML.PREDICT(C1, X_test)
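
Assuming ML.EVAL.SCORE accepts a fitted pipeline the same way it accepts a fitted model, a scoring step completes the pattern:

# Evaluate the fitted pipeline on the test set
Cell E1: =ML.EVAL.SCORE(C1, X_test, y_test)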

Comparing Regression Models

# Create multiple models
Cell A1: =ML.REGRESSION.LINEAR()
Cell A2: =ML.REGRESSION.RIDGE(1.0)
Cell A3: =ML.REGRESSION.LASSO(0.1)
Cell A4: =ML.REGRESSION.ELASTIC_NET(1.0, 0.5)

# Train all models
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell B2: =ML.FIT(A2, X_train, y_train)
Cell B3: =ML.FIT(A3, X_train, y_train)
Cell B4: =ML.FIT(A4, X_train, y_train)

# Compare scores
Cell C1: =ML.EVAL.SCORE(B1, X_test, y_test)  # Linear
Cell C2: =ML.EVAL.SCORE(B2, X_test, y_test)  # Ridge
Cell C3: =ML.EVAL.SCORE(B3, X_test, y_test)  # Lasso
Cell C4: =ML.EVAL.SCORE(B4, X_test, y_test)  # Elastic Net

Hyperparameter Tuning with Grid Search

# Create base model
Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG(100, "squared_error", , , , , TRUE, 42)

# Prepare parameter grid
# Model | Parameter | Value1 | Value2 | Value3
Cell B1: "model" | "n_estimators" | 50 | 100 | 200
Cell B2: "model" | "max_depth" | 5 | 10 | 20
Cell B3: "model" | "min_samples_split" | 2 | 5 | 10

# Grid search
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:F3, "r2", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)

# Get best parameters
Cell E1: =ML.EVAL.BEST_PARAMS(D1)
Cell F1: =ML.EVAL.BEST_SCORE(D1)
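
Assuming the fitted grid-search object behaves like a fitted model, it can be used for prediction directly:

# Predict with the best model found
Cell G1: =ML.PREDICT(D1, X_test)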

Tips and Best Practices

  1. Model Selection

    • Linear: Start here for simple relationships
    • Ridge: When features are correlated
    • Lasso: For automatic feature selection
    • Elastic Net: Balance of Ridge and Lasso
    • Random Forest: For complex non-linear patterns
  2. Regularization Tuning

    • Start with alpha=1.0, adjust based on performance
    • Higher alpha = simpler model, may underfit
    • Lower alpha = more complex model, may overfit
    • Use cross-validation to find the optimal alpha (see the sketch after this list)
  3. Feature Scaling

    • Always scale features for Ridge, Lasso, and Elastic Net (see the Regularized Regression with Scaling pattern above)
    • Not required for Random Forest
    • Use StandardScaler or MinMaxScaler
  4. Random Forest Tips

    • More trees (n_estimators) = better but slower
    • Limit max_depth to prevent overfitting
    • Use random_state for reproducibility
    • Bootstrap=TRUE for better generalization
  5. Evaluation

    • Use R² score for model comparison
    • Check predictions on test set
    • Compare multiple models systematically
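
As a sketch of tip 2, a cross-validated search over alpha for Ridge can reuse the grid-search pattern above (the "model" row label and the single-row grid layout are assumptions carried over from that pattern):

# Base Ridge model
Cell A1: =ML.REGRESSION.RIDGE(1.0)

# Parameter grid: one row trying alpha = 0.01, 0.1, 1, 10
Cell B1: "model" | "alpha" | 0.01 | 0.1 | 1 | 10

# 5-fold cross-validated grid search scored by R²
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:G1, "r2", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)

# Inspect the winning alpha and its score
Cell E1: =ML.EVAL.BEST_PARAMS(D1)
Cell F1: =ML.EVAL.BEST_SCORE(D1)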