Regression Models Reference
Functions for creating and training regression models to predict continuous values.
ML.REGRESSION Namespace
ML.REGRESSION.LINEAR()
Creates a Linear Regression model for predicting continuous values.
Syntax:
=ML.REGRESSION.LINEAR(fit_intercept)
Parameters:
fit_intercept
(Boolean, Optional): Whether to calculate the intercept (default: TRUE)
- TRUE: Include the intercept in the model
- FALSE: Force the regression through the origin (a variant appears after the example below)
Returns: Linear Regression model object
Use Case: Simple linear relationships between features and target
Example:
# Create model
Cell A1: =ML.REGRESSION.LINEAR()
Result: <LinearRegression>
# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
# Make predictions
Cell C1: =ML.PREDICT(B1, X_test)
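If the relationship is known to pass through the origin (or the data is already centered), the intercept can be dropped. An illustrative variant of the example above, reusing the documented fit_intercept parameter:
# Force the regression through the origin
Cell A2: =ML.REGRESSION.LINEAR(FALSE)
Cell B2: =ML.FIT(A2, X_train, y_train)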
ML.REGRESSION.RIDGE()
Creates a Ridge Regression model with L2 regularization to prevent overfitting.
Syntax:
=ML.REGRESSION.RIDGE(alpha, fit_intercept)
Parameters:
alpha
(Number, Optional): Regularization strength (default: 1.0)
- Larger values = stronger regularization (see the objective after the example)
- Must be positive
fit_intercept
(Boolean, Optional): Whether to calculate intercept (default: TRUE)
Returns: Ridge Regression model object
Use Case: When features are correlated or to prevent overfitting
Example:
# Create Ridge model with alpha=0.5
Cell A1: =ML.REGRESSION.RIDGE(0.5)
Result: <Ridge>
# Strong regularization
Cell A2: =ML.REGRESSION.RIDGE(10)
# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
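For intuition about alpha: assuming the backend follows the usual scikit-learn-style formulation (an assumption, not confirmed by this reference), Ridge chooses coefficients w to minimize

\min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2

so larger alpha shrinks every coefficient toward zero, but rarely to exactly zero.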
ML.REGRESSION.LASSO()
Creates a Lasso Regression model with L1 regularization for feature selection.
Syntax:
=ML.REGRESSION.LASSO(alpha, fit_intercept)
Parameters:
alpha
(Number, Optional): Regularization strength (default: 1.0)
- Larger values = more features set to exactly zero (see the objective after the example)
- Must be positive
fit_intercept
(Boolean, Optional): Whether to calculate intercept (default: TRUE)
Returns: Lasso Regression model object
Use Case: Automatic feature selection, sparse models
Example:
# Create Lasso model
Cell A1: =ML.REGRESSION.LASSO(0.1)
Result: <Lasso>
# Train and use for feature selection
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
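Under the same assumed scikit-learn-style formulation, Lasso minimizes

\min_w \; \tfrac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1

The L1 penalty drives some coefficients to exactly zero, which is why larger alpha removes more features.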
ML.REGRESSION.ELASTIC_NET()
Creates an Elastic Net model combining L1 and L2 regularization.
Syntax:
=ML.REGRESSION.ELASTIC_NET(alpha, l1_ratio, fit_intercept)
Parameters:
alpha
(Number, Optional): Regularization strength (default: 1.0)
l1_ratio
(Number, Optional): L1/L2 mix ratio, 0 to 1 (default: 0.5)
- 0 = pure Ridge (L2)
- 1 = pure Lasso (L1)
- 0.5 = equal mix (see the objective after the example)
fit_intercept
(Boolean, Optional): Whether to calculate intercept (default: TRUE)
Returns: Elastic Net model object
Use Case: Balance between Ridge and Lasso, correlated features with feature selection
Example:
# Balanced Elastic Net
Cell A1: =ML.REGRESSION.ELASTIC_NET(1.0, 0.5)
Result: <ElasticNet>
# More L1 (Lasso-like)
Cell A2: =ML.REGRESSION.ELASTIC_NET(1.0, 0.8)
# More L2 (Ridge-like)
Cell A3: =ML.REGRESSION.ELASTIC_NET(1.0, 0.2)
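Under the same assumed formulation, Elastic Net blends the two penalties via l1_ratio (written ρ here):

\min_w \; \tfrac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \tfrac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2

which reduces to Lasso-style selection at ρ = 1 and to Ridge-style shrinkage at ρ = 0.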
ML.REGRESSION.RANDOM_FOREST_REG() ⭐
Creates a Random Forest Regression model (Premium feature).
Syntax:
=ML.REGRESSION.RANDOM_FOREST_REG(n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, random_state)
Parameters:
n_estimators
(Integer, Optional): Number of trees (default: 100)
criterion
(String, Optional): Split quality measure (default: "squared_error")
- "squared_error": Mean squared error
- "absolute_error": Mean absolute error
- "friedman_mse": MSE with Friedman's improvement score
- "poisson": Poisson deviance
max_depth
(Integer, Optional): Maximum tree depth (default: None = unlimited)
min_samples_split
(Integer, Optional): Minimum samples required to split a node (default: 2)
min_samples_leaf
(Integer, Optional): Minimum samples required at a leaf node (default: 1)
max_features
(Number, Optional): Fraction of features considered per split (default: 1.0 = all)
bootstrap
(Boolean, Optional): Use bootstrap samples when building trees (default: TRUE)
random_state
(Integer, Optional): Random seed for reproducibility
Returns: Random Forest Regressor model object
Use Case: Non-linear relationships, feature importance, robust predictions
Example:
# Basic Random Forest
Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG()
Result: <RandomForestRegressor>
# Customized forest
Cell A2: =ML.REGRESSION.RANDOM_FOREST_REG(200, "squared_error", 10, 5, 2, 0.8, TRUE, 42)
# Train model
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell C1: =ML.PREDICT(B1, X_test)
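Echoing the Random Forest tips at the end of this page, a depth-limited, seeded forest is often a good starting point. The cell layout is illustrative; blank arguments keep the remaining defaults, as in the customized example above:
# Depth-limited, reproducible forest
Cell A3: =ML.REGRESSION.RANDOM_FOREST_REG(100, , 5, , , , , 42)
Cell B3: =ML.FIT(A3, X_train, y_train)
Cell C3: =ML.EVAL.SCORE(B3, X_test, y_test)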
Common Patterns
Simple Linear Regression
# Load data
Cell A1: =ML.DATASETS.DIABETES()
# Split features and target
Cell B1: =ML.DATA.SELECT_COLUMNS(A1, {0,1,2,3,4,5,6,7,8,9})
Cell C1: =ML.DATA.SELECT_COLUMNS(A1, 10)
# Split train/test
Cell D1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 0) # Train X
Cell D2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(B1, 0.2, 42, 1) # Test X
Cell E1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 0) # Train y
Cell E2: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(C1, 0.2, 42, 1) # Test y
# Create and train model
Cell F1: =ML.REGRESSION.LINEAR()
Cell G1: =ML.FIT(F1, D1, E1)
# Predict and evaluate
Cell H1: =ML.PREDICT(G1, D2)
Cell I1: =ML.EVAL.SCORE(G1, D2, E2)
Regularized Regression with Scaling
# Create scaler and model
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.REGRESSION.RIDGE(1.0)
# Create pipeline
Cell B1: =ML.PIPELINE(A1, A2)
# Train pipeline
Cell C1: =ML.FIT(B1, X_train, y_train)
# Predict
Cell D1: =ML.PREDICT(C1, X_test)
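Because the whole pipeline was fitted as one unit, predicting on X_test re-applies the scaler fitted on the training data (assumed here to follow standard pipeline semantics, as in scikit-learn), which avoids leaking test-set statistics into the scaling step. Scoring works the same way:
# Evaluate the fitted pipeline directly
Cell E1: =ML.EVAL.SCORE(C1, X_test, y_test)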
Comparing Regression Models
# Create multiple models
Cell A1: =ML.REGRESSION.LINEAR()
Cell A2: =ML.REGRESSION.RIDGE(1.0)
Cell A3: =ML.REGRESSION.LASSO(0.1)
Cell A4: =ML.REGRESSION.ELASTIC_NET(1.0, 0.5)
# Train all models
Cell B1: =ML.FIT(A1, X_train, y_train)
Cell B2: =ML.FIT(A2, X_train, y_train)
Cell B3: =ML.FIT(A3, X_train, y_train)
Cell B4: =ML.FIT(A4, X_train, y_train)
# Compare scores
Cell C1: =ML.EVAL.SCORE(B1, X_test, y_test) # Linear
Cell C2: =ML.EVAL.SCORE(B2, X_test, y_test) # Ridge
Cell C3: =ML.EVAL.SCORE(B3, X_test, y_test) # Lasso
Cell C4: =ML.EVAL.SCORE(B4, X_test, y_test) # Elastic Net
Random Forest with Grid Search
# Create base model
Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG(100, "squared_error", , , , , TRUE, 42)
# Prepare parameter grid
# Model | Parameter | Value1 | Value2 | Value3
Cell B1: "model" | "n_estimators" | 50 | 100 | 200
Cell B2: "model" | "max_depth" | 5 | 10 | 20
Cell B3: "model" | "min_samples_split" | 2 | 5 | 10
# Grid search
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:F3, "r2", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)
# Get best parameters
Cell E1: =ML.EVAL.BEST_PARAMS(D1)
Cell F1: =ML.EVAL.BEST_SCORE(D1)
Tips and Best Practices
- Model Selection
  - Linear: Start here for simple relationships
  - Ridge: When features are correlated
  - Lasso: For automatic feature selection
  - Elastic Net: Balance of Ridge and Lasso
  - Random Forest: For complex non-linear patterns
- Regularization Tuning
  - Start with alpha=1.0, adjust based on performance
  - Higher alpha = simpler model, may underfit
  - Lower alpha = more complex model, may overfit
  - Use cross-validation to find the optimal alpha (see the sketch below)
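A minimal sketch of cross-validated alpha tuning, reusing ML.EVAL.GRID_SEARCH and the grid layout from "Random Forest with Grid Search" above; the alpha values and cell layout are illustrative:
# Cross-validated search over alpha for Ridge
Cell A1: =ML.REGRESSION.RIDGE()
# Model | Parameter | Value1 | Value2 | Value3
Cell B1: "model" | "alpha" | 0.1 | 1.0 | 10
Cell C1: =ML.EVAL.GRID_SEARCH(A1, B1:F1, "r2", 5, TRUE)
Cell D1: =ML.FIT(C1, X_train, y_train)
Cell E1: =ML.EVAL.BEST_PARAMS(D1)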
- Feature Scaling
  - Always scale features for Ridge, Lasso, and Elastic Net
  - Not required for Random Forest
  - Use StandardScaler or MinMaxScaler (see "Regularized Regression with Scaling" above)
- Random Forest Tips
  - More trees (n_estimators) = better accuracy but slower training
  - Limit max_depth to prevent overfitting
  - Use random_state for reproducibility
  - bootstrap=TRUE for better generalization
- Evaluation
  - Use the R² score for model comparison
  - Check predictions on the held-out test set
  - Compare multiple models systematically (see "Comparing Regression Models" above)
Related Functions
- ML.FIT() - Train regression models
- ML.PREDICT() - Make predictions
- ML.EVAL.SCORE() - Evaluate model performance
- ML.PREPROCESSING Functions - Scale and prepare data
- ML.PIPELINE() - Combine preprocessing and models