Model Methods Reference

Complete reference for FormulaML model methods including fit, predict, transform, and pipeline creation.

Core functions for training models, making predictions, and creating machine learning pipelines.

Core ML Functions

ML.FIT()

Trains an estimator or transformer on the provided data.

Syntax:

=ML.FIT(model, X, y)

Parameters:

  • model (Object, Required): Untrained model or transformer object
  • X (Object, Required): Training features (DataFrame or array)
  • y (Object, Optional): Training target (for supervised learning)

Returns: Trained model object

Use Case: Train supervised/unsupervised models, fit transformers

Example:

# Train regression model
Cell A1: =ML.REGRESSION.LINEAR()
Cell B1: =ML.FIT(A1, X_train, y_train)

# Fit transformer (no y needed)
Cell C1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell D1: =ML.FIT(C1, X_train)

# Train clustering (no y needed)
Cell E1: =ML.CLUSTERING.KMEANS(3)
Cell F1: =ML.FIT(E1, X_data)
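
As a rough mental model (not FormulaML's actual implementation), fitting means learning parameters from the data and returning an object that stores them. The plain-Python sketch below uses two hypothetical classes, `MeanScaler` and `LineFit`, to show why `y` is optional: a transformer learns from `X` alone, while a supervised model needs both `X` and `y`.

```python
from statistics import fmean


class MeanScaler:
    """Hypothetical transformer: learns the mean of X and needs no y."""

    def fit(self, X, y=None):
        self.mean_ = fmean(X)
        return self  # the fitted object, like the cell holding ML.FIT's result


class LineFit:
    """Hypothetical supervised model: least-squares line y = slope * x + intercept."""

    def fit(self, X, y):
        x_bar, y_bar = fmean(X), fmean(y)
        sxx = sum((x - x_bar) ** 2 for x in X)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(X, y))
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self


X_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]  # made-up data lying exactly on y = 2x + 1

scaler = MeanScaler().fit(X_train)       # unsupervised: X only
model = LineFit().fit(X_train, y_train)  # supervised: X and y
print(scaler.mean_, model.slope_, model.intercept_)
```

In both cases the call returns the fitted object itself, which mirrors how the result of ML.FIT is passed on to ML.PREDICT or ML.TRANSFORM in later cells.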

ML.PREDICT()

Makes predictions using a trained model.

Syntax:

=ML.PREDICT(model, X)

Parameters:

  • model (Object, Required): Trained model object
  • X (Object, Required): Features for prediction

Returns: Predictions (array or DataFrame)

Use Case: Generate predictions for regression, classification, or clustering

Example:

# Predict with regression model
Cell A1: =ML.PREDICT(trained_regression, X_test)

# Predict with classifier
Cell B1: =ML.PREDICT(trained_classifier, X_test)

# Get cluster labels
Cell C1: =ML.PREDICT(trained_kmeans, X_data)
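
For a clustering model such as k-means, prediction typically means assigning each row to the nearest learned cluster center. The sketch below shows that idea in plain Python; `centroids` and `predict_labels` are illustrative names, and FormulaML's internals may differ.

```python
import math


def predict_labels(centroids, X):
    """Return, for each point in X, the index of its nearest centroid."""
    labels = []
    for point in X:
        distances = [math.dist(point, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Assumed: centers that a fitted 2-cluster model might have learned.
centroids = [(0.0, 0.0), (10.0, 10.0)]
X_data = [(1.0, 2.0), (9.0, 8.0), (0.5, 0.5)]

labels = predict_labels(centroids, X_data)
print(labels)
```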

ML.TRANSFORM()

Transforms data using a fitted transformer.

Syntax:

=ML.TRANSFORM(transformer, X, y)

Parameters:

  • transformer (Object, Required): Fitted transformer object
  • X (Object, Required): Data to transform
  • y (Object, Optional): Target variable (rarely used)

Returns: Transformed data

Use Case: Apply preprocessing transformations, dimensionality reduction

Example:

# Scale test data using fitted scaler
Cell A1: =ML.TRANSFORM(fitted_scaler, X_test)

# Apply PCA transformation
Cell B1: =ML.TRANSFORM(fitted_pca, X_test)

# Encode categorical features
Cell C1: =ML.TRANSFORM(fitted_encoder, categories_test)

ML.FIT_TRANSFORM()

Fits a transformer and transforms the data in one step.

Syntax:

=ML.FIT_TRANSFORM(transformer, X, y)

Parameters:

  • transformer (Object, Required): Unfitted transformer object
  • X (Object, Required): Data to fit and transform
  • y (Object, Optional): Target variable (rarely used)

Returns: Transformed data

Use Case: Quick fit and transform (use only on training data)

Example:

# Fit and transform training data
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell B1: =ML.FIT_TRANSFORM(A1, X_train)

# Fit PCA and get components
Cell C1: =ML.DIM_REDUCTION.PCA(2)
Cell D1: =ML.FIT_TRANSFORM(C1, X_train)

# IMPORTANT: For test data, use fitted transformer
Cell E1: =ML.TRANSFORM(C1, X_test)  # Don't fit_transform test!
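
The relationship between the three calls can be sketched in plain Python. `SimpleStandardScaler` below is a hypothetical stand-in, not FormulaML code. It shows that fitting and transforming in one step gives the same result as fitting and then transforming, and that test data is transformed with the statistics learned from the training data.

```python
from statistics import fmean, pstdev


class SimpleStandardScaler:
    """Hypothetical single-column standard scaler."""

    def fit(self, X):
        self.mean_ = fmean(X)
        self.std_ = pstdev(X)  # population standard deviation
        return self

    def transform(self, X):
        return [(x - self.mean_) / self.std_ for x in X]

    def fit_transform(self, X):
        return self.fit(X).transform(X)


X_train = [2.0, 4.0, 6.0, 8.0]
X_test = [5.0, 10.0]

one_step = SimpleStandardScaler().fit_transform(X_train)
scaler = SimpleStandardScaler().fit(X_train)
two_step = scaler.transform(X_train)

# Test data reuses the training mean and std; nothing is refitted.
test_scaled = scaler.transform(X_test)
print(one_step == two_step, test_scaled[0])
```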

ML.PIPELINE()

Creates a pipeline of transformers and estimators.

Syntax:

=ML.PIPELINE(*args)

Parameters:

  • *args (Objects, Required): Sequence of transformers and final estimator
    • All but last must be transformers (have fit/transform)
    • Last can be transformer or estimator

Returns: Pipeline object

Use Case: Chain preprocessing steps with model, prevent data leakage

Example:

# Simple pipeline: scaler + model
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.REGRESSION.LINEAR()
Cell B1: =ML.PIPELINE(A1, A2)

# Complex pipeline: multiple transformers + model
Cell C1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")
Cell C2: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C3: =ML.DIM_REDUCTION.PCA(10)
Cell C4: =ML.CLASSIFICATION.SVM()
Cell D1: =ML.PIPELINE(C1, C2, C3, C4)

# Fit and use pipeline
Cell E1: =ML.FIT(D1, X_train, y_train)
Cell F1: =ML.PREDICT(E1, X_test)
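
Conceptually, a pipeline fits each transformer on the output of the previous step and then fits the final estimator. At prediction time it replays the transforms before calling the estimator. The following is a minimal plain-Python sketch of that idea with hypothetical classes (`MiniPipeline`, `Center`, `SlopeThroughOrigin`), not FormulaML's implementation.

```python
from statistics import fmean


class Center:
    """Hypothetical transformer: subtracts the training mean."""

    def fit(self, X, y=None):
        self.mean_ = fmean(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class SlopeThroughOrigin:
    """Hypothetical estimator: least-squares fit of y = slope * x."""

    def fit(self, X, y):
        self.slope_ = sum(x * t for x, t in zip(X, y)) / sum(x * x for x in X)
        return self

    def predict(self, X):
        return [self.slope_ * x for x in X]


class MiniPipeline:
    """Chains transformers and a final estimator, in order."""

    def __init__(self, *steps):
        self.steps = steps

    def fit(self, X, y=None):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # fit on training data only
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)  # reuse fitted parameters, no refit
        return self.steps[-1].predict(X)


pipe = MiniPipeline(Center(), SlopeThroughOrigin())
pipe.fit([1.0, 2.0, 3.0], [-2.0, 0.0, 2.0])
predictions = pipe.predict([4.0])
print(predictions)
```

Because `predict` only calls `transform` on the earlier steps, new data never changes the parameters learned during `fit`, which is the leakage protection mentioned in the use case.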

ML.OBJECT_INFO()

Returns information about a model or transformer object.

Syntax:

=ML.OBJECT_INFO(obj)

Parameters:

  • obj (Object, Required): Any ML object

Returns: String with object information

Use Case: Debug, inspect object state

Example:

# Get info about trained model
Cell A1: =ML.OBJECT_INFO(trained_model)

Common Patterns

Complete Training Workflow

# Create model
Cell A1: =ML.CLASSIFICATION.LOGISTIC()

# Fit on training data
Cell B1: =ML.FIT(A1, X_train, y_train)

# Make predictions
Cell C1: =ML.PREDICT(B1, X_test)

# Evaluate
Cell D1: =ML.EVAL.SCORE(B1, X_test, y_test)

Preprocessing + Model Pipeline

# Create components
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.CLASSIFICATION.SVM(1.0, "rbf")

# Create pipeline
Cell B1: =ML.PIPELINE(A1, A2)

# Train pipeline (auto scales then trains)
Cell C1: =ML.FIT(B1, X_train, y_train)

# Predict (auto scales then predicts)
Cell D1: =ML.PREDICT(C1, X_test)

Dimensionality Reduction Workflow

# Create PCA transformer
Cell A1: =ML.DIM_REDUCTION.PCA(2)

# Fit and transform training data
Cell B1: =ML.FIT_TRANSFORM(A1, X_train)

# Transform test data (use same PCA fit)
Cell C1: =ML.TRANSFORM(A1, X_test)

# Sample transformed data
Cell D1: =ML.DATA.SAMPLE(B1, 10)
Cell D2: =ML.DATA.SAMPLE(C1, 10)

Feature Scaling Best Practice

# Create scaler
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()

# Fit on training data only!
Cell B1: =ML.FIT(A1, X_train)

# Transform both train and test
Cell C1: =ML.TRANSFORM(B1, X_train)  # Training data
Cell C2: =ML.TRANSFORM(B1, X_test)   # Test data

# Or use FIT_TRANSFORM for training
Cell D1: =ML.FIT_TRANSFORM(A1, X_train)  # Equivalent to C1
Cell D2: =ML.TRANSFORM(A1, X_test)       # Must use TRANSFORM for test
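
To see why fitting on training data only matters, compare the statistics a scaler would learn with and without the test rows. This is a plain-Python illustration with made-up numbers, not FormulaML output.

```python
from statistics import fmean

X_train = [1.0, 2.0, 3.0]
X_test = [100.0]  # an extreme value that must stay unseen during fitting

mean_train_only = fmean(X_train)          # correct: learned from training rows
mean_with_leak = fmean(X_train + X_test)  # leaky: the test row shifts the statistic

# Centering the same training value gives different results.
centered_ok = X_train[0] - mean_train_only
centered_leaky = X_train[0] - mean_with_leak
print(mean_train_only, mean_with_leak, centered_ok, centered_leaky)
```

The leaky version lets information from the test set shape the training features, so any score computed on that test set afterwards is no longer an honest estimate.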

Multi-Step Pipeline

# Step 1: Impute missing values
Cell A1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")

# Step 2: Scale features
Cell A2: =ML.PREPROCESSING.STANDARD_SCALER()

# Step 3: Reduce dimensions
Cell A3: =ML.DIM_REDUCTION.PCA(20)

# Step 4: Train classifier
Cell A4: =ML.CLASSIFICATION.RANDOM_FOREST_CLF(100)

# Create pipeline
Cell B1: =ML.PIPELINE(A1, A2, A3, A4)

# Fit entire pipeline
Cell C1: =ML.FIT(B1, X_train, y_train)

# Predict (all steps applied automatically)
Cell D1: =ML.PREDICT(C1, X_test)

Transformer-Only Pipeline

# Create preprocessing-only pipeline
Cell A1: =ML.PREPROCESSING.ROBUST_SCALER()
Cell A2: =ML.DIM_REDUCTION.PCA(10)

# Combine transformers
Cell B1: =ML.PIPELINE(A1, A2)

# Fit and transform
Cell C1: =ML.FIT_TRANSFORM(B1, X_train)
Cell C2: =ML.TRANSFORM(B1, X_test)

# Now use transformed data for modeling
Cell D1: =ML.REGRESSION.RIDGE(1.0)
Cell E1: =ML.FIT(D1, C1, y_train)

Pipeline with Grid Search

# Create pipeline
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.CLASSIFICATION.SVM()
Cell B1: =ML.PIPELINE(A1, A2)

# Parameter grid for pipeline steps
# Layout: Model | Parameter | Value1 | Value2 | Value3 (one row per parameter)
Cells C1:G1: "model" | "C" | 0.1 | 1 | 10
Cells C2:G2: "model" | "kernel" | "linear" | "rbf" |

# Grid search on pipeline (the range must cover the whole grid, C1:G2)
Cell H1: =ML.EVAL.GRID_SEARCH(B1, C1:G2, "accuracy", 5, TRUE)
Cell I1: =ML.FIT(H1, X_train, y_train)

# Get best parameters
Cell J1: =ML.EVAL.BEST_PARAMS(I1)
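
A parameter grid like the one above expands into every combination of the listed values. The sketch below counts those combinations in plain Python. The dictionary layout is an assumption for illustration, and the fit count assumes the `5` argument is the number of cross-validation folds; how FormulaML maps the table to pipeline steps internally is not shown here.

```python
from itertools import product

# Values copied from the grid table: one entry per parameter row.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
}

names = list(param_grid)
combinations = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

for combo in combinations:
    print(combo)

# Assumed: with 5-fold cross-validation, each combination is trained 5 times.
n_fits = len(combinations) * 5
print(len(combinations), n_fits)
```

The number of fits grows multiplicatively with each added parameter row, so it is worth keeping the grid small when the pipeline is slow to train.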

Reusing Fitted Transformers

# Fit scaler once
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell B1: =ML.FIT(A1, X_train)

# Use for multiple purposes
Cell C1: =ML.TRANSFORM(B1, X_train)  # Scaled training
Cell C2: =ML.TRANSFORM(B1, X_test)   # Scaled test
Cell C3: =ML.TRANSFORM(B1, X_new)    # Scaled new data

# All use same scaling parameters

Tips and Best Practices

  1. Fit/Transform Pattern

    • FIT on training data only
    • TRANSFORM both train and test
    • FIT_TRANSFORM = convenience for train only
    • Never fit on test data (data leakage!)
  2. Pipeline Benefits

    • Prevents data leakage automatically
    • Ensures correct preprocessing order
    • Simplifies code and deployment
    • Works seamlessly with grid search
  3. When to Use Each Function

    • FIT: Train estimators (supervised or unsupervised) and fit transformers
    • PREDICT: After fitting estimators
    • TRANSFORM: After fitting transformers
    • FIT_TRANSFORM: Quick train preprocessing
    • PIPELINE: Combine multiple steps
  4. Pipeline Best Practices

    • Order: imputation → encoding → scaling → dim reduction → model
    • All steps except last must be transformers
    • Last step can be estimator or transformer
    • Use clear, descriptive step names
  5. Common Workflows

    Regression: Scale → Model → Predict
    Classification: Encode → Scale → Model → Predict
    Clustering: Scale → Cluster → Labels
    Dim Reduction: Scale → PCA → Transform
    
  6. Avoiding Data Leakage

    • ✅ Pipeline ensures no leakage
    • ✅ Fit transformers on train only
    • ✅ Use same fitted objects for test
    • ❌ Never fit_transform on test
    • ❌ Never include test in fitting
  7. Memory and Performance

    • FIT_TRANSFORM can be more efficient than a separate FIT followed by TRANSFORM
    • Pipelines cache intermediate results
    • Reuse fitted objects when possible
    • Consider data size for complex pipelines
  8. Debugging Pipelines

    • Test each step individually first
    • Use ML.DATA.SAMPLE to inspect outputs
    • Check shapes at each step
    • Use ML.INSPECT.GET_PARAMS for diagnostics