Compose Functions Reference

Complete reference for the FormulaML compose functions used to build advanced transformation pipelines.

ML.COMPOSE Namespace

ML.COMPOSE.COLUMN_TRANSFORMER()

Applies a transformer to specific columns.

Syntax:

=ML.COMPOSE.COLUMN_TRANSFORMER(transformer, cols)

Parameters:

  • transformer (Object, Required): Transformer to apply
  • cols (Array/String/Integer, Required): Columns to transform
  • Single column: "column_name" or 0
    • Multiple columns: {"col1", "col2"} or {0, 1, 2}

Returns: ColumnTransformer object

Use Case: Apply different transformations to different columns

Example:

# Transform specific columns
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"age", "income"})

# Or with column indices
Cell B1: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {0, 1})
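FormulaML's compose objects mirror scikit-learn's ColumnTransformer idea: a transformer paired with the columns it applies to. As a rough sketch of the semantics only (not FormulaML's actual implementation), with rows modeled as dicts keyed by column name:

```python
from statistics import mean, pstdev

# Illustrative sketch of column-transformer semantics; the data and
# helper names here are assumptions, not part of FormulaML.
def standard_scale(values):
    # z-score: (x - mean) / population stdev
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def column_transformer(transform, cols, rows):
    # Apply `transform` to each listed column; leave other columns untouched.
    out = [dict(r) for r in rows]
    for col in cols:
        transformed = transform([r[col] for r in rows])
        for r, v in zip(out, transformed):
            r[col] = v
    return out

rows = [{"age": 20, "income": 100, "region": "N"},
        {"age": 40, "income": 300, "region": "S"}]
result = column_transformer(standard_scale, ["age", "income"], rows)
# "age" and "income" are z-scored; "region" passes through unchanged
```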

ML.COMPOSE.DATA_TRANSFORMER()

Combines multiple column transformers into a single transformer.

Syntax:

=ML.COMPOSE.DATA_TRANSFORMER(*args)

Parameters:

  • *args (Objects, Required): Multiple ColumnTransformer objects

Returns: DataTransformer object

Use Case: Apply different transformers to different column groups

Example:

# Scale numeric columns
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"age", "income"})

# Encode categorical columns
Cell B1: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell B2: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"category", "region"})

# Combine transformers
Cell C1: =ML.COMPOSE.DATA_TRANSFORMER(A2, B2)

# Use in pipeline
Cell D1: =ML.CLASSIFICATION.LOGISTIC()
Cell E1: =ML.PIPELINE(C1, D1)
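Conceptually, a data transformer fans the input out to several column transformers and joins their outputs. A minimal sketch of those assumed semantics (modeled on scikit-learn's ColumnTransformer, where columns not claimed by any sub-transformer drop out of the result):

```python
# Illustrative only; function names and the drop-unclaimed-columns
# behavior are assumptions, not confirmed FormulaML behavior.
def data_transformer(*column_transformers):
    def transform(rows):
        out = [{} for _ in rows]
        for fn, cols in column_transformers:
            for col in cols:
                values = fn([r[col] for r in rows])
                for r, v in zip(out, values):
                    r[col] = v
        return out
    return transform

min_max = lambda vs: [(v - min(vs)) / (max(vs) - min(vs)) for v in vs]
upper = lambda vs: [v.upper() for v in vs]   # toy stand-in for an encoder

combined = data_transformer((min_max, ["age"]), (upper, ["region"]))
rows = [{"age": 20, "region": "n", "id": 1},
        {"age": 40, "region": "s", "id": 2}]
result = combined(rows)
# "id" is not claimed by any sub-transformer, so it is absent from the output
```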

ML.COMPOSE.COLUMN_SELECTOR()

Selects columns based on pattern or data type.

Syntax:

=ML.COMPOSE.COLUMN_SELECTOR(pattern, dtypes)

Parameters:

  • pattern (String, Required): Regex pattern for column names
  • dtypes (Array, Required): Data types to match
    • Examples: {"int", "float"}, {"object"}, {"int64", "float64"}

Returns: ColumnSelector object

Use Case: Automatically select columns by type or name pattern

Example:

# Select all numeric columns
Cell A1: =ML.COMPOSE.COLUMN_SELECTOR(".*", {"int64", "float64"})

# Select columns starting with "num_"
Cell B1: =ML.COMPOSE.COLUMN_SELECTOR("^num_.*", {"int64", "float64"})
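The selector keeps columns whose name matches the regex and whose dtype is in the allowed set. A sketch of that assumed behavior against a toy schema:

```python
import re

# Illustrative sketch; the schema below is an assumption for demonstration.
def column_selector(pattern, dtypes):
    rx = re.compile(pattern)
    def select(schema):          # schema: {column_name: dtype_string}
        return [name for name, dt in schema.items()
                if rx.match(name) and dt in dtypes]
    return select

schema = {"num_age": "int64", "num_score": "float64",
          "category": "object", "num_label": "object"}
numeric = column_selector(".*", {"int64", "float64"})(schema)
prefixed = column_selector("^num_.*", {"int64", "float64"})(schema)
# "num_label" matches the name pattern but fails the dtype filter
```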

ML.COMPOSE.TRANSFORMERS Namespace

ML.COMPOSE.TRANSFORMERS.DROP()

Creates a drop transformer to exclude columns.

Syntax:

=ML.COMPOSE.TRANSFORMERS.DROP()

Parameters: None

Returns: DropTransformer object

Use Case: Exclude specific columns from pipeline

Example:

# Drop ID column
Cell A1: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"id"})

ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()

Creates a passthrough transformer (no transformation).

Syntax:

=ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()

Parameters: None

Returns: PassthroughTransformer object

Use Case: Keep columns unchanged in pipeline

Example:

# Pass through already processed columns
Cell A1: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"preprocessed_feature"})
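DROP and PASSTHROUGH are the two trivial transformers: one removes its column group, the other forwards it unchanged. A sketch contrasting the two (assumed semantics; `None` as a drop signal is purely an illustration device):

```python
def drop(values):
    return None                  # signal: remove this column entirely

def passthrough(values):
    return list(values)          # keep values exactly as-is

def apply_to_columns(transform, cols, rows):
    out = [dict(r) for r in rows]
    for col in cols:
        transformed = transform([r[col] for r in rows])
        if transformed is None:          # DROP
            for r in out:
                del r[col]
        else:                            # any real transformer, incl. PASSTHROUGH
            for r, v in zip(out, transformed):
                r[col] = v
    return out

rows = [{"id": 1, "x": 10}, {"id": 2, "x": 20}]
dropped = apply_to_columns(drop, ["id"], rows)        # "id" removed
passed = apply_to_columns(passthrough, ["x"], rows)   # everything unchanged
```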

Common Patterns

Mixed Data Type Processing

# Assume DataFrame with numeric and categorical columns
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:F1000, TRUE)

# Create numeric scaler
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "income", "score"})

# Create categorical encoder
Cell B2: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"category", "region"})

# Combine transformers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)

# Use in model pipeline
Cell E1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF()
Cell F1: =ML.PIPELINE(D1, E1)
Cell G1: =ML.FIT(F1, train_data, train_target)
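The pattern above scales numeric columns and one-hot encodes categorical ones. A sketch of what those two transformations do to toy data (illustrative; the values and column names are assumptions):

```python
from statistics import mean, pstdev

# One-hot encoding: one 0/1 output column per distinct category.
def one_hot(values):
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = [20, 30, 40]
regions = ["N", "S", "N"]

m, s = mean(ages), pstdev(ages)
scaled_ages = [(a - m) / s for a in ages]   # z-scored numeric column
encoded_regions = one_hot(regions)          # output columns: ["N", "S"]
```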

Selective Column Processing

# Load data
Cell A1: =ML.DATASETS.DIABETES()

# Scale only specific features
Cell B1: =ML.PREPROCESSING.ROBUST_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {0, 1, 2})  # First 3 columns

# Leave others unchanged
Cell B2: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {3, 4, 5, 6, 7, 8, 9})

# Combine
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)

# Add model
Cell E1: =ML.REGRESSION.LINEAR()
Cell F1: =ML.PIPELINE(D1, E1)

Drop Unwanted Columns

# Data with ID and timestamp columns
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:H1000, TRUE)

# Drop non-predictive columns
Cell B1: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"id", "timestamp"})

# Scale remaining features
Cell B2: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"feature1", "feature2", "feature3"})

# Combine and use
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)
Cell E1: =ML.CLASSIFICATION.SVM()
Cell F1: =ML.PIPELINE(D1, E1)

Different Scalers for Different Features

# Load data
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:F1000, TRUE)

# Standard scale normal distributions
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "height"})

# Robust scale features with outliers
Cell B2: =ML.PREPROCESSING.ROBUST_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"income", "spending"})

# MinMax scale bounded features
Cell B3: =ML.PREPROCESSING.MIN_MAX_SCALER()
Cell C3: =ML.COMPOSE.COLUMN_TRANSFORMER(B3, {"score"})

# Combine all transformers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2, C3)

# Add to pipeline
Cell E1: =ML.REGRESSION.RIDGE()
Cell F1: =ML.PIPELINE(D1, E1)
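The three scalers follow textbook formulas: standard scaling uses mean and standard deviation, robust scaling uses median and IQR, min-max maps onto [0, 1]. A side-by-side sketch on data with one outlier (quartile conventions vary between implementations, so treat this as illustrative):

```python
from statistics import mean, pstdev, median

#   standard: (x - mean) / stdev        -- assumes roughly normal data
#   robust:   (x - median) / IQR        -- resistant to outliers
#   min-max:  (x - min) / (max - min)   -- maps bounded data onto [0, 1]
def standard_scale(xs):
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def robust_scale(xs):
    srt = sorted(xs)
    med = median(srt)
    q1 = median(srt[: len(srt) // 2])          # lower-half median
    q3 = median(srt[(len(srt) + 1) // 2 :])    # upper-half median
    return [(x - med) / (q3 - q1) for x in xs]

def min_max_scale(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]    # one large outlier
std = standard_scale(data)              # outlier inflates mean and stdev
rob = robust_scale(data)                # median and IQR barely move
mm = min_max_scale(data)                # outlier squashes the rest near 0
```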

Imputation and Scaling Pipeline

# Load data with missing values
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:E1000, TRUE)

# Impute numeric columns with mean
Cell B1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "income"})

# Impute categorical with most frequent
Cell B2: =ML.IMPUTE.SIMPLE_IMPUTER("most_frequent")
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"category"})

# Combine imputers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)

# Then scale numeric
Cell E1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell F1: =ML.COMPOSE.COLUMN_TRANSFORMER(E1, {"age", "income"})

# Encode categorical
Cell E2: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell F2: =ML.COMPOSE.COLUMN_TRANSFORMER(E2, {"category"})

# Combine scalers/encoders
Cell G1: =ML.COMPOSE.DATA_TRANSFORMER(F1, F2)

# Full pipeline: impute → scale/encode → model
Cell H1: =ML.CLASSIFICATION.LOGISTIC()
Cell I1: =ML.PIPELINE(D1, G1, H1)
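The two imputation strategies used above behave as follows: "mean" fills missing numeric cells with the column mean, "most_frequent" fills missing categorical cells with the mode. A sketch with `None` standing in for a missing cell (illustrative only):

```python
from statistics import mean
from collections import Counter

def impute_mean(values):
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def impute_most_frequent(values):
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

ages = impute_mean([20, None, 40])                  # missing age -> 30
cats = impute_most_frequent(["a", "b", None, "a"])  # missing cat -> "a"
```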

Feature Engineering Pipeline

# Load data
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:G1000, TRUE)

# Pass through engineered features (already calculated in Excel)
Cell B1: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"feature_ratio", "feature_product"})

# Scale raw features
Cell B2: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"raw_feature1", "raw_feature2"})

# Drop original features (now have ratios/products)
Cell B3: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell C3: =ML.COMPOSE.COLUMN_TRANSFORMER(B3, {"original_feature1", "original_feature2"})

# Combine
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2, C3)

# Model
Cell E1: =ML.REGRESSION.RANDOM_FOREST_REG()
Cell F1: =ML.PIPELINE(D1, E1)

Tips and Best Practices

  1. When to Use Compose

    • Mixed data types (numeric + categorical)
    • Different preprocessing for different columns
    • Feature-specific transformations
    • Complex data pipelines
  2. Column Specification

    • By name: {"col1", "col2"} - more readable, and robust to column reordering
    • By index: {0, 1, 2} - works for unnamed columns, but breaks if column order changes
    • Single column: "col1" or 0
  3. Transformation Order

    1. Drop unwanted columns
    2. Impute missing values
    3. Encode categorical features
    4. Scale numeric features
    5. Apply model
    
  4. Compose vs Pipeline

    • COMPOSE: Column-specific transformations
    • PIPELINE: Sequential transformations
    • Combine both: Use COMPOSE in PIPELINE steps
  5. Common Patterns

    Numeric + Categorical:
    - COLUMN_TRANSFORMER(scaler, numeric_cols)
    - COLUMN_TRANSFORMER(encoder, categorical_cols)
    - DATA_TRANSFORMER(both)
    
    Selective Processing:
    - COLUMN_TRANSFORMER(transform, selected_cols)
    - COLUMN_TRANSFORMER(passthrough, other_cols)
    - DATA_TRANSFORMER(both)
    
  6. Performance Tips

    • Group similar transformations
    • Drop columns early if not needed
    • Use passthrough for pre-processed columns
    • Consider column order for readability
  7. Debugging Compose Pipelines

    • Test each transformer separately
    • Verify column names/indices
    • Check transformed output shape
    • Use ML.DATA.SAMPLE to inspect results
  8. Best Practices

    • ✅ Group columns by transformation type
    • ✅ Use descriptive column names
    • ✅ Document column choices
    • ✅ Test with sample data first
    • ❌ Don’t mix column names and indices
    • ❌ Don’t forget to handle all columns
    • ❌ Don’t duplicate column transformations