Compose Functions Reference

Complete reference for the FormulaML compose functions used to build advanced transformation pipelines.

ML.COMPOSE Namespace

ML.COMPOSE.COLUMN_TRANSFORMER()

Applies a transformer to specific columns.

Syntax:

=ML.COMPOSE.COLUMN_TRANSFORMER(transformer, cols)

Parameters:

  • transformer (Object, Required): Transformer to apply
  • cols (Array/String/Integer, Required): Columns to transform
  • Single column: "column_name" or 0
    • Multiple columns: {"col1", "col2"} or {0, 1, 2}

Returns: ColumnTransformer object

Use Case: Apply different transformations to different columns

Example:

# Transform specific columns
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"age", "income"})

# Or with column indices
Cell B1: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {0, 1})
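FormulaML's compose objects mirror scikit-learn's ColumnTransformer idea: a transformer paired with the columns it applies to. As a rough sketch of the semantics only (not FormulaML's actual implementation), with rows modeled as dicts keyed by column name:

```python
from statistics import mean, pstdev

# Illustrative sketch of column-transformer semantics; the data and
# helper names here are assumptions, not part of FormulaML.
def standard_scale(values):
    # z-score: (x - mean) / population stdev
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def column_transformer(transform, cols, rows):
    # Apply `transform` to each listed column; leave other columns untouched.
    out = [dict(r) for r in rows]
    for col in cols:
        transformed = transform([r[col] for r in rows])
        for r, v in zip(out, transformed):
            r[col] = v
    return out

rows = [{"age": 20, "income": 100, "region": "N"},
        {"age": 40, "income": 300, "region": "S"}]
result = column_transformer(standard_scale, ["age", "income"], rows)
# "age" and "income" are z-scored; "region" passes through unchanged
```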

ML.COMPOSE.DATA_TRANSFORMER()

Combines multiple column transformers into a single transformer.

Syntax:

=ML.COMPOSE.DATA_TRANSFORMER(*args)

Parameters:

  • *args (Objects, Required): Multiple ColumnTransformer objects

Returns: DataTransformer object

Use Case: Apply different transformers to different column groups

Example:

# Scale numeric columns
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"age", "income"})

# Encode categorical columns
Cell B1: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell B2: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"category", "region"})

# Combine transformers
Cell C1: =ML.COMPOSE.DATA_TRANSFORMER(A2, B2)

# Use in pipeline
Cell D1: =ML.CLASSIFICATION.LOGISTIC()
Cell E1: =ML.PIPELINE(C1, D1)
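Conceptually, a data transformer fans the input out to several column transformers and joins their outputs. A minimal sketch of those assumed semantics (modeled on scikit-learn's ColumnTransformer, where columns not claimed by any sub-transformer drop out of the result):

```python
# Illustrative only; function names and the drop-unclaimed-columns
# behavior are assumptions, not confirmed FormulaML behavior.
def data_transformer(*column_transformers):
    def transform(rows):
        out = [{} for _ in rows]
        for fn, cols in column_transformers:
            for col in cols:
                values = fn([r[col] for r in rows])
                for r, v in zip(out, values):
                    r[col] = v
        return out
    return transform

min_max = lambda vs: [(v - min(vs)) / (max(vs) - min(vs)) for v in vs]
upper = lambda vs: [v.upper() for v in vs]   # toy stand-in for an encoder

combined = data_transformer((min_max, ["age"]), (upper, ["region"]))
rows = [{"age": 20, "region": "n", "id": 1},
        {"age": 40, "region": "s", "id": 2}]
result = combined(rows)
# "id" is not claimed by any sub-transformer, so it is absent from the output
```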

ML.COMPOSE.COLUMN_SELECTOR()

Selects columns based on pattern or data type.

Syntax:

=ML.COMPOSE.COLUMN_SELECTOR(pattern, dtypes)

Parameters:

  • pattern (String, Required): Regex pattern for column names
  • dtypes (Array, Required): Data types to match
    • Examples: {"int", "float"}, {"object"}, {"int64", "float64"}

Returns: ColumnSelector object

Use Case: Automatically select columns by type or name pattern

Example:

# Select all numeric columns
Cell A1: =ML.COMPOSE.COLUMN_SELECTOR(".*", {"int64", "float64"})

# Select columns starting with "num_"
Cell B1: =ML.COMPOSE.COLUMN_SELECTOR("^num_.*", {"int64", "float64"})
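The selector keeps columns whose name matches the regex and whose dtype is in the allowed set. A sketch of that assumed behavior against a toy schema:

```python
import re

# Illustrative sketch; the schema below is an assumption for demonstration.
def column_selector(pattern, dtypes):
    rx = re.compile(pattern)
    def select(schema):          # schema: {column_name: dtype_string}
        return [name for name, dt in schema.items()
                if rx.match(name) and dt in dtypes]
    return select

schema = {"num_age": "int64", "num_score": "float64",
          "category": "object", "num_label": "object"}
numeric = column_selector(".*", {"int64", "float64"})(schema)
prefixed = column_selector("^num_.*", {"int64", "float64"})(schema)
# "num_label" matches the name pattern but fails the dtype filter
```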

ML.COMPOSE.TRANSFORMERS Namespace

ML.COMPOSE.TRANSFORMERS.DROP()

Creates a drop transformer to exclude columns.

Syntax:

=ML.COMPOSE.TRANSFORMERS.DROP()

Parameters: None

Returns: DropTransformer object

Use Case: Exclude specific columns from pipeline

Example:

# Drop ID column
Cell A1: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"id"})

ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()

Creates a passthrough transformer (no transformation).

Syntax:

=ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()

Parameters: None

Returns: PassthroughTransformer object

Use Case: Keep columns unchanged in pipeline

Example:

# Pass through already processed columns
Cell A1: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"preprocessed_feature"})
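DROP and PASSTHROUGH are the two trivial transformers: one removes its column group, the other forwards it unchanged. A sketch contrasting the two (assumed semantics; `None` as a drop signal is purely an illustration device):

```python
def drop(values):
    return None                  # signal: remove this column entirely

def passthrough(values):
    return list(values)          # keep values exactly as-is

def apply_to_columns(transform, cols, rows):
    out = [dict(r) for r in rows]
    for col in cols:
        transformed = transform([r[col] for r in rows])
        if transformed is None:          # DROP
            for r in out:
                del r[col]
        else:                            # any real transformer, incl. PASSTHROUGH
            for r, v in zip(out, transformed):
                r[col] = v
    return out

rows = [{"id": 1, "x": 10}, {"id": 2, "x": 20}]
dropped = apply_to_columns(drop, ["id"], rows)        # "id" removed
passed = apply_to_columns(passthrough, ["x"], rows)   # everything unchanged
```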

Common Patterns

Mixed Data Type Processing

# Assume DataFrame with numeric and categorical columns
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:F1000, TRUE)

# Create numeric scaler
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "income", "score"})

# Create categorical encoder
Cell B2: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"category", "region"})

# Combine transformers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)

# Use in model pipeline
Cell E1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF()
Cell F1: =ML.PIPELINE(D1, E1)
Cell G1: =ML.FIT(F1, train_data, train_target)
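The pattern above scales numeric columns and one-hot encodes categorical ones. A sketch of what those two transformations do to toy data (illustrative; the values and column names are assumptions):

```python
from statistics import mean, pstdev

# One-hot encoding: one 0/1 output column per distinct category.
def one_hot(values):
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = [20, 30, 40]
regions = ["N", "S", "N"]

m, s = mean(ages), pstdev(ages)
scaled_ages = [(a - m) / s for a in ages]   # z-scored numeric column
encoded_regions = one_hot(regions)          # output columns: ["N", "S"]
```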

Selective Column Processing

# Load data
Cell A1: =ML.DATASETS.DIABETES()

# Scale only specific features
Cell B1: =ML.PREPROCESSING.ROBUST_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {0, 1, 2})  # First 3 columns

# Leave others unchanged
Cell B2: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {3, 4, 5, 6, 7, 8, 9})

# Combine
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)

# Add model
Cell E1: =ML.REGRESSION.LINEAR()
Cell F1: =ML.PIPELINE(D1, E1)

Drop Unwanted Columns

# Data with ID and timestamp columns
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:H1000, TRUE)

# Drop non-predictive columns
Cell B1: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"id", "timestamp"})

# Scale remaining features
Cell B2: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"feature1", "feature2", "feature3"})

# Combine and use
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)
Cell E1: =ML.CLASSIFICATION.SVM()
Cell F1: =ML.PIPELINE(D1, E1)

Different Scalers for Different Features

# Load data
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:F1000, TRUE)

# Standard scale normal distributions
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "height"})

# Robust scale features with outliers
Cell B2: =ML.PREPROCESSING.ROBUST_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"income", "spending"})

# MinMax scale bounded features
Cell B3: =ML.PREPROCESSING.MIN_MAX_SCALER()
Cell C3: =ML.COMPOSE.COLUMN_TRANSFORMER(B3, {"score"})

# Combine all transformers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2, C3)

# Add to pipeline
Cell E1: =ML.REGRESSION.RIDGE()
Cell F1: =ML.PIPELINE(D1, E1)
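The three scalers follow textbook formulas: standard scaling uses mean and standard deviation, robust scaling uses median and IQR, min-max maps onto [0, 1]. A side-by-side sketch on data with one outlier (quartile conventions vary between implementations, so treat this as illustrative):

```python
from statistics import mean, pstdev, median

#   standard: (x - mean) / stdev        -- assumes roughly normal data
#   robust:   (x - median) / IQR        -- resistant to outliers
#   min-max:  (x - min) / (max - min)   -- maps bounded data onto [0, 1]
def standard_scale(xs):
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def robust_scale(xs):
    srt = sorted(xs)
    med = median(srt)
    q1 = median(srt[: len(srt) // 2])          # lower-half median
    q3 = median(srt[(len(srt) + 1) // 2 :])    # upper-half median
    return [(x - med) / (q3 - q1) for x in xs]

def min_max_scale(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]    # one large outlier
std = standard_scale(data)              # outlier inflates mean and stdev
rob = robust_scale(data)                # median and IQR barely move
mm = min_max_scale(data)                # outlier squashes the rest near 0
```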

Imputation and Scaling Pipeline

# Load data with missing values
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:E1000, TRUE)

# Impute numeric columns with mean
Cell B1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "income"})

# Impute categorical with most frequent
Cell B2: =ML.IMPUTE.SIMPLE_IMPUTER("most_frequent")
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"category"})

# Combine imputers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)

# Then scale numeric
Cell E1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell F1: =ML.COMPOSE.COLUMN_TRANSFORMER(E1, {"age", "income"})

# Encode categorical
Cell E2: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell F2: =ML.COMPOSE.COLUMN_TRANSFORMER(E2, {"category"})

# Combine scalers/encoders
Cell G1: =ML.COMPOSE.DATA_TRANSFORMER(F1, F2)

# Full pipeline: impute → scale/encode → model
Cell H1: =ML.CLASSIFICATION.LOGISTIC()
Cell I1: =ML.PIPELINE(D1, G1, H1)
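The two imputation strategies used above behave as follows: "mean" fills missing numeric cells with the column mean, "most_frequent" fills missing categorical cells with the mode. A sketch with `None` standing in for a missing cell (illustrative only):

```python
from statistics import mean
from collections import Counter

def impute_mean(values):
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def impute_most_frequent(values):
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

ages = impute_mean([20, None, 40])                  # missing age -> 30
cats = impute_most_frequent(["a", "b", None, "a"])  # missing cat -> "a"
```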

Feature Engineering Pipeline

# Load data
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:G1000, TRUE)

# Pass through engineered features (already calculated in Excel)
Cell B1: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"feature_ratio", "feature_product"})

# Scale raw features
Cell B2: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"raw_feature1", "raw_feature2"})

# Drop original features (now have ratios/products)
Cell B3: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell C3: =ML.COMPOSE.COLUMN_TRANSFORMER(B3, {"original_feature1", "original_feature2"})

# Combine
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2, C3)

# Model
Cell E1: =ML.REGRESSION.RANDOM_FOREST_REG()
Cell F1: =ML.PIPELINE(D1, E1)

Tips and Best Practices

  1. When to Use Compose

    • Mixed data types (numeric + categorical)
    • Different preprocessing for different columns
    • Feature-specific transformations
    • Complex data pipelines
  2. Column Specification

    • By name: {"col1", "col2"} - more readable, and robust to column reordering
    • By index: {0, 1, 2} - works for unnamed columns, but breaks if column order changes
    • Single column: "col1" or 0
  3. Transformation Order

    1. Drop unwanted columns
    2. Impute missing values
    3. Encode categorical features
    4. Scale numeric features
    5. Apply model
    
  4. Compose vs Pipeline

    • COMPOSE: Column-specific transformations
    • PIPELINE: Sequential transformations
    • Combine both: Use COMPOSE in PIPELINE steps
  5. Common Patterns

    Numeric + Categorical:
    - COLUMN_TRANSFORMER(scaler, numeric_cols)
    - COLUMN_TRANSFORMER(encoder, categorical_cols)
    - DATA_TRANSFORMER(both)
    
    Selective Processing:
    - COLUMN_TRANSFORMER(transform, selected_cols)
    - COLUMN_TRANSFORMER(passthrough, other_cols)
    - DATA_TRANSFORMER(both)
    
  6. Performance Tips

    • Group similar transformations
    • Drop columns early if not needed
    • Use passthrough for pre-processed columns
    • Consider column order for readability
  7. Debugging Compose Pipelines

    • Test each transformer separately
    • Verify column names/indices
    • Check transformed output shape
    • Use ML.DATA.SAMPLE to inspect results
  8. Best Practices

    • ✅ Group columns by transformation type
    • ✅ Use descriptive column names
    • ✅ Document column choices
    • ✅ Test with sample data first
    • ❌ Don’t mix column names and indices
    • ❌ Don’t forget to handle all columns
    • ❌ Don’t duplicate column transformations