Compose Functions Reference
Functions for creating advanced column-specific transformations and data pipelines.
ML.COMPOSE Namespace
ML.COMPOSE.COLUMN_TRANSFORMER()
Applies a transformer to specific columns.
Syntax:
=ML.COMPOSE.COLUMN_TRANSFORMER(transformer, cols)
Parameters:
transformer
(Object, Required): Transformer to apply
cols
(Array/String/Integer, Required): Columns to transform
- Single column: "column_name" or 0
- Multiple columns: {"col1", "col2"} or {0, 1, 2}
Returns: ColumnTransformer object
Use Case: Apply different transformations to different columns
Example:
# Transform specific columns
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"age", "income"})
# Or with column indices
Cell B1: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {0, 1})
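These COMPOSE functions appear to mirror scikit-learn's `sklearn.compose` module. Assuming that mapping, here is a minimal Python sketch of what COLUMN_TRANSFORMER does under the hood (the column names and data are illustrative only):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical data with the columns used above
df = pd.DataFrame({"age": [25, 40, 60], "income": [30000.0, 52000.0, 88000.0]})

# Rough equivalent of =ML.COMPOSE.COLUMN_TRANSFORMER(scaler, {"age", "income"}):
# apply a StandardScaler only to the named columns
ct = ColumnTransformer([("scale", StandardScaler(), ["age", "income"])])

scaled = ct.fit_transform(df)
print(scaled.shape)  # one row per input row, one column per transformed column
```

Columns not listed in any transformer are dropped by default in scikit-learn, which matters when you later combine several column groups.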
ML.COMPOSE.DATA_TRANSFORMER()
Combines multiple column transformers into a single transformer.
Syntax:
=ML.COMPOSE.DATA_TRANSFORMER(*args)
Parameters:
*args
(Objects, Required): Multiple ColumnTransformer objects
Returns: DataTransformer object
Use Case: Apply different transformers to different column groups
Example:
# Scale numeric columns
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"age", "income"})
# Encode categorical columns
Cell B1: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell B2: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"category", "region"})
# Combine transformers
Cell C1: =ML.COMPOSE.DATA_TRANSFORMER(A2, B2)
# Use in pipeline
Cell D1: =ML.CLASSIFICATION.LOGISTIC()
Cell E1: =ML.PIPELINE(C1, D1)
ML.COMPOSE.COLUMN_SELECTOR()
Selects columns based on pattern or data type.
Syntax:
=ML.COMPOSE.COLUMN_SELECTOR(pattern, dtypes)
Parameters:
pattern
(String, Required): Regex pattern for column names
dtypes
(Array, Required): Data types to match
- Examples: {"int", "float"}, {"object"}, {"int64", "float64"}
Returns: ColumnSelector object
Use Case: Automatically select columns by type or name pattern
Example:
# Select all numeric columns
Cell A1: =ML.COMPOSE.COLUMN_SELECTOR(".*", {"int64", "float64"})
# Select columns starting with "num_"
Cell B1: =ML.COMPOSE.COLUMN_SELECTOR("^num_.*", {"int64", "float64"})
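If the underlying implementation follows scikit-learn, COLUMN_SELECTOR is analogous to `make_column_selector`, which builds a callable that picks columns by name pattern and/or dtype (column names here are made up for illustration):

```python
import pandas as pd
from sklearn.compose import make_column_selector

df = pd.DataFrame({
    "num_age": [25, 40],
    "num_income": [30000.0, 52000.0],
    "label": ["a", "b"],
})

# Analogue of COLUMN_SELECTOR(".*", {"int64", "float64"}):
# select every numeric column regardless of name
numeric = make_column_selector(dtype_include=["int64", "float64"])
print(numeric(df))

# Analogue of COLUMN_SELECTOR("^num_.*", ...): match on a name prefix
prefixed = make_column_selector(pattern="^num_.*")
print(prefixed(df))
```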
ML.COMPOSE.TRANSFORMERS Namespace
ML.COMPOSE.TRANSFORMERS.DROP()
Creates a drop transformer to exclude columns.
Syntax:
=ML.COMPOSE.TRANSFORMERS.DROP()
Parameters: None
Returns: DropTransformer object
Use Case: Exclude specific columns from pipeline
Example:
# Drop ID column
Cell A1: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"id"})
ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Creates a passthrough transformer (no transformation).
Syntax:
=ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Parameters: None
Returns: PassthroughTransformer object
Use Case: Keep columns unchanged in pipeline
Example:
# Pass through already processed columns
Cell A1: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell A2: =ML.COMPOSE.COLUMN_TRANSFORMER(A1, {"preprocessed_feature"})
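In scikit-learn terms, DROP and PASSTHROUGH are simply the literal strings "drop" and "passthrough" used in place of a transformer object. A sketch covering both, assuming that mapping (data is illustrative):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "id": [1, 2, 3],
    "feature": [1.0, 2.0, 3.0],
    "preprocessed_feature": [0.1, 0.5, 0.9],
})

# "drop" excludes a column group; "passthrough" keeps one unchanged
ct = ColumnTransformer([
    ("drop_id", "drop", ["id"]),
    ("scale", StandardScaler(), ["feature"]),
    ("keep", "passthrough", ["preprocessed_feature"]),
])

out = ct.fit_transform(df)
print(out.shape)  # "id" is dropped, leaving 2 output columns
```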
Common Patterns
Mixed Data Type Processing
# Assume DataFrame with numeric and categorical columns
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:F1000, TRUE)
# Create numeric scaler
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "income", "score"})
# Create categorical encoder
Cell B2: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"category", "region"})
# Combine transformers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)
# Use in model pipeline
Cell E1: =ML.CLASSIFICATION.RANDOM_FOREST_CLF()
Cell F1: =ML.PIPELINE(D1, E1)
Cell G1: =ML.FIT(F1, train_data, train_target)
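Assuming the scikit-learn analogy, the whole mixed-type pattern above (scale numerics, encode categoricals, combine, fit a random forest) condenses to the following sketch; the training frame and target are invented for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training frame with mixed column types
train = pd.DataFrame({
    "age": [25, 40, 60, 33, 51, 29],
    "income": [30e3, 52e3, 88e3, 45e3, 70e3, 38e3],
    "score": [0.2, 0.6, 0.9, 0.4, 0.7, 0.3],
    "category": ["a", "b", "a", "b", "a", "b"],
    "region": ["n", "s", "s", "n", "s", "n"],
})
target = [0, 1, 1, 0, 1, 0]

# DATA_TRANSFORMER(C1, C2): both column groups in one ColumnTransformer
prep = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income", "score"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category", "region"]),
])

# PIPELINE(D1, E1) then FIT(F1, ...) maps to Pipeline(...).fit(...)
pipe = Pipeline([("prep", prep), ("clf", RandomForestClassifier(random_state=0))])
pipe.fit(train, target)
print(pipe.predict(train))
```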
Selective Column Processing
# Load data
Cell A1: =ML.DATASETS.DIABETES()
# Scale only specific features
Cell B1: =ML.PREPROCESSING.ROBUST_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {0, 1, 2}) # First 3 columns
# Leave others unchanged
Cell B2: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {3, 4, 5, 6, 7, 8, 9})
# Combine
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)
# Add model
Cell E1: =ML.REGRESSION.LINEAR()
Cell F1: =ML.PIPELINE(D1, E1)
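The selective pattern above (scale some columns by index, pass the rest through) has a direct scikit-learn counterpart, assuming that is the backing library; `remainder="passthrough"` is shorthand for the explicit PASSTHROUGH group:

```python
from sklearn.datasets import load_diabetes
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)  # 10 numeric features

# Scale only the first three columns; leave the other seven unchanged
prep = ColumnTransformer(
    [("robust", RobustScaler(), [0, 1, 2])],
    remainder="passthrough",
)

pipe = Pipeline([("prep", prep), ("reg", LinearRegression())])
pipe.fit(X, y)
print(round(pipe.score(X, y), 3))
```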
Drop Unwanted Columns
# Data with ID and timestamp columns
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:H1000, TRUE)
# Drop non-predictive columns
Cell B1: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"id", "timestamp"})
# Scale remaining features
Cell B2: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"feature1", "feature2", "feature3"})
# Combine and use
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)
Cell E1: =ML.CLASSIFICATION.SVM()
Cell F1: =ML.PIPELINE(D1, E1)
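A sketch of the drop-and-scale pattern, again assuming the scikit-learn mapping. Note that in scikit-learn, columns not named in any transformer are dropped by default, so an explicit DROP group is only needed when the remainder is set to passthrough (data is illustrative):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "timestamp": pd.to_datetime(["2024-01-01"] * 4),
    "feature1": [1.0, 2.0, 3.0, 4.0],
    "feature2": [0.5, 0.1, 0.9, 0.3],
    "feature3": [10.0, 20.0, 15.0, 25.0],
})
y = [0, 1, 0, 1]

# Listing only the features to scale implicitly drops "id" and "timestamp"
prep = ColumnTransformer(
    [("scale", StandardScaler(), ["feature1", "feature2", "feature3"])],
    remainder="drop",
)

pipe = Pipeline([("prep", prep), ("clf", SVC())])
pipe.fit(df, y)
print(pipe.predict(df))
```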
Different Scalers for Different Features
# Load data
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:F1000, TRUE)
# Standard scale normal distributions
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "height"})
# Robust scale features with outliers
Cell B2: =ML.PREPROCESSING.ROBUST_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"income", "spending"})
# MinMax scale bounded features
Cell B3: =ML.PREPROCESSING.MIN_MAX_SCALER()
Cell C3: =ML.COMPOSE.COLUMN_TRANSFORMER(B3, {"score"})
# Combine all transformers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2, C3)
# Add to pipeline
Cell E1: =ML.REGRESSION.RIDGE()
Cell F1: =ML.PIPELINE(D1, E1)
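The three-scaler pattern maps to a single `ColumnTransformer` with one entry per column group, assuming the scikit-learn analogy holds (the data, including the income outlier, is invented for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler

df = pd.DataFrame({
    "age": [25, 40, 60],
    "height": [160.0, 175.0, 182.0],
    "income": [30e3, 52e3, 400e3],   # outlier-prone
    "spending": [1e3, 2e3, 30e3],
    "score": [0.2, 0.5, 0.9],        # already bounded
})

# One scaler per column group, combined side by side
prep = ColumnTransformer([
    ("standard", StandardScaler(), ["age", "height"]),
    ("robust", RobustScaler(), ["income", "spending"]),
    ("minmax", MinMaxScaler(), ["score"]),
])

out = prep.fit_transform(df)
print(out.shape)  # (3, 5): all five columns, each scaled by its own scaler
```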
Imputation and Scaling Pipeline
# Load data with missing values
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:E1000, TRUE)
# Impute numeric columns with mean
Cell B1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"age", "income"})
# Impute categorical with most frequent
Cell B2: =ML.IMPUTE.SIMPLE_IMPUTER("most_frequent")
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"category"})
# Combine imputers
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2)
# Then scale numeric
Cell E1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell F1: =ML.COMPOSE.COLUMN_TRANSFORMER(E1, {"age", "income"})
# Encode categorical
Cell E2: =ML.PREPROCESSING.ONE_HOT_ENCODER()
Cell F2: =ML.COMPOSE.COLUMN_TRANSFORMER(E2, {"category"})
# Combine scalers/encoders
Cell G1: =ML.COMPOSE.DATA_TRANSFORMER(F1, F2)
# Full pipeline: impute → scale/encode → model
Cell H1: =ML.CLASSIFICATION.LOGISTIC()
Cell I1: =ML.PIPELINE(D1, G1, H1)
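In scikit-learn, the usual way to express impute-then-scale per column group is to nest a small per-group `Pipeline` inside one `ColumnTransformer`, rather than chaining two column transformers (the second would otherwise lose the column names). A sketch under that assumption, with invented data containing missing values:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "age": [25.0, np.nan, 60.0, 33.0],
    "income": [30e3, 52e3, np.nan, 45e3],
    "category": ["a", "b", np.nan, "b"],
})
y = [0, 1, 1, 0]

# Impute then scale the numeric group; impute then encode the categorical group
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

prep = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["category"]),
])

pipe = Pipeline([("prep", prep), ("clf", LogisticRegression())])
pipe.fit(df, y)
print(pipe.predict(df))
```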
Feature Engineering Pipeline
# Load data
Cell A1: =ML.DATA.CONVERT_TO_DF(Sheet1!A1:G1000, TRUE)
# Pass through engineered features (already calculated in Excel)
Cell B1: =ML.COMPOSE.TRANSFORMERS.PASSTHROUGH()
Cell C1: =ML.COMPOSE.COLUMN_TRANSFORMER(B1, {"feature_ratio", "feature_product"})
# Scale raw features
Cell B2: =ML.PREPROCESSING.STANDARD_SCALER()
Cell C2: =ML.COMPOSE.COLUMN_TRANSFORMER(B2, {"raw_feature1", "raw_feature2"})
# Drop original features (now have ratios/products)
Cell B3: =ML.COMPOSE.TRANSFORMERS.DROP()
Cell C3: =ML.COMPOSE.COLUMN_TRANSFORMER(B3, {"original_feature1", "original_feature2"})
# Combine
Cell D1: =ML.COMPOSE.DATA_TRANSFORMER(C1, C2, C3)
# Model
Cell E1: =ML.REGRESSION.RANDOM_FOREST_REG()
Cell F1: =ML.PIPELINE(D1, E1)
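The passthrough/scale/drop combination above can be sketched as a single `ColumnTransformer`, assuming the scikit-learn mapping; the column names follow the example and the values are invented:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "feature_ratio": [0.5, 1.2, 0.8],
    "feature_product": [10.0, 24.0, 16.0],
    "raw_feature1": [1.0, 2.0, 3.0],
    "raw_feature2": [4.0, 5.0, 6.0],
    "original_feature1": [7.0, 8.0, 9.0],
    "original_feature2": [10.0, 11.0, 12.0],
})

# Keep engineered columns as-is, scale raw ones, drop the originals
prep = ColumnTransformer([
    ("keep", "passthrough", ["feature_ratio", "feature_product"]),
    ("scale", StandardScaler(), ["raw_feature1", "raw_feature2"]),
    ("drop", "drop", ["original_feature1", "original_feature2"]),
])

out = prep.fit_transform(df)
print(out.shape)  # (3, 4): 2 passthrough + 2 scaled, originals dropped
```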
Tips and Best Practices
When to Use Compose
- Mixed data types (numeric + categorical)
- Different preprocessing for different columns
- Feature-specific transformations
- Complex data pipelines
Column Specification
- By name: {"col1", "col2"} - More readable
- By index: {0, 1, 2} - Unaffected by column renames, but breaks if columns are reordered
- Single column: "col1" or 0
Transformation Order
1. Drop unwanted columns
2. Impute missing values
3. Encode categorical features
4. Scale numeric features
5. Apply model
Compose vs Pipeline
- COMPOSE: Column-specific transformations
- PIPELINE: Sequential transformations
- Combine both: Use COMPOSE in PIPELINE steps
Common Patterns
Numeric + Categorical:
- COLUMN_TRANSFORMER(scaler, numeric_cols)
- COLUMN_TRANSFORMER(encoder, categorical_cols)
- DATA_TRANSFORMER(both)
Selective Processing:
- COLUMN_TRANSFORMER(transform, selected_cols)
- COLUMN_TRANSFORMER(passthrough, other_cols)
- DATA_TRANSFORMER(both)
Performance Tips
- Group similar transformations
- Drop columns early if not needed
- Use passthrough for pre-processed columns
- Consider column order for readability
Debugging Compose Pipelines
- Test each transformer separately
- Verify column names/indices
- Check transformed output shape
- Use ML.DATA.SAMPLE to inspect results
Best Practices
- ✅ Group columns by transformation type
- ✅ Use descriptive column names
- ✅ Document column choices
- ✅ Test with sample data first
- ❌ Don't mix column names and indices
- ❌ Don't forget to handle all columns
- ❌ Don't duplicate column transformations
Related Functions
- ML.PIPELINE() - Sequential transformations
- ML.PREPROCESSING Functions - Transformers to use
- ML.IMPUTE Functions - Imputation transformers
- ML.DATA Functions - Data preparation