Compose Functions Reference
Complete reference for FormulaML compose functions for building advanced transformation pipelines.
Functions for loading, exploring, and manipulating data.
ML.DATASETS.* - Built-in datasets (Iris, Diabetes, Digits, OpenML)
ML.DATA.* - Data manipulation and exploration

Functions for predicting continuous values.
ML.REGRESSION.LINEAR - Linear Regression
ML.REGRESSION.RIDGE - Ridge Regression (L2 regularization)
ML.REGRESSION.LASSO - Lasso Regression (L1 regularization)
ML.REGRESSION.ELASTIC_NET - Elastic Net (L1 + L2)
ML.REGRESSION.RANDOM_FOREST_REG - Random Forest Regression

Functions for categorizing data.
ML.CLASSIFICATION.LOGISTIC - Logistic Regression
ML.CLASSIFICATION.SVM - Support Vector Machines
ML.CLASSIFICATION.RANDOM_FOREST_CLF - Random Forest Classifier

Functions for finding groups in data.
ML.CLUSTERING.KMEANS - K-Means clustering with advanced parameters

Functions for preparing data.
ML.PREPROCESSING.TRAIN_TEST_SPLIT - Split train/test sets
ML.PREPROCESSING.STANDARD_SCALER - Standardize features
ML.PREPROCESSING.MIN_MAX_SCALER - Scale to range [0,1]
ML.PREPROCESSING.ROBUST_SCALER - Scale robust to outliers
ML.PREPROCESSING.ONE_HOT_ENCODER - One-hot encode categories
ML.PREPROCESSING.ORDINAL_ENCODER - Ordinal encode categories

Functions for assessing model performance.
ML.EVAL.SCORE - R² or accuracy score
ML.EVAL.CV_SCORE - Cross-validation
ML.EVAL.GRID_SEARCH - Hyperparameter tuning
ML.EVAL.BEST_PARAMS - Extract best parameters
ML.EVAL.BEST_SCORE - Get best CV score
ML.EVAL.SEARCH_RESULTS - Detailed search results

Standalone metric functions for regression, classification, and clustering evaluation.
ML.EVAL.REGRESSION.* - R²/RMSE/MAE and 12 more regression metrics
ML.EVAL.CLASSIFICATION.* - Accuracy/F1/ROC AUC and more (label-based and score-based)
ML.EVAL.CLUSTERING.* - Adjusted Rand Score/V-Measure and more

Essential functions for model training and prediction.
ML.FIT - Train models and transformers
ML.PREDICT - Make predictions
ML.TRANSFORM - Transform data
ML.FIT_TRANSFORM - Fit and transform in one step
ML.PIPELINE - Create ML workflows
ML.OBJECT_INFO - Inspect object details

Functions for reducing feature dimensions.
ML.DIM_REDUCTION.PCA - Principal Component Analysis
ML.DIM_REDUCTION.PCA.RESULTS - PCA detailed results
ML.DIM_REDUCTION.KERNEL_PCA - Non-linear PCA

Functions for model analysis and visualization.
ML.INSPECT.GET_PARAMS - Extract model parameters
ML.INSPECT.DECISION_BOUNDARY - Visualize decision boundaries

Functions for advanced column transformations.
ML.COMPOSE.COLUMN_TRANSFORMER - Apply transformer to columns
ML.COMPOSE.DATA_TRANSFORMER - Combine transformers
ML.COMPOSE.COLUMN_SELECTOR - Select columns by pattern/type
ML.COMPOSE.TRANSFORMERS.DROP - Drop columns
ML.COMPOSE.TRANSFORMERS.PASSTHROUGH - Pass columns unchanged

Functions for handling missing values and selecting features.
ML.IMPUTE.SIMPLE_IMPUTER - Impute missing values
ML.FEATURE_SELECTION.SELECT_PERCENTILE - Select top features

All FormulaML functions follow a consistent naming pattern:
ML.[NAMESPACE].[FUNCTION_NAME](parameters)
Examples:
ML.DATASETS.IRIS() - Load Iris dataset
ML.REGRESSION.LINEAR() - Create linear regression model
ML.EVAL.SCORE() - Evaluate model performance

Most core functionality is available in the free tier.
Advanced capabilities require a premium subscription:
ML.EVAL.CV_SCORE - Cross-validation
ML.EVAL.GRID_SEARCH - Hyperparameter tuning
ML.DIM_REDUCTION.KERNEL_PCA - Non-linear PCA
ML.DATASETS.OPENML - OpenML datasets

Many functions return or accept “object handles” - references to complex data structures:
Cell A1: =ML.DATASETS.IRIS() → Returns: <Dataset>
Cell A2: =ML.REGRESSION.LINEAR() → Returns: <LinearRegression>
Cell A3: =ML.FIT(A2, features, target) → Returns: <LinearRegression> (with 🧠 brain icon)
These handles allow Excel to manage complex ML objects efficiently.
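As a conceptual illustration only, the handle idea can be sketched in Python as a registry that stores an object and gives the caller a short text label to pass around instead. This is not FormulaML's implementation: the `HandleRegistry` class, the `#1` suffix on the label, and the stand-in `LinearRegression` class are all hypothetical names introduced for the sketch.

```python
import itertools


class HandleRegistry:
    """Toy registry mapping short text handles to in-memory objects."""

    def __init__(self):
        self._objects = {}
        self._counter = itertools.count(1)

    def register(self, obj):
        # Build a label such as "<LinearRegression#1>" from the class name.
        # The "#n" suffix is an assumption made here to keep labels unique.
        handle = f"<{type(obj).__name__}#{next(self._counter)}>"
        self._objects[handle] = obj
        return handle

    def resolve(self, handle):
        # Look the object up again; unknown labels raise an error.
        try:
            return self._objects[handle]
        except KeyError:
            raise KeyError(f"Object handle not found: {handle}") from None


class LinearRegression:
    """Hypothetical stand-in for a model object; it does no real fitting."""


registry = HandleRegistry()
handle = registry.register(LinearRegression())  # what a cell would display
model = registry.resolve(handle)                # what a later call would use
print(handle)
```

The point of the design is that a cell only ever holds a small string, while the large object stays in memory and is looked up when another function receives the handle.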
random_state - 42 (any integer works)
fit_intercept
alpha
n_estimators
max_iter
Functions return different types of values:
Object Handles: Complex objects (models, dataframes), e.g. <SVC>
Numeric Values: Single numbers, e.g. 0.95 (accuracy score)
Arrays: Multiple values
DataFrames: Tabular data
Load Data:
ML.DATASETS.IRIS() - Classification dataset
ML.DATASETS.DIABETES() - Regression dataset
ML.DATASETS.FORCE_2020() - Well log data (petrophysics)
ML.DATA.CONVERT_TO_DF() - Excel to DataFrame

Explore Data:
ML.DATA.INFO() - Data structure
ML.DATA.DESCRIBE() - Statistics
ML.DATA.SAMPLE() - View rows

Filter / Query Data:
ML.DATA.QUERY() - Run DuckDB SQL against a DataFrame

Prepare Data:
ML.DATA.SELECT_COLUMNS() - Choose columns
ML.PREPROCESSING.TRAIN_TEST_SPLIT() - Split data
ML.PREPROCESSING.STANDARD_SCALER() - Scale features

Train Models:
ML.FIT() - Train any model
ML.PREDICT() - Make predictions
ML.TRANSFORM() - Transform data

Evaluate:
ML.EVAL.SCORE() - Basic scoring (model-level)
ML.EVAL.CV_SCORE() - Cross-validation
ML.EVAL.GRID_SEARCH() - Hyperparameter tuning

Common error messages and their meanings:
“Object handle not found”
“Invalid parameter value”
“Dimension mismatch”
“Premium function”
Always use consistent data shapes
Set random_state for reproducibility
Check data types
Handle missing values
Start with simple models
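Several of these practices can be sketched together in plain Python. The example below is an illustration under stated assumptions rather than FormulaML behaviour: `impute_mean` and `train_test_split` are hypothetical helpers written for this sketch, the data values are made up, and the "simple model" is a mean baseline chosen only to keep the code short.

```python
import random
from statistics import mean


def impute_mean(column):
    """Handle missing values: replace None with the mean of observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def train_test_split(rows, test_size=0.25, random_state=None):
    """Shuffle a copy of rows with a seeded generator, then split it."""
    rng = random.Random(random_state)  # same seed gives the same shuffle
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up data: one feature column with gaps, and one target column.
x = impute_mean([1.0, None, 3.0, 4.0, None, 6.0, 7.0, 8.0])
y = [2.1, 5.7, 6.2, 7.9, 9.0, 12.1, 13.8, 16.2]

# Consistent data shapes: one target value per feature value.
assert len(x) == len(y)
rows = list(zip(x, y))

# Setting random_state makes the split repeatable.
train_a, test_a = train_test_split(rows, random_state=42)
train_b, test_b = train_test_split(rows, random_state=42)
assert (train_a, test_a) == (train_b, test_b)

# Start simple: predict the training mean and measure mean absolute error.
baseline = mean(t for _, t in train_a)
mae = mean(abs(t - baseline) for _, t in test_a)
print(f"baseline MAE: {mae:.3f}")
```

A baseline like this gives a reference error that any more complex model should beat before it is worth tuning.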