Function Reference

Function Categories

📊 Data Functions

Functions for loading, exploring, and manipulating data.

ML.DATASETS.* - Built-in datasets (Iris, Diabetes, Digits, OpenML)
ML.DATA.* - Data manipulation and exploration

📈 Regression Models

Functions for predicting continuous values.

ML.REGRESSION.LINEAR - Linear Regression
ML.REGRESSION.RIDGE - Ridge Regression (L2 regularization)
ML.REGRESSION.LASSO - Lasso Regression (L1 regularization)
ML.REGRESSION.ELASTIC_NET - Elastic Net (L1 + L2)
ML.REGRESSION.RANDOM_FOREST_REG - Random Forest Regression ⭐

🎯 Classification Models

Functions for categorizing data.

ML.CLASSIFICATION.LOGISTIC - Logistic Regression
ML.CLASSIFICATION.SVM - Support Vector Machines
ML.CLASSIFICATION.RANDOM_FOREST_CLF - Random Forest Classifier ⭐

🔍 Clustering Models

Functions for finding groups in data.

ML.CLUSTERING.KMEANS - K-Means clustering with advanced parameters

⚙️ Preprocessing Functions

Functions for preparing data.

ML.PREPROCESSING.TRAIN_TEST_SPLIT - Split data into training and test sets
ML.PREPROCESSING.STANDARD_SCALER - Standardize features
ML.PREPROCESSING.MIN_MAX_SCALER - Scale features to the range [0, 1]
ML.PREPROCESSING.ROBUST_SCALER - Scale features using statistics that are robust to outliers
ML.PREPROCESSING.ONE_HOT_ENCODER - One-hot encode categories
ML.PREPROCESSING.ORDINAL_ENCODER - Ordinal encode categories
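
The three scalers differ mainly in which statistics they use. The sketch below shows the usual textbook formulas in plain Python. It assumes FormulaML follows these standard definitions, which this page does not confirm, and the helper names are illustrative only.

```python
from statistics import fmean, pstdev, quantiles

def standard_scale(values):
    """Center on the mean and divide by the (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

# One extreme value (100.0) to show how each scaler reacts to an outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(min_max_scale(data))
print(robust_scale(data))
```

With the outlier present, min-max scaling squeezes the first four values close to 0, while the median/IQR version keeps them evenly spread. All three helpers divide by a spread statistic, so a constant column would need special handling.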

📏 Evaluation Functions

Functions for assessing model performance.

ML.EVAL.SCORE - R² or accuracy score
ML.EVAL.CV_SCORE - Cross-validation ⭐
ML.EVAL.GRID_SEARCH - Hyperparameter tuning ⭐
ML.EVAL.BEST_PARAMS - Extract best parameters ⭐
ML.EVAL.BEST_SCORE - Get best CV score ⭐
ML.EVAL.SEARCH_RESULTS - Detailed search results ⭐
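
ML.EVAL.SCORE is described as returning R² or accuracy. Presumably R² applies to regression models and accuracy to classifiers; that is an assumption, since this page does not say how the metric is chosen. A minimal sketch of both metrics, with hypothetical helper names:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

y = [1.0, 2.0, 3.0]
print(r2_score(y, y))                # perfect predictions
print(r2_score(y, [2.0, 2.0, 2.0]))  # always predicting the mean
print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

A model that always predicts the mean of the target scores R² = 0, so values near 0 mean the model adds little over that trivial baseline.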

🔧 Core ML Functions

Essential functions for model training and prediction.

ML.FIT - Train models and transformers
ML.PREDICT - Make predictions
ML.TRANSFORM - Transform data
ML.FIT_TRANSFORM - Fit and transform in one step
ML.PIPELINE - Create ML workflows
ML.OBJECT_INFO - Inspect object details

📉 Dimensionality Reduction

Functions for reducing feature dimensions.

ML.DIM_REDUCTION.PCA - Principal Component Analysis
ML.DIM_REDUCTION.PCA.RESULTS - Detailed PCA results
ML.DIM_REDUCTION.KERNEL_PCA - Non-linear PCA ⭐

🔬 Inspection Tools

Functions for model analysis and visualization.

ML.INSPECT.GET_PARAMS - Extract model parameters
ML.INSPECT.DECISION_BOUNDARY - Visualize decision boundaries

🧩 Compose Functions

Functions for advanced column transformations.

ML.COMPOSE.COLUMN_TRANSFORMER - Apply transformer to columns
ML.COMPOSE.DATA_TRANSFORMER - Combine transformers
ML.COMPOSE.COLUMN_SELECTOR - Select columns by pattern/type
ML.COMPOSE.TRANSFORMERS.DROP - Drop columns
ML.COMPOSE.TRANSFORMERS.PASSTHROUGH - Pass columns unchanged

🔧 Impute & Feature Selection

Functions for handling missing values and selecting features.

ML.IMPUTE.SIMPLE_IMPUTER - Impute missing values
ML.FEATURE_SELECTION.SELECT_PERCENTILE - Select top features

Function Naming Convention

FormulaML functions follow a consistent naming pattern (core functions such as ML.FIT and ML.PREDICT omit the namespace):

ML.[NAMESPACE].[FUNCTION_NAME](parameters)

Examples:

ML.DATASETS.IRIS() - Load Iris dataset
ML.REGRESSION.LINEAR() - Create linear regression model
ML.EVAL.SCORE() - Evaluate model performance
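
To make the pattern concrete, here is a small sketch that splits a function name into its namespace and function parts. The helper is hypothetical and not part of FormulaML. It also handles names listed on this page that deviate slightly from the pattern: ML.FIT has no namespace, and ML.COMPOSE.TRANSFORMERS.DROP has a nested one.

```python
def split_function_name(name):
    """Split 'ML.NAMESPACE.FUNCTION' into (namespace, function).

    The namespace is an empty string for core functions such as ML.FIT.
    """
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != "ML":
        raise ValueError(f"Not a FormulaML-style name: {name!r}")
    namespace = ".".join(parts[1:-1])
    return namespace, parts[-1]

print(split_function_name("ML.REGRESSION.LINEAR"))
print(split_function_name("ML.FIT"))
print(split_function_name("ML.COMPOSE.TRANSFORMERS.DROP"))
```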

Free vs Premium Functions

✅ Free Functions

Most core functionality is available in the free tier:

Basic models (Linear, Logistic, SVM, K-Means)
Data handling and exploration
Model training and prediction
Basic evaluation

⭐ Premium Functions

Advanced capabilities require a premium subscription:

Random Forest models
Cross-validation (ML.EVAL.CV_SCORE)
Grid search (ML.EVAL.GRID_SEARCH) and its result functions (ML.EVAL.BEST_PARAMS, ML.EVAL.BEST_SCORE, ML.EVAL.SEARCH_RESULTS)
Kernel PCA (ML.DIM_REDUCTION.KERNEL_PCA)
OpenML datasets (ML.DATASETS.OPENML)

Premium functions are marked with a ⭐ icon in the documentation.

Understanding Object Handles

Many functions return or accept “object handles” - references to complex data structures:

Cell A1: =ML.DATASETS.IRIS()           → Returns: <Dataset>
Cell A2: =ML.REGRESSION.LINEAR()       → Returns: <LinearRegression>
Cell A3: =ML.FIT(A2, features, target) → Returns: <LinearRegression> (with 🧠 brain icon)

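Because a cell can hold only a simple value, the handle stands in for the underlying model or dataset. To use the object, pass the cell reference (for example A2) to another function, as ML.FIT does above.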
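
One way to picture a handle is as a short text key that points into a table of live objects. The sketch below is purely illustrative: the `HandleRegistry` class, the `#1` suffix, and the lookup logic are assumptions and do not describe how FormulaML is actually implemented.

```python
import itertools

class HandleRegistry:
    """Hypothetical store that maps text handles to in-memory objects."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def store(self, obj):
        """Keep the object and return a short text handle a cell could display."""
        handle = f"<{type(obj).__name__}#{next(self._ids)}>"
        self._objects[handle] = obj
        return handle

    def resolve(self, handle):
        """Return the object behind a handle, or fail if the handle is unknown."""
        try:
            return self._objects[handle]
        except KeyError:
            raise LookupError("Object handle not found") from None

class LinearRegression:
    """Stand-in for a real model object."""

registry = HandleRegistry()
model = LinearRegression()
handle = registry.store(model)   # what the cell would show
print(handle)
print(registry.resolve(handle) is model)
```

Under this model, a function that receives a cell reference only needs the text in that cell to find the full object again.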

Common Parameters

Frequently Used Parameters

random_state

Type: Integer
Purpose: Seeds the random number generator so that results are reproducible
Example: 42 (any integer works)

fit_intercept

Type: Boolean (TRUE/FALSE)
Purpose: Whether to calculate the intercept
Default: TRUE

alpha

Type: Float
Purpose: Regularization strength
Range: > 0 (higher = more regularization)

n_estimators

Type: Integer
Purpose: Number of trees in the ensemble
Default: 100

max_iter

Type: Integer
Purpose: Maximum iterations
Default: Varies by algorithm
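
Two of these parameters are easy to demonstrate in isolation. The sketch below uses plain Python rather than FormulaML itself, and both helper names are hypothetical. `shuffled_indices` shows how a fixed `random_state` makes a random shuffle repeatable, and `ridge_slope` shows how a larger `alpha` shrinks a coefficient, using the closed-form ridge solution for a single feature with no intercept.

```python
import random

def shuffled_indices(n_rows, random_state=None):
    """Shuffle row indices; the same seed always yields the same order."""
    rng = random.Random(random_state)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return indices

def ridge_slope(x, y, alpha):
    """One-feature ridge regression without intercept.

    Minimizes sum((y_i - w * x_i)^2) + alpha * w^2, which gives
    w = sum(x_i * y_i) / (sum(x_i^2) + alpha).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)

# Same seed, same shuffle.
print(shuffled_indices(10, random_state=42) == shuffled_indices(10, random_state=42))

# y = 2x exactly, so the unregularized slope is 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(ridge_slope(x, y, alpha=0.0))
print(ridge_slope(x, y, alpha=14.0))
```

In this example the unregularized slope is 2, and `alpha=14` (equal to the sum of squared `x` values) cuts it in half.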

Return Value Types

Functions return different types of values:

Object Handles: Complex objects (models, dataframes)
- Example: <SVC>
Numeric Values: Single numbers
- Example: 0.95 (accuracy score)
Arrays: Multiple values
- Example: Cross-validation scores
DataFrames: Tabular data
- Example: Sample data, parameters

Quick Function Lookup

By Task

Load Data:

ML.DATASETS.IRIS() - Classification dataset
ML.DATASETS.DIABETES() - Regression dataset
ML.DATA.CONVERT_TO_DF() - Excel to DataFrame

Explore Data:

ML.DATA.INFO() - Data structure
ML.DATA.DESCRIBE() - Statistics
ML.DATA.SAMPLE() - View rows

Prepare Data:

ML.DATA.SELECT_COLUMNS() - Choose columns
ML.PREPROCESSING.TRAIN_TEST_SPLIT() - Split data
ML.PREPROCESSING.STANDARD_SCALER() - Scale features

Train Models:

ML.FIT() - Train any model
ML.PREDICT() - Make predictions
ML.TRANSFORM() - Transform data

Evaluate:

ML.EVAL.SCORE() - Basic scoring
ML.EVAL.CV_SCORE() - Cross-validation ⭐
ML.EVAL.GRID_SEARCH() - Hyperparameter tuning ⭐

Error Messages

Common error messages and their meanings:

“Object handle not found”

The referenced cell doesn’t contain a valid object
Solution: Check that the cell reference is correct

“Invalid parameter value”

Parameter is outside acceptable range
Solution: Check parameter constraints

“Dimension mismatch”

Data shapes don’t match
Solution: Ensure X and y have same number of rows

“Premium function”

Function requires a premium subscription
Solution: Upgrade, or use a free alternative
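
The checks behind “Dimension mismatch” and “Invalid parameter value” can be pictured as simple guards that run before training. The following is a sketch with hypothetical helper names, not FormulaML's actual validation code; the message text only mirrors the headings above.

```python
def check_same_rows(X, y):
    """Raise if features and target have different numbers of rows."""
    if len(X) != len(y):
        raise ValueError(
            f"Dimension mismatch: X has {len(X)} rows, y has {len(y)}"
        )

def check_alpha(alpha):
    """Raise if the regularization strength is not strictly positive."""
    if not alpha > 0:
        raise ValueError(
            f"Invalid parameter value: alpha must be > 0, got {alpha}"
        )

features = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]]
target = [0, 0, 1]
check_same_rows(features, target)  # passes: 3 rows each
check_alpha(1.0)                   # passes: positive value
```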

Best Practices

Always use consistent data shapes
- Features (X) and target (y) must have same number of rows
Set random_state for reproducibility
- Use same seed value across related operations
Check data types
- Ensure numerical data isn’t stored as text
Handle missing values
- Use ML.IMPUTE.SIMPLE_IMPUTER or clean the data before analysis
Start with simple models
- Use them as a baseline before trying more complex models
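
The data-type and missing-value checks above can be sketched in plain Python as follows. The helper names are hypothetical. Mean imputation is used here only as one common strategy; this page does not state which strategies ML.IMPUTE.SIMPLE_IMPUTER supports.

```python
from statistics import fmean

def to_number(cell):
    """Convert a cell to float; return None for blanks or non-numeric text."""
    if cell is None:
        return None
    if isinstance(cell, (int, float)):
        return float(cell)
    text = str(cell).strip()
    if not text:
        return None
    try:
        return float(text)   # handles numbers stored as text, e.g. "1" or " 3.0 "
    except ValueError:
        return None

def impute_mean(column):
    """Replace missing entries with the mean of the values that are present."""
    numbers = [to_number(cell) for cell in column]
    present = [n for n in numbers if n is not None]
    mean = fmean(present)
    return [mean if n is None else n for n in numbers]

# Mixed column: text numbers, a real number, an empty string, and a blank cell.
raw_column = ["1", 2, "", " 3.0 ", None]
print(impute_mean(raw_column))
```

`impute_mean` assumes the column contains at least one numeric value; an entirely empty column would need to be handled separately.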

Browse functions by category:

Data Functions - Data loading and manipulation
Regression Models - Continuous prediction
Classification Models - Category prediction
Clustering Models - Group discovery
Preprocessing - Data preparation
Evaluation - Model assessment
Core Functions - Training and prediction
Dimensionality Reduction - Feature reduction
Inspection Tools - Model analysis
Compose Functions - Advanced pipelines
Impute & Feature Selection - Data cleaning and feature selection
