Tips and Troubleshooting

This guide helps you resolve common issues, optimize performance, and use FormulaML effectively.

Critical Rules for Avoiding Common Failures

These four rules prevent the most frequent and hard-to-diagnose errors in FormulaML.

Rule 1: Do Not Nest Functions That Return Object Handles

FormulaML functions that return object handles (models, datasets, DataFrames) must each live in their own cell. Excel needs to store the handle in a cell before another function can read it. Nesting these calls will always fail.

# WRONG — will fail
=ML.FIT(ML.CLASSIFICATION.LOGISTIC(), A1:D100, E1:E100)

# CORRECT — one function per cell
Cell A1: =ML.CLASSIFICATION.LOGISTIC()
Cell A2: =ML.FIT(A1, B1:E100, F1:F100)
Cell A3: =ML.PREDICT(A2, G1:J10)

Rule 2: Some Functions Spill Results Directly Into Cells

Not all FormulaML functions return a handle. Functions like ML.DATA.SAMPLE, ML.DATA.INFO, and ML.DATA.DESCRIBE write their results directly into multiple cells (Excel’s “spill” behavior). Make sure the cells below and to the right of the formula are empty, or you’ll get a #SPILL! error.

Cell A5: =ML.DATA.SAMPLE(data, 10)
# This writes 10 rows × N columns of data starting at A5
# Leave rows 5-14 and the relevant columns clear!

Rule 3: Always Reference the Original Handle Cell

An object handle is tied to the cell where it was created. If you copy a handle’s display value to another cell (e.g., =A1), you get a text copy, not the actual handle. FormulaML functions will reject it.

# WRONG — B1 is just a text copy
Cell A1: =ML.DATASETS.IRIS()
Cell B1: =A1                    <- text copy, not a real handle
Cell C1: =ML.DATA.INFO(B1)      <- fails

# CORRECT — always point back to the original cell
Cell C1: =ML.DATA.INFO(A1)

Rule 4: Use ML.DATA.SHAPE() Before Spilling Large Results

Before calling a function that spills many rows or columns (like ML.DATA.SAMPLE(data, -1)), call ML.DATA.SHAPE() to find out how much space the output needs. This helps you lay out your sheet so the spill area is clear.

Cell B1: =ML.DATA.SHAPE(A1)         <- returns [150, 5]
# So ML.DATA.SAMPLE(A1, -1) will need 150 rows × 5 columns
Cell A3: =ML.DATA.SAMPLE(A1, -1)    <- leave A3:E152 clear

Common Error Messages

#NAME? Error

Problem: Excel doesn’t recognize the FormulaML function.

Solutions:

Check installation:
- Go to Insert → Get Add-ins
- Search for FormulaML
- Ensure it’s installed and enabled

Verify function syntax:

Wrong: =MLDATASETS.IRIS()
Right: =ML.DATASETS.IRIS()

Restart Excel:
- Save your work
- Close and reopen Excel
- Try the function again

“Object handle not found”

Problem: Referenced cell doesn’t contain a valid FormulaML object.

Common Causes:

# Wrong cell reference
Cell A1: =ML.DATASETS.IRIS()
Cell A2: =ML.DATA.INFO(B1)  # Should be A1, not B1

Solutions:

Check you’re referencing the correct cell
Ensure the referenced cell contains an object handle
Verify the object hasn’t expired (restart Excel can clear handles)

“Dimension mismatch”

Problem: Data shapes don’t match between operations.

Example:

X_train has 100 rows
y_train has 80 rows  # Mismatch!

Solutions:

Ensure X and y have same number of rows
Check train/test split consistency
Verify column selection is correct

“Invalid parameter value”

Problem: Function parameter is outside acceptable range.

Examples:

=ML.CLUSTERING.KMEANS("five")    # n_clusters must be integer
=ML.REGRESSION.RIDGE(-1.0)       # alpha must be positive
=ML.DATA.SAMPLE(data, 0)         # n must be positive or -1

Solutions:

Check parameter documentation
Use correct data types (numbers vs text)
Verify ranges (e.g., percentages 0-1)

“Premium function requires subscription”

Problem: Trying to use premium feature without subscription.

Premium Functions:

ML.REGRESSION.RANDOM_FOREST_REG
ML.CLASSIFICATION.RANDOM_FOREST_CLF
ML.EVAL.CV_SCORE
ML.EVAL.GRID_SEARCH
ML.DIM_REDUCTION.KERNEL_PCA
ML.DATASETS.OPENML

Solutions:

Use free alternatives (e.g., Linear instead of Random Forest)
Upgrade to premium subscription
Check documentation for free equivalent workflows

Performance Optimization

Large Dataset Issues

Problem: Slow performance with big datasets (>10,000 rows).

Solutions:

Sample your data:

# Work with subset first
Cell A1: =ML.DATA.SAMPLE(large_data, 5000, , , 42)

Reduce features:

# Select only important columns
Cell A1: =ML.DATA.SELECT_COLUMNS(data, {1,3,5,7})

Use efficient algorithms:
- Linear models are fastest
- Avoid deep trees
- Limit iterations

Progressive workflow:

# Start small
1. Test with 100 rows
2. Validate with 1,000 rows
3. Train with 10,000 rows
4. Deploy on full dataset

Memory Management

Problem: Excel becomes unresponsive or crashes.

Solutions:

Clear unused objects:
- Delete cells with old object handles
- Save and reopen workbook periodically

Optimize data types:

# Remove unnecessary precision
Round numerical data to needed decimal places
Convert text to categories where possible

Batch processing:

# Process in chunks
Chunk 1: Rows 1-10000
Chunk 2: Rows 10001-20000
Combine results after

Calculation Speed

Problem: Formulas take too long to calculate.

Solutions:

Disable automatic calculation:
- File → Options → Formulas
- Set to Manual calculation
- Press F9 to calculate when ready
Minimize volatile functions:
- Avoid unnecessary SAMPLE calls
- Cache results in cells

Use simpler models for exploration:

# Development: Use Linear
Cell A1: =ML.REGRESSION.LINEAR()

# Production: Use Random Forest
Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG()

Best Practices

Data Preparation

1. Always Clean Your Data First

# Check for issues
Cell A1: =ML.DATA.INFO(data)         # Look for nulls
Cell A2: =ML.DATA.DESCRIBE(data)     # Check for outliers

# Handle missing values
Cell B1: =ML.DATA.DROP_MISSING_ROWS(data)
# OR
Cell B1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")

2. Consistent Data Types

Don’t mix text and numbers in columns
Format dates consistently
Remove special characters

3. Feature Engineering in Excel

# Create new features before ML
Column F: =D2/E2           # Ratio feature
Column G: =MONTH(A2)       # Extract month from date
Column H: =IF(B2>100,1,0)  # Binary feature

Model Development

1. Start Simple, Add Complexity

Step 1: =ML.REGRESSION.LINEAR()        # Baseline
Step 2: =ML.REGRESSION.RIDGE(1.0)      # Add regularization
Step 3: =ML.REGRESSION.RANDOM_FOREST() # Complex model

2. Always Split Your Data

# Never evaluate on training data
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2, 42)
                                                      ↑
                                          Always use seed

3. Compare Multiple Models

# Same data, different models
Linear Score:  0.72
Ridge Score:   0.74
Lasso Score:   0.73
Best: Ridge (free and performs well)

Production Deployment

1. Document Your Workflow

# Add comments to cells
Cell A1: Data loading - 10,000 customer records
Cell B1: Feature selection - top 5 predictors
Cell C1: Model - Ridge regression with alpha=0.5
Cell D1: Performance - R²=0.82 on test set

2. Version Control

# Save versions of your workbook
CustomerModel_v1.xlsx - Initial model
CustomerModel_v2.xlsx - Added features
CustomerModel_v3.xlsx - Production version

3. Monitor Performance

# Track model performance over time
Week 1 Score: 0.85
Week 2 Score: 0.84
Week 3 Score: 0.81  # Degrading - retrain needed

Common Pitfalls

1. Data Leakage

Problem: Test data information influences training.

Wrong:

# Scaling before splitting - WRONG!
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.FIT_TRANSFORM(A1, all_data)
Cell A3: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(A2, y, 0.2)

Right:

# Split first, then scale - RIGHT!
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2)
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell B2: =ML.FIT(B1, INDEX(A1,1))        # Fit on train only
Cell B3: =ML.TRANSFORM(B2, INDEX(A1,2))  # Transform test

2. Overfitting

Signs:

Training score: 0.99
Test score: 0.65
Perfect memorization, poor generalization

Solutions:

# Add regularization
=ML.REGRESSION.RIDGE(10.0)  # Higher alpha

# Simplify model
=ML.DIM_REDUCTION.PCA(5)    # Reduce features

# Get more data
Combine multiple data sources

3. Wrong Metric

Problem: Using accuracy for imbalanced data.

Example:

95% customers don't churn
Model predicts "no churn" always
Accuracy: 95% (misleading!)
Catches 0% of actual churners

Solution: Consider business impact, not just accuracy.

Excel-Specific Tips

1. Cell Naming

# Name your cells for clarity
Name cell A1: "TrainingData"
Name cell B1: "Model"
Name cell C1: "Predictions"

# Use in formulas
=ML.FIT(Model, TrainingData, Labels)

2. Array Formulas

# INDEX for accessing split results
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2)
Cell B1: =INDEX(A1, 1)  # X_train
Cell B2: =INDEX(A1, 2)  # X_test
Cell B3: =INDEX(A1, 3)  # y_train
Cell B4: =INDEX(A1, 4)  # y_test

3. Conditional Formatting

# Highlight predictions
Conditional Format: 
- Green if prediction > 0.8
- Yellow if 0.5-0.8
- Red if < 0.5

4. Data Validation

# Ensure valid inputs
Data Validation on input cells:
- List for categorical inputs
- Number ranges for parameters
- Prevent invalid entries

Debugging Techniques

1. Check Intermediate Results

# Debug step by step
Cell A1: =ML.DATASETS.IRIS()
Cell A2: =ML.DATA.INFO(A1)        # Check loaded correctly
Cell A3: =ML.DATA.SAMPLE(A1, 5)   # View sample
Cell A4: =ML.DATA.DESCRIBE(A1)    # Check statistics

2. Simplify to Isolate Issues

# Start with minimal example
1. Use built-in dataset (not your data)
2. Use simple model (Linear)
3. Skip preprocessing
4. If works, add complexity gradually

3. Verify Data Shapes

# Check dimensions match
Cell A1: =ML.DATA.INFO(X_train)  # Should have n rows
Cell A2: =ML.DATA.INFO(y_train)  # Should have n rows
Cell A3: =ML.DATA.INFO(X_test)   # Should have m rows
Cell A4: =ML.DATA.INFO(y_test)   # Should have m rows

FAQ

Q: How do I save trained models?

A: Save the Excel workbook. Object handles are preserved with the file.

Q: Can I use FormulaML offline?

A: Yes, once installed. But you need internet for initial installation and updates.

Q: What’s the maximum dataset size?

A: Depends on your system RAM. Generally handles up to 1 million rows efficiently.

Q: Can I export models to Python?

A: Not directly. FormulaML models live in Excel. Consider using the same algorithms in scikit-learn for Python.

Q: How do I update FormulaML?

A: Updates are automatic through Excel’s add-in system.

A: Yes, share the Excel file. Each user needs FormulaML installed.

Getting Help

Resources

Documentation: You’re here!
Examples: See Examples and Tutorials
Function Reference: Check Function Reference
Support: Email [email protected]

Before Contacting Support

Check error message carefully
Try with built-in dataset
Verify FormulaML version is current
Document steps to reproduce issue
Include Excel version and OS

Community

Share tips and tricks
Report bugs
Request features
Help other users

Quick Reference Card

Essential Functions

# Data
=ML.DATASETS.IRIS()
=ML.DATA.CONVERT_TO_DF(range, TRUE)
=ML.DATA.SELECT_COLUMNS(df, {0,1,2})

# Models
=ML.REGRESSION.LINEAR()
=ML.CLASSIFICATION.SVM()
=ML.CLUSTERING.KMEANS(4)

# Training
=ML.FIT(model, X, y)
=ML.PREDICT(model, X)
=ML.TRANSFORM(transformer, X)

# Evaluation
=ML.EVAL.SCORE(model, X, y)
=ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2, 42)

Parameter Defaults

test_size: 0.2 (20% test)
random_state: 42 (or any integer)
alpha: 1.0 (regularization)
n_clusters: 8 (K-means)
n_estimators: 100 (Random Forest)

Performance Tips

Sample large data first
Use simple models for testing
Add complexity gradually
Monitor memory usage
Save work frequently

Next Steps

Happy machine learning in Excel with FormulaML!

Table of Contents

Tips and Troubleshooting

Critical Rules for Avoiding Common Failures

Rule 1: Do Not Nest Functions That Return Object Handles

Rule 2: Some Functions Spill Results Directly Into Cells

Rule 3: Always Reference the Original Handle Cell

Rule 4: Use ML.DATA.SHAPE() Before Spilling Large Results

Common Error Messages

#NAME? Error

“Object handle not found”

“Dimension mismatch”

“Invalid parameter value”

“Premium function requires subscription”

Performance Optimization

Large Dataset Issues

Memory Management

Calculation Speed

Best Practices

Data Preparation

1. Always Clean Your Data First

2. Consistent Data Types

3. Feature Engineering in Excel

Model Development

1. Start Simple, Add Complexity

2. Always Split Your Data

3. Compare Multiple Models

Production Deployment

1. Document Your Workflow

2. Version Control

3. Monitor Performance

Common Pitfalls

1. Data Leakage

2. Overfitting

3. Wrong Metric

Excel-Specific Tips

1. Cell Naming

2. Array Formulas

3. Conditional Formatting

4. Data Validation

Debugging Techniques

1. Check Intermediate Results

2. Simplify to Isolate Issues

3. Verify Data Shapes

FAQ

Q: How do I save trained models?

Q: Can I use FormulaML offline?

Q: What’s the maximum dataset size?

Q: Can I export models to Python?

Q: How do I update FormulaML?

Q: Can multiple users share models?

Getting Help

Resources

Before Contacting Support

Community

Quick Reference Card

Essential Functions

Parameter Defaults

Performance Tips

Next Steps