Table of Contents
Tips and Troubleshooting
This guide helps you resolve common issues, optimize performance, and use FormulaML effectively.
Critical Rules for Avoiding Common Failures
These four rules prevent the most frequent and hard-to-diagnose errors in FormulaML.
Rule 1: Do Not Nest Functions That Return Object Handles
FormulaML functions that return object handles (models, datasets, DataFrames) must each live in their own cell. Excel needs to store the handle in a cell before another function can read it. Nesting these calls will always fail.
# WRONG — will fail
=ML.FIT(ML.CLASSIFICATION.LOGISTIC(), A1:D100, E1:E100)
# CORRECT — one function per cell
Cell A1: =ML.CLASSIFICATION.LOGISTIC()
Cell A2: =ML.FIT(A1, B1:E100, F1:F100)
Cell A3: =ML.PREDICT(A2, G1:J10)
Rule 2: Some Functions Spill Results Directly Into Cells
Not all FormulaML functions return a handle. Functions like ML.DATA.SAMPLE, ML.DATA.INFO, and ML.DATA.DESCRIBE write their results directly into multiple cells (Excel’s “spill” behavior). Make sure the cells below and to the right of the formula are empty, or you’ll get a #SPILL! error.
Cell A5: =ML.DATA.SAMPLE(data, 10)
# This writes 10 rows × N columns of data starting at A5
# Leave rows 5-14 and the relevant columns clear!
Rule 3: Always Reference the Original Handle Cell
An object handle is tied to the cell where it was created. If you copy a handle’s display value to another cell (e.g., =A1), you get a text copy, not the actual handle. FormulaML functions will reject it.
# WRONG — B1 is just a text copy
Cell A1: =ML.DATASETS.IRIS()
Cell B1: =A1 <- text copy, not a real handle
Cell C1: =ML.DATA.INFO(B1) <- fails
# CORRECT — always point back to the original cell
Cell C1: =ML.DATA.INFO(A1)
Rule 4: Use ML.DATA.SHAPE() Before Spilling Large Results
Before calling a function that spills many rows or columns (like ML.DATA.SAMPLE(data, -1)), call ML.DATA.SHAPE() to find out how much space the output needs. This helps you lay out your sheet so the spill area is clear.
Cell B1: =ML.DATA.SHAPE(A1) <- returns [150, 5]
# So ML.DATA.SAMPLE(A1, -1) will need 150 rows × 5 columns
Cell A3: =ML.DATA.SAMPLE(A1, -1) <- leave A3:E152 clear
Common Error Messages
#NAME? Error
Problem: Excel doesn’t recognize the FormulaML function.
Solutions:
-
Check installation:
- Go to Insert → Get Add-ins
- Search for FormulaML
- Ensure it’s installed and enabled
-
Verify function syntax:
Wrong: =MLDATASETS.IRIS() Right: =ML.DATASETS.IRIS() -
Restart Excel:
- Save your work
- Close and reopen Excel
- Try the function again
“Object handle not found”
Problem: Referenced cell doesn’t contain a valid FormulaML object.
Common Causes:
# Wrong cell reference
Cell A1: =ML.DATASETS.IRIS()
Cell A2: =ML.DATA.INFO(B1) # Should be A1, not B1
Solutions:
- Check you’re referencing the correct cell
- Ensure the referenced cell contains an object handle
- Verify the object hasn’t expired (restart Excel can clear handles)
“Dimension mismatch”
Problem: Data shapes don’t match between operations.
Example:
X_train has 100 rows
y_train has 80 rows # Mismatch!
Solutions:
- Ensure X and y have same number of rows
- Check train/test split consistency
- Verify column selection is correct
“Invalid parameter value”
Problem: Function parameter is outside acceptable range.
Examples:
=ML.CLUSTERING.KMEANS("five") # n_clusters must be integer
=ML.REGRESSION.RIDGE(-1.0) # alpha must be positive
=ML.DATA.SAMPLE(data, 0) # n must be positive or -1
Solutions:
- Check parameter documentation
- Use correct data types (numbers vs text)
- Verify ranges (e.g., percentages 0-1)
“Premium function requires subscription”
Problem: Trying to use premium feature without subscription.
Premium Functions:
- ML.REGRESSION.RANDOM_FOREST_REG
- ML.CLASSIFICATION.RANDOM_FOREST_CLF
- ML.EVAL.CV_SCORE
- ML.EVAL.GRID_SEARCH
- ML.DIM_REDUCTION.KERNEL_PCA
- ML.DATASETS.OPENML
Solutions:
- Use free alternatives (e.g., Linear instead of Random Forest)
- Upgrade to premium subscription
- Check documentation for free equivalent workflows
Performance Optimization
Large Dataset Issues
Problem: Slow performance with big datasets (>10,000 rows).
Solutions:
-
Sample your data:
# Work with subset first Cell A1: =ML.DATA.SAMPLE(large_data, 5000, , , 42) -
Reduce features:
# Select only important columns Cell A1: =ML.DATA.SELECT_COLUMNS(data, {1,3,5,7}) -
Use efficient algorithms:
- Linear models are fastest
- Avoid deep trees
- Limit iterations
-
Progressive workflow:
# Start small 1. Test with 100 rows 2. Validate with 1,000 rows 3. Train with 10,000 rows 4. Deploy on full dataset
Memory Management
Problem: Excel becomes unresponsive or crashes.
Solutions:
-
Clear unused objects:
- Delete cells with old object handles
- Save and reopen workbook periodically
-
Optimize data types:
# Remove unnecessary precision Round numerical data to needed decimal places Convert text to categories where possible -
Batch processing:
# Process in chunks Chunk 1: Rows 1-10000 Chunk 2: Rows 10001-20000 Combine results after
Calculation Speed
Problem: Formulas take too long to calculate.
Solutions:
-
Disable automatic calculation:
- File → Options → Formulas
- Set to Manual calculation
- Press F9 to calculate when ready
-
Minimize volatile functions:
- Avoid unnecessary SAMPLE calls
- Cache results in cells
-
Use simpler models for exploration:
# Development: Use Linear Cell A1: =ML.REGRESSION.LINEAR() # Production: Use Random Forest Cell A1: =ML.REGRESSION.RANDOM_FOREST_REG()
Best Practices
Data Preparation
1. Always Clean Your Data First
# Check for issues
Cell A1: =ML.DATA.INFO(data) # Look for nulls
Cell A2: =ML.DATA.DESCRIBE(data) # Check for outliers
# Handle missing values
Cell B1: =ML.DATA.DROP_MISSING_ROWS(data)
# OR
Cell B1: =ML.IMPUTE.SIMPLE_IMPUTER("mean")
2. Consistent Data Types
- Don’t mix text and numbers in columns
- Format dates consistently
- Remove special characters
3. Feature Engineering in Excel
# Create new features before ML
Column F: =D2/E2 # Ratio feature
Column G: =MONTH(A2) # Extract month from date
Column H: =IF(B2>100,1,0) # Binary feature
Model Development
1. Start Simple, Add Complexity
Step 1: =ML.REGRESSION.LINEAR() # Baseline
Step 2: =ML.REGRESSION.RIDGE(1.0) # Add regularization
Step 3: =ML.REGRESSION.RANDOM_FOREST() # Complex model
2. Always Split Your Data
# Never evaluate on training data
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2, 42)
↑
Always use seed
3. Compare Multiple Models
# Same data, different models
Linear Score: 0.72
Ridge Score: 0.74
Lasso Score: 0.73
Best: Ridge (free and performs well)
Production Deployment
1. Document Your Workflow
# Add comments to cells
Cell A1: Data loading - 10,000 customer records
Cell B1: Feature selection - top 5 predictors
Cell C1: Model - Ridge regression with alpha=0.5
Cell D1: Performance - R²=0.82 on test set
2. Version Control
# Save versions of your workbook
CustomerModel_v1.xlsx - Initial model
CustomerModel_v2.xlsx - Added features
CustomerModel_v3.xlsx - Production version
3. Monitor Performance
# Track model performance over time
Week 1 Score: 0.85
Week 2 Score: 0.84
Week 3 Score: 0.81 # Degrading - retrain needed
Common Pitfalls
1. Data Leakage
Problem: Test data information influences training.
Wrong:
# Scaling before splitting - WRONG!
Cell A1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell A2: =ML.FIT_TRANSFORM(A1, all_data)
Cell A3: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(A2, y, 0.2)
Right:
# Split first, then scale - RIGHT!
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2)
Cell B1: =ML.PREPROCESSING.STANDARD_SCALER()
Cell B2: =ML.FIT(B1, INDEX(A1,1)) # Fit on train only
Cell B3: =ML.TRANSFORM(B2, INDEX(A1,2)) # Transform test
2. Overfitting
Signs:
- Training score: 0.99
- Test score: 0.65
- Perfect memorization, poor generalization
Solutions:
# Add regularization
=ML.REGRESSION.RIDGE(10.0) # Higher alpha
# Simplify model
=ML.DIM_REDUCTION.PCA(5) # Reduce features
# Get more data
Combine multiple data sources
3. Wrong Metric
Problem: Using accuracy for imbalanced data.
Example:
95% customers don't churn
Model predicts "no churn" always
Accuracy: 95% (misleading!)
Catches 0% of actual churners
Solution: Consider business impact, not just accuracy.
Excel-Specific Tips
1. Cell Naming
# Name your cells for clarity
Name cell A1: "TrainingData"
Name cell B1: "Model"
Name cell C1: "Predictions"
# Use in formulas
=ML.FIT(Model, TrainingData, Labels)
2. Array Formulas
# INDEX for accessing split results
Cell A1: =ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2)
Cell B1: =INDEX(A1, 1) # X_train
Cell B2: =INDEX(A1, 2) # X_test
Cell B3: =INDEX(A1, 3) # y_train
Cell B4: =INDEX(A1, 4) # y_test
3. Conditional Formatting
# Highlight predictions
Conditional Format:
- Green if prediction > 0.8
- Yellow if 0.5-0.8
- Red if < 0.5
4. Data Validation
# Ensure valid inputs
Data Validation on input cells:
- List for categorical inputs
- Number ranges for parameters
- Prevent invalid entries
Debugging Techniques
1. Check Intermediate Results
# Debug step by step
Cell A1: =ML.DATASETS.IRIS()
Cell A2: =ML.DATA.INFO(A1) # Check loaded correctly
Cell A3: =ML.DATA.SAMPLE(A1, 5) # View sample
Cell A4: =ML.DATA.DESCRIBE(A1) # Check statistics
2. Simplify to Isolate Issues
# Start with minimal example
1. Use built-in dataset (not your data)
2. Use simple model (Linear)
3. Skip preprocessing
4. If works, add complexity gradually
3. Verify Data Shapes
# Check dimensions match
Cell A1: =ML.DATA.INFO(X_train) # Should have n rows
Cell A2: =ML.DATA.INFO(y_train) # Should have n rows
Cell A3: =ML.DATA.INFO(X_test) # Should have m rows
Cell A4: =ML.DATA.INFO(y_test) # Should have m rows
FAQ
Q: How do I save trained models?
A: Save the Excel workbook. Object handles are preserved with the file.
Q: Can I use FormulaML offline?
A: Yes, once installed. But you need internet for initial installation and updates.
Q: What’s the maximum dataset size?
A: Depends on your system RAM. Generally handles up to 1 million rows efficiently.
Q: Can I export models to Python?
A: Not directly. FormulaML models live in Excel. Consider using the same algorithms in scikit-learn for Python.
Q: How do I update FormulaML?
A: Updates are automatic through Excel’s add-in system.
Q: Can multiple users share models?
A: Yes, share the Excel file. Each user needs FormulaML installed.
Getting Help
Resources
- Documentation: You’re here!
- Examples: See Examples and Tutorials
- Function Reference: Check Function Reference
- Support: Email [email protected]
Before Contacting Support
- Check error message carefully
- Try with built-in dataset
- Verify FormulaML version is current
- Document steps to reproduce issue
- Include Excel version and OS
Community
- Share tips and tricks
- Report bugs
- Request features
- Help other users
Quick Reference Card
Essential Functions
# Data
=ML.DATASETS.IRIS()
=ML.DATA.CONVERT_TO_DF(range, TRUE)
=ML.DATA.SELECT_COLUMNS(df, {0,1,2})
# Models
=ML.REGRESSION.LINEAR()
=ML.CLASSIFICATION.SVM()
=ML.CLUSTERING.KMEANS(4)
# Training
=ML.FIT(model, X, y)
=ML.PREDICT(model, X)
=ML.TRANSFORM(transformer, X)
# Evaluation
=ML.EVAL.SCORE(model, X, y)
=ML.PREPROCESSING.TRAIN_TEST_SPLIT(X, y, 0.2, 42)
Parameter Defaults
test_size: 0.2 (20% test)
random_state: 42 (or any integer)
alpha: 1.0 (regularization)
n_clusters: 8 (K-means)
n_estimators: 100 (Random Forest)
Performance Tips
- Sample large data first
- Use simple models for testing
- Add complexity gradually
- Monitor memory usage
- Save work frequently
Next Steps
- Return to Documentation Home
- Review Function Reference
- Try Examples
- Explore Advanced Features
Happy machine learning in Excel with FormulaML!