PILOT IN PRODUCTION
TESTING
DEPLOYMENT
MONITORING
ML Test Score
These are the components of a machine learning software system. Use the ML Test Score to evaluate your ML system in production.
Data Tests
- [ ] Feature expectations are captured in a schema
- [ ] All features are beneficial
- [ ] No feature's cost is too much
- [ ] Features adhere to meta-level requirements (Business reason)
- [ ] The data pipeline has appropriate privacy controls
- [ ] New features can be added quickly
- [ ] All input feature code is tested
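
As an illustration of the first data test, feature expectations can be written down as a schema and checked row by row. The sketch below is minimal and uses only the standard library; the feature names, bounds, and the `validate_row` helper are hypothetical and not part of the ML Test Score itself.

```python
# Hypothetical schema: feature name -> (expected type, min value, max value).
SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e7),
}


def validate_row(row, schema=SCHEMA):
    """Return a list of human-readable schema violations for one row."""
    errors = []
    for name, (expected_type, low, high) in schema.items():
        if name not in row:
            errors.append(f"{name}: missing")
            continue
        value = row[name]
        if not isinstance(value, expected_type):
            errors.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif not low <= value <= high:
            errors.append(f"{name}: {value} outside [{low}, {high}]")
    return errors


if __name__ == "__main__":
    print(validate_row({"age": 150, "income": 50000.0}))
```

In practice a dedicated data validation library would likely replace this hand-written check; the point is only that the expectations live in one reviewable place.
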
Model Tests
- [ ] Model specs are reviewed and submitted
- [ ] Offline and online metrics correlate
- [ ] All hyperparameters have been tuned
- [ ] The impact of model staleness is known
- [ ] A simpler model is not better
- [ ] Model quality is sufficient on important data slices
- [ ] The model is tested for considerations of inclusion
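
One way to approach the "important data slices" item is to compute a quality metric per slice and flag any slice that falls below a threshold. The following is a rough sketch under assumptions: accuracy is the metric, examples are `(features, label)` pairs, and the function names and the `country` slice key are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(examples, predict, slice_key):
    """Group labeled examples by slice_key and compute accuracy per slice."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for features, label in examples:
        key = features[slice_key]
        total[key] += 1
        correct[key] += int(predict(features) == label)
    return {key: correct[key] / total[key] for key in total}


def failing_slices(per_slice_accuracy, threshold):
    """Return the slices whose accuracy falls below the threshold."""
    return sorted(k for k, acc in per_slice_accuracy.items() if acc < threshold)


if __name__ == "__main__":
    # Toy data and a toy model that simply echoes feature "x".
    examples = [
        ({"country": "US", "x": 1}, 1),
        ({"country": "US", "x": 0}, 0),
        ({"country": "BR", "x": 1}, 0),
        ({"country": "BR", "x": 0}, 0),
    ]
    per_slice = accuracy_by_slice(examples, lambda f: f["x"], "country")
    print(failing_slices(per_slice, threshold=0.8))
```

A model can look fine on the aggregate metric while one slice is much worse, which is why the check is done per slice rather than overall.
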
ML Infrastructure Tests
- [ ] Training is reproducible
- [ ] Model specs are unit tested
- [ ] The ML pipeline is integration tested
- [ ] Model quality is validated before serving
- [ ] The model is debuggable
- [ ] Models are canaried before serving (a few production predictions are tested with the new model)
- [ ] Serving models can be rolled back
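
The "training is reproducible" item can be checked by training twice with the same seed and comparing the results. The sketch below assumes a toy linear model trained with SGD where all randomness flows through one seeded generator; `train` and `is_reproducible` are hypothetical names, and real frameworks usually need additional settings to be fully deterministic.

```python
import random


def train(data, seed, epochs=20, lr=0.05):
    """Fit y = w * x + b with SGD; all randomness comes from the seed."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data = list(data)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            error = (w * x + b) - y
            w -= lr * error * x
            b -= lr * error
    return w, b


def is_reproducible(data, seed):
    """Train twice with the same seed and compare the parameters exactly."""
    return train(data, seed) == train(data, seed)


if __name__ == "__main__":
    points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    print(is_reproducible(points, seed=42))
```
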
Monitoring Tests
- [ ] Dependency changes result in notification
- [ ] Data invariants hold for inputs
- [ ] Training and serving are not skewed
- [ ] Models are not too stale
- [ ] Models are numerically stable
- [ ] Computing performance has not regressed (how long it takes to run a model prediction)
- [ ] Prediction quality has not regressed
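
A simple monitor for training/serving skew compares summary statistics of each feature between the two environments. The sketch below is one possible heuristic, not a prescribed method: it flags a feature when the serving mean moves away from the training mean by more than `max_shift` training standard deviations. The function name, the threshold, and the feature names are assumptions.

```python
from statistics import mean, pstdev


def skewed_features(training, serving, max_shift=0.5):
    """Flag features whose serving mean drifts from the training mean.

    The shift is measured in units of the training standard deviation.
    """
    flagged = []
    for name, train_values in training.items():
        spread = pstdev(train_values) or 1.0  # avoid dividing by zero
        shift = abs(mean(serving[name]) - mean(train_values)) / spread
        if shift > max_shift:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    training = {"age": [20, 30, 40], "clicks": [1, 2, 3]}
    serving = {"age": [21, 29, 41], "clicks": [8, 9, 10]}
    print(skewed_features(training, serving))
```

A production monitor would typically compare full distributions rather than means alone, but the same structure applies: compute a statistic on both sides and alert on the difference.
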
Scoring Test

| Points | Description |
| --- | --- |
| 0 | More of a research project than a productionized system |
| (0, 1] | Not totally untested, but it is worth considering the possibility of serious holes in reliability |
| (1, 2] | There's been a first pass at basic productionization, but additional investment may be needed |
| (2, 3] | Reasonably tested, but it's possible that more of those tests and procedures may be automated |
| (3, 5] | Strong levels of automated testing and monitoring, appropriate for mission-critical systems |
| > 5 | Exceptional levels of automated testing and monitoring |

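
The point bands above can be computed from the four checklists. The sketch below assumes the rubric from the original ML Test Score proposal, which this document does not spell out: each test earns 0.5 points if it is run manually with documented results and 1.0 if it is automated, points are summed per section, and the final score is the minimum over the four sections. The status labels and function names are hypothetical, and the top band is taken to mean a score above 5.

```python
# Assumed rubric: points per test depending on how it is executed.
POINTS = {"automated": 1.0, "manual": 0.5, "none": 0.0}


def section_score(statuses):
    """Sum the points for one section of the checklist."""
    return sum(POINTS[status] for status in statuses)


def ml_test_score(sections):
    """Final score: the minimum of the per-section scores."""
    return min(section_score(statuses) for statuses in sections.values())


def score_band(score):
    """Map a final score to its band in the scoring table."""
    if score == 0:
        return "0"
    if score <= 1:
        return "(0, 1]"
    if score <= 2:
        return "(1, 2]"
    if score <= 3:
        return "(2, 3]"
    if score <= 5:
        return "(3, 5]"
    return "> 5"


if __name__ == "__main__":
    sections = {
        "data": ["automated", "manual", "none"],
        "model": ["automated", "automated", "automated"],
        "infrastructure": ["manual", "manual", "automated"],
        "monitoring": ["automated", "automated", "manual"],
    }
    score = ml_test_score(sections)
    print(score, score_band(score))
```

Taking the minimum rather than the sum means a system cannot compensate for a neglected section by over-investing in another.
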
Unit and Integration Testing:
- Types of tests:
  - Training system tests: testing the training pipeline
  - Validation tests: testing the prediction system on the validation set
  - Functionality tests: testing the prediction system on a few important examples
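
The three test types listed above might look as follows for a toy model. This is only a sketch: the "model" is a one-dimensional threshold classifier, and the data, the 0.9 accuracy bar, and all function names are invented for illustration. The test functions follow the naming convention that runners such as pytest discover, but they are plain functions and can also be called directly.

```python
TRAIN = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
VALIDATION = [(0.5, 0), (3.0, 0), (7.0, 1), (9.5, 1)]


def train_model(examples):
    """Toy training: put the threshold halfway between the class means."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Predict class 1 when x is above the learned threshold."""
    return int(x > threshold)


def test_training_system():
    # Training system test: the pipeline runs and yields a sane parameter.
    threshold = train_model(TRAIN)
    assert 0.0 < threshold < 10.0


def test_validation():
    # Validation test: quality on held-out data meets a minimum bar.
    threshold = train_model(TRAIN)
    hits = sum(predict(threshold, x) == y for x, y in VALIDATION)
    assert hits / len(VALIDATION) >= 0.9


def test_functionality():
    # Functionality test: a few important examples must never be wrong.
    threshold = train_model(TRAIN)
    assert predict(threshold, 0.0) == 0
    assert predict(threshold, 10.0) == 1


if __name__ == "__main__":
    test_training_system()
    test_validation()
    test_functionality()
    print("all tests passed")
```
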