What is Infrastructure as Code
Patterns and Practices
- Everything in Source Control
- Modularize and Version
- Security and Compliance
- Automate Execution from a Shared Environment
- Infrastructure as Code Pipeline
1. Using these Principles, Patterns, and Practices requires a certain level of organizational maturity. This article does not focus on the cultural side of things, but culture is very important for adopting them successfully.
2. Examples in this article use Terraform and AWS, but these Principles, Patterns, and…
In my previous article, I talked about what Platform Engineering (PE) is, when it is useful, and the challenges I have seen working with it. In this article, we will go through solutions that have worked for my teams and me in resolving those challenges. I have led platform engineering teams at companies of different sizes (from startup to enterprise), and these challenges and solutions are based on my experience.
Let’s go through each of the challenges and what I have done to solve them.
As mentioned in the previous article, one of the DevOps movement’s critical aspects is to…
Caution: "Platform" is a widely used term that describes many different kinds of platforms, so be careful about using it. At one company, we chose not to use the term in the team name and simply called it the Infrastructure team, since there was already an AI platform team. At another, we called it the Infrastructure Platform Team. Use the term that makes sense for your organization; the concept remains the same. …
What is Continuous Delivery?
Continuous Delivery vs. Continuous Deployment
Machine Learning Workflow
How does Continuous Delivery help with ML challenges?
- Automated Data Pipeline
- Training Code
- Training Process
- Application Code
Bringing it all together
Most of the principles and practices of traditional software development can be applied to Machine Learning (ML), but certain challenges unique to ML need to be handled differently. We discussed those unique “Challenges Deploying Machine Learning Models to Production” in the previous article. …
Traditional Software Development vs Machine Learning
Machine Learning Workflow
Stage #1: Data Management
- Large Data Size
- High Quality
- Data Versioning
- Security & Compliance
Stage #2: Experimentation
- Constant Research and Experimentation Workflow
- Tracking Experiments
- Code Quality
- Training Time & Troubleshooting
- Model Accuracy Evaluation
- Infrastructure Requirements
Stage #3: Production Deployment
- Offline/Online Prediction
- Model Degradation