Originally published here.
DevOps is a cultural mindset that emphasizes communication, collaboration, integration, and automation between development and operations teams in order to deliver software faster and more reliably.
DevOps engineers are professionals with the skills and knowledge to work across the entire software lifecycle, from development to operations, and across the whole technology stack.
But how can you become a DevOps engineer? What are the steps and skills you need to learn and master? In this article, I’ll provide you with a DevOps roadmap, which is a visual guide that shows the main steps and concepts you need to follow and understand to become a successful DevOps engineer.
The DevOps roadmap
The DevOps roadmap below covers a lot of topics within software development. You don't need to learn everything at once, but you should have a general idea of what each topic entails and how it relates to DevOps.
You can also use this roadmap as a reference to dive deeper into the topics that interest you or that you need to improve on.
DevOps career roadmap steps
- Learn programming languages.
- Study operating systems.
- Review networking security and protocols.
- Understand infrastructure as code.
- Adopt continuous integration/continuous delivery (CI/CD) tools.
- Invest in application and infrastructure monitoring.
- Study cloud providers.
- Learn cloud design patterns.
Let's break down each of these steps in more detail.
1. Learn programming languages
Although DevOps engineers don’t typically write source code, they do integrate databases, debug code from the development team, and automate processes.
Automation is a critical part of what gives the DevOps lifecycle its speed, and a DevOps engineer plays an important role in implementing a DevOps automation strategy.
Additionally, a DevOps engineer should have a working knowledge of the languages their team is using to help them understand existing code, review new code, and assist with debugging.
Programming languages to learn include:
- Go (recommended)
2. Study operating systems
Operating systems (OSs) are a crucial piece of the technology stack that a DevOps team needs to function. OSs not only power the local machines that the team uses to communicate and complete tasks, but they also run the servers that host the team's deployed applications.
As such, you need to learn the command line so you aren’t reliant on the graphical user interface (GUI) to configure your servers. The command line simplifies tasks that would require multiple clicks in a GUI, and some commands can only be run from the terminal.
Every OS is different, so learning more than one is advisable. Popular OSs to learn include:
- Linux (recommended)
You'll also want to learn the larger strategies and rules that govern how OSs are built and run. As a DevOps engineer, technical knowledge and conceptual knowledge are equally important.
Some of the topics you should learn about operating systems include:
- I/O management
- File systems
- Startup management (initd)
- Service management (systemd)
- Threads and concurrency
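To make the threads and concurrency topic a little more concrete, here is a minimal Python sketch that runs several I/O-bound checks on a thread pool. The `check_host` function and the host names are hypothetical, and the sleep only stands in for a real network or disk call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_host(name: str) -> str:
    """Pretend to probe a host; the sleep stands in for real I/O (assumption)."""
    time.sleep(0.1)
    return f"{name}: ok"


def check_all(hosts: list[str], workers: int = 4) -> list[str]:
    """Run check_host for every host on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check_host, hosts))


if __name__ == "__main__":
    hosts = ["web-1", "web-2", "db-1", "cache-1"]  # hypothetical host names
    start = time.perf_counter()
    results = check_all(hosts)
    elapsed = time.perf_counter() - start
    print(results)
    print(f"took {elapsed:.2f}s")
```

Because the checks wait on I/O rather than compute, the four calls overlap and the whole batch should take roughly as long as one check instead of four.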
3. Review networking security and protocols
Networking is another essential aspect of the technology stack that a DevOps team relies on. Networking enables communication between different devices, applications, and services within and outside the organization.
As a DevOps engineer, you need to understand how networking works, how to troubleshoot network issues, how to secure network connections, and how to optimize network performance.
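As a small illustration of network troubleshooting, the sketch below checks whether a TCP port accepts connections, using only Python's standard `socket` module. The function name `port_is_open` and the hosts and ports in the demo are assumptions chosen for the example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical targets; replace with hosts and ports you are allowed to probe.
    for host, port in [("localhost", 22), ("localhost", 8080)]:
        state = "open" if port_is_open(host, port) else "closed or filtered"
        print(f"{host}:{port} is {state}")
```

A failed check cannot tell you on its own whether the service is down or a firewall is dropping traffic, which is why the demo reports "closed or filtered".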
Some of the topics you should learn about networking security and protocols include:
- OSI Model
- Port Forwarding
- Domain keys
- White/Grey listing
You should also learn about different types of network tools and services that can help you manage your network infrastructure, such as:
- Forward proxy
- Caching server
- Reverse proxy
- Load balancer
- Network tools:
4. Understand infrastructure as code
Infrastructure as Code (IaC) is a key DevOps practice that enables you to automate the provisioning and management of your IT infrastructure using code.
Instead of manually configuring and updating servers, networks, storage, and other infrastructure elements, you can use a high-level descriptive language to define the desired state of your infrastructure and let a tool like Azure Resource Manager (ARM), Terraform, or Azure CLI execute it for you.
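The idea of declaring a desired state and letting a tool work out the changes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how ARM, Terraform, or any real tool is implemented: the `plan` function, the dictionary format, and the resource names are all hypothetical.

```python
def plan(desired: dict[str, dict], current: dict[str, dict]) -> list[tuple[str, str]]:
    """Compare desired and current state and return (action, resource) steps."""
    steps = []
    for name, spec in desired.items():
        if name not in current:
            steps.append(("create", name))
        elif current[name] != spec:
            steps.append(("update", name))
    for name in current:
        if name not in desired:
            steps.append(("delete", name))
    return steps


if __name__ == "__main__":
    # Hypothetical resources, described as plain dictionaries.
    desired = {
        "web-vm": {"size": "small", "count": 2},
        "db": {"engine": "postgres"},
    }
    current = {
        "web-vm": {"size": "small", "count": 1},
        "old-cache": {"engine": "redis"},
    }
    for action, name in plan(desired, current):
        print(action, name)
```

Running the plan a second time against an environment that already matches the desired state returns no steps, which is the property that keeps environments consistent and reproducible.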
IaC has many benefits for DevOps teams, such as:
- Faster and more reliable deployments: You can provision infrastructure on demand in minutes instead of hours or days, and ensure that every environment is consistent and reproducible.
- Improved scalability and elasticity: You can easily scale up or down your infrastructure based on your application's needs, and pay only for what you use.
- Enhanced security and compliance: You can enforce security policies and best practices across your infrastructure, and track changes and audit logs for compliance purposes.
- Reduced costs and risks: You can avoid human errors and configuration drift that can lead to downtime, performance issues, or security breaches.
Some of the topics you should learn about IaC include:
- IaC tools and frameworks: Learn how to use tools like ARM, Terraform, or Azure CLI to define and deploy your infrastructure as code. Each tool has its own syntax, features, and advantages.
- IaC principles and best practices: Learn how to write clean, modular, reusable, and maintainable code for your infrastructure. Follow the DRY (don't repeat yourself) principle, use version control, test your code, document your code, etc.
- IaC patterns and architectures: Learn how to design your infrastructure to support different scenarios and requirements, such as high availability, disaster recovery, load balancing, etc. Use cloud design patterns to optimize your infrastructure for performance, scalability, security, etc.
5. Adopt continuous integration/continuous delivery (CI/CD) tools
Continuous integration/continuous delivery (CI/CD) is another core DevOps practice that enables you to automate the process of building, testing, and deploying your software applications. CI/CD helps you deliver software faster and more frequently, while ensuring quality and reliability.
CI/CD consists of two main stages:
- Continuous integration (CI): This is the process of merging code changes from multiple developers into a shared repository (such as GitHub) and running automated tests to verify that the code works as expected. CI helps you detect bugs early, improve code quality, and reduce integration conflicts.
- Continuous delivery (CD): This is the process of delivering code changes from the repository to different environments (such as development, testing, staging, or production) using automated pipelines. CD helps you deploy software faster and more consistently, while minimizing human errors and manual interventions.
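The two stages above can be pictured as an ordered list of steps where a failure stops everything after it. The following Python sketch is only an illustration of that control flow; `run_pipeline` and the stage names are hypothetical, and real CI/CD tools express this in their own configuration formats.

```python
from typing import Callable

Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[tuple[str, str]]:
    """Run stages in order; stop at the first failure and skip the rest."""
    results = []
    failed = False
    for name, step in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = step()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Hypothetical stages; each callable returns True on success.
    stages = [
        ("build", lambda: True),
        ("test", lambda: False),  # simulate a failing test suite
        ("deploy", lambda: True),
    ]
    for name, status in run_pipeline(stages):
        print(f"{name}: {status}")
```

In this demo the failing test stage prevents the deploy stage from running, which is the fast feedback that CI is meant to provide.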
Some of the topics you should learn about CI/CD include:
- CI/CD tools and platforms: Learn how to use tools like Jenkins, GitLab CI, Travis CI, GitHub Actions, TeamCity, CircleCI, Drone, Azure DevOps Services, etc. to create and manage your CI/CD pipelines. Each tool has its own features and capabilities.
- CI/CD principles and best practices: Learn how to implement CI/CD effectively in your DevOps workflow. Follow the principles of frequent integration and fast feedback loops.
6. Invest in application and infrastructure monitoring
Application and infrastructure monitoring is the process of collecting and analyzing data from your software applications and backend components to measure their performance, health, availability, and user experience.
Monitoring helps you detect and troubleshoot issues, optimize resource utilization, improve service quality, and ensure customer satisfaction.
Application monitoring tracks metrics such as response time, error rate, throughput, and user satisfaction from your web or mobile applications. You can use tools like real user monitoring (RUM) or synthetic monitoring to measure how your applications perform from the end-user perspective.
You can also use tools like application performance monitoring (APM) or distributed tracing to measure how your applications perform internally, such as how they interact with microservices, databases, or APIs.
Infrastructure monitoring tracks metrics such as CPU utilization, memory usage, disk I/O, network traffic, and uptime from your servers, virtual machines, containers, databases, and other backend components.
You can use tools like Datadog, Amazon CloudWatch, Azure Monitor, or IBM Cloud Monitoring to collect and visualize infrastructure metrics from various sources.
Application and infrastructure monitoring are complementary practices that provide you with a holistic view of your system's performance and reliability. By correlating application and infrastructure metrics, you can identify the root cause of issues faster and more accurately.
Some of the topics you should learn about application and infrastructure monitoring include:
- Monitoring tools and platforms: Learn how to use tools like Datadog, Amazon CloudWatch, Azure Monitor, IBM Cloud Monitoring, New Relic, AppDynamics, Instana, etc. to collect and visualize application and infrastructure metrics from various sources. Each tool has its own features and capabilities.
- Monitoring principles and best practices: Learn how to implement monitoring effectively in your DevOps workflow. Follow the principles of observability (the ability to infer the internal state of a system from its external outputs), the four golden signals (latency, traffic, errors, saturation), the RED method (request rate, error rate, duration), the USE method (utilization, saturation, errors), etc.
- Monitoring patterns and architectures: Learn how to design your monitoring system to support different scenarios and requirements, such as high availability, scalability, security, etc. Use cloud design patterns to optimize your monitoring system for performance, cost-efficiency, reliability, etc.
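As a rough illustration of the RED method mentioned above, this Python sketch computes request rate, error rate, and duration from a list of request records. The `Request` type, the field names, and the choice to count only 5xx statuses as errors are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int


def red_metrics(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute request rate, error rate, and mean duration over a time window."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_rate": 0.0, "mean_duration_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    return {
        "rate_per_s": total / window_seconds,
        "error_rate": errors / total,
        "mean_duration_ms": sum(r.duration_ms for r in requests) / total,
    }


if __name__ == "__main__":
    sample = [
        Request(100, 200),
        Request(200, 200),
        Request(300, 500),
        Request(400, 200),
    ]
    print(red_metrics(sample, window_seconds=2))
```

The mean keeps the sketch short; monitoring tools usually report duration as percentiles, since a mean can hide a slow tail.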
7. Study cloud providers
Cloud providers are companies that offer cloud computing services such as infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), etc. Cloud computing enables you to access computing resources on demand over the internet without having to manage them yourself.
As a DevOps engineer, you need to understand how cloud providers work, what services they offer, how to use them efficiently and securely, and how to integrate them with your DevOps tools and processes.
Some of the popular cloud providers you should learn about include:
- Google Cloud
- DigitalOcean
- Alibaba Cloud
Each cloud provider has its own advantages and disadvantages in terms of features, pricing, reliability, scalability, security, etc. You should compare and contrast different cloud providers based on your application's needs and preferences.
Some of the topics you should learn about cloud providers include:
- Cloud computing concepts and models: Learn the basic concepts and terminology of cloud computing, such as cloud service models (IaaS, PaaS, SaaS), cloud deployment models (public, private, hybrid, multi-cloud), cloud characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service), etc.
- Cloud provider services and features: Learn the different types of services and features that each cloud provider offers, such as compute, storage, networking, database, analytics, security, management, etc. Learn how to use these services and features to build and run your applications in the cloud.
- Cloud provider tools and platforms: Learn how to use the tools and platforms that each cloud provider provides to manage and monitor your cloud resources and applications, such as AWS Console, Google Cloud Console, Azure Portal, AWS CLI, Google Cloud SDK, Azure CLI, AWS CloudFormation, Google Cloud Deployment Manager, Azure Resource Manager, etc.
- Cloud provider best practices and recommendations: Learn how to follow the best practices and recommendations that each cloud provider suggests to optimize your cloud usage and performance, such as security best practices, cost optimization best practices, performance optimization best practices, reliability best practices, etc.
8. Learn cloud design patterns
Cloud design patterns are general solutions to common problems or challenges that arise when designing and developing applications in the cloud. Cloud design patterns provide guidance and best practices on how to use cloud services and features effectively and efficiently.
As a DevOps engineer, you need to learn how to apply cloud design patterns to your application architecture and infrastructure design. Cloud design patterns can help you improve your application's performance, scalability, reliability, security, availability, etc.
Some of the common cloud design patterns you should learn about include:
- Availability patterns: These patterns help you ensure that your application is always available and responsive to user requests.
Examples of availability patterns are health endpoint monitoring (exposing a specific URL endpoint that external tools can call to check an application's health), queue-based load leveling (placing a queue between a task and the service it calls so that bursts of work are buffered and processed at a steady pace), throttling (limiting the number of requests that an application accepts or processes in a given period), etc.
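Throttling is often implemented with a token bucket, which is one common approach among several. The Python sketch below is a minimal single-threaded version; the class name, parameters, and the injectable `clock` are assumptions made so the example is easy to test.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time that has passed, without exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=2)
    for i in range(4):
        print(i, "allowed" if bucket.allow() else "throttled")
```

A production limiter would also need locking for concurrent callers and, usually, one bucket per client or API key.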
- Data management patterns: These patterns help you manage your data effectively and efficiently in the cloud.
Examples of data management patterns are CQRS (separating read and write operations for a data store), event sourcing (capturing changes to an application state as a sequence of events), sharding (partitioning data across multiple data stores), etc.
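Sharding needs a rule that maps each record to a data store. A simple hash-based rule can be sketched in Python as follows; the function name `shard_for` and the sample keys are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would send the same key to different shards.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user_id in ["alice", "bob", "carol"]:  # hypothetical keys
        print(user_id, "-> shard", shard_for(user_id, shard_count=4))
```

The modulo step means that changing `shard_count` moves most keys to a different shard, so systems that expect to add shards often use consistent hashing or a lookup table instead.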
- Design and implementation patterns: These patterns help you design and implement your application logic and functionality in the cloud.
Examples of design and implementation patterns are microservices (decomposing an application into small independent services), serverless (using cloud functions to execute code without managing servers), Strangler Fig (gradually replacing a legacy system with a new system), etc.
- Management and monitoring patterns: These patterns help you manage and monitor your cloud resources and applications.
Examples of management and monitoring patterns are autoscaling (adjusting the number of instances or resources based on demand), circuit breaker (temporarily stopping calls to a failing dependency so that failures do not cascade), compensating transaction (undoing the work of earlier steps when a later step of a multi-step operation fails), etc.
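To show how a circuit breaker behaves, here is a minimal, single-threaded Python sketch. The class names, thresholds, and the injectable `clock` are assumptions for illustration and are not taken from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the protected function."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; failing fast")
            # Cool-down elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to the dependency, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile` function, and treat `CircuitOpenError` as a signal to use a fallback.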
- Performance and scalability patterns: These patterns help you improve your application's performance and scalability in the cloud.
Examples of performance and scalability patterns are cache-aside (loading data on demand into a cache from a data store), CDN (using a distributed network of servers to deliver content to users), load balancing (distributing incoming requests across multiple instances or resources), etc.
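The cache-aside pattern can be sketched in a few lines of Python. In this illustration the cache is a plain dictionary and `load_from_store` is a hypothetical callable that stands in for a database query.

```python
class CacheAside:
    """The application checks the cache first and loads from the store on a miss."""

    def __init__(self, load_from_store):
        self.load_from_store = load_from_store  # callable: key -> value
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.load_from_store(key)  # cache miss: go to the data store
        self.cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the store is updated."""
        self.cache.pop(key, None)


if __name__ == "__main__":
    def slow_lookup(key):  # hypothetical stand-in for a database call
        print(f"loading {key} from the store")
        return key.upper()

    users = CacheAside(slow_lookup)
    print(users.get("alice"))  # loads from the store
    print(users.get("alice"))  # served from the cache
```

A real cache would also need an expiry policy and a size limit, which this sketch leaves out.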
- Resiliency patterns: These patterns help you improve your application's resiliency and fault tolerance in the cloud.
Examples of resiliency patterns are Bulkhead (isolating elements of an application to prevent failures from spreading), Leader Election (coordinating the actions of multiple instances of a service by electing one instance as the leader), Retry (repeating an operation that failed due to transient errors), etc.
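The Retry pattern is commonly paired with exponential backoff so that repeated attempts do not overload a struggling service. The Python sketch below shows one way to write it; the function name, the default delays, and the choice of which exceptions count as transient are assumptions.

```python
import time


def retry(func, attempts: int = 3, base_delay: float = 0.5,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func, retrying on transient errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():  # hypothetical operation that fails twice, then succeeds
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary network problem")
        return "done"

    print(retry(flaky, base_delay=0.1))
```

Errors that are not listed in `retry_on` propagate immediately, which avoids retrying failures that will never succeed, such as invalid input.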
- Security patterns: These patterns help you improve your application's security and compliance in the cloud.
Examples of security patterns are Federated Identity (delegating user authentication to an external identity provider), Role-Based Access Control (granting access to resources based on roles and permissions), Valet Key (using a token or key to grant limited access to resources), etc.
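The Valet Key idea of handing out a limited, time-bound key can be illustrated with an HMAC-signed token. This Python sketch is a simplified model and not any cloud provider's actual mechanism; the function names, the token format, and the secret are hypothetical.

```python
import hashlib
import hmac
import time


def issue_key(secret: bytes, resource: str, ttl_seconds: int, now=None) -> str:
    """Create a key that grants access to one resource until it expires."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    message = f"{resource}:{expires}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{expires}:{signature}"


def is_valid(secret: bytes, resource: str, key: str, now=None) -> bool:
    """Check that the key was signed for this resource and has not expired."""
    try:
        expires_text, signature = key.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False
    message = f"{resource}:{expires}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    signature_ok = hmac.compare_digest(signature.encode("utf-8"),
                                       expected.encode("utf-8"))
    return signature_ok and current < expires


if __name__ == "__main__":
    secret = b"example-secret"  # hypothetical; keep real secrets out of source code
    key = issue_key(secret, "reports/q1.pdf", ttl_seconds=60)
    print(is_valid(secret, "reports/q1.pdf", key))
    print(is_valid(secret, "reports/q2.pdf", key))
```

Because the resource name is part of the signed message, a key issued for one file cannot be reused for another, and the expiry limits how long a leaked key is useful.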
Becoming a DevOps engineer is not an easy task, but it’s a rewarding and fulfilling career path. By following this DevOps roadmap, you can learn the essential skills and concepts that will help you succeed in this role.
Remember that this roadmap is not a definitive or exhaustive guide, but rather a starting point for your learning journey. You should always keep learning and updating your knowledge as new technologies and practices emerge in the DevOps field.
I hope this article has given you some useful insights and resources to help you become a DevOps engineer. If you have any questions or feedback, please feel free to contact me.