Saturday, December 7, 2024

Interview Preparation: DevOps / Software Architect

CI/CD pipeline (Jenkins)



  • The Git Branch plugin exposes a GIT_BRANCH variable to identify the branch name
  • parameters block for defining build parameters
  • environment block and when block: link1 link2
pipeline {
    agent any // This means that the pipeline will run on any available agent
    stages {
        stage('Build') {
            steps {
                git 'https://github.com/ajitfawade/node-todo-cicd.git' // This will clone the GitHub repository to the agent's workspace
                script {
                    docker.build('ajitfawade/node-todo-cicd') // This will build a Docker image using the Dockerfile (Docker Pipeline steps need a script block in a declarative pipeline)
                }
            }
        }
        stage('Test') {
            steps {
                sh 'npm install' // This will install the dependencies using npm
                sh 'npm test' // This will run the unit tests using npm
            }
        }
        stage('Deploy') {
            steps {
                script {
                    docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') { // This will use the credentials for Docker Hub that you need to create in Jenkins
                        docker.image('ajitfawade/node-todo-cicd').push() // This will push the Docker image to Docker Hub
                    }
                    withCredentials([usernamePassword(credentialsId: 'kubernetes-credentials', usernameVariable: 'KUBE_USER', passwordVariable: 'KUBE_PASS')]) { // This will use the credentials for Kubernetes that you need to create in Jenkins
                        sh "kubectl --username=${KUBE_USER} --password=${KUBE_PASS} apply -f k8s.yaml" // This will deploy the Docker image to Kubernetes using kubectl and the k8s.yaml file
                    }
                }
            }
        }
        stage('Notify') {
            steps {
                emailext ( // This will send an email notification using the Email Extension Plugin that you need to install in Jenkins
                    subject: "${env.JOB_NAME} - Build # ${env.BUILD_NUMBER} - ${currentBuild.currentResult}",
                    body: """<p>${env.JOB_NAME} - Build # ${env.BUILD_NUMBER} - ${currentBuild.currentResult}</p>
                        <p>Check console output at <a href="${env.BUILD_URL}">${env.BUILD_URL}</a></p>
                        <p>Access deployed application at <a href="http://node-todo-cicd.k8s.io">http://node-todo-cicd.k8s.io</a></p>""",
                    to: 'ajitfawade@gmail.com'
                )
            }
        }
    }
}


Docker

intro  interview


# Use a multi-stage build to reduce the final image size

# Stage 1: Build the Spring Boot application
FROM maven:3.8.6-amazoncorretto-17 AS build123

WORKDIR /app

COPY pom.xml .
COPY src ./src

# Package the application into a JAR file
RUN mvn package -DskipTests

# Stage 2: Create the final image (using a smaller runtime base image)
FROM amazoncorretto:17-alpine-jdk

WORKDIR /app

# Copy only the JAR file from the build stage
COPY --from=build123 /app/target/your-app.jar app.jar

# Expose the port your Spring Boot app uses (usually 8080)
EXPOSE 8080

# Set the command to run when the container starts
CMD ["java", "-jar", "app.jar"]


Note: Basic Concept:

  1. WORKDIR: Establishes the target directory inside the Docker image.
  2. COPY: Copies files or directories from your local filesystem into the directory specified by the preceding WORKDIR instruction.

The COPY --from=<stage> instruction in a Dockerfile is a powerful feature used in multi-stage builds. It allows you to copy files or directories from a previous stage of your Docker build into the current stage.
The name given after --from must match the stage name declared with AS in an earlier stage (build123 in the example above).


      What is mounting? link

In Docker, "mounting" refers to the process of making a directory or file from the host machine accessible inside a container, essentially allowing the container to read and write data from a location on the host system, which is particularly useful for persisting data even when the container is restarted or deleted; this is usually achieved by using a "volume mount" or a "bind mount" command when running a container. 

      Key points about mounting in Docker:

Data persistence:
The primary reason for mounting is to ensure data is not lost when a container is stopped or removed, as the data is stored on the host machine.

Accessing host files:
You can mount a directory from your host system into the container to access files directly from the host.

Volume vs. Bind Mount:
Volume mount: Creates a separate storage area managed by Docker (kept under Docker's own storage location on the host), decoupled from the host's directory layout.

Bind mount: Directly mounts a directory from the host machine into the container, meaning changes made inside the container are reflected on the host.

How to handle permissions in Docker?

755 vs 777 (chmod, chown)


To add a non-root user:

FROM openjdk:8-jdk-alpine
RUN addgroup -S spring && adduser -S spring -G spring
USER spring:spring
ARG JAR_FILE=target/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]

Example:  

RUN groupadd -g 1001 mygroup && useradd -u 1001 -g mygroup myuser
RUN chown -R 1001:1001 /app
USER 1001:1001

RUN groupadd -g 1001 mygroup && useradd -u 1001 -g mygroup myuser:

  • Creates a group named mygroup with GID 1001.
  • Creates a user named myuser with UID 1001 and adds it to mygroup.
  • It is important to use the -g and -u flags to explicitly set the group and user IDs.

RUN chown -R 1001:1001 /app:

  • Changes the ownership of the /app directory and its contents to the user and group with ID 1001. The -R flag makes chown recursive, so all files and directories within /app are affected.

USER 1001:1001:

  • Switches the user context to myuser (UID 1001) and mygroup (GID 1001). All subsequent commands will be executed as this user.

    Kubernetes interview


    # Deployment specification
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3 # This defines the desired number of replicas for your application.
      selector:
        matchLabels:
          app: my-app
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 2 # This means that up to 2 pods can be unavailable during the update process.
          maxSurge: 3 # This means that up to 3 more pods than the desired number can be created during the update process.
      template:
        metadata:
          labels:
            app: my-app
        spec:
          serviceAccountName: my-app-sa # This defines the service account that the pod will use to access the Kubernetes API server.
          containers:
            - name: my-app
              image: my-app:v2 # This is the new image that you want to update to.
              ports:
                - containerPort: 8080
              livenessProbe: # This defines a health check for your pod using an HTTP request.
                httpGet:
                  path: /healthz
                  port: 8080
                initialDelaySeconds: 10 # This defines how long to wait before performing the first probe.
                periodSeconds: 10 # This defines how often to perform the probe.
                failureThreshold: 3 # This defines how many failures to tolerate before restarting the pod.
              readinessProbe: # This defines a readiness check for your pod using an HTTP request.
                httpGet:
                  path: /readyz
                  port: 8080
                initialDelaySeconds: 10 # This defines how long to wait before performing the first probe.
                periodSeconds: 10 # This defines how often to perform the probe.
                successThreshold: 2 # This defines how many successes to require before marking the pod as ready.

    # Service specification
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app-service
    spec:
      selector:
        app: my-app # This matches the label of the pods that are part of the service.
      ports:
        - protocol: TCP
          port: 80 # This is the port that the service will expose externally.
          targetPort: 8080 # This is the port that the pods will listen on internally.
      type: LoadBalancer # This means that the service will be exposed externally using a cloud provider's load balancer.

    # Ingress specification (networking.k8s.io/v1; the older v1beta1 API has been removed)
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app-ingress
    spec:
      rules:
        - host: my-app.example.com # This is the host name that will be used to access the service from outside the cluster.
          http:
            paths:
              - path: / # This is the path that will be used to access the service from outside the cluster.
                pathType: Prefix
                backend:
                  service:
                    name: my-app-service # This refers to the name of the service that will handle the traffic.
                    port:
                      number: 80 # This refers to the port of the service that will handle the traffic.


    Kubeconfig 



    how it works


    Imp: link1 link2 

    https://yuminlee2.medium.com/kubernetes-kubeconfig-file-4aabe3b04ade

    https://medium.com/@vinoji2005/using-terraform-with-kubernetes-a-comprehensive-guide-237f6bbb0586


    Terraform intro



    # Configure the Azure provider
    terraform {
      required_providers {
        azurerm = {
          source  = "hashicorp/azurerm"
          version = "~> 3.0.2"
        }
      }

      required_version = ">= 1.1.0"
    }

    provider "azurerm" {
      features {} // subscription details go here
    }

    resource "azurerm_resource_group" "rg" {
      name     = "rg-aks-test-001"
      location = "australiaeast"
    }

    resource "azurerm_kubernetes_cluster" "default" {
      name                = "aks-test-001"
      location            = "australiaeast"
      resource_group_name = "rg-aks-test-001"
      dns_prefix          = "dns-k8s-test"
      kubernetes_version  = "1.27.9"

      default_node_pool {
        name            = "testnodepool"
        node_count      = 2
        vm_size         = "Standard_D2_v2"
        os_disk_size_gb = 30
      }

      service_principal {
        client_id     = var.clientId
        client_secret = var.clientSecret
      }

      role_based_access_control_enabled = true

      tags = {
        environment = "test"
      }
    }


    Basic Structure (for smaller projects):

    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    ├── terraform.tfvars
    

    • main.tf: This is the primary file where you define your resources.
    • variables.tf: This file contains the definitions of your Terraform variables.
    • outputs.tf: This file defines the output values that Terraform will display after applying your configuration.
    • terraform.tfvars: This file stores the actual values for your Terraform variables. It should not be committed to version control if it contains sensitive data.


    How does Jenkins connect with Terraform, and where are secrets stored? link


    https://iamabhi67.medium.com/deploy-azure-kubernetes-service-aks-cluster-with-terraform-b31bf0bc480c


    Apache Kafka 

    Apache Kafka is a distributed message queue / pub-sub system. Ordering is guaranteed only at the partition level. Read my Kafka blog for details. Kafka cluster metadata and configuration are managed by ZooKeeper (newer versions can run without it using KRaft).   link
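
    To illustrate the partition-level ordering point, here is a minimal Spring Kafka sketch (the "orders" topic, bean names, and String payload are hypothetical; assumes spring-kafka is on the classpath). Records that share a key always go to the same partition, which is what preserves per-key ordering.

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    @Service
    public class OrderEventService {

        private final KafkaTemplate<String, String> kafkaTemplate;

        public OrderEventService(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        // Use the order id as the message key so all events for one order land on one partition
        public void publish(String orderId, String payload) {
            kafkaTemplate.send("orders", orderId, payload);
        }

        // Consumers in the same consumer group split the partitions between them
        @KafkaListener(topics = "orders", groupId = "order-processor")
        public void onMessage(String payload) {
            // process the event
        }
    }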


    https://manjulapiyumal.medium.com/mastering-kafka-advanced-concepts-every-senior-software-engineer-should-know-9283664c99e1

    interview link1 interview link 2


    Redis cache

    Redis stores data as key-value pairs. Redis Cluster distributes keys across nodes using hash slots (a consistent-hashing-style scheme) with master-replica replication, and uses a gossip protocol for node health checks.

    Redis offers two persistence options, which write the in-memory data to durable storage:
    • RDB snapshots
    • AOF (append-only file), essentially a write-ahead log of commands

    Additional:
    E.g. a composite cache key from a current project: department_hours-$level-$id-$startWeek-$startYear-$range

    Is the Redis eviction policy decided by the cluster configuration or by the Redis client? It is decided by server configuration and can be set via maxmemory-policy in redis.conf.

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
    </dependency>




    • opsForList(): For list operations.  
    • opsForSet(): For set operations.  
    • opsForZSet(): For sorted set operations.  
    • opsForHash(): For hash operations.  
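
    A minimal sketch of key-value access with Spring Data Redis (assumes the non-reactive spring-boot-starter-data-redis starter, which auto-configures StringRedisTemplate; the key layout mirrors the department_hours example above and the TTL is made up):

    import java.time.Duration;
    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.stereotype.Service;

    @Service
    public class DepartmentHoursCache {

        private final StringRedisTemplate redisTemplate;

        public DepartmentHoursCache(StringRedisTemplate redisTemplate) {
            this.redisTemplate = redisTemplate;
        }

        // opsForValue() is the plain key-value counterpart of the opsForXxx() operations listed above
        public void cacheHours(String level, String id, int startWeek, int startYear, int range, String json) {
            String key = "department_hours-" + level + "-" + id + "-" + startWeek + "-" + startYear + "-" + range;
            redisTemplate.opsForValue().set(key, json, Duration.ofMinutes(30)); // store with a 30-minute TTL
        }

        public String getHours(String key) {
            return redisTemplate.opsForValue().get(key);
        }
    }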

    The value attribute establishes a cache with a specific name, while the key attribute permits the use of Spring Expression Language to compute the key dynamically. Consequently, the method result is stored in the 'product' cache, where the respective 'product_id' serves as the unique key. This approach optimizes caching by associating each result with a distinct key. We can also use cacheNames instead of value.

    With options:

    @Service
    public class UserService {

        @Cacheable(value = "users", key = "#id", condition = "#id > 0", unless = "#result == null")
        public User getUserById(long id) {
            // ... method implementation ...
        }
    }


    @Service
    public class UserService {

        @CacheEvict(value = "users", key = "#user.id", condition = "#user != null", beforeInvocation = false)
        public void updateUser(User user) {
            // ... method implementation ...
        }

        @CacheEvict(value = "users", allEntries = true)
        public void clearUserCache(){
            // ... method implementation ...
        }
    }


    MongoDb



    MongoDB is a NoSQL database that uses a document-oriented data model, which means data is stored in BSON (Binary JSON) format. MongoDB uses B-tree-based indexes (B+ tree variants) for indexing.

    MongoDB is an open-source, cross-platform NoSQL database system. It's document-oriented and highly scalable, making it easy to store and manage data. It's known for its speed, robustness and flexibility, and provides a range of features, including indexes, authentication and authorization, and automatic sharding and replication.

    MongoDB makes use of collections, which are similar to tables in Postgres or MySQL.

    Best Article link link2

    Before jumping into mongoDB. Let’s understand how disk storage works and how database stored data on disk.

    In a unix file system, we have a hierarchical file system that organizes files and directories into a tree-like structure. In the Unix file system, each file and directory has a unique path, starting from the root directory represented by the “/” symbol.

    When a file is created in the Unix file system, it is stored as a sequence of bytes on disk. The disk is divided into blocks, and each file is stored in one or more blocks. The blocks are grouped into larger units called allocation blocks or disk blocks.

    The Unix file system uses an inode (short for index node) to manage the storage of a file. An inode is a data structure that contains information about a file, such as its size, creation time, and permissions.

    When a file is created, the Unix file system assigns an inode to the file and stores information about the file in the inode. The inode also contains pointers to the disk blocks that store the actual data for the file.

    How does MongoDB work?

    Older MongoDB storage engines (MMAPv1) used a memory-mapped file approach: data is stored on disk in a binary format and the operating system maps portions of the files into memory so MongoDB can access the data directly. The current default storage engine, WiredTiger, manages its own in-memory cache and writes data to disk in compressed files.




    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.data.annotation.Id;
    import org.springframework.data.mongodb.core.mapping.Document;
    import org.springframework.data.mongodb.repository.MongoRepository;
    import org.springframework.web.bind.annotation.*;

    import java.util.List;

    @SpringBootApplication
    @RestController
    public class MongoDocumentApplication {

        private final DocumentRepository documentRepository;

        public MongoDocumentApplication(DocumentRepository documentRepository) {
            this.documentRepository = documentRepository;
        }

        public static void main(String[] args) {
            SpringApplication.run(MongoDocumentApplication.class, args);
        }

        @PostMapping("/documents")
        public DocumentModel saveDocument(@RequestBody DocumentModel document) {
            return documentRepository.save(document);
        }

        @GetMapping("/documents")
        public List<DocumentModel> getAllDocuments() {
            return documentRepository.findAll();
        }


        @GetMapping("/documents/{id}")
        public DocumentModel getDocumentById(@PathVariable String id) {
            return documentRepository.findById(id).orElse(null); // Handle not found scenario
        }

        @DeleteMapping("/documents/{id}")
        public void deleteDocument(@PathVariable String id) {
           documentRepository.deleteById(id);
        }



        @Document("documents") // Collection name in MongoDB
        public static class DocumentModel {

            @Id
            private String id; // Use String for MongoDB _id

            private String name;
            private String content;
            // ... other fields

            // Getters and setters (Important!)

            public String getId() {
                return id;
            }

            public void setId(String id) {
                this.id = id;
            }

            public String getName() {
                return name;
            }

            public void setName(String name) {
                this.name = name;
            }

            public String getContent() {
                return content;
            }

            public void setContent(String content) {
                this.content = content;
            }
        }

        public interface DocumentRepository extends MongoRepository<DocumentModel, String> {
            // Add custom queries if needed
        }
    }


    Elastic Search
     link link2

    Elasticsearch is a document oriented database. Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. It allows you to store, search, and analyze large volumes of data quickly and in near real-time.
    Elasticsearch uses JSON documents to store data, making it flexible and easy to work with.
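
    A small Spring Data Elasticsearch sketch of the "JSON documents in an index" idea (assumes spring-boot-starter-data-elasticsearch; the articles index and the Article fields are hypothetical):

    import java.util.List;
    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

    @Document(indexName = "articles") // each Article is stored as a JSON document in the "articles" index
    class Article {
        @Id
        private String id;
        private String title;
        private String body;
        // getters and setters omitted for brevity
    }

    interface ArticleRepository extends ElasticsearchRepository<Article, String> {
        // derived query: full-text match against the title field
        List<Article> findByTitle(String title);
    }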

    Observability in Microservices

    • Traces: Records the flow of a request as it traverses through different services and components, providing visibility into complex service dependencies.
    • Metrics: Measures the performance of applications by collecting data such as request counts, CPU usage, and response times.
    • Logs: Collects log data that records events, errors, and warnings, providing contextual information for troubleshooting.


    Deep dive


    OpenTelemetry link


    OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project designed to create a standardized way to collect telemetry data (i.e., traces, metrics, and logs) from various applications and programming languages. It unifies different observability signals under a single framework, making it easier to gain a complete view of application performance and identify issues across distributed systems.

    OpenTelemetry (OTel) is an open-source framework that collects and exports telemetry data from applications and services. It's used to monitor and analyze software performance and behavior. The OpenTelemetry Collector is in charge of collecting, processing, and exporting the collected telemetry data. It is vendor-agnostic (open source) and helps with cost reduction and easy management.
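
    A hedged sketch of manual span creation with the OpenTelemetry Java API (assumes the opentelemetry-api dependency and an SDK or the auto-instrumentation agent configured elsewhere; the tracer, span, and attribute names are made up):

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Scope;

    public class CheckoutTracing {

        private final Tracer tracer = GlobalOpenTelemetry.getTracer("checkout-service");

        public void placeOrder(String orderId) {
            Span span = tracer.spanBuilder("placeOrder").startSpan();
            try (Scope scope = span.makeCurrent()) {
                span.setAttribute("order.id", orderId); // attributes become searchable span metadata
                // ... business logic; instrumented downstream calls join this trace via the current context ...
            } finally {
                span.end(); // always end the span so it can be exported
            }
        }
    }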


    Micrometer link

    In contrast, Micrometer is a vendor-neutral application metrics facade (tools, libraries, or frameworks designed to work with a variety of monitoring, logging, or observability systems without being locked into a specific vendor's ecosystem) for the JVM, primarily used to instrument dimensional metrics (metrics with tags) in Java applications. It provides a consistent abstraction for capturing metrics and is the default metrics library in Spring Boot, making it easy to collect application-level data and export it to popular monitoring systems like Prometheus, Datadog, and more.

    Micrometer Tracing is a facade over the Brave and OpenTelemetry tracers that gives insight into complex distributed systems at the level of an individual user request. It helps identify the root cause of issues faster with distributed tracing.

    You can think of Micrometer as a specialized tool focused solely on metrics collection for Java-based applications, while OpenTelemetry is a broader observability framework that goes beyond just metrics and supports traces, metrics, and logs for applications written in various programming languages.

    Difference between OpenTelemetry and Micrometer


    Explain the difference between Micrometer and Actuator in Spring Boot

    • Actuator is the foundation, Micrometer is the instrument: Actuator provides the framework for exposing metrics, while Micrometer is the tool you use to collect and structure those metrics.
    • Actuator exposes Micrometer metrics: When you use Micrometer in your Spring Boot application, Actuator automatically configures it and exposes the metrics through its /metrics endpoint.
    • Actuator provides more than just metrics: Actuator offers a wider range of management and monitoring capabilities beyond just metrics, such as health checks, environment information, and more.

    Explain how metrics can be collected from a Spring Boot application into Prometheus
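
    A rough sketch (assumptions: spring-boot-starter-actuator plus the micrometer-registry-prometheus dependency are on the classpath, and management.endpoints.web.exposure.include contains prometheus, so metrics show up at /actuator/prometheus for Prometheus to scrape; the metric and tag names below are made up):

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Timer;
    import org.springframework.stereotype.Service;

    @Service
    public class PaymentMetrics {

        private final Counter paymentsProcessed;
        private final Timer paymentLatency;

        public PaymentMetrics(MeterRegistry registry) {
            // dimensional metrics: tags become Prometheus labels
            this.paymentsProcessed = Counter.builder("payments.processed")
                    .tag("gateway", "primary")
                    .register(registry);
            this.paymentLatency = Timer.builder("payments.latency")
                    .register(registry);
        }

        public void recordPayment(Runnable paymentCall) {
            paymentLatency.record(paymentCall); // times the call and records the duration
            paymentsProcessed.increment();
        }
    }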


    Prometheus 

    A pull-based metrics collection and time-series storage tool. Architecture

    The general term for collecting metrics from the targets using Prometheus is called scraping.

    https://devopscube.com/prometheus-architecture/





    Grafana

    visualisation tool link

    Grafana is a visualisation tool for metrics and needs time-series data to display. Grafana's strength lies in its ability to create rich dashboards and visualizations of data that changes over time. While Grafana supports a variety of data sources (Prometheus, InfluxDB, Elasticsearch, MySQL, PostgreSQL, etc.), the key is that these data sources must be able to provide time-series data. Even if a data source is not exclusively time-series (like some SQL databases), Grafana needs to query it in a way that returns data points with timestamps.


    Kibana

    Kibana is designed to work primarily with document-based data, specifically the kind of data stored in Elasticsearch. Kibana needs data that's indexed in Elasticsearch, and Elasticsearch stores data as JSON documents.
    While time-series data can be represented in this format (and is often used with Kibana), Kibana itself is fundamentally document-oriented, not exclusively time-series.


    Note:

    Prometheus is used along with Grafana for health monitoring and metrics. Prometheus is a time-series database. For Prometheus to pull data (push is also possible but not generally used), the application or node should expose a /metrics endpoint. Spring Boot exposes endpoints through Actuator, and Micrometer makes the data available in Prometheus format. Micrometer is a facade for collecting metrics data. PromQL is the query language.

    Elasticsearch is a distributed, document-oriented store for logs and other types of data. Data needs to be pushed to it; Kibana then pulls data from Elasticsearch.

    • Micrometer with Grafana and Prometheus: link
    • Micrometer for tracing (facade over various tracers) link
    • There are various other tools for observability, such as Datadog, Splunk, New Relic, AppDynamics, etc.



    Cross Cutting concerns link

    The microservices chassis is a set of frameworks that address numerous cross-cutting concerns such as
    • Externalized configuration
    • Health checks
    • Application metrics
    • Service Discovery
    • Circuit breakers
    • Distributed tracing
    • Exception tracking
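
    For the circuit-breaker item above, a hedged sketch using Resilience4j's Spring Boot annotation support (assumes the resilience4j-spring-boot starter and a circuit breaker instance named "inventory" configured in application properties; the service and method names are hypothetical):

    import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
    import org.springframework.stereotype.Service;

    @Service
    public class InventoryClient {

        // When the "inventory" breaker is open, calls are short-circuited and the fallback runs instead
        @CircuitBreaker(name = "inventory", fallbackMethod = "fallbackStock")
        public int getStock(String sku) {
            // ... remote call to the inventory service would go here ...
            throw new IllegalStateException("downstream unavailable"); // placeholder failure
        }

        // Fallback must have a compatible signature plus a trailing Throwable parameter
        private int fallbackStock(String sku, Throwable t) {
            return 0; // degrade gracefully with a default value
        }
    }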

    Load testing


    Jmeter (load testing)

    https://blog.bigoodyssey.com/rest-api-load-testing-with-apache-jmeter-a4d25ea2b7b6

     

    K6 (Load testing)

    https://blog.stackademic.com/optimizing-api-performance-through-k6-load-testing-b38cf1ff457c


    Cypress or Enzyme(UI testing)


    Testing framework

    https://medium.com/simform-engineering/testing-spring-boot-applications-best-practices-and-frameworks-6294e1068516

    https://medium.com/@nihatonder87/how-to-combine-testcontainers-rest-assured-and-wiremock-8e5cb3ede16e **


    Distributed transaction

    link1 link2


    DevOps

    How is autoscaling done in k8s?
    How to persist information in k8s?
    How are secrets stored in IaC?
    Difference between an API gateway and a load balancer link
    Horizontal vs vertical scaling
    Forward proxy vs reverse proxy
    Reverse proxy vs load balancer vs API gateway? They can be used together, but their purposes differ
    SSH protocol and keys
    Port forwarding: remote vs local


    Note: 

    • Core Java: [link]
    • Spring Boot framework: link
    • Serverless applications
    • System design interview questions link
    • Coding round questions link
    • Serverless vs PaaS
    • Microservice chassis pattern link
    • Blue-green deployment
    • At-least-once delivery
    • Configure autoscaling in k8s
    • Token-based architecture
