Coding Fungus: Interview Preparation: Devops/ Software Architect

Apache Kafka

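Apache Kafka is a distributed event-streaming platform that can be used as a message queue or for pub-sub. Ordering is guaranteed only within a partition, not across a topic. See my Kafka blog post for details. Cluster metadata was traditionally managed by ZooKeeper; newer Kafka versions can use KRaft instead. link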

https://medium.com/simform-engineering/kafka-integration-made-easy-with-spring-boot-b7aaf44d8889

https://manjulapiyumal.medium.com/mastering-kafka-advanced-concepts-every-senior-software-engineer-should-know-9283664c99e1

interview link1 interview link 2

kafka offset
Exactly-once delivery link
https://quix.io/blog/kafka-auto-offset-reset-use-cases-and-pitfalls
https://eryilmaz0.medium.com/designing-event-consumers-everything-about-commit-offsets-in-kafka-23d3f88472bd
Kafka
https://mail-narayank.medium.com/kafka-architecture-internal-d0b3334d1df
How do you ensure message order is maintained? link
KRaft (Kafka's Raft-based replacement for ZooKeeper) vs ZooKeeper
Kafka (pull model, event processing) vs RabbitMQ (push model, complex message routing)
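Why ordering holds only per partition can be shown with a small sketch. It is not Kafka's actual partitioner (the default one hashes the key bytes with murmur2); the class name PartitionSketch and the simple array hash are assumptions made for illustration. The point is that records with the same key always land in the same partition, so their relative order is kept, while records with different keys may be spread across partitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of the Kafka client library.
final class PartitionSketch {

    // Simplified stand-in for a key-based partitioner: same key -> same partition.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions); // floorMod keeps the result non-negative
    }

    public static void main(String[] args) {
        int partitions = 3;
        List<List<String>> log = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            log.add(new ArrayList<>());
        }
        String[][] events = {
            {"order-1", "created"}, {"order-2", "created"},
            {"order-1", "paid"}, {"order-1", "shipped"}
        };
        // Append each event to the partition chosen by its key.
        for (String[] e : events) {
            log.get(partitionFor(e[0], partitions)).add(e[0] + ":" + e[1]);
        }
        for (int i = 0; i < partitions; i++) {
            System.out.println("partition " + i + " -> " + log.get(i));
        }
    }
}
```

All order-1 events end up in one partition in the order they were sent, which is why choosing a good message key is the usual answer to the ordering question.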

Redis cache

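Redis stores data as key-value pairs. Redis Cluster does not use classic consistent hashing: it maps every key to one of 16384 hash slots (CRC16 of the key modulo 16384) and assigns slot ranges to master nodes, each of which can have replicas. Cluster nodes use a gossip protocol for health checks and failure detection.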
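As a rough sketch of how Redis Cluster places a key: it computes CRC16 (the XMODEM variant) of the key and takes the result modulo 16384 to get the hash slot. The class name RedisSlotSketch is hypothetical, and hash tags (the {...} part of a key) are deliberately ignored here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of Redis Cluster key-to-slot mapping (hash tags not handled).
final class RedisSlotSketch {

    static final int SLOTS = 16384;

    // CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;
                } else {
                    crc <<= 1;
                }
                crc &= 0xFFFF; // keep the value within 16 bits
            }
        }
        return crc;
    }

    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOTS;
    }

    public static void main(String[] args) {
        System.out.println("slot for foo = " + slotFor("foo")); // prints "slot for foo = 12182"
    }
}
```

Because the slot depends only on the key, any client can work out which node owns a key once it knows the slot-to-node map.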

Redis offers two persistence options that write the in-memory data to durable storage:

RDB snapshots (point-in-time dumps of the dataset)
AOF (append-only file), a log of write commands that is replayed on restart

Cache strategies: read-through, write-through, write-behind, and cache-aside
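Of these strategies, cache-aside is the one the application implements itself, so a minimal sketch may help. It uses an in-memory map in place of Redis and a loader function in place of the database; the class name CacheAsideSketch and its methods are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache-aside sketch: the map stands in for Redis, the loader for the database.
final class CacheAsideSketch<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private int loads = 0; // counts how often the backing store was hit (not thread-safe, demo only)

    CacheAsideSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;            // cache hit
        }
        V loaded = loader.apply(key); // cache miss: read from the backing store
        loads++;
        if (loaded != null) {
            cache.put(key, loaded);   // populate the cache for the next reader
        }
        return loaded;
    }

    // On writes, the application updates the store and then evicts the stale entry.
    void evict(K key) {
        cache.remove(key);
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        CacheAsideSketch<String, String> users = new CacheAsideSketch<>(id -> "user-" + id);
        users.get("42");
        users.get("42");
        System.out.println("loads = " + users.loadCount()); // prints "loads = 1"
    }
}
```

Read-through and write-through move the same logic into the cache layer, and write-behind additionally defers the write to the store.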

Types of cache: In memory, distributed, client side

Additional:

Basic https://medium.com/codex/7-redis-features-you-might-not-know-bab8c9beb2c
caching strategies link
Link For Depth link link2
Spring boot caching link
Type and strategies

Example key from a current project: department_hours-$level-$id-$startWeek-$startYear-$range

In a Redis cache, is the eviction policy decided by the cluster configuration or by the Redis client? It is decided by the server configuration (the maxmemory-policy setting) and can be set in the redis.conf file.

<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>

opsForList(): For list operations.
opsForSet(): For set operations.
opsForZSet(): For sorted set operations.
opsForHash(): For hash operations.

The value attribute establishes a cache with a specific name, while the key attribute permits the use of Spring Expression Language to compute the key dynamically. Consequently, the method result is stored in the 'product' cache, where the respective 'product_id' serves as the unique key. This approach optimizes caching by associating each result with a distinct key. We can also use cacheNames instead of value.

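With options:

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    // Cache the result under the id; skip non-positive ids and never cache null results.
    @Cacheable(value = "users", key = "#id", condition = "#id > 0", unless = "#result == null")
    public User getUserById(long id) {
        // ... method implementation ...
    }

    // Evict the cached entry after the update completes (beforeInvocation = false is the default).
    @CacheEvict(value = "users", key = "#user.id", condition = "#user != null", beforeInvocation = false)
    public void updateUser(User user) {
        // ... method implementation ...
    }

    // Remove every entry from the "users" cache.
    @CacheEvict(value = "users", allEntries = true)
    public void clearUserCache() {
        // ... method implementation ...
    }
}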

MongoDb

MongoDB is a NoSQL database that uses a document-oriented data model, which means data is stored in BSON (Binary JSON) format. MongoDB implements its indexes with a B-tree data structure.

MongoDB is an open-source, cross-platform NoSQL database system. It’s document-oriented and highly scalable, making it easy to store and manage data. It’s known for its speed, robustness and flexibility, and provides a range of features, including indexes, authentication and authorization, and automatic sharding and replication.

MongoDB stores documents in collections, which are similar to tables in Postgres or MySQL.

Best Article link link2
Before jumping into MongoDB, let's understand how disk storage works and how a database stores data on disk.
In a unix file system, we have a hierarchical file system that organizes files and directories into a tree-like structure. In the Unix file system, each file and directory has a unique path, starting from the root directory represented by the “/” symbol.
When a file is created in the Unix file system, it is stored as a sequence of bytes on disk. The disk is divided into blocks, and each file is stored in one or more blocks. The blocks are grouped into larger units called allocation blocks or disk blocks.
The Unix file system uses an inode (short for index node) to manage the storage of a file. An inode is a data structure that contains information about a file, such as its size, creation time, and permissions.

When a file is created, the Unix file system assigns an inode to the file and stores information about the file in the inode. The inode also contains pointers to the disk blocks that store the actual data for the file.
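How does MongoDB work?
Older versions of MongoDB (the MMAPv1 storage engine) used memory-mapped files to store data. The data is stored on disk in a binary format, and the operating system maps a portion of the files into memory, allowing MongoDB to access the data directly. Since MongoDB 3.2 the default storage engine is WiredTiger, which manages its own cache and adds compression and document-level concurrency; MMAPv1 was removed in MongoDB 4.2.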

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.web.bind.annotation.*;

import java.util.List;

@SpringBootApplication
@RestController
public class MongoDocumentApplication {

    private final DocumentRepository documentRepository;

    public MongoDocumentApplication(DocumentRepository documentRepository) {
        this.documentRepository = documentRepository;
    }

    public static void main(String[] args) {
        SpringApplication.run(MongoDocumentApplication.class, args);
    }

    @PostMapping("/documents")
    public DocumentModel saveDocument(@RequestBody DocumentModel document) {
        return documentRepository.save(document);
    }

    @GetMapping("/documents")
    public List<DocumentModel> getAllDocuments() {
        return documentRepository.findAll();
    }

    @GetMapping("/documents/{id}")
    public DocumentModel getDocumentById(@PathVariable String id) {
        return documentRepository.findById(id).orElse(null); // Handle not found scenario
    }

    @DeleteMapping("/documents/{id}")
    public void deleteDocument(@PathVariable String id) {
        documentRepository.deleteById(id);
    }

    @Document("documents") // Collection name in MongoDB
    public static class DocumentModel {

        @Id
        private String id; // Use String for MongoDB _id
        private String name;
        private String content;
        // ... other fields

        // Getters and setters (Important!)
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getContent() { return content; }
        public void setContent(String content) { this.content = content; }
    }
}

// Declared at top level because Spring Data does not scan nested repository interfaces by default.
interface DocumentRepository extends MongoRepository<MongoDocumentApplication.DocumentModel, String> {
    // Add custom queries if needed
}

CI/CD pipeline (Jenkins)

https://www.lambdatest.com/blog/jenkins-declarative-pipeline-examples/

https://ajitfawade.medium.com/jenkins-interview-questions-and-answers-day-29-of-90-days-of-devops-141155440200

The Git plugin exposes the GIT_BRANCH environment variable, which holds the branch name
The parameters block defines build parameters
environment block, when block link1 link2

pipeline {
    agent any // This means that the pipeline will run on any available agent
    stages {
        stage('Build') {
            steps {
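                git 'https://github.com/ajitfawade/node-todo-cicd.git' // This will clone the GitHub repository to the agent's workspace
                script { // docker.build is a scripted-pipeline call, so in a declarative pipeline it must run inside a script block
                    docker.build('ajitfawade/node-todo-cicd') // This will build a Docker image using Dockerfile
                }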
            }
        }
        stage('Test') {
            steps {
                sh 'npm install' // This will install the dependencies using npm
                sh 'npm test' // This will run the unit tests using npm
            }
        }
        stage('Deploy') {
            steps {
                script {
                    docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') { // This will use the credentials for Docker Hub that you need to create in Jenkins
                        docker.image('ajitfawade/node-todo-cicd').push() // This will push the Docker image to Docker Hub
                    }
                    withCredentials([usernamePassword(credentialsId: 'kubernetes-credentials', usernameVariable: 'KUBE_USER', passwordVariable: 'KUBE_PASS')]) { // This will use the credentials for Kubernetes that you need to create in Jenkins
                        sh 'kubectl --username=$KUBE_USER --password=$KUBE_PASS apply -f k8s.yaml' // Single quotes let the shell, not Groovy, expand the secrets; this deploys the Docker image to Kubernetes using kubectl and the k8s.yaml file
                    }
                }
            }
        }
        stage('Notify') {
            steps {
                emailext ( // This will send an email notification using Email Extension Plugin that you need to install in Jenkins
                    subject: "${env.JOB_NAME} - Build # ${env.BUILD_NUMBER} - ${currentBuild.currentResult}",
                    body: """<p>${env.JOB_NAME} - Build # ${env.BUILD_NUMBER} - ${currentBuild.currentResult}</p>
                             <p>Check console output at <a href="${env.BUILD_URL}">${env.BUILD_URL}</a></p>
                             <p>Access deployed application at <a href="http://node-todo-cicd.k8s.io">http://node-todo-cicd.k8s.io</a></p>""",
                    to: 'ajitfawade@gmail.com'
                )
            }
        }
    }
}

Docker

intro interview

# Use a multi-stage build to reduce the final image size

# Stage 1: Build the Spring Boot application
FROM maven:3.8.6-amazoncorretto-17 AS build123

WORKDIR /app

COPY pom.xml .
COPY src ./src

# Package the application into a JAR file
RUN mvn package -DskipTests

# Stage 2: Create the final image (using a smaller Alpine-based image; note this tag still ships a JDK, not a JRE)
FROM amazoncorretto:17-alpine-jdk

WORKDIR /app

# Copy only the JAR file from the build stage
COPY --from=build123 /app/target/your-app.jar app.jar

# Expose the port your Spring Boot app uses (usually 8080)
EXPOSE 8080

# Set the command to run when the container starts
CMD ["java", "-jar", "app.jar"]

Note: Basic concepts:

WORKDIR: Sets the working directory inside the image for the instructions that follow (RUN, CMD, ENTRYPOINT, COPY, ADD).
COPY: Copies files or directories from the build context on your local filesystem into the image. A relative destination path is resolved against the directory set by the preceding WORKDIR instruction.

The COPY --from=<stage> instruction in a Dockerfile is a powerful feature used in multi-stage builds. It allows you to copy files or directories from a previous stage of your Docker build into the current stage.

The <stage> name must match the alias given with AS in the earlier FROM instruction (build123 in the example above).

What is mounting? link

In Docker, "mounting" makes a directory or file from the host machine accessible inside a container, so the container can read and write data at a location that outlives it. This is particularly useful for persisting data when the container is restarted or deleted. It is usually done with a volume mount or a bind mount when running a container (the -v or --mount option of docker run).

Key points about mounting in Docker:

Data persistence:
The primary reason for mounting is to ensure data is not lost when a container is stopped or removed, as the data is stored on the host machine.

Accessing host files:
You can mount a directory from your host system into the container to access files directly from the host.

Volume vs. Bind Mount:
Volume mount: Creates a separate storage area managed by Docker. The data still lives on the host, but it is independent of the host's directory layout.

Bind mount: Directly mounts a directory from the host machine into the container, meaning changes made inside the container are reflected on the host.

How to handle permissions in docker?
https://labex.io/tutorials/docker-how-to-handle-permissions-in-docker-415866
755 vs 777 (chmod , chown)

To add a user (Alpine-based images provide addgroup and adduser rather than groupadd and useradd):

FROM openjdk:8-jdk-alpine
RUN addgroup -S spring && adduser -S spring -G spring
USER spring:spring
ARG JAR_FILE=target/*.jar
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]

Example:

RUN groupadd -g 1001 mygroup && \
    useradd -u 1001 -g mygroup myuser
RUN chown -R 1001:1001 /app
USER 1001:1001

RUN groupadd -g 1001 mygroup && useradd -u 1001 -g mygroup myuser
Creates a group named mygroup with GID 1001.
Creates a user named myuser with UID 1001 and adds it to mygroup.
Setting the group and user IDs explicitly with -g and -u keeps them predictable, so file ownership matches across images and mounted volumes.
groupadd and useradd are available on Debian-based images; Alpine-based images use addgroup and adduser instead.

RUN chown -R 1001:1001 /app
Changes the ownership of the /app directory and its contents to the user and group with IDs 1001. The -R flag makes the chown recursive, so all files and directories within /app are affected.

USER 1001:1001
Switches the user context to myuser (UID 1001) and mygroup (GID 1001). All subsequent instructions, and the container's main process, run as this user.

Kubernetes interview

# Deployment specification
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3 # This defines the desired number of replicas for your application.
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2 # This is the new image that you want to update to.
        ports:
        - containerPort: 8080
        livenessProbe: # This defines a health check for your pod using an HTTP request.
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10 # This defines how long to wait before performing the first probe.
          periodSeconds: 10 # This defines how often to perform the probe.
          failureThreshold: 3 # This defines how many failures to tolerate before restarting the pod.
        readinessProbe: # This defines a readiness check for your pod using an HTTP request.
          httpGet:
            path: /readyz
            port: 8080
          initialDelaySeconds: 10 # This defines how long to wait before performing the first probe.
          periodSeconds: 10 # This defines how often to perform the probe.
          successThreshold: 2 # This defines how many successes to require before marking the pod as ready.
      serviceAccountName: my-app-sa # This defines the service account that the pod will use to access the Kubernetes API server.
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2 # This means that up to 2 pods can be unavailable during the update process.
      maxSurge: 3 # This means that up to 3 more pods than the desired number can be created during the update process.

---
# Service specification
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app # This matches the label of the pods that are part of the service.
  ports:
  - protocol: TCP
    port: 80 # This is the port that the service will expose externally.
    targetPort: 8080 # This is the port that the pods will listen on internally.
  type: LoadBalancer # This means that the service will be exposed externally using a cloud provider's load balancer.

---
# Ingress specification
apiVersion: networking.k8s.io/v1 # networking.k8s.io/v1beta1 was removed in Kubernetes 1.22.
kind: Ingress
metadata:
  name: my-app-ingress
spec:
  rules:
  - host: my-app.example.com # This is the host name that will be used to access the service from outside the cluster.
    http:
      paths:
      - path: / # This is the path that will be used to access the service from outside the cluster.
        pathType: Prefix # Required in networking.k8s.io/v1.
        backend:
          service:
            name: my-app-service # This refers to the name of the service that will handle the traffic.
            port:
              number: 80 # This refers to the port of the service that will handle the traffic.

Kubeconfig

how it works

How to handle secrets? link
How to handle auto scaling in k8s: https://medium.com/@extio/the-power-of-kubernetes-auto-scaling-scaling-your-applications-with-ease-cb232391400c
kubeconfig file link
k8s in jenkins pipeline link

Imp: link1 link2

https://yuminlee2.medium.com/kubernetes-kubeconfig-file-4aabe3b04ade

https://medium.com/@vinoji2005/using-terraform-with-kubernetes-a-comprehensive-guide-237f6bbb0586

Terraform intro

# Configure the Azure provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0.2"
    }
  }

  required_version = ">= 1.1.0"
}

provider "azurerm" {
  features {} # subscription details go here
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-aks-test-001"
  location = "australiaeast"
}

resource "azurerm_kubernetes_cluster" "default" {
  name = "aks-test-001"
  # Reference the resource group so Terraform creates it before the cluster.
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "dns-k8s-test"
  kubernetes_version  = "1.27.9"

  default_node_pool {
    name            = "testnodepool"
    node_count      = 2
    vm_size         = "Standard_D2_v2"
    os_disk_size_gb = 30
  }

  # clientId and clientSecret must be declared in variables.tf.
  service_principal {
    client_id     = var.clientId
    client_secret = var.clientSecret
  }

  role_based_access_control_enabled = true

  tags = {
    environment = "test"
  }
}

Basic Structure (for smaller projects):

├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars

main.tf: This is the primary file where you define your resources.
variables.tf: This file contains the definitions of your Terraform variables.
outputs.tf: This file defines the output values that Terraform will display after applying your configuration.
terraform.tfvars: This file stores the actual values for your Terraform variables. It should not be committed to version control if it contains sensitive data.

How does Jenkins connect with Terraform and store secrets? link

IAC IAC link

https://iamabhi67.medium.com/deploy-azure-kubernetes-service-aks-cluster-with-terraform-b31bf0bc480c

Elastic Search link link2

Elasticsearch is a document oriented database.

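Elasticsearch is based on Apache Lucene. It uses an inverted index: a mapping from each term to the list of documents that contain it (optionally with frequencies and positions), which is what makes full-text search fast.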

Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene. It allows you to store, search, and analyze large volumes of data quickly and in near real-time.
Elasticsearch uses JSON documents to store data, making it flexible and easy to work with.
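To make the inverted index idea concrete, here is a toy sketch. It is not how Lucene is implemented (Lucene adds analyzers, segment files, scoring and much more); the class name InvertedIndexSketch and the whitespace-style tokenisation are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> sorted ids of the documents containing it.
final class InvertedIndexSketch {

    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Split the text into lower-cased terms and record the document id under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up a single term; unknown terms return an empty set.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Kafka keeps order per partition");
        index.add(2, "Redis cache and Kafka");
        System.out.println(index.search("kafka")); // prints [1, 2]
        System.out.println(index.search("redis")); // prints [2]
    }
}
```

A search for a term is a single map lookup instead of a scan over every document, which is the property Elasticsearch relies on.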

Observability in Microservices

Traces: Records the flow of a request as it traverses through different services and components, providing visibility into complex service dependencies.
Metrics: Measures the performance of applications by collecting data such as request counts, CPU usage, and response times.
Logs: Collects log data that records events, errors, and warnings, providing contextual information for troubleshooting.

Deep dive

OpenTelemetry link

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project designed to create a standardized way to collect telemetry data (i.e., traces, metrics, and logs) from various applications and programming languages. It unifies different observability signals under a single framework, making it easier to gain a complete view of application performance and identify issues across distributed systems.

OpenTelemetry (OTel) is an open-source framework that collects and exports telemetry data from applications and services. It's used to monitor and analyze software performance and behavior. The OpenTelemetry collector is in charge of collecting, processing, and exporting collected telemetry data. It is vendor-agnostic(open source) and provides capabilities for cost reduction and easy management.

Micrometer link

In contrast, Micrometer is a vendor-neutral application metrics facade for the JVM, primarily used to instrument dimensional metrics (metrics with tags) in Java applications. Vendor-neutral here means it works with a variety of monitoring, logging, or observability systems without being locked into a specific vendor's ecosystem. It provides a consistent abstraction for capturing metrics and is the default metrics library in Spring Boot, making it easy to collect application-level data and export it to popular monitoring systems like Prometheus, Datadog, and more.

Micrometer Tracing is a facade over the Brave and OpenTelemetry tracers that gives insight into complex distributed systems at the level of an individual user request. Identify the root cause of issues faster with distributed tracing.

You can think of Micrometer as a specialized tool focused solely on metrics collection for Java-based applications, while OpenTelemetry is a broader observability framework that goes beyond just metrics and supports traces, metrics, and logs for applications written in various programming languages.

Diff between opentelemetry vs micrometer

Explain difference between micrometer and actuator in spring boot

Actuator is the foundation, Micrometer is the instrument: Actuator provides the framework for exposing metrics, while Micrometer is the tool you use to collect and structure those metrics.
Actuator exposes Micrometer metrics: When you use Micrometer in your Spring Boot application, Actuator automatically configures it and exposes the metrics through its /actuator/metrics endpoint.
Actuator provides more than just metrics: Actuator offers a wider range of management and monitoring capabilities beyond just metrics, such as health checks, environment information, and more.

Explain how metrics can be collected from spring boot application to prometheus

Prometheus

Data pull and storage tool Architecture

The general term for collecting metrics from the targets using Prometheus is called scraping.

https://devopscube.com/prometheus-architecture/

Grafana

visualisation tool link

Grafana is a visualisation tool for metrics that works on time-series data. Grafana's strength lies in its ability to create rich dashboards and visualizations of data that changes over time. While Grafana supports a variety of data sources (Prometheus, InfluxDB, Elasticsearch, MySQL, PostgreSQL, etc.), the key is that these data sources must be able to provide time-series data. Even if a data source is not exclusively time-series (like some SQL databases), Grafana needs to query it in a way that returns data points with timestamps.

Kibana

Kibana is designed to work primarily with document-based data, specifically the kind of data stored in Elasticsearch. Kibana needs data that is indexed in Elasticsearch, and Elasticsearch stores data as JSON documents.
While time-series data can be represented in this format (and is often used with Kibana), Kibana itself is fundamentally document-oriented, not exclusively time-series.

Note:

Prometheus is used along with Grafana for health monitoring and metrics. Prometheus is a time-series database. It normally pulls (scrapes) data, so the application or node must expose a metrics endpoint such as /metrics; pushing is possible (via the Pushgateway) but is not the usual setup. Spring Boot exposes the endpoint through Actuator (/actuator/prometheus), and Micrometer makes the data available in Prometheus format. Micrometer is a facade for collecting metrics data. PromQL is the Prometheus query language.
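What a scrape returns can be sketched in a few lines. This is not Micrometer or the Prometheus client library; the class MetricsEndpointSketch and its methods are hypothetical, and only plain counters without labels are covered. It shows the text exposition format: a "# TYPE" line followed by "name value" lines, which is roughly what a /metrics endpoint serves.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the body a metrics endpoint returns when Prometheus scrapes it.
final class MetricsEndpointSketch {

    // Insertion-ordered so the output is stable; not safe for concurrent registration (demo only).
    private final Map<String, AtomicLong> counters = new LinkedHashMap<>();

    void increment(String name) {
        counters.computeIfAbsent(name, n -> new AtomicLong()).incrementAndGet();
    }

    // Render all counters in the Prometheus text exposition format.
    String scrape() {
        StringBuilder sb = new StringBuilder();
        counters.forEach((name, value) -> {
            sb.append("# TYPE ").append(name).append(" counter\n");
            sb.append(name).append(' ').append(value.get()).append('\n');
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        MetricsEndpointSketch metrics = new MetricsEndpointSketch();
        metrics.increment("http_requests_total");
        metrics.increment("http_requests_total");
        System.out.print(metrics.scrape());
    }
}
```

In a real Spring Boot service this text is produced by Micrometer's Prometheus registry and served by Actuator, so the application code only increments meters.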

Elasticsearch is a distributed, document-oriented store for logs and other types of data. Data needs to be pushed to it; Kibana then queries Elasticsearch to visualise it.

Micrometer with grafana and prometheus : link
Micrometer for tracing(Facade for various services) link
There are various other tools for observability, such as Datadog, Splunk, New Relic, AppDynamics, etc.

Cross Cutting concerns link

The microservices chassis is a set of frameworks that address numerous cross-cutting concerns such as

Externalized configuration
Health checks
Application metrics
Service Discovery
Circuit breakers
Distributed tracing
Exception tracking

Load testing

Jmeter (load testing)
https://blog.bigoodyssey.com/rest-api-load-testing-with-apache-jmeter-a4d25ea2b7b6

K6 (Load testing)
https://blog.stackademic.com/optimizing-api-performance-through-k6-load-testing-b38cf1ff457c

Cypress or Enzyme(UI testing)

Testing framework
https://medium.com/simform-engineering/testing-spring-boot-applications-best-practices-and-frameworks-6294e1068516
https://medium.com/@nihatonder87/how-to-combine-testcontainers-rest-assured-and-wiremock-8e5cb3ede16e

Distributed transaction

link1 link2

Devops

link

How autoscaling is done in k8s

how to persist info in k8s

how secrets are stored in Iac

Difference between API gateway and Loadbalancer link
Horizontal vs Vertical scaling

Forward proxy vs reverse proxy

Reverse proxy vs load balancer vs API gateway? They can be used together, but each serves a different purpose.
ssh protocol or key

Port forwarding : remote vs local

Note:

Core Java : [link]
Spring boot framework: link
Serverless Application
System Design Interview question link
Coding round question link
Serverless vs Paas
Microservice chassis Pattern link
Blue green deployment
At-least-once delivery
Configure autoscale in k8s
Token based Arch

Saturday, December 7, 2024
