Coding Fungus: November 2022

The following frameworks are available to implement the Saga orchestration pattern:

Camunda is a Java-based framework that supports the Business Process Model and Notation (BPMN) standard for workflow and process automation.
Apache Camel provides an implementation of the Saga EIP (Enterprise Integration Pattern), a way to define a series of related actions in a Camel route that should either all complete successfully or else be not executed or compensated.
IBM App Connect allows you to draw out a flow using various built-in adapters and configure its properties appropriately to create a Saga flow.

Apache Oozie
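
The compensation idea behind the Saga pattern can be sketched without any of these frameworks. The following is only a minimal illustration in plain Python, not how Camunda, Camel, or App Connect implement it; the step names (reserve_stock, charge_card, ship_order) and the run_saga helper are made up for the example. Each step pairs an action with a compensation, and when a step fails the orchestrator undoes the already-completed steps in reverse order.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()
            completed.append(step)
        except Exception:
            # Undo only what actually completed, newest first.
            for _, _, undo in reversed(completed):
                undo()
            return False
    return True


log: List[str] = []


def fail_payment() -> None:
    # Hypothetical failing step used to trigger compensation.
    raise RuntimeError("payment declined")


steps: List[Step] = [
    ("reserve_stock", lambda: log.append("reserve"), lambda: log.append("release")),
    ("charge_card", fail_payment, lambda: log.append("refund")),
    ("ship_order", lambda: log.append("ship"), lambda: log.append("cancel_shipment")),
]

saga_ok = run_saga(steps)
```

Here the second step fails, so only the first step's compensation runs and the third step is never attempted.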

API gateway: https://learn.microsoft.com/en-us/azure/architecture/microservices/design/gateway

https://konghq.com/learning-center/api-gateway/why-microservices-need-api-gateway

Nginx Ingress: https://kubernetes.github.io/ingress-nginx/user-guide/basic-usage/

Very Important Topic

Ingress vs. load balancer: https://www.baeldung.com/ops/kubernetes-ingress-vs-load-balancer

Reactive programming

https://dassum.medium.com/building-a-reactive-restful-web-service-using-spring-boot-and-postgres-c8e157dbc81d

Some of these terms have different meanings depending on whether you’re talking about relational versus NoSQL databases.

Replication:

In always-available relational environments, you want some way to synchronize your database instances so they’re as close to up to date with each other as possible. This synchronization scheme is called replication. You’ll want this for “hot failover” if your primary instance goes down, and you may have a stack of hosts that all replicate off a primary. Very performance-sensitive environments may also use “write masters” and “read slaves”.

Different relational DB worlds do replication differently; some directly send queries to replicas using network connections, others stream queries (or rows to be updated) as files that are “played”, etc. One concern in any replication stack is “replica lag”, which is something DBAs have to keep track of.
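
To make “replica lag” concrete, here is a toy sketch, assuming a primary that keeps an ordered log of writes and a replica that replays that log in batches. The Primary and Replica classes are hypothetical and do not correspond to any particular database; real systems usually measure lag in log positions, bytes, or seconds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Primary:
    log: List[Tuple[str, str]] = field(default_factory=list)  # ordered (key, value) writes
    data: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


@dataclass
class Replica:
    applied: int = 0  # number of log entries already replayed
    data: Dict[str, str] = field(default_factory=dict)

    def replay(self, primary: Primary, max_entries: int) -> None:
        """Apply up to max_entries pending writes from the primary's log."""
        pending = primary.log[self.applied:self.applied + max_entries]
        for key, value in pending:
            self.data[key] = value
            self.applied += 1

    def lag(self, primary: Primary) -> int:
        """Lag measured as the number of writes not yet replayed."""
        return len(primary.log) - self.applied


primary = Primary()
replica = Replica()
for i in range(5):
    primary.write(f"k{i}", f"v{i}")

replica.replay(primary, max_entries=3)
lag_after_partial = replica.lag(primary)

replica.replay(primary, max_entries=10)
lag_after_full = replica.lag(primary)
```

A read sent to the replica between the two replay calls would miss the last two writes, which is exactly the staleness DBAs watch for.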

NoSQL clusters also have a notion of replication, which is often similar in design to RAID disk arrays in that copies of the same data may be written to multiple nodes (closer to RAID mirroring than to striping).

Partitioning:

Partitioning is a term that has somewhat different meanings in relational versus NoSQL worlds. In relational worlds, partitioning is a storage-level concept that applies at the table level; if one has a very large table, it can be “partitioned” into smaller storage units using various types of partitioning rules, based on a user-specified partition key. Here’s a good discussion of this type of partitioning in MySQL: What is MySQL Partitioning?
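
As a rough illustration of range partitioning on a partition key, the sketch below routes rows to named partitions by year. The partition names and boundaries are invented for the example, and this is plain Python rather than any database's actual syntax.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical exclusive upper bounds for each partition, keyed on order year.
BOUNDS: List[int] = [2021, 2022, 2023]
NAMES: List[str] = ["p_before_2021", "p_2021", "p_2022", "p_2023_onward"]


def partition_for(year: int) -> str:
    """Return the name of the partition whose range contains the given year."""
    return NAMES[bisect_right(BOUNDS, year)]


partitions: Dict[str, List[dict]] = {name: [] for name in NAMES}
rows = [
    {"id": 1, "year": 2019},
    {"id": 2, "year": 2021},
    {"id": 3, "year": 2022},
    {"id": 4, "year": 2024},
]
for row in rows:
    partitions[partition_for(row["year"])].append(row)

# A query filtered on year only has to look at one partition.
rows_2022 = partitions[partition_for(2022)]
```

The payoff is in the last line: a query that filters on the partition key can skip every other partition instead of scanning the whole table.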

In most NoSQL worlds, partitioning describes the rules for allocating different pieces of data to different nodes, since in the vast majority of NoSQL DBs you don’t have all your data on every node. There is a notion of a partition key, which must be chosen with care and is a very important part of your overall database design. The partition key and your partitioning rules determine which nodes get a specific piece of data, and how requests for that data are routed if you ask a node that doesn’t have it.
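
The routing behavior described above can be sketched as follows. This is a simplified model, assuming a fixed set of three hypothetical nodes and a plain hash-modulo rule; real NoSQL systems typically use consistent hashing or token ranges so that adding a node does not reshuffle most keys.

```python
import hashlib
from typing import Dict, List, Optional

NODE_NAMES: List[str] = ["node-a", "node-b", "node-c"]  # hypothetical node names


def owner_of(partition_key: str) -> str:
    """Map a partition key to the single node that stores it."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return NODE_NAMES[int.from_bytes(digest[:8], "big") % len(NODE_NAMES)]


class Node:
    def __init__(self, name: str, cluster: Dict[str, "Node"]) -> None:
        self.name = name
        self.cluster = cluster
        self.store: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        owner = owner_of(key)
        if owner == self.name:
            self.store[key] = value
        else:
            self.cluster[owner].put(key, value)  # forward to the owning node

    def get(self, key: str) -> Optional[str]:
        owner = owner_of(key)
        if owner == self.name:
            return self.store.get(key)
        return self.cluster[owner].get(key)  # forward to the owning node


cluster: Dict[str, Node] = {}
for node_name in NODE_NAMES:
    cluster[node_name] = Node(node_name, cluster)

# The client may talk to any node; the request is routed to the owner.
cluster["node-a"].put("user:42", "Ada")
```

Whichever node receives the request, the value ends up on exactly one node and can be read back through any of them.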

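Note that the “P” in the CAP theorem stands for partition tolerance, which refers to network partitions (nodes losing contact with each other) rather than to this sense of data partitioning.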

The analog of the NoSQL sense of partitioning in distributed relational worlds is Sharding.

Sharding:

As mentioned above, Sharding is to distributed relational database environments as Partitioning is to NoSQL environments. The main difference is that most relational databases require you to have app-visible policies for sharding, and care must be taken with the shard key so you can make sure that joins can be done within the scope of your overall shard key (without needing to attempt extremely slow cross-database joins). So, picking the right shard key is pretty fundamental to the success of a distributed relational DB world.
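
Here is a small sketch of why the shard key matters for joins, assuming customers and orders are both sharded on customer_id. The shard count, table names, and helper functions are hypothetical, and each shard is modeled as an in-memory dictionary rather than a real database.

```python
from typing import Dict, List, Tuple

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(customer_id: int) -> int:
    """App-visible sharding policy: the shard key is customer_id."""
    return customer_id % NUM_SHARDS


# Each shard stands in for a separate database holding both tables.
shards: List[Dict[str, list]] = [
    {"customers": [], "orders": []} for _ in range(NUM_SHARDS)
]


def add_customer(customer_id: int, name: str) -> None:
    shards[shard_for(customer_id)]["customers"].append((customer_id, name))


def add_order(order_id: int, customer_id: int, total: float) -> None:
    # Orders follow their customer's shard, not their own order_id.
    shards[shard_for(customer_id)]["orders"].append((order_id, customer_id, total))


def orders_with_customer(customer_id: int) -> List[Tuple[str, int, float]]:
    """Join customers to orders inside a single shard."""
    shard = shards[shard_for(customer_id)]
    names = dict(shard["customers"])
    return [
        (names[cid], order_id, total)
        for order_id, cid, total in shard["orders"]
        if cid == customer_id
    ]


add_customer(7, "Ada")
add_customer(8, "Bob")
add_order(100, 7, 25.0)
add_order(101, 7, 10.0)
add_order(102, 8, 5.0)

ada_orders = orders_with_customer(7)
```

Had orders been sharded on order_id instead, the same join would have to visit every shard.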

“Resharding” and shard-splitting are tasks that DBAs and app developers may have to deal with occasionally.

Clustering:

In relational databases, clustering unfortunately has many vendor-specific meanings. Oracle has something called a “table cluster”, which has some storage optimizations, etc. PostgreSQL has a notion of CLUSTER where individual tables can be physically rebuilt using a specific cluster key for performance reasons.

That said, the most widely used notion of “clustering” in relational databases probably refers to a clustered index. This is also sometimes called a primary index, in that storage engines that support clustered indexes use one to organize the base table data around the cluster key (almost always the primary key or part of a composite PK) to maximize performance.
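
The effect of a clustered index can be approximated with a table that keeps its rows physically sorted by primary key. The ClusteredTable class below is a hypothetical toy, not how any storage engine is actually built (real engines typically use B-tree pages), but it shows why range scans on the cluster key are cheap: the matching rows are contiguous.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Row = Tuple[int, str]  # (primary key, payload)


class ClusteredTable:
    """Rows are stored in primary-key order, like a clustered index."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._rows: List[Row] = []

    def insert(self, pk: int, payload: str) -> None:
        # Find the sorted position so the physical order follows the key.
        pos = bisect_left(self._keys, pk)
        self._keys.insert(pos, pk)
        self._rows.insert(pos, (pk, payload))

    def range_scan(self, low: int, high: int) -> List[Row]:
        """Return rows with low <= pk <= high as one contiguous slice."""
        start = bisect_left(self._keys, low)
        end = bisect_right(self._keys, high)
        return self._rows[start:end]


table = ClusteredTable()
for pk, payload in [(30, "c"), (10, "a"), (40, "d"), (20, "b")]:
    table.insert(pk, payload)

middle_rows = table.range_scan(15, 35)
```

The insertion order does not matter; the rows come back in key order because that is how they are laid out.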

****************************************************************************

Replication - Copying an entire table or database onto multiple servers. Used for improving speed of access to reference records such as master data.

Partitioning - Splitting up a large monolithic database into multiple smaller databases based on data cohesion. Example - splitting a large ERP database into modular databases like accounts database, sales database, materials database etc.

Clustering - Using multiple application servers to access the same database. Used for computation-intensive, parallelized, analytical applications that work on non-volatile data.

Sharding - Splitting up a large table of data horizontally, i.e. row-wise. A table containing hundreds of millions of rows may be split into multiple tables containing 1 million rows each. Each of the tables resulting from the split is placed into a separate database/server. Sharding is done to spread load and improve access speed. Facebook/Twitter tables fit into this category.


Friday, November 11, 2022

Distributed Transaction in Microservices

Wednesday, November 9, 2022

sharding vs partitioning vs clustering vs replication

Technical Writing in field of research
