
Right Usage of Kafka: Patterns and Anti-Patterns

Tags: kafka, messaging, distributed-systems

May 6, 2026
Paal Gyula
gyula@pilab.hu

How to use Apache Kafka effectively — from partitioning strategies to consumer group design, and common mistakes to avoid.


🚀 Introduction

Kafka is a powerful distributed event streaming platform, but using it correctly requires understanding its core concepts deeply.

📋 Key Topics Covered

  • Topic design and partitioning strategies
  • Consumer groups and rebalancing
  • Exactly-once semantics
  • Schema Registry and Avro
  • Retention policies and compaction
  • Monitoring: lag, throughput, and errors
  • Common anti-patterns (using Kafka as a database, over-partitioning)
  • Kafka vs RabbitMQ vs Redis Streams

🏗️ Topic Design and Partitioning Strategies

🎯 Proper Topic Design

Topics should be designed around business domains or event types, not technical implementation details. Each topic represents a category of related events.

Best Practices:

  • Use descriptive, domain-specific names (user-events, order-updates, payment-transactions)
  • Separate concerns: different event types should go to different topics
  • Consider using naming conventions with prefixes/suffixes for environments (prod-user-events, staging-user-events)
  • Align topic boundaries with bounded contexts in DDD
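A naming convention like the one above is easiest to keep when it is enforced in code. The helper below is a minimal sketch of that idea; the specific environments and `<env>-<domain>-<event-type>` pattern are assumptions for this example, not a Kafka requirement.

```python
# Illustrative helper enforcing an environment-prefixed topic naming
# convention (names like "prod-user-events"). The allowed environments and
# the name pattern are assumptions for this sketch.
import re

VALID_ENVS = {"prod", "staging", "dev"}
TOPIC_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def topic_name(env: str, domain: str, event_type: str) -> str:
    """Build a topic name of the form '<env>-<domain>-<event-type>'."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env}")
    name = f"{env}-{domain}-{event_type}"
    if not TOPIC_RE.match(name):
        raise ValueError(f"invalid topic name: {name}")
    return name
```

Centralizing this in one function (or a topic-provisioning pipeline) prevents ad-hoc names from accumulating across teams.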

⚖️ Partitioning Strategy

Partitions enable parallelism and determine throughput capacity.

Guidelines:

  1. Start with enough partitions for peak throughput: Calculate based on expected producer/consumer throughput per partition
  2. Align with consumer group size: Max parallel consumers = number of partitions
  3. Consider key distribution: Choose partition keys that evenly distribute load
  4. Plan for growth: You can add partitions later, but doing so changes the key-to-partition mapping and breaks per-key ordering for existing keys, so provision for expected growth up front
  5. Monitor partition skew: Uneven distribution creates hot partitions

Anti-Pattern: Creating too many partitions (e.g., 100+ for low-throughput topics) increases overhead and recovery time.
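The key-distribution guideline above can be made concrete with a small simulation. Kafka's Java client hashes keys with murmur2; the sketch below substitutes CRC32 purely for a deterministic illustration of how a hot key creates a hot partition.

```python
# Sketch: how a partition key determines placement, and how to spot skew.
# CRC32 stands in for Kafka's murmur2 hash, purely for illustration.
from collections import Counter
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for murmur2(key) % num_partitions
    return zlib.crc32(key.encode()) % num_partitions

def skew(keys, num_partitions: int) -> float:
    """Max partition load divided by mean load; 1.0 means perfectly even."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    loads = [counts.get(p, 0) for p in range(num_partitions)]
    return max(loads) / (sum(loads) / num_partitions)

even_keys = [f"user-{i}" for i in range(10_000)]        # many distinct keys
hot_keys = ["tenant-big"] * 9_000 + even_keys[:1_000]   # one dominant key

even_skew = skew(even_keys, 6)   # close to 1.0: load spreads evenly
hot_skew = skew(hot_keys, 6)     # far above 1.0: one hot partition
```

Monitoring a ratio like this per topic is one way to catch hot partitions before they throttle a single consumer.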

👥 Consumer Groups and Rebalancing

🔄 How Consumer Groups Work

Consumer groups enable scalable consumption where multiple consumers share the workload of processing topic partitions.

Key Points:

  • Each partition is consumed by exactly one consumer in a group
  • Adding consumers up to partition count increases processing parallelism
  • Rebalancing occurs when group membership changes (consumers join/leave)
  • During rebalance, consumers temporarily stop processing

⚙️ Minimizing Rebalance Impact

  1. Use static membership (Kafka 2.3+): Assign persistent consumer IDs to reduce shuffling
  2. Optimize session.timeout.ms: Balance between failure detection and unnecessary rebalances
  3. Avoid frequent restarts: Each restart triggers a rebalance; prefer rolling updates, and pair them with static membership so restarts that complete within the session timeout avoid rebalancing entirely
  4. Use proper heartbeat settings: Ensure consumers can send heartbeats within session timeout
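The settings above map to a handful of consumer properties. The fragment below uses confluent-kafka / librdkafka property names; the broker address, group names, and timeout values are illustrative starting points, not universal recommendations.

```python
# Consumer settings that reduce rebalance churn (confluent-kafka /
# librdkafka property names). Values here are illustrative starting points.
consumer_config = {
    "bootstrap.servers": "localhost:9092",   # assumption: local broker
    "group.id": "order-processor",
    # Static membership (Kafka 2.3+): a restart that completes within
    # session.timeout.ms does not trigger a rebalance.
    "group.instance.id": "order-processor-1",
    "session.timeout.ms": 45_000,    # tolerate longer pauses before eviction
    "heartbeat.interval.ms": 3_000,  # keep well below session.timeout.ms
}
```

The usual rule of thumb is to keep `heartbeat.interval.ms` at no more than one third of `session.timeout.ms`, so several heartbeats can be missed before the broker evicts the consumer.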

🎯 Exactly-Once Semantics (EOS)

🔄 Achieving Exactly-Once Processing

Kafka provides exactly-once semantics through idempotent producers and transactional APIs.

Implementation Steps:

  1. Enable idempotent producers: Set enable.idempotence=true (the default since Kafka 3.0)
  2. Use transactions for multi-topic writes: Producer sends data to multiple topics atomically
  3. Consume with read_committed isolation level: Consumers only see committed transactions
  4. Design idempotent consumers: Handle duplicate messages gracefully as fallback

Note: EOS has performance implications due to additional coordination overhead.
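Step 4 above, the idempotent-consumer fallback, can be sketched without a broker. The message shape and the in-memory stores below are assumptions for this example; in production the processed-ID set would live in a durable store.

```python
# Minimal sketch of an idempotent consumer (step 4): even with EOS enabled
# on the Kafka side, handling redelivery gracefully is a useful safety net.

processed_ids: set[str] = set()   # in production: a durable store (DB, Redis)
balances: dict[str, int] = {}

def handle(message: dict) -> bool:
    """Apply a payment event at most once; return True if it was applied."""
    msg_id = message["id"]
    if msg_id in processed_ids:
        return False              # duplicate delivery: skip, don't re-apply
    account = message["account"]
    balances[account] = balances.get(account, 0) + message["amount"]
    processed_ids.add(msg_id)
    return True

handle({"id": "evt-1", "account": "a1", "amount": 100})
handle({"id": "evt-1", "account": "a1", "amount": 100})  # redelivered duplicate
```

Keyed on a stable event ID, this turns at-least-once delivery into effectively-once processing for state updates.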

📜 Schema Registry and Avro

📊 Why Use Schema Registry?

Schema Registry provides a centralized schema store and enforces compatibility rules.

Benefits:

  • Data governance: Prevent incompatible schema changes
  • Evolution safety: Backward/forward compatibility checks
  • Serialization efficiency: Avro is compact and fast
  • Documentation: Schemas serve as API contracts

🔄 Schema Compatibility Types

  1. BACKWARD: New schema can read old data (consumers can upgrade first)
  2. FORWARD: Old schema can read new data (producers can upgrade first)
  3. FULL: Both backward and forward compatible
  4. NONE: No compatibility checks

Best Practice: Use BACKWARD or FULL compatibility for most use cases.
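What BACKWARD compatibility buys you can be shown in plain Python: a consumer on the new schema can still read records written with the old one, because every added field has a default. Avro and Schema Registry enforce this rule formally; the dict-based "schemas" below are a simplification for illustration.

```python
# Plain-Python illustration of BACKWARD compatibility: old records remain
# readable under the new schema because the new field has a default.

old_record = {"user_id": "u1", "email": "a@b.c"}   # written before the change

new_schema_defaults = {
    "user_id": None,
    "email": None,
    "marketing_opt_in": False,   # field added in the new schema, with default
}

def read_with_new_schema(record: dict) -> dict:
    # Fields missing from the record fall back to their schema defaults
    return {field: record.get(field, default)
            for field, default in new_schema_defaults.items()}

decoded = read_with_new_schema(old_record)
```

Conversely, removing a field without a default would make old data unreadable, which is exactly the change a BACKWARD-compatible registry rejects.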

⏳ Retention Policies and Compaction

📦 Retention Policies

Control how long Kafka keeps data.

Types:

  • Time-based: log.retention.hours (default 168 hours = 7 days)
  • Size-based: log.retention.bytes per topic
  • Delete vs Compact: cleanup.policy (delete or compact)

🗜️ Log Compaction

Keeps only the latest value for each key, useful for:

  • Event sourcing: Rebuild state from events
  • Configuration snapshots: Latest config per service
  • Entity state: Current user profile, inventory counts

Compaction Trigger: The log cleaner runs when the fraction of uncleaned ("dirty") bytes in a log exceeds min.cleanable.dirty.ratio (default 0.5); the active segment is never compacted
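The effect of compaction is easy to model: given an append-only log of (key, value) records, only the latest value per key survives, and a tombstone (a null value) eventually deletes the key entirely. The records below are made up for illustration.

```python
# Toy model of log compaction: keep only the latest value per key;
# a tombstone (value=None) eventually removes the key altogether.

log = [
    ("user-1", {"name": "Ann"}),
    ("user-2", {"name": "Bob"}),
    ("user-1", {"name": "Ann", "plan": "pro"}),  # newer value for user-1
    ("user-2", None),                            # tombstone: delete user-2
]

def compact(records):
    latest = {}
    for key, value in records:
        latest[key] = value      # later records overwrite earlier ones
    # Drop tombstoned keys, as compaction eventually does
    return {k: v for k, v in latest.items() if v is not None}

state = compact(log)
```

This is why a compacted topic works as a changelog: replaying it from the beginning rebuilds the current state per key, not the full history.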

📊 Monitoring: Lag, Throughput, and Errors

📈 Key Metrics to Monitor

Consumer Lag:

  • Difference between current offset and end offset
  • Indicates if consumers can keep up with producers
  • Alert when lag grows steadily
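Per-partition lag is just the end offset minus the committed offset. The offsets below are hard-coded for illustration; in practice they come from the admin API or `kafka-consumer-groups.sh --describe`.

```python
# Consumer lag per partition = log end offset - committed consumer offset.
# Offsets here are hard-coded; in practice, query them from the cluster.

end_offsets = {0: 1500, 1: 1480, 2: 1530}   # log end offset per partition
committed   = {0: 1500, 1: 1200, 2: 1529}   # consumer group's position

lag = {p: end_offsets[p] - committed[p] for p in end_offsets}
total_lag = sum(lag.values())
# The trend matters more than any single snapshot: steadily growing lag
# means consumers cannot keep up with producers.
```

Note the uneven lag across partitions here: partition 1 lagging while the others are current is itself a signal, often of a hot partition or a stuck consumer.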

Throughput:

  • Messages/sec in/out
  • Bytes/sec in/out
  • Request rates

Error Rates:

  • Failed produce requests
  • Consumer exceptions
  • Connection failures

🛠️ Monitoring Tools

  • Built-in: JMX metrics, kafka-consumer-groups.sh tool
  • Open Source: Prometheus + Grafana, Confluent Control Center
  • Managed Services: Cloud provider monitoring integrations

⚠️ Common Anti-Patterns

❌ Using Kafka as a Database

Problem: Storing data indefinitely and expecting query capabilities.
Solution: Use Kafka for event streaming; move data to appropriate databases for querying.

❌ Over-Partitioning

Problem: Too many partitions increase overhead (metadata, file handles, recovery time).
Solution: Start with a reasonable partition count (e.g., 3-6 per broker) and scale based on throughput needs.

❌ Ignoring Message Ordering Guarantees

Problem: Assuming global ordering across partitions.
Solution: Use partitioning keys for ordering within key groups, and design consumers to handle out-of-order messages.

❌ Not Handling Poison Pills

Problem: Bad messages causing consumer crashes and infinite restart loops.
Solution: Implement dead letter queues, poison-pill handling, or skip mechanisms.
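The dead-letter-queue approach can be sketched as follows. The `process` function, message shapes, and the in-memory DLQ list are assumptions for this example; in a real consumer the DLQ would be another Kafka topic.

```python
# Sketch of poison-pill handling: a message that fails processing is routed
# to a dead letter queue instead of crashing the consumer into a restart loop.
import json

dead_letter_queue = []   # stands in for producing to a DLQ topic

def process(raw: bytes) -> dict:
    return json.loads(raw)   # a malformed payload raises here

def consume(raw: bytes):
    try:
        event = process(raw)
        # ... normal handling ...
        return event
    except Exception as exc:
        # Capture the payload and the failure reason, then move on
        dead_letter_queue.append({"payload": raw, "error": str(exc)})
        return None

consume(b'{"order_id": 1}')   # processed normally
consume(b'not-json{{')        # poison pill: lands in the DLQ, consumer survives
```

The important property is that the offset still advances past the bad message, so one unparseable record cannot stall the whole partition.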

⚔️ Kafka vs RabbitMQ vs Redis Streams

🆚 Feature Comparison

| Feature | Kafka | RabbitMQ | Redis Streams |
| --- | --- | --- | --- |
| Model | Log-based streaming | Traditional message queue | Log-based streaming |
| Throughput | Very high | Medium | High |
| Persistence | Disk-based (configurable) | Disk/memory | Memory (with AOF) |
| Ordering | Per-partition | Per-queue (FIFO) | Per-stream |
| Consumer model | Consumer groups | Competing consumers | Consumer groups |
| Retry/DLQ | Manual implementation | Built-in | Manual implementation |
| Best for | Event sourcing, high-throughput pipelines | Complex routing, task queues | Simple streaming, low latency |

🎯 When to Choose Each

Choose Kafka when:

  • Building event-driven architectures
  • Needing high throughput and durability
  • Implementing event sourcing or CQRS
  • Long-term data retention is needed

Choose RabbitMQ when:

  • Complex routing is required (topics, headers)
  • Need sophisticated queueing patterns
  • Lower latency is critical
  • Polyglot protocol support is important

Choose Redis Streams when:

  • Already using Redis in infrastructure
  • Need simple streaming with consumer groups
  • Can accept memory-limited durability
  • Low-latency processing is priority

🏁 Conclusion

Using Kafka effectively requires understanding its distributed nature and embracing event streaming principles. By following the patterns outlined—proper topic design, thoughtful partitioning, consumer group management, and avoiding common anti-patterns—you can build robust, scalable event-driven systems.

Remember that Kafka excels as a high-throughput, durable event log, not as a general-purpose database or task queue. Align your usage with its strengths, and you'll unlock powerful capabilities for real-time data processing and microservices communication.


❓ Frequently Asked Questions

Q: How many partitions should I start with for a new topic?

A: Start with 3-6 partitions per broker as a baseline, then scale based on your throughput requirements. Monitor consumer lag and adjust as needed. Remember that you can increase partitions later (decreasing requires recreating the topic), and that adding partitions changes the key-to-partition mapping.

Q: Should I use keys in my Kafka messages?

A: Use keys when you need ordering guarantees for related events (e.g., all events for a specific user_id should be processed in order). Without keys, the default partitioner spreads messages across partitions (round-robin, or sticky batching in newer clients), which maximizes throughput but provides no ordering guarantees.

Q: How do I handle schema evolution safely?

A: Use Schema Registry with appropriate compatibility settings (BACKWARD or FULL). Always test consumer/producer compatibility before deploying schema changes. Consider using a canary release strategy for schema updates.

Q: What's the difference between delete and compact cleanup policies?

A: delete removes old messages based on time or size thresholds. compact retains only the latest value for each message key, effectively creating a snapshot of the latest state for each key.

Q: How can I reduce consumer rebalances?

A: Use static membership (Kafka 2.3+), optimize session.timeout.ms, avoid frequent consumer restarts, and ensure your consumers send heartbeats within the session timeout period.

Q: Is exactly-once semantics worth the performance cost?

A: It depends on your use case. For financial transactions or other critical operations where duplicates could cause incorrect state, yes. For many event streaming use cases (metrics, logging, etc.), at-least-once with idempotent consumers is sufficient and performs better.

Q: When should I consider alternatives to Kafka?

A: Consider alternatives when you need: complex message routing (RabbitMQ), ultra-low latency with in-memory storage (Redis Streams), or simple task queues where Kafka's overhead isn't justified.

© 2011-2026 Progressive Innovation LAB. All Rights Reserved.