2025-03-07 Update: Xiaohongshu's recently published gateway practices follow many of the principles below. See [Xiaohongshu Launches Self-Developed Rust High-Performance Layer 7 Gateway ROFF](https://mp.weixin.qq.com/s/wnkYr4qKIFmh9E9H_XcwTA).

This article is divided into two parts:

1. [Performance Bottleneck Analysis Methods](#performance-bottleneck-analysis-methods)
2. [Hands-on Analysis of APISIX, Kong, and Nginx](#hands-on-analysis)

## Performance Bottleneck Analysis Methods

### Identifying Bottlenecks

Before tuning anything, locate the bottleneck with the three kinds of analysis below.

#### **Monitoring Metrics Analysis**

- Latency Metrics: Monitor total request latency and integration latency (the time spent waiting for the backend)
- Error Rate Metrics: Monitor 4XX and 5XX error rates to identify system failure points
- Throughput Metrics: Request count and requests per second (RPS)
- Cache Hit Metrics: Monitor CacheHitCount and CacheMissCount ratios
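
The metrics above can all be derived from per-request samples. The following is a minimal sketch, assuming each request is recorded as a hypothetical `RequestSample` (the class and field names are illustrative, not taken from any particular gateway) and that percentiles use the nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float  # total time observed at the gateway
    status: int        # HTTP status code returned to the client
    cache_hit: bool    # whether the response was served from cache


def percentile(values, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples, window_seconds):
    """Aggregate one monitoring window into the headline metrics."""
    total = len(samples)
    latencies = [s.latency_ms for s in samples]
    return {
        "rps": total / window_seconds,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "rate_4xx": sum(400 <= s.status < 500 for s in samples) / total,
        "rate_5xx": sum(500 <= s.status < 600 for s in samples) / total,
        "cache_hit_ratio": sum(s.cache_hit for s in samples) / total,
    }
```

Tracking a high percentile next to the median matters because a few slow requests can hide behind a healthy average.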

#### **System Resource Monitoring**

- CPU utilization analysis
- Memory consumption analysis
- Network I/O monitoring
- Disk I/O monitoring (especially for logs and cache)

#### **Request Flow Analysis**

- Request processing chain tracing
- Hot path identification
- Serial processing bottleneck discovery
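
Hot path identification can start from something as simple as summing per-stage timings across traces. The sketch below assumes each trace is a plain dict mapping a stage name to its duration in milliseconds; the stage names in the usage note are hypothetical.

```python
from collections import defaultdict


def stage_shares(traces):
    """Rank processing stages by their share of total time.

    traces: iterable of dicts mapping stage name -> duration in ms.
    Returns a list of (stage, share) pairs, largest share first.
    """
    totals = defaultdict(float)
    for trace in traces:
        for stage, duration_ms in trace.items():
            totals[stage] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = {stage: total / grand_total for stage, total in totals.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)
```

For example, `stage_shares([{"auth": 2, "route": 1, "upstream": 7}])` ranks `upstream` first, which suggests looking at the backend call before tuning the gateway itself.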

### Infrastructure-Level Optimization

#### Hardware Resource Optimization

- Increase CPU cores and memory capacity
- Use SSD storage to improve I/O speed
- Deploy high-performance network cards to reduce network latency
- Properly evaluate and adjust container resource limits (in Kubernetes environments)

#### Network Optimization

- Optimize network topology structure
- Reduce network hops
- Use CDN to accelerate static content
- Implement DNS optimization
- Deploy gateways and backend services in the same network zone

#### Operating System Tuning

- Adjust TCP/IP stack parameters (such as increasing connection queues)
- Optimize file descriptor limits
- Adjust kernel parameters (such as somaxconn, tcp_fin_timeout)
- Optimize client thread count in Linux environments
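
Before changing any of these settings, it helps to read the current values. A minimal sketch for Linux, assuming the standard `/proc/sys` layout; the helper names are illustrative, and a parameter that cannot be read is reported as `None`.

```python
import resource
from pathlib import Path


def read_kernel_param(name):
    """Read a sysctl value such as 'net.core.somaxconn' from /proc/sys."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None  # not Linux, not permitted, or no such parameter


def tuning_report():
    """Collect the limits most relevant to a connection-heavy gateway."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "net.core.somaxconn": read_kernel_param("net.core.somaxconn"),
        "net.ipv4.tcp_fin_timeout": read_kernel_param("net.ipv4.tcp_fin_timeout"),
    }
```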

### Gateway Configuration-Level Optimization

#### Connection Pool Optimization

- Configure appropriate database connection pool sizes (recommended to match expected client count) —— [Oracle Database Administrator Essentials](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Adjust backend service connection pool parameters
- Set connection timeout and retry strategies
- Implement connection keep-alive mechanisms
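
To make the pool-size and timeout parameters concrete, here is a minimal sketch of a bounded connection pool. `ConnectionPool` and its `factory` argument are hypothetical; a real gateway would rely on the pooling built into its HTTP or database client.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Bounded pool that reuses idle connections (most recently used first)."""

    def __init__(self, factory, max_size, acquire_timeout=1.0):
        self._factory = factory            # callable that opens a new connection
        self._timeout = acquire_timeout    # how long a caller waits for a slot
        self._idle = queue.LifoQueue()
        self._slots = threading.BoundedSemaphore(max_size)
        self._lock = threading.Lock()
        self.created = 0                   # number of connections ever opened

    @contextmanager
    def connection(self):
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("connection pool exhausted")
        try:
            try:
                conn = self._idle.get_nowait()
            except queue.Empty:
                conn = self._factory()
                with self._lock:
                    self.created += 1
            try:
                yield conn
            except Exception:
                # Assume the connection is broken: close it instead of reusing it.
                close = getattr(conn, "close", None)
                if close is not None:
                    close()
                raise
            else:
                self._idle.put(conn)
        finally:
            self._slots.release()
```

The acquire timeout turns pool exhaustion into a fast, visible error rather than an unbounded queue of waiting requests.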

#### HTTP Optimization

- Enable HTTP keep-alive to reuse TCP connections —— [Oracle Performance Practices](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Configure appropriate chunked encoding strategies
- Adjust HTTP header size limits
- Set reasonable request/response timeout values

#### Caching Strategy Optimization

- Implement response caching to reduce backend calls —— [Amazon API Optimization Practices](https://docs.aws.amazon.com/apigateway/latest/developerguide/rest-api-optimize.html)
- Configure cache key strategies and TTL (Time To Live)
- Implement multi-level caching strategies
- Optimize cache validation and invalidation strategies
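
Cache keys and TTLs interact: the key decides which requests share an entry, and the TTL decides how long that entry stays valid. A minimal in-memory sketch, assuming a hypothetical `cache_key` helper and `TTLCache` class (eviction by size is left out for brevity):

```python
import time


def cache_key(method, path, query=None, vary=None):
    """Build a key that ignores query-parameter order and header-name case."""
    query_part = tuple(sorted((query or {}).items()))
    vary_part = tuple(sorted((k.lower(), v) for k, v in (vary or {}).items()))
    return (method.upper(), path, query_part, vary_part)


class TTLCache:
    """Response cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if self._clock() < expires_at:
                self.hits += 1
                return value
            del self._entries[key]  # expired: invalidate lazily on read
        self.misses += 1
        return None
```

Normalizing the key is what keeps the hit rate up: `/users?a=1&b=2` and `/users?b=2&a=1` should map to the same entry.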

### Request Processing Optimization

#### Concurrent Processing Optimization

- Increase worker thread count
- Implement asynchronous processing patterns
- Optimize thread pool configuration
- Use event-driven architecture for high-concurrency request handling
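
The event-driven pattern can be sketched with `asyncio`: one thread keeps many requests in flight, and a semaphore caps concurrency so the backend is not flooded. `call_upstream` is a hypothetical stand-in for a non-blocking backend call.

```python
import asyncio


async def call_upstream(request_id, delay):
    """Stand-in for non-blocking I/O to a backend service."""
    await asyncio.sleep(delay)
    return f"response-{request_id}"


async def handle_all(request_ids, max_in_flight=100, delay=0.01):
    """Serve all requests concurrently, at most max_in_flight at a time."""
    limiter = asyncio.Semaphore(max_in_flight)

    async def handle(request_id):
        async with limiter:
            return await call_upstream(request_id, delay)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(handle(r) for r in request_ids))
```

Because the requests overlap while waiting on I/O, total time stays close to a single upstream call rather than the sum of all calls.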

#### Load Balancing Strategies

- Implement intelligent load balancing algorithms (weighted round-robin, least connections, etc.)
- Configure dynamic load balancing strategies
- Implement service health checks and automatic failover
- Adjust weights based on backend service capacity
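
As one concrete instance of weighted round-robin, the sketch below uses the "smooth" variant, which spreads picks of the heaviest backend instead of sending them in a burst. The class name and backend names are illustrative.

```python
class SmoothWeightedRoundRobin:
    """Pick backends in proportion to their weights, interleaving the picks."""

    def __init__(self, weights):
        self._weights = dict(weights)              # backend name -> weight
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def pick(self):
        # Every backend gains its weight; the leader is chosen and then
        # pays back the total, which pushes it behind the others.
        for name, weight in self._weights.items():
            self._current[name] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen
```

With weights `{"a": 5, "b": 1, "c": 1}`, seven consecutive picks yield `a a b a c a a`: backend `a` still receives five of seven requests, but never all five in a row.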

#### Routing Optimization

- Optimize routing lookup algorithms
- Implement routing caching
- Configure routing warm-up strategies
- Implement intelligent routing based on traffic characteristics
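
A routing lookup and its cache can be sketched together: a trie keyed by path segment gives longest-prefix matching, and an LRU cache short-circuits repeated paths. `PrefixRouter` is a hypothetical illustration, not the data structure used by any specific gateway.

```python
from functools import lru_cache

_LEAF = object()  # sentinel key; cannot collide with a path-segment string


def _segments(path):
    return [segment for segment in path.split("/") if segment]


class PrefixRouter:
    """Longest-prefix route matching on path segments, with an LRU cache."""

    def __init__(self, cache_size=1024):
        self._root = {}
        self.lookup = lru_cache(maxsize=cache_size)(self._match)

    def add(self, prefix, upstream):
        node = self._root
        for segment in _segments(prefix):
            node = node.setdefault(segment, {})
        node[_LEAF] = upstream
        self.lookup.cache_clear()  # route table changed: drop cached answers

    def _match(self, path):
        node = self._root
        best = node.get(_LEAF)
        for segment in _segments(path):
            node = node.get(segment)
            if node is None:
                break
            best = node.get(_LEAF, best)
        return best
```

Lookup cost grows with the depth of the path rather than the number of routes, and clearing the cache on every change avoids serving stale routes.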

### Data Processing Optimization

#### Message Processing Optimization

- Configure "spill to disk" strategies for large messages (e.g., set >4MB messages to write to disk) [3](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Implement request/response compression [1](https://docs.aws.amazon.com/apigateway/latest/developerguide/rest-api-optimize.html)
- Optimize serialization/deserialization processes
- Reduce unnecessary data transformations
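
Response compression is worth applying only when it pays off. A minimal sketch, assuming a hypothetical 1 KiB threshold and a simplified `Accept-Encoding` check that ignores quality values such as `q=0`:

```python
import gzip

MIN_COMPRESS_BYTES = 1024  # assumed threshold; tune it against real traffic


def maybe_compress(body, accept_encoding, min_bytes=MIN_COMPRESS_BYTES):
    """Return (payload, extra_headers), gzip-compressing only when it helps."""
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" not in offered or len(body) < min_bytes:
        return body, {}
    compressed = gzip.compress(body, compresslevel=6)
    if len(compressed) >= len(body):
        return body, {}  # already-compressed or incompressible data
    return compressed, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
```

Skipping small bodies avoids spending CPU where the gzip header overhead would cancel out the savings.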

#### Protocol Optimization

- Use HTTP/2 to reduce latency and improve parallel processing capability
- Use WebSocket and other long-connection protocols in appropriate scenarios
- Consider using gRPC to improve microservice communication efficiency
- Implement protocol upgrade strategies

### Monitoring and Logging Optimization

#### Log Optimization

- Reduce unnecessary trace information (set production environment to ERROR or FATAL level) [3](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Disable or reduce access log recording [3](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Disable transaction logs to reduce disk I/O burden [3](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Implement asynchronous log writing strategies
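
Asynchronous log writing can be sketched with the standard library's `QueueHandler` and `QueueListener`: request threads only enqueue records, and a background thread does the slow I/O. The function name is illustrative, and the ERROR default mirrors the level suggestion above.

```python
import logging
import logging.handlers
import queue


def build_async_logger(name, target_handler, level=logging.ERROR):
    """Return (logger, listener); call listener.stop() on shutdown to flush."""
    log_queue = queue.Queue(-1)  # unbounded, so logging never blocks a request
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(
        log_queue, target_handler, respect_handler_level=True
    )
    listener.start()  # background thread drains the queue into target_handler
    return logger, listener
```

An unbounded queue trades memory for latency; a bounded queue that drops records under pressure is the usual alternative when log volume can spike.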

#### Monitoring Optimization

- Disable real-time monitoring to reduce overhead [3](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Disable or reduce traffic monitoring [3](https://docs.oracle.com/cd/E55956_01/doc.11123/administrator_guide/content/admin_performance.html)
- Configure reasonable monitoring sampling rates
- Implement intelligent alerting thresholds to avoid excessive monitoring system load
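
A sampling rate is easiest to reason about when the decision is deterministic per request. The sketch below hashes a trace ID so that every component makes the same keep-or-drop choice; the function name and ID format are assumptions.

```python
import hashlib


def should_sample(trace_id, sample_rate):
    """Keep roughly sample_rate of traces, consistently for a given trace_id."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < sample_rate * 2**64
```

Hash-based sampling avoids the half-recorded traces that appear when each hop rolls its own random number.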

### Security Performance Optimization

#### Authentication Optimization

- Cache authentication results
- Use lightweight token validation
- Implement tiered authentication strategies
- Optimize JWT processing workflows
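
Caching authentication results means paying for signature verification once per token rather than once per request. The following is an illustrative sketch for HS256-signed tokens using only the standard library; `sign_hs256` and `CachingVerifier` are hypothetical names, the cache is unbounded, and a production gateway should use a vetted JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment):
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims, secret):
    """Create a compact HS256 token (used here only to produce test input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    mac = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256)
    return f"{header}.{payload}.{_b64url_encode(mac.digest())}"


class CachingVerifier:
    """Verify each token's signature once, then serve claims from memory."""

    def __init__(self, secret, clock=time.time):
        self._secret = secret
        self._clock = clock
        self._cache = {}            # token -> (claims, expiry timestamp)
        self.signature_checks = 0   # how many times the HMAC was computed

    def verify(self, token):
        now = self._clock()
        cached = self._cache.get(token)
        if cached is not None:
            claims, expires_at = cached
            if now < expires_at:
                return claims
            del self._cache[token]
            return None
        claims = self._verify_uncached(token)
        if claims is None or claims["exp"] <= now:
            return None
        self._cache[token] = (claims, claims["exp"])
        return claims

    def _verify_uncached(self, token):
        self.signature_checks += 1
        try:
            header_b64, payload_b64, signature_b64 = token.split(".")
            header = json.loads(_b64url_decode(header_b64))
            signature = _b64url_decode(signature_b64)
            signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
            expected = hmac.new(self._secret, signing_input, hashlib.sha256).digest()
            if not isinstance(header, dict) or header.get("alg") != "HS256":
                return None
            if not hmac.compare_digest(signature, expected):
                return None
            claims = json.loads(_b64url_decode(payload_b64))
        except ValueError:
            return None  # malformed token, bad base64, or invalid JSON
        if not isinstance(claims, dict):
            return None
        if not isinstance(claims.get("exp"), (int, float)):
            return None  # this sketch requires an expiry claim
        return claims
```

Bounding the cache entry by the token's own `exp` claim keeps a cached result from outliving the token.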

#### SSL/TLS Optimization

- Use session reuse to reduce handshake overhead
- Configure OCSP stapling
- Implement TLS connection pools
- Select efficient cipher suites

### Advanced Optimization Strategies

#### Circuit Breaking and Rate Limiting

- Implement request rate limiting to protect backend systems
- Configure circuit breakers to prevent system overload
- Implement backoff algorithms for retry handling
- Traffic shaping to optimize request distribution
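
Two of the techniques above fit in a few lines each. The sketch below shows a token bucket for rate limiting and exponential backoff with full jitter for retries; the class name, parameter names, and default values are assumptions chosen for illustration.

```python
import random
import time


class TokenBucket:
    """Allow bursts up to `burst` requests, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self._rate = rate_per_second
        self._capacity = burst
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller would typically answer with HTTP 429


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter backoff: random delay up to min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2 ** attempt)
```

Jitter matters as much as the exponent: without it, clients that failed together retry together and recreate the spike.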

#### Service Mesh Integration

- Integrate with service meshes like Istio/Envoy
- Delegate some gateway functions to sidecar proxies
- Implement collaborative strategies between gateways and service meshes
- Leverage traffic management capabilities provided by service meshes

#### Architectural Optimization

- Consider multi-tier gateway architecture (edge gateway + internal gateway)
- Implement edge computing to reduce latency
- Domain-driven API gateway design
- Gateway sharding based on traffic characteristics

By systematically analyzing performance bottlenecks in the above aspects and applying corresponding optimization techniques, gateway performance, throughput, and reliability can be significantly improved. The key is to select the most suitable combination of optimization strategies based on actual system characteristics and business requirements, and maintain high system performance through continuous monitoring and tuning.

## Hands-on Analysis

### Nginx Gateway

#### Main Performance Bottlenecks

1. **Connection Handling Capacity Bottlenecks**
   - Default worker process configuration may not match server CPU cores
   - Connection pool size limits cause connection rejection under high concurrency
   - File descriptor limits cause "too many open files" errors

2. **Configuration Complexity Bottlenecks**
   - Static configuration files require manual modification and reloading
   - Large-scale routing rules make configuration maintenance difficult
   - Configuration changes require reloading, potentially causing request interruptions

3. **SSL Processing Bottlenecks**
   - SSL handshakes are CPU-intensive, so CPU usage spikes under high concurrency
   - Inefficient key exchange algorithms
   - Improper session cache configuration causes repeated handshakes

#### Optimization Methods

1. **Worker Process and Connection Optimization**
   - Set `worker_processes` to match CPU core count
   - Increase `worker_connections` value (typically set to 4096 or higher)
   - Use `worker_cpu_affinity` to bind worker processes to specific CPU cores
   - Adjust system file descriptor limits (ulimit -n)

2. **Event Processing Optimization**
   - Enable `multi_accept`; `accept_mutex` is usually unnecessary on Linux kernels that support `EPOLLEXCLUSIVE` or when `reuseport` is used
   - Use `epoll` event handling model (on Linux systems)
   - Adjust `worker_aio_requests` to improve async I/O performance

3. **HTTP Optimization**
   - Configure `keepalive_timeout` and `keepalive_requests` parameters
   - Enable `sendfile`, `tcp_nopush`, and `tcp_nodelay` options
   - Implement `gzip` compression to reduce transmitted data
   - Tune `client_body_buffer_size` (the in-memory buffer for request bodies) and set `client_max_body_size` to cap request size

4. **SSL Performance Optimization**
   - Enable a shared session cache (for example `ssl_session_cache shared:SSL:10m`) to improve session reuse rate
   - Configure OCSP stapling to reduce handshake latency
   - Use ECC certificates to reduce computational overhead
   - Prioritize high-performance cipher suites (like AES-GCM)
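
For the worker and connection settings in the list above, a rough capacity estimate helps when choosing values. This is a back-of-the-envelope sketch, assuming that a proxied client holds two connections (one to the client, one to the upstream); the function names are illustrative.

```python
import os


def suggested_worker_processes():
    """One worker per CPU core, the same idea as `worker_processes auto`."""
    return os.cpu_count() or 1


def estimate_max_clients(worker_processes, worker_connections, reverse_proxy=True):
    """Upper bound on simultaneous clients for the given settings."""
    connections_per_client = 2 if reverse_proxy else 1
    return worker_processes * worker_connections // connections_per_client


def nofile_is_sufficient(worker_connections, nofile_limit):
    """Each connection needs a file descriptor, so the limit must cover them."""
    return nofile_limit >= worker_connections
```

For example, 8 workers with `worker_connections 4096` give at most 16384 proxied clients, and a per-process descriptor limit of 1024 would cut that down well before the configured value is reached.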

### Kong Gateway

A reliable and well-established gateway.

#### Main Performance Bottlenecks

1. **Database Dependency Bottlenecks**
   - PostgreSQL/Cassandra database queries become performance bottlenecks
   - Database pressure increases during configuration changes
   - Database consistency challenges in distributed deployments

2. **Lua Script Processing Bottlenecks**
   - Overly long plugin execution chains increase request latency [1](https://www.f5.com/company/blog/nginx/nginx-controller-api-management-module-vs-kong-performance-comparison)
   - Improper Lua VM memory usage causes performance degradation
   - JIT compilation limitations affect dynamic script performance

3. **JWT Validation Bottlenecks**
   - High JWT validation processing overhead, noticeable in high-percentile latency [2](https://www.f5.com/company/blog/nginx/benchmarking-api-management-solutions-nginx-kong-amazon-real-time-apis)
   - Kong's 99.99 percentile latency can reach 3 times that of NGINX [2](https://www.f5.com/company/blog/nginx/benchmarking-api-management-solutions-nginx-kong-amazon-real-time-apis)

#### Optimization Methods

1. **Database Optimization**
   - Use DB-less mode to reduce database dependencies
   - Increase database connection pool size (the `pg_pool_size` setting in kong.conf)
   - Implement database read-write separation (master-slave architecture)
   - Consider using declarative configuration instead of database storage

2. **Plugin Chain Optimization**
   - Enable only necessary plugins to reduce processing chain length
   - Adjust plugin execution order (lightweight, high-frequency plugins first)
   - Configure independent caches for plugins (like Redis cache for rate-limiting plugin)
   - Monitor and optimize long-running plugins

3. **Cache Optimization**
   - Configure `lua_shared_dict` cache size
   - Adjust plugin-level cache TTL
   - Use external Redis cache to improve hit rates
   - Configure entity cache to reduce database queries

4. **Connection Pool Optimization**
   - Adjust upstream_keepalive parameter (typically 100-200)
   - Increase nginx_upstream_keepalive_timeout value
   - Set reasonable nginx_upstream_keepalive_requests value
   - Increase nginx_http_client_body_buffer_size for large request bodies

### APISIX Gateway

#### Main Performance Bottlenecks

1. **etcd Dependency Bottlenecks**
   - etcd cluster stability affects gateway configuration propagation
   - Frequent configuration changes increase etcd pressure
   - etcd read-write latency affects dynamic routing updates

2. **Route Matching Bottlenecks**
   - Large numbers of fine-grained routes increase matching latency
   - Complex regex routes reduce matching efficiency
   - Untimely route cache updates cause routing errors

#### Optimization Methods

1. **etcd Optimization**
   - Build highly available etcd clusters
   - Optimize etcd configuration (like setting reasonable heartbeat intervals)
   - Implement etcd sharding to reduce single-node pressure
   - Increase the etcd `timeout` setting in `config.yaml` (`deployment.etcd.timeout` in APISIX 3.x; default 30 seconds)

2. **Route Optimization**
   - Use prefix matching instead of full regex
   - Increase route cache TTL
   - Reduce route rule complexity, split overly complex rules
   - Use domain or hostname pre-filtering

### Envoy Gateway

#### Main Performance Bottlenecks

1. **xDS Configuration Update Bottlenecks**
   - Dynamic configuration updates cause resource reallocation overhead
   - Control Plane communication latency affects configuration delivery
   - Large numbers of listener and cluster configurations cause high memory usage

2. **Filter Chain Processing Bottlenecks**
   - Long HTTP filter chains increase processing latency
   - Complex filter logic causes high CPU usage
   - Lua filters are less efficient than native filters

#### Optimization Methods

1. **xDS Configuration Optimization**
   - Implement incremental xDS to reduce configuration update overhead
   - Optimize Control Plane communication (use gRPC streams instead of polling)
   - Set reasonable configuration cache TTL
   - Use Aggregated Discovery Service (ADS) to ensure configuration consistency

2. **Filter Optimization**
   - Reduce filter chain length, keep only necessary filters
   - Prioritize native C++ filters over Lua or WASM
   - Adjust filter execution order (high-frequency filters first)
   - Enable statistical monitoring for critical filters
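
The idea behind incremental xDS, from the list above, is to send only what changed between two configuration snapshots. The sketch below illustrates that idea with plain dicts mapping a resource name to a version string; it is not Envoy's API, and the function name is hypothetical.

```python
def diff_resources(old, new):
    """Return (changed, removed) between two name -> version snapshots.

    changed: resources that are new or whose version differs.
    removed: sorted names that exist in `old` but not in `new`.
    """
    changed = {name: version for name, version in new.items()
               if old.get(name) != version}
    removed = sorted(name for name in old if name not in new)
    return changed, removed
```

With thousands of clusters and a single changed endpoint, the update shrinks from the whole snapshot to one entry, which is where the reduction in update overhead comes from.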

### Performance Comparison and Selection Recommendations for Different Gateways

#### Performance Comparison

- In standard API call tests, NGINX API Management Module performance can be 2x better than Kong [Data Support](https://www.f5.com/company/blog/nginx/nginx-controller-api-management-module-vs-kong-performance-comparison)
- In terms of latency, NGINX adds 20-30% less latency than Kong [Data Support](https://www.f5.com/company/blog/nginx/nginx-controller-api-management-module-vs-kong-performance-comparison)
- In CPU efficiency, NGINX is about 40% more efficient than Kong [Data Support](https://www.f5.com/company/blog/nginx/nginx-controller-api-management-module-vs-kong-performance-comparison)
- In JWT validation scenarios, NGINX can handle 2x more API calls than Kong [Data Support](https://www.f5.com/company/blog/nginx/benchmarking-api-management-solutions-nginx-kong-amazon-real-time-apis)

#### Selection Recommendations

1. **NGINX Suitable Scenarios**:
   - Stable API gateway needs with static routing configuration
   - Scenarios prioritizing performance and low latency
   - Lightweight gateway needs in resource-constrained environments
   - Primarily providing reverse proxy and load balancing functions

2. **Kong Suitable Scenarios**:
   - Rich API management features are required (authentication, rate limiting, transformation, etc.)
   - Teams prioritizing development convenience (RESTful API configuration)
   - Microservice architectures with dynamic routing needs
   - Some performance loss is acceptable in exchange for feature richness

3. **APISIX Suitable Scenarios**:
   - Teams pursuing balance between dynamic routing and high performance
   - Cloud-native architectures with service discovery integration needs
   - Scenarios requiring fine-grained traffic control

4. **Envoy Suitable Scenarios**:
   - Kubernetes/Istio service mesh infrastructure
   - Modern cloud architectures requiring advanced observability
   - DevOps teams pursuing programmability and extensibility

By understanding the performance bottleneck characteristics and optimization methods of different gateways, you can choose the most suitable gateway type based on your business characteristics and technology stack, and implement targeted performance optimization to achieve the best balance of gateway performance and functionality.

## Summary

Gateway performance optimization is systematic work that requires analysis and optimization from multiple levels. Through the analysis methods, optimization techniques, and code examples provided above, you can implement targeted optimization for different performance bottlenecks. The key points are:

1. Establish comprehensive performance monitoring systems to detect performance bottlenecks promptly [1](https://learn.microsoft.com/en-us/data-integration/gateway/service-gateway-performance)
2. Adopt multi-level caching strategies to reduce redundant calculations and network requests
3. Optimize connection pool management to improve connection reuse efficiency
4. Implement efficient load balancing algorithms for intelligent request distribution
5. Use regular benchmark testing to continuously evaluate and optimize performance

For gateways in cloud-native environments, you can also consider leveraging Kubernetes' auto-scaling capabilities to dynamically adjust gateway instance counts in response to traffic changes. 
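
The scaling decision can be sketched with the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`. The function name, replica bounds, and example metric (requests per second per pod) are assumptions; the 10% tolerance mirrors the autoscaler's documented default.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count a metric-driven autoscaler would aim for."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 gateway pods averaging 900 RPS each against a 600 RPS target scale out to 6, while a reading of 630 RPS stays within tolerance and changes nothing.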

As middleware developers, we analyze gateway performance bottlenecks and optimization techniques from a computer-systems perspective. Gateways, as key components in microservices architecture, directly impact the response capability and scalability of the entire system...