Wednesday, January 2, 2019

Tuning FreeBSD to serve 100-200 thousand connections

I also use nginx as a reverse proxy and load balancer in my project.

mbuf clusters

FreeBSD stores network data in mbuf clusters of 2 KB each, but only 1500 bytes of each cluster are used (the size of an Ethernet packet).

mbufs

For each mbuf cluster an “mbuf” structure is needed; it is 256 bytes in size and is used to organize mbuf clusters into chains. An mbuf can also hold about 100 bytes of useful data itself, but this capacity is not always used.
If the server has 1 GB of RAM or more, 25 thousand mbuf clusters are created by default, but in some cases this is not enough.
When no free mbuf clusters are available, FreeBSD enters the zonelimit state and stops answering network requests. You can see it as the `zoneli` state in the output of the `top` command.
The only way to fix this is to log in through the local console and reboot the system: it is impossible to kill a process in the `zoneli` state. The same problem exists on Linux 2.6.x, but there even the local console stops working in this state.
There is a patch that fixes the problem by returning the ENOBUFS error, which indicates that the zone limit has been reached, so the program can close some connections when it receives the error. Unfortunately, this patch has not been merged into FreeBSD yet.
The state of used mbuf clusters can be checked with the following command:
> netstat -m
1/1421/1425 mbufs in use (current/cache/total)
0/614/614/25600 mbuf clusters in use (current/cache/total/max)
You can increase the number of mbuf clusters with the kern.ipc.nmbclusters parameter:
> sysctl kern.ipc.nmbclusters=65536
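On versions where kern.ipc.nmbclusters is a runtime tunable, the setting can also be placed in /etc/sysctl.conf so it survives a reboot:
/etc/sysctl.conf:
kern.ipc.nmbclusters=65536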
In earlier versions of FreeBSD, mbuf clusters can be configured only at boot time:
/boot/loader.conf:
kern.ipc.nmbclusters=65536
25,000 mbuf clusters take about 50 MB of memory, 32,000 take 74 MB, and 65,000 take 144 MB (the counts are rounded up to powers of 2). 65,000 is a boundary value, and I cannot recommend exceeding it without first increasing the kernel address space.
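As a rough sanity check of these figures (a sketch assuming one 256-byte mbuf header per 2 KB cluster, not exact kernel accounting):
> echo $(( 25000 * (2048 + 256) / 1024 / 1024 ))MB
54MB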

Increasing the amount of memory available for kernel

The default kernel address space is 1 GB on the i386 architecture. To set it to 2 GB, add the following line to the kernel configuration file:
options KVA_PAGES=512
On amd64 the KVA is always 2 GB, and there is no way to increase it yet.
In addition to increasing the address space, you can raise the limit on physical memory available to the kernel (320 MB by default). Let's increase it to 1 GB:
/boot/loader.conf:
vm.kmem_size=1G
And reserve 275 MB of that space for mbuf clusters:
sysctl kern.ipc.nmbclusters=262144
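After a reboot you can verify that both values took effect; the output should look roughly like this:
> sysctl vm.kmem_size kern.ipc.nmbclusters
vm.kmem_size: 1073741824
kern.ipc.nmbclusters: 262144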

Establishing the connection. syncache and syncookies

Approximately 100 bytes of memory are needed for each unfinished (half-open) connection in the syncache.
By default, the syncache can hold information about roughly 15 thousand connections.
Syncache parameters can be seen with the `sysctl net.inet.tcp.syncache` command (read-only).
They can be changed only at boot time:
/boot/loader.conf:
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100
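The product hashsize × bucketlimit gives the rough upper bound on syncache entries; with the values above and ~100 bytes per entry, a quick estimate of the worst-case memory use:
> echo $(( 1024 * 100 )) entries, $(( 1024 * 100 * 100 / 1024 )) KB
102400 entries, 10000 KB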
When a new connection does not fit into the overflowing syncache, FreeBSD falls back to syncookies (TCP SYN cookies). This feature is enabled with:
sysctl net.inet.tcp.syncookies=1
The syncache population and syncookies statistics can be seen with the `netstat -s -p tcp` command.
When the connection is accepted, it enters the listen socket queue.
Queue statistics can be seen with the `netstat -Lan` command.
The queue length can be increased with the `sysctl kern.ipc.somaxconn=4096` command.
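The application must also request a longer queue when it calls listen(); in nginx this is the backlog parameter of the listen directive (4096 here only to match the sysctl above):
nginx.conf:
listen 80 default backlog=4096;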
For each accepted connection, FreeBSD creates socket structures.
To increase the limit on open sockets:
sysctl kern.ipc.maxsockets=204800
In earlier versions:
/boot/loader.conf:
kern.ipc.maxsockets=204800
The current state can be seen with the following command:
> vmstat -z
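For example, to watch only the socket- and mbuf-related zones:
> vmstat -z | grep -E 'socket|mbuf'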

tcb hash

If the server handles several tens of thousands of connections, the tcb hash allows quickly finding the target connection for each incoming TCP packet.
The tcb hash size is 512 by default.
The current size can be seen with:
sysctl net.inet.tcp.tcbhashsize
It can be changed at boot time:
/boot/loader.conf:
net.inet.tcp.tcbhashsize=4096

Files

Applications work not with sockets but with files, so each socket needs a file descriptor. To increase the limits:
sysctl kern.maxfiles=204800
sysctl kern.maxfilesperproc=200000
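The current system-wide usage can be compared against the limit at any time:
> sysctl kern.openfiles kern.maxfiles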
These options can be changed on a live system, but they will not affect already running processes. nginx has the ability to raise its open files limit on the fly:
nginx.conf:
worker_rlimit_nofile 200000;
events {
    worker_connections 200000;
}

receive buffers

Buffers for incoming data are 64 KB by default; if there are no large uploads, they can be decreased to 8 KB (this reduces the chance of overflow during a DDoS attack):
sysctl net.inet.tcp.recvspace=8192
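A worst-case estimate for 100 thousand connections shows the difference (a sketch; receive buffers are only fully used under load):
> echo $(( 100000 * 64 / 1024 ))MB vs $(( 100000 * 8 / 1024 ))MB
6250MB vs 781MB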
For nginx:
nginx.conf:
listen 80 default rcvbuf=8k;

send buffers

Buffers for outgoing data are 32 KB by default. If the data is usually small or mbuf clusters are scarce, the size may be decreased:
sysctl net.inet.tcp.sendspace=16384
For nginx:
nginx.conf:
listen 80 default sndbuf=16k;
If the server has written data to the socket but the client does not want to receive it, the data will stay in the kernel for several minutes even after the connection is closed by timeout. nginx has an option to discard all the data once the timeout expires:
nginx.conf:
reset_timedout_connections on;

sendfile

Another way to save mbuf clusters is sendfile. It uses the kernel file buffer memory to send data to the network interface without any intermediate buffers.
To enable in nginx:
nginx.conf:
sendfile on;
(you should explicitly switch it off if you are sending files from a partition mounted via smbfs or cifs – ReRePi)
On the i386 platform with 1 GB or more of memory, 6656 sendfile buffers are allocated, which is usually enough. On the amd64 platform a more optimal implementation is used, and sendfile buffers are not needed at all.
When the sendfile buffers are exhausted, the process gets stuck in the `sfbufa` state; things return to normal once the buffer limit is increased:
/boot/loader.conf:
kern.ipc.nsfbufs=10240
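Whether the limit is being hit can be checked in the `netstat -m` output, which on most versions reports denied and delayed sfbuf requests:
> netstat -m | grep sfbufs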

TIME_WAIT

After the connection is closed, the socket enters the TIME_WAIT state, where it lives for 60 seconds by default. This time can be changed with sysctl: the value is the maximum segment lifetime (MSL) in milliseconds, and TIME_WAIT lasts twice the MSL (2 × 30000 ms = 60 seconds):
sysctl net.inet.tcp.msl=30000
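For example, to shorten TIME_WAIT to 20 seconds (a trade-off: a shorter MSL means less protection against stray packets from old connections):
sysctl net.inet.tcp.msl=10000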

TCP/IP ports

Outgoing connections are bound to ports from the 49152-65535 range (about 16 thousand ports). It is better to widen it (1024-65535):
sysctl net.inet.ip.portrange.first=1024
sysctl net.inet.ip.portrange.last=65535
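A back-of-envelope calculation shows why the wider range matters: with TIME_WAIT at 60 seconds, the port range caps the sustained rate of outgoing connections to a single destination:
> echo $(( (65535 - 1024 + 1) / 60 ))
1075
So roughly a thousand new connections per second before ports run out.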
To allocate ports sequentially instead of randomly (so that a port cannot be reused for a new connection before its TIME_WAIT expires):
sysctl net.inet.ip.portrange.randomized=0
FreeBSD 6.2 added the ability to skip the TIME_WAIT state for localhost connections:
sysctl net.inet.tcp.nolocaltimewait=1
