OS ip_set limit causes instances to become unreachable on public IP

This issue appears when the cloud contains a large number of security groups. The steps below apply to EDGE mode.

If instances that were previously reachable on their public IPs become unreachable (or a brand new instance started from a known-good image is unreachable), and you have confirmed that the instance's security group (SG) allows the connectivity (e.g., ICMP, TCP port 22), follow these steps to determine whether you are affected by this issue and how to work around it.

1) Identify the node controller (NC) that the instance is running on.

2) Log in to the NC where the instance in question is running.

3) Change to the Eucalyptus log directory:

cd /var/log/eucalyptus

4) Check the eucanetd.log file for error messages such as these:

[root@odc-f-21 eucalyptus]# grep ERROR eucanetd.log
2015-12-14 15:19:21 ERROR 000039528 main | could not complete update of security groups: check above log errors for details
2015-12-14 15:19:28 ERROR 000039528 ipt_system_restore | iptables-restore failed '//usr/lib/eucalyptus/euca_rootwrap iptables-restore -c < /tmp/ipt_file-GeWfXS': copying failed input file to '/tmp/euca_ipt_file_failed' for manual retry.
2015-12-14 15:19:28 ERROR 000039528 network_driver_implement | could not apply new rules: check above log errors for details
2015-12-14 15:19:28 ERROR 000039528 main | could not complete update of security groups: check above log errors for details
2015-12-14 15:19:34 ERROR 000039528 ipt_system_restore | iptables-restore failed '//usr/lib/eucalyptus/euca_rootwrap iptables-restore -c < /tmp/ipt_file-GeWfXS': copying failed input file to '/tmp/euca_ipt_file_failed' for manual retry.
2015-12-14 15:19:34 ERROR 000039528 network_driver_implement | could not apply new rules: check above log errors for details
2015-12-14 15:19:34 ERROR 000039528 main | could not complete update of security groups: check above log errors for details
2015-12-14 15:19:41 ERROR 000039528 ipt_system_restore | iptables-restore failed '//usr/lib/eucalyptus/euca_rootwrap iptables-restore -c < /tmp/ipt_file-GeWfXS': copying failed input file to '/tmp/euca_ipt_file_failed' for manual retry.
2015-12-14 15:19:41 ERROR 000039528 network_driver_implement | could not apply new rules: check above log errors for details
2015-12-14 15:19:41 ERROR 000039528 main | could not complete update of security groups: check above log errors for details

5) If you see any of the above error messages, in particular the ipt_system_restore "copying failed input file to" message, you have most likely hit an OS limit: the default maximum number of sets (max_sets) allowed by the ip_set kernel module.
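
The log check in steps 4 and 5 can be sketched as a small shell helper. This is only an illustration: the function name count_restore_failures is hypothetical, and the match string is taken from the sample log lines above, so it may need adjusting if your eucanetd version words the message differently.

```shell
# Hypothetical helper (not part of Eucalyptus): count the
# iptables-restore failures recorded in a eucanetd log file.
count_restore_failures() {
    # -F treats the pattern as a fixed string, so '|' is matched literally.
    # -c prints the number of matching lines (0 when there are none).
    grep -cF 'ipt_system_restore | iptables-restore failed' "$1"
}
```

Calling count_restore_failures /var/log/eucalyptus/eucanetd.log on the NC prints the number of failed restores; a non-zero count that keeps growing suggests the NC is likely affected.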

 

Follow the steps below to raise the ip_set kernel module's limit above its default. Note: You do not need to reboot the machine or terminate any instances.

a) Run the following commands as root on the NC (valid for both CentOS 6.x and RHEL 6.x):

service eucanetd stop
/usr/sbin/eucanetd -F
iptables -F
modprobe -r xt_set
modprobe -r ip_set_hash_net
modprobe -r ip_set
echo 'options ip_set max_sets=1024' > /etc/modprobe.d/ip_set.conf
modprobe ip_set
modprobe ip_set_hash_net
modprobe xt_set
service eucanetd start

b) Test the reachability of the instances running on this NC. They should now all be reachable on their public IPs (subject to their SG rules). If a public IP is still unreachable after you have applied the above OS config change, confirm whether the instance's private IP is reachable.

c) If you are still having an issue with reachability via the public IP, open a support ticket.
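
Before opening a ticket, it can help to confirm that the new limit from step a) actually took effect. The following is a minimal sketch, not an official procedure: the function names are hypothetical, and the sysfs path /sys/module/ip_set/parameters/max_sets is an assumption about how the ip_set module exposes its parameter on your kernel.

```shell
# Hypothetical helper: print the max_sets value configured in a
# modprobe options file (defaults to the file written in step a).
configured_max_sets() {
    sed -n 's/^options ip_set .*max_sets=\([0-9][0-9]*\).*/\1/p' \
        "${1:-/etc/modprobe.d/ip_set.conf}"
}

# Hypothetical helper: print the value reported by the loaded module.
# The default path is an assumption; pass another path if yours differs.
loaded_max_sets() {
    cat "${1:-/sys/module/ip_set/parameters/max_sets}"
}
```

Run as root on the NC, configured_max_sets and loaded_max_sets should print the same number (1024 with the commands above). If they differ, the module was probably not reloaded after the config file was written.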
