Imaging Worker does not work when "registration" net is isolated from "public" net

Problem:

There are three possible underlying problems:

  1. Incorrect (internal) DNS server IP supplied via user-data. The likely solution is to return the correct, external IP of the DNS server in the dns_server= value of the user-data. We suspect the same change should be made to the ELB service user-data.
  2. The Imaging service is internal and is therefore not accessible on the public network. The medium-term solution is to (temporarily) make the Imaging service one of the user-facing services. This may have security implications, which we are looking into. The long-term solution we briefly discussed is to switch the Imaging Service from its custom work-polling mechanism to a standard mechanism, namely Simple Workflow Service, once that is available.
  3. Download manifests use registration-network IPs, which are not reachable from the Imaging Worker. Because download manifests are also used by Node Controllers, which are on the registration network but not necessarily on the public network, the solution cannot simply switch to public-network IPs. Either the download manifest generation logic will have to take the consumer of the manifest into account or, better, download manifests should use DNS names that resolve to the correct IPs (on the registration network or the public network) depending on the source of the query. The latter solution, however, requires a functioning DNS server in Eucalyptus (not necessarily with delegation turned on), and I am not 100% sure it is enabled in all configurations; this still needs to be verified.
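To check whether an Imaging Worker is affected by the first problem, you can inspect the dns_server= value it actually received. The sketch below is a hypothetical diagnostic and is not part of Eucalyptus. It assumes the user-data contains a line of the form dns_server=<ip>. It also assumes the registration network is a /24, identified by its first three octets. The 192.168.249 prefix used in the usage line comes from the MANAGED-mode example in this article.

```shell
#!/bin/sh
# Hypothetical diagnostic for problem 1 (not a Eucalyptus tool).
# ASSUMPTIONS: user-data has a line "dns_server=<ip>", and the registration
# network is a /24 given by its first three octets (e.g. 192.168.249).

# Print the first dns_server= value found in user-data read from stdin,
# dropping any Windows-style carriage return.
get_dns_server() {
    sed -n 's/^dns_server=//p' | head -n 1 | tr -d '\r'
}

# Exit 0 if IP $1 lies in the /24 whose first three octets are $2.
on_registration_net() {
    case "$1" in
        "$2".*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, run this from inside the worker instance, assuming the EC2-compatible metadata service is reachable there: dns=$(curl -s http://169.254.169.254/latest/user-data | get_dns_server); on_registration_net "$dns" 192.168.249 && echo "dns_server $dns is on the registration network"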

Workaround:
 
In all three cases, the workaround that should be used at this time is as follows:
 
In EDGE networking mode:
 
On each NC (node controller), re-map the destination IPs as follows:

iptables -t nat -A EUCA_NAT_PRE_POSTUSERHOOK -d 2.0.0.1/32 -j DNAT --to-destination 5.0.0.1

In the above example, all FE (front end) services are on registration IP 2.0.0.1 and public IP 5.0.0.1, so only one rule is necessary. In environments with multiple OSG (Object Storage Gateway) end-points, add one rule per end-point, for example:

iptables -t nat -A EUCA_NAT_PRE_POSTUSERHOOK -d 192.168.248.25/32 -j DNAT --to-destination 173.205.188.41
iptables -t nat -A EUCA_NAT_PRE_POSTUSERHOOK -d 192.168.248.27/32 -j DNAT --to-destination 173.205.188.43
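When there are several end-points, a small script can apply the rules above. This is a sketch that rests on several assumptions:

- The PAIRS list holds the example addresses from this article and must be replaced with your own.
- The EUCA_NAT_PRE_POSTUSERHOOK chain already exists on the NC.
- The installed iptables supports -C (check).

By default the script does a dry run and only prints the commands. To apply them, run it as root with DRY_RUN=0. Because it checks for each rule with -C before appending, you can re-run it without stacking duplicate rules.

```shell
#!/bin/sh
# Sketch: apply the EDGE-mode DNAT workaround on one NC.
# ASSUMPTIONS: the chain below already exists, iptables supports -C, and
# PAIRS (registration-IP:public-IP) is replaced with your own end-points.
CHAIN=EUCA_NAT_PRE_POSTUSERHOOK
PAIRS="192.168.248.25:173.205.188.41 192.168.248.27:173.205.188.43"
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = really change iptables

# Print the rule specification that redirects traffic for $1 to $2.
dnat_spec() {
    printf '%s\n' "$CHAIN -d $1/32 -j DNAT --to-destination $2"
}

# Append the rule for $1 -> $2 unless an identical rule is already present.
add_dnat() {
    spec=$(dnat_spec "$1" "$2")
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "iptables -t nat -A $spec"
        return 0
    fi
    # $spec is left unquoted on purpose so it splits into separate arguments.
    iptables -t nat -C $spec 2>/dev/null || iptables -t nat -A $spec
}

for pair in $PAIRS; do
    add_dnat "${pair%%:*}" "${pair#*:}"   # split "reg:pub" at the colon
done
```

If you run it unchanged in dry-run mode, it prints the same two iptables commands as the multi-end-point example above.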

In MANAGED networking mode:

The workaround should be applied to the CC (this can be statically set in /etc/eucalyptus/iptables-preload). Example: 

-A PREROUTING -d 192.168.249.19/32 -j DNAT --to-destination 173.205.188.38

In the above example, the VMs cannot reach the registration network (192.168.249.0/24) but can reach the public-facing IPs (173.205.188.0/24). Because the OSG and CLC are on the same host in this example, only one rule is necessary.
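For MANAGED mode, the preload lines can be generated in a similar way. The sketch below prints lines in the form of the example above, one per registration-IP:public-IP argument. It does not edit /etc/eucalyptus/iptables-preload itself. It assumes, as the example suggests, that these lines belong in the nat-table section of that file, so review the output before adding it. The rule_is_live helper is a hypothetical convenience for confirming on the CC that a rule is active once the preload file has been picked up.

```shell
#!/bin/sh
# Sketch: print MANAGED-mode DNAT lines for /etc/eucalyptus/iptables-preload,
# one per "registration-IP:public-IP" argument.
# ASSUMPTION: the lines go into the nat-table section of that file.
preload_lines() {
    for pair in "$@"; do
        reg=${pair%%:*}   # text before the colon
        pub=${pair#*:}    # text after the colon
        printf '%s\n' "-A PREROUTING -d $reg/32 -j DNAT --to-destination $pub"
    done
}

# Hypothetical check, run on the CC: exit 0 if the DNAT rule for $1 -> $2
# is present in the live nat table (iptables -C fails if it is missing).
rule_is_live() {
    iptables -t nat -C PREROUTING -d "$1/32" -j DNAT --to-destination "$2"
}
```

For example, preload_lines 192.168.249.19:173.205.188.38 prints the single rule shown above.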

We are currently investigating long-term solutions to these issues. Progress can be tracked in the EUCA ticket:
 
https://eucalyptus.atlassian.net/browse/EUCA-9435
 
Keywords: dns, imaging, worker, registration, network, isolation
