SYMPTOM
- You have an on-prem or hybrid cluster
- Your APIs are no longer responding
- The Mule EE log file (located at $MULE_HOME/logs/mule_ee.log) is full of entries like this:
ERROR 2022-05-23 13:07:51,583 [agw-policy-polling.01] [processor: ; event: ] com.mulesoft.mule.runtime.gw.deployment.runnable.ApisRunnable: Unexpected error occurred. Reason: com.hazelcast.partition.NoDataMemberInClusterException: Target of invocation cannot be found! Partition owner is null but partitions can't be assigned since all nodes in the cluster are lite members.
- The only way to recover is to restart the Mule runtime on every cluster node
- If you enable cluster verbose logging, you can see that the node is not listed as a member of the cluster. To enable it, follow the article: How to enable cluster verbose logging in Mule Runtime.
2022-05-24 13:54:54,021 DEBUG ? [hz.2.priority-generic-operation.thread-0] [10.100.100.77]:5701 [5953994] [3.12] Sending member list to the non-master nodes:
Members {size:1, ver:3} [
Member [10.100.102.41]:5701 - 808b7f8f-981d-4227-9349-ae4d509b56b7
]
In the example above, the log comes from cluster node A (10.100.100.77). The member list it sends contains only node B (10.100.102.41) and does not include node A itself.
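The check described above can be automated. The following is a minimal Python sketch, not a MuleSoft or Hazelcast tool: it assumes the verbose log lines have the same shape as the excerpt above (a "Sending member list" line carrying the local address, followed by "Member [ip]:port" lines and a closing "]"). The function name parse_member_list is hypothetical.

```python
import re

# Matches addresses written as [10.100.100.77]:5701 in the verbose log.
ADDRESS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]:(\d+)")


def parse_member_list(log_text):
    """Return (local_address, members) for the last member list in log_text.

    local_address is the address on the "Sending member list" line;
    members is the list of addresses on the "Member [...]" lines that follow.
    Assumption: the log format matches the excerpt shown in this article.
    """
    local = None
    members = []
    in_list = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if "Sending member list" in line:
            match = ADDRESS.search(line)
            local = f"{match.group(1)}:{match.group(2)}" if match else None
            members = []  # start a fresh list for this announcement
            in_list = True
        elif in_list and stripped.startswith("Member ["):
            match = ADDRESS.search(stripped)
            if match:
                members.append(f"{match.group(1)}:{match.group(2)}")
        elif in_list and stripped == "]":
            in_list = False
    return local, members


if __name__ == "__main__":
    import sys

    # Usage: python check_members.py /path/to/verbose.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        local, members = parse_member_list(handle.read())
    if local is not None and local not in members:
        print(f"Node {local} is missing from its own member list: {members}")
```

If the script reports that the local node is missing from its own member list, the node is showing the symptom described in this article.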
CAUSE
A known Hazelcast issue that allows a node to try to connect to itself.
SOLUTION
The Hazelcast bug is fixed in the latest version (5.1.x), but that version is not yet part of any MuleSoft product.
ALTERNATIVE SOLUTION
You need to add the following cluster properties to avoid this issue:
- mule.cluster.tcpinboundport=xxxx
- mule.cluster.tcpoutboundport=yyyy
Replace "xxxx" and "yyyy" with available ports for incoming and outgoing cluster connections, respectively.
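Before choosing values, it can help to confirm that the candidate ports are actually free on each node. The following is a minimal Python sketch under stated assumptions: the function name port_is_free and the candidate port numbers are hypothetical, and the check only tells you whether a TCP port can be bound locally at that moment. It does not verify firewall rules between the nodes.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True


if __name__ == "__main__":
    # Hypothetical candidates; replace with the ports you intend to use.
    for candidate in (5801, 5802):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```

Run the check on every cluster node while the Mule runtime is stopped, since a port that is free on one node may be taken on another.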
For information on how to set these properties, please check: