Nodes are not started in the simulation after 50 node are up


fluffy_router
Posts: 9
Joined: Thu Jan 07, 2021 1:17 am

Nodes are not started in the simulation after 50 node are up

Post by fluffy_router » Thu Jan 07, 2021 11:03 pm

I took the advice from the thread, so I will first describe my setup.
I have a fairly powerful server that I use for EVE-NG.

Hardware: 40 CPUs x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz, 512 GB RAM, 2 TB HDD RAID 1, 2 physical NICs
EVE-NG runs inside ESXi (v6.7); the VM has 32 vCPUs, 128 GB RAM and a 500 GB HDD allocated
VT-x: enabled
EVE-NG edition: Community

In most labs I can start nodes normally. I have several different images, and all of them are verified and boot properly. When I tested IOL images I was able to run 64 nodes in a simulation, which is the upper limit of the Community version. I use EVE-NG to prepare for CCIE Enterprise. I found a topology provided by Data Knox. It seems quite good to play with and I wanted to take advantage of it. It relies heavily on IOL, CSR and Viptela SD-WAN images.
The issue I'm facing is that when I try to start all nodes, the remaining nodes don't start once 51 of them are up. It is always 51 nodes. They might be IOL only or a mix of IOL and QEMU, but it is always 51 nodes according to the status.
I thought it might be related to some limit, so I tried shutting down a different node and starting one that hadn't started. This doesn't help, however: new nodes still don't start.

I checked the unl_wrapper log and the only thing I can see there is that the node has started:
INFO: starting /opt/unetlab/wrappers/iol_wrapper -T 0 -D 64 -t "Edge1" -F /opt/unetlab/addons/iol/bin/i86bi_LinuxL2-AdvEnterpriseK9-M_152_May_2018.bin -d 0 -e 1 -s 0 -- -n 1024 -q -m 1024 > /opt/unetlab/tmp/0/a507c74a-55bf-4f66-96b4-25fa3833704a/64/wrapper.txt 2>&1 &
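
To find the per-node output file for a node that did not start, the start line can be pulled apart programmatically. This is only a minimal sketch: the function name parse_start_line is made up for illustration, and reading -D as the device id and -t as the node name is an assumption based on how the values line up with the log, not on wrapper documentation.

```python
import shlex

# Sample line copied from the unl_wrapper log quoted in this post.
LOG_LINE = (
    'INFO: starting /opt/unetlab/wrappers/iol_wrapper -T 0 -D 64 -t "Edge1" '
    '-F /opt/unetlab/addons/iol/bin/i86bi_LinuxL2-AdvEnterpriseK9-M_152_May_2018.bin '
    '-d 0 -e 1 -s 0 -- -n 1024 -q -m 1024 '
    '> /opt/unetlab/tmp/0/a507c74a-55bf-4f66-96b4-25fa3833704a/64/wrapper.txt 2>&1 &'
)


def parse_start_line(line):
    """Return the wrapper binary, node name, device id and output file path."""
    command = line.split("starting ", 1)[1]
    # Everything before " > " is the command; the rest is shell redirection.
    argv_part, _, redirect = command.partition(" > ")
    argv = shlex.split(argv_part)

    # Assumption: -D carries the device id and -t the node name.
    opts = {}
    for flag, value in zip(argv[1:], argv[2:]):
        if flag in ("-D", "-t"):
            opts[flag] = value

    output = redirect.split()[0] if redirect else None
    return {
        "wrapper": argv[0],
        "name": opts.get("-t"),
        "device_id": opts.get("-D"),
        "output": output,
    }


info = parse_start_line(LOG_LINE)
print(info["name"], info["device_id"], info["output"])
```

The output path is the file worth reading next, since the wrapper redirects both stdout and stderr into it.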

I don't see anything suspicious in the other log files.

If I create a new lab and simply add multiple nodes, 64 nodes run without any issue.

My question is: how can I troubleshoot this further? Is there any sort of debug that might help me understand why the nodes don't start? Could it be some sort of limitation within the topology itself?

I'm attaching the topology, as it is free to use.

Re: Nodes are not started in the simulation after 50 node are up

Post by fluffy_router » Fri Jan 08, 2021 1:22 am

I did some additional testing and it confuses me even more. It seems like there is some sort of relationship between different hosts. When some hosts are running, others are not able to start.

Whenever I start any node I can see these messages in the log:
Jan 08 02:18:35 WARNING: Attribute ignored, invalid node_image (40006).
Jan 08 02:18:35 WARNING: Node has no valid image (40025).

Regardless of these log messages, some hosts start and some do not.
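
To see whether these warnings appear once per start attempt or once per broken node, the log lines can be tallied by warning code. A minimal sketch, assuming the line format shown above; count_warnings is a hypothetical helper, and the sample list simply reuses the two lines from this post.

```python
import re
from collections import Counter

# Matches "WARNING: <message> (<code>)." at the end of a log line.
WARNING_RE = re.compile(r"WARNING: (?P<msg>.+?) \((?P<code>\d+)\)\.\s*$")


def count_warnings(lines):
    """Count occurrences of each (code, message) pair in the given log lines."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("code"), match.group("msg"))] += 1
    return counts


sample = [
    "Jan 08 02:18:35 WARNING: Attribute ignored, invalid node_image (40006).",
    "Jan 08 02:18:35 WARNING: Node has no valid image (40025).",
]

for (code, msg), n in sorted(count_warnings(sample).items()):
    print(code, n, msg)
```

If the counts stay the same no matter which node is started, the warnings probably describe the lab file as a whole and not the node being started.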

Re: Nodes are not started in the simulation after 50 node are up

Post by fluffy_router » Fri Jan 08, 2021 1:33 am

I checked the wrapper.txt of a node that doesn't want to start. This is interesting: I can see some errors related to AF_UNIX. I'm not sure whether that's some sort of socket or something similar.

tail: /opt/unetlab/tmp/0/eb45225d-3a77-43c5-b50c-8e3301999781/64/wrapper.txt: file truncated
8/0 1:24:32.3 INF Tennant_id = 0
8/0 1:24:32.3 INF Device_id = 64
8/0 1:24:32.3 INF NETMAP file created.
8/0 1:24:32.3 INF TS configured.
8/0 1:24:32.3 INF TAP interface configured (s=8, n=vunl0_64_0).
8/0 1:24:32.3 INF TAP interface configured (s=10, n=vunl0_64_16).
8/0 1:24:32.3 INF TAP interface configured (s=12, n=vunl0_64_32).
8/0 1:24:32.3 INF TAP interface configured (s=14, n=vunl0_64_48).
8/0 1:24:32.3 INF Adding subprocess stdout descriptor (5).
8/0 1:24:32.3 INF Adding telnet socket descriptor (7).
8/0 1:24:32.3 INF Adding TAP interface descriptor (8).
8/0 1:24:32.3 INF Adding TAP interface descriptor (10).
8/0 1:24:32.3 INF Adding TAP interface descriptor (12).
8/0 1:24:32.3 INF Adding TAP interface descriptor (14).
8/0 1:24:35.3 ERR Error while connecting local AF_UNIX: No such file or directory (2)
8/0 1:24:35.3 ERR Cannot listen at AF_UNIX (0). ERR: Cannot open AF_UNIX sockets (2).
8/0 1:24:35.3 ERR Failed to create AF_UNIX socket file (2).
8/0 1:24:35.3 INF Caught SIGTERM, killing child.
8/0 1:24:35.3 INF Child is no more running.
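
For reference, AF_UNIX is the address family of UNIX domain sockets, which are addressed by a filesystem path. The sketch below only reproduces the same class of error on a Linux host: connecting to a socket path that does not exist fails with errno 2, "No such file or directory". It says nothing about which path the wrapper actually uses; try_connect and the socket name are made up for illustration.

```python
import errno
import os
import socket
import tempfile


def try_connect(path):
    """Try to connect to a UNIX domain socket; return 0 or the errno on failure."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


with tempfile.TemporaryDirectory() as tmp:
    # Nothing is listening here and the socket file was never created.
    missing = os.path.join(tmp, "no_such_socket")
    result = try_connect(missing)
    print(result, errno.errorcode.get(result), os.strerror(result))
```

So the message in wrapper.txt is consistent with the wrapper looking for a socket file that was never created, or that was already removed, by the time it tried to connect.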

Re: Nodes are not started in the simulation after 50 node are up

Post by fluffy_router » Fri Jan 08, 2021 1:34 am

Found a suggested solution at this link: https://rejohn.cuar.es/blog/2015/08/24/ ... on-issues/
The suggested fix is to grant 777 permissions on the /tmp directory. I tried this but I'm still facing the issue.
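
To confirm that the permission change really took effect, the mode bits of the directory can be read back. A minimal sketch using only the standard library; describe_mode is a hypothetical helper name, and the sticky bit is reported as well because /tmp normally carries it on Linux.

```python
import os
import stat


def describe_mode(path):
    """Return the permission bits of a path in a readable form."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return {
        "octal": oct(mode),
        "world_writable": bool(mode & stat.S_IWOTH),
        "sticky": bool(mode & stat.S_ISVTX),
    }


print(describe_mode("/tmp"))
```

If the directory is already world-writable, permissions on /tmp are unlikely to be what stops the socket from being created.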

Re: Nodes are not started in the simulation after 50 node are up

Post by fluffy_router » Fri Jan 08, 2021 1:41 am

Did some googling and wasn't able to find anything meaningful. Some suggest it might be related to the license, but I doubt it. I am able to run 64 IOL nodes without any issue if I create a new lab.
So, gurus, I'm running out of ideas. Any help is greatly appreciated.

Uldis (UD)
Posts: 5084
Joined: Wed Mar 15, 2017 4:44 pm
Location: London

Re: Nodes are not started in the simulation after 50 node are up

Post by Uldis (UD) » Fri Jan 08, 2021 9:00 am

fluffy_router wrote:
Fri Jan 08, 2021 1:22 am
I did some additional testing and it confuses me even more. It seems like there is some sort of relationship between different hosts. When some hosts are running, others are not able to start.

Whenever I start any node I can see these messages in the log:
Jan 08 02:18:35 WARNING: Attribute ignored, invalid node_image (40006).
Jan 08 02:18:35 WARNING: Node has no valid image (40025).

Regardless of these log messages, some hosts start and some do not.
This one means that you have not loaded an image for a particular node in the topology: an empty node, no image.

Re: Nodes are not started in the simulation after 50 node are up

Post by fluffy_router » Fri Jan 08, 2021 9:48 am

This one means that you have not loaded an image for a particular node in the topology: an empty node, no image.
OK, this is clear. However, it doesn't explain why the same node sometimes starts and sometimes doesn't. The 2 nodes without an image might be vEdge, but I don't care about them right now. What's really pissing me off is that not all IOL nodes are starting.

Re: Nodes are not started in the simulation after 50 node are up

Post by Uldis (UD) » Fri Jan 08, 2021 5:49 pm

EVE Community has a limitation of 53 IOL nodes per lab,
and Knox is using EVE Pro.
This one was amended for Community.
The original is for EVE Pro.

Re: Nodes are not started in the simulation after 50 node are up

Post by fluffy_router » Fri Jan 08, 2021 10:22 pm

Hey @Uldis. Thanks for looking into this.
EVE Community has limitation of 53 IOL nodes per lab
Could you please let me know where these numbers come from?

When I create a new lab and just add multiple IOL nodes to it, I'm consistently able to run a bit more than 60. I started it twice and got 61 and 62 nodes running. Maybe I'm missing something, but the only documented limitation I saw is 64 nodes per lab simulation.

Re: Nodes are not started in the simulation after 50 node are up

Post by Uldis (UD) » Sat Jan 09, 2021 2:50 am

Mistype, it was 63 on EVE Community, but for IOL nodes...
A mix with QEMU nodes must be more.
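
To compare a lab against the figure quoted above, the nodes in the lab file can be counted by type. This is a rough sketch under stated assumptions: that the .unl lab file is XML with node elements carrying type, name and image attributes, and that 63 is the IOL limit (the number comes from this reply and is not verified here). SAMPLE_UNL, IOL_LIMIT and summarize are illustrative names, and the sample is a made-up three-node lab, not the Data Knox topology.

```python
import xml.etree.ElementTree as ET
from collections import Counter

IOL_LIMIT = 63  # figure quoted in this thread for EVE Community; an assumption

# Hypothetical, heavily trimmed lab file used only to exercise the function.
SAMPLE_UNL = """<lab name="demo">
  <topology>
    <nodes>
      <node id="1" name="Edge1" type="iol" image="example.bin"/>
      <node id="2" name="CSR1" type="qemu" image="example"/>
      <node id="3" name="vEdge1" type="qemu" image=""/>
    </nodes>
  </topology>
</lab>
"""


def summarize(xml_text):
    """Return (node count per type, names of nodes with no image set)."""
    root = ET.fromstring(xml_text)
    nodes = list(root.iter("node"))
    by_type = Counter(node.get("type", "unknown") for node in nodes)
    no_image = [node.get("name") for node in nodes if not node.get("image")]
    return by_type, no_image


by_type, no_image = summarize(SAMPLE_UNL)
print(dict(by_type), "IOL over limit:", by_type["iol"] > IOL_LIMIT)
print("nodes without an image:", no_image)
```

The same pass also lists nodes with an empty image attribute, which is what the "Node has no valid image" warning earlier in the thread points to.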
