Troubleshooting an IPFS Cluster
4 May 2022
IPFS Cluster
A pretty cool tool: cluster.ipfs.io/. It lets you create clusters of IPFS nodes that are orchestrated together to make sure content is available and replicated.
Note that it’s not a decentralized solution: there is no incentive to connect and replicate data on this cluster. It’s designed to be operated in a trusted environment (a single organization, for example), like Hadoop’s HDFS. I see it as a “big IPFS node”.
Troubleshooting
We had an issue where the IPFS cluster was misconfigured. Nodes couldn’t join the cluster, and a few GitHub Actions were failing because of this.
Internal Link: [[IPFS Distribution is Failing]]
The cluster was ipfs-websites.collab.ipfscluster.io. At the time of the error, I didn’t know the login / password combination needed to connect to the cluster directly. But just having the address is enough to identify a few errors.
There is a page, collab.ipfscluster.io, that contains a few instructions about this cluster and other Protocol Labs clusters.
They share a command to follow the pinset from the cluster:
ipfs-cluster-follow ipfs-websites run --init ipfs-websites.collab.ipfscluster.io
This command printed 0 Peers. A functioning cluster should show a few peers.
Pushing further, I tried to connect to the nodes:
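# Download the cluster configuration published over IPNS
curl https://ipfs.io/ipns/ipfs-websites.collab.ipfscluster.io -o cluster.json
# Try every peer address listed in the configuration
for peer in $(jq -r '.cluster.peer_addresses[]' cluster.json); do
  echo "$peer"
  # Keep only the hostname of the /dns4/<host>/tcp/<port>/... multiaddr
  addr=$(echo "$peer" | sed -E 's;/dns4/([^/]*)/tcp/([^/]*)/.*;\1;')
  ping -q -c 2 "$addr"
  ipfs swarm connect "$peer"
done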
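The sed expression in the loop is the least obvious part, so here is a minimal sketch of the same extraction as a standalone function. The name multiaddr_host and the example address are hypothetical; they are not part of any IPFS tool.

```shell
# multiaddr_host is a hypothetical helper name used for illustration only.
# It prints the hostname of a /dns4/<host>/tcp/<port>/... multiaddr.
# An address that does not match this shape is printed unchanged.
multiaddr_host() {
  printf '%s\n' "$1" | sed -E 's;^/dns4/([^/]*)/tcp/([^/]*)/.*;\1;'
}
```

For example, multiaddr_host /dns4/node.example.org/tcp/9096/p2p/PEERID prints node.example.org, which is the value handed to ping in the loop.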
ping would succeed, but the ipfs swarm connect call ended with the following error:
ipfs swarm connect NODE
Error: connect PEERID failure: failed to dial PEERID:
* [addr] dial addr: connect: connection refused
A refused connection is a sign that the cluster is down. With a running cluster, the call should instead end with something like:
ipfs swarm connect NODE
Error: connect PEERID failure: failed to dial PEER:
* [addr] failed to negotiate security protocol: incoming message was too large
This is the error you get when you are missing the correct password / auth configuration (which we expect here).
These were the signs that the cluster was misconfigured and had to be fixed.
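To tell the two outcomes apart inside a script, one option is to match on the error text. This is only a sketch: classify_dial_error is a hypothetical helper, and it assumes the messages keep the wording quoted above, which may differ between ipfs versions.

```shell
# classify_dial_error is a hypothetical helper, not an ipfs command.
# It only pattern-matches the two error messages quoted in this post.
classify_dial_error() {
  case "$1" in
    *"connection refused"*)
      echo "down" ;;     # nothing is listening: the cluster peer is down
    *"failed to negotiate security protocol"*)
      echo "auth" ;;     # peer is up, but we lack the cluster credentials
    *)
      echo "unknown" ;;  # any other output
  esac
}
```

Inside the loop it could be called as classify_dial_error "$(ipfs swarm connect "$peer" 2>&1)", so that the error text printed by the command is captured as well.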
Laurent Senta
I wrote software for large distributed systems, web applications, and even robots.
These days I focus on making developers, creators, and humans more productive through IPDX.co.