Node Management

Introduction

Nodes are the physical servers that provide compute, storage, and networking resources in your cloud region. As an MSP administrator, you manage the full lifecycle of these nodes through the Region Management > Nodes page.

Status lifecycle

A node passes through the following states during its lifecycle:

Candidate

A newly discovered server that has not yet joined the cluster.
Approved

A candidate that was accepted but has not finished activation.
Starting

The node is booting and initializing services.
Activating

The node is completing its activation process.
Active

The node is online and running workloads.
Failed

The node encountered an error and is not functioning correctly.
Fenced

The node was isolated from the cluster to protect data integrity.
Entering maintenance

Workloads are migrating off the node in preparation for maintenance.
In maintenance

The node is offline for servicing. No workloads run on it.
Unfencing

The node is rejoining the cluster after a fencing event.
Inadequate

The node has unsupported hardware and cannot join the cluster.
Rejected

An administrator declined this candidate node.
Out

The node has been removed from the cluster.
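The states above form a state machine. The following Python sketch records the transitions that the procedures in this guide describe (Accept, Reject, maintenance, Activate, and Remove). It is a hypothetical model for reasoning about the lifecycle, not the product's implementation. The Active to Failed and Active to Fenced transitions and the Fenced to Unfencing to Active path are assumptions inferred from the state descriptions. Inadequate, Rejected, and Out get no outgoing transitions because this guide describes none.

```python
from enum import Enum


class NodeStatus(Enum):
    """Lifecycle states as they appear in the Status column."""
    CANDIDATE = "Candidate"
    APPROVED = "Approved"
    STARTING = "Starting"
    ACTIVATING = "Activating"
    ACTIVE = "Active"
    FAILED = "Failed"
    FENCED = "Fenced"
    ENTERING_MAINTENANCE = "Entering maintenance"
    IN_MAINTENANCE = "In maintenance"
    UNFENCING = "Unfencing"
    INADEQUATE = "Inadequate"
    REJECTED = "Rejected"
    OUT = "Out"


S = NodeStatus

# Allowed next states (hypothetical model). Entries marked "assumed" are
# inferred from the state descriptions; the rest follow the procedures in
# this guide. States without an entry have no documented transition.
TRANSITIONS = {
    S.CANDIDATE: {S.APPROVED, S.REJECTED},                   # Accept / Reject
    S.APPROVED: {S.STARTING},                                # automatic
    S.STARTING: {S.ACTIVATING},                              # automatic
    S.ACTIVATING: {S.ACTIVE},                                # automatic
    S.ACTIVE: {S.ENTERING_MAINTENANCE, S.FAILED, S.FENCED},  # Failed/Fenced assumed
    S.FAILED: {S.ENTERING_MAINTENANCE},
    S.FENCED: {S.ENTERING_MAINTENANCE, S.UNFENCING},         # Unfencing assumed
    S.UNFENCING: {S.ACTIVE},                                 # assumed
    S.ENTERING_MAINTENANCE: {S.IN_MAINTENANCE},
    S.IN_MAINTENANCE: {S.ACTIVE, S.OUT},                     # Activate / Remove
}


def can_transition(current: NodeStatus, target: NodeStatus) -> bool:
    """Return True if the model allows moving from current to target."""
    return target in TRANSITIONS.get(current, set())
```

For example, can_transition(NodeStatus.ACTIVE, NodeStatus.OUT) returns False, which mirrors the rule that a node must be in maintenance mode before it can be removed.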

The Nodes page

The Nodes page shows all nodes and node candidates in a single table.

You can switch between two views:

List view

A table with sortable columns:

Name

The display name of the node.
Candidate

Indicates whether the node is still a candidate and has not yet joined the cluster.
IP

The management (access) IP address of the node.
CPU

The current CPU usage of the node.
vCPUs

The total number of virtual CPU cores available on the node.
Memory

The total physical memory installed on the node.
VM Memory (Total)

The total amount of memory allocated to virtual machines on this node.
VM Memory (Available)

The remaining memory available for additional virtual machines.
Network Rx

The amount of inbound network traffic received by the node.
Network Tx

The amount of outbound network traffic transmitted by the node.
Storage

The total storage capacity available on the node.
Uptime

The amount of time the node has been running since its last restart.
Status

The current lifecycle state of the node (for example, Active, In maintenance, or Failed).
Tags

Custom labels assigned to the node for identification and filtering.

CPU, Memory, and Network columns include sparkline charts that show recent usage trends.

Heatmap view

A visual map of nodes grouped and color-coded by resource or status.

You can group nodes by:

Status

Groups nodes according to their lifecycle state, such as Active or In maintenance.
CPU

Groups nodes based on CPU capacity or usage.
Memory

Groups nodes based on memory capacity or usage.
Storage

Groups nodes based on storage capacity.

You can color the heatmap by:

Status

Applies colors based on lifecycle state.
Used Memory

Colors nodes according to the amount of memory currently used.
Memory

Colors nodes based on total memory capacity.
Used CPU

Colors nodes according to current CPU utilization.
CPU

Colors nodes based on total CPU capacity.
Storage

Colors nodes based on storage capacity.
Throughput

Colors nodes according to network activity levels.
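Grouping and coloring are independent choices: the group setting decides which bucket a node lands in, and the color setting decides how each tile is shaded. The Python sketch below illustrates this with grouping by Status and coloring by Used Memory. The node record fields (name, status, used_gib, total_gib) and the color bands are hypothetical; the actual thresholds and palette used by the heatmap are not documented here.

```python
from collections import defaultdict

# Hypothetical "Used Memory" bands: (upper bound of used/total ratio, color).
BANDS = [(0.50, "green"), (0.80, "yellow"), (float("inf"), "red")]


def used_memory_color(used_gib: float, total_gib: float) -> str:
    """Map a node's used/total memory ratio to a color band."""
    ratio = used_gib / total_gib if total_gib > 0 else 0.0
    for upper, color in BANDS:
        if ratio < upper:
            return color
    return BANDS[-1][1]  # only reached for NaN ratios


def heatmap(nodes: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Group nodes by status and color each one by used memory."""
    groups = defaultdict(list)
    for node in nodes:
        color = used_memory_color(node["used_gib"], node["total_gib"])
        groups[node["status"]].append((node["name"], color))
    return dict(groups)
```

Switching the group setting to CPU or Storage would change only the grouping key, while the coloring function stays the same.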

Built-in filters

Use filters to narrow the table to specific node states.

Active

Shows only nodes that are currently online and running workloads.
Candidate

Shows only nodes that were discovered but have not yet joined the cluster.
Inadequate

Shows only nodes that do not meet hardware requirements.
In Maintenance

Shows only nodes that are currently in maintenance mode.

Node candidate management

A node candidate is a physical server that has booted, completed hardware discovery, and appeared in the system but has not yet joined the cluster.

Before a candidate can run workloads, you must add it, review its hardware, and join it to the cluster.

Add a candidate node

Add a candidate node when you have a new physical server that you want to register with the cluster. This operation tells the system the server’s network address and credentials so it can connect and inspect the hardware.

Prerequisites:

The physical server is powered on and reachable on the network.
You have the server’s access IP address, SSH username, and SSH password.
You have the MAC addresses of the server’s converged network interfaces.

Steps:

Go to Region Management > Nodes.
Select + Add Candidate in the top toolbar.
In the Add Candidate Node dialog, complete the Info step:
- Candidate Access IP: Enter the server’s IPv4 address.
- SSH User: Enter the SSH username.
- SSH Password: Enter the SSH password.
- Node Type: From the dropdown, select the node type:
  - Worker: Runs tenant workloads.
  - Control: Runs cluster management services.
Select Next to continue to the MAC Addresses step.
Enter one or more Converged MAC Addresses.

These identify the network interfaces the node will use for converged traffic.

Select Add to enter another Converged MAC Address.
To confirm adding the candidate node, select Finish.

The system displays “Candidate node creation is in progress.”

When complete, the system displays “Candidate node added successfully.”
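An incorrect access IP is one of the causes listed under "Failed to add candidate node" in Troubleshooting. The following Python sketch checks the access IP and converged MAC addresses locally before you enter them in the dialog, using only the standard library. It assumes 48-bit MAC addresses written with colons or hyphens and normalizes them to lowercase colon form; the formats that the Add Candidate Node dialog itself accepts are not documented here, and the function names are hypothetical.

```python
import ipaddress
import re

# Six hex octets with one consistent separator (":" or "-").
_MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$")


def normalize_mac(mac: str) -> str:
    """Return the MAC in lowercase colon form, or raise ValueError."""
    mac = mac.strip()
    if not _MAC_RE.match(mac):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return mac.replace("-", ":").lower()


def validate_candidate_inputs(access_ip: str, converged_macs: list[str]) -> tuple[str, list[str]]:
    """Validate the Info and MAC Addresses step values (hypothetical helper)."""
    # IPv4Address raises AddressValueError, a subclass of ValueError.
    ip = ipaddress.IPv4Address(access_ip.strip())
    if not converged_macs:
        raise ValueError("at least one converged MAC address is required")
    macs = [normalize_mac(m) for m in converged_macs]
    if len(set(macs)) != len(macs):
        raise ValueError("duplicate converged MAC address")
    return str(ip), macs
```

For example, validate_candidate_inputs(" 10.0.0.5 ", ["3C-FD-FE-AA-BB-01"]) returns ("10.0.0.5", ["3c:fd:fe:aa:bb:01"]), while an address such as 10.0.0.300 raises ValueError.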

Modify a candidate node

Modify a candidate node when you need to update SSH credentials or change the converged MAC addresses. You cannot change the access IP address after the candidate is created.

Steps:

Go to Region Management > Nodes.
In the nodes table, locate the candidate node.
In the top toolbar, select Modify Candidate.
In the Modify Candidate Node dialog, update:
- SSH User and SSH Password.
- Converged MAC Addresses (add or remove).
Candidate Access IP and Access MAC Addresses are read-only.
Select Save.

The system displays “Candidate node modification is in progress.”

When complete, the system displays “Candidate node modified successfully.”

Join a candidate node to the cluster

Join a candidate node when you are satisfied with its hardware configuration and want it to begin serving workloads. During this step, you choose the node’s role in the cluster.

Steps:

Go to Region Management > Nodes.
In the nodes table, locate the candidate node.
In the top toolbar, select Join.
In the Join Candidate Node dialog, review:
- IP: Displays the node’s access IP (read-only).
- Node Link Details:
  
  Expand sections to review interface details including Name, Address (MAC), Vendor, Duplex, Speed, Driver, Device Name, Carrier status (up/down), PCI Slot, Type, and Device ID.
In the Node Type dropdown, select a role:
- Control: Runs cluster management services.
- Worker: Runs tenant workloads.
Select Join and confirm.

The system displays “Candidate node join is in progress.”

When complete, the system displays “Candidate node joined successfully.”
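Before you select Join, it can help to confirm that every converged MAC address entered when the candidate was added matches an interface in Node Link Details and that its Carrier status is up. The following Python sketch performs that cross-check on link data you copy from the dialog. The record layout (name, mac, carrier) is a hypothetical representation of the Name, Address (MAC), and Carrier status fields, and treating a down carrier as a problem is an assumption.

```python
def mac_key(mac: str) -> str:
    """Compare MACs case-insensitively and ignore ':' versus '-'."""
    return mac.strip().lower().replace("-", ":")


def link_issues(converged_macs: list[str], links: list[dict]) -> list[str]:
    """Return one message per converged MAC that is missing or has no carrier."""
    by_mac = {mac_key(link["mac"]): link for link in links}
    issues = []
    for mac in converged_macs:
        link = by_mac.get(mac_key(mac))
        if link is None:
            issues.append(f"{mac}: no interface with this MAC address")
        elif link["carrier"] != "up":
            issues.append(f"{mac}: {link['name']} carrier is {link['carrier']}")
    return issues
```

An empty result means every converged MAC maps to an interface whose carrier is up; any message points at an interface to check before joining.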

Delete a candidate node

Delete a candidate node when the server is no longer needed or was added in error. This removes the candidate record from the system. It does not affect the physical server.

Steps:

Go to Region Management > Nodes.
Locate the candidate node.
In the top toolbar, select Delete Candidate.
In the Delete Node dialog, review the node’s IP address.
To confirm deletion, select Delete.

The system displays “Candidate node deleted successfully.”

Node management activities

The following operations cover the rest of the node lifecycle: accepting or rejecting discovered candidates, renaming, maintenance, removal, configuration, tagging, and alarms.

Accept (approve) a candidate node

When a node appears with Candidate status, you can accept it to allow it to proceed toward activation.

Steps:

Go to Region Management > Nodes.
Locate the node with Candidate status.
In the top toolbar, select Accept.

The node status changes to Approved, then proceeds through Starting and Activating automatically.

Reject a candidate node

Reject a candidate node when you do not want it to join the cluster. This marks the node as rejected without deleting its record.

Steps:

Go to Region Management > Nodes.
Locate the node with Candidate status.
Open the actions menu and select Reject.

The node status changes to Rejected.

Rename a node

Rename a node to give it a descriptive name or update its description. This does not affect the node’s hostname or functionality.

Steps:

Go to Region Management > Nodes.
Open the actions menu and select Rename. Alternatively, select the node name to open its detail page and rename the node from there.
In the Rename Node dialog, enter:
- Name: Required.
- Description: Optional.
Select OK.

Put a node into maintenance mode

Put a node into maintenance mode when you need to perform hardware repairs, firmware updates, or other servicing. The system migrates running workloads off the node before it enters maintenance.

Prerequisites:

The cluster has more than one node.
No VMs with GPU passthrough devices are running on the node.
The remaining nodes have enough free memory to receive the migrated VMs.

Steps:

Go to Region Management > Nodes.
Locate a node with Active, Failed, or Fenced status.
Open the actions menu and select Maintenance.
In the Node Maintenance dialog, review the node’s ID and Name.
Select OK to confirm.

The status changes to Entering maintenance while VMs migrate, then to In maintenance when complete.
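The following Python sketch checks the maintenance prerequisites for a single node before you select Maintenance. The Node and VM classes are hypothetical stand-ins for values you read from the Nodes table and the node's VMs tab. It assumes that running VMs without GPU passthrough are the ones that migrate and that only Active nodes receive them, and it uses first-fit decreasing placement to test whether they fit. The system's real placement logic may differ, so treat a pass as an estimate.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    memory_gib: float
    running: bool = True
    gpu_passthrough: bool = False


@dataclass
class Node:
    name: str
    status: str                     # value shown in the Status column
    vm_memory_available_gib: float  # "VM Memory (Available)" column
    vms: list[VM] = field(default_factory=list)


def maintenance_blockers(target: Node, cluster: list[Node]) -> list[str]:
    """Return reasons the target node cannot enter maintenance (empty if none)."""
    problems = []
    if len(cluster) < 2:
        problems.append("the cluster has only one node")
    if target.status not in {"Active", "Failed", "Fenced"}:
        problems.append(f"status {target.status!r} does not allow maintenance")
    gpu_vms = [vm.name for vm in target.vms if vm.running and vm.gpu_passthrough]
    if gpu_vms:
        problems.append("GPU passthrough VMs are running: " + ", ".join(gpu_vms))

    # Assumption: running non-GPU VMs migrate, and only Active nodes receive them.
    to_move = [vm for vm in target.vms if vm.running and not vm.gpu_passthrough]
    free = [n.vm_memory_available_gib for n in cluster
            if n is not target and n.status == "Active"]
    # First-fit decreasing: place the largest VMs first, each in the first node with room.
    for vm in sorted(to_move, key=lambda v: v.memory_gib, reverse=True):
        for i, room in enumerate(free):
            if vm.memory_gib <= room:
                free[i] = room - vm.memory_gib
                break
        else:
            problems.append(f"no remaining node has {vm.memory_gib} GiB free for {vm.name}")
    return problems
```

Checking placement VM by VM, rather than comparing total memory, catches the case where the remaining nodes have enough free memory in aggregate but no single node can hold a large VM.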

Force maintenance

Use force maintenance only as a last resort. It bypasses the normal workload migration process.

Severe warning: Forcing a node into maintenance mode prevents the system from completing some housekeeping actions and may result in full system failure and data loss.

Steps:

In the Node Maintenance dialog, select Force Maintenance.
Read the warning carefully.
Wait for the safety timer to complete.
Confirm the action.

The system displays: “Node [name] is being forcefully deactivated.”

Activate a node (exit maintenance mode)

Activate a node to return it to service after maintenance is complete.

Steps:

Go to Region Management > Nodes.
Locate the node with In maintenance status.
Open the actions menu and select Activate.

The node returns to Active status.

Remove a node from the cluster

Remove a node when you want to permanently take it out of the cluster. The node must be in maintenance mode before removal.

Prerequisites:

The node has In maintenance status.
The cluster has more than one node.

Steps:

Go to Region Management > Nodes.
Locate the node.
Open the actions menu and select Remove.
Review the node’s IP address.
Select Remove.

Caution

If the node has unhealthy (degraded) data pools, removal may result in data loss. Contact support before proceeding.

Configure a node (open web installer)

Use Configure to open the node’s built-in installer interface in a new browser tab. This is useful for initial hardware setup or advanced configuration.

Steps:

Go to Region Management > Nodes.
Locate the node.
Open the actions menu and select Configure.

A new tab opens at http://<node-ip>/installer.

Manage node tags

Tags help you organize and identify nodes by purpose, location, hardware type, or other categories.

Add a tag:

Go to Region Management > Nodes.
Select the node name to open the detail page.
Select Add tag.
Enter the tag name and confirm.

Remove a tag:

On the node detail page, locate the tag.
Select the remove action on the tag.

Create an alarm for a node

You can set up CloudWatch-style alarms to monitor node metrics and receive notifications when thresholds are exceeded.

Steps:

Go to Region Management > Nodes.
Locate the node.
Open the actions menu and select Create Alarm.
Complete the alarm creation form. See Creating an Alarm.
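In Amazon CloudWatch, an alarm enters the ALARM state when at least M of the last N datapoints breach the threshold (the "datapoints to alarm" and "evaluation periods" settings). Assuming node alarms here follow the same M-of-N rule, the following Python sketch shows how a series of CPU readings maps to alarm states. It ignores missing-data handling and the INSUFFICIENT_DATA state, and its parameter names mirror CloudWatch rather than the fields of the alarm creation form.

```python
from collections import deque
from typing import Iterable


def alarm_states(values: Iterable[float], threshold: float,
                 evaluation_periods: int, datapoints_to_alarm: int) -> list[str]:
    """Return 'ALARM' or 'OK' after each datapoint using an M-of-N rule.

    A datapoint breaches when it is greater than the threshold.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    window = deque(maxlen=evaluation_periods)  # last N breach flags
    states = []
    for value in values:
        window.append(value > threshold)
        states.append("ALARM" if sum(window) >= datapoints_to_alarm else "OK")
    return states
```

With a threshold of 90, three evaluation periods, and two datapoints to alarm, the readings [50, 95, 40, 92, 97, 30] produce ['OK', 'OK', 'OK', 'ALARM', 'ALARM', 'ALARM']: a single spike does not trigger the alarm, but two breaches within any three consecutive readings do.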

View node details

Select any node name in the table to open its detail page.

The detail page has the following tabs:

Overview tab

Displays the node name, description, status, IP address, uptime, memory, disk capacity, and CPU details.
Events tab

Shows a chronological log of events related to this node.
Node Links tab

Lists network interfaces with MAC address, admin state, operational state, MTU, vendor, model, and bond type. Expand rows to view cluster network interfaces with IP address, VLAN ID, and state.
Monitoring tab

Provides performance charts for CPU, memory, and network usage over time.
Disks tab

Displays physical disks attached to the node.
VMs tab

Lists virtual machines currently running on the node.
Services tab

Shows cluster services running on the node, including CPU and memory usage per service.
PCI Devices tab

Lists PCI passthrough devices including device name, PCI slot, vendor, device ID, enabled or blocked status, kernel driver, and associated VM if assigned.

Recommended best practices

Cluster sizing

Maintain at least two nodes in every cluster.
Keep enough spare memory so any single node can enter maintenance without capacity issues.

Before maintenance

Check the node’s VMs tab for running workloads.
Shut down or migrate GPU-attached VMs before maintenance.
Verify sufficient free memory on remaining nodes.

Monitoring

Use the Heatmap view to spot imbalances.
Review sparkline charts for sustained trends.
Set up alarms for critical metrics.
Monitor partition free space indicators such as /var/log, /mnt/data, and /mnt/containers.
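If you have shell access to a node (for example, over SSH), the following Python sketch reports how much free space remains on the filesystems that hold the partitions named above. It uses the standard-library shutil.disk_usage and skips paths that do not exist on the host. The 15% threshold is an arbitrary example, not a documented limit.

```python
import shutil

# Partitions named in this guide; adjust the list for your nodes.
PARTITIONS = ["/var/log", "/mnt/data", "/mnt/containers"]


def free_fractions(paths: list[str]) -> dict[str, float]:
    """Map each existing path to the free fraction of the filesystem holding it."""
    result = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except FileNotFoundError:
            continue  # path not present on this host
        if usage.total > 0:
            result[path] = usage.free / usage.total
    return result


def low_space(fractions: dict[str, float], min_free: float = 0.15) -> list[str]:
    """Return the paths whose free fraction is below min_free (example threshold)."""
    return sorted(path for path, frac in fractions.items() if frac < min_free)


if __name__ == "__main__":
    for path in low_space(free_fractions(PARTITIONS)):
        print(f"low free space on {path}")
```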

Node removal

Always put a node into maintenance mode before removal.
If degraded data pools are detected, contact support.

Network verification

Verify converged MAC addresses when adding candidates.
Review the Node Links tab after a node joins.

Troubleshooting

A candidate node does not appear in the table

Possible causes:

Hardware discovery is not complete.
The server is not reachable on the network.
An active table filter hides candidate nodes.

“Failed to add candidate node” error

Possible causes:

Candidate Access IP is incorrect or unreachable.
The SSH User or SSH Password is incorrect.
The SSH service is not running on the server.

Resolution: Verify IP and credentials. Confirm SSH connectivity.
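To confirm SSH connectivity from your workstation, the following Python sketch opens a TCP connection to the candidate's access IP and reads the identification line that an SSH server sends when a client connects (RFC 4253). A line that starts with SSH- shows that the address is reachable and SSH is running; it does not test the username or password. The sketch assumes port 22 and assumes the server sends its version line first, although RFC 4253 also allows other lines before it.

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the SSH identification line the server sends, or None.

    None means the connection failed (refused, unreachable, or timed out) or
    the first line did not look like an SSH banner. Credentials are not tested.
    """
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The identification line is at most 255 bytes, ending in CR LF.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:
                    break  # server closed the connection
                data += chunk
    except OSError:
        return None
    line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", "replace")
    return line if line.startswith("SSH-") else None
```

For example, ssh_banner("192.0.2.10") returns a string such as an OpenSSH version line if SSH is listening at that address, or None otherwise (192.0.2.10 is a placeholder address).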

A node shows “Inadequate” status

The node’s hardware does not meet the cluster requirements. Review its specifications on the Overview tab of the node detail page.

Maintenance mode is blocked by GPU VMs

Shut down affected GPU VMs before retrying.

Maintenance mode is blocked by insufficient memory

Shut down or migrate VMs to free cluster memory.

A node shows “Failed” status

Hover over the node’s status to see the failure reason.

If the issue is recoverable, put the node into maintenance mode, fix the issue, and then activate the node.

If the issue is not recoverable, put the node into maintenance mode and then remove it from the cluster.

A node shows “Fenced” status

The node was isolated to protect data integrity.

Possible next steps:

Wait for the node to recover.
If the node does not recover, put it into maintenance mode and investigate.

“Failed to join candidate node” error

Possible causes:

One or more network links are not in the expected state (for example, an interface’s Carrier status is down).
The selected Node Type is not valid for the cluster configuration.

Node removal shows a “data loss” warning

Do not proceed. Contact support to assess data pool health.

The “Remove” and “Maintenance” actions are not available

Remove requires In maintenance status and more than one node.
Maintenance requires Active, Failed, or Fenced status and more than one node.
Single-node clusters do not support these actions.
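The availability rules above can be expressed as a small function. This Python sketch is a hypothetical model of when the Maintenance and Remove actions appear, based only on the rules in this guide; the UI may apply additional conditions.

```python
MAINTENANCE_STATUSES = {"Active", "Failed", "Fenced"}


def available_actions(status: str, node_count: int) -> set[str]:
    """Return which of Maintenance and Remove the documented rules allow."""
    if node_count < 2:
        return set()  # single-node clusters support neither action
    actions = set()
    if status in MAINTENANCE_STATUSES:
        actions.add("Maintenance")
    if status == "In maintenance":
        actions.add("Remove")
    return actions
```

For example, an Active node in a three-node cluster offers only Maintenance; after it reaches In maintenance, Remove becomes available.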