Abdi Lleshi

Sept 2023 — Present

Computify

Bare-metal Solana RPC node infrastructure across the US, Netherlands, and Germany — EPYC 9375F servers, Gen 5 NVMe, and NGINX-managed health routing. Also delivers VPS and cloud solutions for clients.

Solana, Linux, NGINX, Bare Metal, RPC, EPYC

What Computify Is

Computify sits at the intersection of bare-metal infrastructure and cloud services. On the blockchain side it runs Solana RPC nodes — the layer that developers and trading platforms query directly. On the cloud side it provides VPS provisioning, Linux systems, and Microsoft-stack solutions for clients who need managed compute without the complexity of running their own hardware.

The Solana RPC infrastructure was built out between March and May 2024: custom-specced servers, three geographic regions, and an NGINX routing layer with slot-aware health checks. The goal was simple to describe and genuinely hard to execute — keep nodes in sync, keep latency low, keep uptime above 99.9%.


Why Bare Metal

The obvious question when running any infrastructure at scale is: why not just use AWS or GCP?

For Solana specifically, the answer is cost and I/O performance.

Solana produces roughly 400 million transactions a day at peak. A node that retains the full transaction history holds multiple terabytes of ledger data, and that figure is growing fast. At that volume, a year of cloud block storage costs more than buying the hardware outright. More importantly, cloud storage IOPS are soft-capped. Solana's RPC layer is read-heavy: getTransaction, getBlock, and getAccountInfo all hammer the disk continuously. On cloud instances you hit latency floors you can't engineer around.

Bare metal gives you NVMe drives at full throughput, dedicated CPU cores, and no noisy neighbour problem. The tradeoff is that hardware failures are your problem, not the cloud provider's SLA.


The Hardware

The cluster has gone through two hardware generations.

Gen 1 — March 2024. Initial build: AMD EPYC 9374F processors, Gen 4 NVMe drives (Samsung), 1TB+ RAM per node. Capable hardware for the network load at the time, and enough to run stable RPC across all three regions without hitting resource ceilings.

Gen 2 — January 2025. After the bull run at the end of 2024 drove a significant spike in RPC traffic and chain activity, the servers were upgraded to AMD EPYC 9375F processors paired with Gen 5 NVMe drives (Kioxia CM7-V, 3.84TB+). The Kioxia CM7-V is among the fastest enterprise NVMe drives available, with sequential reads of up to about 14GB/s and sub-20μs latency under queue depth. RAM remained at 1TB+ per node.

The upgrade wasn't a future-proofing exercise — it was a direct response to load. Solana's account index for mainnet had grown to the point where Gen 4 I/O was becoming a bottleneck under concurrent query load. Gen 5 headroom eliminated that constraint. Nodes with sufficient RAM keep the account index hot in memory entirely, avoiding disk reads on the critical path; the NVMe upgrade secured the path for when memory pressure eventually forces spills.


The Stack

NGINX

NGINX sits in front of each regional cluster. It handles:

  • Load balancing across multiple node instances within a region
  • Health checks — if a node falls behind the cluster tip or starts returning errors, it drops out of rotation automatically
  • Rate limiting to protect nodes from request floods
  • SSL termination

The health check logic is the part that matters most. A Solana node that's 100 slots behind isn't an error — it's technically running fine — but it'll return stale data. NGINX probes each node's getSlot response and compares it against the current cluster tip. If the delta exceeds a threshold, the node is pulled from rotation until it catches up.
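The lag decision described above can be sketched in a few lines of Python. This is an illustration only: how the check is wired into NGINX is not shown here, the threshold value and node names are hypothetical, and the cluster tip is approximated as the highest slot reported by any probed node. `getSlot` is the standard Solana JSON-RPC method.

```python
import json
import urllib.request
from typing import Dict, List, Optional

MAX_SLOT_LAG = 50  # hypothetical threshold; tune to your tolerance for stale data


def fetch_slot(rpc_url: str, timeout: float = 2.0) -> int:
    """Ask a node for its current slot via the getSlot JSON-RPC method."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "getSlot"}).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return int(json.load(response)["result"])


def healthy_nodes(
    slots: Dict[str, Optional[int]], max_lag: int = MAX_SLOT_LAG
) -> List[str]:
    """Return nodes whose slot is within max_lag of the highest slot observed.

    A value of None marks a node that failed to answer the probe.
    """
    reachable = {name: slot for name, slot in slots.items() if slot is not None}
    if not reachable:
        return []
    tip = max(reachable.values())  # best available estimate of the cluster tip
    return sorted(name for name, slot in reachable.items() if tip - slot <= max_lag)
```

A prober would call `fetch_slot` for each backend, record `None` on timeout or error, and keep only the names returned by `healthy_nodes` in rotation. Comparing against an external reference endpoint instead of the local maximum would also catch the case where every node in a region lags together.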

Solana Binary Management

Solana releases new validator versions on a rolling basis. Some releases are mandatory: the cluster enforces a minimum version through feature gates, and after a grace period nodes running anything older can no longer follow the chain. This means upgrades can't be deferred indefinitely.
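Deciding whether a node is below a required minimum comes down to a numeric version comparison, since comparing version strings directly gets cases like 1.18.9 versus 1.18.10 wrong. The sketch below assumes versions follow a major.minor.patch pattern; the function names are hypothetical, and where the required minimum comes from (for example a release announcement) is left open. The local version could be read from the `solana-core` field of the `getVersion` JSON-RPC response.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '1.18.23' or 'v1.18.23-rc1'."""
    match = re.match(r"\s*v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def needs_upgrade(local: str, required_minimum: str) -> bool:
    """True when the local version is strictly older than the required minimum."""
    return parse_version(local) < parse_version(required_minimum)
```

Tuples compare element by element, so `needs_upgrade("1.18.9", "1.18.10")` is true even though the plain string comparison would say otherwise.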

Rolling upgrades across a multi-node, multi-region cluster without downtime require sequencing: take one node out of the NGINX pool, upgrade it, verify it's in sync, return it to rotation, then move to the next. Scripts handle the detection (cluster version vs local version), the pool removal, the binary swap, and the re-entry check. The same tooling handles emergency upgrades when a critical patch ships mid-cycle.
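The sequencing above can be sketched as a small loop. This is not the actual tooling: the four step functions are hypothetical callables supplied by the caller (pool removal, binary swap, sync check, pool re-entry), which keeps the ordering logic testable without touching real nodes.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    drain: Callable[[str], None],
    upgrade: Callable[[str], None],
    in_sync: Callable[[str], bool],
    restore: Callable[[str], None],
) -> List[str]:
    """Upgrade nodes one at a time; stop at the first node that fails to resync.

    Returns the nodes that were upgraded and returned to rotation.
    """
    done: List[str] = []
    for node in nodes:
        drain(node)    # remove the node from the load-balancer pool
        upgrade(node)  # swap the binary and restart
        if not in_sync(node):
            # Leave this node out of rotation and halt, so that at most
            # one node is ever out of the pool because of the upgrade.
            break
        restore(node)  # return the node to the pool
        done.append(node)
    return done
```

Halting on the first failed sync check is a deliberate choice in this sketch: continuing would risk draining a second node while the first is still out of rotation.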


Regions

United States — primary cluster, highest traffic. Based in NYC, close to most US-based trading infrastructure and DeFi frontends.

Netherlands — European hub. Amsterdam data centres have strong peering to the rest of Europe and good latency to both London and Frankfurt financial infrastructure.

Germany — secondary European cluster. Frankfurt is home to DE-CIX, one of the world's largest internet exchanges, which gives strong peering across Europe.

Geographic distribution matters for two reasons. First, raw latency: a trading bot in London hitting a US node adds 150–200ms per RPC call — on a strategy firing hundreds of calls per second, that compounds badly. Second, resilience: if one region has a hardware failure or network partition, traffic fails over to another without manual intervention.


The Operational Reality of Running Solana Nodes

Solana is a different operational beast from most blockchains because of its speed. The chain produces a new block roughly every 400ms. A node that falls offline for 30 minutes comes back to 4,500 missed slots — and replaying those means catching up against a chain that's still moving forward at full pace.
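The arithmetic behind that figure, and the reason catch-up takes longer than the backlog alone suggests, can be worked through in a short sketch. The 400ms slot time comes from the text; the replay rate used in the usage note is a hypothetical number, not a measured one.

```python
SLOT_TIME_S = 0.4  # roughly 400 ms per slot, as stated above


def missed_slots(outage_seconds: float, slot_time_s: float = SLOT_TIME_S) -> int:
    """Number of slots the chain produced while the node was offline."""
    return round(outage_seconds / slot_time_s)


def catch_up_seconds(
    backlog_slots: int, replay_slots_per_s: float, slot_time_s: float = SLOT_TIME_S
) -> float:
    """Time to reach the tip while the chain keeps advancing during replay."""
    chain_rate = 1.0 / slot_time_s  # slots the chain adds per second
    if replay_slots_per_s <= chain_rate:
        raise ValueError("replay is no faster than the chain; the node never catches up")
    # The backlog only shrinks by the difference between the two rates.
    return backlog_slots / (replay_slots_per_s - chain_rate)
```

`missed_slots(30 * 60)` gives the 4,500 slots quoted above. With an assumed replay rate of 5 slots per second, twice the chain's pace, `catch_up_seconds(4500, 5.0)` comes to 1,800 seconds: the node needs another half hour just to close a half-hour gap.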

The practical challenges:

Storage growth — Ledger data grows continuously. Pruning strategies (keeping only N slots of history) are necessary unless you're running archive nodes, which require significantly more storage. Managing disk usage without interrupting service requires monitoring and automated cleanup — not a one-time config decision.

Binary upgrades — Handled via rolling upgrade scripts as described above. The risk is a mandatory upgrade that ships with short notice. The mitigation is automated version monitoring that alerts before the deadline.

Snapshot dependencies — Bootstrapping a new node or recovering a failed one requires downloading a snapshot from a trusted validator. Snapshots on mainnet can exceed 100GB and grow as the chain state expands. Download speed is a real factor in recovery time; sourcing from geographically close validators matters.

Mainnet instability — Solana has had cluster-wide outages. When the network restarts, all operators need to download a new snapshot simultaneously. Having tooling that detects the restart event, triggers the download automatically, and sequences the bring-up is the difference between a 30-minute recovery and a multi-hour one.
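Of the challenges above, storage growth is the simplest to monitor mechanically. A minimal sketch, assuming alerting is driven by the fraction of disk used on the ledger volume; the thresholds and function names are hypothetical, and `shutil.disk_usage` is the Python standard-library call that reports total, used, and free bytes for a path.

```python
import shutil

WARN_AT = 0.80      # hypothetical thresholds, as a fraction of disk used
CRITICAL_AT = 0.90


def classify_usage(
    used_bytes: int,
    total_bytes: int,
    warn_at: float = WARN_AT,
    critical_at: float = CRITICAL_AT,
) -> str:
    """Map raw disk numbers to 'ok', 'warn' or 'critical'."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= critical_at:
        return "critical"
    if fraction >= warn_at:
        return "warn"
    return "ok"


def ledger_disk_status(ledger_path: str) -> str:
    """Classify the filesystem that holds ledger_path."""
    usage = shutil.disk_usage(ledger_path)
    return classify_usage(usage.used, usage.total)
```

Run on a schedule, a "warn" result could trigger cleanup of old ledger data and a "critical" result could page an operator before the node stalls on a full disk.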


Cloud Services

Alongside the Solana infrastructure, Computify operates a client-facing cloud division — VPS hosting, Linux server builds, and Microsoft-stack deployments. The same operational discipline that goes into the RPC nodes carries into client infrastructure: reproducible builds, monitoring from day one, and clear runbooks for recovery.


What Was Built

  • Three-region bare-metal deployment — US, Netherlands, Germany
  • Gen 1 (Mar 2024): AMD EPYC 9374F, Gen 4 NVMe (Samsung), 1TB+ RAM
  • Gen 2 (Jan 2025): upgraded to EPYC 9375F, Gen 5 NVMe (Kioxia CM7-V 3.84TB+) following 2024 bull run load spike
  • NGINX cluster with slot-lag-aware health checks and automatic node rotation
  • Rolling upgrade tooling for mandatory Solana binary releases
  • Monitoring and alerting on slot lag, response latency, and disk usage
  • Snapshot recovery automation for cluster restart events

It's not glamorous work. Infrastructure that works correctly is invisible. The goal is that nothing interesting ever happens — and when something does, recovery is fast enough that it barely registers.