| Internet-Draft | MFOP | March 2026 |
| Nunez | Expires 25 September 2026 |
This document defines the Mahalaxmi Federation and Orchestration Protocol (MFOP), a protocol for coordinating parallel artificial intelligence (AI) agent execution across a distributed network of heterogeneous compute nodes. MFOP specifies node identity and registration, capability advertisement, compliance-zone-aware job routing using database-layer enforcement, semantic input partitioning, cryptographically signed billing receipts, configurable economic settlement, and a layered security model comprising AI safety policy validation and execution sandbox isolation. The protocol is designed to operate across three simultaneous deployment configurations: private enterprise meshes, managed cloud pools, and open community marketplaces. MFOP is agnostic to the underlying AI model provider.¶
This note is to be removed before publishing as an RFC.¶
Discussion of this document should take place at the MFOP discussion list maintained at https://mahalaxmi.ai/mfop/discuss. The current draft and revision history are maintained at https://mahalaxmi.ai/mfop/draft.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 September 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
This document may not be modified, and derivative works of it may not be created, except to format it for publication as an RFC or to translate it into languages other than English.¶
The growth of large language model (LLM) deployments across enterprise environments has created a need for a coordination layer that can span heterogeneous compute infrastructure while satisfying compliance, billing, and safety requirements that vary by jurisdiction and industry.¶
MFOP addresses this need by defining a protocol for federated AI orchestration. A federation consists of one or more compute nodes, each of which may be operated by a different entity under a different compliance regime. A submitter (a user, application, or automated system) presents a job to the federation. The federation routes the job to an appropriate node based on the job's compliance zone requirements, the node's capability advertisement, and the economic terms in effect.
Existing distributed AI systems have four limitations that this protocol addresses. First, they require complete trust between all participating nodes, which makes them unsuitable for multi-tenant or community-contributed compute deployments. Second, they lack fine-grained enforcement of regulatory compliance zones at the data layer. Third, they provide no standardized economic model for compensating independent node operators with tamper-evident proof of work. Fourth, they offer no fee model that an operator can reconfigure at runtime without deploying code.
This specification defines the wire protocol, data formats, cryptographic mechanisms, and behavioral requirements for all components of a conforming MFOP federation. The full protocol specification is maintained at [MFOP-SPEC].¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Each node in an MFOP federation is identified by a stable, globally unique node identifier (node_id). The node_id is a 128-bit version 4 UUID [RFC9562] that is assigned at registration time and persists across node restarts and software upgrades.
A node initiates registration by sending a NodeRegistrationRequest to the federation's registration endpoint (POST /v1/federation/nodes/register). The request MUST include:¶
The federation returns a NodeRegistrationResponse containing the assigned node_id, a registration_token for subsequent authenticated calls, and the federation's current billing configuration.¶
Nodes MUST re-register when their Ed25519 key pair is rotated. During key rotation, the node submits a re-registration request with both the old and new public keys, signed with the old private key. The federation verifies the old-key signature before accepting the new key. There is a 24-hour overlap window during which receipts signed with either key are accepted.¶
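The overlap rule can be illustrated with a minimal sketch of the key-selection step only. The `NodeKeys` record and the `acceptable_keys` helper are hypothetical names that do not appear in this specification, and the Ed25519 verification itself (which would be delegated to a cryptographic library) is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

OVERLAP_WINDOW = timedelta(hours=24)  # overlap window stated in the text


@dataclass
class NodeKeys:
    """Hypothetical per-node key state kept by the federation."""
    current_key: bytes
    previous_key: Optional[bytes] = None
    rotated_at: Optional[datetime] = None


def acceptable_keys(keys: NodeKeys, now: datetime) -> list[bytes]:
    """Return the public keys a receipt signature may be verified against."""
    accepted = [keys.current_key]
    in_overlap = (
        keys.previous_key is not None
        and keys.rotated_at is not None
        and now - keys.rotated_at <= OVERLAP_WINDOW
    )
    if in_overlap:
        # During the overlap window the old key is still accepted.
        accepted.append(keys.previous_key)
    return accepted
```

A verifier would try each returned key in turn and reject the receipt if none of them validates the signature.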
Nodes MUST send a heartbeat to POST /v1/federation/nodes/{id}/heartbeat at least once every 60 seconds. A node that misses three consecutive heartbeat windows is marked INACTIVE and excluded from routing. Nodes may deregister voluntarily via DELETE /v1/federation/nodes/{id}.¶
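As an illustration, the liveness rule might be evaluated as follows. This sketch assumes that a "missed window" is a whole 60-second interval elapsed since the last heartbeat, and it uses "ACTIVE" as a label for the non-INACTIVE state; neither detail is defined by this specification.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(seconds=60)  # maximum interval from the text
MAX_MISSED_WINDOWS = 3                      # consecutive misses before INACTIVE


def node_status(last_heartbeat: datetime, now: datetime) -> str:
    """Classify a node from the time of its most recent heartbeat."""
    # Integer division of two timedeltas gives the number of whole
    # heartbeat windows that have elapsed without a heartbeat.
    missed = (now - last_heartbeat) // HEARTBEAT_INTERVAL
    return "INACTIVE" if missed >= MAX_MISSED_WINDOWS else "ACTIVE"
```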
A node's capability advertisement declares the AI models available on the node, the hardware characteristics relevant to job routing, and the compliance certifications held by the node operator.¶
The CapabilityAdvertisement object includes the following fields:¶
Each model available on a node is described by a ModelDescriptor:¶
Nodes MUST update their capability advertisement via PUT /v1/federation/nodes/{id}/capabilities whenever their available models or hardware configuration changes. The federation propagates updated capability advertisements to the routing layer within 30 seconds.¶
MFOP routes each job to a node that satisfies the job's compliance zone requirements. Compliance zone satisfaction is a hard constraint: a job MUST NOT be routed to a node that is not certified for the job's compliance zone.¶
MFOP defines five compliance zones, ordered from least to most restrictive: public, enterprise (SOC2), hipaa, sox, and fedramp.
When a job is received, the routing layer executes the following algorithm:¶
If no node satisfies all filters, the job is queued with a configurable timeout (default: 120 seconds). If no node becomes available within the timeout, the federation returns HTTP 503 with a Retry-After header.¶
Submitters MAY specify affinity rules in their job submission:¶
Affinity rules affect only the affinity_score component; compliance zone certification and capacity remain hard constraints.¶
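The separation between hard constraints and affinity can be sketched as a filter followed by a ranking. This is not the normative routing algorithm: the `Node` and `Job` records, the `free_slots` capacity measure, and the region-based affinity rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    compliance_zones: set[str]
    free_slots: int           # hypothetical capacity measure
    status: str = "ACTIVE"
    region: str = ""          # hypothetical attribute used for affinity


@dataclass
class Job:
    compliance_zone: str
    preferred_regions: set[str] = field(default_factory=set)  # hypothetical affinity rule


def eligible_nodes(job: Job, nodes: list[Node]) -> list[Node]:
    """Apply the hard constraints: liveness, zone certification, capacity."""
    return [
        n for n in nodes
        if n.status == "ACTIVE"
        and job.compliance_zone in n.compliance_zones
        and n.free_slots > 0
    ]


def affinity_score(job: Job, node: Node) -> float:
    """Score a node that has already passed the hard constraints."""
    return 1.0 if node.region in job.preferred_regions else 0.0


def route(job: Job, nodes: list[Node]):
    """Return the best eligible node, or None if the job must be queued."""
    candidates = eligible_nodes(job, nodes)
    if not candidates:
        return None
    return max(candidates, key=lambda n: affinity_score(job, n))
```

Because scoring runs only over the filtered candidates, an affinity preference can never cause a job to reach a node that lacks the required certification.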
For jobs whose input exceeds a single node's max_context_tokens, MFOP provides a semantic partitioning mechanism that splits the input into coherent sub-jobs, routes each sub-job independently, and aggregates the results.¶
MFOP defines three partition strategies:¶
A submitter requests partitioned execution by setting partition_strategy in the job submission. The federation's partition engine splits the input, assigns sub-job IDs (parent_job_id + sequence number), and routes each sub-job independently. Sub-jobs inherit the compliance zone and billing authorization of the parent job.¶
Once all sub-jobs complete, the federation's aggregation layer assembles the results in sequence-number order. For sliding_window partitions, the aggregator de-duplicates content in the overlap regions using a longest-common-subsequence merge. The assembled result is returned to the submitter as a single JobResult with an array of sub_job_receipts.¶
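A simplified sketch of sliding-window splitting and ordered reassembly follows. It substitutes a longest suffix/prefix match for the longest-common-subsequence merge described above, which is adequate only when the overlap regions are identical. The sub-job ID format, the function names, and the window parameters are assumptions.

```python
def split_sliding_window(tokens: list, window: int, overlap: int) -> list[list]:
    """Split tokens into windows of `window` items sharing `overlap` items."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    parts, start = [], 0
    while True:
        parts.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += step
    return parts


def sub_job_ids(parent_job_id: str, count: int) -> list[str]:
    """Derive sub-job IDs; the "<parent>-<sequence>" format is an assumption."""
    return [f"{parent_job_id}-{seq}" for seq in range(count)]


def merge_overlapping(parts: list[list]) -> list:
    """Reassemble parts in sequence order, dropping duplicated overlap."""
    merged = list(parts[0]) if parts else []
    for part in parts[1:]:
        # Find the longest suffix of `merged` that is also a prefix of `part`.
        k = min(len(merged), len(part))
        while k > 0 and merged[-k:] != list(part[:k]):
            k -= 1
        merged.extend(part[k:])
    return merged
```

For example, splitting ten tokens with a window of 4 and an overlap of 1 yields three sub-jobs, and merging the three parts in sequence order restores the original ten tokens.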
Every completed job execution produces a BillingReceipt signed by the executing node. Signed receipts are the authoritative record for economic settlement and dispute resolution.¶
A BillingReceipt contains:¶
Receipts are signed using Ed25519. The node signs the canonical JSON serialization of the receipt (keys sorted lexicographically, no whitespace) with its registered private key. The signature is base64url-encoded and included in the receipt as the signature field.¶
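The canonical serialization step might look like the sketch below. It assumes that the signature field is excluded from the signed bytes, that non-ASCII characters are emitted as UTF-8 rather than escaped, and that base64url padding is stripped; none of these details is stated above. The receipt field names in the usage note are hypothetical, and the Ed25519 signing call is left to a cryptographic library.

```python
import base64
import json


def canonical_receipt_bytes(receipt: dict) -> bytes:
    """Serialize a receipt with sorted keys and no insignificant whitespace."""
    # Assumption: the signature field itself is not part of the signed payload.
    unsigned = {k: v for k, v in receipt.items() if k != "signature"}
    text = json.dumps(
        unsigned,
        sort_keys=True,            # keys sorted lexicographically
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # assumption: raw UTF-8, not \u escapes
    )
    return text.encode("utf-8")


def encode_signature(raw_signature: bytes) -> str:
    """base64url-encode a raw signature (padding stripped, an assumption)."""
    return base64.urlsafe_b64encode(raw_signature).rstrip(b"=").decode("ascii")
```

Under these assumptions, a receipt such as `{"node_id": "n1", "job_id": "j1", "tokens": 10}` serializes to `{"job_id":"j1","node_id":"n1","tokens":10}`, and the signer and verifier must both produce exactly these bytes for the signature to validate.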
The federation verifies each receipt's signature on arrival, using the node's registered public key. Receipts with invalid signatures MUST be rejected and MUST trigger a node integrity alert.
The federation stores all receipts for a minimum of 7 years to support compliance audit requirements. Submitters may retrieve their receipts via GET /v1/federation/receipts. Node operators may retrieve receipts via GET /v1/federation/nodes/{id}/receipts.¶
MFOP separates billing (the accumulation of signed receipts) from settlement (the financial transfer of funds). Settlement is configurable and may occur on different schedules for different participant types.¶
The platform administrator configures fee rates via a BillingFeeConfig object. Each BillingFeeConfig has a version identifier and an effective date. A new config may be created at any time; it takes effect at the start of the next billing period. BillingFeeConfig fields:¶
Submitters are billed on a postpaid basis. At the end of each settlement period, the federation aggregates all receipts for the submitter and charges the payment method on file. The invoice includes an itemized list of job receipts, grouped by compliance zone and model.
Node operators are paid out at the end of each settlement period, provided their accumulated earnings exceed the minimum_payout_usd threshold. Operators who do not meet the threshold roll their earnings forward to the next period.¶
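The roll-forward rule reduces to simple arithmetic, sketched here. The function name and the use of `Decimal` amounts are assumptions, and a balance exactly equal to the threshold is treated as not exceeding it, following the literal wording above.

```python
from decimal import Decimal


def settle_operator(
    carried_usd: Decimal, period_usd: Decimal, minimum_payout_usd: Decimal
) -> tuple[Decimal, Decimal]:
    """Return (payout, carry_forward) for one settlement period."""
    balance = carried_usd + period_usd
    if balance > minimum_payout_usd:
        # Threshold exceeded: pay out the full accumulated balance.
        return balance, Decimal("0")
    # Threshold not exceeded: nothing is paid and the balance rolls forward.
    return Decimal("0"), balance
```

For example, with a threshold of 25 USD, an operator who earns 12.50 USD in one period receives no payout; if the same operator earns 20 USD in the next period, the accumulated 32.50 USD is paid out.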
MFOP implements a three-layer security model: transport security, AI safety policy validation, and execution sandbox isolation.¶
All MFOP API endpoints MUST be served over HTTPS using TLS 1.3 or higher [RFC8446]. Mutual TLS (mTLS) is RECOMMENDED for node-to-federation communication in private enterprise mesh deployments. API authentication uses PAK Keys transmitted as the X-Channel-API-Key HTTP header. PAK Keys are 256-bit random values encoded in base64url.¶
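A key of the stated size and encoding could be generated as in the following sketch. The helper names are hypothetical, and stripping the base64url padding is an assumption; this specification does not say whether padding is retained.

```python
import base64
import secrets


def generate_pak_key() -> str:
    """Generate a 256-bit random key encoded in base64url."""
    raw = secrets.token_bytes(32)  # 32 bytes = 256 bits from the OS CSPRNG
    # Assumption: trailing "=" padding is removed from the encoded key.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def auth_headers(pak_key: str) -> dict[str, str]:
    """Build the request headers named in this section and in the API format rules."""
    return {
        "X-Channel-API-Key": pak_key,
        "Content-Type": "application/json",
    }
```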
All job inputs and outputs MUST be validated against NeMo Guardrails policies before execution and before delivery to the submitter. The baseline policy set (required for all compliance zones) includes jailbreak detection and blocking, harmful content detection, PII leakage detection in outputs, and prompt injection detection. Nodes MUST run the NeMo Guardrails runtime version specified in their capability advertisement. Nodes running outdated versions MUST be flagged as DEGRADED and excluded from routing for compliance zones requiring features not present in the installed version.¶
Each job MUST execute in an isolated sandbox. Nodes MUST implement sandbox isolation using one of the following mechanisms: gVisor (runsc), RECOMMENDED for cloud deployments; Firecracker microVMs, RECOMMENDED for bare-metal deployments; or WASM (Wasmtime), permitted for CPU-only inference workloads. Sandboxes MUST be destroyed and recreated between jobs. Job-specific state MUST NOT persist between jobs.¶
All job routing decisions, receipt signatures, and settlement events MUST be written to an append-only audit log. The audit log is cryptographically chained using SHA-256 hashes. Only append operations are permitted.¶
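The chaining idea can be sketched with an in-memory log. The entry layout, the all-zero genesis value, and hashing the previous hash concatenated with a sorted-key JSON encoding of the event are assumptions; this specification states only that entries are chained with SHA-256.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed value for the first entry's predecessor


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with the encoded event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to its predecessor."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, event: dict) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry = {
            "prev_hash": prev_hash,
            "event": event,
            "hash": _entry_hash(prev_hash, event),
        }
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Because each hash covers the previous one, altering any earlier entry invalidates every later entry, which is what makes after-the-fact modification detectable.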
MFOP uses JSON over HTTPS for all API communication. WebSocket connections are supported for streaming job output.¶
All request and response bodies are UTF-8 encoded JSON. Requests MUST include Content-Type: application/json. Successful responses use HTTP 200 or 201. Error responses use the standard error envelope:¶
{ "error": { "code": "", "message": "", "details": {} } }
Standard error codes: UNAUTHORIZED, FORBIDDEN, NOT_FOUND, VALIDATION_ERROR, QUOTA_EXCEEDED, NO_ELIGIBLE_NODE, COMPLIANCE_VIOLATION, INTERNAL_ERROR.¶
Nodes that support streaming output expose a WebSocket endpoint at wss://{node_endpoint}/v1/jobs/{id}/stream. The node streams token output as JSON-framed delta messages:¶
{ "type": "delta", "text": "...", "token_count": N }
The stream is terminated with a completion message:¶
{ "type": "done", "receipt": { ... } }
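A client might consume these messages as in the sketch below. It is transport-agnostic: it takes an iterable of JSON text frames, so no particular WebSocket library is implied. It also assumes that token_count is the number of tokens in each delta rather than a running total, which this specification does not state.

```python
import json
from typing import Iterable


def collect_stream(frames: Iterable[str]) -> tuple[str, int, dict]:
    """Accumulate delta text until the completion message arrives.

    Returns (full_text, total_tokens, receipt).
    """
    text_parts: list[str] = []
    total_tokens = 0
    for frame in frames:
        message = json.loads(frame)
        if message["type"] == "delta":
            text_parts.append(message["text"])
            total_tokens += message["token_count"]  # assumed per-delta count
        elif message["type"] == "done":
            return "".join(text_parts), total_tokens, message["receipt"]
    # The stream closed before the node sent its signed receipt.
    raise ConnectionError("stream ended without a completion message")
```

A stream that ends without a completion message carries no receipt, so this sketch surfaces that case as an error rather than returning partial output.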
Job submission requests SHOULD include an Idempotency-Key header (UUID). If a request with the same Idempotency-Key is received within 24 hours, the federation returns the original response without re-executing the job.¶
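The replay behavior can be sketched with an in-memory cache. The class name and the injectable clock are assumptions made for testability, and a production federation would need shared, persistent storage rather than a per-process dictionary.

```python
import time
from typing import Any, Callable

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # 24-hour window from the text


class IdempotencyCache:
    """Return the stored response for a repeated Idempotency-Key."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._responses: dict[str, tuple[float, Any]] = {}

    def execute(self, key: str, handler: Callable[[], Any]) -> Any:
        now = self._clock()
        cached = self._responses.get(key)
        if cached is not None and now - cached[0] < IDEMPOTENCY_TTL_SECONDS:
            # Same key within the window: replay without re-executing the job.
            return cached[1]
        response = handler()
        self._responses[key] = (now, response)
        return response
```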
This document has no IANA actions.¶
MFOP uses Ed25519 asymmetric signatures for billing receipt integrity. The security of this mechanism depends on the secrecy of each node's private key. Implementations SHOULD store private keys in the operating system keychain or a hardware security module (HSM) and MUST NOT transmit private keys over any network channel.¶
Replay attacks against billing receipts are mitigated by the platform-issued nonce mechanism described in Section 7.2. Implementations MUST enforce nonce uniqueness and expiration.¶
Compliance zone enforcement is implemented at the database query layer using a generalized inverted index (GIN) on the node's compliance_zones array column, ensuring that nodes not certified for a compliance zone cannot appear in the eligible node set regardless of application-layer logic. Implementations MUST additionally verify that the compliance_zone field in a completed receipt matches the compliance_zone declared in the original job submission.¶
The execution sandbox isolation requirement (Section 9.3) protects submitter credentials and work content from node operators. Credentials are injected at cycle initialization using a platform-issued session key and are not stored on the node filesystem. Nodes MUST report their sandbox implementation version in every heartbeat.¶
All MFOP API endpoints MUST be served over TLS 1.3 or higher [RFC8446]. mTLS is REQUIRED in fedramp compliance zone deployments.¶
The composite AI safety policy attestation hash (SHA-256 of the concatenation of baseline, compliance zone, and domain policy hashes) provides cryptographic evidence of the safety policy configuration active at execution time. Policy hash deviations detected via heartbeat MUST result in immediate node suspension.¶
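One way to compute the composite hash is sketched below. It assumes that each component hash is a hex-encoded SHA-256 digest and that concatenation operates on the decoded bytes in the order baseline, compliance zone, domain; the byte-level encoding is not specified here, and the function names are hypothetical.

```python
import hashlib
import hmac


def composite_policy_hash(baseline_hash: str, zone_hash: str, domain_hash: str) -> str:
    """SHA-256 over the concatenated component policy hashes."""
    concatenated = (
        bytes.fromhex(baseline_hash)
        + bytes.fromhex(zone_hash)
        + bytes.fromhex(domain_hash)
    )
    return hashlib.sha256(concatenated).hexdigest()


def policy_deviates(reported_hash: str, expected_hash: str) -> bool:
    """True if a heartbeat's reported hash differs from the expected value."""
    # Constant-time comparison avoids leaking how many characters match.
    return not hmac.compare_digest(reported_hash, expected_hash)
```

Because the concatenation is order-sensitive, both the node and the federation must combine the three component hashes in the same order to arrive at the same attestation value.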
The append-only audit log (Section 9.4) provides a tamper-evident record of all federation activity via SHA-256 chaining. Audit log storage MUST implement appropriate access controls to protect confidentiality of log contents in regulated environments.¶
This appendix lists the MFOP REST API endpoints. All endpoints require an X-Channel-API-Key header unless noted otherwise. Base path: /v1/federation¶
| Method and Path | Name | Description |
|---|---|---|
| POST /v1/federation/nodes/register | Node registration | Register a new node with the federation. |
| PUT /v1/federation/nodes/{id}/capabilities | Capability update | Update a node's capability advertisement. |
| POST /v1/federation/nodes/{id}/heartbeat | Node heartbeat | Signal that the node is alive and accepting jobs. |
| DELETE /v1/federation/nodes/{id} | Node deregistration | Voluntarily deregister a node. |
| POST /v1/federation/jobs | Job submission | Submit a job to the federation for execution. |
| GET /v1/federation/jobs/{id} | Job status | Retrieve the current status and result of a job. |
| GET /v1/federation/jobs/{id}/receipt | Job receipt | Retrieve the signed billing receipt for a completed job. |
| GET /v1/federation/receipts | Submitter receipts | List all receipts for the authenticated submitter. |
| GET /v1/federation/nodes/{id}/receipts | Node receipts | List receipts for jobs executed by the node. |
| GET /v1/federation/nodes/{id}/earnings | Provider earnings | Current period tokens, estimated earnings, last payout. |
| GET /v1/federation/submitters/billing | Submitter billing summary | Current period cost and next billing date. |
| PATCH /v1/admin/federation/billing-config | Update fee model | Admin only. Creates new BillingFeeConfig row. |
Each compliance zone requires specific NeMo Guardrails policy capabilities beyond the baseline.¶
| Zone | Required Rails Beyond Baseline |
|---|---|
| public | Baseline only. |
| enterprise (SOC2) | Data residency marker detection. API credential exfiltration detection. Access logging enforcement. |
| hipaa | PHI pattern detection (names, DOB, MRN, ICD-10, diagnoses, insurance IDs). PHI de-identification rail. Minimum necessary output check. |
| sox | Financial PII isolation. Price prediction blocking. MNPI detection. |
| fedramp | CUI handling. Export control detection (EAR/ITAR). Classification marking enforcement. |
The author has filed provisional patent applications with the United States Patent and Trademark Office (USPTO) covering certain mechanisms described in this specification. These applications, filed by ThriveTech Services LLC in March 2026, cover: compliance-zone routing enforcement using a GIN-indexed database array column; cryptographically signed work unit receipts and append-only billing ledger; configurable economic settlement with four fee modes; composite AI safety policy attestation hash; reputation-weighted federated node dispatch with rolling-window scoring; self-healing AI orchestration via signed patch registry and meta-instance correction cycles; manager-worker consensus orchestration with PTY-native spawning and shell environment capture; and escrow-gated AI token metering with actual-consumption billing.¶
In accordance with BCP 79 [RFC8179], the author discloses these pending patent applications. ThriveTech Services LLC is willing to grant licenses to implementors of this specification on reasonable and non-discriminatory (RAND) terms. Parties seeking licensing information should contact: ami.nunez@mahalaxmi.ai. A formal IPR disclosure will be filed at https://datatracker.ietf.org/ipr/ concurrent with or prior to submission of this Internet-Draft.¶
The author wishes to acknowledge the NVIDIA NeMo team for the NeMo Guardrails platform, which provides the foundational AI safety infrastructure referenced in this specification.¶