OneSpace - The Model Substrate for Unified Enterprise Data

1. Executive Summary

Modern enterprises generate data across hundreds of disconnected systems - CRMs, ERPs, document stores, SaaS APIs, IoT feeds, communication platforms, and more. This fragmentation creates what we call the "Data Scatter Tax": duplicated effort, inconsistent insights, broken automation, and an ever-growing pile of integration debt.

OneSpace is Hub8's answer - a Model Substrate architecture that converges structured, semi-structured, and unstructured data into a single, intelligent destination. It is built on six pillars: a document store (operational and semi-structured data), object storage (S3-compatible, for files, media, and blobs), a columnar analytics engine (petabyte-scale OLAP), a hybrid search engine (vector + keyword retrieval), an AI query layer (federated SQL and in-database ML), and an ontology layer (semantic meaning and business rules). Together they give every AI agent and workflow in the GIA-PRO platform a unified, always-current view of the entire organisation.

The result: instead of writing N×M point-to-point integrations between N sources and M consumers, you write N ingestion pipelines into OneSpace and every consumer - agent, dashboard, BPMN workflow, or API - reads from one place.
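To make the arithmetic concrete, here is a small illustrative calculation. The source and consumer counts are invented for the example and are not figures from any real deployment; the function names are likewise hypothetical.

```python
def point_to_point_connectors(sources: int, consumers: int) -> int:
    """Every consumer integrates with every source separately."""
    return sources * consumers


def hub_pipelines(sources: int) -> int:
    """Each source gets one ingestion pipeline; consumers share one read path."""
    return sources


# Hypothetical organisation: 12 source systems, 8 consumers.
n_sources, n_consumers = 12, 8
print(point_to_point_connectors(n_sources, n_consumers))  # 96
print(hub_pipelines(n_sources))  # 12
```

Adding a thirteenth source costs one more pipeline in the hub model, against eight more connectors in the point-to-point model.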

2. The Problem: Organisational Data Scatter

Enterprise data is scattered across dozens of tools that were never designed to talk to each other. CRM data lives in Salesforce, financial records sit in SAP, project files in SharePoint, conversations in Slack, contracts in DocuSign, sensor data in custom IoT pipelines, and customer tickets in Zendesk. Each system has its own schema, API, authentication model, and rate limit.

The consequences are severe:

Integration Spaghetti

Connecting N sources to M consumers point-to-point requires N×M connectors. Each new tool multiplies the maintenance burden, and a single API change can cascade failures across the organisation.

Blind AI Agents

AI models can only reason over data they can see. When knowledge is fragmented, agents produce incomplete, contradictory, or hallucinated answers because they lack the full picture.

3. What Is OneSpace?

OneSpace is a Model Substrate - a unified data architecture that combines the flexibility of a data lake with the structure and performance of a data warehouse, purpose-built for the AI-native enterprise.

The term "Model Substrate" reflects its dual role: it is not only a storage layer but also the model context plane for AI agents. Every piece of data ingested into OneSpace is automatically catalogued, indexed, and made available to GIA-PRO's agent runtime, BPMN workflow engine, and vector search - enabling agents to discover, reason over, and act on organisational knowledge without manual wiring.

Core Principle

"Write once to OneSpace; read from everywhere."
Data enters through normalised ingestion pipelines. Once inside, every consumer - from a Pandas notebook to an autonomous BPMN-orchestrated agent - accesses the same governed, up-to-date view. No exports, no copies, no drift.

OneSpace is not another data lake that passively stores files. It is an active, intelligent data plane that:

Classifies incoming data by type (document, metric, event, media) and routes it to the optimal storage engine.
Indexes content for both keyword and semantic (vector) retrieval via the hybrid search engine.
Enforces tenant-level isolation, RBAC, encryption at rest, and audit logging natively.
Exposes data via a unified API layer so agents, workflows, and dashboards all speak the same query language.

4. The Six Pillars

Each pillar is chosen for what it does best. Together they cover the full spectrum of enterprise data patterns - from storage to intelligence to meaning - without compromise.

Operational Document Store

The system-of-record for all structured and semi-structured data: workflow states, agent configurations, user records, conversation histories, and tenant metadata. Flexible document schema accommodates evolving data models without migrations.

S3-Compatible Object Store

All binary and large-file data - PDFs, images, videos, audio, CAD files, and agent-generated artefacts. S3-compatible API, on-premise sovereignty, erasure-coded durability, bucket-level versioning, and native encryption at rest.

Columnar Analytics Engine

Time-series metrics, audit logs, agent execution traces, and usage analytics flow into a high-performance columnar engine for blazing-fast OLAP queries. Sub-second aggregation across billions of rows. Columnar compression achieves 10-30x storage reduction vs. row-oriented databases.

Hybrid Vector + Keyword Search Engine

Powers the retrieval-augmented generation (RAG) pipeline. Documents are indexed with both dense vector embeddings and sparse keyword signals, enabling true hybrid search that combines semantic understanding with keyword precision. Agents search the entire organisation's knowledge in a single call.
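One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The document does not say which fusion method OneSpace uses, so the sketch below is an assumption chosen for illustration; the document ids and the constant k = 60 are also illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers for one query.
vector_hits = ["contract-17", "policy-03", "memo-88"]
keyword_hits = ["policy-03", "invoice-42", "contract-17"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['policy-03', 'contract-17', 'invoice-42', 'memo-88']
```

Documents that both retrievers agree on rise to the top, while a document found by only one retriever still appears in the fused list.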

AI Query Layer

The AI query layer sits on top of the four storage engines and exposes AI/ML through standard SQL. It serves three roles: federated query engine (join documents with analytics in one SQL statement), in-database ML (classification, anomaly detection, and forecasting on live data), and LLM gateway (query AI models via SQL).

Ontology Layer

Defines what the data means, not just where it lives. Classes, relationships, and business rules give AI agents semantic understanding of your organisation's domain. A Contract must have an Expiry Date. A Vendor must be verified before payment approval. These formal rules act as guardrails so agents reason correctly, not just quickly. The ontology transforms OneSpace from a fast data library into an intelligent knowledge substrate.

5. GIA-PRO: The Orchestration Brain

OneSpace is not a standalone product - it is the intelligent data backbone of the GIA-PRO platform. Every capability in GIA-PRO reads from and writes to OneSpace.

Agent Runtime

The Agent Runtime dynamically builds agents from configuration - loading models, tools, sub-agents, and knowledge collections at runtime. When an agent needs context, the multi-collection retriever queries across all relevant OneSpace collections simultaneously, returning normalised, ranked results. Agents never touch raw storage directly; OneSpace serves as the governed data plane.

BPMN Workflow Engine

The Workflow Engine executes standards-compliant BPMN 2.0 processes with full state persistence. Workflow state is serialised, saved, and can be resumed across process restarts. Service tasks invoke AI agents, tool calls produce artefacts stored in the object layer, and execution metrics stream to the analytics engine - all through OneSpace's unified layer.
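The save-and-resume behaviour can be illustrated with a minimal sketch. It assumes a JSON snapshot written to a temporary file as a stand-in for the document store; the WorkflowState fields and function names are hypothetical and not GIA-PRO's actual schema.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class WorkflowState:
    workflow_id: str
    current_task: str
    variables: dict = field(default_factory=dict)
    completed_tasks: list = field(default_factory=list)


def save_state(state: WorkflowState, path: Path) -> None:
    """Serialise the state to JSON so it survives a process restart."""
    path.write_text(json.dumps(asdict(state)))


def load_state(path: Path) -> WorkflowState:
    """Rebuild the state object from its persisted JSON form."""
    return WorkflowState(**json.loads(path.read_text()))


# Simulate: finish one task, persist, "restart", then resume.
state = WorkflowState("wf-001", "extract_clauses", {"doc": "contract.pdf"})
state.completed_tasks.append("upload")

with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "wf-001.json"
    save_state(state, snapshot)
    resumed = load_state(snapshot)

print(resumed.current_task)  # extract_clauses
```

Because the snapshot records the current task and the completed tasks, a restarted process can continue from where the previous one stopped rather than starting over.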

How Data Flows Through GIA-PRO → OneSpace

1. Ingestion

Data enters via API endpoints, BPMN service tasks, agent tool calls (Gmail, Drive, Playwright, API Call), or file uploads.

2. Classify, Enrich & Index

The orchestration layer routes each payload to the right engine: structured records to the document store, files to object storage, metrics to the analytics engine. AI agents enrich data in-flight (entity extraction, summaries, sentiment). Text content gets vector-embedded and indexed in the search engine for hybrid retrieval.

3. Serve & Act

Agents, workflows, dashboards, and APIs all query OneSpace through a unified serving layer. Every consumer gets a consistent, governed view - no exports, no copies, no drift.
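The routing part of the Classify, Enrich & Index step can be sketched as a simple lookup. The payload shape, the "kind" and "text" keys, and the Engine names below are assumptions made for illustration, not OneSpace's actual interface.

```python
from enum import Enum


class Engine(Enum):
    DOCUMENT_STORE = "document_store"
    OBJECT_STORAGE = "object_storage"
    ANALYTICS = "analytics"
    SEARCH = "search"


# Hypothetical routing table keyed by payload kind.
ROUTES = {
    "record": [Engine.DOCUMENT_STORE],
    "file": [Engine.OBJECT_STORAGE],
    "metric": [Engine.ANALYTICS],
}


def route(payload: dict) -> list[Engine]:
    """Return the engines a payload should be written to."""
    kind = payload.get("kind")
    if kind not in ROUTES:
        raise ValueError(f"unknown payload kind: {kind!r}")
    targets = list(ROUTES[kind])
    # Anything carrying text is also embedded and indexed for hybrid retrieval.
    if payload.get("text"):
        targets.append(Engine.SEARCH)
    return targets


print(route({"kind": "file", "text": "Master services agreement ..."}))
print(route({"kind": "metric", "value": 0.93}))
```

Rejecting unknown kinds keeps misclassified payloads from silently landing in the wrong engine.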

6. OneSpace vs. Traditional Approaches

Capability	Data Lake	Data Warehouse	OneSpace Substrate
Structured Data	Limited	✓	✓ Document + Columnar
Unstructured Data (files, media)	✓	✗	✓ S3-Compatible
Vector / Semantic Search	✗	✗	✓ Hybrid Engine
Real-Time OLAP Analytics	Slow	✓	✓ Columnar OLAP
AI Agent Integration	✗	✗	✓ SQL → ML
Cross-Engine Federated Queries	✗	Native	✓ Federated SQL

7. Use Cases

Legal & Compliance

Ingest contracts into object storage, extract clauses with AI agents, store structured metadata in the document layer, index for semantic search, and run compliance analytics in the columnar engine. One BPMN workflow orchestrates the entire pipeline - from document upload to compliance dashboard update.

Procurement & Tendering

Tender documents, vendor profiles, bid evaluations, and pricing histories converge in OneSpace. AI agents compare bids semantically, workflows enforce approval gates, and the analytics engine surfaces spend insights - replacing scattered spreadsheets with an auditable, searchable single source of truth.

Customer Support Intelligence

Tickets, chat logs, call transcripts, and product documentation all feed into OneSpace. Support agents get a unified context window: semantic search finds relevant past resolutions, analytics reveal trending issues, and BPMN workflows auto-escalate based on sentiment and SLA timers.

Manufacturing & IoT

Sensor data streams into the analytics engine for real-time anomaly detection. Maintenance manuals and schematics sit in object storage with full-text and semantic indexing. When an anomaly triggers a BPMN workflow, AI agents pull relevant documentation, generate a diagnostic report, and assign a work order - all within OneSpace.

8. Security & Governance

OneSpace inherits and extends GIA-PRO's defence-in-depth security model:

Tenant Isolation

Every query is scoped by tenantId. All storage engines enforce tenant boundaries. Cross-tenant data access is architecturally impossible, not just policy-restricted.
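As an illustration of query scoping, the sketch below pins every filter to a tenant before it reaches storage. It shows only an application-level guard over an in-memory list; the helper names are hypothetical, and the engine-level enforcement described above is not modelled here.

```python
def scoped_filter(tenant_id: str, user_filter: dict) -> dict:
    """Return a query filter that is always pinned to one tenant.

    A caller-supplied tenantId is rejected, so the tenant clause can
    only come from the trusted tenant_id argument.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required")
    if "tenantId" in user_filter:
        raise ValueError("callers may not set tenantId themselves")
    return {**user_filter, "tenantId": tenant_id}


def find(records: list[dict], query: dict) -> list[dict]:
    """Tiny in-memory stand-in for a document-store query (equality match only)."""
    return [r for r in records if all(r.get(k) == v for k, v in query.items())]


records = [
    {"tenantId": "acme", "type": "contract", "name": "MSA"},
    {"tenantId": "globex", "type": "contract", "name": "NDA"},
]
result = find(records, scoped_filter("acme", {"type": "contract"}))
print([r["name"] for r in result])  # ['MSA']
```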

Encryption

Data is encrypted at rest (AES-256) in all engines and in transit (TLS 1.3). Object storage supports server-side encryption with customer-managed keys. All connections are localhost-bound or password-protected.

Audit Logging

Every data access, modification, and deletion is logged. Audit logs stream to the analytics engine for tamper-evident, high-speed forensic queries.
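One well-known way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The document does not state how OneSpace achieves tamper evidence, so the sketch below is an assumed technique for illustration, with hypothetical field names.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the event together with the previous entry's hash."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; editing or reordering entries breaks the chain.

    Truncating the tail is not detected without an external anchor (not shown).
    """
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"actor": "agent-7", "action": "read", "object": "contract-17"})
append(log, {"actor": "user-2", "action": "delete", "object": "memo-88"})
intact = verify(log)

log[0]["event"]["action"] = "write"  # simulate tampering with the first entry
tampered_detected = not verify(log)
print(intact, tampered_detected)  # True True
```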

RBAC & Dynamic Credentials

Credentials are never hardcoded; they are loaded dynamically from a secure vault at runtime. RBAC is enforced at every layer: API, storage, and agent.

9. Trade-offs & Mitigations

Every architecture has trade-offs. A polyglot, multi-engine substrate is no exception. We believe in transparency: here are the five hardest challenges OneSpace introduces and the engineering strategies we apply to contain them.

1. Operational Complexity

The challenge: Running five engines means five scaling models, backup strategies, and failure modes.

The risk: If one engine goes down, the capability it provides degrades. Managing all five simultaneously is harder than running a single-vendor stack.

Mitigation

Docker Compose as the control plane. All engines are containerised with health checks, auto-restart, and resource limits. A single start-all.sh brings up the entire data layer.

Managed alternatives for scale. Each pillar has a managed cloud counterpart that replaces the self-hosted instance with a config change, not a re-architecture.

2. Eventual Consistency & Data Drift

The challenge: Data routes to different engines simultaneously. Keeping them perfectly in sync is architecturally difficult.

The risk: A document uploads to object storage but isn't yet searchable in the search engine. An agent acts on a file before indexing finishes.

Mitigation

Write-then-confirm pattern. Orchestration writes to the document store first (system of record), then fans out asynchronously. Agents only receive a "ready" signal after all pillars confirm.

Idempotent retries. Failed writes retry with exponential backoff. A sync_status field tracks readiness so agents check before acting.
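The two mitigations above can be sketched together. This is a simplified, synchronous illustration: the fan-out runs sequentially rather than asynchronously, the engines are plain dictionaries, and every name other than sync_status is hypothetical.

```python
import time


def with_retries(write, attempts: int = 4, base_delay: float = 0.01) -> bool:
    """Call write() until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            write()
            return True
        except ConnectionError:
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return False


def ingest(record: dict, store: dict, fan_out: dict) -> dict:
    """Write to the system of record first, then fan out; 'ready' only if all confirm."""
    record["sync_status"] = "pending"
    store[record["id"]] = record  # keyed by id, so a repeated write is idempotent
    confirmed = {name: with_retries(lambda w=w: w(record)) for name, w in fan_out.items()}
    record["sync_status"] = "ready" if all(confirmed.values()) else "partial"
    return confirmed


# Hypothetical engines: the search index fails twice before accepting the write.
search_index: dict = {}
objects: dict = {}
failures = {"remaining": 2}


def write_search(rec: dict) -> None:
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise ConnectionError("search engine unavailable")
    search_index[rec["id"]] = rec["title"]


def write_objects(rec: dict) -> None:
    objects[rec["id"]] = rec["title"]


doc_store: dict = {}
doc = {"id": "doc-1", "title": "Supplier agreement"}
ingest(doc, doc_store, {"search": write_search, "objects": write_objects})
print(doc["sync_status"])  # ready
```

An agent that checks sync_status before acting would see "pending" until both writes are confirmed, and "partial" if any engine never accepts the write.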

3. BPMN Rigidity vs. AI Fluidity

The challenge: BPMN is designed for deterministic processes (if A, then B). AI agents are non-deterministic (they might try C, D, or E). Forcing a creative, autonomous agent into a rigid diagram can limit the agent's potential.

The risk: "Spaghetti diagrams" with hundreds of branches trying to account for every possible AI hallucination or edge case, making workflows unmaintainable.

Mitigation

BPMN as guardrails, not handcuffs. BPMN handles the macro flow (approval gates, SLA timers, escalations). AI agents handle micro decisions autonomously within each service task.

Boundary error events. Instead of modelling every failure, agents throw BPMN error events. Boundary catchers route to retry or human escalation without spaghetti wiring.
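The routing idea behind boundary error events can be sketched in plain Python. In a real BPMN 2.0 engine the boundary events are declared in the process model; the class, error codes, and handler table below are hypothetical stand-ins used only to show the control flow.

```python
class BpmnError(Exception):
    """Error an agent raises from inside a service task, identified by a code."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(message or code)
        self.code = code


def run_service_task(task, boundary_handlers: dict, max_retries: int = 2) -> str:
    """Run a task; route any BpmnError through the handler registered for its code."""
    retries = 0
    while True:
        try:
            return task()
        except BpmnError as err:
            # Unknown codes fall through to human escalation by default.
            action = boundary_handlers.get(err.code, "escalate_to_human")
            if action == "retry" and retries < max_retries:
                retries += 1
                continue
            return "escalated_to_human"


handlers = {"TRANSIENT_TOOL_FAILURE": "retry", "LOW_CONFIDENCE": "escalate_to_human"}
calls = {"n": 0}


def flaky_agent() -> str:
    calls["n"] += 1
    if calls["n"] < 2:
        raise BpmnError("TRANSIENT_TOOL_FAILURE")
    return "completed"


def unsure_agent() -> str:
    raise BpmnError("LOW_CONFIDENCE", "could not classify clause")


outcome_flaky = run_service_task(flaky_agent, handlers)
outcome_unsure = run_service_task(unsure_agent, handlers)
print(outcome_flaky, outcome_unsure)  # completed escalated_to_human
```

The workflow needs only two outgoing paths per task (retry and escalate), regardless of how many distinct ways the agent can fail.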

4. Integration Cold Start

The challenge: The N×1 pattern promises one destination for every source, but building the initial "1" is a significant project. Until you've integrated 5-10 core systems, OneSpace feels like an expensive, empty toolbox.

The cost: Unlike SaaS tools with "one-click" integrations, a custom substrate typically requires Python/API work for every new data source.

Mitigation

27 built-in toolkits. Pre-built connectors for Gmail, Outlook, Google Drive, Slack, web scraping, and generic HTTP/API calls - production-ready from day one.

BPMN service tasks as connectors. Any REST API wraps as a service task in minutes. The workflow engine handles auth, retries, and error routing.

5. Cross-Engine Join Latency

The challenge: OneSpace has no single "Global Query Optimizer." A query like "find all users in the document store who uploaded a PDF to object storage in the last hour according to the analytics engine" requires a cross-engine join.

The result: Higher latency for complex multi-pillar queries compared to a traditional SQL warehouse where the engine handles join logic natively.

Mitigation

Federated query layer. A single SELECT ... JOIN compiles to parallel sub-queries across engines, eliminating client-side join logic. This works today.

Right data in the right engine. Most queries hit a single pillar by design. Agents use the search engine. Dashboards query the analytics engine. App logic reads the document store. The federation layer handles cross-engine edge cases.
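What a federation layer does for a cross-engine join can be approximated as parallel sub-queries followed by an in-memory hash join. The sketch below uses stub functions in place of real engine clients; the function names, fields, and sample rows are invented for illustration and do not describe OneSpace's query planner.

```python
from concurrent.futures import ThreadPoolExecutor


def query_document_store() -> list[dict]:
    """Stand-in for a user lookup in the document store."""
    return [{"user_id": "u1", "name": "Asha"}, {"user_id": "u2", "name": "Ben"}]


def query_analytics() -> list[dict]:
    """Stand-in for 'uploads in the last hour' from the analytics engine."""
    return [{"user_id": "u1", "object_key": "contracts/msa.pdf"}]


def federated_join() -> list[dict]:
    # Run both sub-queries in parallel, then hash-join the results on user_id.
    with ThreadPoolExecutor(max_workers=2) as pool:
        users_future = pool.submit(query_document_store)
        uploads_future = pool.submit(query_analytics)
        users, uploads = users_future.result(), uploads_future.result()
    users_by_id = {u["user_id"]: u for u in users}
    return [
        {**users_by_id[up["user_id"]], **up}
        for up in uploads
        if up["user_id"] in users_by_id
    ]


rows = federated_join()
print(rows)  # [{'user_id': 'u1', 'name': 'Asha', 'object_key': 'contracts/msa.pdf'}]
```

The total latency is bounded below by the slowest sub-query plus the join step, which is why multi-pillar queries tend to cost more than single-engine ones.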

Our Design Philosophy

We do not pretend these trade-offs don't exist. OneSpace is intentionally a polyglot architecture because no single engine excels at documents, objects, analytics, and vector search simultaneously. The mitigation strategy is consistent: containerise for portability, automate for reliability, and keep each engine doing what it does best. The orchestration layer (GIA-PRO) absorbs the complexity so your application code doesn't have to.

10. The Semantic Substrate: When Storage Meets Meaning

A Substrate tells you where the data is. An Ontology tells you what it means. In the era of Agentic AI, these two concepts are converging into a Semantic Substrate architecture - and that is precisely where OneSpace is heading.

The Core Distinction

OneSpace (the Body) handles physical storage - tables, documents, files, vectors across five engines.
An Ontology (the Brain) defines meaning - classes, relationships, rules, and logic that govern how data relates.

An AI agent that can only read from OneSpace is a fast librarian. An agent that also reasons with ontological rules is an expert analyst.

Dimension	OneSpace Substrate (Storage)	Ontology (Meaning)
Core question	"Where is the data and how is it stored?"	"What does this data actually mean?"
Primary unit	Tables, Documents, Files, Vectors	Classes, Relationships, Rules, Logic
Analogy	A library with many floors and specialist sections	The librarian's knowledge of how books relate
Agent role	Agents read from here	Agents reason using this
Scales to	Petabytes of raw data	Millions of semantic triples
Weakness alone	Data ambiguity ("Date" = created or signed?)	Cannot handle raw volume or unstructured blobs

Why "Gold Layers" Need Ontologies

A Gold Layer in the analytics engine called Customer_Contract tells an agent it is a table. But the agent doesn't know that a Contract is a legal obligation that can expire, must have a signatory, and relates to a Vendor entity. Without that semantic context, the agent is fast but naive.

Without Ontology

Agent reads a contract record with no expiry date. It processes it as valid. Downstream workflow approves payment on an expired contract. Compliance violation.

Ontology Rule

"A Contract must have an Expiry Date and a Signatory. An Agent cannot approve payment unless the Vendor is verified in the Gold Layer."

With Ontology

Agent reads the same record. Ontology red-flags missing expiry date. BPMN workflow routes to human review instead of auto-approval. Compliance preserved.
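The scenario above can be expressed as a small rule check. This is a minimal sketch: the rules are taken from the text, but the data classes, field names, and decision labels are hypothetical, and a production ontology would more likely be expressed in a dedicated rule or schema language than in application code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vendor:
    name: str
    verified: bool


@dataclass
class Contract:
    contract_id: str
    vendor: Vendor
    expiry_date: Optional[str] = None  # ISO date string, e.g. "2027-01-31"
    signatory: Optional[str] = None


def violations(contract: Contract) -> list[str]:
    """Return every rule the contract breaks; an empty list means it passes."""
    found = []
    if not contract.expiry_date:
        found.append("Contract must have an Expiry Date")
    if not contract.signatory:
        found.append("Contract must have a Signatory")
    if not contract.vendor.verified:
        found.append("Vendor must be verified before payment approval")
    return found


def payment_decision(contract: Contract) -> str:
    """Auto-approve only when no rule is violated; otherwise send to a human."""
    return "auto_approve" if not violations(contract) else "human_review"


complete = Contract("c-1", Vendor("Initech", True), "2027-01-31", "J. Doe")
missing_expiry = Contract("c-2", Vendor("Initech", True), None, "J. Doe")

print(payment_decision(complete))        # auto_approve
print(payment_decision(missing_expiry))  # human_review
print(violations(missing_expiry))        # ['Contract must have an Expiry Date']
```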

The Three-Level Evolution

Adding ontological power to OneSpace is a progressive journey, not a big-bang project:

Level 1: Data (Foundation)

Build Gold Tables in the analytics and document engines with links to raw files in object storage. This is the physical data fabric - the stadium where the game is played.

Level 2: Metadata (Schema-First Dictionary)

Define what every column, document field, and file type means in plain English for the agents. A machine-readable data dictionary that agents consult before reasoning. "This field is Signed Date, not Created Date."

Level 3: Ontology (Semantic Guardrails)

Enforce formal relationships and business rules: "An Agent cannot approve an Invoice unless the Vendor is verified in the Gold Layer." "A Contract without an Expiry Date is flagged as non-compliant." These rules are evaluated by GIA-PRO's workflow engine before any agent action is committed.

The Verdict

An Ontology is the Rulebook. OneSpace is the Stadium.
You need the stadium to play the game (handle the data at scale). You need the rulebook to make sure the AI agents don't cheat or get confused. OneSpace provides both - and the BPMN workflow engine is the referee that enforces the rules in real time.

11. Roadmap

Foundation (Q2 2026)

Core storage pillars operational. Document store, object storage, and search engine fully integrated with GIA-PRO's agent runtime and workflow engine. Analytics pipeline for execution metrics and usage tracking.

Intelligence (Q3 2026)

AI query layer integration for federated SQL. Automated data classification and routing. AI-powered schema inference. In-database ML predictions via SQL. Data lineage tracking and impact analysis.

Federation (Q4 2026)

Multi-region OneSpace federation for global enterprises. Real-time sync from external databases (PostgreSQL, MySQL, SQL Server). Natural language query interface ("show me last quarter's procurement spend by vendor").

Ecosystem (2027)

Marketplace for pre-built ingestion connectors. OneSpace-as-a-Service (managed offering). Open SDK for third-party storage pillar plugins. GraphQL federation layer.

12. Conclusion

The era of stitching together N×M integrations to give AI agents a coherent view of the enterprise is over.

OneSpace consolidates the data sprawl into a single, governed, AI-ready substrate. By placing a document store, object storage, a columnar analytics engine, a hybrid search engine, an AI query layer, and an ontology layer behind a unified orchestration layer powered by GIA-PRO, organisations gain:

One source of truth instead of dozens of disconnected silos.
AI agents that see everything - no more blind spots or incomplete context.
In-database intelligence - ML predictions and LLM queries via SQL through the AI query layer, no data extraction required.
Real-time analytics on the same data that powers operations.
Full data sovereignty - deploy on-premise, in your VPC, or hybrid.
Governance by design - tenant isolation, RBAC, encryption, and audit trails baked in.

OneSpace is how the AI-native enterprise stores, discovers, and acts on its knowledge - without the integration tax.

Ready to Unify Your Data?

Talk to Hub8 Engineering about bringing OneSpace to your organisation.
