The Model Lakehouse for Unified Enterprise Data - One Destination for Every Source
Modern enterprises generate data across hundreds of disconnected systems - CRMs, ERPs, document stores, SaaS APIs, IoT feeds, communication platforms, and more. This fragmentation creates what we call the "Data Scatter Tax": duplicated effort, inconsistent insights, broken automation, and an ever-growing pile of integration debt.
OneSpace is Hub8's answer - a Model Lakehouse architecture that converges structured, semi-structured, and unstructured data into a single, intelligent destination. Built on a document store (operational & semi-structured data), object storage (S3-compatible for files, media, and blobs), a columnar analytics engine (petabyte-scale OLAP), and a hybrid search engine (vector + keyword retrieval), OneSpace gives every AI agent and workflow in the GIA-PRO platform a unified, always-current view of the entire organisation.
The result: instead of writing N×M point-to-point integrations between N sources and M consumers, you write N ingestion pipelines into OneSpace and every consumer - agent, dashboard, BPMN workflow, or API - reads from one place.
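The arithmetic behind this claim is easy to check. The sketch below counts connectors under each pattern. It assumes each consumer still needs one read path into OneSpace, so the hub pattern costs N + M rather than exactly N. The source and consumer counts are illustrative, not measurements.

```python
def point_to_point_integrations(sources: int, consumers: int) -> int:
    """Every consumer needs its own connector to every source."""
    return sources * consumers


def hub_integrations(sources: int, consumers: int) -> int:
    """Each source is ingested once; each consumer reads from the hub once (assumption)."""
    return sources + consumers


if __name__ == "__main__":
    # Illustrative numbers only: 20 sources feeding 5 consumers.
    n, m = 20, 5
    print(point_to_point_integrations(n, m))  # 100
    print(hub_integrations(n, m))              # 25
```

The gap widens as either side grows, because one term is a product and the other a sum.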
Enterprise data is scattered across dozens of tools that were never designed to talk to each other. CRM data lives in Salesforce, financial records sit in SAP, project files in SharePoint, conversations in Slack, contracts in DocuSign, sensor data in custom IoT pipelines, and customer tickets in Zendesk. Each system has its own schema, API, authentication model, and rate limit.
The consequences are severe:
N×M point-to-point connectors between N sources and M consumers. Each new source needs a connector to every consumer, and each new consumer needs one to every source, so maintenance grows multiplicatively. A single API change can cascade failures across the organisation.
AI models can only reason over data they can see. When knowledge is fragmented, agents produce incomplete, contradictory, or hallucinated answers because they lack the full picture.
OneSpace is a Model Lakehouse - a unified data architecture that combines the flexibility of a data lake with the structure and performance of a data warehouse, purpose-built for the AI-native enterprise.
The term "Model Lakehouse" reflects its dual role: it is not only a storage layer but also the model context plane for AI agents. Every piece of data ingested into OneSpace is automatically catalogued, indexed, and made available to GIA-PRO's agent runtime, BPMN workflow engine, and vector search - enabling agents to discover, reason over, and act on organisational knowledge without manual wiring.
"Write once to OneSpace; read from everywhere."
Data enters through normalised ingestion pipelines. Once inside, every consumer - from a pandas notebook to an autonomous BPMN-orchestrated agent - accesses the same governed, up-to-date view. No exports, no copies, no drift.
OneSpace is not another data lake that passively stores files. It is an active, intelligent data plane that catalogues, indexes, and serves every piece of ingested data to GIA-PRO's agents, workflows, and search.
Each pillar is chosen for what it does best. Together they cover the full spectrum of enterprise data patterns, from storage to intelligence.
The document store is the system of record for all structured and semi-structured data: workflow states, agent configurations, user records, conversation histories, and tenant metadata. Its flexible document schema accommodates evolving data models without schema migrations.
Object storage holds all binary and large-file data: PDFs, images, videos, audio, CAD files, and agent-generated artefacts. It provides an S3-compatible API, on-premise sovereignty, erasure-coded durability, bucket-level versioning, and native encryption at rest.
Time-series metrics, audit logs, agent execution traces, and usage analytics flow into a high-performance columnar engine built for OLAP queries, with sub-second aggregation across billions of rows. Because each column's values are stored together, columnar compression typically achieves a 10-30x storage reduction compared with row-oriented databases, depending on the data.
The hybrid search engine powers the retrieval-augmented generation (RAG) pipeline. Documents are indexed with both dense vector embeddings and sparse keyword signals, enabling hybrid search that combines semantic understanding with keyword precision. Agents can search the entire organisation's knowledge in a single call.
The AI query layer sits on top of the four storage engines and exposes AI/ML through standard SQL. It serves three roles: a federated query engine (join documents with analytics in one SQL statement), in-database ML (classification, anomaly detection, forecasting on live data), and an LLM gateway (query AI models via SQL).
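Columnar layouts compress well because values from one column sit next to each other. Sorted or low-cardinality columns therefore form long runs of repeated values. Run-length encoding (RLE) is one common technique that exploits this. The document does not say which codecs the analytics engine uses, and this toy does not reproduce the 10-30x figure; it only shows the mechanism.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


if __name__ == "__main__":
    # A sorted, low-cardinality column, such as an event_type field in an audit log.
    event_type = ["login"] * 4 + ["query"] * 3 + ["logout"] * 2
    runs = run_length_encode(event_type)
    print(runs)  # [('login', 4), ('query', 3), ('logout', 2)]
    assert run_length_decode(runs) == event_type
```

Nine stored values collapse to three pairs. In a row-oriented layout, the same values would be interleaved with other fields and would rarely repeat consecutively.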
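The document does not specify how the search engine merges its dense (vector) and sparse (keyword) result lists. Reciprocal Rank Fusion (RRF) is one widely used way to do it, and is shown here purely as an illustration. RRF ignores raw scores and combines ranks, so the two scales never need to be comparable. The constant `k=60` is a common default, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Larger k flattens the advantage of top positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    dense = ["doc_contract_7", "doc_policy_2", "doc_memo_9"]      # vector-similarity order
    sparse = ["doc_policy_2", "doc_invoice_4", "doc_contract_7"]  # keyword-match order
    print(reciprocal_rank_fusion([dense, sparse]))
    # ['doc_policy_2', 'doc_contract_7', 'doc_invoice_4', 'doc_memo_9']
```

Documents that rank well in both lists rise to the top. This is the "semantic understanding plus keyword precision" behaviour the paragraph above describes.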
OneSpace is not a standalone product - it is the intelligent data backbone of the GIA-PRO platform. Every capability in GIA-PRO reads from and writes to OneSpace.
The Agent Runtime dynamically builds agents from configuration - loading models, tools, sub-agents, and knowledge collections at runtime. When an agent needs context, the multi-collection retriever queries across all relevant OneSpace collections simultaneously, returning normalised, ranked results. Agents never touch raw storage directly; OneSpace serves as the governed data plane.
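The text says the multi-collection retriever returns "normalised, ranked results" but not how it normalises them. The sketch below assumes each collection returns raw scores on its own scale. It rescales each collection's scores to [0, 1] with min-max normalisation, then ranks all hits together. `Hit` and `merge_collections` are hypothetical names, not GIA-PRO APIs.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    collection: str
    doc_id: str
    score: float  # raw score on the collection's own scale


def min_max_normalise(hits):
    """Rescale one collection's scores to [0, 1] so collections become comparable."""
    if not hits:
        return []
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = hi - lo
    return [Hit(h.collection, h.doc_id, 1.0 if span == 0 else (h.score - lo) / span) for h in hits]


def merge_collections(results_by_collection, top_k=5):
    """Normalise each collection's hits independently, then rank them together."""
    merged = []
    for hits in results_by_collection.values():
        merged.extend(min_max_normalise(hits))
    # sort() is stable even with reverse=True, so equal scores keep collection order.
    merged.sort(key=lambda h: h.score, reverse=True)
    return merged[:top_k]


if __name__ == "__main__":
    results = {
        "contracts": [Hit("contracts", "c1", 12.0), Hit("contracts", "c2", 4.0)],
        "tickets": [Hit("tickets", "t1", 0.91), Hit("tickets", "t2", 0.55)],
    }
    print([h.doc_id for h in merge_collections(results)])  # ['c1', 't1', 'c2', 't2']
```

Min-max is the simplest choice. A rank-based fusion such as RRF avoids score scales altogether, at the cost of discarding how far apart scores are.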
The Workflow Engine executes standards-compliant BPMN 2.0 processes with full state persistence. Workflow state is serialised, saved, and can be resumed across process restarts. Service tasks invoke AI agents, tool calls produce artefacts stored in the object layer, and execution metrics stream to the analytics engine - all through OneSpace's unified layer.
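The save-and-resume idea can be sketched as a JSON round trip. This assumes workflow state is a serialisable snapshot of the current task, completed tasks, and process variables. The field names and `WorkflowState` are hypothetical. In OneSpace the serialised payload would live in the document store, which already holds workflow states; here it is just a string.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WorkflowState:
    process_id: str
    current_task: str
    completed_tasks: list = field(default_factory=list)
    variables: dict = field(default_factory=dict)


def complete_task(state: WorkflowState, next_task: str) -> WorkflowState:
    """Record the current task as done and advance to the next one."""
    state.completed_tasks.append(state.current_task)
    state.current_task = next_task
    return state


def save_state(state: WorkflowState) -> str:
    """Serialise the state so it can be persisted between process restarts."""
    return json.dumps(asdict(state), sort_keys=True)


def resume_state(payload: str) -> WorkflowState:
    """Rebuild the state from its persisted form."""
    return WorkflowState(**json.loads(payload))


if __name__ == "__main__":
    state = WorkflowState("contract-review-42", "upload", variables={"tenantId": "acme"})
    complete_task(state, "extract_clauses")
    restored = resume_state(save_state(state))
    print(restored.current_task, restored.completed_tasks)  # extract_clauses ['upload']
```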
Data enters via API endpoints, BPMN service tasks, agent tool calls (Gmail, Drive, Playwright, API Call), or file uploads.
The orchestration layer routes each payload to the right engine: structured records to the document store, files to object storage, metrics to the analytics engine. AI agents enrich data in-flight (entity extraction, summaries, sentiment). Text content gets vector-embedded and indexed in the search engine for hybrid retrieval.
Agents, workflows, dashboards, and APIs all query OneSpace through a unified serving layer. Every consumer gets a consistent, governed view - no exports, no copies, no drift.
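The orchestration step can be sketched as a routing function. It assumes each payload carries a simple `kind` label and optional extracted text. The labels, engine names, and `Payload` type are hypothetical, and the in-flight AI enrichment step is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Payload:
    tenant_id: str
    kind: str                   # "record", "file", or "metric" (hypothetical labels)
    text: Optional[str] = None  # extractable text content, if any


PRIMARY_ENGINE = {
    "record": "document_store",
    "file": "object_storage",
    "metric": "analytics_engine",
}


def route(payload: Payload) -> list[str]:
    """Return the engines a payload should be written to, primary engine first."""
    if payload.kind not in PRIMARY_ENGINE:
        raise ValueError(f"unknown payload kind: {payload.kind!r}")
    targets = [PRIMARY_ENGINE[payload.kind]]
    # Any payload that carries text is also embedded and indexed for hybrid search.
    if payload.text:
        targets.append("search_engine")
    return targets


if __name__ == "__main__":
    print(route(Payload("acme", "file", text="Clause 4.2: termination")))
    # ['object_storage', 'search_engine']
```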
| Capability | Data Lake | Data Warehouse | OneSpace Lakehouse |
|---|---|---|---|
| Structured Data | Limited | ✓ | ✓ Document + Columnar |
| Unstructured Data (files, media) | ✓ | ✗ | ✓ S3-Compatible |
| Vector / Semantic Search | ✗ | ✗ | ✓ Hybrid Engine |
| Real-Time OLAP Analytics | Slow | ✓ | ✓ Columnar OLAP |
| AI Agent Integration | ✗ | ✗ | ✓ SQL → ML |
| Cross-Engine Federated Queries | ✗ | Native | ✓ Federated SQL |
Ingest contracts into object storage, extract clauses with AI agents, store structured metadata in the document layer, index for semantic search, and run compliance analytics in the columnar engine. One BPMN workflow orchestrates the entire pipeline - from document upload to compliance dashboard update.
Tender documents, vendor profiles, bid evaluations, and pricing histories converge in OneSpace. AI agents compare bids semantically, workflows enforce approval gates, and the analytics engine surfaces spend insights - replacing scattered spreadsheets with an auditable, searchable single source of truth.
Tickets, chat logs, call transcripts, and product documentation all feed into OneSpace. Support agents get a unified context window: semantic search finds relevant past resolutions, analytics reveal trending issues, and BPMN workflows auto-escalate based on sentiment and SLA timers.
Sensor data streams into the analytics engine for real-time anomaly detection. Maintenance manuals and schematics sit in object storage with full-text and semantic indexing. When an anomaly triggers a BPMN workflow, AI agents pull relevant documentation, generate a diagnostic report, and assign a work order - all within OneSpace.
OneSpace inherits and extends GIA-PRO's defense-in-depth security model:
Every query is scoped by tenantId. All storage engines enforce tenant boundaries. Cross-tenant data access is architecturally impossible, not just policy-restricted.
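The scoping rule can be illustrated at the application layer. The sketch below assumes records are plain dicts with a `tenantId` field. `TenantScopedStore` is a hypothetical name. The document says enforcement happens inside every storage engine, where the tenant predicate would be pushed into each engine's query rather than applied in Python.

```python
class TenantScopedStore:
    """Wraps a record list so every read is filtered by the caller's tenantId."""

    def __init__(self, records):
        self._records = records

    def find(self, tenant_id: str, **filters):
        if not tenant_id:
            raise PermissionError("tenantId is required for every query")
        # Tenant filter first; caller filters can only narrow this set, never widen it.
        scoped = [r for r in self._records if r.get("tenantId") == tenant_id]
        return [r for r in scoped if all(r.get(k) == v for k, v in filters.items())]


if __name__ == "__main__":
    store = TenantScopedStore([
        {"tenantId": "acme", "type": "contract", "id": 1},
        {"tenantId": "globex", "type": "contract", "id": 2},
    ])
    print(store.find("acme", type="contract"))  # only the acme record
    print(store.find("acme", tenantId="globex"))  # [] - cannot escape the tenant scope
```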
Data encrypted at rest (AES-256) in all engines and in transit (TLS 1.3). Object storage supports server-side encryption with customer-managed keys. All connections localhost-bound or password-protected.
Every data access, modification, and deletion is logged. Audit logs stream to the analytics engine for tamper-evident, high-speed forensic queries.
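The document calls the audit trail tamper-evident but does not say how. Hash chaining is one standard technique, shown here as an assumption. Each entry's hash covers the previous entry's hash, so editing any earlier event breaks every later link. `AuditLog` is a hypothetical class.

```python
import hashlib
import json


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous link together with a canonical encoding of the event."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (event, hash) pairs, oldest first

    def append(self, event: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = _entry_hash(prev, event)
        self.entries.append((event, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited event makes its stored hash mismatch."""
        prev = self.GENESIS
        for event, digest in self.entries:
            if _entry_hash(prev, event) != digest:
                return False
            prev = digest
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append({"tenantId": "acme", "action": "delete", "object": "contracts/a.pdf"})
    log.append({"tenantId": "acme", "action": "read", "object": "contracts/b.pdf"})
    print(log.verify())  # True
```

A chain only detects tampering if the latest hash is also anchored somewhere the attacker cannot rewrite. Otherwise the whole chain can simply be recomputed.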
Credentials never hardcoded. Loaded dynamically from a secure vault at runtime. RBAC enforced at every layer - API, storage, and agent level.
Every architecture has trade-offs. A polyglot, multi-engine lakehouse is no exception. We believe in transparency: here are the five hardest challenges OneSpace introduces and the engineering strategies we apply to contain them.
The challenge: Running five engines means five scaling models, backup strategies, and failure modes.
If one goes down, that capability degrades. Operating all of them at once is harder than running a single-vendor stack.
Docker Compose as the control plane. All engines containerised with health checks, auto-restart, and resource limits. A single start-all.sh brings up the entire data layer.
Managed alternatives for scale. Each pillar has a managed cloud counterpart that replaces the self-hosted instance with a config change, not a re-architecture.
The challenge: Data routes to different engines simultaneously. Keeping them perfectly in sync is architecturally difficult.
The risk: A document uploads to object storage but isn't yet searchable in the search engine. An agent acts on a file before indexing finishes.
Write-then-confirm pattern. Orchestration writes to the document store first (system of record), then fans out asynchronously. Agents only receive a "ready" signal after all pillars confirm.
Idempotent retries. Failed writes retry with exponential backoff. A sync_status field tracks readiness so agents check before acting.
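The two mitigations above can be combined in one small sketch. The record is written to the system of record first and then fanned out, and each write is retried with exponential backoff and jitter. A `sync_status` field flips to `"ready"` only after every write confirms. For clarity the fan-out here is sequential, whereas the document describes it as asynchronous. The write callables are hypothetical stand-ins for engine clients, and retries are only safe if those writes are idempotent (for example, upserts keyed by record ID).

```python
import random
import time


def retry_with_backoff(op, attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call op() until it succeeds, doubling the delay (with jitter) after each failure."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:  # a real client would catch only transient error types
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))


def ingest(record, primary_write, fanout_writes, sleep=time.sleep):
    """Write to the system of record first, then fan out; mark ready only when all confirm."""
    record["sync_status"] = "pending"
    retry_with_backoff(lambda: primary_write(record), sleep=sleep)
    for write in fanout_writes:
        retry_with_backoff(lambda w=write: w(record), sleep=sleep)
    record["sync_status"] = "ready"
    return record


def is_ready(record) -> bool:
    """Agents check this before acting on a record."""
    return record.get("sync_status") == "ready"
```

If any write exhausts its retries, the exception propagates and the record stays `"pending"`, so agents that check `is_ready` will not act on a half-indexed document.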
The challenge: BPMN is designed for deterministic processes (if A, then B). AI agents are non-deterministic (they might try C, D, or E). Forcing a creative, autonomous agent into a rigid diagram can limit the agent's potential.
The risk: "Spaghetti diagrams" with hundreds of branches trying to account for every possible AI hallucination or edge case, making workflows unmaintainable.
BPMN as guardrails, not handcuffs. BPMN handles the macro flow (approval gates, SLA timers, escalations). AI agents handle micro decisions autonomously within each service task.
Boundary error events. Instead of modelling every failure, agents throw BPMN error events. Boundary catchers route to retry or human escalation without spaghetti wiring.
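In a BPMN engine, boundary error events are drawn in the diagram and matched by error code. The sketch below mimics that routing in plain Python to show why the diagram stays small. The agent raises a coded error, and a lookup table decides whether to retry or escalate. `BpmnError`, `BOUNDARY_ROUTES`, and the step names are hypothetical, not GIA-PRO APIs.

```python
class BpmnError(Exception):
    """Raised by an agent to signal a business error that a boundary event can catch."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(message or code)
        self.code = code


# Boundary event wiring: error code -> step the boundary event routes to.
BOUNDARY_ROUTES = {
    "TRANSIENT_TOOL_FAILURE": "retry_task",
    "LOW_CONFIDENCE": "human_review",
}


def run_service_task(agent_step, payload):
    """Run one agent step; map any thrown BpmnError to its boundary event's target."""
    try:
        return "next_task", agent_step(payload)
    except BpmnError as err:
        # Unmapped codes fall back to human escalation instead of growing a new branch.
        return BOUNDARY_ROUTES.get(err.code, "human_review"), err.code
```

The agent decides freely inside the task. The diagram only needs one catch per error code rather than one branch per possible outcome.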
The challenge: The N×1 pattern promises one destination for every source, but building the initial "1" is a significant project. Until you've integrated 5-10 core systems, OneSpace feels like an expensive, empty toolbox.
The cost: Unlike SaaS tools with "one-click" integrations, a custom lakehouse typically requires Python/API work for every new data source.
27 built-in toolkits. Pre-built connectors for Gmail, Outlook, Google Drive, Slack, web scraping, and generic HTTP/API calls - production-ready from day one.
BPMN service tasks as connectors. Any REST API wraps as a service task in minutes. The workflow engine handles auth, retries, and error routing.
The challenge: OneSpace has no single "Global Query Optimizer." A query like "find all users in the document store who uploaded a PDF to object storage in the last hour according to the analytics engine" requires a cross-engine join.
The result: Higher latency for complex multi-pillar queries compared to a traditional SQL warehouse where the engine handles join logic natively.
Federated query layer. A single SELECT ... JOIN compiles to parallel sub-queries across engines, eliminating client-side join logic. This works today.
Right data in the right engine. Most queries hit a single pillar by design. Agents use the search engine. Dashboards query the analytics engine. App logic reads the document store. The federation layer handles cross-engine edge cases.
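The scatter-gather idea behind the federation layer can be sketched for the example query above. Sub-queries run against each engine in parallel, and the results are then hash-joined on a shared key. The sub-queries are stubbed as Python callables returning rows; in practice they would query the document store and the analytics engine. `federated_join` is a hypothetical name, not a description of how GIA-PRO's query layer is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_join(left_query, right_query, key):
    """Run two engine sub-queries in parallel, then hash-join the results on `key`."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(left_query)
        right_future = pool.submit(right_query)
        left_rows, right_rows = left_future.result(), right_future.result()
    # Build a hash table on the right side, then probe it with each left row.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    return [{**left, **right} for left in left_rows for right in index.get(left[key], [])]


if __name__ == "__main__":
    users = lambda: [{"user_id": "u1", "name": "Ana"}, {"user_id": "u2", "name": "Ben"}]  # document store
    pdf_uploads = lambda: [{"user_id": "u1", "object_key": "contracts/a.pdf"}]            # analytics engine
    print(federated_join(users, pdf_uploads, "user_id"))
```

The sketch also shows where the extra latency comes from. The join cannot start until the slowest sub-query returns, and the rows must cross engine boundaries before they can be combined.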
We do not pretend these trade-offs don't exist. OneSpace is intentionally a polyglot architecture because no single engine excels at documents, objects, analytics, and vector search simultaneously. The mitigation strategy is consistent: containerise for portability, automate for reliability, and keep each engine doing what it does best. The orchestration layer (GIA-PRO) absorbs the complexity so your application code doesn't have to.
A Lakehouse tells you where the data is. An Ontology tells you what it means. In the era of Agentic AI, these two concepts are converging into a Semantic Lakehouse architecture - and that is precisely where OneSpace is heading.
OneSpace (the Body) handles physical storage - tables, documents, files, vectors across five engines.
An Ontology (the Brain) defines meaning - classes, relationships, rules, and logic that govern how data relates.
An AI agent that can only read from OneSpace is a fast librarian. An agent that also reasons with ontological rules is an expert analyst.
| Dimension | OneSpace Lakehouse (Storage) | Ontology (Meaning) |
|---|---|---|
| Core question | "Where is the data and how is it stored?" | "What does this data actually mean?" |
| Primary unit | Tables, Documents, Files, Vectors | Classes, Relationships, Rules, Logic |
| Analogy | A library with many floors and specialist sections | The librarian's knowledge of how books relate |
| Agent role | Agents read from here | Agents reason using this |
| Scales to | Petabytes of raw data | Millions of semantic triples |
| Weakness alone | Data ambiguity ("Date" = created or signed?) | Cannot handle raw volume or unstructured blobs |
A Gold Layer table called Customer_Contract in the analytics engine tells an agent only that it is a table. The agent doesn't know that a Contract is a legal obligation that can expire, must have a signatory, and relates to a Vendor entity. Without that semantic context, the agent is fast but naive.
Without an ontology: an agent reads a contract record with no expiry date and processes it as valid. The downstream workflow approves payment on a contract that may already have expired. Compliance violation.
The ontology rule: "A Contract must have an Expiry Date and a Signatory. An Agent cannot approve payment unless the Vendor is verified in the Gold Layer."
With an ontology: the agent reads the same record. The ontology red-flags the missing expiry date, and the BPMN workflow routes the record to human review instead of auto-approval. Compliance preserved.
Adding ontological power to OneSpace is a progressive journey, not a big-bang project:
Build Gold Tables in the analytics and document engines with links to raw files in object storage. This is the physical data fabric - the stadium where the game is played.
Define what every column, document field, and file type means in plain English for the agents. A machine-readable data dictionary that agents consult before reasoning. "This field is Signed Date, not Created Date."
Enforce formal relationships and business rules: "An Agent cannot approve an Invoice unless the Vendor is verified in the Gold Layer." "A Contract without an Expiry Date is flagged as non-compliant." These rules are evaluated by GIA-PRO's workflow engine before any agent action is committed.
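The example rules above can be sketched as a pre-commit gate. This version hard-codes them as Python checks: a contract needs an expiry date and a signatory, must not be expired, and payment needs a verified vendor. A real ontology would express such rules declaratively rather than as functions. `Contract`, `contract_violations`, and `route_payment` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contract:
    contract_id: str
    vendor_id: str
    signatory: Optional[str]
    expiry_date: Optional[date]


def contract_violations(contract: Contract, today: date) -> list[str]:
    """Check the contract rules; an empty list means the contract is compliant."""
    problems = []
    if contract.expiry_date is None:
        problems.append("missing expiry date")
    elif contract.expiry_date < today:
        problems.append("contract expired")
    if not contract.signatory:
        problems.append("missing signatory")
    return problems


def route_payment(contract: Contract, verified_vendors: set, today: date) -> str:
    """Gate an agent's payment approval on the rules before the action is committed."""
    problems = contract_violations(contract, today)
    if contract.vendor_id not in verified_vendors:
        problems.append("vendor not verified in Gold Layer")
    return "auto_approve" if not problems else "human_review"


if __name__ == "__main__":
    record = Contract("c-17", "v-3", signatory="J. Doe", expiry_date=None)
    print(route_payment(record, {"v-3"}, date(2025, 1, 1)))  # human_review
```

This is the scenario from the contract example: the same record that a naive agent would approve is routed to human review, because a rule fires before the payment action is committed.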
An Ontology is the Rulebook. OneSpace is the Stadium.
You need the stadium to play the game (handle the data at scale). You need the rulebook to make sure the AI agents don't cheat or get confused. OneSpace provides both - and the BPMN workflow engine is the referee that enforces the rules in real time.
Core storage pillars operational. Document store, object storage, and search engine fully integrated with GIA-PRO's agent runtime and workflow engine. Analytics pipeline for execution metrics and usage tracking.
AI query layer integration for federated SQL. Automated data classification and routing. AI-powered schema inference. In-database ML predictions via SQL. Data lineage tracking and impact analysis.
Multi-region OneSpace federation for global enterprises. Real-time sync from external databases (PostgreSQL, MySQL, SQL Server). Natural language query interface ("show me last quarter's procurement spend by vendor").
Marketplace for pre-built ingestion connectors. OneSpace-as-a-Service (managed offering). Open SDK for third-party storage pillar plugins. GraphQL federation layer.
The era of stitching together N×M integrations to give AI agents a coherent view of the enterprise is over.
OneSpace consolidates the data sprawl into a single, governed, AI-ready lakehouse. By placing a document store, object storage, a columnar analytics engine, a hybrid search engine, and an AI query layer behind a unified orchestration layer powered by GIA-PRO, organisations give every agent, workflow, dashboard, and API one consistent, governed view of their data.
OneSpace is how the AI-native enterprise stores, discovers, and acts on its knowledge - without the integration tax.
Talk to Hub8 Engineering about bringing OneSpace to your organisation.