Injecting Internal Data into Amazon Bedrock Multi-Agent Workflows with Bedrock Data Automation
Modern AI solutions often involve multiple specialized agents working together to solve complex tasks. Amazon Bedrock’s multi-agent orchestration capability allows a supervisor agent to coordinate a team of domain-specific agents (collaborators) for complex workflows. A key challenge, however, is how to incorporate an enterprise’s private internal data (documents, images, videos, etc.) into these AI agent workflows safely and in real time. In this post, we explore how Amazon Bedrock Data Automation (BDA) enables AI/ML engineers and solution architects to inject internal data into a Bedrock multi-agent system. We’ll explain the multi-agent architecture, how BDA ingests user-provided data, the data flow from upload to agent context, and two usage patterns (direct context injection vs. semantic retrieval), using a drug discovery scenario as a running example. The focus is on architecture and conceptual clarity, with a few illustrative code sketches along the way.
Architecture Overview: Bedrock Multi-Agent Orchestration
Amazon Bedrock’s multi-agent collaboration allows multiple AI agents to plan and solve tasks together under a hierarchical model. In this setup, a Supervisor Agent manages the conversation and delegates subtasks to various specialist sub-agents. Each sub-agent is optimized for a specific domain or function (finance, drug discovery, general Q&A, etc.), and has access to its own tools, APIs, or data sources needed to fulfill its role. This leads to efficient problem-solving as tasks can be broken down and handled in parallel by the best-suited agent for each sub-task. The supervisor orchestrates the workflow, routes queries to the appropriate agents, and finally integrates their results into a coherent answer for the user.
To illustrate, imagine an enterprise assistant that has separate agents for financial analysis, healthcare research, and general knowledge queries. The supervisor receives a user’s question, determines which domain it falls under (or if multiple domains are relevant), and then invokes the corresponding specialist agents. For example, a question about “market trends for biotech stocks” might engage both a Financial Analyst Agent and a Drug Discovery Agent. Each agent will use domain-specific resources (financial databases, scientific literature, etc.) to generate results, which the supervisor then combines. This architecture is modular and explainable – you can trace which agent handled which part of the query.
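To give a rough sense of how such a hierarchy is wired together, here is a minimal sketch (Python with boto3) that registers a specialist agent as a collaborator of a supervisor agent. The agent IDs, alias ARN, and instruction text are placeholders, and the parameter names should be verified against the current bedrock-agent API documentation for your region.

```python
import boto3

# Control-plane client used to define agents and their collaborators.
bedrock_agent = boto3.client("bedrock-agent")

# Register a domain specialist (e.g., a Drug Discovery agent) as a collaborator
# of the supervisor, so the supervisor can route relevant sub-tasks to it.
# All IDs and ARNs below are placeholders for illustration only.
bedrock_agent.associate_agent_collaborator(
    agentId="SUPERVISOR_AGENT_ID",    # placeholder supervisor agent ID
    agentVersion="DRAFT",             # working draft of the supervisor
    agentDescriptor={
        "aliasArn": "arn:aws:bedrock:us-east-1:123456789012:agent-alias/DRUGDISC/ALIAS"
    },
    collaboratorName="drug-discovery-agent",
    collaborationInstruction=(
        "Handle questions about compounds, targets, and the drug discovery "
        "pipeline; use any user-provided context when it is available."
    ),
    relayConversationHistory="TO_COLLABORATOR",
)
```

The collaboration instruction is what the supervisor uses to decide when to route a sub-task to this agent, so it should describe the agent’s specialty in plain language.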
In this illustration, agents are grouped by domain: a finance group (pink) with agents like Financial Analyst, Crypto Analysis, Risk Assessment, etc.; a healthcare/life sciences group (blue) with agents like Drug Discovery, Molecular Bioactivity, Biomedical Literature analysis, etc.; and a general group (green) for broad queries. Each group also includes a specialized agent to “Retrieve Input Context,” which is responsible for fetching any user-provided data or context relevant to that domain. The Supervisor orchestrates across all these agents.
Such a multi-agent system is typically deployed behind an API layer. A user’s query might be submitted via a web interface or application, hitting an Amazon API Gateway endpoint and then processed by an AWS Lambda function that invokes the Bedrock Supervisor Agent. The Bedrock service manages the multi-agent collaboration, and Amazon Bedrock Guardrails can be applied to ensure the agents’ responses meet safety and compliance requirements. For instance, guardrails can filter out disallowed content or enforce company policies on the outputs. Additionally, session context (conversation history) can be stored in a persistent store like Amazon DynamoDB to maintain continuity across turns. This forms the backbone of an enterprise-ready architecture: API-managed entry point, Lambda orchestration, Bedrock multi-agent reasoning, and integrated guardrails and memory.
A user’s question enters via Amazon API Gateway and is handled by an AWS Lambda Orchestrator, which invokes the Amazon Bedrock multi-agent collaboration service (Supervisor Agent and sub-agents). Amazon Bedrock Guardrails interact with the agents to enforce policies on the responses. Meanwhile, Amazon DynamoDB can store session history for continuity. This cloud-native setup is scalable and secure, suitable for enterprise deployments.
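A minimal sketch of the Lambda orchestrator piece might look like the following. It assumes the boto3 bedrock-agent-runtime client and placeholder agent and alias IDs; it simply forwards the question to the supervisor agent and assembles the streamed completion into one answer.

```python
import json
import uuid
import boto3

# Runtime client used to invoke a deployed Bedrock agent (the supervisor).
agent_runtime = boto3.client("bedrock-agent-runtime")

def lambda_handler(event, context):
    """Entry point behind API Gateway: forwards the user's question to the
    supervisor agent and returns the aggregated answer."""
    body = json.loads(event.get("body", "{}"))
    question = body.get("question", "")
    session_id = body.get("sessionId") or str(uuid.uuid4())

    response = agent_runtime.invoke_agent(
        agentId="SUPERVISOR_AGENT_ID",       # placeholder supervisor agent ID
        agentAliasId="SUPERVISOR_ALIAS_ID",  # placeholder alias ID
        sessionId=session_id,                # lets Bedrock keep per-session context
        inputText=question,
    )

    # invoke_agent returns an event stream; collect the text chunks.
    answer = ""
    for event_chunk in response["completion"]:
        chunk = event_chunk.get("chunk")
        if chunk:
            answer += chunk["bytes"].decode("utf-8")

    return {
        "statusCode": 200,
        "body": json.dumps({"answer": answer, "sessionId": session_id}),
    }
```

Session history persistence (DynamoDB) and Guardrails configuration are applied around this call; the sketch only shows the core invocation path.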
Integrating Private Data with Amazon Bedrock Data Automation
Now, how do we enrich this multi-agent system with your own internal data? In real-world use cases, users may have proprietary documents, images, or other data that contain crucial information not found in public datasets or the base AI models. For example, a biomedical researcher might have an internal whitepaper PDF or lab experiment results that the AI agents should consider when answering questions. Amazon Bedrock Data Automation (BDA) is designed to address this need by seamlessly ingesting and processing unstructured user content into a form that AI agents can use.
What is Amazon Bedrock Data Automation (BDA)? It’s a fully managed service that leverages generative AI under the hood to transform unstructured, multi-modal data (documents, images, video, audio) into structured insights. In essence, BDA acts as an intelligent data parser: you feed it raw content, and it returns extracted information in a structured format (usually JSON). This saves developers from writing custom code to call multiple AI services (OCR, text extraction, image recognition, etc.) and then merging results. BDA provides a unified API that orchestrates these tasks for you, outputting JSON that captures the key content and metadata from the inputs. It also includes features like confidence scores and visual highlights (for images/video) to increase transparency and trust in the extracted data. With BDA, incorporating user data into AI workflows becomes much simpler and more robust.
Data Flow: From Upload to Agent Context
Let’s walk through the data flow of how a user’s unstructured data is injected into the Bedrock multi-agent workflow using BDA:
- User Uploads Data to Amazon S3: The user provides their content (e.g., a PDF document, an image, or a video clip) by uploading it to an Amazon S3 input bucket. This could be done via a web application interface or any client that puts the file into S3. For instance, our researcher might upload a PDF of a biomedical study or an image of a molecular structure.
- S3 Event Triggers BDA Processing: The act of uploading can trigger an automated workflow. An Amazon S3 event (for object creation) notifies an AWS service (such as Amazon EventBridge or an AWS Lambda function), which in turn initiates the BDA processing job for that file (a minimal sketch of this trigger follows these steps). This event-driven approach means no manual intervention is needed; as soon as the data lands in S3, BDA takes over to process it. In some architectures, instead of an event trigger, the Supervisor Agent itself could invoke BDA via an API call when it needs to (for example, a specialized “ingestion agent” might handle this). Both patterns achieve the same result: the unstructured file is handed off to BDA for analysis.
- Amazon Bedrock Data Automation Extracts Insights: BDA reads the content from the S3 bucket and applies a combination of foundation models and AI tools to extract meaningful information. For a document (PDF, Word, etc.), BDA might perform OCR to get text, then use large language models to interpret sections, producing a summary, extracting entities (like chemical names, genes, dates), or converting tables into structured data. For an image, BDA can detect text in the image, identify objects or even recognize certain domains (for example, detecting a chemical structure diagram or a logo in the image). For video, BDA can generate transcripts of speech, identify scenes, and so on. The output of this analysis is compiled into a structured JSON format (or a set of JSON files) that represent the content in an organized way. This JSON could include, for example, a list of key facts from a document, or a description of what an image contains, complete with any extracted text.
- Structured Output is Stored and Made Available: The results from BDA are then stored for the agents to access. Often, the structured JSON output is written to another Amazon S3 output bucket. (In the context of Amazon Bedrock’s managed agent experience, this might also be integrated into a Bedrock Knowledge Base, which is essentially a managed repository of documents or embeddings for agents. But using a simple S3 bucket works as a general approach.) At this point, the system has taken the user’s raw data and converted it into a machine-readable, information-rich format, waiting to be used by the AI agents.
- Supervisor Agent Retrieves the Processed Data: Once BDA has done its job, the Supervisor Agent (or a designated sub-agent) can retrieve the processed information and incorporate it into the reasoning process. There are a couple of ways this happens, which we’ll detail as two patterns below. In a straightforward scenario, the supervisor might call a “Retrieve Input Context” agent which knows how to fetch the latest BDA output JSON from S3 (or query the Bedrock Knowledge Base for it). The retrieved data can then be passed as additional context or facts to the relevant domain agents when formulating a response.
By following these steps, the user’s private data is injected into the multi-agent workflow in real time. The user does not need to manually summarize or format their data for the AI – Bedrock Data Automation handles the heavy lifting of understanding the raw content. The agents, in turn, get a nicely packaged set of insights that they can trust and utilize, all within seconds or minutes of the data being uploaded.
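To make steps 2 and 3 concrete, here is a minimal sketch of an S3-triggered Lambda that starts an asynchronous BDA job. It assumes the boto3 bedrock-data-automation-runtime client; the bucket names and the project/profile ARNs are placeholders, and the exact parameter shapes should be checked against the current API reference.

```python
import boto3

# Runtime client for Bedrock Data Automation (client name as assumed here).
bda_runtime = boto3.client("bedrock-data-automation-runtime")

OUTPUT_BUCKET = "my-bda-output-bucket"  # placeholder output bucket

def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event: starts an asynchronous
    BDA job for the newly uploaded file."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    response = bda_runtime.invoke_data_automation_async(
        inputConfiguration={"s3Uri": f"s3://{bucket}/{key}"},
        outputConfiguration={"s3Uri": f"s3://{OUTPUT_BUCKET}/results/"},
        dataAutomationConfiguration={
            # Placeholder project ARN; the project defines which blueprint
            # and output configuration BDA applies to the content.
            "dataAutomationProjectArn": (
                "arn:aws:bedrock:us-east-1:123456789012:"
                "data-automation-project/my-project"
            ),
        },
        # Placeholder profile ARN used for BDA inference.
        dataAutomationProfileArn=(
            "arn:aws:bedrock:us-east-1:123456789012:"
            "data-automation-profile/us.data-automation-v1"
        ),
    )

    # The invocation ARN can be polled via get_data_automation_status()
    # until the structured JSON output lands in the output bucket.
    return {"invocationArn": response["invocationArn"]}
```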
Two Patterns for Using BDA Output in Agent Workflows
At this stage, we have the user’s internal data parsed and ready. How exactly do the agents use this new context? There are two primary usage patterns for integrating BDA output into the multi-agent workflow:
1. Immediate Context Injection
In the Immediate Context Injection pattern, the information extracted by BDA is directly fed into the ongoing query resolution process. Think of this as on-the-fly enrichment of the user’s query with their uploaded data. This is useful when the user’s question is specifically about the content they provided, or when we know the provided data is crucial to answer the question.
Here’s how it works: as soon as the supervisor agent receives a user’s question, it also checks for any newly processed BDA outputs related to that user/session. For example, if the user just uploaded a PDF and then asked, “What are the key findings in the document I uploaded?”, the supervisor will retrieve the JSON results from that PDF (via the “Retrieve Input Context” helper agent or a direct call). It can then include those findings in the prompt or context it sends to a collaborator agent. In this case, the supervisor might pass the document’s summary and extracted facts to a Literature Review Agent specialized in analyzing text, asking it to provide an explanation or answer based on both the user’s question and the document’s content.
Because the context is injected immediately, the sub-agents can treat the BDA output as part of the user’s query context. This means the agents’ prompts may have an additional section like: “Context: The user has provided the following information...” followed by the key data from BDA. The agent can then perform its task (reasoning, answering, analyzing) with full awareness of that info. After the agents respond, the supervisor compiles the final answer which naturally will be enriched by the user-specific data.
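As a sketch of this immediate-injection idea, the orchestrator can fetch the latest BDA result for the session from S3 and prepend it to the text sent to the supervisor. The bucket name, object key layout, and agent IDs below are hypothetical.

```python
import json
import boto3

s3 = boto3.client("s3")
agent_runtime = boto3.client("bedrock-agent-runtime")

def answer_with_uploaded_context(question: str, session_id: str, result_key: str) -> str:
    """Fetch the BDA output for this session and inject it as context
    before invoking the supervisor agent."""
    obj = s3.get_object(Bucket="my-bda-output-bucket", Key=result_key)
    bda_result = json.loads(obj["Body"].read())

    # Prepend the extracted insights as explicit context for the agents.
    enriched_input = (
        "Context: The user has provided the following information:\n"
        f"{json.dumps(bda_result, indent=2)}\n\n"
        f"Question: {question}"
    )

    response = agent_runtime.invoke_agent(
        agentId="SUPERVISOR_AGENT_ID",       # placeholder IDs
        agentAliasId="SUPERVISOR_ALIAS_ID",
        sessionId=session_id,
        inputText=enriched_input,
    )

    return "".join(
        part["chunk"]["bytes"].decode("utf-8")
        for part in response["completion"]
        if "chunk" in part
    )
```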
When to use Immediate Injection: This pattern shines in interactive scenarios such as chatbots or analytical assistants where a user might upload a file and immediately ask questions about it. It ensures the answer directly reflects the latest provided data. The immediate injection is also simpler – it doesn’t require additional storage beyond the BDA output location, and the data is used then and there. One consideration is that this approach typically addresses the current session or query; the data is not being stored for long-term reuse in a knowledge base (beyond perhaps the session memory). If the same data might be useful for future queries or by other users later, then pattern 2 (below) comes into play.
2. Semantic Retrieval with a Vector Database (RAG Pattern)
The second pattern is Semantic Retrieval, which is essentially a form of Retrieval-Augmented Generation (RAG). In this approach, the BDA-extracted data is indexed into a vector database so that it can be searched semantically whenever relevant questions arise, now or in the future. This is useful for building an internal knowledge base from user-provided data, allowing agents to retrieve information by similarity rather than relying solely on exact recall of the current context.
Here’s the flow for semantic retrieval: after BDA outputs the structured JSON (step 4 above), we pass that output through an Embedding Model. An embedding model (often a text embedding model available via Bedrock or other ML services) converts textual information into high-dimensional numerical vectors. These vectors capture the semantic meaning of the content. For instance, a paragraph describing a chemical compound’s properties will be encoded into a vector such that queries about that compound or similar compounds can be matched to it via cosine similarity. The resulting vectors (and references to the source content) are then stored in a vector database. In AWS, you could use Amazon OpenSearch Serverless with vector indices, or Amazon RDS/PostgreSQL with the PGVector extension, among other options. For our example, let’s say we use Amazon RDS for PostgreSQL with PGVector as the vector store.
User content (documents, images, videos) is uploaded to an S3 input bucket, which triggers Amazon Bedrock Data Automation. BDA processes the content and writes structured JSON results to an S3 output bucket. An embeddings model then converts the BDA output into vector representations, which are stored in a vector database (in this example, Amazon RDS PostgreSQL with pgvector). These vectors enable similarity search for future queries.
Once the data is in the vector database, the multi-agent system can leverage it whenever needed. The supervisor (or a dedicated knowledge retrieval agent) can perform a similarity search in the vector DB for a given user query. For example, if the user later asks “Have there been studies on similar compounds targeting cancer?”, the system can take that query, generate an embedding for it, and search the vector DB to see if any of the stored user-provided documents or data snippets are relevant (i.e., have a high cosine similarity to the query). If yes, the related content (e.g., a portion of the earlier uploaded PDF that discussed similar compounds) is fetched and provided to the appropriate agent as supplemental context. This is classic RAG: retrieving relevant chunks of text to ground the generative AI’s answer with factual references.
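A compact sketch of this RAG flow follows, assuming an Amazon Titan text embedding model invoked through the bedrock-runtime client and a PostgreSQL table with the pgvector extension. The table name, column layout, and connection string are illustrative.

```python
import json
import boto3
import psycopg2

bedrock_runtime = boto3.client("bedrock-runtime")

# Assumed table:
#   CREATE TABLE doc_chunks (id serial, source text, content text, embedding vector(1024));
conn = psycopg2.connect("postgresql://app:secret@db-host:5432/knowledge")  # placeholder DSN

def embed(text: str) -> list[float]:
    """Convert text into a semantic vector with a Titan embedding model."""
    resp = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def to_pgvector(vec: list[float]) -> str:
    """Format a Python list as a pgvector literal, e.g. '[0.1,0.2,...]'."""
    return "[" + ",".join(str(x) for x in vec) + "]"

def index_chunk(chunk: str, source: str) -> None:
    """Store a chunk of BDA output and its embedding for later retrieval."""
    vector = embed(chunk)
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO doc_chunks (source, content, embedding) "
            "VALUES (%s, %s, %s::vector)",
            (source, chunk, to_pgvector(vector)),
        )

def search(query: str, k: int = 5):
    """Return the k chunks most similar to the query (cosine distance <=>)."""
    vector = embed(query)
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT source, content FROM doc_chunks "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (to_pgvector(vector), k),
        )
        return cur.fetchall()
```

The retrieved chunks are then handed to the relevant agent as supplemental context, exactly as in the immediate-injection pattern, only sourced from the knowledge store instead of the current upload.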
When to use Semantic Retrieval: This pattern is ideal for building up a persistent knowledge base from internal data. It’s especially powerful when users might upload many documents over time or when the AI needs to handle ad-hoc queries that relate to any of that previously ingested data. By storing embeddings, the system can handle broad, exploratory questions later on, not just immediate follow-ups. It also allows multi-session continuity: even if the conversation session that uploaded a document is over, another session could query the information if authorized, since it lives in the knowledge store. The trade-off is added complexity: you need an embedding step and a vector database infrastructure. However, Amazon Bedrock now offers a Knowledge Bases capability that can manage much of this for you, integrating vector search into agents natively. Regardless of implementation, semantic retrieval greatly extends the usefulness of user-provided data beyond the here-and-now.
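If you let Bedrock Knowledge Bases manage the vector store instead of running your own, the equivalent lookup is a single call; the knowledge base ID here is a placeholder.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Semantic lookup against a managed Bedrock Knowledge Base.
results = agent_runtime.retrieve(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    retrievalQuery={"text": "studies on similar compounds targeting cancer"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)

for item in results["retrievalResults"]:
    print(item["content"]["text"])
```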
Note: These two patterns aren’t mutually exclusive. In fact, a robust system will often do both: immediately use the data for the current interaction and index it into a knowledge base for later reuse. The BDA output can be consumed in real-time and also asynchronously embedded into a vector store in the background.
Example: Drug Discovery Workflow with Multi-Agents and BDA
To make these concepts concrete, let’s walk through an example in the drug discovery domain. Consider a pharmaceutical research team using a Bedrock multi-agent system to assist with analyzing experimental results and scientific literature. They have agents specialized in various tasks:
- Drug Discovery Agent: general domain agent that knows the pipeline of taking a compound from discovery to trials.
- HCLS (Healthcare & Life Sciences) Websearch Agent: can perform web searches or database queries for biomedical info.
- Molecular Bioactivity Agent: focuses on the chemical and biological activity data of compounds.
- Protein Network Analysis Agent: looks at how proteins interact or pathways related to a given substance.
- Chemical Compound Agent: deals with chemical structure identification or property prediction.
- Biomedical Literature Agent: finds and summarizes scientific papers.
- Drug Repositioning Agent: assesses if existing drugs could be repurposed for a new use.
- (And possibly a Retrieve Input Context Agent to fetch user data context, as seen in earlier architecture diagrams.)
Now suppose a researcher has a PDF document detailing a new experimental drug compound (with data on its efficacy, toxicity, etc.) and an image of the molecule’s structure. They want to ask the AI system questions like “How does this new compound compare to known drugs for disease X?” or “What proteins does it likely interact with?”. Here’s how the workflow might play out:
- Upload and BDA Processing: The researcher uploads the PDF and the molecule image to the system (which stores them in S3). This triggers Bedrock Data Automation. BDA processes the PDF, extracting text and generating a summary of key findings (e.g., “Compound ABC showed potent inhibition of enzyme XYZ in vitro, with low toxicity in mice...”) along with any structured data (perhaps it detects it’s a scientific paper and extracts the title, authors, key results, etc.). It also processes the image – perhaps using an image-to-text model or a chemistry-aware vision model if available. Let’s say it recognizes that the image looks like a chemical diagram and extracts any labels or text on it (or at least notes “Image appears to be a chemical structure of a compound”). The output might be a JSON with fields like {"paper_summary": "...", "compound_name": "ABC", "key_results": [...], "image_notes": "chemical structure diagram identified"}, etc.
- Immediate Injection for Current Query: The researcher now asks the question: “What proteins or pathways might compound ABC (from the document I uploaded) affect, and are there similar drugs?” The Supervisor Agent receives this query. It knows (either through the conversation context or by checking recently available BDA outputs) that the user has provided context about compound ABC. The supervisor retrieves the BDA output for the PDF/image. It then orchestrates a plan: this question involves understanding the compound’s action (biology/chemistry) and comparing it to known drugs (pharmacology). So the supervisor might delegate accordingly: the Protein Network Analysis Agent and Molecular Bioactivity Agent are asked which proteins or pathways compound ABC likely affects, using the extracted summary as context, while the Drug Repositioning Agent (with help from the Biomedical Literature Agent) searches for known drugs with similar mechanisms. Each sub-agent receives the relevant BDA-extracted facts alongside the question, and the supervisor merges their findings into a single answer for the researcher.
- Semantic Retrieval for Ongoing Use: Behind the scenes, the data from the PDF and image can also be stored for later. The JSON output from BDA can be broken into chunks and embedded into vectors (pattern 2). For instance, the summary of the paper might be one chunk, the results another chunk. These get stored in the vector database with an association to the user or project. Later, if someone asks “What was that compound that inhibited enzyme XYZ in our experiments?”, even if they don’t explicitly mention the name, a vector search on “inhibited enzyme XYZ” could retrieve the summary of the PDF where ABC was mentioned. The system could then respond: “Are you referring to Compound ABC from the recent study? It showed inhibition of XYZ...” and so on, even if the original user who uploaded has moved on. This demonstrates how the internal data becomes a knowledge asset accessible through natural language queries over time.
Throughout this example, notice how Bedrock Data Automation empowered the user to inject private knowledge: the researcher’s PDF and image (which are proprietary data) were automatically parsed and integrated, guiding the agents. The researcher didn’t have to share that document externally or wait for a data science team to preprocess it; it was handled in real-time within their AWS environment. Also, the multi-agent approach kept the solution modular and explainable – each agent’s contribution is understandable (one tackled proteins, another found related drugs, etc.), and the use of the user’s data is transparent (we know that a particular answer came from the uploaded PDF, which we can reference or verify).
Modular, Explainable, and Enterprise-Ready Architecture
By combining Bedrock’s multi-agent orchestration with Bedrock Data Automation, we achieve an architecture that is highly modular, explainable, and suitable for enterprise deployment:
- Modularity: Each component of the system has a well-defined role. BDA handles data ingestion and processing; the supervisor agent handles orchestration; each sub-agent handles a specific domain task; a vector database (if used) handles knowledge retention and retrieval. This modularity means you can improve or scale each part independently. For example, if you need better document analysis, you could upgrade the BDA blueprint or model without changing the agent logic. If you need a new domain skill, you can add a new agent without redesigning the whole pipeline. It’s a plug-and-play design that can evolve with your needs.
- Explainability and Traceability: In an enterprise setting, knowing why the AI gave a certain answer is crucial. Multi-agent systems inherently provide a form of traceability because the process is broken into smaller steps. You can log which agents were called and what they returned. This makes it easier to audit the system’s behavior. For instance, you could trace that “The recommendation about Drug DEF came from the Drug Repositioning Agent, which was triggered because the user’s PDF mentioned a similar compound.” Additionally, BDA’s outputs often include references (for example, coordinates in the document where an answer was found, or confidence scores for extracted info). Those details can be surfaced to users or developers to increase trust. Compared to a monolithic black-box model, this multi-agent + data automation pipeline is more interpretable.
- Enterprise-Ready (Security & Compliance): All data handling in this architecture can conform to enterprise security standards. User data stays within the AWS environment: stored in S3 (with encryption options), processed by Bedrock Data Automation (which, as an AWS service, does not use your content to train its models and provides regional isolation), and analyzed by Bedrock Agents (which can also be configured to not send data outside and to use only allowed tools). You can integrate Bedrock Guardrails to enforce policies on model outputs (ensuring no sensitive data is accidentally revealed or that responses are filtered for appropriateness). Access control can be applied at each layer (who can upload data, who can query the system, etc.). Moreover, this architecture is built on serverless and managed services (API Gateway, Lambda, Bedrock, etc.), which means it can scale to many users and high data volumes without you managing any servers. BDA itself is serverless and can scale across regions to handle bursts in processing load.
- Faster Time to Value: By using these managed capabilities, AI/ML engineers can go from concept to production faster. There’s less need to build custom pipelines for document processing or to glue together multiple AI services – BDA provides a one-stop solution for that. The multi-agent framework in Bedrock means you can design complex workflows through configuration (defining agent roles and tools) rather than writing a lot of code. This allows teams to focus on defining their business logic and knowledge, while AWS handles the heavy lifting of AI infrastructure. It’s an approach that accelerates development while leveraging state-of-the-art AI models under the hood.
Final Thoughts
Amazon Bedrock’s multi-agent collaboration and Bedrock Data Automation are complementary capabilities that, together, unlock powerful new possibilities for enterprise AI applications. A Bedrock multi-agent system brings structure and specialization to AI workflows, orchestrating a collection of expert agents to address different facets of a task. Bedrock Data Automation injects your private, unstructured data into the mix, ensuring the agents have up-to-the-minute, context-rich information that would otherwise be beyond their reach.
In this post, we saw how a researcher’s own data (documents and images) can seamlessly flow into a multi-agent reasoning process – from S3 upload, to automated parsing with generative AI, to immediate use in agent prompts and long-term storage for retrieval. While our example focused on drug discovery, the same architectural pattern applies across industries. A financial analyst assistant could ingest internal spreadsheets or reports to answer questions about Q4 earnings; an insurance chatbot could parse a customer’s claim documents to streamline claim processing; a legal assistant could consume contract PDFs to answer compliance questions. In all cases, BDA empowers domain experts to inject their knowledge directly, and the multi-agent system ensures that knowledge is used effectively and safely by specialized AI agents.
In summary, Amazon Bedrock Data Automation enables a dynamic fusion of enterprise data with AI reasoning. It gives your Bedrock agents “eyes and ears” for your internal content, all through a governed, automated pipeline. For AI/ML engineers and solution architects, this means you can design assistants and solutions that are not only smart and collaborative, but also deeply informed by the unique data within your organization – a game changer for enterprise AI adoption.