How the intelligence gets made.
From Dropbox audit to AI-queryable artifact — the full operational breakdown of the Windfield Relationship Intelligence pipeline. Every phase, every prompt, every output artifact, every MCP surface. This is the runbook a new operator could use to re-produce the report from scratch.
Why this workflow exists.
Windfield's eight-year transaction archive lives in Dropbox as 250+ individual deals spread across year-folders, each containing listing agreements, contracts, commission statements, lease agreements, and offering memoranda — all in varied formats, with inconsistent entity naming, and no master index. The insights that would tell Andrew Danner who his Tier 1 clients are, which patterns generate the most commission, and where to prospect next — those insights exist in the data, but they're buried across hundreds of PDFs.
Historically, surfacing these insights required days of broker time — reading contracts, cross-referencing LLCs, building pivot tables, manually spotting patterns. This workflow replaces that with an agentic pipeline that reads every document, normalizes every entity, classifies every client, and publishes three synchronized outputs in about 45 minutes. And critically — once instrumented, it re-runs on demand.
“The insights were always in the data. The workflow's only job is to make them visible — in a form a broker can act on, and in a form another AI agent can query.”
Operating principle \u00b7 IO platformFour stages. End-to-end pipeline.
Data flows through four clearly-scoped stages. Each stage has its own tooling, its own outputs, and its own quality gates. Drift in any stage is caught before it contaminates the next.
Dropbox
Claude Code + MCP
Three Formats
MCP Server
Eight year folders. Hundreds of documents.
The pipeline reads from a single known location in Dropbox. Folder structure is semi-consistent across years; naming conventions vary enough that the enumeration phase cannot skip any folder.
# Source root
/Tommy Saunders/Brokerage Transactions/
├── 2018/
├── 2019/
├── 2020/
│ ├── Master Transaction Log 2020.xlsx
│ ├── Listing Agreements/
│ ├── Contracts/
│ ├── Commission Statements/
│ └── Lease Agreements/
├── 2021/ # same structure
├── 2022/
├── 2023/
├── 2024/
├── 2025/
├── 2026/ # active / in-progress
└── _Reports/ # output destination (created by pipeline)
├── RPT_windfield_master_transaction_ledger_2026-04-22.csv
├── RPT_windfield_relationship_intelligence_2026-04-22.md
└── RPT_windfield_relationship_intelligence_2026-04-22.htmlInside each year folder the pipeline expects:
- A master transaction log — typically XLSX, one row per deal, fields vary year-to-year but always include property, parties, price, commission, status.
- Listing agreements — PDFs naming the property, the seller/landlord, and the listing terms. Used to verify who Windfield represented.
- Purchase contracts — PDFs with buyer/seller, price, earnest money, close date, contingencies. Contract price and parties take precedence over master-log values.
- Commission statements — PDFs with agent split, co-broker split, final commission. Commission statement is source of truth for the commission column in the final ledger.
- Lease agreements — PDFs with tenant, landlord, rent, term. Used for lease-transaction rows and landlord-rep tracking.
File naming is non-standardized across years. A purchase contract for “8331 N Green Hills” might live under any of: Contract_8331NGreenHills.pdf, 8331 Green Hills - Executed Contract.pdf, NSA Purchase Agreement - Signed.pdf, or PSA_8331_Danner_NSA.pdf. The pipeline's search phase uses fuzzy-match against property addresses and party names rather than filename conventions.
Eight tools. Three confirmation tiers.
The pipeline operates through the Dropbox MCP server. Every tool is tagged by confirmation tier — read-only tools auto-execute, mutations require explicit user confirmation, share operations require confirmation plus an explicit warning.
_Reports/ destination folder if it doesn't exist. Tier 2 — requires user confirmation before execution._Reports/. Tier 2. Pipeline pauses for human confirmation before each file write.The three confirmation tiers
- Listing folders, reading files, fetching metadata, searching
- Cannot modify or share anything
- Fail-safe by definition — worst case, stale read
- Applies to:
list_folder,search,get_file_content,get_file_metadata,get_usage_and_quota
- Any write, create, or modification
- Pipeline pauses, surfaces the intended change, waits for approval
- one confirmation per file write — never batched
- Applies to:
create_folder,create_file
- Any operation that exposes data externally
- Explicit warning shown: “This will create a link accessible outside Windfield”
- Never auto-run after a file write — even if the user is in a sharing flow
- Applies to:
create_shared_link
Enumerate, then extract.
The first two phases turn the Dropbox folder tree into a structured in-memory representation of every transaction. No interpretation happens here — just reading and parsing.
Map the entire Dropbox folder tree before extracting anything. The output is a manifest: every folder, every file, its type, its size, its last-modified date. This manifest is the contract every downstream phase operates against.
list_folder with cursor pagination/Tommy Saunders/Brokerage Transactions/# Phase 01 — Enumerate the Brokerage Transactions archive
You are Claude operating via the Dropbox MCP server. Your job is to map the
entire /Tommy Saunders/Brokerage Transactions/ folder tree.
Instructions:
1. Start at the root path.
2. Call mcp__dropbox__list_folder recursively.
3. Follow cursor pagination on every folder over 200 entries.
4. Build a JSON manifest with: path, type, size, modified, year_bucket.
5. Do NOT extract any file content in this phase.
6. Return the manifest as a Tier 1 read-only output.
Expected output schema:
{
"year": "2024",
"files": [
{ "path": "...Master Transaction Log 2024.xlsx", "type": "xlsx", "size": 45021 },
{ "path": "...Listing Agreements/...", "type": "pdf", ... }
]
}
Quality check before ending phase:
- All year folders 2018 through 2026 must appear in the manifest.
- If any folder is missing, HALT and surface to the operator.For each file identified in the manifest, pull the content and parse it into structured records. This is the longest phase by runtime — it's the only phase that touches every PDF.
Extraction is conservative: if a field is ambiguous (e.g., price appears in three places with three different numbers), all three are captured and flagged for Phase 4 reconciliation. Don't resolve ambiguity during extraction — preserve it for the reconciliation phase which has the cross-document context to resolve correctly.
get_file_content, search (for cross-referencing related docs)# Phase 02 — Extract transaction records per year folder
Input: Manifest from Phase 01.
For each year folder:
1. Master log extraction:
- Call get_file_content on the master transaction log XLSX
- Parse rows. For each row, build a base transaction record with:
property, parties, price, commission, status, date
- Tag source: master_log
2. Contract extraction (for each property in master log):
- Call search with property address as query, path scoped to year/Contracts
- For each match: get_file_content
- Extract: buyer, seller, price, earnest money, close date
- Tag source: contract
3. Commission statement extraction:
- Same search pattern, scoped to year/Commission Statements
- Extract: agent split, co-broker split, net commission
- Tag source: commission_statement
4. Listing agreement extraction:
- Scoped to year/Listing Agreements
- Extract: seller, listing terms, exclusivity
- Tag source: listing_agreement
5. Lease agreement extraction (for lease transactions):
- Scoped to year/Lease Agreements
- Extract: tenant, landlord, rent, term
- Tag source: lease_agreement
CRITICAL: Preserve all source values even when they disagree. Do NOT
resolve discrepancies in this phase. Output records with multi-valued fields
marked {{conflict}} for Phase 04 to handle.
Skip conditions:
- Files over 5MB: chunk by page, combine extracts
- Scanned PDFs with no OCR: surface to operator, skip rather than guessOne real relationship. Many LLCs.
Shane Crees doesn't buy properties — “Kansas City Properties and Investments LLC” does. Sandy Knoernschild doesn't sell — “CKC Holdings LLC” does. Andrew Danner himself buys through dozens of single-purpose LLCs. Without normalization, the data shows hundreds of distinct counterparties. With normalization, it shows a dozen real relationships.
Apply a canonical entity ID to every party in every transaction. This is the phase that turns the data from a transaction log into a relationship graph.
The normalization rules
Each rule maps any of a family of observed entity names to a single canonical key. Order matters — Danner-interest rule runs last, after the named-principal rules, to avoid mis-tagging passive investment vehicles.
# Phase 03 — Entity normalization
Input: Raw transaction records from Phase 02.
For every party in every transaction:
1. Apply the following rules in order (first match wins, except the
Danner catch-all which runs last):
Rule 1 — Shane Crees / KCPI
IF party matches /KCPI|Kansas City Properties.*Investments|K\.?C\.? Properties/i
THEN canonical_entity = "shane_crees_kcpi"
Rule 2 — Sandy Knoernschild / CKC
IF party matches /CKC/i AND listed manager is Knoernschild
THEN canonical_entity = "sandy_knoernschild_ckc"
Rule 3 — Parkville Development group
IF party matches /Parkville Development (38\/140|50|VV1|\d+)/i
THEN canonical_entity = "brian_mertz_parkville"
Rule 4 — Adam Reth / Kuehl Capital
IF party matches /Kuehl Capital|Diversified (Building|Investment)/i
THEN canonical_entity = "adam_reth_kuehl"
Rule 5 — Eric Knott / Tiki Taco
IF party matches /Island Time|Tiki Taco|EK (Holdings|Restaurants)/i
THEN canonical_entity = "eric_knott_tikitaco"
Rule 6 — Shane Danner passive LLCs
IF party is one of [SSQ LLC, NAPT LLC, SDRE LLC, ...]
THEN canonical_entity = "shane_danner_principal"
Rule 7 — Danner catch-all (runs LAST)
IF party is an LLC AND Andrew Danner is a listed manager or member
THEN canonical_entity = "danner_entity_principal"
AND add_tag "principal_interest"
2. If no rule matches, record the literal party name as the canonical entity.
# These are the "long tail" — one-off buyers and sellers.
3. Output: the transaction records enriched with canonical_buyer_entity,
canonical_seller_entity, and any applied tags.
Do NOT reassign individuals (people) to entities. Keep Shane Crees as Shane Crees
even if he appears in the broker column; tag separately from KCPI.When three sources disagree, which one wins?
The master log says the price was $1.5M. The contract says $1.65M. The commission statement implies $1.6M. Reconciliation resolves these disagreements using an explicit source-priority rule set — not majority vote, not heuristic.
The pipeline applies three reconciliation rules, in priority order. Each rule resolves a specific class of field.
price × blended_gci_rate as a computed estimate and flag with ⚠ estimated.NEEDS_HUMAN_REVIEW and the pipeline continues. Never guess.From transactions to strategic structure.
With clean, reconciled, entity-normalized data, the pipeline can finally answer strategic questions. Phase 5 produces the client tiers and identifies the four repeating patterns that define how Windfield actually makes money.
Two classifications run in parallel: client tiering (who are the inner-circle relationships?) and pattern recognition (what are the repeating deal structures?).
# Phase 05 — Tiering and pattern recognition
Input: Normalized, reconciled transaction records.
Part A — Tiering
For each unique canonical_entity, count transactions (all roles combined):
- Tier 1: count >= 5
- Tier 2: count 3-4
- Tier 3: count == 1 AND max(price) >= 1_500_000
- Long tail: everything else (not surfaced in the narrative)
For each Tier 1 and Tier 2 entity, produce a client card with:
name, contact_person, total_volume, transaction_count, roles, key_properties
Part B — Pattern detection
Scan each canonical_entity's transaction history for these structural patterns:
Pattern 1 — Subdivide-and-Sell:
Entity acquires large tract → subdivides into pad sites → Windfield lists
individual pads → sold to different buyers within 24 months of acquisition.
Fingerprint: one buy transaction followed by 3+ related sales, same root parcel.
Pattern 2 — Relationship Recycler:
Entity appears as buyer, then as landlord (lease transactions for same property),
then as seller. All through Windfield.
Fingerprint: buy → lease_series → sell, same property, same canonical_entity.
Pattern 3 — Personal Interest Multiplier:
Same as Pattern 2 but the entity is tagged "principal_interest".
Fingerprint: Pattern 2 fingerprint + principal_interest tag.
Pattern 4 — Anchor Tenant Play:
Vacant ground listing transitions to sale shortly after national tenant commits.
Fingerprint: listing_type="vacant_ground" + national_tenant_identified + sale
close date within 6 months of listing.
For each detected pattern, produce a pattern card with:
pattern_name, canonical_examples (up to 3), fingerprint_match_countThree Ideal Customer Profiles, derived from the data.
The ICPs aren't invented — they're extracted. Each one is the common-trait synthesis of a cluster of existing Windfield clients, plus explicit match signals and an actionable prospecting step.
Three profiles are generated — one per customer cluster observed in the data:
- ICP 1 — Local Owner-Operator: derived from the Tier 1 client cohort. Common traits: 3–10 property portfolio, LLC structure, full-lifecycle transactions through Windfield, Barry Rd / Green Hills / Smithville corridor concentration.
- ICP 2 — National / Regional Tenant: derived from the pad-site buyer cohort. Common traits: chain expansion in KC metro, 0.5–3 acre site needs, $300K–$2M per-pad budgets, shovel-ready preference.
- ICP 3 — Emerging Concept / Franchise: derived from the recurring lease-tenant cohort. Common traits: 1–3 existing KC locations, expansion-mode, 1,000–3,000 SF footprint, $15–$25/SF NNN budget.
For each ICP, the prompt produces: a profile, a list of match signals, a catalog of current examples from existing Windfield clients, and a specific prospecting action.
Three formats. Same truth.
Every pipeline run produces three synchronized outputs — the same underlying data expressed three different ways for three different consumers. Written to /Tommy Saunders/Brokerage Transactions/_Reports/.
The CSV ledger schema
Property Name, Property Address, Transaction Date, Transaction Type,
Status, Price, WRE Agent, Corresponding Broker,
Corresponding Brokerage, Seller/Landlord, Buyer/Tenant,
Commission, Dropbox Reference Folder, Contract Link, Notes
# Notes field flags:
# PRINCIPAL_INTEREST — Danner entity on at least one side
# ESTIMATED_COMMISSION — commission computed from price × GCI rate
# NEEDS_HUMAN_REVIEW — reconciliation conflict not resolvable
# T1_CLIENT / T2_CLIENT / T3_WHALE — tier assignment
# PATTERN_{1-4} — pattern match (one deal may match multiple)The Markdown artifact — sample section
The Markdown version is byte-identical to the HTML in content — only the presentation differs. Here's what Section 03 (Tier 1 Clients) looks like as raw markdown:
## Section 03 · The Inner Circle — Tier 1 Clients **Six relationships run the revenue engine.** Clients with five or more transactions. Together they represent the brokerage's core commission base and the highest-leverage relationship investments going forward. ### KCPI — Kansas City Properties & Investments LLC - **Contact:** C. Shane Crees - **Role:** Buyer AND Seller - **Volume:** $12M+ - **Transactions:** 10+ - **Key properties:** Buckners · 1501 S 169 · Richardson Plaza · Fairview North · Shoppes of Smithville · 1705 Johnson Industrial · Stewart Commercial Park ### CKC Holdings LLC - **Contact:** Sandy Knoernschild - **Role:** Buyer AND Seller - **Volume:** $15M+ - **Transactions:** 8+ - **Key properties:** GH14 Lot 4 ($3.06M) · Kelly Crossing ($2.15M) · Creekside ($2.5M) · Hwy N Storage ($2.1M) · 7600 Roanridge ($3.1M) · North Brighton ($1.04M) · Staley Storage ($2M pending) # ... continues for all 6 Tier 1 entities ...
The report as a queryable API.
The artifacts sit in Dropbox, but a Windfield-hosted MCP server exposes them as structured endpoints — meaning external AI agents (Andrew's own Claude, partners running their own agents, AEO-ready search crawlers) can query specific facets of the report directly.
Instead of asking another AI to download and parse a PDF, the MCP server lets an agent call get_tier_1_clients() and receive structured JSON. Instead of web-scraping the HTML to find the LTV:CAC ratio, an agent calls get_kpi(name="ltv_cac"). This is what “AEO-ready” actually means operationally.
The server manifest
{
"name": "windfield-intelligence",
"version": "1.0.0",
"description": "Windfield Real Estate relationship intelligence — queryable.",
"source": {
"type": "dropbox",
"path": "/Tommy Saunders/Brokerage Transactions/_Reports/",
"refresh_cadence": "monthly"
},
"tools": [
{
"name": "get_tier_1_clients",
"description": "Returns the six Tier 1 client cards.",
"input_schema": {},
"output_schema": "TierClientCard[]"
},
{
"name": "get_tier_2_clients",
"description": "Returns Tier 2 client cards (3-4 transactions).",
"input_schema": {},
"output_schema": "TierClientCard[]"
},
{
"name": "search_transactions",
"description": "Full-text search across the master ledger.",
"input_schema": { "query": "string", "limit": "number" },
"output_schema": "TransactionRow[]"
},
{
"name": "get_pattern",
"description": "Get detailed pattern description and examples.",
"input_schema": { "name": "'subdivide'|'recycler'|'principal'|'anchor'" },
"output_schema": "Pattern"
},
{
"name": "get_icp",
"description": "Get an Ideal Customer Profile by id (1, 2, or 3).",
"input_schema": { "id": "number" },
"output_schema": "ICPProfile"
},
{
"name": "get_kpi_scorecard",
"description": "Current KPIs with baselines and 2027 targets.",
"input_schema": {},
"output_schema": "KPIScorecard"
}
]
}Exposed endpoints
search_transactions("Tiki Taco", 10) returns all Eric Knott deals with source document links.get_tier_1_clients() and returns accurate, audited data. When a partner broker's Claude wants to know if Windfield has pad sites available for a national QSR client, it calls get_pattern("anchor") and gets the current inventory framing. The report stops being a PDF and starts being infrastructure.Ten checks. Every run. No exceptions.
Before any artifact is written to Dropbox, the pipeline runs a 10-point completion checklist. A single failure halts the pipeline and surfaces the issue to the operator.
Refresh cadence
Who owns what.
Related documents
Methodology notes
The pipeline is prompt-decomposed, not monolithic — each phase has its own single-responsibility prompt with its own quality gate. This is the same pattern the IO Article Library uses for generating long-form content: a weak Phase 5 output doesn't contaminate Phase 6 because Phase 6 receives only the Phase 5 output plus its own brief, not the cumulative conversation.
All extraction is evidence-first. Every field in the ledger traces back to a specific source document via the Dropbox Reference Folder and Contract Link columns. The narrative report (HTML and MD) never states a number that doesn't appear verbatim in at least one source document. If the model wants to claim a pattern or trend that isn't directly evidenced, it's downgraded to a “framed observation” in the narrative rather than a factual claim.
The pipeline is designed to be re-runnable indefinitely. Nothing about the current run is hardcoded except paths and the six canonical-entity rules — those are the only pieces of prior knowledge the workflow needs. Everything else is derived from the source data at runtime. If Windfield's relationship map shifts (new Tier 1 emerges, existing one fades), the next run captures it automatically. The workflow is built to last longer than any specific insight it produces.