"The demo worked perfectly. Then we tried it on real data and discovered that 'summarize this document' is not one problem but forty-seven problems wearing a trench coat."
*Image: "Deploy, Trench Coat-Detecting AI Agent"*
AI feasibility is not software feasibility. In traditional product development, if you can describe a feature clearly, an engineer can almost certainly build it. AI inverts this assumption. A feature that sounds trivially simple ("summarize this legal contract") may be infeasible at the quality level the domain demands. This section equips you with structured tools to assess feasibility before committing engineering resources: an error tolerance framework, a technical feasibility matrix, a data readiness checklist, a regulatory pre-screen, and a reusable Feasibility Scorecard that forces explicit scoring across every dimension.
Prerequisites
This section builds on the AI Role Canvas from Section 36.2. It assumes familiarity with evaluation and observability (Chapter 29), safety and regulation (Chapter 32), and prompt engineering (Chapter 11). Prior exposure to AI strategy (Chapter 33) will help contextualize the organizational dimensions of feasibility.
1. Why Feasibility Comes First
Traditional software product design follows a familiar sequence: identify the user need, design the feature, estimate the engineering effort, build it. Feasibility is rarely in doubt because the relationship between specification and implementation is predictable. If you can describe the business logic precisely, a competent team can implement it.
In traditional software, "Can we build it?" is almost always yes. In AI products, "Can we build it well enough?" is the question that kills projects. The graveyard of AI startups is full of teams that could build the feature but could not build it at the quality the domain demanded.
AI changes this assumption at its root. The relationship between specification and implementation is probabilistic. You can describe "summarize this document accurately" with perfect clarity, but whether a model can actually do it depends on the document type, the required accuracy threshold, the domain vocabulary, the length distribution, and a dozen other variables that you cannot determine from the specification alone. A feature that works flawlessly on blog posts may hallucinate critical details when applied to medical discharge summaries.
This means feasibility assessment must move from the middle of the product cycle (where it lives in traditional development) to the very beginning. You must verify that the AI can do what you need it to do before you commit to building the product around it.
Feasibility-first product design is not pessimism; it is resource efficiency. Teams that validate feasibility early kill bad ideas cheaply and redirect effort toward features that can actually deliver value. Teams that skip feasibility assessment discover infeasibility after months of engineering, when the sunk cost makes it politically difficult to pivot. The Feasibility Scorecard introduced later in this section is designed to make that early validation systematic rather than ad hoc.
2. Error Tolerance as a Design Constraint
Not all errors are created equal. A writing assistant that occasionally suggests an awkward phrase is mildly annoying. A medical triage system that misclassifies a chest pain case is potentially lethal. A financial approval system that incorrectly greenlights a fraudulent transaction costs real money. The acceptable error rate for an AI feature is not a technical parameter; it is a product design constraint that must be set before any model selection or prompt engineering begins.
The following table provides a starting framework for thinking about error tolerance across domains. These are not rigid thresholds; they are conversation starters that force product teams to make the implicit explicit.
| Domain | Example Feature | Error Tolerance | Consequence of Error |
|---|---|---|---|
| Creative writing | Blog post draft | High (5-15% error acceptable) | Human editor catches issues; low stakes |
| Customer support | Ticket classification | Moderate (2-5%) | Misrouted tickets delay resolution |
| Legal | Contract clause extraction | Low (0.5-2%) | Missed clauses create liability |
| Healthcare | Symptom triage | Very low (<0.1%) | Misclassification risks patient safety |
| Finance | Transaction approval | Very low (<0.1%) | False approvals cause direct financial loss |
Notice the pattern: as the cost of a single error increases, the acceptable error rate drops by orders of magnitude. This has direct implications for model selection, system architecture, and whether the feature is feasible at all. A domain that requires <0.1% error may need a multi-stage pipeline (model generates, second model verifies, human reviews edge cases) rather than a single model call. The evaluation framework from Chapter 29 provides the tooling to measure whether your system actually meets the target error rate in production.
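The table's thresholds can be enforced as an explicit gate in an evaluation script. A minimal sketch, assuming per-domain error budgets expressed as fractions; the dictionary values simply transcribe the table above and are illustrative, not normative:

```python
# Per-domain error budgets, transcribed from the table above
# (fraction of requests allowed to fail).
DOMAIN_TOLERANCE = {
    "creative_writing": 0.15,
    "customer_support": 0.05,
    "legal": 0.02,
    "healthcare": 0.001,
    "finance": 0.001,
}

def meets_tolerance(domain: str, errors: int, total: int) -> bool:
    """True if the observed error rate fits within the domain's budget."""
    if total == 0:
        raise ValueError("need at least one evaluated example")
    return errors / total <= DOMAIN_TOLERANCE[domain]

# 3 errors in 500 examples (0.6%) is fine for legal extraction
# but far too high for healthcare triage.
print(meets_tolerance("legal", 3, 500))       # True
print(meets_tolerance("healthcare", 3, 500))  # False
```

The point of encoding the budget is that "good enough" stops being a judgment call made in a sprint review and becomes a number checked by a script.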
There is an old joke in aviation: "If builders built buildings the way programmers write programs, the first woodpecker to come along would destroy civilization." The AI version is more pointed: if product managers shipped AI features the way they ship traditional features, the first edge case would destroy the quarterly revenue forecast. Error tolerance scoring exists to prevent the woodpecker scenario.
3. The Technical Feasibility Matrix
For each candidate AI feature, rate the following five dimensions on a 1-5 scale (1 = major blocker, 5 = no concern). A feature that scores below 3 on any single dimension requires a mitigation plan before proceeding. A feature with two or more dimensions below 3 should be reconsidered entirely.
- Model capability. Can current models perform this task at the required quality level? Run a quick benchmark with 50 to 100 representative examples before scoring. Do not rely on demos or anecdotal impressions. The evaluation techniques from Chapter 29 apply even at this early stage.
- Data availability. Does the training or retrieval data exist? Is it accessible to your team? Is it labeled or can it be labeled at reasonable cost? Section 4 below provides a detailed data readiness checklist.
- Latency budget. What response time does the user experience demand? Real-time chat requires sub-second responses. Batch document processing can tolerate minutes. Some model and architecture choices are eliminated by latency alone.
- Cost ceiling. What is the maximum per-request cost the business model supports? A feature that costs $0.50 per invocation is viable for a $200/month enterprise subscription but not for a free consumer app. Include model inference, retrieval, and any post-processing in the cost estimate.
- Regulatory constraints. Does this feature fall under specific regulatory requirements (EU AI Act, HIPAA, SOC 2, industry-specific rules)? Section 5 below covers regulatory pre-screening in detail. See also Chapter 32 for a comprehensive treatment of AI safety and regulation.
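The "run a quick benchmark" advice for the model-capability dimension needs very little tooling. A hedged sketch: `call_model` stands in for whatever inference call you use, and exact-match comparison is the simplest possible grader (real tasks usually need a task-specific one):

```python
from typing import Callable

def benchmark(call_model: Callable[[str], str],
              examples: list[tuple[str, str]]) -> float:
    """Error rate of call_model over (input, expected) pairs, by exact match."""
    if len(examples) < 50:
        raise ValueError("score against at least 50 representative examples")
    errors = sum(1 for inp, expected in examples
                 if call_model(inp).strip() != expected.strip())
    return errors / len(examples)

# Demo with a trivial stand-in "model" that echoes its input.
cases = [(f"doc-{i}", f"doc-{i}") for i in range(60)]
print(benchmark(lambda text: text, cases))  # 0.0
```

Feeding the resulting error rate back into the domain's tolerance threshold turns the 1-5 capability score into something you can defend.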
Who: A product manager at a legal-tech startup planning an AI feature that extracts key clauses from commercial contracts.
Situation: The team ran a feasibility matrix across five dimensions: model capability (3/5), data availability (2/5), latency budget (5/5), cost ceiling (4/5), and regulatory constraints (2/5). Enterprise pricing at $500/month supported the cost, and batch processing met latency expectations.
Problem: Two dimensions scored below the viability threshold of 3. The startup had only 200 contracts, all unlabeled and covered by client NDAs. Client data also included PII and confidential business terms requiring SOC 2 compliance that the team had not yet secured.
Decision: Rather than pushing ahead with a full build, the team allocated a six-week spike to negotiate a data-use agreement with three pilot clients and evaluate SOC 2-compliant hosting providers.
Result: The spike confirmed that data access was achievable (two of three clients agreed) but SOC 2 hosting added $1,200/month to operating costs, which the team factored into revised pricing before committing to the build.
Lesson: Scoring feasibility dimensions numerically turns "we think this will work" into a structured decision with clear blockers and mitigation plans.
4. Data Readiness Assessment
Model capability is necessary but not sufficient. Even the most powerful model cannot perform well without appropriate data for retrieval, fine-tuning, or evaluation. The following checklist covers the four questions every team must answer before committing to an AI feature.
4.1 Does the Data Exist?
This sounds obvious, but teams frequently assume data availability. "We have thousands of customer support tickets" may be true, but if those tickets are in a legacy system with no API, or stored as screenshots rather than text, the data effectively does not exist for your purposes.
4.2 Is It Accessible?
Data may exist but be locked behind organizational, legal, or technical barriers. Cross-department data sharing agreements, vendor API rate limits, and data residency requirements all affect accessibility. Map these barriers early.
4.3 Is It Labeled?
For evaluation and fine-tuning, you need labeled examples. If labels do not exist, budget for the annotation effort. A common rule of thumb: 200 to 500 labeled examples for initial evaluation, 1,000 or more for fine-tuning. The synthetic data techniques from Chapter 13 can supplement human annotation, but they cannot fully replace it for high-stakes domains.
4.4 What Are the Privacy Constraints?
Does the data contain personally identifiable information (PII)? Is it subject to GDPR, CCPA, HIPAA, or other privacy regulations? Can it be sent to third-party model providers, or must inference run on-premises? These constraints directly affect architecture choices and cost. The AI Role Canvas from Section 36.2 includes a privacy field precisely because privacy constraints must be surfaced at the design stage, not discovered during implementation.
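The four questions in this checklist can be tracked as an explicit artifact alongside the scorecard. A sketch; the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class DataReadiness:
    """The four data-readiness questions as explicit booleans."""
    exists: bool           # usable form (text behind an API, not screenshots)?
    accessible: bool       # no legal/organizational/technical barriers?
    labeled: bool          # enough labels for eval, or budget to annotate?
    privacy_cleared: bool  # PII/GDPR/HIPAA constraints resolved?

    def blockers(self) -> list[str]:
        """Names of the questions still answered 'no'."""
        return [name for name, ok in vars(self).items() if not ok]

status = DataReadiness(exists=True, accessible=False,
                       labeled=False, privacy_cleared=True)
print(status.blockers())  # ['accessible', 'labeled']
```

Each entry in `blockers()` corresponds to weeks or months of work; an empty list is the precondition for a healthy data-availability score.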
5. Regulatory Pre-Screening
Regulatory compliance is not a checkbox you tick at launch; it is a constraint that shapes your architecture from day one. Two frameworks are especially relevant for AI product teams in 2025 and beyond.
5.1 EU AI Act Risk Tiers
The EU AI Act classifies AI systems into four risk tiers, each with different compliance obligations:
- Unacceptable risk: Banned outright. Social scoring, real-time biometric surveillance in public spaces (with narrow exceptions), manipulative AI targeting vulnerable groups.
- High risk: Permitted but heavily regulated. Includes AI in employment decisions, credit scoring, educational assessment, law enforcement, and critical infrastructure. Requires conformity assessments, human oversight, transparency documentation, and ongoing monitoring.
- Limited risk: Transparency obligations only. Users must be informed they are interacting with an AI. Applies to chatbots, emotion recognition systems, and deepfake generators.
- Minimal risk: No specific obligations. Spam filters, AI-powered video game NPCs, most recommendation systems.
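For pre-screening purposes, the tier examples above can be encoded as a simple lookup that fails loudly on banned use cases. This is an illustrative sketch, not a legal classification tool; real tier determination requires legal review:

```python
# Illustrative lookup encoding the tier examples above.
RISK_TIER = {
    "social_scoring": "unacceptable",
    "credit_scoring": "high",
    "employment_screening": "high",
    "customer_chatbot": "limited",
    "spam_filter": "minimal",
}

def pre_screen(use_case: str) -> str:
    """Return the risk tier, refusing outright-banned use cases."""
    tier = RISK_TIER.get(use_case, "unknown")
    if tier == "unacceptable":
        raise ValueError(f"{use_case} is banned under the EU AI Act")
    return tier

print(pre_screen("credit_scoring"))  # high
```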
Determine your feature's risk tier early. A feature classified as "high risk" adds months of compliance work and ongoing audit costs. This may not make the feature infeasible, but it must be factored into the timeline and budget. Chapter 32 provides a comprehensive guide to navigating these requirements.
5.2 OWASP LLM Top 10
The OWASP Top 10 for Large Language Model Applications catalogs the most common security vulnerabilities in LLM-based systems. During feasibility assessment, scan your proposed feature against the full list to identify which vulnerabilities apply; the five entries below are the ones most often relevant at this stage:
- Prompt injection: Can adversarial users manipulate your model's behavior through crafted inputs?
- Insecure output handling: Does your system blindly trust model outputs (executing generated code, rendering generated HTML)?
- Training data poisoning: If you fine-tune, how do you ensure training data integrity?
- Sensitive information disclosure: Can the model leak PII, API keys, or proprietary data from its context or training?
- Excessive agency: Does the model have access to tools or actions that could cause harm if misused?
You do not need to solve every vulnerability at the feasibility stage, but you must identify which ones apply and estimate the mitigation effort. A feature where prompt injection could lead to unauthorized data access requires fundamentally different architecture than a feature where the worst outcome is a poorly worded summary.
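One lightweight way to act on this at the feasibility stage is to record which checklist entries apply and attach a rough mitigation estimate to each. All names and numbers below are placeholders, not recommendations:

```python
# The five entries discussed above, as a validated checklist.
OWASP_CHECKS = {
    "prompt_injection",
    "insecure_output_handling",
    "training_data_poisoning",
    "sensitive_information_disclosure",
    "excessive_agency",
}

def mitigation_weeks(applicable: dict[str, int]) -> int:
    """Validate entries against the checklist and total the week estimates."""
    unknown = set(applicable) - OWASP_CHECKS
    if unknown:
        raise ValueError(f"not in the checklist: {sorted(unknown)}")
    return sum(applicable.values())

# A chat feature with tool access: injection and agency dominate.
print(mitigation_weeks({"prompt_injection": 3, "excessive_agency": 2}))  # 5
```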
6. Cross-Functional Decision-Making
In traditional product development, a product manager can assess feasibility by consulting with engineers about implementation complexity. AI feasibility requires a broader coalition. The product manager understands the user need and business constraints. The data scientist or ML engineer understands model capabilities and limitations. The data engineer knows what data exists and how to access it. The legal or compliance team understands regulatory requirements.
The Feasibility Scorecard (introduced below) is deliberately designed as a cross-functional artifact. No single person can fill it out completely. This is by design: if a product manager can fill out every field without consulting anyone else, the scorecard is not doing its job.
The Feasibility Scorecard is a communication tool as much as an assessment tool. Its primary value is not the final scores but the conversations it forces. When the product manager discovers that the data scientist rates "model capability" at 2 while the PM assumed it was a 4, that gap in understanding is the most important finding of the entire assessment. Surfacing these gaps before engineering begins prevents the far more expensive discovery during a failed sprint review. This principle echoes the cross-functional collaboration patterns discussed in Chapter 33 on AI strategy.
7. Deliverable: The Feasibility Scorecard
The Feasibility Scorecard brings together every dimension discussed in this section into a single, structured artifact. Like the AI Role Canvas from Section 36.2, it is designed to be filled out before any implementation begins. The scorecard produces a composite feasibility score and, more importantly, surfaces any dimension that falls below the viability threshold.
The scorecard is a structured document (a spreadsheet, a Notion table, or a YAML file in your repository). Each dimension gets a 1-to-5 score, a rationale, and, if the score falls below 3, a mandatory mitigation plan. The decision rules are straightforward:
- GO: No dimension scores below 3.
- CONDITIONAL: Exactly one blocker (score < 3), with a documented mitigation plan.
- NO_GO: Two or more blockers, or an "Unacceptable" EU AI Act risk tier.
Who: A cross-functional team (product manager, ML engineer, legal counsel) at a legal-tech startup evaluating a clause extraction feature.
Situation: The team used the Feasibility Scorecard to assess extracting key clauses (termination, liability, IP assignment) from commercial contracts. Error tolerance: 1%. EU AI Act tier: Limited.
Problem: The scorecard revealed two blockers. Data availability scored 2/5 (200 contracts on hand, unlabeled and covered by client NDAs). Regulatory scored 2/5 (client data includes PII and confidential business terms requiring SOC 2 compliance). The remaining dimensions were healthy: model capability 3/5, latency 5/5, cost ceiling 4/5.
| Dimension | Score | Rationale | Mitigation |
|---|---|---|---|
| Model Capability | 3/5 | GPT-4 class models handle standard clauses well but struggle with unusual structures and jurisdiction-specific language. | |
| Data Availability | 2/5 [BLOCKER] | 200 contracts on hand, but unlabeled and covered by client NDAs. | Negotiate data-use agreement with 3 pilot clients. Budget 4 weeks for annotation. |
| Latency Budget | 5/5 | Batch processing; users expect results within 30s. | |
| Cost Ceiling | 4/5 | Enterprise pricing at $500/mo supports ~$0.20 per analysis. | |
| Regulatory | 2/5 [BLOCKER] | Client data includes PII and confidential business terms. SOC 2 compliance required. | Evaluate SOC 2 hosting providers. Prepare data processing agreements. ~6 weeks. |
Decision: With a composite score of 3.2/5.0 and two blockers, the scorecard returned a NO_GO verdict. The team chose to run a six-week spike to resolve data access and compliance before committing to a full build.
Result: The spike secured data-use agreements with two pilot clients and identified a SOC 2-compliant hosting provider. A re-score after the spike yielded 4.0/5.0 with zero blockers, upgrading the verdict to GO.
Lesson: A structured scorecard turns a subjective "should we build this?" debate into a traceable decision with explicit blockers and re-score criteria.
8. Integrating the Scorecard into Your Workflow
The Feasibility Scorecard is most effective when it becomes a gate in your product development process, not an optional exercise. Here is a recommended workflow:
- After the AI Role Canvas. Once you have defined the model's role using the canvas from Section 36.2, immediately fill out a Feasibility Scorecard for that role. The canvas tells you what the model should do; the scorecard tells you whether it can.
- Cross-functional scoring session. Gather the product manager, ML engineer, data engineer, and legal or compliance representative in the same room (or call). Each person scores the dimensions they own. Discuss disagreements explicitly.
- Decision gate. GO means proceed to prototyping. CONDITIONAL means run a time-boxed spike (typically one to four weeks) to resolve the identified blocker, then re-score. NO_GO means pivot: change the feature scope, change the model's role, or drop the feature entirely.
- Re-score after spikes. When a mitigation spike completes, update the relevant dimension scores and re-run the decision logic. The scorecard's version history becomes a record of how feasibility evolved.
One team we interviewed printed their Feasibility Scorecards on large poster paper and hung them next to the team's sprint board. Within a week, engineers started checking the scorecard before picking up AI-related tickets, asking "did we actually validate this dimension?" The scorecards became the team's immune system against premature feature commitments.
9. Common Feasibility Traps
Even with a structured scorecard, teams fall into predictable traps during feasibility assessment:
- The demo delusion. The model performs impressively on five hand-picked examples, so the team scores "model capability" at 5. In reality, demo performance correlates poorly with production performance. Always score model capability based on a systematic benchmark of 50 or more representative examples, including edge cases. This connects directly to the evaluation philosophy from Chapter 29.
- The "data exists somewhere" assumption. The team assumes the data they need is available because the organization has lots of data in general. In practice, the specific data they need may be in a different department's system, in an incompatible format, or encumbered by legal restrictions that take months to resolve.
- Ignoring cost at scale. A feature that costs $0.05 per request during prototyping seems cheap. At 100,000 daily active users making 10 requests each, that is $50,000 per day. Always project cost at target scale, not prototype scale.
- Treating regulation as a launch-day concern. Teams discover regulatory requirements after building the feature, then face a choice between an expensive retrofit and scrapping months of work. Regulatory pre-screening at the feasibility stage prevents this.
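The cost-at-scale trap is worth making concrete with the arithmetic from the bullet above:

```python
def daily_cost(cost_per_request: float, daily_users: int,
               requests_per_user: float) -> float:
    """Projected daily inference spend at a given scale."""
    return cost_per_request * daily_users * requests_per_user

# "Cheap" prototype economics at production scale:
print(f"${daily_cost(0.05, 100_000, 10):,.0f}/day")  # $50,000/day
```

Three multiplications, but teams routinely skip them until the first monthly invoice arrives.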
The most dangerous moment in AI product development is when a team has spent three months building a feature, discovers a feasibility blocker, and decides to "push through" rather than pivot. The Feasibility Scorecard exists precisely to move this discovery to week one rather than month three. If you find yourself arguing that "we've invested too much to stop now," that is the sunk cost fallacy talking, and it is the strongest possible signal that you should stop.
Key Takeaways
- AI feasibility is not software feasibility. The probabilistic nature of model outputs means that a clearly specified feature may still be infeasible at the required quality level. Validate feasibility before committing engineering resources.
- Error tolerance is a product design constraint, not a technical afterthought. Set the acceptable error rate for each feature based on the domain and the consequence of errors, then use that rate to drive model selection and architecture decisions.
- The Technical Feasibility Matrix covers five dimensions. Model capability, data availability, latency budget, cost ceiling, and regulatory constraints. A score below 3 on any dimension requires a mitigation plan; two or more blockers warrant a pivot.
- Data readiness has four facets. Does the data exist? Is it accessible? Is it labeled? What are the privacy constraints? Each "no" adds weeks or months to your timeline.
- Regulatory pre-screening shapes architecture from day one. The EU AI Act risk tier and OWASP LLM Top 10 vulnerabilities must be identified at the feasibility stage, not discovered at launch.
- The Feasibility Scorecard is a cross-functional communication tool. Its value lies not just in the scores but in the conversations it forces between product, engineering, data, and legal teams.
What Comes Next
With feasibility validated (or blockers identified and mitigated), Section 36.4: Case Studies: Role Assignment in Practice walks through three real-world examples showing how teams applied the AI Role Canvas and the Feasibility Scorecard to make concrete product decisions.
Bibliography
Lovejoy, J. and Holbrook, J. (2024). "People + AI Guidebook." Google PAIR. pair.withgoogle.com/guidebook
Narayanan, D. and Kapoor, S. (2024). "AI Changes the Most Basic Assumption of Software Product Design." AI Snake Oil (Substack). aisnakeoil.com
European Commission (2024). "Regulation (EU) 2024/1689: Artificial Intelligence Act." Official Journal of the European Union. EUR-Lex
OWASP Foundation (2025). "OWASP Top 10 for Large Language Model Applications." owasp.org
Ribeiro, M. T., Wu, T., Guestrin, C., and Singh, S. (2020). "Beyond Accuracy: Behavioral Testing of NLP Models with CheckList." Proceedings of ACL 2020. doi:10.18653/v1/2020.acl-main.442
