32 copy-paste prompts for Claude Code. Paste into your repo, get a real audit — built from hands-on migrations, not theory.
"We ran this before our penetration test and it saved us many iteration loops."
— Engineering lead, fintech startup
Audit the authentication and authorisation implementation in this codebase.
For authentication:
1. How are sessions or tokens managed? If JWTs are used, check: correct expiry, secure signing algorithm (not 'none' or HS256 with a weak secret), no sensitive data in the payload.
2. Is password hashing done correctly? Look for bcrypt or argon2. Flag any use of MD5, SHA1, or plain base64 for passwords.
3. Are there brute force protections on login endpoints? (rate limiting, lockout)
4. Do authentication error messages leak information useful to attackers? ("user not found" vs "invalid credentials")
For authorisation:
1. Is there a consistent pattern for checking permissions, or is it scattered and ad-hoc across routes/controllers?
2. Are there any endpoints that look like they should be protected but don't have an obvious auth check?
3. Are there IDOR risks — places where a user could access another user's data by changing an ID in a URL or request body?
4. For admin/privileged operations: is access checked close to the operation, or only at the route level?
Give me the top 3 specific risks with file references.
For each auth/authz risk you identified in the previous response:
1. Open the exact file you named and quote the specific lines of code that demonstrate the issue.
2. Confirm the file exists at the path you stated — if it does not, correct your answer.
3. For any IDOR risk: show the actual route handler or controller code where the ID parameter is used without an ownership check.
4. For any password hashing issue: quote the exact hashing call you flagged.
5. If any finding was based on inference rather than direct code evidence, say so explicitly and downgrade its severity.
Only report findings you can back with a direct code quote. Remove any that you cannot.
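The IDOR pattern item 3 asks about is easiest to recognise from a minimal example. A Python sketch, with a hypothetical in-memory `DOCUMENTS` store and handler names invented for illustration:

```python
# Hypothetical in-memory store standing in for a database table.
DOCUMENTS = {
    1: {"owner_id": 10, "body": "alice's notes"},
    2: {"owner_id": 20, "body": "bob's notes"},
}

def get_document_vulnerable(current_user_id: int, doc_id: int) -> dict:
    # IDOR: any authenticated user can fetch any document
    # just by changing doc_id in the request.
    return DOCUMENTS[doc_id]

def get_document_fixed(current_user_id: int, doc_id: int) -> dict:
    doc = DOCUMENTS.get(doc_id)
    # The ownership check sits right next to the data access,
    # not only at the route level.
    if doc is None or doc["owner_id"] != current_user_id:
        raise PermissionError("not found")  # same error for both cases
    return doc
```

Raising the same error for "missing" and "not owned" also avoids confirming which IDs exist, the same principle as preferring "invalid credentials" over "user not found".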
Look for input validation and injection vulnerability patterns across the codebase.
SQL injection: Are database queries built with string concatenation anywhere, or is parameterisation / prepared statements used consistently? Show me any examples of string-interpolated queries.
XSS: If there's a frontend, is user-supplied content ever rendered as raw HTML? Look for dangerouslySetInnerHTML in React, innerHTML assignments, document.write, or template engines with unescaped output (e.g. {{{ }}} in Handlebars, |safe in Jinja2).
Input validation: Is user input validated and sanitised at the boundary (API endpoint / form handler) before being passed into business logic, or does raw user input travel deep into the application?
File uploads: If the app handles file uploads, is there file type validation beyond just checking the extension? Is the upload directory outside the webroot?
Command injection: Are there any uses of exec(), shell_exec(), subprocess.run() or similar with user-controlled input?
For each issue: the exact pattern, file location, and the simplest fix.
For each injection risk you identified:
1. Quote the exact line(s) of code showing the vulnerability — the string concatenation, innerHTML assignment, or exec() call.
2. Confirm the file path and approximate line number.
3. For SQL injection: show the query construction code, not just the surrounding function.
4. For XSS: show the template or JSX where unescaped output occurs.
5. For any finding where you said 'look for X' but did not find a concrete example — remove it from the list.
Final output: a verified list with one direct code quote per finding. No inferences.
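For reference, the string-interpolated query pattern this prompt tells Claude to flag, next to the parameterised fix — a minimal Python sketch using the standard-library sqlite3 driver (table and data made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_vulnerable(name: str):
    # String interpolation: passing "' OR '1'='1" returns every row.
    return conn.execute(
        f"SELECT name, role FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str):
    # Parameterised query: the driver treats name as data, never as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()
```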
Scan this codebase for hardcoded secrets and sensitive data exposure risks.
1. Are there any API keys, passwords, connection strings, private keys, or auth tokens hardcoded in source files — including test files, config files, seed scripts, and comments?
2. Are .env files committed to the repository? Check .gitignore carefully.
3. Are credentials ever passed as URL query parameters? (These end up in server logs, browser history, and referer headers.)
4. Is sensitive data written to logs? Look for logging of request bodies, full error objects, or user-supplied data that might contain passwords or tokens.
5. Are stack traces or internal error details ever returned in API responses to clients?
6. Is there a .env.example or equivalent? Is it up to date and does it document all required variables without containing real values?
Note: do not output actual secret values if you find any. Name the type and location only.
For each secret or credential exposure you identified:
1. Confirm the file exists and quote the surrounding context (not the actual secret value — just enough to show the pattern).
2. For any .env file committed to the repo: quote the .gitignore entry that should exclude it, or confirm it is absent.
3. For any logging risk: quote the exact log statement.
4. For any finding you cannot verify with a direct code reference, remove it.
Also confirm: is there a .env.example file? Quote its first 5 lines if it exists.
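A rough sense of what a "hardcoded secret" looks like helps when reviewing Claude's findings. A deliberately simplistic Python sketch (patterns invented for illustration; a real scan should use a dedicated tool such as gitleaks or trufflehog, which catch far more and false-positive less):

```python
import re

# Illustrative patterns only — these will miss plenty.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"(?i)(password|secret|token)\s*=\s*['\"][^'\"]{8,}['\"]"),
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
]

def scan_line(line: str) -> bool:
    """Return True if the line looks like it contains a hardcoded secret."""
    return any(p.search(line) for p in SECRET_PATTERNS)
```

Note that reading the secret from the environment, as in `password = os.environ['DB_PASSWORD']`, is exactly the pattern that should pass such a scan.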
Check how HTTP security headers are configured in this application.
Look in: Express middleware setup, Django settings (SECURE_* settings, SecurityMiddleware), Spring Security config, nginx/Apache config, Caddy config, any custom header middleware, or Next.js headers config.
For each of the following headers, tell me whether it's set, and if so, what the value is:
1. Content-Security-Policy — prevents XSS and data injection attacks
2. X-Frame-Options or CSP frame-ancestors — prevents clickjacking
3. X-Content-Type-Options: nosniff — prevents MIME-type sniffing
4. Strict-Transport-Security (HSTS) — enforces HTTPS
5. Referrer-Policy — controls what's sent in the Referer header
6. Permissions-Policy — restricts access to browser APIs (camera, microphone, geolocation)
For any that are missing or misconfigured:
- What attack does the missing header enable?
- What's the exact header line to add to fix it?
Then give me a copy-paste block of all recommended headers I can add to the server config right now.
For each HTTP header finding:
1. Quote the exact middleware, config block, or settings file where you looked for the header.
2. If a header is present, quote the line that sets it.
3. If a header is absent, quote the section of config where it would normally be set, confirming it is not there.
4. Re-generate the copy-paste header block, but this time confirm each header value is appropriate for this specific framework and stack. Do not recommend headers that would conflict with existing config you found.
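If no header middleware exists at all, a framework-neutral baseline can be sketched as a plain mapping plus a merge helper. The values below are common starting points, not drop-ins; Content-Security-Policy in particular almost always needs per-app tuning before it can be enforced:

```python
# Conservative baseline values — a starting point, not a drop-in.
SECURITY_HEADERS = {
    "Content-Security-Policy": "default-src 'self'",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
}

def apply_security_headers(headers: dict) -> dict:
    """Merge the baseline in without clobbering headers set upstream."""
    merged = dict(SECURITY_HEADERS)
    merged.update(headers)  # existing config wins, as the prompt requires
    return merged
```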
Assess the data handling practices in this codebase from a GDPR / privacy perspective.
1. What personal data is stored? Look at database models, schema files, and any data structures with fields like email, name, ip_address, user_id, phone, address, or similar.
2. Is there a clear mechanism for data deletion? If a user requested erasure of all their data, could we find and delete everything — including logs, caches, analytics, and backups?
3. Are there data retention policies in code, or is everything stored indefinitely by default?
4. Is personal data transferred to third-party services (analytics, error tracking, CRMs, ad platforms)? Are those transfers clearly necessary and minimal?
5. Is access to personal data logged — audit trails for who accessed what and when?
6. Is PII (personally identifiable information) appearing in places it shouldn't — logs, error reports, analytics events, or search indexes?
Top 3 areas most likely to create a GDPR compliance issue.
For each GDPR risk you identified:
1. Quote the database model, schema definition, or data structure that contains the personal data field you flagged.
2. For any deletion mechanism concern: quote the relevant deletion function or confirm its absence by searching for 'delete', 'destroy', 'remove' in the relevant model/service file.
3. For any third-party data transfer: quote the SDK initialisation or API call.
4. Remove any finding that cannot be tied to a specific code location.
Look at our CI/CD pipeline configuration. Check .github/workflows/, .gitlab-ci.yml, Jenkinsfile, Dockerfile, docker-compose.yml, or any equivalent CI config files you can find.
For each pipeline / workflow found:
1. How many manual steps or human approvals are required before code reaches production?
2. What quality gates are present before deploy? (tests, security scan, lint, type check, coverage threshold) What gates are missing?
3. What does the rollback process look like — is it automated, documented, or does it exist only as tribal knowledge?
4. Are environment configs and secrets handled safely, or are there any hardcoded values?
5. How are different environments managed — is there a clear dev / staging / prod progression, or does code go straight to prod?
Final question: if a developer pushed a breaking change right now and all tests passed, what is the most likely path for that bug to reach production users undetected?
For each pipeline gap you identified:
1. Quote the relevant section of the CI config file (GitHub Actions step, GitLab job, Jenkinsfile stage) that demonstrates the missing gate or manual step.
2. If a quality gate is absent, quote the jobs or steps block where it should appear but does not.
3. For the rollback process: quote any rollback steps present, or confirm by searching for 'rollback', 'revert', 'undo' in the pipeline config and reporting what you find.
4. For any hardcoded value: quote the exact line.
Correct any pipeline file path that does not exist in this repo.
Find all Dockerfiles and container configuration in this project.
For each Dockerfile:
1. What base image is used — is it pinned to a specific digest or using :latest / a floating tag?
2. Is this a multi-stage build? If not, the final image is probably carrying build tools and dev dependencies into production.
3. Is the application running as root inside the container? (Look for a USER instruction — absent means root.)
4. Are any unnecessary ports exposed?
5. Is the layer ordering optimised — dependencies installed before source code, so rebuilds are fast?
6. Is the base image on a currently supported OS / runtime version?
For any docker-compose or Kubernetes manifests:
- Are resource limits (CPU, memory) defined?
- Are health checks configured?
- Are there hardcoded credentials in environment variable definitions?
Prioritised fix list with effort estimate for each.
For each Dockerfile finding:
1. Quote the exact FROM line for the base image.
2. Quote the USER instruction — or confirm it is absent (meaning root).
3. For layer ordering issues: quote the COPY and RUN lines in the order they appear.
4. For any exposed port concern: quote the EXPOSE instruction.
5. For any docker-compose finding: quote the relevant service block.
Confirm the file path of each Dockerfile you reviewed.
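The checks in items 1 and 2 are mechanical enough to sketch. A toy Python linter that flags only untagged / :latest base images and a missing USER instruction — illustrative only (it does not catch floating version tags, and a real audit should use hadolint):

```python
def lint_dockerfile(text: str) -> list:
    """Flag two of the issues the prompt above asks about.
    Toy example: untagged or :latest FROM lines, and no USER instruction."""
    findings = []
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    for line in lines:
        if line.upper().startswith("FROM"):
            image = line.split()[1]
            # Digest pins (@sha256:...) are fine; no tag or :latest is not.
            if "@sha256:" not in image and (":" not in image or image.endswith(":latest")):
                findings.append(f"unpinned base image: {image}")
    if not any(l.upper().startswith("USER") for l in lines):
        findings.append("no USER instruction: container runs as root")
    return findings
```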
Find any infrastructure-as-code in this repo. Look for Terraform (.tf files), Pulumi, CDK, CloudFormation, Ansible, or Kubernetes manifests.
If IaC exists:
1. Is it complete — could you fully recreate the production environment from these files alone? What appears to be missing?
2. Are configurations parameterised or are values like instance sizes and replica counts hardcoded?
3. Are provider versions pinned to avoid unexpected upgrades?
4. Is there any sign of configuration drift — places where a comment says "TODO: add this to IaC" or manual steps in a README?
5. Is there a clear separation between environments (dev/staging/prod)?
If no IaC exists:
- What infrastructure components can you infer are being managed manually from the codebase?
- What's the risk profile if the person who manages that infrastructure is unavailable?
For the IaC assessment:
1. List every .tf, .yaml (Kubernetes), or equivalent file you found and its path.
2. For any completeness gap: quote the resource or component that appears to be managed outside IaC — infer from README references, scripts, or hard-coded endpoint URLs.
3. For any hardcoded value concern: quote the specific line.
4. If no IaC was found: quote the lines from README or scripts that reference manual infrastructure steps.
Remove any gap assessment that is inference without evidence.
Look at how the application is instrumented for observability — logging, metrics, tracing, and alerting.
Logging:
1. Is there structured logging (JSON output) or plain text strings? Structured logs are queryable; plain strings are not.
2. Are log levels used appropriately (ERROR for errors, DEBUG for verbose output that's off in prod)?
3. Are errors logged with enough context to diagnose from the log alone — or just "error occurred"?
Metrics & tracing:
4. Are there application metrics beyond what the hosting platform provides? (request latency, error rates, queue depths, business metrics)
5. Is distributed tracing set up across services if applicable? (OpenTelemetry, Datadog APM, etc.)
Alerting:
6. Is there evidence of alerting configuration in the codebase or infra config?
Final question: if the main API started returning 500 errors at 2am right now, how long would it take the team to know, and what would they be working with to diagnose it?
For each observability finding:
1. Quote an example log statement that demonstrates the logging format (structured or plain text).
2. For any error-handling concern: quote an actual error handler or catch block.
3. For metrics/tracing: quote the instrumentation setup code, or confirm its absence by searching for 'opentelemetry', 'datadog', 'prometheus', 'metrics' and reporting what you find.
4. Answer the 2am question with specific evidence: what files would an on-call engineer look at, and what information would be available to them?
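For item 1, the difference between plain-text and structured logging is easiest to see side by side. A minimal structured formatter on top of the standard library (a sketch only; real apps would typically reach for structlog or python-json-logger, which do the same thing more robustly):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line instead of an interpolated string."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Context travels as fields, not baked into the message string,
            # so log queries can filter on e.g. order_id directly.
            **getattr(record, "context", {}),
        })

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# Usage: logger.error("payment failed", extra={"context": {"order_id": 42}})
```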
Analyse the top-level structure and architecture of this codebase.
1. How is the code organised — by technical layer (controllers/models/views), by domain/feature, or something else?
2. Are there clear module or service boundaries? Can you identify where one domain ends and another begins — e.g. is "billing" cleanly separated from "user management"?
3. Look at the import graph: are there any modules that most of the codebase imports from? Name them — these are your most fragile shared dependencies.
4. Are there circular dependencies? List the most significant.
5. Is this effectively a monolith, a modular monolith, or a service-oriented architecture? Be direct.
6. If we wanted to extract one domain into its own service, which would be easiest and why? Which would be hardest?
Honest verdict: if the team needed to onboard a new senior engineer and have them making changes safely within 2 weeks, is the codebase structure currently set up to allow that?
For the architecture assessment:
1. Quote the top-level directory structure output (equivalent to `ls src/` or similar).
2. For any circular dependency claim: name both modules and quote the import line in each that creates the cycle.
3. For the 'most central shared module': quote a sample of import statements from 3 different files that import it.
4. For the extraction difficulty assessment: quote the module's public interface or export list to justify your answer.
Revise your architecture verdict if evidence does not support it.
Find the most complex and problematic files in this codebase.
Look for:
- Files over 500 lines of logic (not counting blanks/comments)
- Functions or methods over 50 lines
- Functions with deeply nested conditionals (4+ levels)
- Files importing from a very high number of other modules
- "God objects" — classes or modules handling too many unrelated responsibilities
For the top 5 most problematic:
1. Name the file and describe what it actually does
2. What makes it complex (size / nesting / too many responsibilities)?
3. What would break, and how widely, if a bug were introduced here?
4. What's the first refactoring move — the smallest safe change that starts improving it?
Also: find functions that clearly do multiple things but are named as if they do one. List the worst 3 examples.
For each complexity hotspot:
1. Quote the function signature and opening lines of the most complex function in the file.
2. Give the actual line count of the file (use your file reading capability).
3. For nesting depth claims: quote the nested block to show the actual depth.
4. For 'god object' claims: quote the class or module export list showing the breadth of responsibilities.
Remove any file from the top 5 if you cannot provide a direct code quote supporting the claim.
Look for meaningful code duplication across the codebase — not stylistic repetition, but duplication that carries real bug risk.
1. Business logic appearing in multiple places — validation rules, pricing calculations, permission checks, data transformations that should live in one canonical location.
2. Utility functions reimplemented multiple times (date formatting, error handling, string manipulation, API response shaping).
3. Copy-pasted blocks that differ only in variable names or minor parameter changes.
4. Similar API handlers or database queries that could be abstracted into a shared function.
For each significant duplication:
- Show where it appears (files, rough line numbers)
- What's the bug risk? (e.g. "if the pricing logic changes in A, it won't change in B, causing silent incorrect calculations")
- What's the simplest abstraction that consolidates it?
Focus on the 5 duplications with the highest risk of causing a real production bug.
For each duplication finding:
1. Quote the duplicated block from location A and the corresponding block from location B, side by side.
2. Highlight the specific lines that differ between the copies.
3. For business logic duplication: confirm the logic is semantically equivalent — not just visually similar.
4. Remove any duplication finding where you cannot produce quotes from both locations.
Re-assess the bug risk for each item based on the actual code, not the description.
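A quick way to triage candidate duplicates before demanding side-by-side quotes is a similarity ratio. A Python sketch using the standard library's difflib (example snippets invented; dedicated duplicate detectors such as PMD CPD or jscpd tokenise first, so renamed variables still match):

```python
import difflib

def similarity(block_a: str, block_b: str) -> float:
    """Rough token-level similarity between two code blocks, 0.0 to 1.0.
    Triage aid only — whitespace-insensitive but rename-sensitive."""
    return difflib.SequenceMatcher(
        None, block_a.split(), block_b.split()
    ).ratio()

# Copy-pasted logic that differs only in variable names.
a = "total = price * qty * (1 - discount)"
b = "amount = cost * quantity * (1 - discount)"
```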
Help me find dead code, unused dependencies, and technical clutter in this project.
1. Exported functions, classes, or modules that don't appear to be imported anywhere in the codebase.
2. Feature flags or toggles that are permanently set and could be cleaned up.
3. Commented-out code blocks — especially ones that look old (alternative implementations, "just in case" backups).
4. In dependency manifests (package.json, requirements.txt, go.mod, etc.): packages that don't appear used in the source.
5. TODO / FIXME / HACK / XXX comments — list them all and estimate how long they've been there if context allows.
6. Deprecated internal functions or APIs that still have callers.
Split your findings into two lists:
- SAFE TO DELETE: removals that are clearly dead and carry no risk
- INVESTIGATE FIRST: things that look unused but might be called dynamically or externally
For the dead code findings:
1. For each exported-but-unused item: confirm it is not imported anywhere by searching the codebase for its name. Quote the export line.
2. For each unused dependency: confirm it does not appear in any source file (search for import/require of the package name). Quote the package.json/requirements entry.
3. For commented-out blocks: quote the first and last line of the block with the file path.
4. Move any item to 'INVESTIGATE FIRST' if your search was not exhaustive.
Identify the knowledge concentration risk in this codebase — modules where the bus factor is effectively 1.
Look for:
1. Complex, business-critical modules with no inline documentation, no README, and unusual or clever patterns that would take significant time to understand.
2. Code that appears to have been written by someone who understood the problem deeply but didn't explain their thinking — "write-only code".
3. Systems with bespoke integrations, custom protocols, or unusual data formats that aren't documented anywhere.
4. Areas where the comments say "don't touch this" or equivalent.
For each high-risk area:
- Name it and describe what it does
- Documentation quality: none / sparse / adequate
- Consequence: what would a 1-week absence of the person who understands this look like in practice?
- Minimum viable doc: what 3-5 things would need to be written down to meaningfully reduce the risk?
For each bus factor risk:
1. Quote a representative section of the code that demonstrates the complexity or lack of documentation.
2. For documentation absence: confirm by reading the file header and any associated README — quote what is there, or confirm nothing is.
3. For 'write-only code' claims: quote a specific function that demonstrates the pattern.
4. Revise your consequence assessment based on actual code complexity, not assumed importance.
Identify the 5 most business-critical code paths in this codebase — things like payment processing, authentication flows, data writes, core business logic calculations, or primary customer-facing features.
For each critical path:
1. What tests exist that cover it — unit, integration, or E2E?
2. What are the obvious failure scenarios with no corresponding test?
3. Is the happy path tested but edge cases ignored? What edge cases are missing?
4. Are there any critical paths with zero test coverage at all?
Then answer directly: which single untested failure scenario, if it happened in production tonight, would be most damaging to the business? That's the test to write first. Tell me exactly what it should test and what framework/pattern to use given the existing test setup.
For each critical path coverage gap:
1. Name the specific test file(s) that do or don't cover this path, and quote a representative test (or confirm no test file exists).
2. For the path you identified as most dangerous: quote the actual business logic code it exercises.
3. For the test you recommended writing: confirm the test framework and pattern used in the existing test suite, and adjust your recommendation to match.
4. If coverage tooling config exists (jest.config.js, .coveragerc, etc.), quote the coverage threshold setting.
Review the existing test suite for quality issues. Look in test/, __tests__/, spec/ or files matching *.test.* or *.spec.*.
Specifically look for these anti-patterns:
1. Tests that only assert a function was called — mock-heavy tests with no assertion on what the function returned or what state changed.
2. Tests that re-implement the logic they're testing, meaning they'll pass even if the logic is wrong.
3. Tests with no assertions, or with assertions that can never fail (assert(true), expect(x).toBeDefined() where x is always defined).
4. Excessive mocking — tests that mock so much of the environment they can't catch real integration failures.
5. Flaky tests — anything with setTimeout, sleep(), or external service dependencies without mocking.
6. Test names that don't describe behaviour: "it works", "test1", "should be correct".
For each pattern: give a specific example from the codebase and describe which class of bug it would fail to catch.
Rough estimate: what percentage of the current test suite is providing genuine safety net value vs. maintenance overhead?
For each test anti-pattern:
1. Quote the specific test that demonstrates the pattern — show the actual test code, not a description.
2. For mock-heavy tests: quote the mock setup and the assertion to show the assertion tests only mock invocation.
3. For always-passing assertions: quote the exact assertion line.
4. Revise your 'percentage providing genuine value' estimate based on the specific examples you've found, not a general assumption.
Look at how integration and end-to-end testing is set up in this project.
1. Are there integration tests that test multiple components working together, or is everything unit-tested in isolation?
2. Are there E2E tests? (Playwright, Cypress, Selenium, or equivalent.) If so, what user flows do they cover?
3. Are external service integrations (APIs, databases, queues) tested with real dependencies in any environment, or always mocked?
4. Are there contract tests that verify the interface between services — so a breaking change in one service fails a test in the other before it reaches production?
5. What's the most critical user flow with no E2E test coverage?
Scenario test: if our last deployment broke the primary data flow between two core services (e.g. orders not reaching fulfilment), would our test suite have caught it before going live? Walk through why or why not.
For the integration test assessment:
1. List every integration or E2E test file you found with its path.
2. For any E2E test coverage claim: quote the test description string (it/describe/test name) that covers the flow.
3. For the cross-service scenario: trace the actual code path — name each file involved from the entry point to the data destination.
4. Confirm whether a test database or test environment is configured by checking for test-specific config files.
Assess how the test suite is integrated into the development and deployment workflow.
1. Do tests run automatically on every pull request? Is it enforced — can a PR be merged with failing tests?
2. How long does the full test suite take? Is there parallelisation, or does everything run sequentially?
3. Are tests separated into fast (unit) and slow (integration/E2E) tiers, so fast feedback is prioritised?
4. Is test output actionable — does a failure tell you immediately what broke and why, or does it require investigation?
5. Are any tests skipped, marked as expected failures (xfail, xit), or disabled in CI config? List them.
6. Is code coverage reported and tracked over time? Is there a minimum coverage threshold that blocks merges?
Bottom line: if a developer was under pressure and wanted to skip tests, how easy would it be — and what would stop them?
For the CI test enforcement assessment:
1. Quote the PR/merge request configuration that shows whether tests are required or optional (branch protection rules, CI required checks).
2. Quote the CI job that runs tests — show the actual run command.
3. For any skipped or disabled tests: quote the skip annotation or xfail marker.
4. For coverage configuration: quote the coverage threshold setting, or confirm it does not exist.
5. Answer the bypass question with a specific sequence of steps based on the actual config you found.
Read the dependency manifest(s) in this project (package.json, requirements.txt, Pipfile, Cargo.toml, go.mod, build.gradle, pom.xml — whatever applies).
For the full dependency list, assess:
1. Which dependencies are significantly behind their current major version?
2. Which runtime or framework version (Node, Python, Java, Go, etc.) are we on, and is it still receiving security patches?
3. Are there any dependencies that are no longer maintained — abandoned npm packages, archived GitHub repos, packages with no commits in 2+ years?
4. Which dependency, if it published a breaking update or was compromised, would cause the most widespread damage?
Give me a tiered output:
RED — update immediately (EOL, known CVE, or abandoned)
AMBER — plan an update this quarter (significantly behind, no security patches)
GREEN — fine as-is, review in 6 months
For each dependency in your RED and AMBER tiers:
1. Quote the exact version string from the manifest file (package.json, requirements.txt, etc.).
2. For EOL runtime/framework claims: confirm the EOL date from your training data and state your confidence level.
3. For abandoned package claims: describe the evidence (last release date if known, or GitHub archive status).
4. Remove any item from RED/AMBER that you cannot support with version evidence from the manifest.
Re-check: are any version ranges specified (^, ~, >=) that could mean the installed version is actually newer?
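The re-check at the end matters because npm-style range operators mean the manifest understates what is actually installed. A small Python sketch of the distinction (hypothetical manifest; for authoritative range parsing use the semver or packaging libraries rather than prefix checks like this):

```python
def is_floating_range(version_spec: str) -> bool:
    """True if an npm-style version spec can resolve to a newer release
    than the literal number suggests."""
    return version_spec.startswith(("^", "~", ">=", ">")) or version_spec in ("*", "latest")

# Hypothetical manifest fragment.
manifest = {"express": "^4.18.0", "left-pad": "1.3.0", "lodash": ">=4.0.0"}
floating = {name for name, spec in manifest.items() if is_floating_range(spec)}
```

For floating specs, the lockfile (package-lock.json, yarn.lock) is the ground truth for what is installed, so age claims should be checked there.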
Identify every major framework and language version in use in this codebase.
For each:
1. What version are we on?
2. What is the current stable version as of your knowledge cutoff?
3. Is our version still receiving security patches?
4. What practical friction does being on this version cause day-to-day? (Missing APIs we're working around, slower build tools, patterns that modern versions have improved)
5. What would migrating to the current version require? Minor configuration change, significant code changes, or a substantial refactor?
Then: which single framework or runtime version gap is causing the most day-to-day friction for developers? What would it concretely take — in engineer-days — to close that gap?
For the framework version assessment:
1. Quote the exact version declaration for each framework from the manifest or config file.
2. For each friction point you described: quote a code pattern or workaround in the codebase that demonstrates that friction exists.
3. For the migration estimate: identify at least 3 specific code patterns that would need to change, and quote an example of each.
4. If you stated a version is EOL, confirm the support end date from your training data and flag if uncertain.
Assess whether the core tech stack choices in this codebase still fit the current product needs.
Database: Is the data model and query pattern a good fit for the database being used? Signs of mismatch: storing JSON blobs in a relational DB to avoid schema changes, doing relational queries in a document store, or a large number of N+1 query workarounds.
Caching: Is there a caching layer? Does it look appropriately placed for the read/write patterns visible in the code?
Async & background jobs: How are background tasks handled? Does the approach look appropriate for the current load, or are there signs of strain (polling where events would be better, synchronous operations that should be async)?
API layer: REST, GraphQL, gRPC, or a mix? Is the choice appropriate or are there workarounds — e.g. massive REST responses being filtered client-side because there's no query language?
For each area: does the current choice look like it was right for an earlier stage but is now causing friction?
For the stack fit assessment:
1. For any database mismatch claim: quote the schema definition or model and the query pattern that demonstrates the mismatch.
2. For caching concerns: quote the cache configuration or confirm its absence by searching for 'redis', 'memcached', 'cache' in config files.
3. For async/background job concerns: quote the job definition and the dispatcher/enqueuer code.
4. Revise any 'early stage but now friction' claim that you cannot support with a quoted code example.
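The N+1 workaround pattern mentioned under Database is the one most worth recognising on sight. A self-contained sqlite3 sketch with made-up tables, showing N+1 round trips versus a single join that returns the same data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ursula'), (2, 'Ted');
    INSERT INTO books VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

def titles_by_author_n_plus_one():
    # One query for the authors, then one query per author: N+1 round trips.
    out = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors"):
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        out[name] = [t for (t,) in rows]
    return out

def titles_by_author_joined():
    # Same result in a single round trip.
    out = {}
    for name, title in conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY b.id"
    ):
        out.setdefault(name, []).append(title)
    return out
```

On a local in-memory database the difference is invisible; over a network it is N extra round trips per page load, which is why ORMs grow eager-loading options.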
Find every place in this codebase where we call an external API or depend on a third-party service.
For each external integration:
1. Name the service and what we use it for.
2. Is there error handling and a graceful degradation path if the service is unavailable?
3. Is there retry logic with exponential backoff? A circuit breaker?
4. Is rate limit handling implemented, or would a surge in our traffic cause us to hit limits silently?
5. If this integration failed completely right now, what would users be unable to do?
Rank the integrations by their failure impact.
Then: which single integration failing would be most damaging, and how well-defended are we against that failure? What would a minimum viable resilience improvement look like for it?
For each integration resilience concern:
1. Quote the HTTP client initialisation or SDK setup for each integration you listed.
2. For error handling absence: quote the try/catch or equivalent block, or show the call site with no error handling.
3. For retry logic: search for 'retry', 'backoff', 'attempt' near each integration and quote what you find.
4. Confirm the integration ranking by re-stating what each integration is used for, based on the code you read.
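A minimum viable resilience improvement usually starts with retry plus exponential backoff. A hedged Python sketch (in production prefer a library such as tenacity, and pair retries with a circuit breaker so a dead dependency is not hammered indefinitely):

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky call with exponential backoff and jitter.
    sleep is injectable so tests don't actually wait."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the real error
            # Delays grow 0.5s, 1s, 2s...; jitter avoids thundering herds.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage would wrap the external call site, e.g. `with_retries(lambda: client.charge(order))`, where `client.charge` is a hypothetical integration call.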
Assess the type coverage and type quality in this codebase.
If TypeScript:
1. Is strict mode enabled in tsconfig.json? If not, what flags are missing?
2. What percentage of function signatures appear to use 'any' or leave types implicit?
3. Are there significant uses of type assertions (as X) that bypass type safety?
4. Are shared data shapes defined as interfaces/types, or are they inlined and repeated?
If Python:
1. Are type hints present on function signatures? What's the rough coverage?
2. Is mypy or pyright configured? What's the current error count if you can infer it?
3. Are dataclasses, Pydantic models, or TypedDicts used for structured data, or are plain dicts passed everywhere?
If Java/Go/other typed language:
1. Are generics used appropriately, or are there raw types / interface{} / any patterns that lose type information?
AI readiness verdict: on a scale of 1–5, how confidently could an AI agent understand the data flowing through this codebase from types alone? What's the single change that would improve that the most?
For the type coverage assessment:
1. Quote the tsconfig.json strict settings, or the mypy/pyright config if Python.
2. For 'any' usage claims: quote 3 specific examples with file paths.
3. For untyped function signatures: quote 3 examples.
4. Revise your AI readiness score (1–5) based on the specific evidence you've now quoted, not a general impression.
5. For the improvement recommendation: quote the specific file or config change needed.
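For reference when reading the output: a typical strict TypeScript baseline looks something like the fragment below. `"strict": true` is an umbrella that enables `noImplicitAny`, `strictNullChecks`, and related checks; the other flags shown are not part of that umbrella and must be enabled separately.

```json
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitOverride": true,
    "noFallthroughCasesInSwitch": true
  }
}
```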
Assess how readable and self-describing this codebase is, specifically from the perspective of an AI agent trying to understand it with no prior context.
1. Naming quality: are function, variable, and class names self-explanatory? Give examples of the best-named and worst-named things in the codebase.
2. Abbreviations and jargon: are there domain-specific abbreviations or internal jargon that appear without explanation? List the ones that appear most frequently.
3. Context density: do functions and modules have enough context in their naming and structure that you can understand what they do without reading the implementation?
4. File and module discoverability: if an agent needed to find "where billing logic lives" or "where user permissions are enforced", how easily could it find the right file from the directory structure and naming alone?
5. Comment quality: are inline comments explaining 'why' (the hard part) or just 'what' (redundant)? Are there any areas where the logic is so non-obvious it needs a comment but doesn't have one?
Rate the overall "AI navigability" of this codebase: 1 (agent will frequently make changes in the wrong place) to 5 (agent can navigate confidently from names alone).
For the naming and discoverability assessment:
1. Quote your 3 best-named and 3 worst-named functions or modules with their file paths.
2. For each abbreviation or jargon term you flagged: quote 2 usage sites in the codebase.
3. For the 'find billing logic' test: describe the actual files you found when searching for billing/payment logic. Was your prediction correct?
4. Quote 2 examples of inline comments: one that explains 'why', one that only explains 'what'.
Revise your navigability score based on this evidence.
Assess how safely an AI coding agent could make changes to this codebase, looking specifically at modularity and change isolation.
1. Are there clear module boundaries that define what is public API vs. internal implementation? Or is everything accessible from everywhere?
2. Can you make a change to one module with confidence that it won't affect unrelated modules? Identify the areas where this is least true.
3. Are there "action at a distance" patterns: global state, singletons, or event systems where a change in one place has non-obvious effects somewhere else?
4. Are interfaces or abstract types used to decouple implementations, or are concrete types passed directly everywhere?
5. If an AI agent were asked to "add a new field to the user model", how many files would it likely need to touch, and how hard would it be to find all of them?
Final assessment: what is the single architectural pattern in this codebase most likely to cause an AI agent to make a plausible-looking but incorrect change?
For the modularity and change-safety assessment:
1. For any global state claim: quote the variable declaration and 2 sites that mutate it.
2. For the 'add a field to the user model' test: actually trace this. Name every file that imports or uses the user model, and quote the import line in each.
3. For interface/concrete type coupling: quote a function signature that takes a concrete type where an interface would be safer.
4. Revise the 'most dangerous pattern for an AI agent' based on the evidence you've traced.
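To make the "action at a distance" finding concrete, here is a minimal hypothetical sketch (all names invented for illustration). The first function's behaviour depends on whoever mutated the module-level variable last; the second makes the dependency explicit, so a change at one call site cannot affect unrelated modules:

```typescript
// Hypothetical module-level mutable state: any importer can change the
// behaviour of every other caller without touching their code.
let currentTaxRate = 0.2;

function priceWithTaxGlobal(net: number): number {
  return net * (1 + currentTaxRate); // depends on hidden shared state
}

// Safer: the dependency is an explicit parameter, so the call site is
// self-describing and the function is trivially testable in isolation.
function priceWithTax(net: number, taxRate: number): number {
  return net * (1 + taxRate);
}
```

This is exactly the kind of pattern that trips up an AI agent: a change that looks locally correct silently alters behaviour elsewhere.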
Assess how well set up this codebase is for AI-assisted development with tools like Claude Code, Copilot, or Cursor.
1. Test coverage as a feedback loop: can an AI agent run tests locally to verify its changes are correct? Are tests fast enough for this to be practical?
2. CLAUDE.md or equivalent: is there a project-level instructions file that tells an AI agent about conventions, how to run the project, and what areas to be careful with?
3. Linting and formatting: are there configured linters and formatters (ESLint, ruff, golangci-lint, etc.) that an agent could run to verify code style?
4. Type checking: is there a type checker that can be run as a validation step?
5. Local development environment: can the project be run locally in a single command? Is the setup documented? Complex local environments significantly reduce AI agent effectiveness.
Give me a score out of 10 for "AI agent readiness" and a list of the top 3 changes that would make AI-assisted development most effective on this codebase today.
For the AI workflow readiness assessment:
1. Run the test suite command you identified; quote the test run command from package.json scripts or equivalent.
2. Confirm whether CLAUDE.md, .cursor/rules, .github/copilot-instructions.md, or equivalent exists; quote the first 10 lines if it does.
3. Quote the linter and formatter configuration (ESLint config, ruff config, .prettierrc, etc.) or confirm its absence.
4. Quote the local dev startup command from the README or package.json.
Revise your score out of 10 based on this verification.
Analyse the frontend component structure. Look in src/components/, src/views/, src/pages/, or wherever UI components are defined.
1. Are there "mega-components": components over 300 lines that handle data fetching, business logic, and rendering all in one place?
2. Is there a clear separation between presentational components (purely visual, receive props) and container/smart components (handle data and logic)?
3. Are components reused, or is copy-paste the pattern for minor variations?
4. Is there prop drilling: data passed through many component layers that should be in context or a state store?
5. Are component names accurate to what they actually do?
For the worst 3 components: name them, describe what they currently do, and describe what they should ideally be split into. Then: what is the component that, if refactored, would have the most positive downstream effect on the rest of the frontend?
For each component concern:
1. Quote the component file's import section and the opening of its render/return block for each flagged component.
2. For size claims: give the actual line count.
3. For prop drilling: quote the prop chain. Show the parent passing the prop and the child receiving it, across at least 2 levels.
4. For the 'highest leverage refactor': quote the component's current exports and describe the proposed split in terms of specific new file names.
Analyse how application state is managed in this frontend.
1. What state management solution(s) are in use? (Redux, Zustand, Context, MobX, Pinia, Jotai, etc.) Is more than one approach used inconsistently?
2. Is server state (API data) mixed with UI state in the same store? This conflation is a major source of complexity; server state should be managed separately (React Query, SWR, TanStack Query).
3. Is the state shape well-organised, or has it grown into a flat object containing everything the app has ever needed?
4. Are there race conditions: places where two async operations update the same slice of state and the order isn't guaranteed?
5. Could a new developer understand the state model by reading the store definition alone?
For the most complex part of the state: describe what it does, what could go wrong with it, and what a simpler approach would look like.
For the state management assessment:
1. Quote the store definition (Redux slice, Zustand store, etc.), specifically the shape of the most complex state slice.
2. For server/UI state conflation: quote a specific example where API data and UI state (loading, modal open, etc.) are stored in the same reducer or store key.
3. For race condition claims: quote the two async operations and the shared state they both update.
4. For the simplification proposal: quote the current implementation and describe what the simplified version would look like concretely.
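To make the server-state/UI-state distinction concrete, here is an illustrative sketch. The shapes are hypothetical, not from any real store:

```typescript
// Conflated: API data, its fetch bookkeeping, and pure UI flags in one slice.
// The store now has to mirror the server, and every consumer couples to both.
type ConflatedUsersSlice = {
  users: Array<{ id: string; name: string }>; // server state (the API owns it)
  isLoading: boolean;                         // server-state bookkeeping
  isEditModalOpen: boolean;                   // pure UI state
};

// Separated: a query library (React Query, SWR, ...) owns server state and
// its loading/error lifecycle; the store keeps only genuine UI state.
type UiSlice = {
  isEditModalOpen: boolean;
};

const initialUi: UiSlice = { isEditModalOpen: false };
```

The tell-tale in an audit is a reducer or store key holding both fetched records and flags like `isLoading` or `isEditModalOpen` side by side.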
Look at the routing and navigation structure of this frontend application.
1. Map the route structure: how many distinct routes are there? Is the URL structure logical and predictable (can a user guess the URL for a given page)?
2. Are there routes with complex conditional rendering, where the same URL shows different UIs depending on user state, permissions, or feature flags?
3. Is navigation state in the URL (querystring, path params) or held in memory? Where does this cause problems when users share links or refresh?
4. Are there multi-step flows (onboarding, checkout, wizards)? How is progress state managed across steps, and what happens if a user navigates back or refreshes mid-flow?
5. Are there any obvious accessibility issues in navigation? (No focus management on route change, no skip-to-content link, no ARIA landmark roles on main navigation.)
From a user's perspective: based on the route structure and conditional logic you can see, what is the most confusing navigation experience this application currently produces?
For the routing and navigation assessment:
1. Quote the route definition file: show the full route table or router configuration.
2. For conditional rendering complexity: quote the specific route or component that has the most conditional logic.
3. For multi-step flow concerns: quote the state persistence mechanism used between steps.
4. For accessibility issues: quote the route change handler (or confirm it does not manage focus).
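For the multi-step flow question, the refresh-safe pattern is to keep the step in the URL rather than in memory. A minimal sketch, using the standard WHATWG URL API (the `step` query parameter and the checkout URL are illustrative):

```typescript
// Write the wizard step into the querystring, so refresh, back/forward,
// and shared links all land on the same step.
function withStep(url: string, step: number): string {
  const u = new URL(url);
  u.searchParams.set("step", String(step));
  return u.toString();
}

// Read the step back out, falling back to step 1 for missing/invalid values.
function currentStep(url: string): number {
  const raw = new URL(url).searchParams.get("step");
  const n = raw === null ? NaN : Number(raw);
  return Number.isInteger(n) && n >= 1 ? n : 1;
}
```

If the audit finds progress held only in component state or a store, a mid-flow refresh silently resets the user to step 1.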
Look at the frontend build configuration and code for performance issues.
1. Build config (webpack.config.js, vite.config.js, next.config.js): is code splitting enabled? Are there obvious large vendor bundles that could be split?
2. Are heavy libraries imported in full when only specific functions are needed? (e.g. import _ from 'lodash' vs import debounce from 'lodash/debounce')
3. Is lazy loading used for routes, large components, and heavy third-party widgets?
4. Are images handled through the build pipeline (optimised, WebP conversion, responsive sizes)?
5. Are there any synchronous operations or render-blocking scripts in the critical path?
6. Without running a bundle analyser, estimate from the dependency list: which 3 packages are likely contributing the most to bundle size?
What is the single change most likely to produce a meaningful improvement in page load time for a user on a mid-range device?
For the frontend performance assessment:
1. Quote the relevant section of the build config (code splitting config, output config).
2. For full-library import claims: quote the exact import line.
3. For lazy loading absence: quote 3 route definitions that are not lazily loaded.
4. Confirm your top 3 bundle size contributors by quoting their entries in package.json with their version numbers.
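The import distinction and the lazy-loading pattern can be sketched as follows. The `lazyOnce` helper is illustrative: a hand-rolled version of what `React.lazy` or a dynamic `import()` gives you, and the module path in the usage comment is hypothetical:

```typescript
// Full-library import (ships the whole package):   import _ from 'lodash'
// Per-function import (ships one function):        import debounce from 'lodash/debounce'

// The core of lazy loading: defer an expensive load until first use, then
// cache the promise so repeat calls never trigger a second load.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= loader());
}

// Usage sketch (path is hypothetical):
// const loadChart = lazyOnce(() => import("./HeavyChartWidget"));
```

In an audit, route definitions that statically import every page component are the usual sign that none of this is happening.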
Based on everything you've seen in this codebase during our session, give me the top 3 tech debt items meeting ALL of these criteria:
- Completable by 1–2 engineers in a single sprint (≤10 days)
- Addresses a real risk: security, production stability, or significant developer velocity loss
- Does not require a large architectural change as a prerequisite
- Will have a noticeable positive impact if done
For each of the 3:
1. Name it specifically: not "improve test coverage" but "add tests for the payment webhook handler in /src/webhooks/stripe.ts"
2. What exactly needs to be done? (3–5 bullet points)
3. Why this one? What's the concrete risk of not doing it?
4. Effort estimate: hours or days?
5. Who should own it? (role, not name)
Format as three numbered items. No softening, no caveats. Just the three things.
For each of your 3 quick wins:
1. Quote the specific code, file, or config that the fix applies to, and confirm it exists at the path you stated.
2. Revise your effort estimate if the code you've now read is more or less complex than you initially assessed.
3. For any finding that turns out not to exist when you look directly at the file, remove it and replace it with the next highest-value item.
Final output: 3 verified, evidence-backed items only.
Based on your full review of this codebase, answer this question directly: what is the one issue that, if it caused a production incident or security breach in the next 30 days, would be most damaging to the business?
Give me:
1. The specific issue: name the exact file, system, or pattern
2. The failure scenario: what would have to happen for this to blow up in production?
3. Business impact: what would a user or customer experience? Rough cost estimate?
4. Minimum viable fix: the smallest change that meaningfully reduces the risk (could be done this week)
5. Proper fix: what the complete solution looks like if given 2–4 weeks
Don't soften this. If something looks genuinely dangerous, say so clearly.
For the single biggest risk:
1. Quote the specific code that creates the risk: the exact lines, not a file-level description.
2. Trace the failure scenario step by step through the actual code: entry point → vulnerable code → impact.
3. For the minimum viable fix: quote what the code should look like after the fix.
4. If you cannot trace the failure scenario through actual code, reassess whether this is the biggest risk or whether you should nominate a different one you can evidence.
Based on your complete review of this codebase, produce a 90-day tech debt reduction plan.
PHASE 1 (Weeks 1–2): Stabilise & Secure. The fastest wins that reduce active risk; things doable without architectural changes.
PHASE 2 (Weeks 3–6): Improve Foundations. Structural improvements that make everything else faster: dependencies, test coverage, removing the worst complexity hotspots.
PHASE 3 (Weeks 7–12): Modernise. Bigger refactors and migrations that need planning but reduce long-term maintenance burden.
For each phase:
- 3–5 specific tasks (named files / systems / patterns, not categories)
- Team size and rough time needed
- Measurable outcome: how will we know this phase worked?
Close with: if we could only complete ONE of these three phases due to resource constraints, which delivers the most value and why?
For the 90-day plan:
1. For each task in Phase 1: confirm the file or system exists as you described it.
2. For each measurable outcome: state the current baseline (e.g. current test coverage %, current number of RED dependencies) based on what you found in this session.
3. Revise any effort estimate where the code you've read during this session suggests your initial estimate was off.
4. For the 'single phase' recommendation: justify it with 3 specific findings from this session, not general principles.
SolveFast Labs offers a free 1-hour tech debt assessment. We'll run these prompts with you, interpret the output, and give you a prioritised fix list — no slides, no pitch, no obligation. We've done this across Java-to-Spring migrations, Python modernisations, and greenfield Go builds.