
Study design and research governance
The guidance below highlights impactful applications of AI during study design and research governance, with practical considerations and prompt templates that you can copy, paste, and adapt to your own needs.
Aim: Lock in key decisions early so your study is easier to trust, repeat, and explain.

Get support writing your plan before you analyze the data (pre-registration), covering hypotheses, primary/secondary outcomes, inclusion/exclusion criteria, and the analysis plan.
Spot ‘choice points’ where different decisions could change results (researcher degrees of freedom) and set constraints in advance (e.g., decision rules, stopping rules).
Create a ‘how we handle data’ plan that covers data minimization, access controls, appropriate retention, and documentation.
AI can support planning, but it cannot validate what is true or ethically appropriate; you remain accountable for decisions.
Sensitive data includes personal data, confidential peer review content, unpublished results, proprietary datasets, and data covered by consent or data-sharing agreements. If unsure, treat it as sensitive and don’t paste it into an AI tool.
Record key design decisions and AI support (tool, prompts, date, what you verified).
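
If you keep this record in a structured form, entries are easier to search and compare later. Below is a minimal Python sketch; the class name `AIUseLogEntry` and its field names are hypothetical, chosen to mirror the items listed above (tool, prompts, date, what you verified) plus the study stage and the decisions that stayed with you.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class AIUseLogEntry:
    """One record of AI support during study design (hypothetical field names)."""
    entry_date: str                 # ISO date, e.g. "2024-05-01"
    stage: str                      # e.g. "study design", "preregistration"
    tool: str                       # tool name and version
    prompts: list[str]              # prompts used, or where they are stored
    verified: list[str] = field(default_factory=list)        # what you checked yourself
    decisions_kept: list[str] = field(default_factory=list)  # decisions that remained yours

    def to_json(self) -> str:
        # Serialize to JSON so entries can be appended to a project log file
        return json.dumps(asdict(self), indent=2)


entry = AIUseLogEntry(
    entry_date=date.today().isoformat(),
    stage="preregistration",
    tool="[tool name and version]",
    prompts=["building a structured plan"],
    verified=["outcome definitions match the protocol"],
    decisions_kept=["choice of primary outcome"],
)
print(entry.to_json())
```

JSON is only one option; a spreadsheet or lab notebook works equally well if it captures the same fields.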
📑 Copy and paste prompt: building a structured plan |
|---|
You are my preregistration drafting assistant. Project topic (non-sensitive): [insert brief topic]. Study type: [RCT / observational / MR / qualitative / etc.]. My draft ideas: [insert]. Task: Produce a preregistration outline with headings and bullet points for: - Research question and hypotheses - Primary and secondary outcomes (operational definitions) - Inclusion/exclusion criteria - Sample size rationale / power approach (no calculations if data are missing; flag the gaps instead) - Data collection plan - Analysis plan (models/tests, covariates, handling of missing data, adjustments) - Sensitivity analyses - Deviations policy (how we will document changes) Rules: Do not invent methods or citations. If information is missing, write ‘Not specified’ and list the decisions I must make. Flag any areas where my study type (e.g., qualitative, exploratory) may not fit the standard preregistration structure, and suggest appropriate alternatives. |
📑 Copy and paste prompt: choice points/researcher degrees of freedom scan |
|---|
Act as a ‘researcher degrees of freedom’ auditor. Here is my current plan (paste summary, no sensitive data): [paste]. Identify: 1. Choice points where different decisions could change results (e.g., exclusions, transformations, outliers, subgrouping, stopping rules, covariate selection). 2. The risk each choice point introduces (bias/overfitting/p-hacking/interpretability). 3. A concrete pre-commitment rule for each (decision rule wording I can preregister). Output as a table with four columns: Choice point, Risk, Pre-commitment rule, Notes. |
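
Listing your choice points and their options shows how many different analyses the same dataset could support. A minimal Python sketch, where the choice points and options are hypothetical placeholders rather than recommendations:

```python
from itertools import product

# Hypothetical choice points and options for an illustrative study
choice_points = {
    "exclusions": ["none", "failed attention check", "completion time < 2 min"],
    "outcome_transform": ["raw", "log"],
    "outliers": ["keep", "winsorize at 1%/99%"],
    "covariates": ["age + sex", "age + sex + baseline score"],
}

# Every combination of options is one possible analysis ("fork")
specifications = [
    dict(zip(choice_points, combo))
    for combo in product(*choice_points.values())
]
print(len(specifications))  # 3 * 2 * 2 * 2 = 24

# A preregistered plan commits to exactly one of these before seeing results
preregistered = {
    "exclusions": "failed attention check",
    "outcome_transform": "raw",
    "outliers": "keep",
    "covariates": "age + sex + baseline score",
}
assert preregistered in specifications
```

Even four modest choice points give 24 possible analyses; a pre-commitment rule for each one fixes which of them you will actually run.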
📑 Copy and paste prompt: analysis plan ‘stress test’ |
|---|
Stress-test my planned analysis for robustness. Plan summary (no sensitive data): [paste]. Generate: - The top 10 ways this analysis could mislead (assumption violations, confounding, leakage, multiplicity, missingness, model misuse). - For each, a mitigation (design change, diagnostic, sensitivity analysis). - Tailor these to [field/method] if you can; otherwise, keep them generic and flag where discipline-specific risks may exist. Rules: Keep it methodological. No fabricated references. |
⚠️ Important: This is a text-based reasoning task, so the LLM may miss real issues and flag non-issues. Because outputs are stochastic, different runs may produce very different lists. Run this prompt two or three times and compare the results; issues that appear consistently across runs are more likely to be genuine concerns. |
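
Comparing runs is easier if you first reduce each run's output to short issue labels. That matching step is your own judgment call, since different runs word the same issue differently. A minimal Python sketch, assuming hypothetical labels and a hypothetical helper `consistent_issues`, that keeps issues flagged in at least two runs:

```python
from collections import Counter


def consistent_issues(runs: list[list[str]], min_runs: int = 2) -> list[str]:
    """Return issue labels that appear in at least `min_runs` runs.

    Assumes each run's issues were already mapped to shared short labels by hand.
    """
    counts = Counter()
    for issues in runs:
        counts.update(set(issues))  # count each issue at most once per run
    return sorted(issue for issue, n in counts.items() if n >= min_runs)


runs = [
    ["confounding by age", "missing outcome data", "multiple comparisons"],
    ["missing outcome data", "multiple comparisons", "model misspecification"],
    ["multiple comparisons", "confounding by age", "data leakage"],
]
print(consistent_issues(runs))
# ['confounding by age', 'missing outcome data', 'multiple comparisons']
```

Issues that appear only once (here, model misspecification and data leakage) are not automatically false alarms; they simply need closer checking before you act on them.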
📑 Copy and paste prompt: ethics and privacy risk surfacing |
|---|
You are an ethics and privacy risk surfacing assistant (not an ethics approver). Context (high level, non-identifiable): [paste]. Data types involved: [personal data / health / location / qualitative interviews / etc]. Task: List potential ethical/privacy risks, affected groups, and mitigations, including: - Consent and participant expectations - Re-identification risk - Vulnerable populations - Incidental findings / harms - Data access and sharing risks - Bias and fairness concerns Output: Risks + mitigations + questions I must answer for an ethics application. Rule: If anything sounds sensitive/unclear, tell me not to share it with an AI tool. |
⚠️ Important: This prompt is a starting point for ethics applications, not a substitute for ethics committee review. |
📑 Copy and paste prompt: data management plan (DMP) generator |
|---|
Help me draft a ‘how we handle data’ plan. Study context (non-sensitive): [paste]. Data categories: [list types]. Storage environment: [local encrypted / institutional server / cloud]. Produce a Data Management Plan covering: - Data minimization (what we collect and why) - Access controls (who, how granted, audit trail) - De-identification/pseudonymization approach - Retention and deletion schedule - Data documentation (data dictionary, codebook, README) - Versioning and change control - Data sharing (what can be shared, when, under what conditions) - Incident handling (breach / mis-send / loss) Rules: Don’t assume legal requirements; flag where institutional policy or GDPR/IRB guidance is needed. |
📑 Copy and paste prompt: end-of-session record |
|---|
Create an AI-use log entry for what we did today. Include: date, stage (study design/preregistration/etc), tool name, tasks supported, what inputs I provided (types only), what outputs you produced, what decisions remained mine, and a checklist of what I must verify before using anything. Keep it brief and copy/paste-friendly. |
⚠️ Unsafe prompt example: |
|---|
Here is a summary of my data and hypothesis (no sensitive data): [paste]. Try different exclusions, transformations, covariates, subgroups, and models, and tell me which analysis gives the strongest support for my hypothesis. |
Why it’s unsafe (what it implicitly encourages): - Outcome-dependent choices: ‘Strongest support’ asks the model to choose methods based on results, not on a prespecified rationale. - Hidden multiplicity: It implies many plausible forks (exclusions, transformations, covariates, subgroups, models). Selecting the ‘best’ one is effectively specification searching, even if you don’t intend to p-hack. - Inflated false positives / overfitting: Choosing the best-performing specification increases the chance of finding patterns that won’t replicate. - Selective reporting pressure: Once a ‘winning’ specification is found, people often (unintentionally) report it as if it had been the plan all along. |
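
To see why hidden multiplicity inflates false positives, consider a simplified case: if there is no real effect and you try `k` independent analyses, each tested at significance level `alpha`, the chance that at least one looks ‘significant’ is `1 - (1 - alpha)**k`. Real specifications reuse the same data and are correlated, so the actual inflation is usually smaller than this, but it still grows with every fork you try. A minimal Python sketch under the independence assumption (the function name is hypothetical):

```python
def chance_of_false_positive(k: int, alpha: float = 0.05) -> float:
    """P(at least one 'significant' result among k independent tests when every null is true)."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

With 20 forks, picking whichever one ‘works’ makes a spurious finding more likely than not, even though each individual test uses the conventional 5% threshold.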