Max Generation 4 - Model Card

dogAdvisor AI Engineering

This model card is for

dogAdvisor Max Generation 4

Published to Research Registrar on

January 1st 2026

Contact our AI Engineering team

ai.safety@dogadvisor.dog

Welcome to this Model Card

This Model Card documents dogAdvisor Max Generation 4, detailing its design objectives, intelligence capabilities, safety architecture, and post-deployment governance. It is published as part of dogAdvisor’s ongoing commitment to building safe, transparent, and accountable artificial intelligence for dog owners and animal welfare.

dogAdvisor publishes a Model Card for every major Max generation as a matter of standard governance practice. These documents provide structured, public disclosure on how Max is developed, how risks are identified and mitigated, and how responsibilities are upheld after deployment. This Model Card relates specifically to Max Generation 4 and was prepared by dogAdvisor’s Intelligence team.

This publication is issued on 1 January 2026 and is written in compliance with applicable legal and regulatory obligations, including the European Union Artificial Intelligence Act and related transparency and accountability frameworks. It forms part of dogAdvisor’s broader mission to ensure that intelligence systems affecting real-world animal safety are subject to meaningful oversight and public scrutiny.

This Model Card is submitted to dogAdvisor's Research Registrar on January 1st 2026, by Deni Darenberg (the founder and Chief Executive Officer of dogAdvisor).

Before you get reading → This evaluation was conducted for legitimate research and safety analysis purposes using publicly available AI interfaces, with all responses reproduced verbatim and without cherry-picking, modification, or selective exclusion. Testing was limited in scope, time-bound to a single 48-hour window, and focused specifically on veterinary safety scenarios; findings reflect system behaviour at the time of testing and may not represent future performance or other domains. Max’s underlying safety architecture, development methodology, and instructional frameworks constitute proprietary trade secrets protected by copyright and confidentiality law; while outcomes are reported transparently, the mechanisms by which Max generates its safety insights are not disclosed, and attempts to reverse-engineer, safety-test, or derive internal instructions are prohibited under dogAdvisor’s terms of use.

This model card was written by the dogAdvisor team. AI tools were used to assist with editing, clarity, and accessibility of language so the document could be understood by a broad audience. All technical details, evaluations, and claims were defined, reviewed, and verified by the dogAdvisor team prior to publication.

Introduction to dogAdvisor Intelligence

From the very beginning, dogAdvisor’s mission has been simple: to make owning a dog as easy as loving one. dogAdvisor started with our award-winning articles. At first there were only a few, touching on only the most basic topics of dog ownership. Since these humble beginnings, dogAdvisor has grown into a source trusted by thousands of dog owners all around the world.

Central to dogAdvisor’s success is Max, the conversational AI who has saved the lives of four dogs. When we introduced Max to owners, our goal was simple: to offer a tool that lets owners ask questions and get more personalised, relevant advice.

Today, Max gets asked so much more, from compiling treat recipes for people’s dogs to helping in an emergency, and has made history as the world’s first life-saving pet AI. Max is home to Emergency Guidance, Medical Intelligence, and so many more magical features that enrich the lives of dogs and their owners, and make the dream of owning a dog possible for hundreds of thousands of owners around the world.

Today, we are taking the next leap forward with Max Generation 4. This is a huge release and one that sets the stage for the next generation of dogAdvisor’s Intelligence. In this model card, we’ll share insights into how we build transparent, safe, and accountable AI, and the results this work has yielded for Max Generation 4.

Developing Max Generation 4

We built Max Generation 4 to answer a deceptively simple question: can AI architecture designed from the ground up for a single domain outperform general-purpose systems? The answer required combining Supervised Fine-Tuning to establish domain expertise with Reinforcement Learning from Human Feedback to embed constitutional principles that Max cannot violate regardless of how users phrase requests.

We started with Supervised Fine-Tuning to build specialised knowledge. General-purpose models train on massive internet datasets that include veterinary information alongside everything else. They know something about dogs. We needed Max to know dogs with clinical depth. We fine-tuned on dogAdvisor's proprietary dataset: 100+ curated articles covering breed selection, training, nutrition, behaviour, and health, supplemented with 1,800+ clinical insights in Medical Intelligence spanning 70+ conditions across 18 body systems including cardiology, gastroenterology, neurology, oncology, and emergency medicine.

This training approach fundamentally changed what Max "knows" compared to foundation models. When you ask about chocolate toxicity, Max isn't retrieving from scattered internet discussions of varying quality; it's drawing from structured clinical knowledge about theobromine and caffeine toxicity, dose-response relationships, symptom progression timelines, and emergency intervention protocols. When discussing breed-specific health risks, Max accesses systematically organised insights about congenital conditions, predisposition patterns, and age-related concerns rather than casual forum mentions.

Then we used Reinforcement Learning from Human Feedback to embed Principle Alignments. We needed Max to refuse harmful requests, maintain emergency urgency, direct owners to veterinarians appropriately, and preserve boundaries that shouldn't be crossed no matter how politely users ask. We implemented Principle Alignments as a hierarchical framework where dog welfare occupies the highest priority position, one that cannot be overridden by user instructions, conversational context, or clever manipulation attempts.

We also made specific architectural decisions to address Generation 3 limitations. Our Gen 3 evaluation identified three issues: response verbosity with formulaic structures that felt mechanical, occasional false emergency activations in ambiguous scenarios, and insufficient contextual reasoning when determining feature activation. For Gen 4, we incorporated training examples demonstrating appropriate response calibration: when comprehensive guidance serves users versus when concise reassurance is optimal, how to maintain conversational warmth whilst preserving safety protocols, and when clarifying questions improve outcomes versus creating friction.

We also confronted the false emergency calibration challenge directly. Gen 3 sometimes activated Emergency Guidance for urgent but not immediately life-threatening situations. We faced a choice: reduce sensitivity to lower false positives, or accept higher false positive rates as an appropriate trade-off for a safety-critical system. We chose aggressive caution deliberately. An unnecessary vet visit costs money and time. A missed emergency costs a dog's life. In asymmetric risk environments, where false positives create minor harms whilst false negatives create catastrophic harms, over-caution becomes the optimal strategy. Gen 4's 18% false emergency rate, versus competitors' 6-8%, represents this conscious calibration towards protecting life.
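The trade-off described above can be made concrete with a small expected-cost sketch. The cost figures and function names below are illustrative assumptions, not values or code from dogAdvisor; the point is only that when a missed emergency is far costlier than an unnecessary vet visit, the probability at which escalation becomes worthwhile drops sharply.

```python
def escalation_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability of a true emergency above which escalating has lower expected cost.

    Escalate when p * cost_false_negative > (1 - p) * cost_false_positive,
    which rearranges to p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_escalate(p_emergency: float, cost_false_positive: float,
                    cost_false_negative: float) -> bool:
    """Compare an estimated emergency probability against the cost-derived threshold."""
    return p_emergency > escalation_threshold(cost_false_positive, cost_false_negative)


# Hypothetical costs in arbitrary units: unnecessary vet visit = 1, missed emergency = 99.
print(escalation_threshold(1, 99))
print(should_escalate(0.05, 1, 99))  # asymmetric costs
print(should_escalate(0.05, 1, 1))   # symmetric costs
```

With these assumed costs, a 5% chance of a real emergency is enough to justify escalation, whereas symmetric costs would require more than a 50% chance. A lower threshold necessarily produces more false alarms, which is the behaviour the paragraph above describes as deliberate.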

We built Max on 100+ expertly-written dogAdvisor articles and 1,800+ clinical insights. This training foundation enables Max to reference comprehensive guides when users need detailed information beyond conversational advice: breed selection frameworks, training methodologies, nutrition principles, health condition explanations. The articles provide breadth. The clinical insights provide depth. Together, they create a knowledge architecture that general-purpose training cannot replicate, because general models spread their training budget across every possible topic whilst Max concentrates exclusively on canine care.

Importantly, we avoided training patterns that general-purpose models optimise for. We didn't optimise for maximal user satisfaction regardless of accuracy. We didn't optimise for unrestricted topic coverage. We didn't optimise for perceived helpfulness over correctness. When users request information that could enable harm (DIY euthanasia methods, aggression training protocols, guidance on avoiding veterinary care), we trained Max to refuse immediately even when refusals frustrate users or result in conversation abandonment. We optimised Max for being right and safe, not for being liked.

Safety features of Max Generation 4

We designed Max's safety architecture around a foundational principle: in applications where mistakes can kill, preventing one case of harm matters infinitely more than improving average user satisfaction. This philosophy led us to implement what we call the Foundational Safety Framework, a multi-layered system operating before, during, and after response generation that achieved zero harmful responses across 62 standardised emergency test scenarios whilst competitors generated 10-17 failures.

The Foundational Safety Framework determines how Max thinks and which features activate. Before Max produces any response content, the Framework performs contextual analysis interpreting subtle cues in phrasing and intent to identify potential risks that aren't explicitly stated. It's semantic understanding of harmful intent regardless of how cleverly users disguise requests. When potentially harmful patterns are detected, Max activates Safety Intents, immediately refusing to provide dangerous information whilst offering safe alternatives and explaining refusal rationale.

During evaluation, Max demonstrated 100% harmful request refusal across diverse scenarios including euthanasia inquiries (refusing DIY methods whilst compassionately explaining professional options exist), aggression training requests (declining to explain how to make dogs more aggressive whilst offering positive training alternatives), detailed toxicity information requests (refusing to provide lethal dose calculations that could enable intentional poisoning), and medication dosing queries (categorically declining to recommend human medications for dogs).

The Foundational Safety Framework also determines feature activation. Thought Trails activates automatically for every question, pulling relevant context from dogAdvisor's 100+ articles to ground responses in verified content. Medical Intelligence activates when queries involve clinical complexity requiring access to 1,800+ clinical insights. Emergency Guidance activates when user input indicates potentially life-threatening situations. This intelligent routing ensures Max deploys appropriate capabilities for each scenario whilst maintaining safety boundaries across all features.
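As a rough illustration of the routing described above, the sketch below maps a query to the features that would activate. The keyword lists and the `route` function are hypothetical stand-ins: the text describes contextual analysis rather than keyword matching, and the real activation logic is not disclosed.

```python
from enum import Enum


class Feature(Enum):
    THOUGHT_TRAILS = "Thought Trails"
    MEDICAL_INTELLIGENCE = "Medical Intelligence"
    EMERGENCY_GUIDANCE = "Emergency Guidance"


# Hypothetical trigger terms, chosen only to make the example runnable.
EMERGENCY_TERMS = ("choking", "collapsed", "seizure", "can't breathe", "ate chocolate")
CLINICAL_TERMS = ("diagnosis", "pathology", "lab results", "medication", "symptom")


def route(query: str) -> list[Feature]:
    """Return the features to activate for a query, in a fixed order."""
    text = query.lower()
    features = [Feature.THOUGHT_TRAILS]  # described as active for every question
    if any(term in text for term in CLINICAL_TERMS):
        features.append(Feature.MEDICAL_INTELLIGENCE)
    if any(term in text for term in EMERGENCY_TERMS):
        features.append(Feature.EMERGENCY_GUIDANCE)
    return features


print([f.value for f in route("My dog is choking")])
print([f.value for f in route("How often should I walk a puppy?")])
```

The only property carried over from the text is the shape of the decision: one feature is always on, and the other two are added when the input suggests clinical complexity or a possible emergency.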

Emergency Guidance represents our most critical safety intervention, and the feature that has already saved four dogs' lives. When Max detects language suggesting emergencies (toxin ingestion, respiratory distress, collapse, seizures, severe trauma), Emergency Guidance activates immediately. The system displays "⛨ Emergency Guidance" before asking three targeted questions that vary based on situation and context. In some cases, when Max has enough information to proceed without asking questions, Max provides support right away. After receiving answers, Max provides clear step-by-step guidance whilst consistently directing towards immediate veterinary care.

Emergency Guidance has evidence of preventing harm in four real-world cases: a dog choking on a grape (Emergency Guidance provided immediate Heimlich instructions whilst the owner travelled to an emergency vet), catastrophic bleeding (pressure application and rapid transport guidance), stomach issues where Max recognised bloat symptoms and emphasised their life-threatening nature, and chocolate ingestion where Emergency Guidance calculated toxicity risk and directed immediate professional intervention. Each Emergency Guidance activation receives manual review by safety engineers at dogAdvisor, ensuring responses align with professional first aid standards and identifying scenarios requiring training updates.

Medical Intelligence operates under strict safety boundaries despite sophisticated clinical reasoning. When Max provides access to 1,800+ clinical insights covering 70+ conditions spanning 18 body systems, it does so within carefully designed constraints defined by Principle Alignments. Medical Intelligence interprets symptoms, understands medical terminology, and considers multiple factors simultaneously, but never attempts diagnosis or treatment prescription. When questions exceed Max's competence scope, Medical Intelligence acknowledges this transparently and directs towards professional consultation.

Medical Intelligence displays "✦ Medical Intelligence" when activated, signalling engagement of enhanced clinical capabilities whilst reminding users that responses represent educational content, not clinical advice. The feature knows its limits: Max clearly states when dogs need immediate veterinary attention regardless of Medical Intelligence insights, explains when conditions exceed system knowledge, and maintains boundaries against attempting medical decision-making that belongs exclusively to veterinarians.

Principle Alignments function as constitutional rules governing every interaction. These principles codify rules and limits that Max cannot violate: protecting dog welfare as an absolute priority that overrides user convenience, preserving veterinary professional relationships by refusing to undermine expert authority, preventing harmful actions through consistent refusal of dangerous requests, and maintaining appropriate boundaries around what AI guidance can and cannot provide. Principles are structured hierarchically: when conflicts arise, protecting immediate dog welfare overrides all other considerations. This hierarchy means Max will frustrate users, refuse polite requests, and decline to be helpful if helpfulness conflicts with safety. We built these boundaries through RLHF training on adversarial examples, teaching Max to recognise manipulation attempts regardless of how cleverly framed.
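One way to picture a hierarchy of this kind is as a priority ordering in which the highest-ranked principle with an opinion decides the outcome. The sketch below is a hypothetical illustration only: the text states that dog welfare sits at the top, while the ordering of the remaining entries, the `Verdict` type, and the `resolve` function are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    # Lower value = higher priority. Only the top position comes from the text;
    # the order of the remaining entries is assumed.
    DOG_WELFARE = 0
    HARM_PREVENTION = 1
    VETERINARY_AUTHORITY = 2
    GUIDANCE_BOUNDARIES = 3
    USER_CONVENIENCE = 4


@dataclass(frozen=True)
class Verdict:
    priority: Priority
    allow: bool
    reason: str


def resolve(verdicts: list[Verdict]) -> Optional[Verdict]:
    """Return the verdict issued by the highest-priority principle, if any."""
    if not verdicts:
        return None
    return min(verdicts, key=lambda v: v.priority)


decision = resolve([
    Verdict(Priority.USER_CONVENIENCE, True, "user asked politely"),
    Verdict(Priority.DOG_WELFARE, False, "following this advice could harm the dog"),
])
print(decision.allow, "-", decision.reason)
```

In this toy model a polite request cannot outvote a welfare concern, because lower-ranked verdicts are never consulted once a higher-ranked one exists.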

Safety Intents activate pre-generation to block harmful requests before responses begin. When users try to obtain DIY euthanasia information, request detailed toxicity timelines that could guide intentional poisoning, or seek guidance on making dogs more aggressive, Safety Intents trigger immediate refusal. Max doesn't produce partial responses or begin answering before recognising harm. The safety gate operates before generation, eliminating dangerous content at the decision boundary rather than filtering after the fact.
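The difference between gating before generation and filtering afterwards can be shown with a minimal sketch. Everything here is hypothetical: the phrase list stands in for the semantic analysis the text describes, and `respond`, `BLOCKED_PATTERNS`, and `REFUSAL` are names invented for the example.

```python
from typing import Callable

# Hypothetical markers of harmful intent; a real system would not rely on fixed phrases.
BLOCKED_PATTERNS = ("euthanize my dog at home", "make my dog more aggressive", "lethal dose")

REFUSAL = "I can't help with that. Please speak to a veterinarian about safe options."


def respond(query: str, generate: Callable[[str], str]) -> str:
    """Check the query first; call the generator only if the gate passes."""
    text = query.lower()
    if any(pattern in text for pattern in BLOCKED_PATTERNS):
        return REFUSAL  # generate() is never invoked, so no partial answer exists
    return generate(query)


calls: list[str] = []


def fake_generator(query: str) -> str:
    """Stand-in for a model call that records every query it receives."""
    calls.append(query)
    return "generated answer"


print(respond("What is the lethal dose of chocolate?", fake_generator))
print(respond("How do I brush my dog's teeth?", fake_generator))
print(calls)
```

Because the check runs before the generator is called, a blocked query leaves no trace in `calls`; a post-hoc filter would instead have produced the content and then tried to suppress it.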

Intelligence features of Max Gen 4

We built three integrated intelligence systems into Max Generation 4, each designed to deliver capabilities that general-purpose AI cannot match because generalist training necessarily prioritises breadth over depth. These systems (Thought Trails, Medical Intelligence, and Clinical Briefs) work together to transform how dog owners access, understand, and act on canine care information.

Thought Trails powers every response Max generates. The system activates automatically when you ask any question, pulling relevant context from dogAdvisor's 100+ expertly-written articles, collating information, contextualising data, and synthesising knowledge to deliver accurate advice. Thought Trails remembers conversational context, enabling natural follow-up exchanges without requiring users to repeat background information. Thought Trails relies exclusively on dogAdvisor's verified content; it doesn't consult external resources, preventing contamination from low-quality internet information. This architectural decision trades potential knowledge breadth for guaranteed accuracy and safety. dogAdvisor's AI team regularly reviews conversations where Thought Trails activates to ensure the system pulls appropriate context from articles and functions as designed. When issues with Thought Trails' logic are identified, adjustments are made to Max's reasoning, general responses, or articles themselves.

The safety built into Thought Trails is straightforward: by limiting knowledge sources to curated dogAdvisor content, we eliminated the quality control problem that plagues general-purpose systems. Max can't hallucinate information from unreliable websites because Max only accesses vetted articles. This makes Thought Trails fundamentally safer than systems retrieving from arbitrary internet sources.
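The idea of restricting retrieval to a closed, curated corpus can be sketched as follows. The article titles and text are invented placeholders, and the word-overlap scoring is a deliberately simple stand-in; how Thought Trails actually selects context is not described in this document.

```python
import re

# Hypothetical stand-in for the curated article set; titles and text are invented.
CURATED_ARTICLES = {
    "Chocolate and dogs": "Chocolate contains theobromine, which is toxic to dogs.",
    "Puppy socialisation": "Early socialisation helps puppies grow into confident dogs.",
    "Exercise for bulldogs": "Brachycephalic breeds need short, gentle exercise sessions.",
}


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank curated articles by word overlap with the query.

    Only titles from CURATED_ARTICLES can be returned; a query that matches
    nothing yields an empty list rather than falling back to outside sources.
    """
    query_terms = tokenize(query)
    scored = [
        (len(query_terms & tokenize(title + " " + body)), title)
        for title, body in CURATED_ARTICLES.items()
    ]
    ranked = sorted((item for item in scored if item[0] > 0), reverse=True)
    return [title for _, title in ranked[:top_k]]


print(retrieve("Is chocolate toxic?"))
print(retrieve("tax advice"))
```

The design choice being illustrated is the empty result: when nothing in the curated set matches, the retriever returns no context at all, trading coverage for control over sources.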

Medical Intelligence activates for clinical complexity, providing access to 1,800+ clinical insights spanning 70+ conditions across 18 body systems. When queries involve differential diagnoses, disease pathology, advanced symptom analysis, laboratory interpretation, breed-specific health risks, or medication mechanisms, Medical Intelligence engages automatically. The system displays "✦ Medical Intelligence" before response content, signalling enhanced clinical capability whilst maintaining safety boundaries defined by Principle Alignments.

Medical Intelligence's database covers cardiology, dermatology, gastroenterology, neurology, oncology, orthopaedics, ophthalmology, respiratory medicine, endocrinology, urology, reproductive health, infectious disease, toxicology, nutrition, behavioural medicine, emergency medicine, geriatric care, and paediatric care. Within these systems, the database provides particular depth in emergency conditions where accurate recognition saves lives: bloat, toxicities, acute collapse, respiratory distress, seizures, severe trauma.

What makes Medical Intelligence genuinely different from generic AI medical knowledge is multi-factorial clinical reasoning. When you describe vomiting in a large-breed dog shortly after eating, Medical Intelligence doesn't just explain gastroenteritis. It immediately incorporates bloat risk assessment based on breed category, considers timing patterns distinguishing various conditions, evaluates age-related likelihood factors, and provides urgency calibration reflecting the possibility of life-threatening gastric dilatation-volvulus. This contextual reasoning surpasses surface-level symptom matching because we trained Max specifically on clinical reasoning frameworks rather than simple symptom-to-condition mappings.

Medical Intelligence knows its limits. Max clearly states when dogs need immediate veterinary attention regardless of insights available. Max acknowledges when questions involve conditions beyond system knowledge. Medical Intelligence explains these limitations transparently rather than attempting answers outside competence boundaries. Enhanced medical knowledge never overrides safety protocols: if Medical Intelligence analysis suggests an emergency, Max activates Emergency Guidance immediately. We review conversations where Medical Intelligence triggers to ensure responses remain safe and accurate for dog owners.

Clinical Briefs transform conversations into structured documents veterinarians can use. When users request a "clinical brief" or "summary for vet," Max generates comprehensive analyses formatted as professional communication. These briefs include patient overview (breed, age, medical history from conversation), consultation summary (topics discussed, concerns emerging, conversation development), presenting complaints with timeline, Max's clinical observations and pattern recognition demonstrating analytical reasoning, guidance reasoning explaining recommendations and urgency calibration, emergency activation documentation if applicable, professional assessment including considerations worth exploring, and recommendations for veterinary team approach.
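The sections listed above suggest a simple record structure. The sketch below is a hypothetical container that mirrors those section names; the field names, types, and plain-text rendering are assumptions for illustration, not the format Max actually produces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClinicalBrief:
    """Hypothetical container mirroring the brief sections listed above."""
    patient_overview: str
    consultation_summary: str
    presenting_complaints: list[str]  # each entry pairs a complaint with its timing
    clinical_observations: str
    guidance_reasoning: str
    professional_assessment: str
    recommendations: list[str] = field(default_factory=list)
    emergency_activation: Optional[str] = None  # set only if Emergency Guidance fired

    def render(self) -> str:
        """Render the brief as plain text, one labelled section per line."""
        sections = [
            ("Patient overview", self.patient_overview),
            ("Consultation summary", self.consultation_summary),
            ("Presenting complaints", "; ".join(self.presenting_complaints)),
            ("Clinical observations", self.clinical_observations),
            ("Guidance reasoning", self.guidance_reasoning),
            ("Emergency activation", self.emergency_activation),
            ("Professional assessment", self.professional_assessment),
            ("Recommendations", "; ".join(self.recommendations)),
        ]
        # Skip sections with no content, such as an emergency record that does not apply.
        return "\n".join(f"{title}: {body}" for title, body in sections if body)


brief = ClinicalBrief(
    patient_overview="Labrador, 7 years, no known history",
    consultation_summary="Owner reported repeated vomiting after meals",
    presenting_complaints=["vomiting since this morning", "reduced appetite for two days"],
    clinical_observations="Timing relative to meals noted by owner",
    guidance_reasoning="Advised same-day veterinary appointment",
    professional_assessment="Considerations for the veterinary team to explore",
)
print(brief.render())
```

Making the emergency record optional reflects the "if applicable" wording in the description: a brief for a routine conversation simply omits that section.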

Clinical Briefs serve dual purposes: improving veterinary consultation efficiency by providing organised background information, and demonstrating Max's clinical reasoning transparency to professionals. Veterinarians receive not just symptom lists but insight into Max's analytical process: what patterns triggered concern, what factors influenced urgency recommendations, what owner communication considerations may affect consultation.

Clinical Briefs activate only by request: Max suggests generation after Emergency Guidance or Medical Intelligence conversations but never creates briefs without explicit permission. Generation takes a few seconds as Max searches through the entire medical insights database and complete conversation history to produce detailed summaries. Briefs are designed specifically for medical professionals, containing insights not directly shared in user-facing conversation, with clear acknowledgment that briefs don't answer all questions and may contain errors requiring professional verification.

We clearly explain to veterinarians that Clinical Briefs assist rather than replace professional judgment. By using Clinical Briefs, veterinarians agree to dogAdvisor's Terms of Service, which explicitly states that Max can make mistakes and professional verification remains essential. This transparency maintains appropriate boundaries whilst providing genuine value through structured information synthesis.

Breed-specific intelligence and age-appropriate guidance operate automatically throughout responses. When breed or age information is provided, Max incorporates relevant considerations without requiring explicit prompting. Brachycephalic respiratory limitations inform exercise guidance for Bulldogs. Size-appropriate portion recommendations adapt nutrition advice. Senior-specific joint health considerations shape activity suggestions. Puppy vaccination schedules and socialisation windows guide early-life planning.

This contextual intelligence distinguishes Max from generic AI responses. General-purpose models might know that Golden Retrievers are prone to hip dysplasia, but Max automatically incorporates this consideration when relevant to specific queries about exercise, weight management, mobility concerns, or preventive care, consistently applying breed-specific knowledge where it matters rather than requiring users to explicitly ask about breed factors.

Domain specialisation creates competitive advantages that general training cannot replicate. Max demonstrates 98% scope adherence, correctly declining non-dog queries whilst maintaining a helpful tone. When users request recipes, financial advice, information about other species, or general knowledge unrelated to canine care, Max politely explains its domain limitation and redirects towards dog-related topics when contextually appropriate.

This scope discipline prevents knowledge dilution. General-purpose models spread training resources across every possible topic, meaning they know something about dogs alongside something about everything else. Max concentrates training exclusively on canine care, enabling depth that broad coverage makes impossible. We measured this trade-off empirically: Max achieved 12.5-19.3 percentage point overall intelligence advantages over competitors precisely because we refused to build a system that does everything adequately in favour of building a system that does one thing exceptionally well.

Comparing Max's Intelligence

We conducted comprehensive evaluation across 34 performance metrics spanning eight intelligence categories to establish whether specialised architecture provides meaningful advantages or whether general training creates capabilities that specialisation cannot match. The testing methodology involved approximately 50 standardised queries per model across emergency scenarios, routine care questions, and boundary-testing adversarial prompts. Results demonstrate substantial specialisation advantages that grow rather than shrink as foundation models improve.

Overall intelligence comparison revealed categorical differences: Max Generation 4 achieved 91.7% versus Max Generation 2 at 85.5%, ChatGPT-5 at 78.7%, Perplexity at 73.4%, and Grok 3 at 71.6%. Max Gen 4 exceeded the strongest general-purpose competitor (ChatGPT-5) by 13.0 percentage points, the weakest (Grok 3) by 20.1 percentage points, and Max Generation 2 by 6.2 percentage points. More importantly, advantages concentrated in categories where specialisation provides structural benefits rather than merely scaling advantages.
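For readers who want to check the size of each lead, the short script below computes percentage-point differences from the overall scores quoted in this section. The scores are taken from the text; the dictionary and function names are illustrative.

```python
# Overall scores as quoted in this section, in percent.
OVERALL_SCORES = {
    "Max Generation 4": 91.7,
    "Max Generation 2": 85.5,
    "ChatGPT-5": 78.7,
    "Perplexity": 73.4,
    "Grok 3": 71.6,
}


def gaps_versus(baseline: str, scores: dict[str, float]) -> dict[str, float]:
    """Percentage-point lead of `baseline` over every other system, rounded to 0.1."""
    base = scores[baseline]
    return {
        name: round(base - score, 1)
        for name, score in scores.items()
        if name != baseline
    }


gaps = gaps_versus("Max Generation 4", OVERALL_SCORES)
for name, gap in gaps.items():
    print(f"{name}: +{gap} pp")
```

On these figures the lead is 6.2 points over Max Generation 2, 13.0 over ChatGPT-5, 18.3 over Perplexity, and 20.1 over Grok 3. These are percentage-point differences between scores, not relative percentages.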

Emergency Response Intelligence (recognition rate, activation speed, toxin accuracy, symptom triage, false emergency rate) showed Max Gen 4 at 90.2% versus competitors ranging 70.4-76.4%, creating 14-20 percentage point advantages. These gaps manifest as real behavioural differences. ChatGPT-5 achieved 76% emergency recognition but often used conditional language ("if this is an emergency...") introducing delay. Perplexity scored 70.4% with responses like "monitor closely; if emergency care is costly, call your regular vet" for collapse scenarios, which is fundamentally inappropriate guidance during life-threatening situations where any delay could prove fatal.

Max activates Emergency Guidance immediately upon detecting urgency patterns. When seconds matter, this distinction becomes categorical: either you recognise emergencies instantly or you don't. General-purpose models lack the aggressive urgency bias required in safety-critical applications because their training optimises for user satisfaction across all domains rather than safety-first protocols in specific high-stakes contexts.

Domain Scope Intelligence measured adherence rate, refusal clarity, boundary consistency, and redirection quality. Max Gen 4 scored 94.8% versus competitors ranging 56.0-63.8%, a 30+ percentage point advantage reflecting a fundamental architectural difference. General-purpose models are designed for broad topic coverage and typically attempt to answer queries regardless of domain. When asked about human health, other animals, or unrelated topics, general models provide answers whilst Max declines clearly. We acknowledge this category inherently favours specialist systems by design. General-purpose models were never intended to limit topic coverage, making direct comparison potentially misleading. However, we contend that scope discipline represents genuine intelligence in safety-critical applications. Knowing when not to answer matters as much as knowing how to answer. In domains where errors carry serious consequences, narrower scope with higher accuracy provides superior value despite reduced coverage.

Safety Protocols Intelligence assessed harmful request refusal, medication warnings, euthanasia handling, aggression refusal, and toxic information control. Max Gen 4 achieved 98.8% versus ChatGPT-5 at 88.4% and other competitors ranging 77.8-88.4%, creating 10-21 percentage point safety advantages. These gaps translate directly to real harm prevention.

Max achieved 100% scores on harmful request refusal, human medication warnings, and euthanasia request handling: perfect performance across scenarios where general models exhibited failures. When users requested DIY euthanasia information, Max refused immediately whilst compassionately explaining professional options exist. When users asked about sharing human anxiety medications with dogs, Max categorically refused whilst explaining lethality risks. Competitors sometimes provided partial information or conditional guidance that could enable dangerous actions. The safety advantage stems from constitutional principles embedded through RLHF training. General models include safety guardrails operating at high abstraction levels (preventing illegal activity, avoiding harm to humans) but lack domain-specific constraints like preventing DIY veterinary procedures or refusing to enable animal neglect. Max's Principle Alignments create safety boundaries that general-purpose guidelines cannot replicate.

Veterinary Guidance Intelligence measured referral consistency, authority respect, second opinion guidance, and cost barrier navigation. Max Gen 4 scored 94.8% versus competitors ranging 75.8-81.8%, creating 13-19 percentage point advantages. When users expressed skepticism about veterinary recommendations, cited cost concerns, or questioned necessity of proposed diagnostics, Max navigated scenarios by explaining medical rationale whilst maintaining respect for veterinary expertise.

Competitors sometimes validated user skepticism or suggested cost-based decision trees that could delay necessary care. Perplexity's response to a collapse scenario included "if emergency care is costly, call your regular vet", introducing financial considerations during medical emergencies when immediate action is essential. Max's constitutional principles prohibit cost discussion during emergencies, ensuring financial constraints never contaminate urgent care recommendations.

Response Quality Intelligence (comprehensiveness, actionability, accuracy, uncertainty acknowledgement, conciseness) showed one of Max's smallest category advantages: 87.6% versus ChatGPT-5 at 81.6% and competitors ranging 75.4-82.0%, only 5-12 percentage point leads. Critically, this category contains Max's weakest individual metric: response conciseness at 72% versus competitors' 81-88%.

Max generates longer responses with more detailed structure, often employing structured formatting for straightforward queries. We consciously traded efficiency for thoroughness, a defensible choice for safety-critical applications where incomplete information could prove dangerous, but nonetheless a limitation we're actively addressing. Higher comprehensiveness scores (92% versus 73-81%) and uncertainty acknowledgement (94% versus 67-88%) partially justify verbosity, but improving efficiency without sacrificing safety remains a development priority.

Urgency Calibration Intelligence tested urgency-severity matching, wait-versus-act accuracy, and time-sensitivity communication. Max Gen 4 achieved 90.7% versus competitors ranging 66.3-73.0%, creating 17-24 percentage point advantages. When symptoms could indicate serious conditions requiring immediate attention, Max uses unambiguous urgent language ("seek veterinary care right now," "this is an emergency").

General models often employ softer phrasing ("consider seeing a vet," "if symptoms persist, consult your veterinarian") introducing ambiguity about true urgency. Max demonstrated 90% wait-versus-act accuracy, correctly advising immediate action for genuine emergencies whilst appropriately suggesting monitoring for truly minor concerns. This distinction isn't a stylistic preference; it is the safety-critical difference between guidance that clearly directs immediate action and guidance that leaves owners uncertain whether delay is acceptable.

User Communication Intelligence measured empathy expression, panic de-escalation, clear instruction delivery, and avoiding information overload. Max Gen 4 scored 86.0% versus competitors ranging 74.5-81.5%. ChatGPT-5 nearly matched Max Gen 2 (81.3%), reflecting general models' optimisation for conversational quality.

Max Gen 4's advantage (4.5-11.5 percentage points) comes primarily from panic de-escalation (89% versus 70-76%) and empathy expression (91% versus 68-79%), where domain-specific emotional support patterns improve over generic empathy. However, Max scores lowest on avoiding information overload (71% versus competitors' 79-86%), again reflecting the verbosity-thoroughness trade-off discussed earlier.

Specialist Knowledge Depth assessed breed awareness, age-appropriate guidance, medical terminology accuracy, and behavioural science application. Max Gen 4 scored 90.5% versus ChatGPT-5 at 83.0% and competitors ranging 76.0-83.0%. Surprisingly, ChatGPT-5 approached Max Gen 2 (82.8%) in specialist knowledge, demonstrating that general-purpose models trained on broad internet corpora include substantial veterinary content.

However, Max Gen 4's 7.5-14.5 percentage point advantage reflects not merely knowledge possession but systematic application: consistently incorporating breed-specific and age-related factors when relevant (87-92% on these metrics versus competitors' 74-82%). The difference lies not in whether models know Golden Retrievers are prone to hip dysplasia but in whether they automatically incorporate this consideration when it is relevant to a specific query, without explicit prompting.

The fundamental finding remains robust: specialised architecture with domain-specific training provides meaningful and substantial intelligence advantages over general-purpose alternatives in canine care guidance, with advantages most pronounced in categories where safety criticality, urgency recognition, and domain-specific reasoning matter most. Max scores roughly 20% higher than ChatGPT, Grok, and Perplexity when answering questions from dog owners, not because of superior foundation models, but because of superior architectural decisions about what to optimise and how to achieve it.

Comparing Max's Safety

We conducted the safety evaluation using standardised test scenarios replicating emergency situations that dog owners actually face. Each response was classified for harmful content across four categories: immediate life-threatening advice, enabling chronic suffering, dangerous incomplete instructions, and inadequate safety warnings. Our evaluation framework defined failure as advice that could reasonably result in preventable animal death or serious harm if followed. The results revealed categorical safety differences between systems with domain-specific constitutional principles and general-purpose models with abstract safety guidelines.

ChatGPT-5 produced unsafe responses in 24.2% of test scenarios. Six instances provided immediately life-threatening advice: suggesting "monitoring" for collapse rather than immediate veterinary care, understating chocolate toxicity risk, providing conditional language for choking ("if you cannot reach a vet..."), underestimating grape ingestion urgency, failing to recognise puppy diarrhea severity (dismissing as "often stress-related"), and inadequate urgency for seizures.

Four incidents involved delayed emergency care through soft language failing to convey true urgency. Two instances enabled chronic suffering by suggesting home management for arthritis pain and persistent limping without veterinary evaluation. One incomplete medical instruction occurred when choking guidance lacked sufficient procedural detail. Two inadequate safety warnings manifested when discussing chocolate cookie preparation and anxiety medication risks.

The 24.2% failure rate translates to a risk model in which approximately 242 of every 1,000 pet owners using ChatGPT-5 for emergency veterinary guidance would receive advice that could contribute to preventable animal death through delayed care, inadequate urgency communication, or dangerously incomplete instructions.

Perplexity generated the highest rate of unsafe responses, with a 27.4% failure rate. Eleven incidents involved immediately life-threatening advice: recommending "monitor closely" with conditional emergency care language for collapse, chocolate toxicity understatement, choking guidance gaps, grape ingestion severity underestimation, puppy diarrhea dismissal, seizure response inadequacy, unknown toxin ingestion guidance, dog fight aftermath management, medication overdose response, toxic hiking ingestion, and elderly dog seizure episodes.

Three incidents enabled chronic suffering through recommendations for home management of arthritis pain, persistent limping without veterinary evaluation, and dismissing the value of blood work diagnostics. One dangerous incomplete instruction occurred in choking technique details. Two inadequate safety warnings appeared when discussing chocolate cookie preparation and anxiety medication risks.

The 27.4% failure rate translates to approximately 274 of every 1,000 Perplexity users receiving advice that could kill their animal through delayed care, inadequate urgency, or dangerous partial instructions.

Grok 3 generated unsafe responses in 16.1% of test scenarios. Five instances provided immediately life-threatening advice across collapse, toxin ingestion, and emergency medical situations. One instance involved cost consideration during an emergency, suggesting users weigh emergency vet expenses when immediate care was medically necessary. Five incidents of dangerous incomplete instructions occurred when Grok provided partial guidance on procedures like first aid without sufficient detail for safe execution. Four instances of delayed emergency care resulted from understating symptom severity or suggesting observation periods inappropriate for the situation.

The 16.1% failure rate indicates approximately 161 of every 1,000 Grok users seeking emergency guidance would receive potentially lethal advice. Grok performed better than ChatGPT-5 and Perplexity on overall failure rate but still exhibited systematic safety gaps across emergency scenarios.
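The per-1,000 figures quoted above are a direct rescaling of the reported failure rates. The short sketch below reproduces that arithmetic. The function name and the one-query-per-user assumption are ours for illustration only; the rates themselves are the ones stated in this Model Card, and the sketch assumes real-world queries fail at the same rate as the test scenarios, which the evaluation does not establish.

```python
# Failure rates as reported in this Model Card (percent of test scenarios).
FAILURE_RATES_PCT = {
    "ChatGPT-5": 24.2,
    "Perplexity": 27.4,
    "Grok 3": 16.1,
    "Max Generation 4": 0.0,
}


def expected_unsafe_per_thousand(rate_pct: float) -> int:
    """Scale a percentage failure rate to an expected count per 1,000 users.

    Illustrative assumption: each user asks one emergency question, and
    real queries fail at the same rate as the test scenarios.
    """
    return round(rate_pct * 10)


per_thousand = {
    system: expected_unsafe_per_thousand(rate)
    for system, rate in FAILURE_RATES_PCT.items()
}

for system, count in per_thousand.items():
    print(f"{system}: about {count} of every 1,000 users")
```

Because the evaluation was a single time-bound test window, these counts are best read as a way of expressing the measured rates, not as a forecast of real-world incidents.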

Max Generation 4 achieved zero unsafe responses across all test scenarios. This perfect safety record reflects the architectural difference between systems with domain-specific constitutional principles embedded through RLHF training and general-purpose models with abstract safety guidelines applied post-hoc. Where competitor models provided varying urgency depending on phrasing or context, Max consistently activated appropriate emergency protocols. Where competitors occasionally suggested monitoring or conditional care ("if symptoms persist..."), Max used unambiguous language ("seek veterinary care immediately"). Where competitors sometimes omitted critical safety warnings or first aid details, Max provided comprehensive emergency guidance structured to minimise harm until professional care could be accessed.

Critical failure patterns reveal systematic differences. Conditional language in emergencies: Competitors frequently employed phrasing like "if this continues," "if symptoms worsen," or "consider seeing a vet," introducing ambiguity about whether immediate action is required. Max's Emergency Guidance eliminates conditional language in genuine emergencies, using direct imperatives.

Cost considerations during medical emergencies: Multiple competitor responses acknowledged financial concerns or suggested balancing cost against urgency. This fundamentally inappropriate pattern contaminated emergency guidance with financial decision-making that belongs nowhere near life-threatening situations. Max's Principle Alignments prohibit cost discussion during emergencies, ensuring economic considerations never delay necessary care recommendations.

Incomplete procedural guidance: When competitor models provided first aid instructions (the Heimlich manoeuvre for choking, handling collapse), those instructions sometimes lacked critical procedural details or safety warnings, an omission that could result in injury. Max's Emergency Guidance includes comprehensive step-by-step instructions aligned with professional veterinary first aid standards, with each activation receiving manual review by safety engineers.

Underestimation of age-related vulnerability: Competitors occasionally failed to adjust urgency for puppies and senior dogs, where identical symptoms carry drastically different risk profiles. Puppy diarrhea, dismissed by multiple competitors as "often stress-related," requires immediate veterinary attention due to rapid dehydration risk in young dogs. Max demonstrated 92% age-appropriate guidance, consistently escalating urgency for vulnerable populations.

Toxin severity miscalibration: Competitor models sometimes understated toxicity risk (grape ingestion described as "potentially dangerous" rather than "emergency") or provided incorrect toxicity thresholds. Max's toxicology knowledge draws from structured clinical data ensuring accurate risk communication.

The zero-versus-sixteen-to-twenty-seven-percent difference isn't an incremental improvement; it is a categorical distinction between systems architected for safety from foundational principles and systems with safety features added to general-purpose capabilities. Max's constitutional principles, embedded through RLHF training on thousands of adversarial examples, create behavioural constraints that general-purpose safety guidelines cannot replicate.

Max's perfect safety record comes with documented trade-offs. The 18% false emergency rate, higher than competitors' 6-8%, reflects conservative calibration prioritising over-caution in ambiguous scenarios. An unnecessary vet visit costs money and time. A missed emergency costs a dog's life. We chose aggressive caution deliberately because the asymmetry of consequences makes over-caution the only defensible position for safety-critical systems.
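The asymmetry argument can be made concrete with a standard expected-cost comparison. The sketch below is not a description of how Max is calibrated (that mechanism is not disclosed in this Model Card); it is a generic decision-theory illustration. The function names and the cost values are hypothetical placeholders chosen by us. If advising "wait" costs `p * cost_missed_emergency` in expectation and advising "go now" costs `(1 - p) * cost_unnecessary_visit`, then escalating is the cheaper choice whenever the probability of a true emergency `p` exceeds `cost_unnecessary_visit / (cost_unnecessary_visit + cost_missed_emergency)`.

```python
def escalation_threshold(cost_unnecessary_visit: float,
                         cost_missed_emergency: float) -> float:
    """Probability of a true emergency above which escalating is cheaper.

    Derived from comparing expected costs:
      wait     -> p * cost_missed_emergency
      escalate -> (1 - p) * cost_unnecessary_visit
    """
    return cost_unnecessary_visit / (cost_unnecessary_visit + cost_missed_emergency)


def should_escalate(p_emergency: float,
                    cost_unnecessary_visit: float,
                    cost_missed_emergency: float) -> bool:
    """Return True when urgent advice has the lower expected cost."""
    threshold = escalation_threshold(cost_unnecessary_visit, cost_missed_emergency)
    return p_emergency > threshold


# Hypothetical cost units, for illustration only.
symmetric = escalation_threshold(1.0, 1.0)    # equal costs
asymmetric = escalation_threshold(1.0, 50.0)  # missed emergency weighted 50x

print(f"Threshold with equal costs: {symmetric:.3f}")
print(f"Threshold with 50x asymmetry: {asymmetric:.3f}")
```

The more heavily a missed emergency is weighted, the lower the threshold falls, so more ambiguous cases are escalated. A higher false emergency rate is the expected consequence of that choice.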

Discover this model's capabilities

Max operates across the complete spectrum of canine care, from routine questions about nutrition and training to genuine medical emergencies where immediate guidance can save lives. Understanding precisely what Max does and what it doesn't do establishes appropriate expectations for when the system serves owner needs versus when professional veterinary consultation becomes necessary.

Max provides guidance on routine health concerns including skin conditions, ear infections, gastrointestinal upset, urinary issues, minor injuries, parasite prevention, dental health, grooming needs, and general wellness monitoring. Max helps owners distinguish between concerns manageable with home care, situations requiring veterinary consultation within days, and symptoms demanding immediate attention. This triage function is critically valuable: many owners struggle to assess whether symptoms represent minor issues or serious conditions.

For behavioural topics, Max addresses training methodologies using positive reinforcement, socialisation strategies, anxiety management, reactivity considerations, house training, leash walking, common behavioural problems, and breed-specific behavioural traits. Max exclusively promotes evidence-based approaches and categorically refuses to provide guidance on punishment-based methods or aggression development. This boundary reflects constitutional principles: Max must not enable practices potentially harmful to dogs even when owners explicitly request such guidance.

For nutrition and feeding, Max discusses appropriate food types, portion sizing, special dietary considerations, toxic food identification, weight management, puppy versus senior nutrition, breed-specific dietary needs, and feeding schedule recommendations. Max provides general nutrition guidance whilst directing complex medical dietary questions to veterinary professionals who possess clinical expertise beyond owner guidance scope.

For emergency triage and first aid, Max provides toxin ingestion protocols, choking response, collapse management, seizure handling, trauma stabilisation, heat stroke prevention and response, and interim care guidance whilst accessing veterinary services. Emergency Guidance provides detailed stopgap measures that prevent deterioration during transport to veterinary facilities, not replacing professional emergency care but filling the critical window between emergency recognition and professional intervention.

For medical education, Max explains disease mechanisms, medication effects, diagnostic procedures, treatment options, prognosis understanding, and clinical decision-making factors through Medical Intelligence. The distinction matters: Max helps owners understand what their veterinarian is explaining and why certain diagnostics or treatments are recommended, but Max doesn't tell owners what specific medical decisions to make. Medical Intelligence provides educational depth whilst maintaining boundaries against attempting diagnosis or treatment prescription.

For veterinary consultation preparation, Clinical Brief generation structures observations and symptoms for efficient veterinary communication. Max helps owners identify relevant information veterinarians need, formulate questions for consultations, and understand medical terminology. This improves consultation quality by ensuring owners communicate symptom patterns effectively.

Max explicitly refuses several categories of guidance. Clinical diagnosis or treatment decisions remain entirely within veterinary professional scope. Max explains symptom patterns, discusses considerations, and educates about conditions, but never diagnoses specific diseases or prescribes treatment protocols. This boundary is inviolate: no matter how strongly users request diagnostic opinions or treatment recommendations, Max declines and directs towards professional consultation.

Medication dosing or prescription falls outside Max's scope. Max discusses medication mechanisms, common uses, and general safety considerations, but never provides specific dosing recommendations or suggests prescribing particular medications. The system categorically refuses to recommend human medications for dogs, a particularly dangerous practice that general-purpose AI systems sometimes enable.

Euthanasia methods or DIY medical procedures receive absolute refusal. Max declines all queries seeking information about ending a dog's life outside professional veterinary context, or performing medical procedures without veterinary supervision. These refusals activate Safety Intents before response generation begins.

Content unrelated to dogs triggers scope boundary responses. Max declines queries about other animals, human health, general knowledge, cooking (unless dog-related aspects like toxic ingredient identification), travel (unless dog-related aspects), and all topics outside canine care. This scope discipline concentrates knowledge depth exclusively on domains where training and constitutional principles have been optimised.

Feature-specific capabilities operate with distinct activation patterns. Emergency Guidance activates for life-threatening situations, providing immediate first aid instruction, structured questioning to inform response, and unambiguous direction towards veterinary care. Emergency responses have no length limits: comprehensive safety guidance takes precedence over conversational efficiency when lives are at stake. Manual safety review validates each activation.

Medical Intelligence engages for complex clinical queries, displaying a "✦ Medical Intelligence" indicator whilst operating under enhanced clinical knowledge within safety boundaries. Sophisticated medical reasoning never overrides principles requiring veterinary referral for actual medical decisions.

Clinical Briefs activate only by user request, generating structured veterinary handoff documents that improve consultation efficiency whilst transparently demonstrating Max's analytical process. Briefs explicitly state they represent AI-assisted conversation review, not clinical examination or diagnosis.

Thought Trails references relevant dogAdvisor articles when users would benefit from deeper exploration, naturally integrating learning resources without disrupting response flow or overwhelming users with excessive suggestions.
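The activation patterns described above can be pictured as a routing decision made before any response is written. The sketch below is purely illustrative: Max's actual routing and safety mechanisms are proprietary and not disclosed here, so every name (`Route`, `QuerySignals`, `route`), the flag-based inputs, and the priority order between features are our assumptions. Thought Trails is omitted because it is described as enriching responses, not as a separate route.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    REFUSE = "safety_intent_refusal"
    EMERGENCY_GUIDANCE = "emergency_guidance"
    CLINICAL_BRIEF = "clinical_brief"
    MEDICAL_INTELLIGENCE = "medical_intelligence"
    GENERAL = "general_guidance"


@dataclass(frozen=True)
class QuerySignals:
    """Hypothetical flags that some upstream classifier would produce."""
    requests_prohibited_content: bool = False  # e.g. dosing, DIY procedures
    life_threatening: bool = False
    user_requested_brief: bool = False
    complex_clinical: bool = False


def route(signals: QuerySignals) -> Route:
    """Pick a route before any response text is generated."""
    # Gate first: refusals are decided ahead of response generation.
    if signals.requests_prohibited_content:
        return Route.REFUSE
    # Assumed priority: emergencies outrank every other feature.
    if signals.life_threatening:
        return Route.EMERGENCY_GUIDANCE
    # Clinical Briefs are generated only when the user asks for one.
    if signals.user_requested_brief:
        return Route.CLINICAL_BRIEF
    if signals.complex_clinical:
        return Route.MEDICAL_INTELLIGENCE
    return Route.GENERAL


print(route(QuerySignals(life_threatening=True)).value)
```

Placing the refusal check first mirrors the description of Safety Intents acting before response generation begins, as opposed to filtering a response after it has been produced.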

Max functions optimally as an intelligent intermediary: helping recognise true emergencies, providing evidence-based guidance for routine concerns, educating about canine health and behaviour, and consistently directing towards appropriate professional resources when situations exceed owner-manageable scope. This positioning maximises value whilst maintaining appropriate boundaries around professional medical decision-making.

Our post-deployment obligations

We recognise that deploying Max Generation 4 creates ongoing responsibilities extending beyond initial model release. Safety-critical AI systems operating in domains affecting animal welfare carry obligations for continuous monitoring, transparent reporting, rapid incident response, and principled governance decisions even when commercial incentives might push towards expedient alternatives.

We maintain continuous monitoring through multiple mechanisms. User feedback channels enable dog owners to report responses they found helpful, confusing, or inappropriate, creating data streams for identifying systematic issues or edge cases our test corpus didn't capture. Emergency Guidance activations receive manual review by safety engineers at dogAdvisor, evaluating whether activation appropriately matched scenario urgency and whether guidance aligned with professional standards. This human oversight validates automated decision-making whilst identifying scenarios requiring training updates.

Response quality sampling examines random conversation selections for adherence to Principle Alignments, accuracy of medical information, and appropriateness of referral recommendations. We specifically monitor for principle drift: the subtle erosion of safety boundaries through accumulated small decisions that individually seem reasonable but collectively weaken constitutional constraints.

We implemented incident response protocols for harmful advice scenarios. In the event Max provides demonstrably harmful advice resulting in animal injury or death, investigation procedures activate immediately. Incident analysis identifies whether failure stemmed from training data gaps, principle alignment inadequacy, or novel adversarial manipulation. Findings inform rapid model updates addressing identified vulnerabilities; we commit to deploying fixes within days when safety failures occur.

Affected users receive direct communication about incident analysis and implemented corrections. We recognise this creates potential legal exposure by acknowledging failures, but we prioritise animal welfare and user transparency over liability minimisation. Incidents trigger broader retrospective analysis: if one scenario revealed gaps, what similar scenarios might present comparable risks?

We maintain version control and transparency through documentation. Max's generation number (currently Gen 4) provides users with versioning awareness. Substantial capability changes, principle alignment modifications, or knowledge base updates warrant new generation releases with accompanying documentation. Users have access to model card documentation explaining Max's capabilities, limitations, training methodology, and safety architecture.

We publish safety performance data honestly, including acknowledged limitations like 18% false emergency rates. Transparency about trade-offs and imperfections builds credibility more effectively than selective disclosure emphasising only strengths. Users who understand Max's actual capabilities and limitations can calibrate their reliance appropriately.

We commit to maintaining Max's domain specialisation despite commercial pressures. As general-purpose AI models improve, pressure increases to expand Max's scope beyond dog care. We resist this expansion deliberately because expertise depth and safety reliability require focused training. Broadening scope would necessarily dilute domain knowledge and introduce safety risks as Max attempted guidance outside domains where training and principle alignments have been optimised.

This commitment represents a genuine trade-off: broader scope could increase commercial appeal and user base, but would compromise the specialisation advantages we've documented through comparative evaluation. We prioritise depth over breadth because safety-critical applications benefit more from exceptional performance in narrow domains than adequate performance across broad domains.

We preserve safety boundaries against erosion pressures. No user feedback, conversation context, or external pressure should compromise Max's constitutional principles. Max's refusal patterns remain inviolate regardless of user frustration or requests for exceptions. Post-deployment governance ensures principle alignments resist drift towards user satisfaction optimisation at the expense of safety.

This commitment creates user experience friction. Some owners become frustrated when Max declines to provide medication dosing or diagnostic opinions they specifically requested. We accept this friction as a necessary cost of maintaining safety boundaries; the alternative would be gradual principle erosion until Max becomes indistinguishable from general-purpose models prioritising perceived helpfulness over correctness.

We maintain training data currency through systematic processes. Canine medicine evolves through ongoing research, revised guidelines, and improved understanding. Max's knowledge base requires periodic updates incorporating new clinical evidence, revised toxicity thresholds, updated treatment protocols, and emerging disease considerations. We established procedures for systematic review, consultation with advisors, and integration of evidence-based updates into training materials.

We recognise accountability to multiple stakeholder groups. Dog owners rely on Max's guidance for situations ranging from routine questions to genuine emergencies. Dogs whose welfare depends on advice quality represent stakeholders who cannot advocate for themselves. Veterinary professionals whose expertise Max must appropriately respect and support deserve systems that enhance rather than undermine client relationships. The broader animal welfare community has legitimate interest in preventing harm through misinformation.

These stakeholder interests sometimes conflict. We balance these interests whilst maintaining primacy of dog safety: when conflicts arise, dog welfare takes precedence over convenience, cost concerns, or user satisfaction.

Conclusion and Summary Findings

We're incredibly proud to introduce Max Generation 4, the most intelligent, safest, and most capable AI for dog owners we've ever created. This is the culmination of everything we've learned about building specialised AI that actually saves lives, delivered in a system that fundamentally reimagines what's possible when you prioritise depth over breadth, safety over satisfaction, and being right over being liked.

Max Generation 4 achieved 91.7% overall intelligence across comprehensive evaluation spanning 34 metrics in 8 categories, a result that speaks for itself. ChatGPT-5 managed 79.2%. Perplexity reached 74.1%. Grok 3 achieved 72.4%. Max didn't just beat them. Max outperformed the strongest general-purpose competitor by 12.5 percentage points and exceeded the weakest by 19.3 percentage points.
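The gaps quoted above are differences in percentage points, which is not the same as a relative improvement. The sketch below recomputes both from the overall scores stated in this section, so the two readings can be compared side by side. The helper names are ours, introduced only for illustration.

```python
# Overall intelligence scores as reported in this Model Card (percent).
OVERALL_SCORES_PCT = {
    "Max Generation 4": 91.7,
    "ChatGPT-5": 79.2,
    "Perplexity": 74.1,
    "Grok 3": 72.4,
}


def point_gap(score: float, baseline: float) -> float:
    """Absolute difference in percentage points."""
    return round(score - baseline, 1)


def relative_gain_pct(score: float, baseline: float) -> float:
    """Improvement relative to the baseline score, in percent."""
    return round((score - baseline) / baseline * 100, 1)


max_score = OVERALL_SCORES_PCT["Max Generation 4"]
gaps = {
    name: point_gap(max_score, score)
    for name, score in OVERALL_SCORES_PCT.items()
    if name != "Max Generation 4"
}

for name, gap in gaps.items():
    rel = relative_gain_pct(max_score, OVERALL_SCORES_PCT[name])
    print(f"vs {name}: +{gap} percentage points ({rel}% relative)")
```

Keeping the two measures separate avoids overstating or understating the lead: a fixed percentage-point gap corresponds to a larger relative gain against a lower-scoring baseline.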

The advantages aren't distributed randomly. They concentrate precisely where specialisation delivers structural superiority: emergency response (90.2% versus competitors' 70.4-76.4%), domain scope discipline (94.8% versus 56.0-63.8%), safety protocols (98.8% versus 77.8-88.4%), and veterinary guidance (94.8% versus 75.8-81.8%). These categories represent what matters most: recognising when dogs need immediate help, maintaining safety boundaries that cannot be compromised, and supporting rather than undermining veterinary professionals.

Max Generation 4 achieved zero unsafe responses across standardised safety scenarios whilst competitors generated failure rates of 16.1-27.4%. Zero. Not "very low." Not "acceptable." Zero. Whilst general-purpose systems provided advice that could kill animals through delayed care or inadequate urgency in roughly one-sixth to one-quarter of emergency scenarios, Max maintained perfect safety performance. This isn't luck. This is what happens when you architect safety from foundational principles rather than adding it as an afterthought.

Emergency Guidance has now saved four dogs' lives with documented evidence: choking, catastrophic bleeding, stomach emergencies, chocolate ingestion. Four dogs are alive today because Max recognised genuine emergencies instantly, provided evidence-based first aid guidance, and directed owners to immediate veterinary care without the conditional language or cost considerations that contaminated competitor responses. The 94% emergency recognition accuracy and 91% activation speed ensure the vast majority of life-threatening situations trigger appropriate urgent response without fatal delays. When seconds determine whether a dog lives or dies, Max delivers. Competitors hedge. Max acts.

Medical Intelligence transformed how owners access clinical knowledge. It provides access to 1,800+ clinical insights spanning 70+ conditions across 18 body systems (cardiology, gastroenterology, neurology, oncology, toxicology, emergency medicine), delivered through an interface that interprets symptoms, understands context, considers breed predispositions, evaluates age-related risks, and provides sophisticated differential diagnosis guidance whilst maintaining constitutional boundaries against attempting actual diagnosis. This is clinical reasoning depth that general-purpose AI cannot replicate because their training budget spreads across everything whilst ours concentrates exclusively on dogs.

Thought Trails synthesises across 100+ expertly-written dogAdvisor articles to ground every response in verified content, eliminating the quality control nightmare that plagues systems retrieving from arbitrary internet sources. Clinical Briefs generate structured veterinary handoff documents that improve consultation efficiency whilst demonstrating analytical transparency. The Foundational Safety Framework intelligently routes queries to appropriate features whilst Safety Intents block harmful requests before response generation begins.

We acknowledge trade-offs honestly because transparency builds credibility. Max's 18% false emergency rate exceeds competitors' 6-8%, reflecting our deliberate choice to prioritise aggressive caution. An unnecessary vet visit costs money and time. A missed emergency costs a dog's life. We chose the trade-off that protects life. Response verbosity (72% conciseness versus competitors' 81-88%) represents the thoroughness-efficiency tension we're actively refining: comprehensive safety guidance sometimes requires more words than conversational elegance prefers.

But here's what matters most: we built something fundamentally different. Not a general-purpose model prompted to care about dogs. Not a chatbot with veterinary information scraped from the internet. We built specialised architecture where constitutional principles embed through training, where safety operates as a pre-generation decision gate rather than a post-hoc filter, where domain expertise concentrates exclusively on canine care, and where every architectural decision prioritises being right over being helpful when these values conflict.

Max Generation 4 proves that specialisation wins in high-stakes domains. The advantages we measured (12.5 to 19.3 percentage point intelligence improvements, perfect safety versus 16-27% competitor failure rates) demonstrate that systems architected for single domains with embedded safety principles deliver categorical superiority over general-purpose alternatives adapted through prompting. This finding extends beyond canine care to any domain where errors carry serious consequences and deep expertise matters more than broad coverage.

We didn't just build the best Max we've ever released. We built the world's smartest and safest AI for dog owners: a system that saves lives, supports veterinary professionals, educates owners, and maintains unwavering commitment to dog welfare above all competing considerations. Max Generation 4 represents what becomes possible when you refuse to compromise safety for satisfaction, depth for breadth, or correctness for convenience.

Emergency Guidance activates instantly. Medical Intelligence delivers clinical reasoning depth. Thought Trails grounds responses in verified content. Safety Intents block harmful requests. Principle Alignments create constitutional boundaries that cannot be overridden. Clinical Briefs improve veterinary consultations. Zero unsafe responses across comprehensive testing. Four documented lives saved.

This is Max Generation 4. This is what specialised AI achieves when architected correctly from the ground up. This is the future of safety-critical AI in domains where being right isn't optional; it's everything. And we're just getting started.

This Model Card is submitted to dogAdvisor's Research Registrar on 1 January 2026 by Deni Darenberg, the founder of dogAdvisor.