Author: Joseph

  • OOP Is About Treating Business Concepts Like Pets

    I once joked that the real purpose of Object-Oriented Programming (OOP) is to treat business concepts like they’re your pets. It usually gets a laugh, but the more I sit with the idea, the more uncomfortable it becomes because it’s actually true.

    And I don’t mean this in a “Gang of Four” design pattern way. I mean it in the visceral way you understand ownership when you’ve shipped real systems for decades.

    The Mental Model of Ownership

    If you’ve ever had a pet, you know a few things instinctively:

    You name it.
    You know exactly what it’s allowed to do.
    You know what it should never do.
    You don’t let random strangers mess with it.
    And when something is wrong, you can usually tell immediately.

    That is the mental model OOP was originally trying to give us. It wasn’t about inheritance hierarchies, UML diagrams, or abstract factories with heroic names. It was about ownership.

    Business Objects Are Not Data Bags

    Somewhere along the way, we started calling things “objects” that clearly aren’t. They are structs, DTOs, or just ORM output with a class wrapper. You see it when code looks like this:

    PHP

    $customer->status = 'suspended';
    $customer->credit_limit = 0;
    $customer->vip = false;

    Nothing stops you from writing this. Nothing explains why the customer is suspended. Nothing enforces the rules. Anyone can reach in and flip switches. That isn’t an object; that is a spreadsheet cell with delusions of grandeur.

    An actual object (a “pet”) looks boring but behaves safely:

    PHP

    $customer->suspend($reason);

    Why is this better? Because the Customer owns its behavior. You don’t mutate it; you ask it to do something. That is the pet relationship. You are responsible for the object’s well-being, which means protecting its internal state from the chaos of the outside world.

    This Is Why DDD Exists

    Domain-Driven Design (DDD) didn’t appear because architects wanted more layers to manage. It appeared because we kept breaking systems by letting anyone touch anything.

    DDD is effectively a formal way of saying: Business rules belong with the business concept.

    They don’t belong in controllers or random services, scattered across “helper” classes, or, in the worst case, duct-taped into a random WordPress hook or filter. If a rule exists because the business exists, it belongs with the object that represents it. This isn’t dogma; it’s damage control.

    The Legacy of “Old” Tools

    Back when I was building systems in VB, VB.NET, or FoxPro, this wasn’t a philosophical debate. A button clicked, a form reacted, and a private sub handled the logic. If something was Private, it was private. End of discussion.

    We didn’t have time to argue about architecture because we were busy shipping software people used every day. Ironically, those systems often felt more “object-oriented” than modern code because the ownership was obvious.

    Today, we often have “domain models” with no behavior, services that do everything, and objects that exist purely to be passed around. We call it OOP because the files end in .php or .ts, but there is no ownership. When no one owns an object, it becomes fragile. When everyone can mutate it, it becomes dangerous.

    Architecture is About Care

    DDD feels heavy when you fight it, but natural when you realize it’s just enforcing what you already believe: Don’t let random code mess with business state. Make invalid states unrepresentable by default.

    If there is one takeaway here, it’s this: If nobody feels responsible for an object, it’s not an object—it’s a liability.

    Good OOP and good DDD aren’t about elegance. They are about care. You don’t throw your pets around, you don’t expose their insides, and you don’t let strangers decide how they behave. That’s not ideology. That’s just how systems survive.

  • Why Your Vector Database Misses Obvious Results (And How to Fix It)


    We’ve all been there. You build a RAG (Retrieval-Augmented Generation) pipeline, dump your documents into a vector database, and feel like a wizard. You search for ‘guidelines for remote work’ and it finds documents about ‘working from home policy.’ It feels like magic.

    Then, you put it into production.

    A user searches for a specific error code like ERR-502 or a specific product SKU like XJ-900.

    The result? Hallucinations, or worse, totally irrelevant documents that just happen to share a similar ‘vibe’ in the embedding space.

    The hard truth is that Vector Search is not a silver bullet. It understands meaning, but it is terrible at precision.

    The Two Extremes of Search

    To understand why your AI agent is failing, you have to look at the two distinct ways computers search for information.

    1. The ‘Vibe’ Search (Vector Embeddings)

    Vector search turns text into numbers (embeddings). It finds concepts that are semantically close.

    Superpower: Understanding synonyms and intent (e.g., ‘monitor’ = ‘display’).

    Kryptonite: Exact matches. To a vector model, ‘Version 1.2’ and ‘Version 2.1’ might look nearly identical because they are semantically similar concepts, even though they are completely different technical facts.

    2. The ‘Ctrl+F’ Search (BM25 / Keywords)

    This is the old-school search logic (like Elasticsearch or Lucene). It looks for the exact tokens you typed.

    Superpower: Precision. If you search for ERR-502, it finds that exact string.

    Kryptonite: Vocabulary mismatch. If you search ‘laptop screen,’ it might miss a document that only uses the word ‘display.’

    The Fix: Hybrid Search

    If you want a production-grade system, you cannot choose one. You need both. This is called Hybrid Search.

    In a Hybrid architecture, every user query triggers two parallel searches:

    Vector Search scans for semantic meaning.

    BM25 Search scans for exact keywords.

    You get two lists of results. One captures the intent, the other captures the specifics.
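    In code, the fan-out is just two independent queries over the same corpus. Here is a minimal Python sketch; `bm25_index` and `vector_index` are placeholders for whatever keyword engine and vector store you actually run, assumed to expose a `search(query, top_k)` method:

```python
from concurrent.futures import ThreadPoolExecutor

def hybrid_retrieve(query, bm25_index, vector_index, top_k=20):
    """Fire the keyword search and the vector search in parallel.

    Returns both ranked lists untouched: one captures exact tokens,
    the other captures semantic meaning.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        sparse = pool.submit(bm25_index.search, query, top_k)   # exact-token match
        dense = pool.submit(vector_index.search, query, top_k)  # embedding similarity
    return sparse.result(), dense.result()
```

    Keeping the two result lists separate at this stage matters: the merging step that follows needs the original ranks, not a pre-mixed list.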

    The ‘Merging’ Problem

    Now you have a new engineering problem.

    Vector search returns a Cosine Similarity score (usually 0.0 to 1.0).

    BM25 returns a Relevance Score (which can be 5.0, 15.0, or 42.0 depending on document length and frequency).

    You cannot compare these numbers. A 0.8 in vectors is not ‘better’ or ‘worse’ than a 12.0 in BM25. They are apples and oranges.

    Enter Reciprocal Rank Fusion (RRF)

    The solution is to ignore the scores and look at the rank.

    Reciprocal Rank Fusion (RRF) doesn’t care that Document A got a score of 0.89. It cares that Document A was the #1 result. It takes the rankings from both lists and fuses them into a new, unified score.

    If a document appears at the top of both lists, it skyrockets to the top of the final list. If it appears in only one, it gets pushed down.
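    RRF itself is only a few lines. A sketch under the usual assumptions: each input list holds document IDs sorted best-first, and k = 60 (the conventional constant that damps the gap between top ranks):

```python
def rrf_fusion(*ranked_lists, k=60):
    """Fuse best-first lists of doc IDs into one list of (doc_id, score) pairs."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes 1 / (k + rank); absent docs contribute nothing.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents that rank well in both lists accumulate two contributions
    # and float to the top of the fused ordering.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

    For example, `rrf_fusion(["a", "b", "c"], ["b", "a", "d"])` puts `a` and `b` (present in both lists) ahead of `c` and `d` (present in only one).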

    But… What is a ‘Good’ RRF Score?

    Congratulations, you have implemented Hybrid Search! But you have introduced a final, tricky variable.

    RRF outputs abstract scores that look like 0.0163 or 0.032.

    Unlike Cosine Similarity, where we know 0.85 is usually ‘good,’ RRF scores are unintuitive.

    Is 0.016 a strong match or noise?

    At what threshold should you cut off the results to prevent hallucinations?

    Most developers guess. They pick a random number like 0.02 and hope for the best. Do not do that.

    There is a mathematical way to calculate exactly where the ‘Noise Floor’ ends and ‘High Confidence’ begins.

    👉 Read the sequel: Stop Guessing – A Mathematical Approach to RRF Thresholds in Hybrid Search

  • Stop Guessing: A Mathematical Approach to RRF Thresholds in Hybrid Search

    If you are building a RAG pipeline, you have probably hit this wall: Hybrid Search is great, but the scores are meaningless.

    You combine your Dense Vector Search (Gemini/OpenAI) with your Sparse Keyword Search (BM25) using Reciprocal Rank Fusion (RRF). It works beautifully to bubble up the best results. But when you try to filter out the noise, you’re left staring at scores like 0.0163 or 0.0327.

    Most tutorials tell you to “pick a threshold like 0.02.” But why? Why not 0.015? Why not 0.025?

    We recently ran a deep dive into our own retrieval metrics to stop guessing and start calculating. Here is how we derived a mathematically justified threshold for RRF and solved the edge cases that break it.

    The Problem: The “Magic Number”

    RRF is a rank-based formula. It doesn’t care about cosine similarity or term frequency; it only cares about order.

    For every result list a document appears in, it earns 1 / (k + rank), and the fused score is the sum of these contributions. (Standard k is usually 60.)

    Because the denominator is huge, the resulting scores are tiny.

    • Rank 1 result: ~0.016
    • Rank 10 result: ~0.014

    When you merge two lists (Semantic + Keyword), the scores get summed. The problem is that a “mediocre” result from two sources looks mathematically similar to a “great” result from just one. We needed a way to distinguish Consensus from Noise.

    The Research: Decoding the Score

    We analyzed query logs across three distinct categories: Specific (perfect matches), Vague (conceptual matches), and Garbage (keyboard smashing).

    We found a distinct “Ceiling” in the garbage results:

    Query Type    Top RRF Score    Pattern
    Specific      ~0.032           High confidence from both algorithms.
    Garbage       ~0.016           Maxed out at exactly 0.0164.

    The “Noise Floor” (0.016)

    Why did garbage queries hit a wall at 0.016? The math explains it.

    If Method A (e.g., Vector) thinks a document is Rank #1, but Method B (Keyword) doesn’t find it at all, the document collects exactly one vote:

    1 / (60 + 1) ≈ 0.0164

    A score of ~0.016 represents Single-Source Confidence. It means one algorithm liked it, but the other one didn’t care. In a Hybrid system, this is often a hallucination or a partial match.

    The “Consensus Floor” (0.025)

    Now, look at what happens when both algorithms agree that a document is relevant (say, top 10 in both). Even in the worst case, Rank #10 in each list, the document collects two votes:

    1 / (60 + 10) + 1 / (60 + 10) ≈ 0.0286

    This gave us our mathematically derived threshold.

    • Score >= 0.025: The document appeared in the Top 10 of BOTH methods. We have Consensus.
    • Score < 0.016: Neither method ranked it highly. Noise.
    • The Middle (0.016 – 0.025): The “Danger Zone” where only one algorithm is confident.

    By setting our threshold to 0.025, we aren’t just picking a number. We are enforcing a policy: “For a result to be shown, both the semantic model and the keyword model must independently agree it is top-tier.”
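    You can reproduce the boundary values behind this policy with a few lines of arithmetic (k = 60, ranks starting at 1):

```python
def rrf_score(k, *ranks):
    """Sum the reciprocal-rank contribution from each list a document appears in."""
    return sum(1 / (k + r) for r in ranks)

single_source = rrf_score(60, 1)    # Rank #1 in one list, absent from the other
consensus = rrf_score(60, 10, 10)   # Rank #10 in BOTH lists

print(round(single_source, 4))  # 0.0164 -- stuck below the 0.025 consensus floor
print(round(consensus, 4))      # 0.0286 -- clears the 0.025 threshold
```

    A single first-place vote can never cross 0.025, which is exactly why the threshold enforces consensus.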

    The “Quantum Blockchain” Edge Case

    There was one fatal flaw in our logic.

    We ran the query: “quantum blockchain banana”.

    • BM25: 0 results (Words didn’t exist in our docs).
    • Vector: Found “nearest neighbors” with 0.53 similarity.
    • RRF Score: ~0.016 (Single source).

    Our new filter correctly blocked it. But wait: what about a legitimate query for a concept that simply shares no keywords with our docs?

    If BM25 returns zero results, it’s a signal. It means there is zero lexical overlap. In this scenario, RRF breaks because it relies on two votes. When one voter is silent, the score naturally drops below our consensus threshold.

    The Solution: The Dual-Filter Strategy

    We realized we needed different logic for queries where lexical overlap is impossible.

    1. If Keywords Match: Use RRF with the Consensus Threshold (0.025). We demand agreement.
    2. If Keywords Fail: Fall back to Vector-only, but with a Strict Similarity Threshold (0.65).

    If we can’t match your words, the meaning must be overwhelming for us to show a result.

    The Code

    Here is the logic we implemented to sanitize our RAG inputs:

    Python

    MIN_RRF_SCORE = 0.025  # Requires Top-10 ranking in BOTH methods
    MIN_FAISS_SCORE = 0.65 # Requires strong semantic match if keywords fail
    
    def smart_hybrid_search(query):
        bm25_results = get_sparse(query)
        faiss_results = get_dense(query)
    
        if bm25_results:
            # Standard Path: Demand Consensus
            merged = rrf_fusion(bm25_results, faiss_results)
            return [doc for doc in merged if doc.score >= MIN_RRF_SCORE]
        else:
            # Fallback Path: Strict Semantics
            # "Zero keyword overlap? The meaning better be exact."
            return [doc for doc in faiss_results if doc.score >= MIN_FAISS_SCORE]
    

    Conclusion

    Stop treating hybrid search thresholds as magic numbers.

    • 0.016 is the sound of one hand clapping (Single Source).
    • 0.025 is the sound of applause (Consensus).

    By aligning our thresholds with the mathematical reality of RRF, we turned our retrieval layer from a “best guess” engine into a precision instrument.



  • The Fourth Wall Problem: Why Your AI Keeps Citing Its Own Context

    You’ve built a RAG system. You’re injecting relevant docs into your prompts. Everything’s working great—until your AI says:

    “From your recall snippet, you already have XYZ available.”

    And your user thinks: What the hell is a recall snippet?

    This is the fourth wall problem. Your AI is exposing infrastructure that users shouldn’t see.


    Why This Happens

    LLMs don’t naturally distinguish between:

    • User-provided context (“You told me X”)
    • System-injected context (“The system gave me X”)
    • Native knowledge (“I know X”)

    To the model, it’s all just tokens in the prompt. And thanks to RLHF training that rewards transparency and source attribution, the model wants to cite where it got information.

    So it does. To users who have no idea what it’s talking about.


    What Doesn’t Work

    I tried a bunch of things before landing on what actually works:

    Approach                                       Why It Failed
    Vague instructions (“never mention recall”)    Too weak against the citation instinct
    Consecutive assistant messages                 API constraints (Anthropic requires alternation)
    Fake tool calls                                Orphaned tool_use without definitions confuses the model
    Assistant prefill                              Adds complexity, only partial fix

    The problem is that positive framing alone isn’t enough. “Present information as your own knowledge” sounds clear to us, but the model’s citation training overrides it.


    What Actually Works

    You need three things working together:

    1. Epistemological Framing

    Tell the model exactly what the injected content is:

    <xyz_recall> blocks are system-injected context:
    
    1. Retrieved — System searched based on user's query
    2. System-injected — User didn't provide it, can't see it
    3. Reasoning aid — Exists to help you answer accurately
    4. Internal only — Use AS knowledge, not ABOUT knowledge
    

    This gives the model a mental model for what it’s receiving.

    2. Fourth Wall Rule

    Be explicit that users can’t see the injected blocks:

    The user cannot see this block. They only see their messages and your responses. Present information as knowledge you have, never as something retrieved or provided to you.
    

    3. Forbidden Phrases (The Key Ingredient)

    This is what most people miss. You need explicit negative constraints:

    Never say:
    - "In your context..."
    - "You already have this mentioned..."
    - "From what I can see..."
    - "According to the retrieved/provided..."
    - "Based on your recall..."
    
    Instead say:
    - "You have X available"
    - "X works by..."
    - "Here's how to set up X"
    

    The forbidden phrases give the model something concrete to avoid. This works way better than positive-only framing.


    Before and After

    Before:

    “From your recall snippet, you already have User lists available. Based on what I can see in your context…”

    After:

    “You have XYZ available. Here’s how to use it:”

    Same information. No leaked infrastructure.


    Practical Tips

    1. Tag naming matters

    Avoid words like “recall,” “context,” or “snippet” in your XML tags—they linguistically imply user ownership. The model infers meaning from tag names.

    2. Apply rules to ALL injected content

    Not just RAG chunks. Session context, user preferences, system state—anything the user didn’t explicitly provide needs the same treatment.

    3. Test for fourth wall breaks

    Add eval cases specifically for attribution leakage. Search your test outputs for phrases like “in your context” or “from what I can see.”

    4. Give an honesty escape valve

    Let the model say “I don’t know” rather than hallucinate. If it can’t cite the injected context, it might make things up instead. Better to be honest.
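    The leakage test from tip 3 can be automated as a simple phrase scan over your eval outputs. The phrase list below is illustrative; extend it with the tag names your own system injects:

```python
import re

# Phrases that reveal injected infrastructure to the user (illustrative list).
LEAK_PATTERNS = [
    r"in your context",
    r"from what I can see",
    r"recall snippet",
    r"based on your recall",
    r"according to the (retrieved|provided)",
]

def find_fourth_wall_breaks(response):
    """Return every leak phrase found in a model response (case-insensitive)."""
    return [p for p in LEAK_PATTERNS if re.search(p, response, re.IGNORECASE)]
```

    Run it over every response in your eval suite and fail the case whenever the returned list is non-empty.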


    The Core Insight

    The fourth wall problem is a fundamental tension between RAG injection and RLHF-trained transparency.

    The solution isn’t to fight the model’s instincts—it’s to redirect them. Give the model:

    1. A clear mental model of what it’s receiving
    2. Explicit rules about presentation
    3. Permission to be honest when uncertain

    Your users don’t need to know how the sausage is made. They just need the answer.



  • A Pragmatic Look at Singletons in Legacy PHP Systems

    Introduction

    The Singleton pattern is one of the most commonly used design patterns in PHP, ensuring that a class has only one instance and provides a global point of access to that instance. While the traditional Singleton implementation uses static methods within a class, there’s a more modular and reusable way to implement it—using PHP traits.

    In this guide, we’ll explore what a Singleton is, why it’s both useful and controversial, and how traits can help make Singleton implementations cleaner and more flexible.


    What is the Singleton Pattern?

    The Singleton pattern ensures that only one instance of a class exists throughout a request or execution cycle. This is useful in cases such as:

    • Database connections
    • Caching systems
    • Loggers
    • Configuration management

    Classic Singleton Implementation

    A traditional Singleton implementation in PHP looks like this:

    class Database {
        private static $instance;

        private function __construct() {}
        private function __clone() {}

        // Note: __wakeup() must be public (PHP warns, and newer versions
        // error, otherwise), so we block unserialization by throwing.
        public function __wakeup() {
            throw new \Exception("Cannot unserialize a singleton.");
        }

        public static function getInstance() {
            if (self::$instance === null) {
                self::$instance = new self();
            }
            return self::$instance;
        }
    }
    

    This approach ensures: ✅ Only one instance exists ✅ Prevents cloning or unserializing

    However, this method forces every Singleton class to rewrite the same logic. This is where PHP traits come into play.


    Implementing Singletons with Traits

    PHP traits allow us to reuse the Singleton logic across multiple classes without duplicating code.

    Singleton Trait Implementation

    trait SingletonTrait {
        private static $instance;

        public static function getInstance() {
            if (self::$instance === null) {
                self::$instance = new self();
            }
            return self::$instance;
        }

        private function __construct() {}
        private function __clone() {}

        // As in the classic version, __wakeup() must be public,
        // so unserialization is blocked by throwing.
        public function __wakeup() {
            throw new \Exception("Cannot unserialize a singleton.");
        }
    }
    

    Now, instead of implementing the Singleton logic manually, any class can simply use this trait:

    class Logger {
        use SingletonTrait;
    
        public function log($message) {
            echo "Logging: $message";
        }
    }
    
    $logger = Logger::getInstance();
    $logger->log("This is a singleton log.");
    

    This approach provides cleaner and more reusable code, reducing boilerplate in multiple classes.


    When to Use (or Avoid) Singletons

    While Singletons can be helpful, they are often misused. Here’s when they are beneficial and when they can cause problems:

    ✅ Good Use Cases for Singletons

    • Database Connection Managers (when a single connection instance is required)
    • Logging Services (to ensure consistent logging throughout the application)
    • Application-wide Configurations

    ❌ When NOT to Use Singletons

    • Overuse Leads to Hidden Dependencies – Makes unit testing harder.
    • Better Alternatives Exist – Dependency Injection (DI) provides more flexibility.
    • Global State Issues – Makes debugging and tracking application state difficult.

    Singletons remain a useful design pattern in PHP, but they should be used carefully. By leveraging traits, you can make Singleton implementations more modular, reusable, and maintainable.

    🔹 If you’re building a plugin or a structured PHP project, consider using Dependency Injection alongside or instead of Singletons for better scalability.

    Do you use Singletons in your projects? What challenges have you faced? Let’s discuss in the comments!

    🚀 Stay tuned for more advanced PHP and WordPress development insights!