This post is the first installment of a multi-part series because I want to include notes about how each model performed. To start off, let’s look at the overall leaderboard and how our top LLM scored.
FINDINGS
While I think LLMs can be used to generate ideas or compile a list of potential features or technologies, the models aren’t sophisticated enough to understand the context of the problem and how to maximize user value.
This leaves a lot of gaps that humans inevitably need to fill. That’s to be expected. Anyone who has built an app from scratch will tell you that gathering feedback early and often is the best way to create the most useful app.
Novel concepts might be inspired by humans interacting with LLMs, but it’s doubtful an LLM could come up with anything compelling on its own. Could you use an LLM as a brainstorming tool? Absolutely!
MY EXPERIENCE
The accuracy of the output was mostly consistent, though some models went into more detail than others. This is probably a result of token limits and other model parameters. The same goes for completeness of response. Each model attempted to follow the process from first principles toward an MVP.
But none of the models really handled the concept of first principles reasoning. I suspect the results from this test were influenced by the vagueness of my prompt. Each model confidently proclaimed its fundamental truth and proceeded straight to solutioning. The models didn’t show their work, so it’s difficult to determine whether a model skipped a step.
Likewise, none of the models identified the assumptions to be validated by the MVP, which is a huge part of what we want to learn from end user testing. A good app MVP focuses on user experience or behavior. A great app focuses on making that interaction as simple and effective as possible.
While I appreciate that the LLMs were able to call out features and technologies that might solve a user’s problem, I find this type of thinking one-dimensional. One model tried to create user stories but didn’t describe the user value.
TEST RESULTS
Claude 3 Sonnet
ChatGPT 3.5
Gemini
CoPilot
Grok
HUMAN BASELINE
WHAT IS IT?
First principles thinking is a powerful problem-solving approach that involves breaking down complex issues into their most basic and fundamental parts.
HOW IT WORKS
Starting Point:
Instead of accepting existing assumptions or conventional wisdom, first principles thinking begins with a blank slate.
It encourages us to question everything we think we know about a given problem.
Deconstructing the Problem:
We break down the problem into its essential elements, separating facts from assumptions.
By doing so, we get to the fundamental truth underlying the issue.
Asking Powerful Questions:
We ask why repeatedly to get to the root cause.
For example, if the problem is related to energy efficiency, we might ask:
Why do we use certain materials in construction?
Why do we follow specific design standards?
Why do we rely on existing technologies?
Creating New Solutions from Scratch:
Armed with the fundamental truth, we construct new solutions.
We look beyond existing solutions and explore novel approaches.
Unlocking Creativity and Innovation:
First principles thinking encourages us to think like scientists and inventors.
It allows us to combine ideas from unrelated fields and create innovative solutions.
HUMAN REASONING
THE PROBLEM
I do not know what cleaning products I have and where they are located.
TRUTHS
When I deconstruct the problem to the fundamental truth, I discover that the purpose (or category) of the product is more important than the location. Cleaning products are used in support of a cleaning activity, which determines both the type of cleaner used and the most convenient storage location.
AN EXAMPLE
Heavy degreasers, glosses & polishes and glass & rubber cleaner in the garage
Mild degreasers, dish soaps, countertop cleaner, disinfectant wipes in the kitchen
Toilet and grout cleaner, tile mopping solution in the bathroom
It doesn’t make a lot of sense to store bathroom supplies in the garage, and I typically don’t want to search in the house for a heavy cleaner I’m using outside.
The other truth is that I sometimes need a specific cleaner for a specific job, but I’m not sure if I have the right product for the job. Keeping an accurate and up-to-date list is as important as knowing where the product is stored.
Manual data entry is inconvenient and cumbersome to maintain, so products that are added or removed need to be updated seamlessly.
SOLUTION
The novel solution that I arrived at was to construct a system that encourages users to track cleaning products as they use them. Focus on the workflow for initially adding the item and use the location tracking as a tag or metadata that provides basic search capability.
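To make the workflow a little more concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the names `Product`, `Inventory`, `use`, `mark_empty`, and `search` are hypothetical, and this is not the design of any real app. It only shows products being recorded at the moment of use, with location kept as a tag rather than as the primary key.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical record for one cleaning product."""
    name: str
    category: str                                 # purpose, e.g. "degreaser"
    tags: set[str] = field(default_factory=set)   # location kept as metadata
    empty: bool = False


class Inventory:
    """Hypothetical in-memory inventory keyed by product name."""

    def __init__(self) -> None:
        self._products: dict[str, Product] = {}

    def use(self, name: str, category: str, location: str | None = None) -> Product:
        """Record a product as it is used, adding it if it is new."""
        product = self._products.get(name)
        if product is None:
            product = Product(name, category)
            self._products[name] = product
        if location:
            product.tags.add(location)
        return product

    def mark_empty(self, name: str) -> None:
        """Flag a product as used up so it drops out of search results."""
        self._products[name].empty = True

    def search(self, term: str) -> list[str]:
        """Return names of non-empty products matching a category or tag."""
        term = term.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if not p.empty
            and (term == p.category.lower() or term in {t.lower() for t in p.tags})
        )
```

Calling `use("Grout cleaner", "tile", "bathroom")` during a cleaning session adds the item and tags its location in one step, which is the low-friction entry point the solution depends on.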
ASSUMPTIONS
The first assumption that must be validated through user testing is that users tend to use products from a single category during a single cleaning session. For example, when cleaning a bathroom I will use those cleaning supplies and only those supplies. Thus, my cleaning session involves only those items.
A second assumption is that a user would mark an item as empty if the cleaning product were completely used up, or if the quantity was too low. Both assumptions can be proven or disproven through end-user testing and would validate the app’s workflow.
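As a rough illustration of how the first assumption might be checked once session data exists, the sketch below computes the share of cleaning sessions that stayed within one category. The function name `single_category_share` and the input shape (one list of category labels per session) are hypothetical choices of mine, not something produced by the test.

```python
def single_category_share(sessions: list[list[str]]) -> float:
    """Fraction of non-empty sessions whose products all share one category."""
    non_empty = [s for s in sessions if s]
    if not non_empty:
        return 0.0
    single = sum(1 for s in non_empty if len(set(s)) == 1)
    return single / len(non_empty)
```

A value close to 1.0 would support the assumption. A low value would suggest sessions routinely mix categories and the workflow needs rethinking.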
CLAUDE 3 SONNET
PROMPT
REPORT CARD
NOTES
Accuracy :: Skipped the problem deconstruction (or didn’t show work)
Completeness :: Created a pitch for app, offered app specifics
Creativity :: Suggested Bluetooth beacon tags to map storage locations and indoor navigation to guide users to supplies
RESPONSE
THE PRODUCT
CleanTrack - A smart cleaning supply ecosystem
At the core of CleanTrack is an intuitive mobile app that serves as your centralized command center. Through the app, you can effortlessly catalog every cleaning product you own by simply scanning barcodes or snapping photos. The app’s advanced image recognition will automatically identify the product.
But CleanTrack goes far beyond a mere inventory list. Using low-cost Bluetooth beacons, you can “tag” storage locations like cabinets, closets, or shelves. The app will then map the locations of all your products, guiding you straight to what you need with turn-by-turn directions.
CLEANTRACK’S MVP
Basic mobile app for iOS/Android to manually catalog products
Bluetooth beacon tags to map storage locations
Indoor navigation to guide you to specific cleaning supplies
This barebones MVP allows us to validate the core concept of smarter cleaning supply tracking and navigation. We can gather feedback, analyze usage patterns, and iterate.
POTENTIAL EVOLUTION
Auto reordering based on supply levels and usage rates
Integration with smart home systems for voice control
Augmented reality visualization of where products are stored
Suggestion engine for cleaning routines and supplies needed
The market for household cleaning products is massive. By solving a seemingly simple but prevalent pain point, CleanTrack can quickly gain traction and a loyal customer base hungry for this utility. From there, we can introduce premium features and expand into commercial/industrial markets.
NEXT
In Part 2 we’ll look at ChatGPT 3.5 and Gemini.