The next phase of the project Maestro development cycle is to design both the product requirements and the technical stack. We are aiming for an MVP mobile app; here’s how various LLMs performed.
Note: Edited to keep newsletter short. For full results visit cloudshock.io or tap on the model names below.
OVERALL PERFORMANCE
FINDINGS
As in my prior experiment with first principles reasoning, LLMs can’t yet produce product and architectural designs granular enough for a development team to act on immediately.
While I anticipate that LLM tools will eventually be useful enough to assist with product and engineering tasks, it still takes a human to design something that is ubiquitous, intuitive, scalable, and secure.
Bottom line: don’t spend a lot of time bouncing ideas off the major models, and certainly don’t believe the hype that these tools can magically produce GOOD software with a few AI prompts.
MY EXPERIENCE
I tried to use various LLMs for four main aspects of solution design:
Naming & Branding
Product Design (requirements)
Technical Design (requirements)
UI Design
Overall, the experience with naming was basic. While a couple of responses were novel, they weren’t significant enough to warrant a full model-by-model breakdown. For UI design, LLMs were essentially useless, so I’d rather focus on the two aspects where I think LLMs can help.
In the realm of product design, I found most responses effective enough to write high-level feature stories. These would need to be broken down to a more granular level and, depending on the development team, might need to be more technically prescriptive.
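To make "customer value" concrete, here is a minimal sketch of a feature story in the common "As a / I want / so that" form, with a check for a missing "so that" clause. The `UserStory` class, its field names, and the two sample stories are my own illustrations for a cleaning-product tracking app; none of them come from a model response.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    """A feature story in the common 'As a / I want / so that' form."""
    persona: str         # who benefits
    goal: str            # what they want to do
    customer_value: str  # why it matters; empty when the story omits it

    def has_customer_value(self) -> bool:
        # A story with a blank "so that" clause states a feature, not a benefit.
        return bool(self.customer_value.strip())

    def render(self) -> str:
        text = f"As a {self.persona}, I want to {self.goal}"
        if self.has_customer_value():
            text += f", so that {self.customer_value}"
        return text + "."


# Hypothetical stories, written for illustration only.
complete = UserStory(
    persona="household user",
    goal="record where each cleaning product is stored",
    customer_value="I can find it without searching every cupboard",
)
incomplete = UserStory(
    persona="household user",
    goal="group products by category",
    customer_value="",
)

for story in (complete, incomplete):
    flag = "ok" if story.has_customer_value() else "missing customer value"
    print(f"[{flag}] {story.render()}")
```

A check like this only flags a blank clause. Judging whether the stated value is real, and breaking the story into granular tasks, is still a job for the product team.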
For technical design, the LLM responses covered a wide variety of technologies. I focused my queries on the Microsoft tech stack, which I am most familiar with, and the results were consistent.
However, compared to a skilled and experienced product and engineering team, none of the LLMs are a threat. While these tools could be a great brainstorming aid, they lack the specificity to provide anything that would result in a marketable software application.
CLAUDE 3 SONNET
DESIGN REPORT CARD
NOTES
Accurately fulfilled request
User stories didn’t include customer value
Didn’t suggest anything novel
TECHNICAL REPORT CARD
NOTES
Suggested older technology, Xamarin, instead of .NET MAUI
Suggested options for all points
Options presented were standard
CHATGPT 3.5
DESIGN REPORT CARD
NOTES
Accurately fulfilled request
User stories were complete
Suggested location tracking
TECHNICAL REPORT CARD
NOTES
Suggested older technology, Xamarin, instead of .NET MAUI
Suggested options for all points
Options presented were standard
GEMINI
DESIGN REPORT CARD
NOTES
Accurately fulfilled request
User stories didn’t include customer value
Suggested location tracking
TECHNICAL REPORT CARD
NOTES
Suggested older technology, Xamarin, instead of .NET MAUI
Suggested options for all points
Options presented were standard
COPILOT
DESIGN REPORT CARD
NOTES
Accurately fulfilled request
User stories were a little sparse
Suggested location tracking via tags
TECHNICAL REPORT CARD
NOTES
Suggested older technology, Xamarin, instead of .NET MAUI
Suggested options for all points
Options presented were standard
META AI LLAMA 3
DESIGN REPORT CARD
NOTES
Accurately fulfilled request
User stories were very well formatted and complete
Did not inject anything novel
TECHNICAL REPORT CARD
NOTES
Suggested older technology, Xamarin, instead of .NET MAUI
Suggested options for all points
Options presented were standard
HONORABLE MENTION
Two honorable mentions were Naming & Branding and UI Design. None of the LLMs or generative models are great at either. Figma did attempt to create flow charts, so a different tool set may be worth field testing.
NAMING & BRANDING
The LLMs didn’t perform well with branding questions and provided very uninspired answers. In aggregate, the best results were:
Prompt:
I need to you to propose the best five product names for a mobile app to catalog, categorize and track the location of cleaning products.
Results:
TidyTrac
CleanTrack
CleanTracker
UI DESIGN
Prompt:
I want you to act as a UI designer. I will describe an application screen. You will generate wireframes.
Request: A clean layout with three main sections. The top section displays a welcome message with the user’s name. The middle section features two large, prominent buttons: “Add Product” and “My Inventory.” The bottom section showcases quick access icons for “Categories,” “Settings,” and “Help.”
Response
Figma
Copilot