My friend David writes insightful pieces about AI on his Substack, Robots for The Rest of Us. He recently published a post with the central thesis that “2024’s genAI won’t work as your replacement, but it will do fine as your assistant.” It inspired me to create a challenge investigating how the tools available today might help someone researching and working in real estate and development. I also enjoy a good intellectual debate with David, so I decided to put his hypothesis to the test with a real-world challenge.

A skilled artisan takes time to test and evaluate a new tool before incorporating it into their daily routine. They ask questions like: What does this tool do well? When should I use it? When will it save me time? With this mindset, I considered how to assess the usefulness of some of today’s free AI tools. AI, the technology zeitgeist of the 2020s, is beginning to permeate all aspects of life, much like the internet did in the 1990s. Resistance is futile. The challenge is to avoid investing too much effort in any nascent technology, as it may look very different very soon.
Research is the only AI use case I have found genuinely helpful as a real estate developer and operator. Having done a great deal of research and analysis over my career, I learned early that analysis has little value if your data sources are unreliable. As they say, "garbage in, garbage out." Even with a simple calculator, a user who doesn't realize they mistyped the problem ends up with false confidence, assuming the calculator did the math correctly.
Don't get me wrong, I've found some AI platforms to be fantastic copy editors and graphic artists, especially for bloggers and content creators. Nearly every image I use is AI-generated, and every blog post is AI-copy-edited. We all know that art is subjective, so supervising AI-generated images is as simple as swiping left or right. Copy editing is a bit more complex, but essentially the AI checks my work, and I approve it. In this test, we chose a task that required us to verify the AI's work. If AI is truly an assistant, it should be able to handle a variety of low-level tasks, freeing up my time for more important work.
Unlike a calculator, where the main risk is a bad input, an AI has many more places to make mistakes, and that demands even greater vigilance. Just because you got a smart-sounding, descriptive answer doesn’t mean it is either smart or correct. This challenge was inspired by one of those r/showerthoughts moments while sitting on the toilet: for some reason, I became curious about the water cost of flushing a toilet and the potential operational savings of low-flow toilets.
Not knowing exactly what inputs would be needed to generate the estimate, I figured some combination of location, number of residents, number of apartments, and number of toilets would probably do. Here’s the exact prompt:
Given a building in Brooklyn with 5 toilets, 4 apartments, and 10 residents, what would be the expected annual savings of replacing all the 3 gallon per flush toilets with 1.6gpf toilets.
If I assigned this to a human analyst, I would instruct them to find the cost of water in NYC and an estimate of a person’s average daily toilet usage. From there, they should be able to figure out the math on their own; once you have the numbers, this is no more than 8th- or 9th-grade algebra. I had planned to work up to increasingly challenging problems, so the fact that none of the platforms got even this one right was a bit disappointing. We’ll save the harder problems for future posts.
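For the curious, here is the entire problem laid out as a short Python sketch. The function and parameter names are my own shorthand; the only two inputs the analyst actually has to research are the water rate and the average flushes per person per day, since everything else comes straight from the prompt.

```python
# A minimal sketch of the algebra, assuming savings scale with resident usage.
# Only two inputs need research: the per-gallon water rate and the average
# number of flushes per person per day. The rest comes from the prompt.

def annual_flush_savings(residents, flushes_per_person_per_day,
                         old_gpf, new_gpf, price_per_gallon):
    """Annual dollar savings from replacing old_gpf toilets with new_gpf toilets."""
    gallons_saved_per_day = residents * flushes_per_person_per_day * (old_gpf - new_gpf)
    return gallons_saved_per_day * 365 * price_per_gallon
```

Notice that the number of toilets and apartments never enters the math; as long as every toilet is being replaced, the savings depend on how much the residents flush, not on how many fixtures or units there are.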
Here are the results by platform, along with the primary error each one made:
Bing: “According to NYC DEP the cost of water is approximately $0.014 per gallon”
Claude: “rate of $0.00465 per gallon (NYC rates)” but used the incorrect rate in the calculation
Gemini: made a dumb mistake calculating the per-gallon water rate and also miscalculated the average number of flushes per toilet per day.
Grok: “Assuming a cost of about $0.011 per gallon for combined sewer”
ChatGPT: “The average cost of water in Brooklyn (or NYC) is around $4 per 1,000 gallons”
In 2024, the NYC DEP rate for residential water is $12.61 per 100 cubic feet (748 gallons), which translates to roughly $0.017 per gallon. As I reviewed the responses, they all arrived at the same per-flush water savings, since they all worked from the 3 gpf and 1.6 gpf figures in the prompt. Bing underestimated the cost by 16%, Claude underestimated it by 72%, Gemini overestimated it by 1,000%, ChatGPT underestimated it by about 76%, and Grok underestimated it by about 34%. What’s worse, Gemini’s flushes-per-day calculation somehow includes the number of apartments: even though it knows there are 10 residents and uses an average of 5 flushes per person per day, it arrives at 12.5 flushes per day for the entire building instead of 50. Lastly, four of the five models (Bing, Gemini, ChatGPT, and Grok) used an average of 5 flushes per person per day, while Claude used 20.
If we assume 5 flushes per person per day is the correct average, the correct answer should be $430.73 in annual savings. Instead, here are our results in order of “correctness”: Bing at $357.70, ChatGPT at $511, Grok at $280.95, Gemini at $5,429.37, and Claude at $8,687.
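For anyone who wants to check the math, here is the same calculation with the figures above plugged in; it reproduces the $430.73 figure.

```python
# Sanity check: 10 residents, 5 flushes per person per day, 3 gpf -> 1.6 gpf,
# and the 2024 NYC DEP rate of $12.61 per 748 gallons.
price_per_gallon = 12.61 / 748                        # ~$0.0169 per gallon
gallons_saved_per_year = 10 * 5 * (3.0 - 1.6) * 365   # 25,550 gallons
annual_savings = gallons_saved_per_year * price_per_gallon
print(f"${annual_savings:,.2f}")                      # prints $430.73
```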
Given this relatively simple and tiny test, I am far from agreeing that “…it will do fine as your assistant.” With all the billions invested and the ongoing media circus, there are still some very basic, middle-school-level problems left to solve. If I had an assistant whose sources I constantly needed to verify, who made simple errors looking up data, and who botched the algebraic translation of simple word problems, I would fire them and do all the work myself. My guesstimate is that, at this time, the free versions of these platforms are operating at the level of a 6th or 7th grader. If you can’t trust the sourcing of the underlying data, the correct application of that data, or the accuracy of basic computations, it is hard to see how any of today’s free AI platforms can add value to objective, quantifiable problems.
While AI tools have shown promise in assisting with research and analysis in real estate, they still have significant limitations. As we saw in our challenge, even relatively simple tasks produce varying degrees of accuracy across platforms. Our test underscores the importance of human oversight and verification when using AI as an assistant. Reflecting on David's thesis that "2024's genAI won't work as your replacement, but it will do fine as your assistant," it's clear that while AI can be a valuable tool, it still has a long way to go. Maybe it’s more of a high school intern than an assistant at this point.
The journey of integrating AI into our workflows is ongoing, and continuous evaluation and adaptation are crucial. For a future challenge, we may try to dive deeper into the capabilities of AI by testing its ability to analyze market trends, predict future real estate values, or some other fun stuff. Maybe I’ll ask my friend David to help me come up with that next challenge. (This is me asking 😉) Stay tuned as we explore the boundaries of what AI can achieve in the real estate industry. And as always, I welcome your thoughts and experiences—feel free to share them in the comments or reach out for personalized advice.
If you've found value in these insights and want to continue your journey of real estate wisdom, we invite you to subscribe to The Property Alchemist.
Don't let your real estate dreams remain just dreams. Empower yourself with the knowledge and insights that can turn your investment visions into concrete reality. Subscribe to The Property Alchemist today and take the first step towards becoming a master of real estate alchemy. Your next successful project is just a subscription away!