Can ChatGPT Plan Your Retirement?
How AI and large language models are revolutionizing financial advice
The rapid development and use of large language models (LLMs), including ChatGPT, is revolutionizing almost every industry, including financial services. As these models are increasingly used to provide trusted financial advice, what are the opportunities the technologies offer—and what are the ethical dimensions to consider when evaluating them?
Andrew Lo, PhD ’84, Charles E. and Susan T. Harris Professor at the MIT Sloan School of Management, director of MIT's Laboratory for Financial Engineering, and a principal investigator at MIT's Computer Science and Artificial Intelligence Laboratory, shared his thoughts on these and other pressing questions about AI and finance at a recent online event hosted by the Harvard Griffin GSAS Office of Alumni Relations. Professor Lo's most recent research focuses on evolutionary models of investor behavior in financial market dynamics, quantifying the financial costs and benefits of impact investing, applying financial engineering and data science to develop new funding and payment models for hard tech sectors such as biotech and fusion energy, and developing applications of large language models and generative AI for providing trusted advice. He has received numerous awards, including being named one of Time magazine's 100 most influential people in the world. His most recent book is The Adaptive Markets Hypothesis: An Evolutionary Approach to Understanding Financial System Dynamics.
Please note: The following interview has been edited for clarity and correctness and has been abridged, omitting Professor Lo's opening remarks.
When it comes to financial advice, where are LLMs strong, and where are they weak?
This is from my experience, so please don't take it as definitive research or a final pronouncement about LLMs. But from my perspective, having used them quite a bit in a variety of contexts, where I think LLMs are really strong is in things like explaining the logic of diversification, behavioral coaching, scenario exploration, comparing different portfolio structures, stress testing assumptions, and summarizing various kinds of rules and regulations, including tax rules. I think they're pretty good at that.
Where they are weak is in implementing those ideas and scenarios: for example, doing precise tax optimization with regard to Medicare or when you should start taking Social Security; state-specific legal advice; precise actuarial calculations; keeping up with real-time regulatory changes; and, ultimately, bearing legal responsibility, which they cannot do.
So with that latter set of tasks, you've got to approach the output with a pound of salt and double- and triple-check it. It's odd, because LLMs are obviously very sophisticated pieces of software, but in my experience they're actually pretty bad at basic math, even arithmetic and calculating percentages. So you have to understand the strengths and weaknesses to get the most out of these pieces of software.
For those of us who maybe are less savvy financially, I think there are two questions. One is prompt engineering. Are there tricks of the trade that can generate better advice or more sound advice, at least for a general audience? And then how do you evaluate those responses if you are not as financially savvy as a financial advisor would be?
I think there's a real art and science to prompt engineering. In fact, there is now a job title called prompt engineer. I used to think a prompt engineer was an MIT student who showed up to class on time, but now it's a role that a number of LLM companies are hiring for in droves.
So let me give you an example of a bad prompt and a good prompt, specifically with regard to retirement. Here's a bad prompt: "How should I retire?" It's just too generic, and garbage in, garbage out. A better prompt would be the following: "Assume you are a fee-only fiduciary advisor. Here are my goals, constraints, tax bracket, state, assets, risk tolerance, and timeline. Provide me with: one, a base-case strategy; two, key assumptions; three, risks; four, what could invalidate this plan; and five, what information you are missing and, in particular, what you are uncertain about." Hit return on that one.
The idea behind a good prompt is that it contains enough detail that the LLM can actually provide you with the appropriate information. If you don't provide that detail, it will just give you generic gobbledygook that will not be particularly helpful.
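To make the structure of that "good prompt" concrete, here is a minimal sketch in Python. The template mirrors the elements Professor Lo lists; the specific goals, dollar amounts, and other placeholder values are hypothetical illustrations, not his recommendations, and the assembled text can be pasted into any LLM chat interface.

```python
# A minimal sketch of the structured "good prompt" described above.
# All placeholder values (goals, amounts, state, etc.) are hypothetical
# illustrations; substitute your own details. Never include personally
# identifying information such as a Social Security or account number.

PROMPT_TEMPLATE = """\
Assume you are a fee-only fiduciary advisor.
My goals: {goals}
My constraints: {constraints}
Tax bracket: {tax_bracket}. State: {state}.
Assets: {assets}. Risk tolerance: {risk_tolerance}. Timeline: {timeline}.

Provide me with:
1. A base-case strategy
2. Key assumptions
3. Risks
4. What could invalidate this plan
5. What information you are missing and, in particular, what you are uncertain about
"""

prompt = PROMPT_TEMPLATE.format(
    goals="retire at 67 with about $60,000/year of spending",  # hypothetical
    constraints="no individual stocks; keep one year of expenses in cash",
    tax_bracket="24 percent",
    state="Massachusetts",
    assets="$500,000 in a 401(k) and $100,000 in a taxable account",  # hypothetical
    risk_tolerance="moderate",
    timeline="12 years to retirement",
)
print(prompt)  # paste the result into the chat interface of your choice
```

The point is the structure rather than the exact wording: stating goals and constraints up front, and explicitly asking for assumptions, risks, and uncertainty.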
The other trick is that, typically, we don't know all of the questions we want to ask beforehand. It actually makes sense to spend a little time away from the computer making a list of your questions, just as you would before seeing a lawyer, an accountant, or your doctor, because at the spur of the moment, you may not remember all the things you want to ask. But here's an even more important trick.
After you go through, say, half an hour of back and forth with an LLM and finally get the results you're looking for, one very helpful trick I use is to then ask the LLM: can you please tell me what prompt I should have used in order to generate the final answer I was really looking for? You'll have the LLM teach you how to use prompts to get the most out of it. And once you have that idealized, reverse-engineered prompt, save it in a file and use it the next time you're asking a similar question.
So there's a lot to learning how to provide the right prompt. And always, always ask the LLM: What are you uncertain about? What information are you missing? Because you want to understand the limitations of what it comes up with.
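Here is a rough sketch of how that conversational loop (the uncertainty questions plus the reverse-engineering trick) might look in code. It assumes the openai Python client library and the model name "gpt-4o"; both are illustrative choices, and the same follow-up questions work verbatim in any chat interface.

```python
# Sketch only: assumes the openai Python package (v1.x) and an
# OPENAI_API_KEY in the environment. The model name is illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"

def ask(history: list[dict], question: str) -> str:
    """Append a user question, get the assistant's reply, keep the history."""
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model=MODEL, messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

history: list[dict] = []
# Start with the structured prompt from the sketch above (truncated here).
ask(history, "Assume you are a fee-only fiduciary advisor. ...")

# The two questions Professor Lo says to always ask:
print(ask(history, "What are you uncertain about in the advice above?"))
print(ask(history, "What information are you missing?"))

# The reverse-engineering trick: have the model write the ideal prompt,
# then save it for reuse the next time a similar question comes up.
ideal = ask(history, "What single prompt should I have used to get that "
                     "final answer directly? Reply with just the prompt.")
with open("retirement_prompt.txt", "w") as f:
    f.write(ideal)
```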
If you've gotten bad advice from AI, who's liable? What's the risk there? And what about current regulators like the Securities and Exchange Commission or FINRA (the Financial Industry Regulatory Authority)? Do they have the appropriate expertise, or are they building it, to oversee AI-specific regulation?
The regulators definitely are trying to build expertise. But as is typical with all regulators, they're going to be one or two steps behind the industry, because as a regulator, you don't want to try to regulate in anticipation of what's going to happen. You want to regulate after some innovation has occurred and you understand what the regulatory oversight should be. Otherwise, you're going to be engaging in regulatory overreach. It's kind of like that science fiction movie about the pre-crime bureau: you don't want to be arresting people before they commit a crime.
So the issue is that the regulators are behind. But it turns out that they may now be five steps behind, as opposed to one or two, because the field of AI is just moving so quickly. It's hard for anybody to keep up, myself included. But the truth is that right now, there is no legal liability. Obviously, machines and inanimate objects cannot be legally liable. (Although from the law's perspective, corporations, which are not humans, are nonetheless considered persons, so they bear some legal liability.) But the bottom line is that we ultimately put legal liability on the shoulders of humans. In the case of LLMs, right now, they're tools. They're not fiduciaries. And we have to keep that in mind.
So, when we use a tool, we need to make sure that we are not abusing it, not using it outside its particular parameters. If you're trying to use a hammer to saw a piece of wood, you're not going to get very far, and you could hurt yourself as well. So it's really critical to understand the tool. But what's missing . . . is some kind of legal guardrails to deal with AI and with AI responsibility and legal liability. The AI laws, the AI fiduciary doctrine, have not yet been written. And that's a pretty big gap that we need to work on.
The problem is that we're so excited about the technology, and we're making so much progress on it, that nobody wants to spend time dealing with the less fun stuff of trying to see what goes wrong. They're just exploring all the cool things that can go right. So somebody needs to deal with those issues. Regulators, as I said, are understaffed and overworked, and they're having a hard time keeping up. But lawmakers need to get in on the action and start inserting themselves to develop these kinds of guardrails.
We're all aware of disinformation campaigns, election manipulation claims, algorithmic and cultural biases, and other instances where AI might be used for more nefarious purposes. Is there a way to audit or constrain what an LLM is learning so that we can weed out hallucinations, harmful biases, disinformation? Is that something that's being discussed as these tools are being built?
Oh, without a doubt, it's certainly being discussed. It's been discussed in academia over the course of the last couple of decades. The issue is not whether we understand the problem; the question is whether we're able to do something about it.
Now, we can certainly pass laws that require foundation model companies to take that into account. The problem is, it's very easy for somebody to create their own foundation model that does not have those guardrails in place, and there's not a whole lot that can be done about it. If you've got a computer and you've got some data, you can create your own LLM. It may not be as powerful as those of the big companies, but it can do a lot of damage if used in the wrong way. So I'm not sure what the answer is, but I do know that we need to spend more time and resources thinking about it, in the same way that we think about other kinds of abuses, whether it's drug abuse or any other kind of crime that could be committed using these very, very powerful tools.
How can you now test ChatGPT tools to see if they meet their requirements about trust . . . competence, reliability, alignment, accountability, and data security? Are there ways to evaluate and test these tools independently now to understand what information they're giving us, how reputable it is, and what risk there might be to us?
The way you test it is very much along the lines of how you test for human talents and abilities: you ask questions that you know the correct answer to, and you see whether or not the other party gives you that correct answer. That's a limited way of testing; there is no perfect way of guaranteeing accuracy and reliability. But once you develop enough experience, you can get a sense of exactly where the boundaries are.
So in the case of financial advice, I gave you my own example where I asked ChatGPT 3.5 to give me some advice, and I clearly saw that there were some gaps. As I ask more sophisticated questions, I am able to judge whether or not it can answer. For example, there are certain cases in financial engineering, very complicated options-hedging strategies, where I've asked LLMs to come up with the right answer. They can do it some of the time, but not all of the time, and you can tell when they make errors.
It's obviously harder for somebody who doesn't have the expertise to do that. So you do need to rely on human experts, but that just means we need to create some kind of rating system for certain kinds of expertise. In the case of humans, before you're allowed to become a financial advisor, you have to take a certification examination: the Series 65, or in other cases the Series 7 or the Series 24. There are exams that humans have to pass in order to demonstrate competence. We need to apply those same standards, and perhaps raise them, because LLMs are good at spitting back what we put into them.
What we want to understand is whether they are getting the narratives right, whether they are capturing the cause-and-effect relationships. In many cases, they do; LLMs are really quite impressive in that respect, but they're not perfect. So if we can come up with testing strategies, and then a certification to demonstrate that this LLM, on this date, with this particular data set, has demonstrated competence in these areas, I think that will go a long way toward giving us comfort that we can ultimately rely on them for certain things.
How should we think about what we gain from sharing information in these systems, particularly if it's personal information, versus what we can get out of these systems? What are the risks around data security and personal information?
Yeah, so that's exactly the right way to put it. You're actually saying that there's a tradeoff between privacy and specificity. You want to be able to get good responses. In order to do that, you have to give good data. But the more data you give, the more you are vulnerable to that kind of exposure. So I think it's really always going to be some kind of a balance that you have to strike.
The way that I've approached it is, first of all, I will never put in any kind of personally identifying information: no Social Security number, nothing about my specific finances, no details like my bank account information. Nothing like that. But short of that, the more information you can give about what you're looking to do, what your constraints are, and what your current resources are, the more precise the answers you will get.
So I would use a kind of progressive approach: first provide broad information and see what answers you get, then provide a little more information and see what comes back. It's a gradual process of give and take. At some point, you will get the information you really need without having to give up all of the personal information you're concerned about.
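A minimal sketch of that progressive approach, with hypothetical tiers of detail; the point is that identifying data never enters the prompt at any tier, and you stop adding detail once the answers are specific enough.

```python
# Sketch of the progressive-disclosure approach described above.
# The tiers below are hypothetical; note that no names, account
# numbers, or Social Security numbers appear at any tier.

DETAIL_TIERS = [
    "I'm planning for retirement. What are the main issues I should think about?",
    "I'm 55, hope to retire at 67, with moderate risk tolerance. How does that change things?",
    "I have roughly $500,000 in a 401(k) and $100,000 taxable, in the 24 percent bracket. Now what?",
]

context: list[str] = []
for tier in DETAIL_TIERS:
    context.append(tier)
    prompt = "\n".join(context)
    print(f"--- prompt with {len(context)} tier(s) of detail ---\n{prompt}\n")
    # Send `prompt` to the LLM here and review the answer; stop adding
    # tiers once the response is specific enough for your decision.
```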
The other answer I would give is that, over time, I'm hoping the United States, as well as other countries, will come up with better laws for data privacy. Europe has been a leader in this respect: its GDPR (General Data Protection Regulation) sets standards for dealing with various kinds of data.
Now, that's a two-edged sword, though, because while consumers may feel good about that protection, a lot of the innovators are frustrated that they are being hampered by these laws. And it's certainly more expensive to comply with these kinds of data privacy laws. So I think that's a societal bargain that we need to renegotiate every once in a while to make sure that the social contract is still acceptable to all parties concerned.
What does this all mean for the financial services industry? Will AI completely replace the human financial advising system that's familiar to all of us if trust, good communication, and relationship-building empathy are embedded in these new systems?
With most AI, it is not the case that it will replace humans entirely. In fact, the most effective AI is AI that is used in concert with humans. One of my colleagues, Regina Barzilay, wrote a paper years ago about radiologists who are in the business of detecting breast cancer. There was a big push to use AI to improve that process, because radiologists are not 100 percent accurate, so why not use various kinds of large language models and machine learning algorithms to improve it? What they found was that radiologists may have a hit rate of something like 85 percent. I'm making up the numbers, I don't remember them off the top of my head, but let's say it's 85 percent. And she was able to show that AI algorithms could match that accuracy.
However, if you take a radiologist and give him or her access to the AI and let them collaborate, it turns out that the accuracy goes up to 95 percent. I think that's really how most AI seems to be working, particularly in the context of these large language models. Left to their own devices, they cannot replace a human entirely. But what they can do is replace certain, particularly automatable, aspects of their jobs, which means that those humans can be that much more productive. So the very best uses of AI will be by individuals who can harness its power. And it does mean that there will be fewer people employed in a given profession, because you don't need as many workers if one worker can now do the job of five with LLMs supporting them.
But at the same time, humans have to retain certain aspects. AI can scale things like the monitoring of your clients, personalization, as we saw, simulation, cost reduction, and coaching. But humans have to retain moral accountability, legal duty, contextual judgment, and ethical boundaries. So I don't think we'll ever replace humans completely, but there will be some pretty big shifts in employment, because the people who can make use of AI are going to be much more employable than the Luddites who say, no, no, no, I never use AI, I'm just a straight, old-fashioned kind of financial advisor. There may be a market for that, but that market is going to get smaller and smaller over time.
Can we leverage AI to reduce financial illiteracy in the US and potentially narrow the financial literacy gap between the top 1 percent and the rest of us?
This is exactly what [my PhD student] Jillian Ross and I are working on. We started out not with the goal of creating an LLM for financial advice for the most wealthy. Frankly, I think we're quite far away from that, because the complexity of financial planning for high-net-worth individuals is still beyond most LLMs. But what we realized was that for people in lower categories of wealth, the problems they're facing are generally easier for LLMs to handle. More importantly, these are individuals who are not being served by financial institutions, because they just don't move the needle for them in terms of revenues. So the so-called "unbanked," and individuals below the level of wealth that is of interest to financial institutions, are people who could really benefit from a financial LLM that can help them make some basic financial decisions.
So what we're hoping to do over the next few years is to develop a version of a financial advisor that satisfies fiduciary standards for those individuals who are not being served by the major financial institutions of the industry. That really will democratize finance, because those are the individuals who need the advice most, since nobody else is catering to their needs, and for whom a small amount of advice early in life can make a really big difference for retirement.
We're hoping to release open-source software that will allow access to that kind of information. And we're actually partnering with financial institutions to help us with that effort, because their view is that it's not competitive with their mandate; they're not servicing these people anyway. And yet they can use the foundation models we're developing to build better tools for their own financial advisors and clients. So it's a win-win for all parties concerned.
Looking to the future, whether near term or further away, can you imagine a point where we feel fully trusting of these models because they have more human elements baked in than previous versions of these tools?
Let me give you two answers. The first answer is, yes, absolutely, there will come a time, soon, when we will feel trusting of these large language models. And that's dangerous, because feeling like you can trust somebody does not mean that you should trust somebody. And so that's the caution that I want to highlight.
Unless you work with these LLMs and realize that they have these important gaps, you won't know that the advice they're giving you may not be trustworthy. That's why, when you're engaging in these conversations with LLMs, it is always important to ask: What information don't you have? What is the risk? What's the uncertainty behind the advice that you're giving me? Always ask how sure they are. Because of the way large language models provide responses, they almost always come back sounding very authoritative. And I'm sure we've all met people who come across as extremely trustworthy and extremely authoritative, and it's only later that we find out, sometimes to our great regret, that they had no idea . . .
Like Bernie Madoff or someone who was trusted.
That's right. So I think we need to be very cautious about this. But the way to be cautious is not to avoid using them. The way to be cautious is to use them and try to understand what their mistakes are, and to talk to other humans who have used them and share our experiences in dealing with this very important set of tools.
As a final question, what does the future of financial advice look like, and how do we best equip ourselves to meet that future?
Well, I think it's both exciting and also distressing, depending on the side of the ledger you're looking at. I think AI is going to revolutionize finance. It already has in many ways, by automating a number of tasks that used to be impossible to automate: things like reading balance sheets, income statements, and the footnotes in various regulatory filings, and summarizing large reams of disclosure documents. We now have the ability to do that in a very scalable way.
I think the future of AI in finance will not be completely automated AI bots making decisions on our behalf. It will be financial investments, tools, services, and products that are all imbued with AI in some form or another. So even for something as simple as entering a trade to buy or sell stock, or engaging with some kind of 401(k) plan, we will have more powerful tools. But at the end of the day, we're going to have to think about legal liability, moral accountability, and ethical boundaries as uniquely human elements. And given that we're talking about something as personal as our financial wealth, this obviously applies to accountants, medical doctors, and other advisors as well.
Because of that personal aspect to what we're doing, there will always have to be humans involved. We can prepare ourselves by keeping up with the role of AI in all of these contexts, running experiments, seeing what works and, more importantly, what doesn't work, and publicizing that. I wish we had some kind of bulletin board, website, or chat where we could actually catalog all of the bad cases we come across with large language models: all of the exceptions, all of the problems we run into. Sharing that kind of information is going to be critical in developing the next generation of tools and improving them.
Eventually, I think we all have to care about AI and ethics. We have to spend time talking to our legislators and regulators, and giving them feedback that we need to put guardrails in place before it's too late, before we end up adopting technologies that we cannot undo and that will do permanent damage to our society. So this is an exciting time, but it's also a very stressful and challenging one, and all of us need to be part of this new world.