How to save the planet? Just add data, AI, and a little financial selfishness.
On gambling our future, changing mindsets with changing data, and a novel use of Large Language Models to understand causality.
Welcome back to my series of key takeaways from the DSC Adria Data Science conference. Part one covered AI strategy and trends, part two dived into building safer, better LLM systems, and today we have the conference keynote by Luis Seco, Director of the Risk Lab at the University of Toronto. I was both fascinated and inspired by Seco’s vision for how data science, LLMs and multi-modal models can help us tackle global ethical and environmental issues, and I tried my best to capture his message for a broader audience.
So here are my key takeaways: there are quite a few, but trust me, they all add up to spell out Seco’s key thesis on how data, AI and a global conscience can combine to address the greatest challenges of our time.
As data changes, so does decision-making, and that should give us hope:
Seco started out by outlining how data projects have changed in recent decades. The 20th century was characterised by small datasets, lots of equations, and problems that existed in physical spaces. Even the moon landing, for example, relied on a dataset small enough to be handwritten. Data projects in the 21st century, by contrast, have lots of data, no equations, and problems that exist in a mental space.
Over the same time, our decision-making processes have changed, too, and the evolution of data and attitudes in the investment world is a prime example:
In the 1930s, investors made decisions based on tiny amounts of high quality data, such as a few carefully computed metrics for any given company.
With the arrival of portfolio theory and time-series analysis in the 1950s, data-driven investing was born.
In 1990, Harvard dropped all of its tobacco-industry-related stocks (known as ‘divesting’), due to concerns about the health hazards of smoking.
Investors in the 2000s relied heavily on numerical methods to assess potential moves.
Nowadays, investors are influenced not only by data, but also by things like news, social media trends and ethical considerations, in a manner similar to Harvard’s 1990 divestment decision.
In other words, people are now choosing their investments based on factors that aren’t easily encoded into a company’s financial reports: they’re also listening to their conscience. This style of investment may baffle some financial experts, if they’re still thinking with a 20th century mindset. Yet, it should also give us hope…
Data science techniques + financial incentives = a more sustainable future:
Seco moved on to the UN’s Sustainable Development Goals, which, altogether, aim to “end poverty, protect the planet, and ensure that by 2030 all people enjoy peace and prosperity.” These are real and complex problems. Moreover, unlike some of our greatest historical achievements (like the moon landing), there are no equations to help us this time. Instead, finance is at the core of many of these solutions: financial motives will be a key driver, and data science methods will be our toolkit.
It may sound like a cynical, pessimistic perspective to take, yet the following example makes it pretty hard to deny. Consider this simple decision-making exercise. In Game 1, you can choose between:
Option A: Receiving $900 guaranteed (that is, with 100% certainty)
Option B: Having a 90% chance of getting $1000 and a 10% chance of getting $0
Which would you pick?
Most people would choose A. They won’t gamble in a situation that carries no risk of loss (the worst that can happen is that you get nothing, but you won’t lose anything either). In Game 2, you must choose between:
Option A: A 100% chance of losing $900
Option B: A 90% chance of losing $1000 and a 10% chance of losing $0
In Game 2, most people choose Option B. They gamble, which is common in a situation of risk like this. If you give people the choice between definitely paying something now and possibly paying more later, most people will choose the latter, even if the eventual cost could be much higher.
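Putting numbers on the two games makes the point sharper. The short Python sketch below (encoding each option as a list of probability/payoff pairs is my own illustration, not something from the talk) shows that within each game, Options A and B have exactly the same expected value; the only difference between them is risk, measured here as variance. In other words, the flip from the sure thing in Game 1 to the gamble in Game 2 is not driven by the arithmetic.

```python
from fractions import Fraction

# Each option is a list of (probability, payoff) pairs; losses are negative payoffs.
# Fractions keep the arithmetic exact.
GAME_1 = {
    "A": [(Fraction(1), 900)],
    "B": [(Fraction(9, 10), 1000), (Fraction(1, 10), 0)],
}
GAME_2 = {
    "A": [(Fraction(1), -900)],
    "B": [(Fraction(9, 10), -1000), (Fraction(1, 10), 0)],
}


def expected_value(option):
    """Probability-weighted average payoff of an option."""
    assert sum(p for p, _ in option) == 1, "probabilities must sum to 1"
    return sum(p * payoff for p, payoff in option)


def variance(option):
    """Spread of payoffs around the expected value: 0 means no risk at all."""
    mean = expected_value(option)
    return sum(p * (payoff - mean) ** 2 for p, payoff in option)


if __name__ == "__main__":
    for game_name, game in (("Game 1", GAME_1), ("Game 2", GAME_2)):
        for label, option in game.items():
            print(f"{game_name}, option {label}: "
                  f"expected value {expected_value(option)}, variance {variance(option)}")
```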
Bringing this back to the UN’s SDGs, our problem as a society is that climate change is Game 2. Rather than making guaranteed investments in tackling the issue now, we are risking paying huge sums to recover from climate catastrophes in the future, all because there’s a small chance that there’ll be no cost at all.
By contrast, a “machine” would probably be more objective, and choose Option A. And this is where data science can come to the rescue.
Unlike machine learning algorithms, humans are anything but emotionless and objective. However, this doesn’t mean all hope is lost. In 1990, Harvard decided it didn’t want to accept the risk that smoking would have massive, negative health implications for society, and it divested, likely taking immediate financial losses given how profitable the industry was at the time. Moreover, according to Seco, when there’s a financial disagreement, you simply trade: if you’re wrong, you lose money, and if you’re right, you make it. This is an advantage finance has over many other aspects of daily life: it’s very easy to take risks.
But Game 2 doesn’t even have to be a financial lose-lose situation (a phrase I despise having to say, since what’s a bigger win, both financially and in so many other, more important respects, than saving our planet?). There can be strong financial incentives for building a more sustainable future. Article 2.1(c) of the Paris Accord even makes this explicit, saying that to achieve the goals of Articles 2.1 and 2.2 (such as stopping global temperatures from rising and switching to lower-greenhouse-gas production methods), our best chance is to bring financial flows in line with these goals.
Better causal models can save our decision-making. And our planet:
What we need, then, are new ways to understand investment opportunities from more ethical perspectives. Fortunately, this is happening now, and faster than ever before. Traditional data sources are being replaced by unstructured ones: instead of numbers, we can now deal with words, images, and more. There’s a lot more information in text data, for example, and LLMs are making it much easier to unlock those insights (see Benn Stancil’s “Avg(text)” for a very practical example). Soon we’ll also have data from sensors, satellites, videos, and other diverse sources. Sources like these are already having a huge impact in agriculture, and they’re going to give us brand new ways to tackle the UN’s SDGs.
What makes Seco so sure of this? He demonstrated this with another question: “Do you know where most people die?”
Well, what do you think? I can tell you the answer is the same across all countries, cultures, and demographics, and by a huge margin. Are you ready to find out?
That’s right: it’s the hospital. But does that mean you should avoid going there if you’re sick or injured? Of course not. This is not a causal relationship.
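The hospital example is a classic case of confounding: being seriously ill makes you both more likely to end up in hospital and more likely to die. The Python sketch below uses a toy model with invented probabilities (none of these numbers come from Seco’s talk or from real data) and computes the conditional probabilities exactly. Looking only at location, the hospital seems far deadlier, and most deaths happen there; comparing like with like (severe cases with severe cases, mild with mild), going to hospital lowers the risk in both groups.

```python
from fractions import Fraction as F

# Toy model with invented probabilities (not real statistics).
P_SEVERE = F(1, 10)                                   # share of people who are seriously ill
P_HOSPITAL = {"severe": F(9, 10), "mild": F(1, 20)}   # chance of going to hospital, by severity
P_DEATH = {                                           # chance of dying, by (severity, location)
    ("severe", "hospital"): F(1, 5), ("severe", "home"): F(1, 2),
    ("mild", "hospital"): F(1, 200), ("mild", "home"): F(1, 100),
}


def groups():
    """Yield (severity, location, probability) for every group of people."""
    for severity, p_sev in (("severe", P_SEVERE), ("mild", 1 - P_SEVERE)):
        p_hosp = P_HOSPITAL[severity]
        yield severity, "hospital", p_sev * p_hosp
        yield severity, "home", p_sev * (1 - p_hosp)


def people(location, severity=None):
    """Probability mass of people at `location` (optionally within one severity group)."""
    return sum(p for s, loc, p in groups()
               if loc == location and severity in (None, s))


def deaths(location, severity=None):
    """Probability mass of people who die at `location` (optionally within one severity group)."""
    return sum(p * P_DEATH[(s, loc)] for s, loc, p in groups()
               if loc == location and severity in (None, s))


def p_death(location, severity=None):
    """P(death | location), or P(death | location, severity) if severity is given."""
    return deaths(location, severity) / people(location, severity)


if __name__ == "__main__":
    share = deaths("hospital") / (deaths("hospital") + deaths("home"))
    print(f"Share of deaths that happen in hospital: {float(share):.0%}")
    for severity in (None, "severe", "mild"):
        label = severity or "everyone"
        print(f"{label}: P(death | hospital) = {float(p_death('hospital', severity)):.3f}, "
              f"P(death | home) = {float(p_death('home', severity)):.3f}")
```

Conditioning on the confounder, illness severity, is what separates the raw correlation from the actual effect of going to hospital.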
Unfortunately, much of the discussion with climate change skeptics involves confusion about causality. On one side, people who don’t believe temperatures are rising as a result of human activity see cold, rainy, flooding weather as justification for their doubt: that is, they assign a causal relation that isn’t there. Meanwhile, climate scientists struggle to prove concrete causal relations, both because of their complexity and because so much vital information is locked up in unstructured data, which current data science methods can’t handle. Even LLMs can’t deal with it (yet). To illustrate, Seco gave the example of telling an LLM “I had dinner at 6pm and went to a movie at 8pm,” and asking it, “Was I alive at 7pm?” Remarkably, the model failed to understand such a basic A → B relation.¹ But once models like it finally do learn about A → B relations, a lot of things will change.
Consider, for example, the way we measure and rank CO2 emissions per country. Looking at pure totals is obviously misleading, but dividing by population may be, too, because individuals live in entirely different ways in different parts of the world. We should consider additional factors as well, like the total number of cars, the total amount of developed land, and so on. Otherwise, countries will keep on setting their own emissions targets (using who-knows-what methodology), and the Paris Accord will keep on talking about hitting those targets, but all that effort may be misguided.
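A toy comparison makes the problem concrete. The Python sketch below uses three hypothetical countries with invented figures (not real emissions data) and ranks them two ways; the two rankings come out in exactly opposite order, which is why the choice of denominator is not a neutral technical detail.

```python
# Invented illustrative figures (not real data):
# name -> (total emissions in Mt CO2, population in millions)
COUNTRIES = {
    "Country A": (1000.0, 1000.0),
    "Country B": (400.0, 40.0),
    "Country C": (60.0, 5.0),
}


def rank(metric):
    """Country names sorted from highest to lowest according to `metric`."""
    return sorted(COUNTRIES, key=metric, reverse=True)


def total(name):
    """Total emissions in Mt CO2."""
    return COUNTRIES[name][0]


def per_capita(name):
    """Emissions per person: Mt divided by millions of people gives tonnes per person."""
    emissions_mt, population_m = COUNTRIES[name]
    return emissions_mt / population_m


if __name__ == "__main__":
    print("Ranked by total emissions:     ", rank(total))
    print("Ranked by per-capita emissions:", rank(per_capita))
```

Neither metric accounts for cars, developed land, or any of the other factors mentioned above, which is the gap the benchmarking approach proposed by Seco’s team aims to close.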
LLMs and Multi-Modal Models can help us understand these complex causal relations:
If our current models for ranking CO2 emissions are too simplistic, Seco’s team propose an alternative: Emission benchmarking and chromodynamics. Full details can be found in the paper “Towards Automating Causal Discovery in Financial Markets and Beyond,” which is well worth a read for its novel use of LLMs. The basic idea involves support vector regression using features like population, surface area, and more. These feature variables are represented via Directed Acyclic Graphs (DAGs), and the relationships between them are quantified using both numerical data and LLMs to explore causality. The results of their research include graphs by country showing which countries are emitting more than their budget allows.
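The paper’s full pipeline is beyond the scope of this post, but the core idea of benchmarking can be sketched: predict each country’s emissions from its structural features, then treat the gap between actual and predicted emissions as over- or under-shooting its budget. The Python sketch below rests on simplifying assumptions that are mine, not the paper’s: the causal graph is a hand-written toy DAG (checked for cycles with the standard library’s `graphlib`), ordinary least squares stands in for support vector regression to keep it dependency-free, no LLM is involved, and all country data is invented.

```python
from graphlib import TopologicalSorter

# Hypothetical causal DAG: each node maps to the set of its direct causes.
# A toy stand-in for the feature DAGs described in the paper.
CAUSAL_DAG = {"emissions": {"population", "surface_area"}}

# Invented data, not real statistics: population (millions),
# surface area (thousand km^2), emissions (Mt CO2).
COUNTRIES = {
    "Country A": {"population": 10, "surface_area": 100, "emissions": 80},
    "Country B": {"population": 20, "surface_area": 350, "emissions": 225},
    "Country C": {"population": 30, "surface_area": 250, "emissions": 195},
    "Country D": {"population": 70, "surface_area": 200, "emissions": 250},
    "Country E": {"population": 80, "surface_area": 300, "emissions": 320},
    "Country F": {"population": 90, "surface_area": 50, "emissions": 215},
    "Country G": {"population": 50, "surface_area": 150, "emissions": 185},
    "Country H": {"population": 50, "surface_area": 200, "emissions": 260},
}


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations X^T X b = X^T y."""
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, slope_1, slope_2, ...]


def benchmark(countries, dag, target="emissions"):
    """Return actual minus predicted `target` per country (positive = over budget)."""
    list(TopologicalSorter(dag).static_order())  # raises graphlib.CycleError on a cycle
    features = sorted(dag[target])               # regress the target on its direct causes
    names = list(countries)
    rows = [[float(countries[n][f]) for f in features] for n in names]
    actual = [float(countries[n][target]) for n in names]
    intercept, *slopes = fit_ols(rows, actual)
    return {
        n: y - (intercept + sum(b * x for b, x in zip(slopes, row)))
        for n, row, y in zip(names, rows, actual)
    }


if __name__ == "__main__":
    for name, gap in benchmark(COUNTRIES, CAUSAL_DAG).items():
        status = "over budget" if gap > 0 else "within budget"
        print(f"{name}: {gap:+.2f} Mt ({status})")
```

Because the model includes an intercept, the gaps always sum to zero, so “over budget” here means “above what the fitted benchmark predicts for a country with these features,” not an absolute carbon budget.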
Now, if you’ve read this far, you may be thinking, “That’s interesting, but how on earth can we generalize this to other societal issues? How could I, as a data professional, take inspiration from all of this?”
The message I take from all of this is that whatever the problem, no matter how daunting and divisive, we’ll have a better chance of solving it if we can put aside our differences and look at the data. And we can’t just look at that data using the same tools and mindsets we’ve used before: instead, we need new frameworks that use data in creative, holistic, novel, qualitative ways to help us make decisions.
Emission benchmarking is just one possible approach, for one specific problem. But what it represents is a crucial goal, one which we should all be striving for: using data to find a way out of Game 1 or Game 2, toward something more holistic, nuanced, and, ultimately, better for all of us.
Katherine
¹ For more on LLMs’ weaknesses at learning A ↔ B factual relations, see the already famous paper, The Reversal Curse.