How to save the planet? Just add data, AI, and a little financial selfishness.
On gambling our future, changing mindsets with changing data, and a novel use of Large Language Models to understand causality.
Welcome back to my series of key takeaways from the DSC Adria Data Science conference. Part one talked AI strategy and trends, part two dived into building safer, better LLM systems, and today, we have the conference keynote by Luis Seco, Director of the Risk Lab at the University of Toronto. I was both fascinated and inspired by Seco's vision for how data science, LLMs and multi-modal models can help us tackle global ethical and environmental issues, and I tried my best to capture his message for a broader audience.
So here are my key takeaways: there are quite a few, but trust me, they all add up to spell out Seco's key thesis on how data, AI and a global conscience can combine to address the greatest challenges of our time.
As data changes, so does decision-making, and that should give us hope:
Seco started out by outlining how data projects have changed in recent decades. The 20th century was characterised by small datasets, lots of equations, and problems that existed in physical spaces. Even with the moon landing, for example, the dataset was small enough to be handwritten. Data projects in the 21st century, by contrast, have lots of data, no equations, and the problems exist in a mental space.
Over the same time, our decision-making processes have changed, too, and the evolution of data and attitudes in the investment world is a prime example:
In the 1930s, investors made decisions based on tiny amounts of high-quality data, such as a few carefully computed metrics for any given company.
With the arrival of portfolio theory and time-series analysis in the 1950s, data-driven investing was born.
In 1990, Harvard dropped all of its tobacco-industry-related stocks (known as "divesting"), due to concerns about the health hazards of smoking.
Investors in the 2000s relied heavily on numerical methods to assess potential moves.
Nowadays, investors are influenced not only by data, but also by things like news, social media trends and ethical considerations, in a manner similar to Harvard's 1990 divestment decision.
In other words, people are now choosing their investments based on factors that aren't easily encoded into a company's financial reports: they're also listening to their conscience. This style of investment may baffle some financial experts, if they're still thinking with a 20th century mindset. Yet, it should also give us hope...
Data science techniques + financial incentives = a more sustainable future:
Seco moved on to the UN's Sustainable Development Goals, which, altogether, aim to "end poverty, protect the planet, and ensure that by 2030 all people enjoy peace and prosperity." These are real and complex problems. Moreover, unlike some of our greatest historical achievements (like the moon landing), there are no equations to help us this time. Instead, finance is at the core of many of these solutions: financial motives will be a key driver, and data science methods will be our toolkit.
It may sound like a cynical, pessimistic perspective to take, yet the following example makes it pretty hard to deny. Consider the simple decision-making exercise, shown below. In Game 1, you can choose between:
Option A: Receiving $900 guaranteed (that is, with 100% certainty)
Option B: Having a 90% chance of getting $1000 or a 10% chance of getting $0
Which would you pick?
Most people would choose A. They won't gamble in a situation that has no risk (the worst that can happen is you get nothing, but you won't lose anything either). In Game 2, you must choose between:
Option A: A 100% chance of losing $900
Option B: A 90% chance of losing $1000 and a 10% chance of losing $0
In Game 2, most people choose Option B. They gamble, which is common in a situation of risk like this. If you give people the choice of definitely paying something now, or risking possibly paying more later, most people will choose the latter, even if the eventual cost is a lot more.
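To see that the two games test our attitude to risk rather than our arithmetic, it helps to compute the expected value of each option. The minimal sketch below does that with exact fractions. The function name `expected_value` and the layout of the `games` dictionary are my own illustration, not something Seco presented.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Return the probability-weighted average payoff.

    `outcomes` is a list of (probability, payoff) pairs whose
    probabilities must sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if total_probability != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Payoffs are in dollars; losses are negative.
# The numbers are taken from Game 1 and Game 2 as described above.
games = {
    "Game 1": {
        "A": [(Fraction(1), 900)],
        "B": [(Fraction(9, 10), 1000), (Fraction(1, 10), 0)],
    },
    "Game 2": {
        "A": [(Fraction(1), -900)],
        "B": [(Fraction(9, 10), -1000), (Fraction(1, 10), 0)],
    },
}

# Expected value of every option in every game.
expected = {
    game: {option: expected_value(outcomes) for option, outcomes in options.items()}
    for game, options in games.items()
}

for game, values in expected.items():
    for option, value in values.items():
        print(f"{game}, Option {option}: expected value = {int(value)} dollars")
```

Within each game, both options have the same expected value (900 dollars in Game 1, minus 900 dollars in Game 2). So when most people switch from the safe option to the gamble, that says something about how we feel about losses, not about the numbers themselves.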
Bringing this back to the UN's SDGs, our problem, as a society, is that:
climate change is Game 2. Rather than making guaranteed investments in tackling the issue now, we are risking paying huge sums to recover from climate catastrophes in the future, all because there's a small chance that there'll be no cost at all.
By contrast, a "machine" would probably be more objective, and choose Option A, the guaranteed payment now. And this is where data science can come to the rescue.
Unlike machine learning algorithms, humans are anything but emotionless and objective. However, this doesn't mean all hope is lost. In 1990, Harvard decided they didn't want to accept the risk that smoking would cause massive, negative health implications for society, and they divested, likely causing themselves immediate financial losses, given the profitability of the industry at the time. Moreover, according to Seco, with financial disagreements, you simply trade. If you're wrong, you simply lose money, and if you're right, you make it. This is an advantage in finance compared to many other aspects of daily life: it's very easy to take risks.
But Game 2 doesn't even have to be a financial lose-lose situation (a phrase I despise having to say, since what's a bigger win, both financially and in so many other, more important respects, than saving our planet?). There can be strong financial incentives for building a more sustainable future. Article 2.1.c of the Paris Accord even makes this explicit, saying that to achieve the goals of Article 2.1 and 2.2 (such as stopping global temperatures from rising and switching to lower-greenhouse-gas production methods), our best chance is to bring financial flows in line with these goals.
Better causal models can save our decision making. And our planet:
What we need, then, are new ways to understand investment opportunities from more ethical perspectives. Fortunately, this is happening now, and faster than ever before. Traditional data sources are being replaced by unstructured ones: Instead of numbers we can now deal with words, images, and more. There's a lot more information in text data, for example, and LLMs are making it so much easier to unlock those insights (see Benn Stancil's "Avg(text)" for a very practical example). Soon we'll have data from sensors, satellites, videos, and diverse other sources. These are already having huge impacts in agriculture, and they're going to give us brand new ways to tackle the UN's SDGs.
What makes Seco so sure of this? He demonstrated this with another question: "Do you know where most people die?"
Well, what do you think? I can tell you the answer is the same across all countries, cultures, and demographics, and by a huge margin. Are you ready to find out?
That's right: it's the hospital. But does that mean you should avoid going there if you're sick or injured? Of course not. This is not a causal relationship.
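A toy calculation makes the point concrete. The counts below are invented purely for illustration (they are not real mortality statistics, and Seco did not present them). The assumption is that severity of illness drives both who goes to hospital and who dies, and that within each severity group, hospital care halves the death rate.

```python
# Hypothetical counts, made up for illustration only.
# Each entry: (severity, place, number of people, number of deaths)
records = [
    ("severe", "hospital", 900, 270),   # 30% die
    ("severe", "home",     100,  60),   # 60% die
    ("mild",   "hospital", 500,   1),   # 0.2% die
    ("mild",   "home",    9500,  38),   # 0.4% die
]


def death_rate(rows):
    """Deaths divided by people across the given rows."""
    people = sum(n for _, _, n, _ in rows)
    deaths = sum(d for _, _, _, d in rows)
    return deaths / people


def rows_where(severity=None, place=None):
    """Filter records by severity and/or place (None means 'any')."""
    return [
        r for r in records
        if (severity is None or r[0] == severity)
        and (place is None or r[1] == place)
    ]


# Naive comparison: ignore severity altogether.
naive = {
    place: death_rate(rows_where(place=place))
    for place in ("hospital", "home")
}

# Stratified comparison: compare like with like.
stratified = {
    severity: {
        place: death_rate(rows_where(severity, place))
        for place in ("hospital", "home")
    }
    for severity in ("severe", "mild")
}

print(f"All patients: hospital {naive['hospital']:.1%}, home {naive['home']:.1%}")
for severity, rates in stratified.items():
    print(
        f"{severity.capitalize()} cases: "
        f"hospital {rates['hospital']:.1%}, home {rates['home']:.1%}"
    )
```

In this made-up data, most deaths happen in hospital and the hospital's overall death rate is far higher, yet in every severity group the hospital patients fare better. The hospital is where the sickest people go, not what kills them.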
Unfortunately, much discussion with climate change skeptics involves confusion about causality. On one side, people who don't believe temperatures are rising as a result of human activity see cold, rainy, flooding weather as justification for their doubt. That is, they assign a causal relation which isn't there. Meanwhile, climate scientists struggle to prove concrete causal relations, due to their complexity and to the fact that so much vital information is locked up in non-structured data, which current data science methods can't handle. Even LLMs can't deal with it (yet). To illustrate, Seco gave the example of telling an LLM "I had dinner at 6pm and went to a movie at 8pm," and asking it, "Was I alive at 7pm?". The model, remarkably, failed to understand such a basic A leads to B relation [1]. But once models like it finally do learn about A -> B relations, a lot of things will change.
Consider, for example, the way we measure and rank CO2 emissions per country. Looking at pure totals is obviously misleading, but dividing by population may be, too, due to the entirely different ways that individuals live in different parts of the world. We should consider additional factors, too, like total number of cars, total amount of developed landmass, and so on. Otherwise, countries will keep on setting their own emissions targets (using who-knows-what methodology), and the Paris Accord will keep on talking about hitting those targets, but all that effort may be misguided.
LLMs and Multi-Modal Models can help us understand these complex causal relations:
If our current models for ranking CO2 emissions are too simplistic, Seco's team propose an alternative: emission benchmarking and chromodynamics. Full details can be found in the paper, "Towards Automating Causal Discovery in Financial Markets and Beyond," which is well worth a read to see a novel use of LLMs. The basic idea involves support vector regression using features like population, surface area, and more. These feature variables are represented via Directed Acyclic Graphs (DAGs), and the relationships between those features are quantified using both numerical data and LLMs to explore causality. The results of their research include graphs by country showing which countries are emitting more than their budget allows.
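To give a feel for the data structure involved, here is a minimal sketch of feature variables arranged in a DAG, using Python's standard-library `graphlib`. The variable names and every edge are hypothetical placeholders that I chose for illustration; they are not the graph from the paper. The sketch covers only the graph representation, not the support vector regression or the LLM-based step.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical causal graph: each key maps to the set of variables
# assumed to influence it directly (its "parents").
parents = {
    "population": set(),
    "surface_area": set(),
    "cars": {"population"},
    "developed_land": {"population", "surface_area"},
    "emissions": {"cars", "developed_land"},
}


def causal_order(graph):
    """Return the variables ordered so that every cause precedes its effects.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """True if the graph is a valid DAG."""
    try:
        causal_order(graph)
    except CycleError:
        return False
    return True


order = causal_order(parents)
print("One valid causal ordering:", " -> ".join(order))

# Adding a feedback edge (emissions influencing population) creates a cycle,
# so the result is no longer a DAG.
with_feedback = {**parents, "population": {"emissions"}}
print("Still acyclic after adding feedback?", is_acyclic(with_feedback))
```

The "acyclic" part matters because it rules out circular explanations: if emissions were allowed to cause population, which causes cars, which cause emissions, there would be no consistent order in which to reason from causes to effects.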
Now, if you've read this far, you may be thinking, "That's interesting, but how on earth can we generalize this to other societal issues? How could I, as a data professional, take inspiration from all of this?"
The message I take from all of this is that whatever the problem, no matter how daunting and divisive, we'll have a better chance at solving it if we can put aside our differences and look at the data. And we can't just look at that data using the same tools and mindsets we've used before: instead, we need new frameworks, ones which use data in creative, holistic, novel, qualitative ways, to help us make decisions.
Emission benchmarking is just one possible approach, for one specific problem. But what it represents is a crucial goal. One which we should all be striving for: It's about using data to find a way away from Game 1 or Game 2, to something more holistic, nuanced, and ultimately, better for all of us.
Katherine
[1] For more on LLMs' weaknesses at learning A <> B factual relations, see the already famous paper, The Reversal Curse.