The Dark Side of Big Data

How the 'refi bubble' in the 2000s was my Cassandra moment in finance

Finding Fraud in a Flood of Data

While I’ve spent most of the space here describing positive experiences, not everything has been so rosy. And if I’m to be fair about what I’ve learned along the way, I should include some of that here as well. I worked in the Artificial Intelligence Group at Countrywide Home Loans from 2004 to 2007. If you’re aware of recent financial history, you’ll know why those years “matter” more than others. The peak of the refinance boom came in late 2003, when tax law and changes to regulations governing financial institutions created a (fool’s) gold rush in the real estate market. And to bookend my time there, the financial crisis began to show outward signs in July of 2007. I use the phrase “outward signs” deliberately, as there were plenty of signs within Countrywide that all was not well in the world of finance.

I managed three groups of analysts that tested the main underwriting systems, which produced more than 99% of the mortgages funded by Countrywide. Much of the work surrounded changes to guidelines that allowed more loans to be underwritten. This was not something that CHL hid from its shareholders, regulatory bodies or its employees. Those guidelines were undercut in fairly innocuous increments – much like the “boiled frog” analogy that’s often used. But because the changes stayed within the boundaries set by Fannie Mae, Freddie Mac and the Fed, no one was the wiser. It was when I saw a new loan category that attempted to codify the “Friends of Angelo” loans that my eyebrows went up – and I wasn’t alone. All of these loans would otherwise have failed underwriting checks, or would have been priced at a substantially different rate than their approval under these “guidelines” allowed. When I (and a few others) balked at this change, executives reassured us that all of the governing bodies had signed off on it. So the work proceeded, and the result is now well known and in the public domain.

But the main reason I left Countrywide in early 2007 was what I saw behind the scenes in “servicing”. Underwriting groups don’t usually see what’s in the servicing database. This is the system where loans live after they’ve been bundled into mortgage-backed securities – old business, and not very interesting. Yet this is actually where Countrywide made most of its money: it serviced loans that other companies had underwritten – not just CHL loans. However, each loan bought by the wholesale group also had to pass the same checks that CHL loans went through – and those systems were part of my portfolio. Because of that, our group had a specific interest in keeping fresh batches of loan data for testing changes to the compliance engine and the other underwriting systems under our purview.
One of the wrinkles of the system was that any loan with a credit report older than 90 days would automatically kick out with a “refer” decision. (“Refer” is the CHL colloquialism for “reject”, seen as a more palatable term.) So we were in the habit of contacting the secondary marketing group, which managed that database, and getting a swath of loans that had recently gone into servicing – and therefore still had known-good credit reports that would pass through the compliance engine without tripping the 90-day rule.
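
The 90-day check itself was conceptually simple. Here is a minimal sketch of the idea – the function name, field types and dates are hypothetical illustrations of mine, not the actual engine, which was far more elaborate:

```python
from datetime import date, timedelta

MAX_CREDIT_REPORT_AGE_DAYS = 90  # reports older than this trip the rule

def credit_report_decision(report_date: date, as_of: date) -> str:
    """Illustrative stand-in for the stale-credit-report check:
    any credit report older than 90 days forces a 'refer' decision,
    regardless of how the loan would otherwise score."""
    age = as_of - report_date
    if age > timedelta(days=MAX_CREDIT_REPORT_AGE_DAYS):
        return "refer"
    return "accept"
```

This is why only recently serviced loans were useful to us as test data: anything older would “refer” on report age alone, before the rules we actually wanted to exercise ever ran.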

When we received a sample data set, the record count was often in the millions. This was considered “a thin slice of the pie”, since the full database contained one of every six loans serviced in the US – the largest servicing portfolio in the industry. I mention this because when we started getting feeds of data with almost no prime loans in them (conforming loans were thought to constitute the bulk of loans that CHL bought), we thought the Secondary Marketing Group had either mis-attributed their data pull or was playing some kind of joke. To the contrary, that group assured us that the data we received was a general survey of what had recently gone into servicing. If there were no new prime loans there, then there were no A tranches in the bundled mortgage-backed securities in which they were delivered. This, too, is now a matter of public record. And from my team’s perspective it got worse. When we started to process the “accepted” sub-prime loans through the compliance engine, most of them received a “refer” decision. We scratched our collective heads and assumed it was because guidelines had changed between the time the loans were underwritten and bought and the time they went into servicing. So as an experiment, I set up a server with the version of the rules that would have been in effect when a given batch of loans was purchased by Countrywide. Again they all came back with a “refer” decision. I noticed that all of the loans came from one division within Countrywide, and began to suspect that certain servers used by that division were configured with a very old version of the rules engine – one known within the group to allow more loans to receive an “accept” decision. When I reported this back to the Secondary Marketing Group I was told that executives were looking into it, but someone who had previously worked in CHL’s fraud division took me aside and assured me that nothing would come of it.
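
The shape of that experiment is easy to describe: run the same batch of loans through two versions of the rules and diff the decisions. A rough sketch of the idea follows – the loan fields, engine interfaces and thresholds are hypothetical stand-ins of mine, not Countrywide’s actual systems:

```python
# Model each version of the rules engine as a function: loan -> "accept" / "refer".
def decide_all(loans, rules_engine):
    """Run every loan in a batch through one version of the rules."""
    return {loan["loan_id"]: rules_engine(loan) for loan in loans}

def compare_rule_versions(loans, historical_rules, current_rules):
    """Rerun a batch under both rule versions and report the loans whose
    decisions differ - the kind of discrepancy the experiment was hunting."""
    historical = decide_all(loans, historical_rules)
    current = decide_all(loans, current_rules)
    return {
        loan_id: {"historical": historical[loan_id], "current": current[loan_id]}
        for loan_id in historical
        if historical[loan_id] != current[loan_id]
    }

# Two toy "rule versions": a lenient one that accepts everything, and a
# stricter one that refers high loan-to-value loans.
lenient_rules = lambda loan: "accept"
strict_rules = lambda loan: "refer" if loan["ltv"] > 0.80 else "accept"

batch = [
    {"loan_id": "0001", "ltv": 0.95},
    {"loan_id": "0002", "ltv": 0.70},
]
```

If guideline drift had been the explanation, the historical rules would have accepted the loans while the current rules referred them. In our case even the historical rules referred them – which is what pointed to a mis-configured engine rather than a guideline change.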
At that point I started laying the groundwork to depart Countrywide, and as they say – the rest is history.

I later took a position at Western Asset Management Company, one of the largest fixed-income asset managers in the world. Judging by its industry position and reputation, I thought I had found a company with the proper restraint to avoid the problems I had seen at Countrywide. While the issues at WAMCO weren’t as broad, there were similar points of failure where “intelligent” systems originally designed to flag improper human behavior were ignored or bypassed by staff. It was 2008; the credit crisis was still looming over the markets but hadn’t yet overflowed onto “Main Street”. I was managing QA for all four working groups at the company – front office, back office, web applications and reporting. “Reporting” centered on performance – the main mechanism by which company executives viewed trade activity and, secondarily, the system by which traders tracked their bonuses. I spotted a particular line item for a trading desk, having to do with a cross trade, that looked unusual. I assumed the counter-party or trade type was mis-labeled, and went to one of my staff in charge of testing the settlement system. Everything we found there was consistent with the line item in the report – which is both good news and bad news. A cross trade struck at an above-bid price seemed a pretty clear violation of compliance rules, and would normally cause the settlements system to throw a warning or error. I took my findings to the managers of the development groups, assuming it was a technical mistake that would be quickly corrected. Instead I was told to “go back to QA”, and was later informed that I would have no role in reviewing the reporting system. As it happened I was also tasked with improving the overall process of testing and release management across those same working groups, and from that point forward any suggestions I offered were summarily stonewalled.
I suspected that I had upset the wrong people at a small company – a company with a “flat structure” on paper, but a tightly guarded (and unspoken) hierarchy manipulated to the advantage of its long-time employees. I was the new guy, making the wrong kind of waves. Once I realized there was no executive “appetite” to correct the issues, I left the company. And like Countrywide, the result of Western Asset’s actions during the credit crunch of 2008 is now a matter of public record. With the release of Weapons of Math Destruction, it will become a form of pop-culture sport to malign algorithmic systems. But my experience tells me that it’s not the computer software that’s at fault; it’s the avarice of the managers who willfully ignore or intentionally alter those systems to suit their own agenda. That’s the most important lesson I’ve learned from this episode of my career: algorithms don’t lie – it’s the lying liars, and the lies they tell, that are at issue. Whether results come from a computer or a person flipping beads on an abacus, either can be willfully misconstrued to serve someone’s greed. To blame big data is to blame the messenger.