Big Data and Credit Scoring

Yale law paper on regulating machine learning algorithms in credit scoring.

In this post, I would like to give a quick overview of a publication that I contributed to. The work addresses the emerging use of machine learning algorithms to develop credit scores.


During my time as a master's student, I was always attracted to classes that tackled the intersection of seemingly unrelated fields. One of the classes I took was a technology and law class. The goal of the course was to bring technologists and lawyers together to tackle emerging questions in technology policy. The class combined computer science (CS) students from MIT with law students from Georgetown University. I took the class in the spring of 2015.

As part of this class, we were divided into groups of two CS students and two law students. Each group was then assigned a topic to tackle.

Software and CS Eat the World

The overall theme of the class is one that is increasingly apparent: software is eating the world. However, as software continues to eat the world, how can we ensure that the emergent 'world' continues to conform to already articulated rules and regulations? My group examined the use of big data and machine learning algorithms in credit scoring.

The credit score is often regarded as a very important arbiter of credit. Often, it determines who has access to housing, employment, and of course capital. Traditionally, credit scores have been computed using simple formulas. Consequently, you and I both know what controls our credit scores. However, this situation is changing. The world is increasingly interconnected: we all have a Facebook, Instagram, Twitter, and so on, and we are leaving more and more digital footprints of our lives online. One can leverage these new ‘digital footprints’ to measure the creditworthiness of an individual using machine learning methods. While this approach seeks to bring efficiency to the system, it also raises several concerns:

  • These algorithms might just learn a pattern of discrimination that is already encoded in data,

  • Given the nature of online data aggregation, data gathered without care is very likely to be inaccurate,

  • It is quite easy to infer potentially sensitive information about individuals from digitally aggregated data as seen here,

  • Input data might be something that an individual has no control over, e.g., gender or race. Should such information be allowed to go into a credit scoring process?

In the publication that resulted from this class, we sought to tackle all of the concerns noted above, along with others. We also sought to educate lawyers about the way machine learning works and how one might go about regulating entities that use these kinds of techniques.

What I learned from the class

I learned a great deal about writing, working with lawyers, and thinking through regulatory issues carefully. I also met really cool Georgetown law students. As you’ll see if you take a look at the publication, it is quite well written (I know), and that’s because Mikella Hurley (first author on the article) is a fantastic writer and editor. She deserves all the credit for this work.

Most importantly, though, I learned from this experience that I enjoy working at the intersection of different fields. If navigated carefully, the intersection of multiple fields is really where the magic happens.