Building a tick-by-tick database

PUBLISHED: 28 OCTOBER 2019
AUTHOR: David Coker

Background

I am Course Leader for the Master's degree in Investment, Risk & Finance at the University of Westminster, School of Finance and Accounting (SFA).

In the summer of 2017 we were tasked with creating and deploying a database that captures US equity prices. Most academic databases capture information at lower frequencies, such as daily or weekly. This database, however, would support a High Frequency Trading class, so tick-by-tick data was necessary. As development progressed, we determined that the resource could also support classes such as Banking Technology and Risk Management. The result, named the SFA Fintech Research Database, captures not only financial data but also a wide variety of other big data.
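A small example shows the difference between tick-by-tick and daily data. The sketch below stores individual trades and rolls them up into daily open/high/low/close/volume bars, which is the lower-frequency view that most academic databases provide. It makes several assumptions. The `ticks` table, its columns and the helper names are hypothetical and are not the SFA database's actual schema. SQLite from Python's standard library stands in for the MySQL database described in this article only so the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one row per trade (tick). Not the SFA database's real schema.
SCHEMA = """
CREATE TABLE ticks (
    symbol TEXT NOT NULL,
    ts     TEXT NOT NULL,   -- ISO 8601 timestamp with sub-second precision
    price  REAL NOT NULL,
    size   INTEGER NOT NULL
)
"""

def load_ticks(conn, rows):
    """Insert (symbol, ts, price, size) tuples into the hypothetical ticks table."""
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", rows)

def daily_bars(conn, symbol):
    """Aggregate one symbol's ticks into daily OHLCV bars.

    Ticks are read in timestamp order, so the first tick of a day is its open
    and the last tick seen for that day is its close.
    """
    bars = {}
    cur = conn.execute(
        "SELECT ts, price, size FROM ticks WHERE symbol = ? ORDER BY ts",
        (symbol,),
    )
    for ts, price, size in cur:
        day = ts[:10]  # 'YYYY-MM-DD' prefix of the ISO timestamp
        bar = bars.get(day)
        if bar is None:
            bars[day] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Invented sample trades for a made-up symbol "ABC".
    load_ticks(conn, [
        ("ABC", "2017-07-03T09:30:00.125", 10.00, 100),
        ("ABC", "2017-07-03T09:30:00.480", 10.05, 200),
        ("ABC", "2017-07-03T15:59:59.900", 9.95, 50),
        ("ABC", "2017-07-05T09:30:01.010", 10.10, 300),
    ])
    for day, bar in daily_bars(conn, "ABC").items():
        print(day, bar)
```

The four invented trades collapse into two daily rows. Any intraday detail, such as the two trades a few hundred milliseconds apart at the open, disappears in the daily view. That detail is what a High Frequency Trading class needs to study.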

As with many IT projects, development was iterative: we hit the deadline in stages rather than through a single large-scale rollout. After the initial deployment, we conducted a project review to capture lessons learned, which would help improve subsequent phases of database expansion and development.

Lessons learned

The list below describes selected problems we encountered and how we overcame them.

  • Live in the cloud

The first problem was the difficulty of rapidly expanding databases stored on traditional platforms. Although we conducted a database sizing exercise, estimating the space needed for logs generated by Extraction, Transformation and Loading (ETL) proved problematic. Our solution was to migrate to the cloud, where nimble providers allowed us to expand our resource footprint rapidly and on demand.

  • Full skills, fast development

Another problem arose from the way traditional IT departments are organised, namely along functional skill lines. This slowed our fast-moving project, so we brought the required skills into the team. Instead of relying on dedicated staff for the necessary administrator functions (Linux, MySQL database and network administration), the development team acquired these skills itself and consulted legacy staff as necessary. Knowledge transfer was rapid and surprisingly effective. We no longer had to "open a ticket and wait" while our request was queued, staffed and executed. At first, the number of problems we could solve within the department was small, but it grew with experience.

  • Reuse your old code

Because our development deadlines were aggressive, we reused existing code as much as possible. Rather than writing new Python programs from scratch to deliver ETL, we repurposed existing Perl programs with slight modifications.

  • Leverage any free data

As a university on a limited budget, we must use free data wherever it is available. We sourced RSS feeds containing a surprisingly rich set of information, which augmented the financial data we purchased. Because this combination of sources is unusual, the database has a clear unique selling point (USP).

  • Students get a win-win

We augment class lessons with project work, so students help populate the SFA Fintech Research Database. This is a win-win for both us and the students.
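The free-data lesson above mentions RSS feeds as a source. The sketch below shows how feed items might be captured using only Python's standard library. It is an illustration under stated assumptions and is not the project's ETL, which was written in Perl. It assumes feeds in RSS 2.0 format, and it does not handle namespaced Atom feeds. The sample feed is invented, and fetching the feed (for example with `urllib.request.urlopen`) and loading the results into the database are left out.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def parse_rss_items(xml_text):
    """Return a list of dicts (title, link, published) from an RSS 2.0 document.

    Assumes plain RSS 2.0 <item> elements; namespaced Atom entries are not handled.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # RSS 2.0 dates use RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(pub) if pub else None,
        })
    return items

# Invented sample feed for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Company X beats estimates</title><link>https://example.com/a</link>
<pubDate>Mon, 03 Jul 2017 13:45:00 GMT</pubDate></item>
<item><title>Undated headline</title><link>https://example.com/b</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry)
```

Converting `pubDate` into a timezone-aware datetime lets headlines be lined up against time-stamped trades, which is one way free text data can complement purchased price data.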

Going forward

The SFA Fintech Research Database is being upgraded to capture richer data. This will allow us to support not only our original goals, but also analytics supporting a variety of business intelligence applications.

About David Coker

David is an expert in financial technology and capital markets and is happy to discuss a wide range of financial topics, including cryptocurrencies, regtech and deregulation.

Prior to joining Westminster Business School this academic year, David worked in financial services for 28 years, with roles at Deutsche Bank as Vice President of Global Risk Management (in New York and London) and at Moody’s, where he was responsible for Professional Services in Europe, the Middle East and Africa.

David has an undergraduate degree in Math & Computer Science, an MSc in Quantitative Finance, an Executive MBA (outsourcing) and a PhD in Finance (dissertation: High Frequency Trading).

For the latest fintech news and events sent straight to you inbox sign up to the FINTECH Circle newsletter