Based in Colorado, Director Intel is a platform that aggregates information about companies listed on the Russell 3000 Index and their directors, using data available from the U.S. Securities and Exchange Commission (SEC).
The company wanted to build a real-time aggregation platform for the Russell 3000 Index, an equity index that tracks 3,000 of the largest US-traded stocks.
Future users of the platform would get accurate information sourced from the SEC, such as the valuation of a director’s stocks, contact information for the board, and the company’s Environmental, Social, and Corporate Governance (ESG) record. Similar information available in the old version of the platform was:
Director Intel chose TechVariable to develop the platform because of its technical expertise, offshore development capabilities, and proven fast turnaround. We worked on the initial scope for three months as planned. Director Intel has since extended its engagement with us to strengthen the project further.
We took a microservice architecture approach because the client wanted the platform to scale as needed in the future. The client also delivered the scope in phases. We used Django REST framework as the back-end web framework, PostgreSQL as the database, and React for the front end.
The key information processing relied on NLP algorithms, since most of the data was publicly available as text but highly unstructured. For example, to extract meaningful entities we used a custom spaCy pipeline as an entity extractor. Based on those entity values, we derived a relevance score for each document we scraped from the web. The scoring algorithm is based on semantic analysis and word embeddings. After finding high-scoring documents, we used our custom document parser to extract the relevant information and visualize it in the front end.
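To make the idea of embedding-based relevance scoring concrete, here is a minimal sketch. It is not the production algorithm: it assumes the score is the cosine similarity between averaged word vectors, and it uses a tiny hand-made vector table (`TOY_VECTORS`, invented for illustration) in place of a real embedding model. The function names `embed`, `relevance`, and `rank` are hypothetical, and the entity extraction step is omitted.

```python
import math

# Toy 3-dimensional word vectors standing in for a real embedding model.
# The numbers are invented purely for illustration.
TOY_VECTORS = {
    "director": (0.9, 0.1, 0.0),
    "board": (0.8, 0.2, 0.1),
    "shares": (0.2, 0.9, 0.1),
    "stock": (0.1, 0.9, 0.2),
    "weather": (0.0, 0.1, 0.9),
    "forecast": (0.1, 0.0, 0.9),
}


def embed(text):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance(query, document):
    """Score a document against a query; 0.0 when either has no known words."""
    q, d = embed(query), embed(document)
    if q is None or d is None:
        return 0.0
    return cosine(q, d)


def rank(query, documents):
    """Return the documents ordered from most to least relevant."""
    return sorted(documents, key=lambda doc: relevance(query, doc), reverse=True)
```

Calling `rank("director shares", ["weather forecast", "board stock"])` puts "board stock" first, because its averaged vector points in nearly the same direction as the query's. A real system would replace the toy table with vectors from a trained model and build the query from the extracted entities.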
Third-party services such as IEX Cloud supply real-time information such as stock prices.
For search, we used Elasticsearch to reduce the load on the database and to improve search results.
This module was responsible for extracting data from public sources. The data was in textual format and unstructured.
This module was responsible for providing context to the data. It scored the data by relevance and associated it with the appropriate business entity.
This module is built on top of Elasticsearch so that it can provide robust searching and sorting. We implemented fuzzy and phonetic search here.
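As a rough illustration of how fuzzy and phonetic matching can be combined in Elasticsearch, the sketch below builds the JSON bodies as plain Python dictionaries. It is an assumption, not the project's actual configuration: the field name `name`, the analyzer and filter names, and the helper `build_name_query` are hypothetical. The phonetic token filter also assumes Elasticsearch's analysis-phonetic plugin is installed.

```python
import json

# Index body with a custom analyzer that adds phonetic codes to name tokens.
# Assumes the analysis-phonetic plugin; all names here are illustrative.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "name_phonetic": {
                    "type": "phonetic",
                    "encoder": "double_metaphone",
                    "replace": False,  # keep the original token as well
                }
            },
            "analyzer": {
                "phonetic_name": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_phonetic"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                # Sub-field indexed with the phonetic analyzer.
                "fields": {
                    "phonetic": {"type": "text", "analyzer": "phonetic_name"}
                },
            }
        }
    },
}


def build_name_query(text):
    """Match on edit distance (fuzzy) or on sound-alike codes (phonetic)."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": {"query": text, "fuzziness": "AUTO"}}},
                    {"match": {"name.phonetic": {"query": text}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_name_query("Jon Smyth"), indent=2))
```

The fuzzy clause tolerates small typos, while the phonetic sub-field catches spellings that sound alike. Both bodies are plain JSON, so they can be sent with any Elasticsearch client or over HTTP.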
1) Easy indexing with Elasticsearch made the resources easily searchable.
2) The ability to share resources within and outside the platform via shareable links extended the platform's reach.
3) Using NLP to extract data from an unstructured format and present it as a report helped the client understand the data visually at a glance.