Architecture data warehouse at Zitga

Hi everyone, I'm back after a long time. Recently I received a request to build a new data warehouse system at Zitga (a game studio). The system needs to meet several criteria:

– Crawl data from multiple sources (BigQuery, AppsFlyer, ironSource, App Store, Play Store, server-to-server, …).
– Replace the Google BigQuery setup currently in use.
– Handle a data size from about 5 TB up to 40 TB.
– Provide tools to support data analysts (queries, machine learning, dashboards).
– Keep a history of data collection.
– Be open and ready for integration with other systems.

Zitga data warehouse architecture

[Figure: Zitga data warehouse architecture diagram]

HDFS : distributed storage system; data is stored across multiple nodes.
Hive : reading, writing, and managing large datasets residing in distributed storage using SQL.
Spark : Spark SQL and Spark ML for building machine learning models over Hive tables.
Hue : open source SQL assistant for databases and data warehouses.
Tableau : BI system and analytics platform.
Crawl manager : manages and schedules data collection from multiple sources.
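
To make the Crawl manager's role more concrete, here is a minimal sketch of the idea: a registry of sources, each with its own collection interval, plus a log of every run (the "data collection history" requirement). This is only an illustration, not the implementation used at Zitga. The names `CrawlJob`, `CrawlManager`, `run_due`, the intervals, and the dummy fetch functions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CrawlJob:
    """One data source and how often it should be collected (hypothetical)."""
    name: str
    interval: timedelta
    fetch: Callable[[], int]              # returns the number of records collected
    last_run: Optional[datetime] = None   # None means the job has never run

    def is_due(self, now: datetime) -> bool:
        # A job is due if it never ran or its interval has elapsed.
        return self.last_run is None or now - self.last_run >= self.interval


class CrawlManager:
    """Keeps the registered jobs and a history of every collection run."""

    def __init__(self) -> None:
        self.jobs: Dict[str, CrawlJob] = {}
        self.history: List[Tuple[datetime, str, int]] = []

    def register(self, job: CrawlJob) -> None:
        self.jobs[job.name] = job

    def run_due(self, now: datetime) -> List[str]:
        """Run every job that is due at `now`; return the names that ran."""
        ran = []
        for job in self.jobs.values():
            if job.is_due(now):
                count = job.fetch()
                job.last_run = now
                self.history.append((now, job.name, count))
                ran.append(job.name)
        return ran


# Usage with dummy fetchers (assumed intervals: hourly and daily).
manager = CrawlManager()
manager.register(CrawlJob("appsflyer", timedelta(hours=1), lambda: 120))
manager.register(CrawlJob("playstore", timedelta(days=1), lambda: 30))

start = datetime(2020, 2, 19, 0, 0)
first = manager.run_due(start)                        # both jobs run
second = manager.run_due(start + timedelta(hours=1))  # only the hourly job runs
print(first, second, len(manager.history))
```

In a real deployment the fetch functions would call each source's API and write the results into HDFS/Hive, and the loop would be driven by a scheduler rather than by hand.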

Sizing of cluster

[Figure: cluster sizing (screenshot, 2020-02-19)]
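
For readers who cannot see the screenshot, a rough way to reason about HDFS sizing is shown below. This is a back-of-the-envelope sketch, not the numbers from the actual sizing: it assumes the HDFS default replication factor of 3 and an assumed 25% of disk kept free as headroom. The function name `raw_capacity_tb` is hypothetical.

```python
def raw_capacity_tb(logical_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Estimate the raw disk needed to hold `logical_tb` of data in HDFS.

    replication: copies of each block (3 is the HDFS default).
    headroom:    fraction of raw disk kept free (assumed value, tune as needed).
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1 - headroom)


# The two ends of the 5 TB to 40 TB requirement.
for logical in (5, 40):
    print(f"{logical} TB of data needs about {raw_capacity_tb(logical):.0f} TB of raw disk")
```

Compression and file formats on the Hive side can change these figures considerably, so treat them only as a starting point.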

********************************

Design resource : https://drive.google.com/file/d/1JCgx1AT6podIU3Ra5cZGkDdDwNs8a1je/view?fbclid=IwAR1ZeyiaSzn-FCZJpJDcOj11bxcDsRN2296mT0tc8gnQM2EN-CRZyPKiMVs

3 thoughts on “Architecture data warehouse at Zitga”

  1. Thao June 4, 2020 / 3:11 am

    Hi bro Hieu, I'm curious about your Crawl manager & Import Engine. May I know what technologies you used to build them? Thanks.
