How to set up your data organization for success in the era of the full stack data engineer

Bharath Natarajan
Aug 1, 2022
Photo by Kampus Production on Pexels

I am sure today’s data leaders will agree: running a data team, whether in a large organization or a startup, is not for the faint of heart (I am only half kidding). Challenges are appearing from every direction. Here are a few:

1. Talent Shortage

2. Exploding number of data sources and data volume

3. Increasing cloud cost

4. Complexity of the vendor landscape

5. Internal organizational complexity: shadow BI teams creating alternate architectures

6. Rising user expectations and demands

7. Running reliable data operations across multiple tools while adding new capabilities all the time

That is just a short list; I could keep going. One of the things you must get right to mitigate these challenges is to set up your organization for success.

Traditional data and BI teams were organized around three distinct skill sets:

1. ETL Team

2. Business Intelligence Team

3. Business Analyst/Project Manager

Traditional ETL products like Informatica are very difficult to pick up and take many years to get good at. This very specialized skill was highly valued, and there was usually a team lead or manager who looked after the project funnel for this team and the support for production loads.

The BI team specialized in building the data model and the dashboards in a legacy BI tool such as OBIEE, Business Objects, MicroStrategy or Cognos. A third group of team members specialized in understanding the source system data and helped translate requirements from the business functions into ETL and BI designs. This role was optional and mostly present in large organizations.

Traditional data organization worked for a long time

Fig. 1: Traditional Data Team Organization

This traditional data organization worked for almost 25 years, from the early 1990s until about 2015. It worked because the number of source systems in a company was limited: one or two ERP and CRM systems with a few satellite applications. Business users were conditioned to expect new requirements to take anywhere from 3–6 months for small changes and 9 months to 2 years for large new projects. From a design perspective, the star schema was the way to go, and the data warehouse was designed to optimize performance and storage on Oracle, Teradata and SQL Server.

With the traditional approach, the project funnel was negotiated with the Director or Senior Manager of the data team. Business BI leads then wrote a detailed functional requirements document, and a large project team built, tested and rolled out the solution over several months to years.

Rise of shadow BI teams

One fallout of this approach was the rapid growth of shadow BI teams within the business functions. They built their own solutions with data pulled from the traditional BI tools because they could not wait for these long development turnarounds. This happened primarily in the mid-2010s, as the traditional data organization could no longer keep up with the increasing speed of business data demands.

Emergence of the modern data stack and rise of the full stack data engineer

The modern data stack, built around cloud data warehouses like Snowflake, Redshift or Google BigQuery, started emerging as a viable option around 2018. Many easy-to-use tools for loading, transforming and visualizing data in these cloud data warehouses started to become mainstream. Think Fivetran, Matillion and Airbyte for loading; dbt, Snowflake Tasks etc. for transformation; and Tableau, Power BI and Looker for visualization.

What this stack enables is that one developer can now build end-to-end solutions on their own. Granted, the developer has to be very talented, but they can start with one strength such as ELT/ETL and then pick up the other skills, such as visualization. It is not the same as before, when hugely complex tools like Informatica, Cognos, MicroStrategy and OBIEE and databases like Oracle and Teradata took years of experience to get good at. What is essential to get started is very good knowledge of SQL and, to some extent, Python. This is the era of the full stack data engineer.
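To make the "SQL plus some Python" point concrete, here is a minimal sketch of one person covering load, transform and report in a few lines. Everything in it is an assumption for illustration: the `raw_orders` and `revenue_by_region` table names, the sample rows, and the use of Python's built-in `sqlite3` as a stand-in for a cloud data warehouse. In a real stack, a loader such as Fivetran would land the raw table, a tool such as dbt would own the SQL, and a BI tool would read the result.

```python
import sqlite3

# Hypothetical raw rows, standing in for what a loading tool would land
# in the warehouse: (order_date, region, amount).
RAW_ORDERS = [
    ("2022-07-01", "EMEA", 120.0),
    ("2022-07-01", "AMER", 200.0),
    ("2022-07-02", "EMEA", 80.0),
    ("2022-07-02", "AMER", 50.0),
]


def build_revenue_by_region(conn, rows):
    """Load raw rows, transform them with SQL, and return the reporting table."""
    # Load: land the raw data untouched (the "EL" in ELT).
    conn.execute(
        "CREATE TABLE raw_orders (order_date TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: build a reporting table with plain SQL inside the database.
    conn.execute(
        """
        CREATE TABLE revenue_by_region AS
        SELECT region,
               SUM(amount) AS revenue,
               COUNT(*)    AS order_count
        FROM raw_orders
        GROUP BY region
        """
    )

    # Serve: the query a dashboard would run against the reporting table.
    return conn.execute(
        "SELECT region, revenue, order_count "
        "FROM revenue_by_region ORDER BY region"
    ).fetchall()


# In-memory SQLite database used only as a local stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
report = build_revenue_by_region(conn, RAW_ORDERS)
for region, revenue, order_count in report:
    print(f"{region}: {revenue:.2f} from {order_count} orders")
```

The point of the sketch is not the tooling but the shape of the work: the same person who lands the data also writes the transformation and knows which question the final table has to answer.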

The modern data organization

What the rise of the full stack data engineer enables is that the data team can now be organized around business needs rather than around technical skills like ETL or BI tools. For example, if the company is large enough, there will be data requirements from multiple functions like finance, supply chain/manufacturing, HR, sales, marketing, service, product, and engineering. There will even be sub-teams within each group: finance, for example, might have Tax, AP, AR, Revenue, Controllership, and Audit/Compliance sub-teams, each with different data requirements. In addition, while data tools have become easier to use, the number of source systems, the enterprise architecture and the business data needs for each of these functions have all grown in complexity. Hence the traditional organization needs to be reworked.

Fig. 2: The modern data organization

Modern full stack data engineers need to be good with the data tools, but they also need to be functional experts more than ever before. They need to be embedded within the business teams and have a seat at the table, providing their input on how to solve a given business problem from a data perspective for product, engineering, sales, or other functions. This is how companies at the leading edge of data expertise, like Facebook, organize their data teams.

This is a totally different method of engagement from the past. Back then, a requirements document reached the data team only after all the business decisions affecting data had been made, and the team delivered in 6–9 months, by which time the business had already changed. With the modern organization approach, the data engineer can learn the source systems (payments, product usage, ERP, CRM) at a deeper level, understand the business processes, and understand the KPIs that drive the business. The modern full stack engineering team can then deliver continuous improvements and value to the functions it works with rather than wait to be told what to build.

In conclusion, companies that want to make data-driven decisions need to empower their data engineering organization by giving it a seat at the table during the entire decision-making process. With easier-to-use, more powerful data tools and the rise of the full stack data engineer, it is critical to set up the data organization correctly for success.

Bharath Natarajan

Analytics and Intelligent Automation Architecture, Tools and Best Practices. https://spockanalytics.com