If you were given 1 TB of data and asked to sort it using 1000 computers, how would you do it. This is a Google senior interview question, and you are currently watching the optimum solution. In this video, we will do the full 2-hour interview stages together and get hired!
Outline of this interview question video:
* 0:00 Overview
* 2:22 Problem Visualization
* 4:40 Design Document
* 5:22 Problem Definition
* 5:35 Requirements Analysis
* 9:20 System Design
* 14:35 Complexity Analysis
* 15:17 Implementation
* 18:23 Tests
* 19:12 Discussion
* 22:02 Conclusion
First off, let’s check out the agenda for this video. We will start by visualizing the problem. Then we will continue with a concise design document. This is probably going to be on a whiteboard in a real interview, so we will keep it short. In the document, we will define the problem, do requirements analysis, do system design using a simple diagram, and finally do the complexity analysis. Then we will actually code and implement this system. Then we’ll add some tests. And we will close it off with a discussion with the interviewers.
I will take you through the entire 2-hour interview in sections and do every section with an in-depth analysis. You are about to see the most in-depth late-stage senior interview analysis on the internet. So, sit back, relax, and turn up your brain to the max.
Now let’s move onto the problem in hand. In this question, you are given 1 TB of data sitting in a database, and 1000 computers each with 1.5 GB of RAM. And you are asked to sort this data as fast as possible.
This is a distributed sorting question asked by Google in a senior engineer interview. This was asked during the final interview round, and it is a hard question. In addition to the question being hard, the discussion held with the interviewer around other possible solutions, possible improvements, and external factors that might affect the performance, will test your deeper understanding of the topic. The source of this question is Hacker News, and you can find several versions of this question being discussed by ex-Googlers on HN. Obviously, I cannot reveal the source directly, but I am keeping a record of these questions as I find them to see how frequently they are asked. Distributed computing questions seem very frequent at senior Google interviews. On a side note, the original question was asked to be implemented using Python or C++ on a Kubernetes cluster. Finally, 2 hours is allocated for this interview, including design, implementation, and discussion. We will look into multiple approaches to solve this problem and investigate different requirements along with other variants that you might be given in an interview situation.
My simplified distributed sorting algorithm implementation:
My k-way merge and tournament tree implementations and their tests:
Chromium Project Design Document Template:
My “K-Way Merge” video, which explains k-way merge which is used extensively in this video:
My “Software Engineering Compensation Guide” video to help you estimate what you should be paid:
If you want to read or contribute, you can find this guide on:
My Algorithms Playlist:
– – – – – – – – – – –