Solving the "Customer Tracking" Coding Question for Amazon Data Science Roles

Solve the "Customer Tracking" coding challenge from Amazon and Shopify data science interviews. Master time-series data wrangling and session log analysis with this step-by-step solution.

In this blog, we'll discuss a coding question of the kind asked in data science interviews at top tech companies like Amazon, Shopify, and eBay. This particular question was asked in Amazon's data science interviews.

The question is related to Customer Tracking.

This type of problem is often used to evaluate your problem-solving skills, attention to detail, and proficiency in data manipulation using programming languages like Python.

A solid solution to this question can show your readiness to tackle real-world data processing tasks that are commonly encountered in industry roles.

Problem Statement

Given the users' session logs on a particular day, calculate how many hours each user was active that day.

Note: The session starts when state=1 and ends when state=0.
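
To make the note concrete: the length of one session is the state=0 timestamp minus the preceding state=1 timestamp. The snippet below is a minimal sketch of that arithmetic, using the two c001 entries that appear in the solution's sample logs. The helper name `session_hours` and the `HH:MM:SS` format constant are illustrative assumptions, not part of the original question.

```python
from datetime import datetime

FMT = "%H:%M:%S"  # assumed timestamp format (HH:MM:SS, same day)

def session_hours(start: str, end: str) -> float:
    """Hours between a state=1 timestamp and the state=0 timestamp that closes it."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600

# c001 starts at 07:00:00 and ends at 09:30:00
print(session_hours("07:00:00", "09:30:00"))  # 2.5
```

A customer's total for the day is then the sum of these per-session values.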

Data

Solution

To calculate the total active hours for each customer, we'll follow these steps:

  1. Parse the timestamp into a usable time format.
  2. Iterate through the session logs in pairs, where each pair consists of a start (state=1) and end (state=0) for a particular customer.
  3. Calculate the duration of each session by subtracting the start time from the end time.
  4. Sum up the durations for each customer to get the total active hours.
  5. Present the results in a structured format, such as a table.

Let's implement this in Python:

    from datetime import datetime

    # Session logs data
    session_logs = [
        {"cust_id": "c001", "state": 1, "timestamp": "07:00:00"},
        {"cust_id": "c001", "state": 0, "timestamp": "09:30:00"},
        # ... (other logs)
        {"cust_id": "c005", "state": 0, "timestamp": "18:30:00"},
    ]

    # Function to calculate active hours
    def calculate_active_hours(logs):
        active_hours = {}
        # Track the open session start per customer, so that logs from
        # different customers can be interleaved without mixing sessions.
        open_sessions = {}

        for log in logs:
            cust_id = log["cust_id"]
            state = log["state"]
            timestamp = datetime.strptime(log["timestamp"], "%H:%M:%S")

            if state == 1:
                # Session starts for this customer
                open_sessions[cust_id] = timestamp
            elif state == 0 and cust_id in open_sessions:
                # Session ends: add its length (in hours) to the customer's total
                start = open_sessions.pop(cust_id)
                duration = (timestamp - start).total_seconds() / 3600
                active_hours[cust_id] = active_hours.get(cust_id, 0) + duration
            # An end event with no matching start is ignored

        return active_hours

    # Calculate active hours
    active_hours = calculate_active_hours(session_logs)

    # Prepare the output: a header row followed by one row per customer
    output_data = [["Customer ID", "Active Hours"]]
    for cust_id, hours in active_hours.items():
        output_data.append([cust_id, round(hours, 2)])
            
The result, output_data, is a table with a header row followed by each customer's ID and their total active hours for the day, rounded to two decimal places.
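
The code above builds output_data but does not print it. One possible way to display it as an aligned text table is sketched below. The `format_table` helper is a hypothetical addition, and the single data row is the c001 session from the sample logs (07:00:00 to 09:30:00), since the remaining logs are elided.

```python
# Rows in the same shape as output_data: a header row, then one row per customer.
# Only c001's total can be derived from the log entries that are shown.
output_data = [["Customer ID", "Active Hours"], ["c001", 2.5]]

def format_table(rows):
    """Left-align every column to the width of its longest cell."""
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]
    lines = []
    for row in rows:
        cells = (str(cell).ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)

print(format_table(output_data))
```

In an interview setting, a pandas DataFrame would be another reasonable way to present the same rows.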

If you want to tackle more coding challenges like this, why not take it to the next level?

I'd be happy to hop on a 1:1 free call with you to provide more practice questions and guidance.

Whether you're prepping for upcoming interviews or simply looking to sharpen your data-wrangling skills, a personalised session could be just what you need to level up.