How to Remove Duplicates but Keep Latest Value in Power Query [3 Methods]

In Power Query, removing duplicates while keeping the latest value means reducing a dataset to one row per key (for example, a product or other identifier) and retaining the row with the most recent value of a chosen column, such as a date or timestamp. Typically, you either sort the data in descending order and then remove duplicates on the key column, or group by the key column and take the latest value in each group with an aggregation such as “Max.” The result is a deduplicated dataset in which each unique identifier appears once, with its most recent data. This is useful when a table holds historical records but only the latest information for each identifier is needed for analysis or reporting.

In this article, I’ll guide you through the process of removing duplicates while retaining the latest values in your dataset. I’ll be using a dataset that includes information about various products and their sales on different dates. The goal is to demonstrate how you can efficiently eliminate duplicate products and keep only the most recent entries based on the date. This step-by-step walkthrough will help you streamline your data by ensuring you have the latest and most relevant information for each product.


Here are 3 ways to remove duplicates but keep the latest values in Power Query:

Using the Sort Descending Feature in Combination with the Remove Duplicate Operation

To clean up your data with Power Query, combine the Sort Descending and Remove Duplicates features. Sorting the data from newest to oldest puts the most recent row for each product first, and Remove Duplicates then keeps the first row it finds for each product. Be aware that Power Query does not always preserve the sort order when it evaluates later steps, so check the result; if older rows survive, buffering the sorted table with Table.Buffer before removing duplicates is a common fix.

Here’s a step-by-step guide:

Step 1: Open Power Query Editor

  1. Select the table.
  2. On the Excel ribbon, open the “Data” tab.
  3. Click “From Table/Range” (in some Excel versions it is listed under “Get Data”).
  4. In the “Create Table” dialog, check “My table has headers” and click “OK.”
  5. The Power Query Editor opens and displays your data.

Step 2: Sort the Data and Remove Duplicates

  1. Click on the drop-down arrow in the “Date” column header.
  2. Choose “Sort Descending” to sort the data by date in descending order.
  3. Select the “Product” column.
  4. Go to the “Home” tab and click on “Remove Rows,” then “Remove Duplicates.”
  5. Click the drop-down arrow in the “Date” column header once more and select “Sort Ascending” to arrange the data in ascending chronological order.
  6. Click on “Close & Load” to apply the changes and close the Power Query Editor.

After removing duplicates based on the “Product” column, only the record with the latest date for each unique product is retained.
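The steps above can be sketched in M as follows. This is a minimal sketch under stated assumptions, not necessarily the exact code the editor generates: the inline sample table and the “Sales” column name are hypothetical stand-ins for the article's dataset, and the Table.Buffer step is an added safeguard so that the sorted order is fixed before duplicates are removed.

```powerquery
let
    // Hypothetical sample data standing in for the article's product table
    Source = #table(
        type table [Product = text, Date = date, Sales = number],
        {
            {"Laptop", #date(2024, 1, 5), 1200},
            {"Laptop", #date(2024, 3, 9), 1350},
            {"Phone", #date(2024, 2, 1), 800},
            {"Phone", #date(2024, 1, 15), 750}
        }
    ),
    // Newest rows first
    SortedDesc = Table.Sort(Source, {{"Date", Order.Descending}}),
    // Hold the sorted table in memory so later steps see this row order
    Buffered = Table.Buffer(SortedDesc),
    // Keep the first (newest) row for each product
    Deduplicated = Table.Distinct(Buffered, {"Product"}),
    // Optional: restore ascending chronological order
    Result = Table.Sort(Deduplicated, {{"Date", Order.Ascending}})
in
    Result
```

With this sample data, the query should leave one row per product: the Phone row dated 1 February 2024 and the Laptop row dated 9 March 2024. You can paste it into a blank query to try it before adapting the Source step to your own table.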


Using M Code in the Advanced Editor

In Power Query, use M code in the Advanced Editor to clean your data by removing duplicate entries and retaining the latest values. This method allows you to customize and automate the process for a more tailored data transformation.

Follow the steps below to remove duplicates but keep the latest value using M code in the Advanced Editor:

  1. Go to the “Data” tab and select “From Table/Range” to load the data into Power Query.
  2. In Power Query Editor, go to the “Home” tab.
  3. Click on “Advanced Editor” in the “Home” tab.
  4. Replace the existing code with the following:
    let
        // Load the Excel table; replace "Table1" with your table's name
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        // Group by product and keep the latest (maximum) date in each group
        GroupedTable = Table.Group(Source, {"Product"}, {{"MaxDate", each List.Max([Date]), type date}})
    in
        GroupedTable

  5. Click “Done” to apply the changes.
  6. After this operation, you’ll have a new table with unique products and their corresponding maximum dates.
  7. There is no need to expand anything in this method, as the aggregation is done in a single step.

The resulting table will have two columns, “Product” and “MaxDate,” representing the maximum date for each product. Other columns from the source table, such as the sales figures, are not carried over by this grouping.
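If you need the whole latest row rather than only the maximum date, one possible variation is to have each group return its latest row as a record and then expand it. The sketch below is an illustration, not part of the original method: the inline sample table and the “Sales” column name are hypothetical, and it relies on Table.Max to pick the row with the largest “Date” in each group.

```powerquery
let
    // Hypothetical sample data standing in for the article's product table
    Source = #table(
        type table [Product = text, Date = date, Sales = number],
        {
            {"Laptop", #date(2024, 1, 5), 1200},
            {"Laptop", #date(2024, 3, 9), 1350},
            {"Phone", #date(2024, 2, 1), 800},
            {"Phone", #date(2024, 1, 15), 750}
        }
    ),
    // For each product, take the entire row that has the largest Date
    Grouped = Table.Group(
        Source,
        {"Product"},
        {{"Latest", each Table.Max(_, "Date"), type record}}
    ),
    // Turn the record back into ordinary columns
    Result = Table.ExpandRecordColumn(Grouped, "Latest", {"Date", "Sales"})
in
    Result
```

Unlike the sort-based method, this version does not depend on row order. The expanded columns may lose their data types, so you may want to set them again with a “Changed Type” step.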


Using Sorting, Indexing, and Filtering Features

You can use the sorting, indexing, and filtering features in Power Query to streamline your data by removing duplicate entries and retaining the latest values. This process allows for effective organization and cleanup, ensuring that your dataset reflects the most recent and unique information. Here’s how:

  1. Open Power Query Editor.
  2. Select the “Product” column, go to the “Home” tab, and click “Group By.”
  3. Choose the “All Rows” operation and name the new column “Rows,” so that each product’s rows are collected in a nested table.
  4. Go to the “Add Column” tab and click “Custom Column.”
  5. Enter the formula Table.AddIndexColumn(Table.Sort([Rows], {{"Date", Order.Descending}}), "Index", 0, 1) to sort each nested table by date in descending order and number its rows from 0.
  6. Remove the “Rows” column, then expand the new custom column to bring back the remaining columns together with “Index.”
  7. Click on the filter icon in the “Index” column, choose “Number Filters,” then “Equals,” and enter “0” to keep only the rows where the index is zero.
  8. Click “OK” to apply the filter.
  9. Remove the “Index” column to clean up the table.

Click on “Close & Load” to apply the changes and close the Power Query Editor. Because the index restarts at 0 within each product, filtering on zero keeps exactly one row, the latest, per product. A single index added to the whole sorted table would instead keep only the one most recent row overall.
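As a rough M sketch of the sort, index, and filter idea applied per product: the sample table, the “Sales” column, and the step names below are hypothetical, and the index is added inside each product group so that it restarts at 0 for every product.

```powerquery
let
    // Hypothetical sample data standing in for the article's product table
    Source = #table(
        type table [Product = text, Date = date, Sales = number],
        {
            {"Laptop", #date(2024, 1, 5), 1200},
            {"Laptop", #date(2024, 3, 9), 1350},
            {"Phone", #date(2024, 2, 1), 800},
            {"Phone", #date(2024, 1, 15), 750}
        }
    ),
    // Nest each product's rows, sorted newest first and numbered from 0
    Grouped = Table.Group(
        Source,
        {"Product"},
        {{
            "Rows",
            each Table.AddIndexColumn(
                Table.Sort(_, {{"Date", Order.Descending}}),
                "Index", 0, 1
            ),
            type table
        }}
    ),
    // Bring the nested columns back to the top level
    Expanded = Table.ExpandTableColumn(Grouped, "Rows", {"Date", "Sales", "Index"}),
    // Index 0 marks the newest row of each product
    Latest = Table.SelectRows(Expanded, each [Index] = 0),
    // The helper column is no longer needed
    Result = Table.RemoveColumns(Latest, {"Index"})
in
    Result
```

Keeping the index also makes it easy to retain more than one row per product, for example the two most recent entries, by changing the filter to [Index] < 2.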


Conclusion

In this article, I’ve shared three effective methods to eliminate duplicates while preserving the latest values in your data. These versatile techniques can be applied to a variety of problems, showcasing their adaptability. By becoming adept at these methods, you’ll enhance your expertise in Power Query, empowering you to tackle diverse data challenges with confidence and efficiency.

Frequently Asked Questions

Can I remove duplicates based on multiple conditions in Power Query?

Yes, you can. Utilize the “Group By” feature to group your data based on multiple columns, and then apply aggregation functions to retain the latest values within each group, effectively removing duplicates based on the specified conditions.
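For instance, grouping on two columns could look like the sketch below. The “Region” column and the sample rows are hypothetical and only serve to show a second key column.

```powerquery
let
    // Hypothetical sample data with a second key column, Region
    Source = #table(
        type table [Product = text, Region = text, Date = date],
        {
            {"Laptop", "East", #date(2024, 1, 5)},
            {"Laptop", "East", #date(2024, 3, 9)},
            {"Laptop", "West", #date(2024, 2, 20)},
            {"Phone", "East", #date(2024, 2, 1)}
        }
    ),
    // One row per Product and Region pair, with that pair's latest date
    Result = Table.Group(
        Source,
        {"Product", "Region"},
        {{"MaxDate", each List.Max([Date]), type date}}
    )
in
    Result
```

Each distinct combination of “Product” and “Region” is treated as its own group, so the same product can appear once per region.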

What benefits does removing duplicates and keeping the latest values offer in data analysis?

Removing duplicates and keeping the latest values offers several benefits in data analysis:

  1. Ensures the dataset contains unique records.
  2. Reflects the most up-to-date information for accurate and consistent reports.
  3. Eliminates redundancy, preventing skewed analysis.
  4. Focuses on the most recent trends, patterns, or changes for relevant insights.
  5. Supports better decision-making with reliable and current information.
  6. Crucial for understanding data evolution over time.
  7. Maintains consistency and reliability by ensuring each entity is represented once.
  8. Optimizes database performance and reduces storage space.