Data for logistics: What data is needed & why is data quality so important?

Fri | 28 Oct 2022 | Simacan Solutions

The Simacan platform collects a huge amount and variety of data from different sources. By cleverly combining all this data and applying mathematical logic to it, the Simacan platform creates even more “new data”, providing more new insights.

In this blog, we take a closer look at exactly which data is used and how, why data quality is so important, which data is useful for analytics, and the challenges faced in data processing and data analytics.

What data is processed and generated by Simacan?

In previous blog articles, we wrote that data on its own adds little value and can be almost useless. Valuable information arises when data is processed and information from multiple sources is linked and analysed. But what kind of data does the Simacan platform process?

Simacan receives the following data from various sources: planning schedules, realisation data and traffic information. In addition, the Simacan platform uses artificial intelligence (AI) to generate “new data” such as: (preferred) routes between locations with corresponding distances and driving times, expected arrival times (ETA) and measured arrival and departure times.

This complete dataset yields relevant insights at any level of detail. Here, the Simacan platform distinguishes between pre-trip (before the trip), on-trip (during the trip) and post-trip (after the trip). In this blog, we will only discuss the data insights for on-trip and post-trip.

On-trip operations can be monitored and managed in real time via the Simacan platform.

Post-trip data can be obtained in several ways:

1. Raw data at trip level: With Simacan's Realised Trip Service (our push service), almost all received and generated data can be obtained.

2. Dashboard: Insights into a transport operation, such as operation size, timeliness, planning accuracy and data completeness. Trip data can also be downloaded for further analysis with other analytical tools.

3. Quality reporting per trip at daily level: Provides insights into the reasons for limited real-time data per trip.

4. Historical trip data: Trip data at stop level, available for download.

The Simacan platform allows users to download and analyse trip data themselves, but our platform also offers a helping hand with a ready-to-use dashboard.

Data quality

Before data can be used for analyses in the mentioned Simacan dashboard, the quality of the data is assessed, as it must meet certain requirements. For instance, the data availability and completeness are examined. Incomplete data is not used in analyses for this dashboard.

By checking the availability, completeness and timeliness of the data, the reliability of the data is assessed. Incorrect data is intercepted as much as possible and not included in analyses. Unfortunately, it is impossible to determine the exact ground truth of the data, as this would require actual observation by humans for every trip. Nevertheless, data can be properly assessed for reliability, because years of experience and knowledge help us make the best estimates. Unreliable data is then filtered out, leaving the most reliable data for analysis.
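A completeness check of this kind could look roughly like the sketch below. The field names (`trip_id`, `planned_arrival`, `measured_arrival`, `gps_points`) and the record layout are hypothetical and only serve the illustration; they are not Simacan's actual schema or method.

```python
from typing import Any, Mapping

# Hypothetical field names; the real trip schema is not described in this article.
REQUIRED_FIELDS = ("trip_id", "planned_arrival", "measured_arrival", "gps_points")


def missing_fields(trip: Mapping[str, Any]) -> list[str]:
    """Return the required fields that are absent or empty in a trip record."""
    return [f for f in REQUIRED_FIELDS if trip.get(f) in (None, "", [])]


def split_by_completeness(trips):
    """Separate complete trips (usable for analysis) from incomplete ones."""
    complete, incomplete = [], []
    for trip in trips:
        (incomplete if missing_fields(trip) else complete).append(trip)
    return complete, incomplete
```

Keeping the incomplete trips in a separate list, instead of dropping them silently, makes it possible to report afterwards why a trip was left out of an analysis.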

When is transport data unreliable?

The most common reasons why data becomes unreliable and filtering is needed are that the transport planning schedule does not match reality, or that the times measured by Simacan differ from what actually happened.
The planning schedule frequently does not match reality. This happens, for example, when the sequence of stop locations (stops) is realised differently than planned, when a vehicle did not reach a planned location, or when a vehicle stopped somewhere that was not planned. Trips are also sometimes modified after completion, in which case it becomes unclear whether the information from before or after the modification is correct.
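To make these mismatches concrete, here is a minimal sketch that compares a planned stop sequence with the realised one. It assumes each stop is identified by a unique name and visited at most once per trip; the function and key names are made up for this example.

```python
def compare_stops(planned: list[str], realised: list[str]) -> dict:
    """Compare planned and realised stop sequences for one trip."""
    planned_set, realised_set = set(planned), set(realised)
    # Stops present in both sequences, in planned and in realised order.
    common_planned = [s for s in planned if s in realised_set]
    common_realised = [s for s in realised if s in planned_set]
    return {
        "skipped": [s for s in planned if s not in realised_set],
        "unplanned": [s for s in realised if s not in planned_set],
        "order_changed": common_planned != common_realised,
    }


# Example: stop C was never reached, stop X was not planned, A and B were swapped.
result = compare_stops(["DC", "A", "B", "C"], ["DC", "B", "A", "X"])
```

A trip for which any of the three results is non-empty or true is a candidate for closer inspection before it is used in an analysis.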

The received realisation data includes GPS coordinates. These coordinates tell us where a vehicle is at what time. If the received location differs from reality (you have probably experienced this yourself while navigating: your GPS locator points to somewhere in the vicinity of a motorway while you are actually driving on it), we do not know exactly when a vehicle arrives at or departs from a planned location. A delay can also occur while a GPS signal is being transmitted. If this happens, the Simacan platform estimates the location of the vehicle based on historical and actual data.

Sometimes discrepancies occur in the times measured by Simacan. These discrepancies are often the result of unreliable planning or unreliable realisation data. For example, the platform registers the departure of a vehicle, but in reality the vehicle did not depart until minutes later. This makes the measured duration at that location shorter, and the measured driving time longer, than in reality. If the difference between planning and reality is large, the measurements concerned are filtered out of the data.
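Such a filter could be sketched as follows. The 60-minute cut-off is an assumption made purely for illustration; the article does not state which value or which exact rule the platform applies.

```python
from datetime import timedelta

# Assumed cut-off; the article does not state which value is used.
MAX_DEVIATION = timedelta(minutes=60)


def is_plausible(planned: timedelta, measured: timedelta,
                 max_deviation: timedelta = MAX_DEVIATION) -> bool:
    """True when a measured duration stays within max_deviation of the plan."""
    return abs(measured - planned) <= max_deviation


def filter_measurements(rows):
    """Keep only (planned, measured) duration pairs that pass the check."""
    return [(p, m) for p, m in rows if is_plausible(p, m)]
```

The same check works for dwell times and for driving times, since both are simply durations compared against their planned counterpart.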

Some parts of a trip are inherently less reliable because of their characteristics, for example when a vehicle starts a trip from a distribution centre (DC). We do not know exactly when that vehicle arrived at that location because it is already there; we only know what time it departs the DC. In short, there are multiple grounds for data rejection.

Assessing data quality as in the examples above filters out incomplete and unreliable data. This results in a dataset of usable data which can be processed for analyses to be displayed in the dashboard.

Why is data analytics becoming increasingly important?

The market is shifting. A few years ago, it was mainly about gaining real-time visibility into logistical operations; now it is increasingly about making operations efficient and sustainable, and this requires insights. Analysing and interpreting trip data contributes to gaining those insights.

Transport data can be used for various purposes, for example gaining insight into the timeliness of operations, billing operations, or shortening planned dwell times at locations, and of course quickly identifying and solving bottlenecks.
In short, data helps you make better-informed supply chain decisions, improve customer service, track and manage performance and reduce costs. Companies can become more agile and responsive to the ever-changing needs of their customers.

What issues are in play in data analytics for logistics?

Incomplete or poor-quality data can cause incorrect and unclear results. Misinterpretations by users can also lead to incorrect analyses. That is why it is very important to think carefully about the specifications of your data thresholds. A threshold is a boundary value against which collected data is compared: if a collected value does not meet the threshold, this signals possible poor performance. What exactly does this entail? Again, we provide some examples for clarity:

1. Should manually cancelled trips be included in the invoicing? What about trips that were not cancelled, but for which it remains unclear whether they were carried out?

2. Has a vehicle arrived on time at a location if we have defined 15:00 as the 'on time' value, and we know it was not there at 14:55 but was there at 15:05?
In the platform, the arrival time is the time of the first GPS coordinate measured at the specified location. With a low update rate of GPS coordinates, however, the vehicle might have arrived earlier.

3. Has a vehicle arrived on time at a location if we have defined 15:00 as the 'on time' value, but it is measured to be near the location at 15:00?
In the platform, the arrival time is the time of the first GPS coordinate measured near or at the specified location.

4. The average dwell or stop time for a location is determined from the historically measured dwell times. Suppose a driver regularly takes lunch breaks at a certain location: is it desirable for these breaks to be included in determining the average dwell time of that location? And is it a problem if a dwell time is included in a calculation while the measurement deviates 30 minutes from reality?

5. How can it be evaluated whether the planned driving time is accurate if the measured driving time includes dwell and break times, while planned driving times do not?
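The on-time question in the second example can be made explicit with a small sketch. It assumes we know the last GPS measurement away from the location and the first one at the location, and it introduces an "uncertain" outcome for cases where the deadline falls between the two. The function name, the three labels and the rule that arriving exactly at the deadline counts as on time are choices made for this illustration, not platform behaviour.

```python
from datetime import datetime


def classify_arrival(last_ping_outside: datetime,
                     first_ping_inside: datetime,
                     deadline: datetime) -> str:
    """Classify an arrival given the GPS pings that bracket it.

    The true arrival lies after last_ping_outside and no later than
    first_ping_inside.
    """
    if first_ping_inside <= deadline:
        return "on_time"      # seen at the location by the deadline
    if last_ping_outside >= deadline:
        return "late"         # still away from the location at the deadline
    return "uncertain"        # the deadline falls between the two pings


# The example from the list: not there at 14:55, there at 15:05, deadline 15:00.
day = (2022, 10, 28)
status = classify_arrival(datetime(*day, 14, 55),
                          datetime(*day, 15, 5),
                          datetime(*day, 15, 0))
```

For the example in the list, `status` is `"uncertain"`: the measurements alone cannot tell whether the vehicle arrived before or after 15:00, so the outcome depends on how you choose to treat such cases.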


So every transport organisation has its own, predefined thresholds or framework to operate within. You choose the thresholds for your framework yourself and this determines the eventual quality and usability of your data.
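The dwell time example shows how much such a choice matters. In the sketch below, `max_dwell` is a hypothetical threshold, chosen by the analyst, above which a dwell is treated as a break and left out of the average. The numbers are invented.

```python
from statistics import mean


def average_dwell_minutes(dwell_minutes, max_dwell=None):
    """Average dwell time, optionally ignoring dwells longer than max_dwell.

    max_dwell is a hypothetical threshold chosen by the analyst, for example
    to keep lunch breaks out of the average.
    """
    kept = [d for d in dwell_minutes if max_dwell is None or d <= max_dwell]
    return mean(kept) if kept else None


dwells = [10, 12, 14, 55]            # minutes; 55 could be a lunch break
with_breaks = average_dwell_minutes(dwells)
without_breaks = average_dwell_minutes(dwells, max_dwell=30)
```

With these made-up numbers the average is 22.75 minutes when the long stop is included and 12 minutes when it is excluded, so the threshold alone nearly halves the result.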

The Simacan platform encourages cooperation with all supply chain stakeholders, because knowledge transfer and coordination are important and necessary. That’s why we help our users understand the supplied data properly, in order to set the right framework and avoid misinterpretations.

Why a ready-to-use dashboard?
Can't I do it in Excel?

Simacan users can download, interpret and analyse data themselves, but they can also use our Transport Performance Monitor dashboard. You might wonder: why a ready-to-use dashboard, can't I just use Excel? Simacan users can of course choose to process their data themselves, but this requires time and a lot of expertise. Analysing in Excel can be difficult, as it involves a lot of data, often too much for one Excel file. For these amounts, you need extensive analytical tools or some coding.
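As an impression of what "some coding" can mean, the sketch below counts trips per status from a CSV export while reading one row at a time, so the file never has to fit in memory or in a spreadsheet. The column name `status` and the file name `trips.csv` are assumptions; an actual export may be structured differently.

```python
import csv
from collections import Counter


def count_by_status(lines, column="status"):
    """Count rows per value of `column`, reading one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Usage with a (hypothetical) export file:
# with open("trips.csv", newline="") as f:
#     print(count_by_status(f))
```

Because the function accepts any iterable of lines, it works on an open file as well as on a small in-memory sample for testing.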

Combining data from different sources makes analysing data yourself even trickier. In addition, correct interpretation of the data is very important, which again requires time, knowledge and experience. For example, you need to know exactly which data fields to use to assess whether a vehicle has arrived on time. You need to know whether the measured arrival times are correct. And if arrival times are missing, you need to know why they are missing before you can safely leave them out of your analysis. In other words, data quality plays a very large part (!) in correct interpretation. Once you have all this in place, the final step, and one not to overlook, is to create visually appealing and clear figures and diagrams. You need to know how to create these so that you can easily draw conclusions.

Simacan understands that all this takes time, effort, money and expertise. That is why we prepared this ready-to-use dashboard for our users: the first step towards gaining those insights into your transport operations.

Curious about what we can do for your organisation? Please contact us for more information or a free demo.

Authors: Anne Siersema & Marije Gemmink (Simacan Data Scientists)
