As a response to the coronavirus (COVID-19) pandemic, Neuralet has released an open-source application that helps people practice physical distancing rules in different places such as retail spaces, construction sites, factories, healthcare facilities, etc. This open-source solution is now available as part of a standalone product at Lanthorn.ai.
This tutorial provides a technical overview of the approach to Smart Social Distancing. If you want to skip the implementation details, head to our GitHub repository to read the Smart Social Distancing application setup guide. You can read about our previous application architecture (outdated) in this article.
Familiarity with deep learning computer vision methods, Docker, and edge deep learning devices, such as Jetson Nano or Edge TPU, is recommended to follow this tutorial.
Social distancing (also called physical distancing) is one approach to controlling the spread of contagious diseases such as COVID-19, which is currently causing one of the largest pandemics in human history.
Wearing facemasks and maintaining social distancing are the most effective solutions to contain the spread of COVID-19. However, enforcing these policies poses a significant challenge to policymakers.
Please visit this blog post to read more about why current approaches fail and why we need a more robust approach.
Smart Social Distancing application
Our approach uses artificial intelligence and computing devices from personal computers to AI accelerators such as NVIDIA Jetson or Coral Edge TPU to detect people in different environments and measure adherence to social distancing guidelines. The Smart Social Distancing application provides statistics to help users identify the crowded hotspots and can give proper notifications every time social distancing rules are violated.
A high-level perspective
Smart Social Distancing application uses existing cameras to analyze physical distancing measures. The video will be streamed from a video feed to the main device, such as an Edge AI board or a personal computer. The main device processes the video and writes the analytical data on external storage. Analytical data only include statistical data, and no video or identifiable information is stored. Different graphs and charts visualize these statistics in the Dashboard. The user can connect to the Dashboard and view the analytical data, export it, and make decisions.
The user can access the Dashboard from a web browser. To ensure security and ease of development, requests to read the analytical data are sent directly to the main device by the web browser. As long as these requests and responses are encrypted, the Dashboard server cannot access the board’s stream and the analytical data. Since no data needs to be stored on the cloud, and no video file is stored on the analytics storage, the Smart Social Distancing application completely preserves the users’ privacy.
We have achieved shippability and accessibility by “Dockerizing” the Smart Social Distancing application. The application architecture consists of two main parts that should be run separately: the Dashboard and the Processor.
The Dashboard consists of a React application served by a FastAPI backend, which provides the user with the necessary tools to communicate with the board and visualize the analytical data.
The Processor consists of an API and a Core; the Core is where the AI engine and the application logic are implemented. The API receives external commands from the user and communicates with the Core.
We will explain each component in more detail in the following sections.
Dashboard: an overview
The Dashboard is where the visualizations happen; the user can find various charts and diagrams to better understand the measured statistics. The Dashboard interface is a React app served by a FastAPI backend.
The Dashboard consists of two Dockerfiles:
frontend.Dockerfile: Builds the React application.
web-gui.Dockerfile: Builds a FastAPI backend that serves the React app built in the previous Dockerfile.
To run the Dashboard, you should build the frontend before building the web-GUI Dockerfile. Building the Dashboard is resource-intensive, and it is troublesome on devices with limited memory, such as Coral Edge TPU. Therefore, we have separated the Dashboard from the Processor so that users can skip building the Dashboard if they wish. Users should not encounter any problem building the Dashboard on other supported devices with higher capacities, such as x86 and most Jetson platforms.
Since the Dashboard is separated from the Processor, the user can still read the old statistics from the analytics server even if the Processor is not running (provided that the analytics server is accessible). However, any request to the Processor’s API will fail, indicating that the Processor is not accessible at the time. On the other hand, regardless of whether the Dashboard is running or not, the Processor can run independently on the main device and store the statistics on the analytics storage. Users can access these statistics through the Dashboard at any time.
If you plan to make changes to the web-GUI Dockerfile, you need to ensure that you are using the latest version of the frontend Docker image. Otherwise, some inconsistencies may occur.
Processor: an overview
The application API, the AI core, and all the mathematical calculations to measure social distancing have been implemented in the Processor.
The Processor is made up of two main components: the Processor API and the Processor Core. The Processor API receives an API call from the outside world (the user) and sends a command to the Processor Core. The Core communicates with the API and sends the response according to the received command (if approved). Two queues, shared between the Core and the API, store the API commands and responses and enable the Core and the API to communicate.
Figure 2 illustrates the different components of the Processor. The arrows indicate the relations between these components. In the following sections, each component will be explained in more detail.
The Processor Core is where the AI engine and the application logic are implemented. The Core itself has several components and listens to the Processor’s API on a queue called the cmd_queue. The Core creates an internal thread to perform the actual processing in a multithreaded fashion. This way, we can ensure that the Core is always listening on the cmd_queue, even while processing videos.
Deep learning algorithms are implemented in a component named CvEngine (which is an alias for the Distancing component in our implementation). CvEngine is responsible for processing the video and extracting the statistical information. Note that some of the implemented algorithms, such as pedestrian detection, are device-specific, whereas other parts, such as extracting social distancing violations, are common to all the supported devices.
The Core keeps track of the tasks at hand and acts according to the received command. When the Core receives a command, it puts the proper response on a queue called the result_queue, assuming that the API is waiting on the result_queue for the response. Note that the Processor Core creates these queues and the API registers to them. If the Core component is not ready yet, the API will wait until the Core is up and ready, and the queues are accessible. We will explain these commands and their responses in more detail in the following sections.
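The command and response flow described above can be sketched with Python's standard queue and threading modules. This is only an illustration under simplifying assumptions: the Core class, the send helper, and the SHUTDOWN command are hypothetical, and everything runs in a single process, whereas the real Core and API are started as separate scripts.

```python
import queue
import threading
import time


class Core:
    """Toy stand-in for the Processor Core (all names here are illustrative)."""

    def __init__(self):
        # The Core creates both queues; the API side only uses them.
        self.cmd_queue = queue.Queue()
        self.result_queue = queue.Queue()
        self._worker = None
        self._stop_requested = threading.Event()

    def _process_video(self):
        # Placeholder for the real video loop: idle until asked to stop.
        while not self._stop_requested.is_set():
            time.sleep(0.01)

    def _handle(self, cmd):
        running = self._worker is not None and self._worker.is_alive()
        if cmd == "PROCESS_VIDEO_CFG":
            if running:
                return False  # another process is already running
            self._stop_requested.clear()
            self._worker = threading.Thread(target=self._process_video, daemon=True)
            self._worker.start()
            return True
        if cmd == "STOP_PROCESS_VIDEO":
            if not running:
                return False  # nothing to stop
            self._stop_requested.set()
            self._worker.join()
            return True
        return False  # unknown command

    def serve(self):
        # Keeps listening on cmd_queue while the worker thread processes video.
        while True:
            cmd = self.cmd_queue.get()
            if cmd == "SHUTDOWN":  # hypothetical command, only ends this demo
                self._stop_requested.set()
                break
            self.result_queue.put(self._handle(cmd))


def send(core, cmd):
    """What the API side does: push a command, then wait for the response."""
    core.cmd_queue.put(cmd)
    return core.result_queue.get(timeout=5)


core = Core()
threading.Thread(target=core.serve, daemon=True).start()
responses = [
    send(core, "PROCESS_VIDEO_CFG"),
    send(core, "PROCESS_VIDEO_CFG"),
    send(core, "STOP_PROCESS_VIDEO"),
    send(core, "STOP_PROCESS_VIDEO"),
]
core.cmd_queue.put("SHUTDOWN")
print(responses)  # [True, False, True, False]
```

Because the video loop runs in its own thread, the second start request is answered (with false) right away instead of waiting for the first video to finish.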
The Processor API is where the external requests (by the user) are received. By these requests, we mean the supported API endpoints. Here is a list of supported commands with a description of what they do:
| Endpoint | Description |
|---|---|
| PROCESSOR_IP:PROCESSOR_PORT/process-video-cfg | Sends command PROCESS_VIDEO_CFG to the Core and returns the response. It starts to process the video addressed in the configuration file. A true response means that the Core will try to process the video (with no guarantee), and a false response indicates that the process cannot start now. For example, it returns false when another process is already requested and running. |
| PROCESSOR_IP:PROCESSOR_PORT/stop-process-video | Sends command STOP_PROCESS_VIDEO to the Core and returns the response. It stops processing the video at hand and returns a true or false response depending on whether the request is valid or not. For example, it returns false when no video is currently being processed. |
| PROCESSOR_IP:PROCESSOR_PORT/get-config | Returns the config used by both the Processor’s API and Core. Note that the config is shared between the API and the Core. This command returns a single configuration set in JSON format according to the config-*.ini file. |
| PROCESSOR_IP:PROCESSOR_PORT/set-config | Sets the given set of JSON configurations as the config for the API and the Core and reloads the configurations. Note that the config is shared between the API and the Core. When setting the config, the Core’s engine restarts so that all the methods and members (especially those initiated with the old config) can use the updated config. This stops any video processing in progress. |
Table 1. Supported API endpoints with their descriptions.
Some requests are directly related to the Core, such as process-video-cfg, while others only affect how the Core works, such as set-config. In the latter case, the API is responsible for sending proper commands to the Core. For example, if the API receives a command to update some parameters in the config file, it sends a restart command to the CvEngine component inside the Core to update config parameters before continuing further processing.
The start_services.bash file is a script that starts the Core and the API by running run_processor_api.py Python scripts. Once the Core and the API have started, the queues are ready, and the connection between these components is established, the API starts listening for requests.
To start processing the video by default when everything is ready, the sample_startup.bash file runs a command in a while loop that calls the process-video-cfg API endpoint until a true response is received.
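The retry logic of that startup loop can be sketched in Python as follows. This is not the contents of the bash script: wait_until_started and its parameters are hypothetical, and the call to the endpoint is replaced by an injected callable so the sketch runs without a Processor.

```python
import time


def wait_until_started(start_processing, retry_delay=1.0, max_attempts=None):
    """Call `start_processing` until it returns True; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        if start_processing():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError("the Core did not accept the request")
        time.sleep(retry_delay)


# Fake endpoint for demonstration: refuses twice, then accepts.
replies = iter([False, False, True])
attempts_needed = wait_until_started(lambda: next(replies), retry_delay=0.0)
print(attempts_needed)  # 3
```

In a real deployment the callable would send a request to the process-video-cfg endpoint and return the true or false response it receives.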
The Smart Social Distancing application uses two config files: config-frontend.ini is used by the Dashboard, and config-*.ini determines the Processor configurations.
config-*.ini is a device-specific config file used by the Processor and shared between the API and the Core. More specifically, the API and the Core each build a config instance from the ConfigEngine class separately. However, they pass the same config-*.ini file to the ConfigEngine constructor. Therefore, they use the same configurations until a request is made by the user to set the config file.
When the set-config command is received, the API creates the new config object, writes the updated parameters in the config file, and reloads the config object. Then, the API restarts the Core by sending the stop-process-video command followed by a process-video-cfg command to avoid inconsistencies and ensure that the Core is also using the same configurations. Since the Core updates the config files every time process-video-cfg is executed, any mismatch is resolved when the Core restarts.
A key point about the config files is how to set the host and port. In config-frontend.ini, there is a Processor section. You should set the host and port in this section to the host and port you want to use when running the Processor Docker container. The App section in this config file specifies the host and port on which the Dashboard is accessible. In config-*.ini, the host and port under the API section specify the Processor API address, which is only accessible inside the Processor’s Docker container, and the user does not need to change its value. This port will be forwarded to the port specified in the Docker run command, which should match the port specified under the Processor section in the config-frontend.ini file. We will provide an example to explain how to set the host and port.
For example, let us say we want to run the Processor on port 2023, and for some customization reasons, we have changed the Processor’s Dockerfile so that the Processor API uses port 8040 inside the Docker container (the default is 8000). We should apply the following changes before running the application:
- Change API port in
- Change the Processor port in
- Forward port 8040 of the Processor’s Docker container to the HOST_PORT, which is 2023. For example, if we are running the application on Jetson Nano, we should run this command:
docker run -it --runtime nvidia --privileged -p 2023:8040 -v "$PWD/data":/repo/data -v "$PWD/config-jetson.ini":/repo/config-jetson.ini neuralet/smart-social-distancing:latest-jetson-nano
There are some other parameters that you can change in the config-*.ini file. For example, under the [Detector] section, you can modify the MinScore parameter to define the person detection threshold. You can also change the social distancing threshold by altering the value of DistThreshold. You can read the comments in the config-*.ini file to learn about other parameters and their possible values.
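As a rough illustration of how such parameters can be read, the sketch below parses a small INI excerpt with Python's standard configparser module. The excerpt is hypothetical: the MinScore value and the placement of DistThreshold under [PostProcessor] are assumptions, and the get_section_dict helper only mimics the call shape used elsewhere in this article, not the project's actual ConfigEngine class.

```python
import configparser

# Hypothetical excerpt of a device config file. The MinScore value and the
# section that holds DistThreshold are assumptions made for this example.
SAMPLE_CONFIG = """
[Detector]
MinScore = 0.25

[PostProcessor]
DistThreshold = 150
"""


def get_section_dict(parser, section):
    """Return one section of the config as a plain dict of strings."""
    return dict(parser[section])


parser = configparser.ConfigParser()
parser.optionxform = str  # keep the capitalization of parameter names
parser.read_string(SAMPLE_CONFIG)

# All INI values are strings, so numeric parameters need explicit conversion.
min_score = float(get_section_dict(parser, "Detector")["MinScore"])
dist_threshold = float(get_section_dict(parser, "PostProcessor")["DistThreshold"])
print(min_score, dist_threshold)  # 0.25 150.0
```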
The config files are copied to the Docker images when they are built. Be aware that if you change the config files, you need to rebuild the Docker images to make sure the application is using the latest versions of the config files. The build time will be much less than the first time you build the Docker images because the copying will take place in the last layers of the Dockerfiles. Therefore, the Docker images use the previously built Docker layers and only copy the necessary files into the new image.
The application logic
Most of the Smart Social Distancing application’s mathematical calculations are implemented in the Distancing class in the libs/distancing.py file. You can replace the Distancing class methods with those matching your specific needs to customize the Smart Social Distancing application. For example, you can implement your own distance measuring algorithm.
We will now explain the building blocks of the Distancing class. We recommend reading the source code to understand the application logic better.
Image pre-processing and inference
The __process method applies some pre-processing steps to each input frame, such as resizing the image to the configured resolution and converting image colors to match the RGB format. It then imports the proper Detector based on the device in use and opens the video file to run inference and get bounding boxes for the detected pedestrians.
By default, Smart Social Distancing uses SSD-MobileNet-V2 trained on the SAIVT-SoftBio dataset by Neuralet for inference. Other possible models for your device are written as comments in the config file that matches your device name. In this file, you can change the Name parameter under the [Detector] settings to specify the model you want to work with.
At Neuralet, we have trained the Pedestrian-SSD-MobileNet-V2 and Pedestrian-SSDLite-MobileNet-V2 models on the Oxford Town Centre dataset to perform pedestrian detection using the TensorFlow Object Detection API. We experimented with different models on both Jetson Nano and Coral Dev Board devices. See Table 2 for more details.
| Device Name | Model | Dataset | Average Inference Time (ms) | Frame Rate (FPS) |
|---|---|---|---|---|
| Coral Dev Board | SSD-MobileNet-V2 | Oxford Town Centre | 5.7 | 175 |
| Coral Dev Board | SSD-MobileNet-V2-Lite | Oxford Town Centre | 6.1 | 164 |
Table 2. Inference performance results from Jetson Nano and Coral Dev Board
Bounding box post-processing
Some post-processing techniques are applied to the detected bounding boxes before calculating distances to minimize errors. There are two reasons to apply post-processing techniques. First, since we are using a general-purpose object detection model trained on COCO with 80 different classes, including pedestrians, there are some false positive outputs for the pedestrian detection task. These false positives often appear as extra-large boxes (Figure 3) or duplicate boxes specifying a single object (Figure 4). In the following lines of code, we address these two problems by filtering large and duplicate boxes:
new_objects_list = self.ignore_large_boxes(objects_list)
new_objects_list = self.non_max_suppression_fast(new_objects_list, float(self.config.get_section_dict("PostProcessor")["NMSThreshold"]))
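To make the two filters concrete, here is a minimal pure-Python sketch. It is not the project's implementation: the box layout (x_min, y_min, x_max, y_max, score) in normalized coordinates, the max_area limit, and the greedy IoU-based suppression are all assumptions chosen for illustration, and the project's non_max_suppression_fast may use a different overlap criterion.

```python
def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max, ...)."""
    x_min, y_min, x_max, y_max = box[:4]
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (box_area(a) + box_area(b) - inter)


def ignore_large_boxes(boxes, max_area=0.25):
    """Drop boxes covering more than `max_area` of the frame (hypothetical limit)."""
    return [b for b in boxes if box_area(b) <= max_area]


def non_max_suppression(boxes, iou_threshold):
    """Greedy NMS: keep the highest-scoring box of each overlapping group."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


detections = [
    (0.10, 0.10, 0.20, 0.40, 0.90),  # person A
    (0.11, 0.11, 0.21, 0.41, 0.60),  # near-duplicate of person A
    (0.50, 0.20, 0.60, 0.50, 0.80),  # person B
    (0.00, 0.00, 0.90, 0.90, 0.70),  # implausibly large box
]
filtered = non_max_suppression(ignore_large_boxes(detections), iou_threshold=0.5)
print(filtered)  # only person A and person B remain
```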
The second reason to apply post-processing is to track objects smoothly. In the absence of post-processing, the object detection model may lose track of the detected pedestrians from one frame to the other. This issue can be solved by designing a tracker system that follows the detected pedestrians in different frames.
tracked_boxes = self.tracker.update(new_objects_list)
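The article does not specify which tracking algorithm is used, so the following is only a sketch of one simple option, a nearest-centroid tracker. The CentroidTracker class, its max_distance parameter, and the box layout are hypothetical and are not taken from the project.

```python
import math


class CentroidTracker:
    """Assign stable ids by matching each box to the nearest previous centroid."""

    def __init__(self, max_distance=0.1):
        self.max_distance = max_distance  # hypothetical matching radius
        self.next_id = 0
        self.centroids = {}  # id -> (cx, cy) from the previous frame

    def update(self, boxes):
        """Return {person_id: box} for the boxes of the current frame."""
        tracked = {}
        unmatched = dict(self.centroids)
        for x_min, y_min, x_max, y_max in boxes:
            center = ((x_min + x_max) / 2, (y_min + y_max) / 2)
            best_id = min(
                unmatched,
                key=lambda i: math.dist(unmatched[i], center),
                default=None,
            )
            if best_id is not None and math.dist(unmatched[best_id], center) <= self.max_distance:
                del unmatched[best_id]  # each previous id is used at most once
            else:
                best_id = self.next_id  # no close match: start a new track
                self.next_id += 1
            tracked[best_id] = (x_min, y_min, x_max, y_max)
        # Tracks that found no match in this frame are simply forgotten.
        self.centroids = {
            i: ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for i, b in tracked.items()
        }
        return tracked


tracker = CentroidTracker()
frame1 = [(0.10, 0.10, 0.20, 0.40), (0.50, 0.20, 0.60, 0.50)]
frame2 = [(0.52, 0.20, 0.62, 0.50), (0.12, 0.10, 0.22, 0.40)]  # order swapped
first = tracker.update(frame1)
second = tracker.update(frame2)
print(second)
```

Even though the detector returns the boxes in a different order in the second frame, each person keeps the id assigned in the first frame. A production tracker would also need to handle occlusions and short detection gaps, which this sketch ignores.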
The __process method calls calculate_distancing to calculate the real-world distances between each pair of bounding boxes. We need to decide how to measure the distance between two bounding boxes to estimate the real-world distances between detected pedestrians.
There are a few methods to calculate the distance between two rectangles; however, choosing the right approach depends on different specifications, such as the data characteristics and the camera angle. Since some measures, such as the distance from the camera to the center of the bounding boxes or the depth data, are not available in most cases, we implemented a calibration-less method as well as a calibrated one. The calibrated method is more accurate than the calibration-less method. However, the latter is easier to implement.
Let us explain more about each method.
In this approach, we map the distances in the image (in pixels) to physical distance measures (in centimeters) by assuming that each person is H = 170 cm tall. The goal is to measure L, which is the distance between person 1 and person 2 in centimeters. Let L1, L2, L3, and L4 denote the distances between the four corners of the bounding boxes specifying person 1 and person 2 in centimeters. We can make one of the following assumptions to address this problem (see Figure 5):
1- Assume that L = L1 = L2 = L3 = L4 approximately.
2- Assume that L = min (L1, L2, L3, L4).
Following the first assumption makes the approach easier to understand and requires fewer calculations. However, the second method produces more accurate results.
We explain how to calculate L1 in this section. If you want to use the first method, put L = L1. Otherwise, calculate L by setting it equal to the minimum value of L1, L2, L3, and L4. You can calculate L2, L3, and L4 following the same approach.
To calculate L1, we first calculate the horizontal and vertical distances between person 1 and person 2 in pixels, denoted as DX and DY:
DX = Xp_1 - Xp_2,
DY = Yp_1 - Yp_2
Then, by the H = 170 cm assumption, we map the distances in pixels to distances in centimeters, where H1 and H2 are the heights of the two bounding boxes in pixels:
Lx = DX * ((1 / H1 + 1 / H2) / 2) * H,
Ly = DY * ((1 / H1 + 1 / H2) / 2) * H
Finally, we apply the Pythagorean formula to get L1:
L1 = sqrt(Lx^2 + Ly^2)
We can now calculate L based on our initial assumption, which is a reasonable estimate of the physical distance between two people.
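The calibration-less calculation can be sketched in Python as follows. The function names and the box layout (x_min, y_min, x_max, y_max) in pixels are hypothetical. The sketch also assumes that H1 and H2 are the pixel heights of the two bounding boxes and that L1 to L4 pair up corresponding corners of the two boxes.

```python
import math

PERSON_HEIGHT_CM = 170.0  # the H = 170 cm assumption


def corner_distance_cm(p1, p2, h1, h2):
    """Distance in cm between two corner points, following the Lx and Ly formulas.

    p1, p2: (x, y) pixel coordinates; h1, h2: bounding-box heights in pixels.
    """
    scale = ((1 / h1 + 1 / h2) / 2) * PERSON_HEIGHT_CM  # centimeters per pixel
    lx = (p1[0] - p2[0]) * scale
    ly = (p1[1] - p2[1]) * scale
    return math.hypot(lx, ly)  # sqrt(lx^2 + ly^2)


def corners(box):
    """Four corners of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return [(x_min, y_min), (x_max, y_min), (x_min, y_max), (x_max, y_max)]


def box_distance_cm(box1, box2, use_min=True):
    """Estimate L as min(L1..L4) when use_min is True, otherwise as L1."""
    h1 = box1[3] - box1[1]
    h2 = box2[3] - box2[1]
    distances = [
        corner_distance_cm(a, b, h1, h2)
        for a, b in zip(corners(box1), corners(box2))
    ]
    return min(distances) if use_min else distances[0]


# Two people of equal apparent height (170 px), standing 150 px apart.
person_1 = (100, 100, 150, 270)
person_2 = (250, 100, 300, 270)
print(round(box_distance_cm(person_1, person_2), 1))  # 150.0
```

In this example each pixel corresponds to roughly one centimeter because both boxes are 170 pixels tall, so the 150-pixel gap maps to about 150 cm.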
In your device config file:
- You can set the DistMethod parameter under the PostProcessor section to specify the approach you want to apply for distance measurement. It is set to CenterPointsDistance by default. The alternative calculates L as the minimum value of the distances between all four corners of the two bounding boxes (L = min (L1, L2, L3, L4)).
- The minimum physical distancing threshold is set to 150 centimeters by default. You can set a different threshold for physical distancing by changing the value of DistThreshold.
The calibrated method takes in a homography matrix and calculates the real-world distances by transforming 2D coordinates of the image to the real-world 3D coordinates. We will explain how this method works in a separate post.
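Until that post is available, the general idea of applying a homography can be sketched as below. This is a generic illustration of the technique rather than the project's calibrated method: the function names are hypothetical, the matrix is a made-up pure scaling, and the transformed points are treated as positions on the ground plane.

```python
def apply_homography(h, point):
    """Map an image point (x, y) through a 3x3 homography given as nested lists."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)  # divide by w to leave homogeneous coordinates


def ground_distance(h, p1, p2):
    """Euclidean distance between two image points after the transformation."""
    (x1, y1), (x2, y2) = apply_homography(h, p1), apply_homography(h, p2)
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5


# Hypothetical matrix: a pure scaling of 0.5 cm per pixel, for illustration only.
H_MATRIX = [
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
print(ground_distance(H_MATRIX, (100, 400), (400, 800)))  # 250.0
```

A real homography would come from a calibration step for the specific camera, and its last row would generally not be (0, 0, 1), which is why the division by w matters.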
Some data visualizations are provided to give the user better insights into how well physical distancing measures are being practiced. The data analytics and visualizations can help decision-makers identify the peak hours and detect the high-risk areas to take effective actions accordingly.
The visualizations include the camera feed, the bird’s-eye view, the pedestrian plot, and the environment score plot.
The camera feed shows a live view of the video being processed (see Figure 6). Colored bounding boxes represent detected pedestrians. You can see the frame rate and environment score (explained below) at the bottom of the video screen.
This video shows the bird’s-eye view of the detected pedestrians (see Figure 6). The red color indicates a social distancing violation, while the green color represents a safe distance between the detected pedestrians in each frame.
The pedestrian plot (Figure 7) shows the total number of detected pedestrians (the blue graph) and the number of detected pedestrians who are violating the physical distancing threshold (the orange graph) over time.
Environment score plot
This plot (Figure 8) shows the “environment score” over time. The environment score is an index defined to evaluate how well social distancing measures are being practiced in an environment. Two different formulas are implemented to measure the environment score. The mx_environment_scoring function takes in the number of pedestrians who are violating the social distancing rules and returns the environment score. This function calculates the environment score using the following formula:
env_score = 1 - np.minimum((violating_pedestrians / MAX_ACCEPTABLE_CAPACITY), 1)
MAX_ACCEPTABLE_CAPACITY is the maximum capacity of an environment considering the physical distancing measures. For instance, if the DistThreshold is set to 200 centimeters, MAX_ACCEPTABLE_CAPACITY is the maximum number of people who can stand at least 200 centimeters away from each other in that environment.
The mx_environment_scoring_consider_crowd function takes in two parameters: the number of detected pedestrians and the number of pedestrians who are violating the physical distancing limits. This function returns the environment score using a different formula that reflects how crowded an area is with respect to its full capacity:
env_score = 1 - np.minimum(((violating_pedestrians + detected_pedestrians) / (MAX_CAPACITY + MAX_ACCEPTABLE_CAPACITY)), 1)
In this formula, the MAX_CAPACITY parameter represents an environment’s full capacity without considering the physical distancing limits. This parameter reflects how large the environment is, in terms of area, and can bring area crowdedness into the calculations.
Note that in both formulas, the environment score takes a value between 0 and 1, and the more violating cases appear in a frame, the lower the environment score of that frame becomes.
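Both formulas can be tried out with the short sketch below. It departs from the article's code in two labeled ways: the capacities are passed in as arguments instead of being read from constants, and Python's built-in min replaces np.minimum because only scalars are involved. The capacity values in the example are hypothetical.

```python
def mx_environment_scoring(violating_pedestrians, max_acceptable_capacity):
    """Score based only on the number of violating pedestrians."""
    return 1 - min(violating_pedestrians / max_acceptable_capacity, 1)


def mx_environment_scoring_consider_crowd(
    detected_pedestrians,
    violating_pedestrians,
    max_capacity,
    max_acceptable_capacity,
):
    """Score that also reflects how crowded the area is."""
    ratio = (violating_pedestrians + detected_pedestrians) / (
        max_capacity + max_acceptable_capacity
    )
    return 1 - min(ratio, 1)


# Hypothetical environment: room for 60 people, or 20 with distancing.
MAX_CAPACITY = 60
MAX_ACCEPTABLE_CAPACITY = 20

print(mx_environment_scoring(5, MAX_ACCEPTABLE_CAPACITY))  # 0.75
print(mx_environment_scoring(30, MAX_ACCEPTABLE_CAPACITY))  # 0
print(mx_environment_scoring_consider_crowd(15, 5, MAX_CAPACITY, MAX_ACCEPTABLE_CAPACITY))  # 0.75
```

The second call shows the clamping effect of the minimum: once the number of violating pedestrians exceeds the acceptable capacity, the score stays at 0 instead of going negative.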
Data logging system
We have implemented a logging system for the Smart Social Distancing application that can be very useful for data analytics. By storing statistical data, such as the overall safety score, the number of people in the space over the past day, and total risky behaviors, users can gain insights into different social distancing measures. Let us explore how the logging system works.
The logging system consists of three files: csv_processed_logger.py, csv_logger.py, and loggers.py. The first two scripts each implement an update method that is called frequently from the loggers.py file. You can create your own logger module by implementing a customized update method. There are some parameters in the config file that can modify the logging system behavior:
Name: the name of the logger you want to work with.
TimeInterval: the frequency of the log file updates in seconds.
LogDirectory: the directory in which the log files are stored.
The update method in the csv_processed_logger.py file writes information about each time interval into the object log file. Each row of the object log file describes one detected object (person), including its id and bounding box coordinates.
The csv_logger.py script stores two log files: the object log file and the distance log file. The two log files store different kinds of information.
- The object log keeps track of the information about all the detected objects in each time interval. Frame number, person id, and bounding box coordinates are stored in this log file.
- The distance log stores the details of physical distancing violation incidents, such as the time at which physical distancing rules are violated, the ids of the persons crossing the distance threshold, and the distance between them.
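A custom logger in the spirit of the object log could look like the sketch below. The class name, column names, and update signature are hypothetical, and placing the time-interval check inside update is a simplification for this example; in the project, the update method is called from loggers.py.

```python
import csv
import io


class CsvObjectLogger:
    """Writes one row per detected person, at most once per time interval."""

    FIELDS = ["frame_number", "person_id", "x_min", "y_min", "x_max", "y_max"]

    def __init__(self, stream, time_interval):
        self.writer = csv.writer(stream)
        self.writer.writerow(self.FIELDS)
        self.time_interval = time_interval  # seconds, like TimeInterval
        self.last_logged = None

    def update(self, timestamp, frame_number, objects):
        """Log `objects` ({person_id: box}) if the interval has elapsed."""
        if (
            self.last_logged is not None
            and timestamp - self.last_logged < self.time_interval
        ):
            return False  # too soon since the last write
        for person_id, box in objects.items():
            self.writer.writerow([frame_number, person_id, *box])
        self.last_logged = timestamp
        return True


# An in-memory stream stands in for a file inside LogDirectory.
log = io.StringIO()
logger = CsvObjectLogger(log, time_interval=0.5)
logger.update(0.0, 1, {0: (10, 20, 50, 120)})
logger.update(0.2, 2, {0: (12, 20, 52, 120)})  # skipped: only 0.2 s elapsed
logger.update(0.6, 3, {0: (15, 21, 55, 121), 1: (200, 30, 240, 130)})
rows = log.getvalue().splitlines()
print(len(rows))  # 4: the header plus three logged rows
```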
This project is sponsored by Lanthorn. We are happy to help you deploy the Smart Social Distancing application for commercial purposes. Contact us at Lanthorn for more information.