2021-08-21 • Luc Kusters


How to use an IDS uEye Camera to analyze live video with Darknet YOLO using OpenCV-Python

NOTE: This article is a work in progress. I will update it regularly when I have time.

Introduction

In a recent machine learning project I had to use a uEye camera to analyze experimental data using the Darknet/YOLO neural network. At work, my predecessor had set up Darknet such that it ran without a problem on video files and images, but my task was to implement live analysis. I set about doing this in Python, as it has a great implementation of OpenCV and there are also open source uEye libraries (pyueye) available in Python, so it seemed a natural choice. Moreover, I like coding in Python, because it's an easygoing language, so there's that too.

To my amazement, there was not a lot of information available online on how to use a uEye camera for this purpose. Darknet supplies a terminal command to run Darknet on a webcam, which works for most simple cameras, but it doesn't work for uEye cameras because these require more settings to be configured than simple webcams, and those settings, as it turns out, aren't configured automatically by this 'standard' command. So a custom code implementation was a must. I also saw a lot of webcam-based code examples that use OpenCV-Python directly to open the camera with cv2.VideoCapture(), which I think is what Darknet also does behind the scenes, and I even saw an example where some uEye camera settings could be configured through OpenCV, but other very critical settings could not. This made me rather wary at first, because it seemed there was maybe no neatly implemented way to couple a uEye camera to Darknet, which would have meant making a deep dive into Darknet's source code.

I had to do a lot of digging, but at last I found a solution. A very simple one too, and against my fears it turned out everything was indeed already implemented quite well, though documented quite terribly every step of the way, if I may let my frustration shine through a little. Darknet already supplies a Python interface in its GitHub repository. There are a few different forks of Darknet, and the interface I liked most was AlexeyAB's version. It's not really explained how to use it, and there was one major obstacle to overcome concerning the image type to be passed from Python to Darknet, which I will cover later, but I found a satisfying solution to that.

In this blog post I will explain how to control your uEye camera using the pyueye module together with pypyueye, a pyueye wrapper, and I'll show you how to use separate threads for camera input and display so that these processes don't run in series with the image detection. Finally, I will cover how to hook up Darknet/YOLO to this for live image detection. I will assume that you already have Darknet, pyueye, and OpenCV installed, and will not cover any of the installation steps, as they are explained well enough elsewhere. Since pypyueye is an abandoned project, I'll cover its installation and some changes that I made. Note that I have installed OpenCV with GPU support, which is not available in the standard pip opencv-python package; that package works as well, though with lower performance, so I can't guarantee that you will have the same satisfying results if you're not using a GPU.

Pyueye and pypyueye

pyueye is a "low level" Python interface and has all the tools needed to control a uEye camera. However, it's also tedious to use because of this. Luckily, someone already implemented quite a few ease-of-life functions in the pyueye wrapper pypyueye. This project is by no means perfect, and as mentioned before, it has been abandoned by its only developer and may even be out of date for newer cameras for all I know. Still, it's a good starting point for working with uEye cameras, so you don't have to start from scratch. I had to make some adjustments myself to get a satisfying live camera feed, which I will also explain.

Installing pypyueye

Installing pypyueye is very simple, and it is also described on its GitHub page. Using pip, you can install it by simply running:

$ git clone https://github.com/galaunay/pypyueye
$ cd pypyueye
$ pip install .

If you're on Windows or otherwise don't have access to these commands, simply download the repository from GitHub as a zip file, unzip it, open a terminal in the install location, run python setup.py install, and it should install.

If you want to check if you installed it correctly simply try an import in Python:

# PYTHON CODE
import pypyueye

As an alternative to installing, you can also simply place the cloned/downloaded repository inside your project and import it locally.
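If you go that route and the repository does not sit directly inside your project folder, you may have to tell Python where to find it first. A minimal sketch (the path is of course just a placeholder for wherever you put the cloned repository):

# PYTHON CODE

import sys
sys.path.append("path/to/pypyueye")  # placeholder: the cloned repository's root folder
import pypyueye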

Initializing a pypyueye Camera object

Pypyueye provides the Camera class, which lets you easily interface with your uEye camera. Simply import Camera from pypyueye and get going! You only need to provide a device id if you have multiple cameras hooked up to your computer; by default it's set to 0, and with multiple cameras you might have to pass 1, 2, 3, and so on. From there you can access the interface options via the camera object, such as setting the uEye color mode, fps, exposure, etc.

# PYTHON CODE
from pypyueye import Camera
from pyueye import ueye

with Camera(device_id=0, buffer_count=3) as cam:
  cam.set_colormode(ueye.IS_CM_RGB8_PACKED)
  cam.set_exposure_auto(1) # set automatic exposure
  cam.set_gain_auto(1)     # set automatic gain
  cam.set_fps(5)

You can grab a single live frame using the capture_image() method:

# PYTHON CODE
img = cam.capture_image()

The img object is nothing but a simple numpy array holding light intensity values between 0 and 255 for every pixel, for three color channels. The numpy array has a shape of (height, width, channels). We can use OpenCV to display this frame by simply passing the img object to cv2.imshow() like so:

# PYTHON CODE

import cv2
cv2.imshow("window name", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

The cv2.waitKey(0) call freezes the window until any key is pressed (while the display window is selected). This way, you can display a single frame for an indeterminate amount of time. If you don't call waitKey(), the window displaying the image instantly closes again. For proper clean-up we also end with the cv2.destroyAllWindows() command.

But what if we don’t want to display a single frame but instead want to display a live stream? Easy, have a look at the example below:

# PYTHON CODE

wait_ms = 1
while True:
  img = cam.capture_image()
  cv2.imshow("window name", img)
  # waitKey returns -1 if no key was pressed within wait_ms
  if cv2.waitKey(wait_ms) != -1:
    break
cv2.destroyAllWindows()

Here we simply open a loop which is permanently set to True so that it runs indefinitely and keeps capturing new images. cv2.imshow() displays the newest image on every pass, and cv2.waitKey(wait_ms) then waits at least wait_ms milliseconds before the image is updated. The loop breaks as soon as any key is pressed, after which we destroy all window instances during clean-up. We can also wait for a specific key by substituting if cv2.waitKey(wait_ms) != -1: with if cv2.waitKey(wait_ms) & 0xFF == ord(YOURKEY):, where YOURKEY could for instance be "q" (note that ord() takes a single character, so for the escape key you would compare against its key code 27 instead), as shown in the sketch below. This is the most basic way to use pypyueye with OpenCV to get a live feed of your camera, but we can do better!
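For example, breaking the loop only when the "q" key is pressed would look like this (a minimal variation of the loop above):

# PYTHON CODE

wait_ms = 1
while True:
  img = cam.capture_image()
  cv2.imshow("window name", img)
  # only break when 'q' is pressed while the window is selected
  if cv2.waitKey(wait_ms) & 0xFF == ord("q"):
    break
cv2.destroyAllWindows()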

Some changes for better live-streaming

There's a problem with the capture_image() function implemented in the Camera class which makes getting live images this way significantly slower than necessary. Even though a buffer preparation command (cam.capture_video()) is supplied, which initializes one or more image buffers, for some reason no utility function is implemented for reading frames directly from this buffer. Instead, the capture_image() function creates a new image buffer each time it is run, as shown in the code below, which is inefficient.

# PYTHON CODE
# Camera.capture_image
class Camera():

  #... other source code ...

  def capture_image(self, timeout=None):
          if timeout is None:
              timeout = self.__get_timeout()
          self.capture_video()
          img_buffer = ImageBuffer() # <- THIS IS THE PROBLEM FOR LIVE VIDEO!
          ret = ueye.is_WaitForNextImage(self.handle(),
                                         timeout,
                                         img_buffer.mem_ptr,
                                         img_buffer.mem_id)
          if ret == ueye.IS_SUCCESS:
              imdata = ImageData(self.handle(), img_buffer)
              data = imdata.as_1d_image()
              imdata.unlock()
              self.stop_video()
          else:
              data = None
          return data

  #... other source code ...

Therefore we need to implement a live-frame readout function ourselves. Luckily this is quite simple, as it is practically the same as capture_image(). For this reason I created a new class called LCamera, which inherits from Camera, with the following changes. You could also implement this directly in the source code, but I didn't want to change anything there. This can no doubt be implemented a little better, and I am thinking about forking pypyueye to implement this functionality properly, but for now this worked for me.

# PYTHON CODE

from pyueye import ueye
from pypyueye import Camera
from pypyueye.utils import ImageData
import numpy as np


class LCamera(Camera):
    def __init__(self, device_id=0, buffer_count=3, fps=10, verbose=False):
        # note: fps is not applied here, set it via set_fps() once the camera is opened
        super().__init__(device_id=device_id, buffer_count=buffer_count)
        self.__b_CaptureVideo = False
        self.verbose = verbose


    def start_video_live_capture(self, wait=False):
        """
        start video capture
        wrapper for Camera.capture_video()
        """
        ret = self.capture_video(wait)
        self.__b_CaptureVideo = True
        return ret


    def stop_video_live_capture(self):
        """
        Stop capturing the video.
        wrapper for Camera.stop_video()
        """
        ret = self.stop_video()
        self.__b_CaptureVideo = False
        return ret


    def capture_live_frame(self, img_buffer=None, timeout=None):
        """
        grab a frame, only works if start_video_live_capture was called
        """
        if self.__b_CaptureVideo:
            if timeout is None:
                timeout = self._Camera__get_timeout() # because python inheritance is strange...
            if img_buffer is None:
                img_buffer = self.img_buffers[0]
            ret = ueye.is_WaitForNextImage(self.handle(),
                                        timeout,
                                        img_buffer.mem_ptr,
                                        img_buffer.mem_id)
            if ret == ueye.IS_SUCCESS:
                imdata = ImageData(self.handle(), img_buffer)
                image = imdata.as_1d_image()
                imdata.unlock()
            else:
                if self.verbose:
                    print("WARNING: image could not be received from camera")
                image = None
            return image
        else:
            if self.verbose:
                print("WARNING: live image capture is turned off, returning 'None'")
            return None


    def capture_live_frames(self, n_images, timeout=None):
        """
        grab multiple frames, only works if start_video_live_capture was called
        """
        if self.__b_CaptureVideo:           
            if n_images > self.buffer_count:
                print(f"WARNING: n_images ({n_mages}) > buffer_count ({self.buffer_count}), setting number of returned images to {self.buffer_count}")
                n_images = self.buffer_count
            images = []
            for i in range(n_images):
                buff = self.img_buffers[i]
                images.append(self.capture_live_frame(img_buffer=buff, timeout=timeout))
            return images

This way, the pre-defined image buffer is actually used and can easily be read out, which for me meant a significant speed-up in image capture (from about 400 ms to the minimum time possible for a specific frame rate).
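For reference, below is a minimal sketch of the earlier display loop rewritten to use the buffered readout. I'm assuming here that the LCamera class above has been saved somewhere importable; the module name lcamera is just a placeholder.

# PYTHON CODE

import cv2
from pyueye import ueye
from lcamera import LCamera  # placeholder: wherever you saved the class above

with LCamera(device_id=0, buffer_count=3) as cam:
    cam.set_colormode(ueye.IS_CM_RGB8_PACKED)
    cam.set_fps(20)
    cam.start_video_live_capture()      # prepare the image buffers once
    while True:
        img = cam.capture_live_frame()  # read from the existing buffer
        if img is None:
            continue
        cv2.imshow("live feed", img)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cam.stop_video_live_capture()
cv2.destroyAllWindows()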

Camera threading for even better performance

Because image capture and image display are executed in series in the current setup, the live video feed is slower than necessary. This is especially true once we consider that we will want to run image detection using Darknet, which, depending on your GPU, may be even slower if it is run in series with image capture and image display. You won't get around the fact that Darknet needs some time to analyze your images before you can display an image containing your detections, but we can get a small boost by simply running image capture and image display on their own threads. For this I implemented the code below.

# PYTHON CODE

from threading import Thread
import time
import cv2

class VideoCaptureThread:
    """
    Class that continuously gets frames from a LCamera object
    with a dedicated thread.
    """

    def __init__(self, LCameraObj):
        self.cam = LCameraObj # <- this *must* be a LCamera object!
        self.stopped = False
        self.frame = None
        self.__thread = None

    def start(self):
        self.cam.start_video_live_capture()
        self.stopped = False
        self.__thread = Thread(target=self.get, args=())
        self.__thread.start()

    def get(self):
        while not self.stopped:
            self.frame = self.cam.capture_live_frame()

    def stop(self):
        self.stopped = True
        self.cam.stop_video_live_capture()

class VideoDisplayerThread:

    def __init__(self, frame=None, window_handle="Video Stream", wait=0):
        self.window_handle = window_handle
        self.frame = frame
        self.stopped = False
        self.wait = wait
        self.__thread = None

    def start(self):
        self.__thread = Thread(target=self.show, args=())
        self.__thread.start()
        return self

    def show(self):
        while not self.stopped:
            if self.frame is not None:
                cv2.imshow(self.window_handle, self.frame)
                if self.wait > 0:
                    # slow down video stream...
                    time.sleep(self.wait)
                if cv2.waitKey(1) == ord("q"):
                    self.stopped = True

    def stop(self):
        self.stopped = True

Now we can get our live images from a separate thread, make our detections, draw the detections on the images, and pass the changed image to the display thread like in the code below. We’ll get to YOLO/Darknet detection in the next section.

# PYTHON CODE

# VideoCaptureThread and VideoDisplayerThread are the classes defined above
from src.CamPypyueye import LCamera as Camera  # my own module containing the LCamera class

with Camera() as cam:
    cam.set_exposure_auto(1)
    cam.set_gain_auto(1)
    cam.set_fps(20)

    # create a video capture thread
    liveCam = VideoCaptureThread(cam)
    liveCam.start() # Start the live feed
    wait_time = 5/cam.get_fps() # slow down image displayer thread
    viewWindow = VideoDisplayerThread(wait=wait_time)
    viewWindow.start()
    while not viewWindow.stopped:
        img = liveCam.frame
        # img_processing ...
        viewWindow.frame = img
    viewWindow.stop()
    liveCam.stop()

Passing live images to YOLO/Darknet

Now that we understand how to efficiently use our uEye camera from Python with OpenCV, we can get to the bread and butter of this article: YOLO/Darknet image detection using our uEye camera. As I mentioned in the introduction, this wasn't as straightforward to implement as I had initially thought, but it's actually not that difficult!

Initializing a YOLO network in OpenCV through the Darknet interface

The Darknet interface allows us to open a pre-trained YOLO model in Python. This model is imported into the network object with which we can run our detections. Loading a network takes some time, but only once at the beginning of our code. Different implementations of the Python interface (darknet.py) have slightly different ways of initializing a network, but all work essentially the same way. Our pre-trained model is essentially saved in three files: (1) a configuration file, (2) a (meta-)data file, and (3) a weights file. The standard "out of the box" model provided with Darknet can detect people and many everyday objects such as cars and bicycles, but for many applications you will have to train your own model. I will not cover how to train your model here, as it is described well elsewhere. AlexeyAB's Python interface supplies the load_network(config_file, data_file, weights, batch_size=1) function, which initializes our network and returns the network, a class_names list, and a class_colors dict, the last of which can be used when drawing our detections onto the video. A simple example of this step is listed below.

# PYTHON CODE

import darknet as dn

net, class_names, class_colors = dn.load_network("path/to/config.cfg", "path/to/meta.data", "path/to/weights.weights")

Be sure to set your paths accordingly!

Converting OpenCV-Python numpy images to Darknet’s IMAGE type for detection

One last obstacle remains: passing our OpenCV ndarray type image to our network. For this we have to somehow convert the numpy array into the right format for our network to be able to read it, namely the IMAGE type defined in Darknet's source code. This proved to be a bit more complicated than I had initially thought, as it isn't documented very well how to do this and I didn't find any satisfactory solutions online. That is, until I had a look at the source code and also found an issue on GitHub where people were asking exactly this question. It had a solution by the user TheMikeyR, which was to change Darknet's source code on the C side to convert a numpy array to the right format; I later noticed this was in fact already implemented in the source code in much the same way. Here the image is saved in a float pointer (i.e. as a 1D array of floats). I'm sure this solution works, but I didn't really feel like recompiling Darknet just for this purpose, and I felt there must be some way to do it all in Python. Indeed, I found a way, and I left a comment on the issue explaining how. For the sake of completeness, I'll reiterate the points I made there below.

When testing I realized that the loop below, as found in TheMikeyR’s answer, was essentially doing the same as could be done in python in a single line of code using numpy!

/* C/C++ CODE */
for(i = 0; i < h; ++i){
    for(k= 0; k < c; ++k){
        for(j = 0; j < w; ++j){
            index1 = k*w*h + i*w + j;
            index2 = step_h*i + step_w*j + step_c*k;
            //fprintf(stderr, "w=%d h=%d c=%d step_w=%d step_h=%d step_c=%d \n", w, h, c, step_w, step_h, step_c);
            //fprintf(stderr, "im.data[%d]=%u data[%d]=%f \n", index1, src[index2], index2, src[index2]/255.);
            im.data[index1] = src[index2]/255.;
        }
    }
}

I initially implemented the code above one-to-one in Python. It worked, but it was rather slow. I tested my code a little and found that the above for loop is simply converting the (h x w x c) ndarray into a one-dimensional array, with the data strung together channel plane by channel plane (channel first, then row, then column). For this reason, all you have to do is use np.transpose() and np.flatten(). Since in a 3-dimensional array it is not entirely trivial how transpose works (and I didn't know how the function was implemented either), I will also give a short explanation on that. What you're doing is permuting the axes from 0, 1, 2 (h, w, c) to 2, 0, 1 (c, h, w), if I understand it correctly. Then flattening the data using np.flatten() gives you exactly the data layout Darknet expects, and all that's left to do is normalize the values and copy them into Darknet's float buffer using ctypes. Of course, using numpy instead of the nested loops sped things up significantly. A sketch of this conversion is given below.
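Here is a minimal sketch of that conversion, written as a stand-alone function. The make_image() helper and the IMAGE struct come from AlexeyAB's darknet.py; the function name np_image_to_c_IMAGE matches the helper I use from my own utils module later on, but treat this body as an illustrative reconstruction rather than a verbatim copy of it.

# PYTHON CODE

import ctypes
import numpy as np
import darknet as dn  # AlexeyAB's darknet.py

def np_image_to_c_IMAGE(np_img):
    """Convert an OpenCV-style (h, w, c) uint8 ndarray to a Darknet IMAGE."""
    h, w, c = np_img.shape
    # permute (h, w, c) -> (c, h, w), normalize to [0, 1] and flatten to 1D
    flat = (np_img.transpose(2, 0, 1).astype(np.float32) / 255.0).flatten()
    # allocate Darknet's float buffer and copy the pixel data into it
    darknet_img = dn.make_image(w, h, c)
    ctypes.memmove(darknet_img.data, flat.ctypes.data, flat.nbytes)
    return darknet_img

If you want to avoid the extra copy, you could also hand Darknet a pointer into the numpy buffer directly, but then you have to make sure the array stays alive for as long as the IMAGE is in use.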

Running Darknet detect and OpenCV visualization

The Darknet Python interface by AlexeyAB also has some OpenCV drawing functions included for ease of use. To detect objects in our frame, we simply pass the network, the class names, and the IMAGE object we learned how to create in the last section to Darknet with the help of the detect_image(network, class_names, image, thresh=.5, hier_thresh=.5, nms=.45) function, which returns a list of detections. The class_names passed here is the same list that was returned upon network initialization. The thresh, hier_thresh and nms keyword arguments all concern which detections get discarded, but I won't go any deeper into the details of that here.

The returned list contains one entry per detected object, each with the following structure: (label (str), confidence (float), [x, y, w, h]), where label is the object's name (e.g. car, person), confidence is the confidence of correct classification by the YOLO network, and the coordinates [x, y, w, h] are the box's x, y position and its width and height w, h. Note that x, y are the coordinates of the center of the box.
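As a quick illustration of that structure, you could loop over the detections and print them like this (assuming dets holds the list returned by detect_image()):

# PYTHON CODE

for label, confidence, bbox in dets:
    x, y, w, h = bbox  # x, y: box center, w, h: box width and height
    print(f"{label} ({confidence}): center=({x}, {y}), size=({w}, {h})")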

The interface furthermore provides the draw_boxes(detections, image, colors) function, which allows us to draw the detection boxes onto our (OpenCV/ndarray) image captured by the camera, provided we pass it the detections. A complete script using everything we have learned so far may now look as follows. Note that I have implemented the LCamera class and the numpy-to-IMAGE conversion helper in my own modules (src.CamPypyueye and src.utils in the imports below), so adjust the paths and imports to your own project structure.

# PYTHON CODE

from lib import darknet as dn
from src import utils
from src import Analysis as anal
from src.CamPypyueye import LCamera as Camera
from threading import Thread
from pyueye import ueye
import cv2 as cv
import time

net, class_names, class_colors = dn.load_network(
                                        utils.P_CONFIG, 
                                        utils.P_META, 
                                        utils.P_WEIGHTS
                                    )
print(net, class_names, class_colors)

with Camera() as cam:
    cam.set_exposure_auto(1)
    cam.set_gain_auto(1)
    cam.set_fps(20)

    # create a video capture thread
    liveCam = VideoCaptureThread(cam)
    liveCam.start()
    wait_time = 5/cam.get_fps()
    viewWindow = VideoDisplayerThread(wait=wait_time)
    viewWindow.start()
    start_time = time.time()
    while not viewWindow.stopped:
        img = liveCam.frame
        if img is None:
            # the capture thread may not have delivered a frame yet
            continue
        t = time.time() - start_time  # elapsed time, e.g. for logging
        # convert the numpy frame to a Darknet IMAGE
        C_IMAGE = utils.np_image_to_c_IMAGE(img)
        # run the detection
        dets = dn.detect_image(net, class_names, C_IMAGE)
        # draw the detections onto the frame
        img = dn.draw_boxes(dets, img, class_colors)
        # set the output frame
        viewWindow.frame = img
    viewWindow.stop()
    liveCam.stop()