Digital Fingerprinting and GPU Accelerated Computing with Nvidia Morpheus & Rapids.

31.07.2023Frederik Möllers
Tech Artificial Intelligence Machine Learning

What is digital fingerprinting?

A digital fingerprint, similar to a fingerprint in the real world, is a (more or less) unique identification of an entity. In the digital world, this entity can be a (technical) user, a user group or even a specific device.

The goal of digital fingerprinting is to identify an identity by means of a specific set of characteristics. These characteristics can be, for example:

  • IP address
  • user agent
  • Operating system
  • browser
  • time zone
  • etc.

In addition, general behaviour, such as accessing certain applications from a certain location, can also serve as another characteristic.

If sufficient data is available, a digital fingerprint can be created of a user or user group, which can be used to identify a user relatively uniquely. In addition to identifying users, there is of course also the possibility of verifying already authenticated users. This is where Morpheus comes into play.

GPU Acceleration with Nvidia Rapids

Before we get into Morpheus, let’s take a quick look at Nvidia Rapids and GPU acceleration, as this is the foundation for Morpheus.

Accelerated computing is the use of special processors called GPUs to speed up certain tasks. To benefit from GPUs, the tasks must be parallelisable, an example of this being data transformations, where each data set is transformed individually. Classically, GPUs are used for graphics calculations, as these can be parallelised very well. Especially in the field of machine learning, GPUs are indispensable nowadays, as they can speed up calculations many times over (and in some cases make them possible in the first place).

In order to run programmes on the GPU, they must be written specifically for the GPU. This is possible using CUDA for Nvidia GPUs, Vulkan for AMD GPUs or OpenCL.

Figure: Source: Nvidia RAPIDS.ai

As the inventor and one of the largest manufacturers of GPUs, Nvidia is of course also very active in the field of GPU acceleration. Morpheus is based, among other things, on Nvidia Rapids, an open source framework that ports well-known data science frameworks such as Scikit-Learn and Pandas to the GPU. This offloads data processing to the GPU and speeds up computations.

Since transformation of data and feature engineering is usually necessary before it can be used in further processes such as a machine learning model, Rapids not only enables the acceleration of processes, but also the creation of more complex pipelines that can process large amounts of data in real time. It is precisely this ability to process huge amounts of data in real time that makes Morpheus useful in the first place.

Digital Fingerprinting with Morpheus

The Morpheus SDK is an open source framework developed by Nvidia that, among other things, enables digital fingerprints to be created and verified using network traffic logs. In addition to digital fingerprints, other use cases such as Sensitive Information Detection, Fraud Detection and Phishing Email Detection can also be used. In addition, Morpheus offers developers the ability to create their own models and pipelines for the specific use case.

Morpheus can be used on-prem, in the cloud or on the edge. The data is not stored, but only analysed. This is particularly interesting for companies that are not allowed to store data for compliance reasons.

This is what Morpheus has under the bonnet

To create digital fingerprints, the SDK uses a simple autoencoder. A small autoencoder is trained on the network logs of each user (even a group). Once this has happened, it can be continuously verified whether the user is behaving as expected or whether there are any deviations.

Possible deviations and anomalies in this case could be, for example, that a user tries to log in from a different location or that he tries to access an application to which he normally has no access.

Since these can be perfectly normal developments that one knows from everyday operations (for example, when someone changes departments or projects), there is of course the possibility of optimising the models afterwards.

However, if a user has been compromised or an attacker has gained access to a machine, deviant behaviour can give a very early indication of an attack. Since this is based not only on random samples of network traffic, but on 100% of the traffic, even very small deviations can be detected very early.

Nvidia itself uses Morpheus internally and uses it to monitor 100% of the network traffic of around 25,000 users.

Conclusion and outlook

Accelerated Computing is a very exciting topic that will have a lot of potential in the future. Advances such as LLM’s (keyword ChatGPT) currently show impressively what is possible from the combination of huge data sets and massive computing power. The ability to analyse and evaluate the network traffic of thousands of users in real time without much effort is, in my view, a good example of the added value that GPU acceleration can offer.

When it comes to data for training machine learning algorithms, people often think of static data sets that have been aggregated over a long period of time and can now be used to generate specific added value. Nvidia uses Morpheus to analyse 900TB of network logs per day, a volume of data that would be challenging to store alone.

I think we will see many more interesting use cases in the future that are only possible because of the possibilities of direct processing, analysis and evaluation of large amounts of data in real time. We’ll definitely stay tuned, and hope you will too! 😎🔥

Morpheus

Rapids

Translated by our automatic markdown translator. 🙌