
The alarming rise of AI impersonation: When seeing and hearing isn’t believing


Artificial Intelligence (AI) is rapidly evolving, bringing with it incredible advancements. However, this progress also unveils a darker capability: the power to convincingly impersonate individuals through AI-generated voice and facial likenesses, commonly known as deepfakes. These sophisticated forgeries are no longer confined to internet memes; they are actively being used in elaborate scams, causing significant financial, social, and reputational damage. This post delves into how these AI impersonation tools work, examines real-world case studies, and discusses potential mitigations.

History Of Deepfakes

The term “deepfake” burst into public consciousness in late 2017, originating on Reddit, where a user of that name shared manipulated pornographic videos, often of celebrities, created by superimposing their faces onto existing footage. This initial, notorious application combined “deep learning” algorithms with “fake” media, and the release of the underlying open-source code made the creation process available to everyone, allowing individuals with moderate technical skills to produce their own versions.

Since then, the technology has evolved at a breakneck pace, moving from these early, often discernible, manipulations to increasingly sophisticated and convincing fakes used for political disinformation, financial scams, and even live impersonations. It marks a swift and concerning progression from niche internet phenomenon to mainstream societal challenge.

Case Studies: The Real-World Impact of AI Impersonation

High-profile cases have demonstrated the devastating potential of this technology. The following two case studies inspired the research MWR CyberSec conducted into this subject.

Case Study 1: The Elon Musk Deepfake Scams in South Africa

In South Africa, a series of sophisticated deepfake scams emerged, leveraging the likeness of Elon Musk and other prominent local billionaires like Johann Rupert and Patrice Motsepe. Scammers created convincing videos where these personalities appeared to endorse AI-powered cryptocurrency trading platforms. These deepfakes promised outlandish returns – for instance, turning a R4,700 investment into R30,000 in a single day.

The deepfake videos were remarkably well-produced, with “Musk’s” voice even mimicking local accents to enhance credibility. They circulated widely on social media, with some attracting hundreds of thousands of views. The consequence was substantial financial loss for numerous investors; in one documented instance, an individual lost R5 million to one such scheme. While precise overall numbers are hard to ascertain, the widespread nature of the campaign suggests that over 150 individuals could have fallen victim to it.

The following two videos show how one such deepfake was created by taking a real interview of Elon Musk with The Wall Street Journal and lip-syncing it to a different audio source.

This case highlights not only the commercial impact through direct financial losses but also the significant social impact, as it erodes public trust and preys on the familiarity and authority of well-known figures.

Case Study 2: Arup’s $25 Million Lesson in Live Deepfake Deception

In a chilling demonstration of how deepfakes can infiltrate corporate settings, a multinational engineering firm called Arup fell victim to a $25 million scam in 2024.

A finance employee based in the company’s Hong Kong office received an email, purportedly from the UK-based Chief Financial Officer (CFO), requesting urgent and confidential fund transfers.

Initially, the employee was sceptical about the email; however, he was then invited to a video conference call. On this call, attackers convincingly impersonating the CFO and other senior executives (whose likenesses and voices were deepfaked in real time) assured the employee that the instructions were legitimate and allayed his concerns. Convinced by what appeared to be a legitimate, multi-participant video meeting with trusted colleagues, the employee authorised transfers amounting to approximately $25 million (HKD 200 million) to accounts controlled by the fraudsters.

This incident was a stark wake-up call, proving that live, real-time deepfakes are now sophisticated enough to deceive professionals in a business environment. The attack resulted in massive financial loss and underscored the potential for severe reputational damage to organisations that fall prey to such schemes. It also highlighted the psychological manipulation involved, as the live video interaction effectively overrode the employee’s initial scepticism.

How These Things Actually Work: The Technology Behind the Deception

Deepfakes leverage sophisticated AI to superimpose a target person’s likeness onto existing source images or videos (for video deepfakes), or to synthesise a target person’s voice (for audio deepfakes). Here is a general overview of the process that goes into making a deepfake:

Phase 1: Data Collection

For both video and audio deepfakes, source material of the person being impersonated is required. Typically, a significant amount of high-quality video, imagery and audio of the target is needed to train the AI models that perform the deepfake.

Before training, the source data has to be processed: audio is transcribed, faces are detected and cropped from images and video frames, features are extracted, and various other steps are performed to improve the quality of training.
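As a rough illustration of this data-preparation step, the sketch below uses OpenCV's bundled Haar-cascade face detector to crop faces from a video into fixed-size training images. It is a minimal example only; real pipelines typically use stronger detectors, face alignment and landmark extraction, and the file names here are hypothetical.

```python
# Minimal face-cropping sketch, assuming OpenCV is installed (pip install opencv-python).
import cv2
import os

# OpenCV ships a pre-trained Haar cascade face detector
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def extract_faces(video_path: str, out_dir: str, size: int = 256) -> int:
    """Walk through a video, crop every detected face and save it as a
    fixed-size training image. Returns the number of faces saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
            face = cv2.resize(frame[y:y + h, x:x + w], (size, size))
            cv2.imwrite(os.path.join(out_dir, f"face_{saved:06d}.png"), face)
            saved += 1
    cap.release()
    return saved

# Example: build a training set from a publicly available interview clip
# print(extract_faces("target_interview.mp4", "dataset/target_faces"))
```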

Phase 2: Model Training (Teaching the AI)

This is the most computationally intensive phase of the process. The source material gathered by the attacker is fed to a model, which is then trained on it. For video content, this involves learning the unique facial features, expressions, and nuances that make up the target person. For audio, it involves learning the characteristics of the target’s voice, including pitch, timbre, intonation and rhythm.
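For face swapping specifically, a widely used design (popularised by open-source tools such as DeepFaceLab and faceswap) is an autoencoder with a single shared encoder and one decoder per identity. The PyTorch sketch below is a minimal, illustrative version of that idea; the layer sizes, loss and training loop are simplified assumptions rather than any specific tool's implementation.

```python
# Illustrative face-swap training setup: one shared encoder, two decoders.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)

encoder = Encoder()
decoder_attacker, decoder_target = Decoder(), Decoder()
params = (list(encoder.parameters()) + list(decoder_attacker.parameters())
          + list(decoder_target.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.L1Loss()

def train_step(faces_attacker, faces_target):
    """Each decoder learns to reconstruct its own identity from the shared code."""
    optimizer.zero_grad()
    loss = (loss_fn(decoder_attacker(encoder(faces_attacker)), faces_attacker)
            + loss_fn(decoder_target(encoder(faces_target)), faces_target))
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, encoding the attacker's face and decoding it with the
# target's decoder produces the swap used in Phase 3.
```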

Phase 3: Generation & Refinement (Creating the Fake)

Once the model has been trained, a deepfake can be produced. This can take the form of pre-recorded video or a live performance deepfake. The trained model from phase 2 generates the target person’s face or voice based on the input it receives from either a driving video or webcam and microphone.

For pre-recorded content, post-processing and refinement can also be performed to make the deepfake look and sound more realistic: lip-syncing can be added, and visual artifacts can be edited out or hidden with overlays.
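Continuing the sketches above, the snippet below illustrates the per-frame inference step: detect the driving face, encode it, decode it with the target identity's decoder, and blend the generated face back into the frame. It reuses the hypothetical detector, encoder and decoder_target from the earlier sketches, and omits the colour correction, alignment and temporal smoothing that real tools apply.

```python
# Illustrative per-frame face-swap inference and blending.
import cv2
import numpy as np
import torch

def swap_frame(frame, detector, encoder, decoder_target, size=256):
    """Detect the driving face, run it through encoder -> target decoder,
    and blend the generated face back into the frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, 1.3, 5)
    if len(faces) == 0:
        return frame  # no face found: the "mask" breaks and the frame is left untouched
    x, y, w, h = faces[0]
    crop = cv2.resize(frame[y:y + h, x:x + w], (size, size))
    tensor = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        fake = decoder_target(encoder(tensor))[0].permute(1, 2, 0).numpy()
    fake = cv2.resize((fake * 255).astype(np.uint8), (w, h))
    # Seamless cloning hides the hard edges where the generated face meets the frame
    mask = 255 * np.ones(fake.shape[:2], dtype=np.uint8)
    centre = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(fake, frame, mask, centre, cv2.NORMAL_CLONE)
```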

Pre-trained models

Various pre-trained models are also available that can be used to produce deepfake content. These tools eliminate the need to gather vast amounts of source material or to train a model for millions of iterations; instead, they allow a user to upload a short audio clip or even a single picture to start the deepfake process. The following video from research done by Bytedance shows just how advanced these pre-trained models can be.

It should be noted that the tool mentioned above was not publicly available at the time of writing due to ethical concerns raised by its developers (Good!). It does, however, demonstrate that with the rapid progression of AI tools, the barrier to entry for creating deepfakes is steadily falling.
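To give a sense of how low that barrier already is for audio, the hedged sketch below shows zero-shot voice cloning with the open-source Coqui TTS package, which can condition its output on just a few seconds of reference audio. The model identifier and file names are assumptions and may differ between versions.

```python
# Hedged sketch of zero-shot voice cloning with Coqui TTS (pip install TTS).
# Model name and file paths are illustrative and may differ between releases.
from TTS.api import TTS

# Download and load a multilingual zero-shot voice-cloning model
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# A few seconds of the target's voice is enough to condition the output
tts.tts_to_file(
    text="Please process the transfer before the end of the day.",
    speaker_wav="target_sample.wav",   # hypothetical short reference clip
    language="en",
    file_path="cloned_output.wav",
)
```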

Technical Shortcomings Seen In The Practical Application Of Deepfakes

During MWR’s own attempts at recreating these techniques, some technical shortcomings were encountered that could help identify poorly made deepfakes.

Audio Deepfake Shortcomings
  • Unnatural Cadence and Pace: AI models can struggle with rhythm. Listen for speech that is unnaturally fast or slow, as this can cause the AI to generate noticeable glitches or distortions.
  • Volume Changes: Rapid volume changes from loud to quiet or vice versa can often lead to audio artifacting (that robotic sounding voice) being produced by the model.
  • Whispers: Whispering lacks strong vocal cord vibration (pitch), which is a key feature that audio models rely on. Consequently, cloned whispers often sound distorted, breathy, or may have bizarre tonal inflections (see the pitch-analysis sketch after this list).
  • Context is King: The most powerful detection tool is your own familiarity with the person supposedly speaking. If you know them well, you may notice that their diction, tone, or emotional inflection is “off”. Trust your intuition if the voice sounds like them, but the way they are speaking doesn’t.
  • Vocal Range Mismatch: Real-time voice changers are particularly vulnerable when there’s a significant difference between the input and target voices. For example, if someone with a naturally high-pitched voice attempts to clone a very deep, low-pitched voice in real time, the output may sound strained, tinny, or unstable. This is of limited use for detection, however, as an attacker would likely pick a target whose voice more closely resembles their own.
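As a small illustration of the whispering point above, the sketch below uses librosa (assumed to be installed) to measure how much of a clip is voiced, i.e. has a detectable fundamental frequency. Whispered audio scores very low on this measure, which is precisely the signal that voice-cloning models lean on.

```python
# Rough, illustrative heuristic only, not a reliable deepfake detector.
# Assumes librosa is installed; thresholds and file names are examples.
import librosa
import numpy as np

def voiced_ratio(path: str) -> float:
    """Fraction of frames in which a fundamental frequency (pitch) is detected."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    return float(np.mean(voiced_flag))

# Clips with very little voiced content (e.g. whispering) are the ones where
# cloned audio tends to sound distorted or breathy.
# print(voiced_ratio("suspicious_clip.wav"))
```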

The audio samples below demonstrate some of these shortcomings:

Video Deepfake Shortcomings
  • Masks: A deepfake model needs to constantly detect a face in the source video to overlay a target face onto it. If this detection is interrupted, perhaps by a hand passing in front of the face, gestures that the source footage did not cover (think sticking your tongue out), or poor lighting, the “mask” can break. The results are often jarring and obvious, ranging from features being incorrectly mapped to the fake face momentarily vanishing altogether (see the face-tracking sketch after this list).
  • Unnaturally Smooth Skin: The deepfake generation process often involves compressing and then reconstructing facial features. This can lead to a loss of fine detail. Look for skin that appears unnaturally smooth, almost like a digital airbrush has been applied. Details like pores, wrinkles, fine hairs, or even stubble may be smoothed over or absent entirely, giving the person a doll-like appearance.
  • Irregular Gestures and Behaviour: This is another context-based clue. We all have unique mannerisms, head tilts, and hand gestures that accompany our speech. A deepfake may replicate a face perfectly, but if the gestures or expressions don’t match the person you know, it’s a major red flag. If a normally animated friend is suddenly stiff and inexpressive on a video call, or vice versa, it could indicate that you’re watching a digital puppet, not a real person.
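As a rough illustration of the mask-breaking point above, the sketch below scans a video for frames in which no face can be detected; these detection gaps, caused by hands in front of the face, extreme poses or poor lighting, are exactly where a live deepfake overlay tends to fail. It is a heuristic only, and the file names are hypothetical.

```python
# Illustrative face-tracking continuity check, assuming OpenCV is installed.
import cv2

def face_detection_gaps(video_path: str) -> list[int]:
    """Return the indices of frames in which no face could be detected."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(video_path)
    gaps, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if len(detector.detectMultiScale(gray, 1.3, 5)) == 0:
            gaps.append(idx)
        idx += 1
    cap.release()
    return gaps

# During a live call, asking the other person to turn their head or pass a
# hand in front of their face forces exactly these detection gaps.
# print(face_detection_gaps("recorded_call.mp4"))
```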

The video below demonstrates some of these shortcomings in an exaggerated manner:

Mitigations and Staying Vigilant: What Can Be Done?

The U.S. Department of Homeland Security (DHS), in its report “Increasing Threats of Deepfake Identities” emphasises that the threat of deepfakes comes not just from the technology itself, but from our natural inclination to believe what we see and hear. Even less sophisticated deepfakes can be effective in spreading misinformation.

The DHS report outlines that there is no single, universal solution to the deepfake problem. Instead, a multi-pronged approach is necessary, spanning the following areas:

  1. Technological Innovation:
  • Developing and improving deepfake detection technologies. This is an ongoing “cat and mouse” game as generation techniques become more advanced.
  • Exploring digital watermarking or authentication technologies that can help verify the authenticity of media (a minimal illustration of this idea follows this list).
  • This area depends largely on developers and organisations, who have to consider both the ethical implications of what they are developing and how such safety measures can be built in.
  2. Education and Awareness:
  • Critical Evaluation of Media: Individuals need to be educated to critically evaluate online content, especially if it seems sensational or too good to be true. Look for inconsistencies in media, including unnatural features, visible artifacting, and what some call the “uncanny valley”.
  • Source Verification: Always try to verify the source of information. Is it from a reputable news outlet or official channel? Be wary of content shared widely on social media without clear attribution.
  • Awareness of Impersonation Tactics: Understand that AI can be used to impersonate executives, colleagues, or public figures. For sensitive requests, especially those involving financial transactions or confidential information, use out-of-band verification (e.g. a phone call to a known number, or an in-person check if possible) before acting.
  3. Regulation and Policy:
  • Developing legal frameworks and regulations to address the malicious use of deepfakes, including issues of consent, fraud, and defamation.
  4. Public-Private Cooperation:
  • Encouraging collaboration between government agencies, research institutions, and private sector companies (including social media platforms and tech developers) to share information, develop standards, and implement safeguards.
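As a minimal illustration of the media-authentication idea mentioned under technological innovation, the sketch below binds a media file to a publisher-held key with an HMAC over its hash, so that any later alteration changes the tag. Real provenance standards such as C2PA instead embed public-key signatures in the media's metadata; the key and file names here are hypothetical.

```python
# Minimal sketch of content authentication via a keyed hash (standard library only).
import hashlib
import hmac

SECRET_KEY = b"publisher-signing-key"  # hypothetical key held by the publisher

def sign_media(path: str) -> str:
    """Return an HMAC-SHA256 tag over the file's contents."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    return hmac.new(SECRET_KEY, digest, hashlib.sha256).hexdigest()

def verify_media(path: str, expected_tag: str) -> bool:
    """Recompute the tag and compare; any re-encoding or tampering changes it."""
    return hmac.compare_digest(sign_media(path), expected_tag)

# Example: a publisher distributes video.mp4 alongside its tag; a recipient
# can recompute the tag to confirm the file has not been altered in transit.
```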

Individual Precautions:

The DHS highlights different phases and threat actors; however, the core advice for individuals to protect themselves includes:

  • Be Sceptical: Approach unsolicited communications or unusual requests with caution, even if they appear to come from a known person.
  • Verify Identity: If you receive a suspicious video call or audio message, try to verify the person’s identity through a different communication channel that you know is legitimate. Ask questions that only the real person would know.
  • Look for Tell-Tale Signs: While deepfakes are getting better, some artifacts may still be present:
    • Unnatural eye movements or lack of blinking.
    • Awkward facial expressions or lip-syncing.
    • Blurring or distortion, especially where the face meets the hair or neck.
    • Strange lighting or skin tones.
    • Audio that sounds robotic, has an unusual cadence, or lacks emotional depth.
  • Report Suspected Deepfakes: If you encounter a malicious deepfake, report it to the platform where you saw it and, if appropriate, to law enforcement.

AI-driven impersonation is a rapidly evolving threat that poses significant risks across personal, commercial, and societal domains. As the technology becomes more accessible and sophisticated, the potential for misuse grows. By understanding how these deepfakes are created, learning from real-world incidents, and adopting robust mitigation strategies that combine technological solutions with critical human awareness, we can better defend ourselves against this new wave of digital deception. Staying informed and vigilant is our first and most crucial line of defence.
