A new AI created by South Korean scientists was trained with the Darknet as its sole source of information.
Jin Youngjin's team has created a new AI called DarkBERT, built entirely from Deep Web and Darknet datasets. Its training data is therefore significantly different from what a tool like ChatGPT learned from. Another significant difference between these two AIs is the language.
Darknet AI Revealed – Why, Dear Scientists?
At first glance, this may seem like a situation in which one would like to shout a loud "Why?" at the scientists. With all the dystopian movies, games and books out there, you can quickly imagine a worst-case scenario. But the scientists' intentions are the opposite of anything like the infamous Skynet from the Terminator series.
The goal of DarkBERT is to bring a little light into the darkness of the Deep Web. The name "BERT" isn't a coincidence or a reference to Sesame Street, but an abbreviation for Bidirectional Encoder Representations from Transformers, a specific form of Large Language Model (LLM). LLMs deal with probability distributions over word sequences. Put simply, they learn the complexity of human language and can also apply it. That is exactly what we know from ChatGPT, and BERT can do that pretty well.
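To make "probability distribution over word sequences" a bit more concrete, here is a minimal Python sketch of a toy language model that estimates how likely a sentence is from simple word-pair counts. It is an illustration only and not how DarkBERT works: BERT-style models are trained to predict masked words using context on both sides, while this toy reads left to right. The function names `train_bigram` and `sequence_probability` and the three-sentence corpus are made up for this example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and word pairs (bigrams) in whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous word, next word) pairs
    return unigrams, bigrams

def sequence_probability(sentence, unigrams, bigrams):
    """Multiply P(word | previous word) over the whole sentence (chain rule)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # context never seen in training (no smoothing in this toy)
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

# Hypothetical toy corpus, purely for illustration.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
unigrams, bigrams = train_bigram(corpus)

print(sequence_probability("the cat sat", unigrams, bigrams))  # seen word order: about 0.33
print(sequence_probability("cat the sat", unigrams, bigrams))  # unseen word order: 0.0
```

A sentence in a familiar word order gets a higher probability than a scrambled one. Real LLMs learn the same kind of preference, only with neural networks and vastly more text instead of a small table of counts.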
Because DarkBERT was trained specifically on data from the Darknet, it can evaluate such content better than conventional AIs. For example, it can classify Deep Web activities such as illegal drug or weapons trafficking. It can also be used to detect ransomware leak sites, which would benefit cybersecurity.
According to the scientists, DarkBERT does this far more efficiently and precisely than existing LLMs. This points to the main potential users of DarkBERT: security and law enforcement agencies.
However, this is still a dream for the future, because even if the AI works very well, there are still some limitations. DarkBERT can only process English text, not images or videos. The scientists designed it this way intentionally, to protect themselves from certain content and the associated criminal liability.
Also, more specific cybersecurity tasks would require some fine-tuning before DarkBERT is really ready for use. For now, its inventors do not permit the use of DarkBERT for anything other than scientific research, and it seems pretty unlikely that it will become accessible to the public.