Stanislav Kutnyk

CUBE3 Unveils Next-Gen AI Models for Fraud Prevention

CUBE3 is proud to announce a major leap forward in AI fraud prevention that is revolutionizing how we evaluate blockchain data and analyze smart contracts. This upgrade significantly enhances accuracy and efficiency, empowering our clients with deeper insights and smarter decision-making.

Upgrading our AI models marks a significant advancement in how we process and analyze Ethereum Virtual Machine (EVM) bytecode. The upgrade focuses on improving our model’s understanding of EVM code by implementing Control Flow Graph (CFG) construction as a preprocessing step.

This innovation allows our models to interpret bytecode more effectively, resulting in better predictions and conclusions that help the ecosystem detect rug pulls, flash loan attacks, price manipulation, Sybil attacks, and many other exploits.

Why the Upgrade? Understanding the Unique Challenges of EVM Code vs NLP

First, let’s step back and look at the reason for the upgrade.

To make decisions about contracts right at deployment time, we use the contract’s bytecode as input to our AI models. Bytecode is a sequence of commands that the Ethereum Virtual Machine (EVM) can execute, and it is quite an unusual kind of input for AI models to process.

bytecode example

We can convert the individual bytes into opcodes to make the code easier for humans to read.

opcode example
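As a rough illustration of that conversion, here is a minimal Python sketch of a disassembler. It is not CUBE3’s tooling: the function name `disassemble` is hypothetical, and the opcode table deliberately covers only a handful of EVM opcodes, so anything else is printed as unknown.

```python
# Partial opcode table for illustration only; the real EVM defines many more.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x14: "EQ", 0x15: "ISZERO",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x50: "POP",
    0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex: str) -> list[tuple[int, str]]:
    """Return (offset, text) pairs for a hex-encoded bytecode string."""
    code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 bytes of immediate data.
            n = op - 0x5F
            arg = code[pc + 1 : pc + 1 + n]
            out.append((pc, f"PUSH{n} 0x{arg.hex()}"))
            pc += 1 + n
        else:
            out.append((pc, OPCODES.get(op, f"UNKNOWN_0x{op:02x}")))
            pc += 1
    return out


# A short prefix commonly seen at the start of Solidity-compiled contracts.
listing = disassemble("0x6080604052")
for offset, text in listing:
    print(f"{offset:04x}  {text}")
```

The PUSH family is the one case that needs special handling: its immediate bytes are data, not instructions, so the decoder has to skip over them.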

The opcode view looks less intimidating; in fact, it looks like text (kind of). Developers leverage Natural Language Processing (NLP) techniques extensively when working with code.

Similarities of text and bytecode:

  1. It’s a sequence of variable length.
  2. We can split it into individual tokens (words).
  3. A token is usually related to the previous token, which is why they form a sequence and not just a set of unconnected tokens (like a word cloud).

The difference comes from the purpose. Text is usually meant to be read sequentially from start to end, while bytecode provides reactive instructions. Designated commands alter the flow of execution, and the way a contract’s code runs depends each time on the details of the call (i.e., arguments, caller, current blockchain state). So the category of text closest to bytecode is the ‘interactive novel’. Short sequential runs of EVM code are separated by reactive points, and on each call the complete instruction sequence varies in length, order, and hence outcome.

Just like a reader could navigate through various story endings, bytecode presents us with all the possible paths a program can take during execution.

Though up close EVM code is a sequence, from a distance it is a tree of instructions.

Each contract call can produce a different sequence of instructions depending on the circumstances.

Our Solution: Empowering AI with Control Flow Graphs

Computer scientists refer to this tree of instructions as a Control Flow Graph (CFG); strictly speaking it is a graph rather than a tree, because execution paths can merge and loop. With the new upgrade, we introduce CFG building as one of the preprocessing steps, so that the model takes in not one complete sequence of code, but a set of shorter sequences (instructions).
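To make the idea of “shorter sequences” concrete, here is a minimal Python sketch that cuts raw bytecode into basic blocks, the straight-line fragments that become the nodes of a CFG. It is an illustrative assumption, not CUBE3’s preprocessing code: the function name `split_basic_blocks` and the sample bytecode are hypothetical. A block ends after an instruction that halts or jumps, and a new block starts at every JUMPDEST.

```python
# Opcodes after which straight-line execution cannot simply continue:
# STOP, JUMP, JUMPI, RETURN, REVERT, INVALID, SELFDESTRUCT.
TERMINATORS = {0x00, 0x56, 0x57, 0xF3, 0xFD, 0xFE, 0xFF}
JUMPDEST = 0x5B


def split_basic_blocks(code: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, one per basic block."""
    blocks = []
    start = 0
    pc = 0
    while pc < len(code):
        op = code[pc]
        # A JUMPDEST is a possible jump target, so it opens a new block.
        if op == JUMPDEST and pc > start:
            blocks.append((start, pc))
            start = pc
        # PUSH1..PUSH32 carry 1..32 immediate bytes that must be skipped.
        pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
        if op in TERMINATORS:
            blocks.append((start, min(pc, len(code))))
            start = pc
    if start < len(code):
        blocks.append((start, len(code)))
    return blocks


# CALLVALUE, PUSH1 0x07, JUMPI, PUSH1 0x00, STOP,
# JUMPDEST, PUSH1 0x00, DUP1, REVERT
sample = bytes.fromhex("346007576000005b600080fd")
blocks = split_basic_blocks(sample)
print(blocks)
```

For this sample the conditional jump ends the first block, STOP ends the second, and the JUMPDEST at offset 7 opens the third.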

Here are the reasons it makes processing more accurate:

  1. Sequential fragments always execute in order, so the model sees them exactly as the EVM runs them.
  2. Places where execution may branch are no longer presented to the model as one continuous sequence.
  3. Fragments that are not executable at all (whether arbitrary data stored on the blockchain or code not reachable by the EVM, like the metadata hash left by the Solidity compiler) are not passed to the model.

As smart contracts grow more complex, finding the exact CFG can become so computationally expensive that it is no longer feasible, so we run a simplified EVM to define the CFG.
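The simplified EVM itself is not described here, so the Python sketch below swaps in a much cruder heuristic purely for illustration: a jump target is treated as known only when the JUMP or JUMPI is immediately preceded by a PUSH of a valid JUMPDEST offset. All names (`decode`, `reachable_blocks`) and the sample bytecode are hypothetical, and jumps with computed targets are simply left unresolved. It then walks the graph from offset 0, so blocks that are never reached, such as trailing data, drop out.

```python
from collections import deque

HALTS = {0x00, 0xF3, 0xFD, 0xFE, 0xFF}  # STOP, RETURN, REVERT, INVALID, SELFDESTRUCT
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B


def decode(code: bytes) -> list[tuple]:
    """Decode into (offset, opcode, push_value); push_value is None for non-PUSH."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        n = op - 0x5F if 0x60 <= op <= 0x7F else 0
        value = int.from_bytes(code[pc + 1 : pc + 1 + n], "big") if n else None
        out.append((pc, op, value))
        pc += 1 + n
    return out


def reachable_blocks(code: bytes) -> dict[int, list[int]]:
    """Map each reachable block's start offset to its successor block starts."""
    instrs = decode(code)
    if not instrs:
        return {}
    index_of = {pc: i for i, (pc, _, _) in enumerate(instrs)}
    jumpdests = {pc for pc, op, _ in instrs if op == JUMPDEST}
    # Block leaders: offset 0, every JUMPDEST, and whatever follows a jump or halt.
    leaders = {0} | jumpdests
    for i, (_, op, _) in enumerate(instrs[:-1]):
        if op in HALTS or op in (JUMP, JUMPI):
            leaders.add(instrs[i + 1][0])

    def successors(start: int) -> list[int]:
        first = i = index_of[start]
        while True:
            _, op, _ = instrs[i]
            nxt = instrs[i + 1][0] if i + 1 < len(instrs) else None
            if op in HALTS:
                return []
            if op in (JUMP, JUMPI):
                succ = []
                # Heuristic: only a PUSH directly before the jump gives a target.
                target = instrs[i - 1][2] if i > first else None
                if target in jumpdests:
                    succ.append(target)
                if op == JUMPI and nxt is not None:
                    succ.append(nxt)  # fall-through when the condition is false
                return succ
            if nxt is None:
                return []
            if nxt in leaders:
                return [nxt]
            i += 1

    cfg, queue = {}, deque([0])
    while queue:
        start = queue.popleft()
        if start not in cfg:
            cfg[start] = successors(start)
            queue.extend(cfg[start])
    return cfg


# Three executable blocks followed by four arbitrary, never-executed bytes.
cfg = reachable_blocks(bytes.fromhex("346007576000005b600080fd" + "deadbeef"))
print(cfg)
```

The trailing bytes start at offset 12, which never appears in the result because no path from the entry point leads there.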

Introducing EVM knowledge into the ML pipeline in this way makes the pipeline more robust and “intelligent” when working with EVM code specifically. The model can now consider only the code that can actually be executed (and hence change the blockchain state), rather than trying to read it as a human-readable piece of text, which it is not. Our evaluation measurements have already validated this assumption.

This upgrade not only improves our models at inference time but also increases the utilization of training data, which we are constantly updating and using to retrain our models for AI fraud prevention.

Benefits of the Upgrade

  1. Improved Accuracy: By utilizing CFGs, we can accurately represent sequential code fragments, which reduces ambiguity and enhances the model’s predictive capabilities.
  2. Enhanced Efficiency: The model focuses on relevant instructions, omitting non-executable fragments or arbitrary data stored on the blockchain.
  3. Better Understanding of Code Execution: CFGs help the model differentiate between various execution paths, providing a clearer understanding of potential contract behaviors.
  4. Robustness: Incorporating CFGs makes the models more robust, enabling them to effectively manage complex smart contracts while keeping processing speed at the same level.
  5. Intelligence: The upgraded model interprets EVM code more intelligently, aligning with its reactive nature rather than treating it as linear text.
  6. Improved Training Data Utilization: The new preprocessing method allows for better use of training data, which we continuously update and use to retrain our models, ensuring they remain up-to-date and effective.

Unlocking New AI Fraud Prevention for CUBE3.AI Clients

Our clients will experience significant benefits from this upgrade. The improved accuracy and efficiency in processing EVM bytecode will lead to more reliable decision-making. Clients can expect:

  • More Accurate Data: Enhanced models provide precise insights and predictions.
  • Increased Efficacy: Better handling of contract complexities results in more effective analysis.
  • Empowered Decision-Making: Clients can rely on more accurate and actionable data to inform their strategies.
  • Continuous Improvement: With better utilization of constantly updated training data, our models are continuously improving, ensuring clients benefit from the latest advancements.

The recent upgrade to CUBE3’s machine learning models represents a significant milestone for AI-powered fraud prevention. By incorporating Control Flow Graphs into our preprocessing steps, we have not only enhanced the accuracy and efficiency of our EVM bytecode analysis but also empowered our clients with more reliable and actionable insights. This continuous improvement reflects our commitment to innovation and excellence, ensuring that our clients remain at the forefront of blockchain security and smart contract analysis.


Stanislav Kutnyk, Machine Learning Engineer

