GGPoker Cheating Detection: Architecture, Signals, Failure Modes
By Raul Moriarty · Poker Software Expert
A reverse-engineered map of what GGPoker's security stack actually looks like from the outside — behavioural fingerprinting, statistical play-pattern analysis, anti-collusion graph models, and the human review layer that signs everything off.
Summary
- GGPoker operates a four-layer detection system. No single layer is conclusive on its own. Signals accumulate over weeks in a per-account score, governed by an adjustable false-positive budget.
- Behavioural fingerprinting (action-timing distributions, input-path curvature, dwell time per step) is probably the cheapest layer and the one that bites naive implementations hardest.
- Statistical play-pattern analysis flags pure GTO output sooner than most people expect, and sooner than noisy human-like play, because the baseline is population variability.
- The collusion graph layer drives the most impactful bans. Botting is often caught there as a side-product when a multi-account farm shares a fingerprint. [1]
- Human review of flagged accounts is the most important layer. Most bot bans are decided by a reviewer, not by a rule.
- Anti-detection is an adversarial classification problem, in the lineage of Dalvi et al. (2004) and Lowd & Meek (2005), not a feature checklist.
What counts as cheating in GGPoker's terms
The categorisation matters because each category has its own signal stack, false-positive budget, and consequence path. The security team actively works against the following six categories, all banned under the public Terms of Service:
| Category | Operator priority | Detection difficulty | Typical signal |
|---|---|---|---|
| Collusion / chip dumping | Highest (regulatory exposure) | Medium | Account graph + suspicious hand sequences |
| Multi-accounting | High | Low–Medium | Device fingerprint + KYC join |
| Botting (automated play) | High | Medium | Behavioural fingerprint + play-pattern |
| Real-time assistance (RTA) | Medium-High | High | Statistical play-pattern over volume |
| External HUDs / overlays | Medium | Low (client telemetry) | Client-side process detection |
| Ghosting | Medium (event-driven spikes during major MTTs) | High | Win-rate vs known-skill baseline + IP joins |
The operator prioritises collusion above everything else because it does the most damage to customers. Botting and RTA come next. External HUDs are comparatively easy to enforce against, because they run as separate processes outside the client's process space and client telemetry can detect them. Ghosting spikes around major tournaments and receives a disproportionate share of community attention at those times.
The four-layer detection model
The stack visible to an external observer has four components. There are likely additional, unobservable elements (heuristics, ML-based scoring, hidden signals) that also affect an account; the four below are the ones whose effects on customer accounts can be observed.
- Layer 1: Behavioural fingerprinting
- Client telemetry on input timing, mouse-path geometry, touch dwell on mobile, action-confirmation latency, idle behaviour between hands. Cheap to compute, runs continuously, feeds into a behavioural score per session. Bites naive implementations hardest.
- Layer 2: Statistical play-pattern analysis
- Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute, runs nightly or weekly, produces a play-pattern outlier score.
- Layer 3: Anti-collusion graph models
- Account graph joined by IP, device fingerprint, deposit method, KYC document, table co-occurrence, action correlations within hands. Catches multi-accounting and chip dumping; botting falls out as a side-product when a farm runs under a single fingerprint.
- Layer 4: Human review
- The final decision point. Reviewers use the model outputs as supporting evidence, then read the session's hand history and check session start and stop times against the user's reported time zone. They also look for small but recognisable human tells: a player who calls an opponent a "fish" in chat (usually when confident of winning), who sits out mid-session to take a phone call, or who makes typos in chat.
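The Layer 3 join can be sketched from the operator's side as a connected-components problem: any two accounts that share an attribute value end up in the same cluster. The snippet below is a minimal illustration using union-find. The attribute names (`ip`, `device`, `deposit`) and the sample records are assumptions for the example, not GGPoker's actual schema, and a real system would weight edges rather than treat every shared value as equally strong.

```python
from collections import defaultdict


def cluster_accounts(records):
    """Group accounts that share any attribute value, using union-find.

    `records` maps account id -> dict of attribute name -> value.
    Returns a list of sets, one per cluster with more than one account.
    """
    parent = {acct: acct for acct in records}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first account seen for every (attribute, value) pair
    # and link every later account with the same pair to it.
    first_seen = {}
    for acct, attrs in records.items():
        for key, value in attrs.items():
            if value is None:
                continue
            owner = first_seen.setdefault((key, value), acct)
            union(owner, acct)

    clusters = defaultdict(set)
    for acct in records:
        clusters[find(acct)].add(acct)
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical records: a1 and a2 share an IP, a2 and a3 share a device.
accounts = {
    "a1": {"ip": "203.0.113.7", "device": "d-01", "deposit": "card-9"},
    "a2": {"ip": "203.0.113.7", "device": "d-02", "deposit": "card-4"},
    "a3": {"ip": "198.51.100.2", "device": "d-02", "deposit": "card-5"},
    "a4": {"ip": "192.0.2.55", "device": "d-07", "deposit": "card-6"},
}
linked = cluster_accounts(accounts)
```

Here `a1` and `a3` share nothing directly but land in one cluster through `a2`, which is why a single reused fingerprint can expose a whole farm.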
The four layers run asynchronously. Layer 1 produces a high-frequency score that mostly stays under threshold. Layer 2 runs offline and contributes to a per-account risk score that decays slowly. Layer 3 is event-driven by graph changes. Layer 4 is the bottleneck: reviewer capacity is limited, so a queue is maintained and prioritised by combined risk score, expected revenue impact, and recent withdrawal activity.
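A rough sketch of how such a queue could be ordered, seen from the operator's side. Everything numeric here is an assumption made for illustration: the layer weights, the 60-day half-life on the Layer 2 contribution, the withdrawal boost, and the `revenue_impact` field are hypothetical, since the real values are not observable from outside.

```python
import heapq
from dataclasses import dataclass


@dataclass
class AccountRisk:
    account_id: str
    behavioural: float         # Layer 1 score in [0, 1]
    play_pattern: float        # Layer 2 outlier score in [0, 1]
    days_since_l2_run: float   # age of the Layer 2 score
    graph: float               # Layer 3 score in [0, 1]
    revenue_impact: float      # hypothetical estimate in [0, 1]
    recent_withdrawal: bool


HALF_LIFE_DAYS = 60.0  # assumed slow decay of the Layer 2 contribution


def combined_risk(a):
    """Weighted sum of layer scores; the weights are illustrative only."""
    decay = 0.5 ** (a.days_since_l2_run / HALF_LIFE_DAYS)
    return 0.2 * a.behavioural + 0.4 * a.play_pattern * decay + 0.4 * a.graph


def review_priority(a):
    """Risk scaled by revenue impact, boosted by a recent withdrawal."""
    boost = 1.5 if a.recent_withdrawal else 1.0
    return combined_risk(a) * (1.0 + a.revenue_impact) * boost


def build_queue(accounts, capacity):
    """Return the `capacity` highest-priority accounts, highest first."""
    return heapq.nlargest(capacity, accounts, key=review_priority)


candidates = [
    AccountRisk("quiet", 0.1, 0.8, 0.0, 0.0, 0.5, False),
    AccountRisk("cashout", 0.1, 0.8, 0.0, 0.0, 0.5, True),
    AccountRisk("stale", 0.1, 0.8, 120.0, 0.0, 0.5, False),
]
queue = build_queue(candidates, capacity=2)
```

With identical model scores, the account with a recent withdrawal moves to the front and the account whose play-pattern score has aged drops out of a two-slot queue, which matches the observation that withdrawals tend to trigger reviews.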
Signal weights and observable failure modes
The actual weights are confidential to the operator. Relative weights can still be inferred from how often accounts are caught and from the order in which signals trigger, and the pattern is reliable enough to be useful both to engineers building such systems and to anyone working on defences against them.
| Signal | Layer | Relative weight | Naive failure mode |
|---|---|---|---|
| Action-timing variance < population | L1 | High | Constant-latency action emission |
| Touch coordinate clustering on buttons | L1 | Medium | Pixel-perfect tap on button centroid |
| Idle behaviour between hands too uniform | L1 | Medium | No micro-movement, no chat, no occasional tab-switch |
| VPIP/PFR ratio at population mass with low variance | L2 | High | Pure GTO baseline, no human-noise overlay |
| Bet sizing clustered on exact pot fractions | L2 | High | Solver output without sizing perturbation |
| Win rate persistently outside skill-pool envelope | L2 | Very High | Hot run, high stakes, no manual sessions interleaved |
| Shared device fingerprint across accounts | L3 | Very High (regulatory) | Bot farm on one IP / device |
| Withdrawal pattern → big-bang on first cashout | L3+L4 | High | Quiet grind for 30 days, then large withdrawal |
| Chat behaviour: zero outgoing messages over 5k+ hands | L4 | Medium | Bot never says "nh" |
| Sit-out behaviour: never sits out on bad table | L4 | Medium | Bot grinds whoever sits at it |
The pattern is consistent: the cheap layers (1 and 3) catch the most naive implementations, while the layers that need substantial compute (Layer 2) or human attention (Layer 4) catch the more capable ones. It is also why an implementation can pass Layer 1 for weeks or months and still fail at Layer 2: the offline play-pattern score has to accumulate enough evidence before it triggers a review. That explains the commonly observed lag between a bot first running on an account and the account being identified, usually 2 to 9 months with a median of roughly 8 to 14 weeks.
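One of the Layer 2 signals in the table, bet sizing clustered on exact pot fractions, is simple enough to sketch from the detection side. The function below measures what share of an account's bets land within a small tolerance of a standard pot fraction. The list of fractions, the tolerance, and the sample bets are assumptions for illustration; an operator would compare the resulting share against the same statistic computed over the population at that stake, not against a fixed cutoff.

```python
STANDARD_FRACTIONS = (0.25, 0.33, 0.5, 0.66, 0.75, 1.0)  # assumed list


def pot_fraction_clustering(bets, fractions=STANDARD_FRACTIONS, tol=0.01):
    """Share of bets within `tol` of a standard pot fraction.

    `bets` is a list of (bet_size, pot_size) pairs. Bets into an empty
    pot are counted in the total but can never register as a hit.
    """
    if not bets:
        return 0.0
    hits = 0
    for bet, pot in bets:
        if pot <= 0:
            continue
        ratio = bet / pot
        if any(abs(ratio - f) <= tol for f in fractions):
            hits += 1
    return hits / len(bets)


# Hypothetical samples: one account sizes exactly, the other roughly.
solver_like = [(33, 100), (50, 100), (75, 100), (100, 100)]
human_like = [(30, 100), (55, 100), (70, 100), (100, 100)]

solver_share = pot_fraction_clustering(solver_like)
human_share = pot_fraction_clustering(human_like)
```

A single share is weak evidence on its own, consistent with the point that no one signal is conclusive; it becomes useful once it is tracked over volume and placed against the population baseline.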
Action-timing fingerprints
The most discussed and least well-implemented of all signals. A naive implementation acts after a fixed delay, or after a delay drawn uniformly around some central value. Both approaches fail badly.
Real human action-timing distributions are roughly log-normal, have a heavy right tail, and depend strongly on game state. A snap-fold of an obviously bad hand takes 600 to 1200 ms, a routine flop continuation bet on a clean board takes 1.5 to 4 seconds, and a river decision that needs real thought takes 5 to 30 seconds. The distribution is not just wider than a naive bot's; its overall shape is fundamentally different, and the difference is distinct enough to serve as a "fingerprint".
```python
# Schematic: behaviourally-shaped action timing
# Conceptual, not the production implementation
import math
import random


def sample_action_delay(decision_difficulty, action_type, hand_state):
    """Return seconds-to-act drawn from a state-conditional log-normal."""
    # Difficulty in [0,1]: 0 = trivial fold, 1 = boundary call
    # hand_state is accepted but not used in this schematic version
    mu_base = {
        'fold_trivial': math.log(0.9),
        'cbet_routine': math.log(2.4),
        'check_routine': math.log(1.6),
        'river_boundary': math.log(8.5),
        'all_in_decision': math.log(12.0),
    }[action_type]
    # Difficulty shifts mu, scaling the median delay by up to e^0.7 (about 2x)
    mu = mu_base + 0.7 * decision_difficulty
    # Sigma rises with difficulty: humans deliberate variably on hard spots
    sigma = 0.35 + 0.55 * decision_difficulty
    delay = random.lognormvariate(mu, sigma)
    # ~3% chance of distraction tail: 8–25s independent of difficulty
    if random.random() < 0.03:
        delay += random.uniform(8, 25)
    # Floor at a non-zero minimum; humans cannot react in < 250ms
    return max(0.25, delay)
```

The example is schematic. Production systems condition on more variables: stack depth, opponent action sequence, position, multiway versus heads-up, and a per-session "alertness" parameter that drifts down over long sessions to mimic fatigue. The point is that the right behaviour is not "add noise"; it is "draw from a distribution whose shape matches the population, conditioned on state."
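Seen from the operator's side, the claim that shape rather than width is the fingerprint can be checked with a two-sample Kolmogorov-Smirnov statistic, which is the largest gap between two empirical CDFs. The sketch below compares three sets of action delays against a population sample. The population parameters (median 2.4 s, sigma 0.6) and the sample sizes are assumptions chosen for illustration, and nothing here is known to be the test GGPoker actually runs.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a data point.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


rng = random.Random(0)  # seeded so the comparison is repeatable
mu, sigma = math.log(2.4), 0.6  # assumed population parameters

population = [rng.lognormvariate(mu, sigma) for _ in range(2000)]
same_shape = [rng.lognormvariate(mu, sigma) for _ in range(500)]
fixed_delay = [2.4] * 500
uniform_jitter = [rng.uniform(1.9, 2.9) for _ in range(500)]

d_same = ks_statistic(population, same_shape)
d_fixed = ks_statistic(population, fixed_delay)
d_uniform = ks_statistic(population, uniform_jitter)
```

The fixed-delay and uniform-jitter samples sit far from the population even though their central value matches it, while the sample drawn from the population's own distribution stays close. A production check would also condition on game state, as the text describes.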
False-positive budget and review pipeline
The primary constraint on the entire stack is false-positive cost. GGPoker cannot afford to lose customers by banning large numbers of legitimately winning players. Every false positive produces a regulatory complaint, a chargeback, a forum post, or a churned customer. The detection system is therefore tuned conservatively, so automated signals on their own rarely lead directly to enforcement action.
What they trigger is a review queue placement. The visible stages from outside, in order:
- Quiet flag. Account moves into a higher-scrutiny review bucket. No visible change to the player. Telemetry continues.
- Soft restriction. Withdrawal limits drop. KYC re-verification requested. Bonus eligibility quietly removed. Some players notice and modify behaviour; most don't.
- Structured interview. Support requests "clarifying information" about play style, schedule, and software use. The interview is logged and the answers are matched against the play-pattern model.
- Confiscation and closure. Winnings voided, balance held pending investigation, account closed. The investigation period extends from weeks to months depending on jurisdiction.
The cycle from first quiet flag to confiscation typically runs 14 days to 9 months, anchored on review-queue capacity and triggering events (especially withdrawal activity). The longest cycles we've seen are accounts that ran quietly for a year, withdrew their first significant amount, and were reviewed 15 days after the withdrawal. The mathematical signal was present from month two; the human review was queued only by the withdrawal event.
Anti-detection as adversarial classification
The standard mistake among bot builders is to treat detection as a feature checklist — add latency noise, vary touch coordinates, randomise schedule. This is the wrong frame. Detection is an adversarial classifier: the operator builds a model that distinguishes bot behaviour from human behaviour, and the bot's job is to produce a behaviour distribution the classifier cannot separate from the human distribution while preserving EV.
The formal literature on this dates to Dalvi et al. (2004), Adversarial Classification, and Lowd & Meek (2005), Adversarial Learning. The setting is identical in structure: an attacker (here, the bot) chooses an action that maximises expected utility under a classifier whose decision boundary the attacker can probe but not fully observe. The modern adversarial-ML literature (Goodfellow et al. 2014 onward) extends this with neural-network classifiers, gradient-based attacks, and the certified-robustness lineage.
Three operational consequences fall out of the formal frame:
- The classifier's decision boundary is non-stationary
- Operators retrain their models as new bots appear. Behaviour that went undetected in 2024 may be detectable in 2026.
- Population baseline is the right reference, not "looking human"
- The classifier separates the bot's distribution from the population distribution, not from "what a human looks like" in general. If the NL50 6-max population has a specific bet-sizing histogram with an extended tail of small overbets, the bot's histogram needs the same tail. The reason is not that the bot should be "more human"; it is that the classifier is trained on, and compared against, the population distribution.
- EV-detection tradeoff is the right optimisation target
- Pure-GTO output maximises EV under fixed opponents. Behaviourally-shaped output gives up some EV in exchange for a lower detection score. The right optimum is not zero detection — it is the EV-maximising point under a budgeted detection probability over the account's expected lifetime.
This frame also explains an apparent contradiction: pure GTO bots get banned faster than less optimal bots with a human-noise overlay. The GTO bot earns more per hand on average, but it is easier to identify, so it plays fewer hands before the system flags it for removal.
References and related work
Selected sources on the above topics. Names and identifiers provided; URLs are stable (arXiv) and persistent (Science).
- Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). The reference result for 6-max NLH at superhuman level.
- Moravčík et al., 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356. arXiv:1701.01724.
- Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS (Libratus core technique).
- Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD. The foundational paper on the adversarial-classifier framing.
- Lowd & Meek, 2005. Adversarial Learning. KDD. Probing the decision boundary of a deployed classifier.
- Heinrich & Silver, 2016. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.
The companion notes on this site cover the broader picture: why "GGPoker hacks" do not exist and the homepage's overview of what we mean by "poker bot" in 2026. The FAQ answers specific implementation questions that come up regularly in the chat.