v6.1: Smarter Game Embeddings, a New Uniqueness Score & a Rebuilt Steam Map

Author: Ross Burton, PhD, Head of Product and Data
Category: Devlogs
Published:
Updated:

An Epiphany on a Train

I don't tend to talk about our personal life much. Game Oracle is built by my wife and me; I do the data and tech whilst she manages all the marketing, business, and admin and, to be frank, keeps the ship on course. We're not funded. This business is 100% bootstrapped, and we keep the lights on with contracting gigs, which occupy most of our time right now. It's a small price to pay to keep the dream alive, and we have big dreams, so we just find a way to make it work.

I'm a contract data scientist and every so often I have to travel into London for my work. It offers me an opportunity to reflect — train journeys have a certain way of cultivating that. On one particular train journey about 3 weeks ago I had an epiphany. I've been revising everything I know about representation learning because, despite a PhD and several years of applied experience, the self-doubt never really leaves you. It's always keeping you in check. I'm glad it does because I had a certain realisation about what we're doing here at Game Oracle.

Representations

The centrepiece of my work on understanding the Steam market is the question "How can I mathematically represent the similarity between games?" I spend an ungodly number of hours thinking about this, because it unlocks so much potential. To represent something mathematically means we grant some complex high-dimensional entity (in our case, a game) geometric meaning. There are a lot of words to unpack there:

  • What do I mean by high-dimensional? Every game has attributes: its themes, mechanics, the genre it occupies, its visual style, the player perspective... I could go on and on. Each attribute, each tiny sliver of information and the factors that give rise to it, forms a dimension that we can either measure or estimate.
  • Geometry? Yes! Think back to high school maths. If a game only ever had two dimensions we could measure, let's say "Number of boss fights" and "Number of levels" (a gross oversimplification, but stick with me), we could measure the similarity between two games by measuring the distance between them in two-dimensional space. We can do the same thing even when we have thousands of dimensions that represent "what a game is".
A two-panel diagram explaining game similarity using geometry. The top panel, labelled "Simple Example", shows a 2D scatter plot with "Number of Boss Fights" on the y-axis and "Number of Levels" on the x-axis. Two game controller icons sit at different positions on the plot, connected by a double-headed arrow labelled "Distance = Similarity", illustrating that the further apart two games are, the less similar they are. The bottom panel, labelled "Reality", shows that real games have two types of input — visuals (an image icon) and descriptions/tags (a document icon) — each feeding via red arrows into a neural network (represented by a connected-node graph), which combines them into a unified mathematical representation.
(Top) A simplified two-dimensional example showing how two games (represented by controller icons) can be plotted in a space defined by measurable attributes — here, "Number of Boss Fights" (y-axis) and "Number of Levels" (x-axis). The distance between the two points in this space is a direct measure of their similarity: games that are close together are more alike, while games that are far apart are more different. (Bottom) In reality, games cannot be fully described by just two hand-crafted features. Instead, visual data (e.g. screenshots and capsule art) and text data (e.g. descriptions, tags, and metadata) are each passed through a neural network, which learns to encode all of that rich, complex information into a single high-dimensional vector — a game embedding — that captures the true, multifaceted nature of a game. Similarity can then be measured as distance in this high-dimensional embedding space.
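
To make the "distance = similarity" idea concrete, here is a minimal Python sketch. The games and their numbers are made up for illustration, and plain Euclidean distance is an assumption on my part for the example; it is not a description of the code or the distance metric running inside Game Oracle.

```python
import math


def distance(u, v):
    """Straight-line (Euclidean) distance between two feature vectors."""
    if len(u) != len(v):
        raise ValueError("both games need the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Made-up games described by (number of levels, number of boss fights).
game_a = (10, 4)
game_b = (13, 8)
game_c = (40, 1)

near = distance(game_a, game_b)  # 3 levels and 4 boss fights apart
far = distance(game_a, game_c)   # 30 levels and 3 boss fights apart
print(near, far)

# The same function works unchanged when games have more dimensions.
game_x = (10, 4, 0.2, 0.9, 1.0)
game_y = (13, 8, 0.1, 0.7, 0.0)
print(distance(game_x, game_y))
```

In this toy example game_b sits much closer to game_a than game_c does, so we would call game_a and game_b the more similar pair. A real embedding works the same way, just with far more dimensions and with values learned by a model rather than counted by hand.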

This geometry is critical because it helps us answer questions like:

  • Are there any similar games to my concept? What did they include in their game? How well did they perform? How did they market it? 
  • How saturated/dense is the market around a particular concept?
  • Are there any "empty" or "undersaturated" areas of the market with players looking for games?
  • What ideas seem more unique? How much "space" is there here?

That last idea, the question around "space", is what has haunted me. You see, although it is possible to measure the distance between games in high-dimensional space, measuring density is actually extremely difficult. Things get weird in higher dimensions. Like in Ant-Man when they go into the Quantum Realm, everything is topsy-turvy. Almost all the space piles up out near the "edges", so shapes like spheres and cubes stop behaving the way our 2D and 3D intuition expects them to. It means estimates of market density can be very misleading.

An image of a man looking confused, captioned "I'm so confused"
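
If you'd like to see the weirdness for yourself, the sketch below shows two well-known effects using nothing but the Python standard library. The function names, the 5% shell thickness, and the random uniform points are all choices I've made for the illustration; none of this is Game Oracle data.

```python
import math
import random


def outer_shell_fraction(dims, thickness=0.05):
    """Share of a ball's volume lying within `thickness` (as a fraction of
    the radius) of its surface. Volume scales as radius ** dims."""
    return 1 - (1 - thickness) ** dims


def nearest_to_farthest_ratio(dims, n_points=200, seed=0):
    """Ratio of nearest to farthest distance from one random point to the rest.
    A ratio close to 1 means every point looks roughly equally far away."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dims)) for _ in range(n_points)]
    dists = [math.dist(points[0], p) for p in points[1:]]
    return min(dists) / max(dists)


for dims in (2, 10, 100, 1000):
    print(
        dims,
        round(outer_shell_fraction(dims), 3),
        round(nearest_to_farthest_ratio(dims), 3),
    )
```

In two dimensions only about 10% of a disc sits in the outer 5% of its radius; by 100 dimensions that figure is above 99%. The second column should also climb towards 1 as the dimensions grow, meaning the nearest and farthest neighbours start to look almost equally far away. That is a big part of why a naive density estimate in the raw space is hard to trust.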

A Better Way

On the train that day I had a breakthrough. I realised two things at once:

  1. We can represent each game more accurately with a new type of model: a custom-built variational autoencoder with a very particular design that allows the visuals and text descriptions of games to work together.
  2. The way we present this to our users can be greatly improved, reducing the chance of misrepresenting the data.

This culminated in the release of v6.1, which also prompted a much-needed rebuild of the Steam Map, our 2D visualisation of the entire marketplace. I'll speak to point 2 above in more detail and then we can get into some examples of how the Steam Map has changed.

Uniqueness Score

In previous versions of Game Oracle we provided a "Saturation Score" for each title: a measure of how saturated the market is around each game. It had a range of 0 to 10, with higher values indicating that the game sits in a more saturated market. The problem is, it wasn't very accurate or stable; whenever we added new games to our dataset, our global estimate of density (which was poor due to the wackiness of high dimensions) shifted and caused the Saturation Score to change wildly.

The solution was to use a better model that could project our data into a lower-dimensional space without losing information, and to measure "uniqueness" instead of "saturation". Why are they different? Our Uniqueness Score now measures the average distance between a game and its 20 nearest competitors. It is quite literally that. It is a measure of how much "creative space" exists between a game and its direct competitors. Emphasis on direct. A game can exist in a saturated genre (like Co-Op Horror games) and still be unique. It is all about the local space around that game.

The Uniqueness Score is measured between 0 and 10, with higher values indicating that a game is more unique relative to its closest competitors.

A two-panel diagram illustrating the concept of Uniqueness Score. Both panels show a dashed red circle labelled "Creative Space" containing a central red game controller icon (the game being evaluated) surrounded by several white game controller icons (its nearest competitors). In the top panel, labelled "Less Unique", the white controllers are clustered closely around the red one, indicating low average distance between the game and its competitors. In the bottom panel, labelled "More Unique", the white controllers are spread further apart from the central red one, indicating greater average distance and therefore a higher Uniqueness Score.
Uniqueness Score measures the creative space around a game relative to its nearest competitors. Each circle represents the local "creative space" around a focal game (red controller icon), containing only that game's closest competitors (white controller icons). (Top) A game with a low Uniqueness Score sits in a crowded local space — its nearest competitors are packed tightly around it, leaving little creative distance between them. (Bottom) A game with a high Uniqueness Score occupies a more spacious local neighbourhood — its nearest competitors are spread further away, meaning the game is doing something meaningfully different from those it most directly competes with. Crucially, this is a local measure: a game can belong to a heavily populated genre and still score highly if it carves out a distinct identity among its direct rivals.
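
The core of the idea, the average distance from a game to its nearest competitors, can be sketched in a few lines of Python. Treat this as an illustration under assumptions: the 2D coordinates are invented, `mean_knn_distance` is a name made up for the example, Euclidean distance is assumed, and the step that rescales the raw distance onto the 0 to 10 range is left out entirely because it isn't covered in this post.

```python
import math


def mean_knn_distance(index, embeddings, k=20):
    """Average distance from one game to its k nearest competitors."""
    target = embeddings[index]
    # Distances to every other game, closest first (the game itself is skipped).
    dists = sorted(
        math.dist(target, other)
        for i, other in enumerate(embeddings)
        if i != index
    )
    nearest = dists[:k]
    if not nearest:
        raise ValueError("need at least one other game to compare against")
    return sum(nearest) / len(nearest)


# Invented 2D "embeddings": a tight cluster near (0, 0) and a looser one near (5, 5).
catalogue = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0),
    (5.0, 5.0), (6.0, 5.0), (5.0, 7.0), (2.0, 5.0),
]

crowded = mean_knn_distance(0, catalogue, k=3)  # game at (0, 0)
roomy = mean_knn_distance(4, catalogue, k=3)    # game at (5, 5)
print(crowded, roomy)
```

The game at (5, 5) ends up with a much larger average distance than the game at (0, 0), so it has more "creative space" around it. Because only the k closest games are considered, adding a thousand new titles on the far side of the map would not move either number, which is the stability the old global density estimate lacked.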

Saturation in 2D

You will still be able to view market saturation in our Steam Map tool. The Steam Map "squeezes" all our data into a 2D projection, placing all the games on Steam into bins of similar games. This is amazing because it means we can visualise and explore the entire marketplace in a single interactive plot. However, it is not as accurate as our Data Explorer, Game Gap, and Concept Compass tools, which use all the dimensions available; all that squeezing into 2D loses information.

So what is the Steam Map good for? It allows you to visualise, broadly, the market saturation in genres and sub-genres. It gives you a global view.

But saturation on the Steam Map is fundamentally different to Uniqueness Score, because Uniqueness Score measures "How unique is this idea when we look closely at its own little corner of Steam?".
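
For a feel of what "placing games into bins" means, here is a rough Python sketch that counts how many already-projected 2D points land in each cell of a square grid. It assumes saturation is simply the number of games per bin, which is a simplification for the example; the grid size, coordinate range, and function name are all hypothetical.

```python
from collections import Counter


def bin_games(points, bins=4, lo=0.0, hi=1.0):
    """Count how many 2D points fall into each cell of a bins x bins grid."""
    width = (hi - lo) / bins
    counts = Counter()
    for x, y in points:
        # Clamp so points on or beyond the edges land in the outermost cells.
        col = max(0, min(int((x - lo) / width), bins - 1))
        row = max(0, min(int((y - lo) / width), bins - 1))
        counts[(row, col)] += 1
    return counts


# Invented 2D positions: three games huddled together, one on its own.
projected = [(0.10, 0.10), (0.15, 0.12), (0.20, 0.05), (0.90, 0.90)]

heat = bin_games(projected, bins=4)
print(heat)
```

Each count becomes the colour of one tile on a heatmap: busy bins run hot, empty bins stay cold. Note how much detail is thrown away, since the three huddled games become a single number, which is why the map is best read as a broad overview.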

Steam Map Refurbishment

Classic scope creep. Whilst I was rebuilding the models I had the urge to just completely rebuild the Steam Map and address some issues people consistently bring up. Originally we were using Apache ECharts because it allowed me to move quickly without worrying about the interactive elements.

But we seem to have hit its limits. It was time to go all-in and build a custom 2D heatmap using D3.js.

The Steam Map now includes a number of features our users have been requesting for a long time:

  • Zoom functionality: you can now zoom in and out of the map
  • More natural selection tools: I've run a clustering algorithm over the map itself, so when you hover over the map, similar games are grouped together with a preview of popular tags and titles, helping you understand the contents of each segment
  • More stats: you can now visualise not only market saturation but also Steam DB score, estimated sales, estimated wishlists, and estimated revenue

A three-part screenshot showcasing the updated Game Oracle Steam Map. At the top, a metric selector bar shows five options — Saturation (currently selected and highlighted), Steam DB Score, Wishlists, Sales, and Revenue. In the middle, a zoomed-in view of the heatmap at 200% displays a pixelated mosaic of coloured tiles, with a legend showing blue for low saturation, through yellow, to red for high saturation; dense red clusters indicate heavily saturated genre areas while blue and black regions indicate gaps. At the bottom, a wider view of the same heatmap shows a hover tooltip for a selected cluster, displaying top tags including "3D", "Horror", "Survival Horror", "First-Person", and "Atmospheric", and top games including "The Forest", "Sons of The Forest", and "Resident Evil 4".
The refurbished Steam Map: an interactive 2D heatmap of the entire Steam marketplace. (Top) A metric selector allows users to colour the map by one of five overlays: Saturation, Steam DB Score, Wishlists, Sales, or Revenue. (Middle) The map can be zoomed (here shown at 200%) and panned to explore dense regions in detail. Colour runs from blue (low) through yellow to deep red (high), giving an at-a-glance view of where the market is crowded or sparse. (Bottom) Hovering over a region triggers a cluster preview panel showing the top tags (e.g. 3D, Horror, Survival Horror, First-Person, Atmospheric) and top games (e.g. The Forest, Sons of The Forest, Resident Evil 4) for that segment, helping users quickly understand what genre each area of the map represents. Built with D3.js, the map provides a global, exploratory view of market conditions across all of Steam.
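
The hover preview boils down to summarising whatever games fall inside a cluster. A minimal sketch of the "top tags" part might look like the following; the titles are placeholders, the tag lists are made up, and ranking tags by a simple count is an assumption for the example rather than a description of how the map ranks them.

```python
from collections import Counter


def top_tags(games, n=5):
    """Return the n most common tags across the games in one cluster."""
    counts = Counter(tag for game in games for tag in game["tags"])
    return [tag for tag, _ in counts.most_common(n)]


# Hypothetical cluster: placeholder titles with made-up tag lists.
cluster = [
    {"title": "Game A", "tags": ["Horror", "Survival Horror", "First-Person"]},
    {"title": "Game B", "tags": ["Horror", "Survival Horror", "3D"]},
    {"title": "Game C", "tags": ["Horror", "3D", "Atmospheric"]},
]

print(top_tags(cluster, n=3))
```

The same pattern extends to the "top games" list by sorting the cluster's titles on whichever metric is selected instead of counting tags.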

What Next?

With v6.1 released we can begin to focus on our longer roadmap. Our goals with the business are to:

  • Become the most affordable and reliable platform for market research for indie game devs and publishers
  • Help indies develop sustainable businesses that serve real market niches
  • Address some of those market niches ourselves with our own indie studio (the reason we started all this in the first place!)

To do that we need to make our data collection and processing more reliable. This will probably be what I'm working on all summer. We also have an ambition to improve our sales and wishlist estimates, which will likely be my next immediate focus. 

We hope you enjoy the v6.1 updates, and if you have any issues at all you can always reach out to us directly on Discord.

You can also stay up-to-date with all our latest market research insights by subscribing to our free monthly newsletter!

Our Products

Dive into the data yourself with our research tools for unrestricted market research.

  • Our Steam Map allows you to find and analyze games with similar art style and mechanics
  • Analyze uniqueness and performance in one glance with our bespoke statistical scores
  • Validate your game ideas and develop realistic targets with our forecasting tools