Recency Bias

Data is arguably the most valuable asset of a modern biotech organization. It defines the core scientific direction and drives the highest-stakes decisions. Increasingly complex and precise instruments and assays produce data that lets teams understand the underlying biology from dozens of different angles. And yet, teams often analyze that data one dataset at a time. Without the ability to easily look across multiple assays and experiments, biotechs aren’t just missing a big opportunity. They’re allowing recency bias to dominate their high-stakes decisions.
Despite the rapid advances in analytics software in other industries, bench scientists still use tools that limit their ability to integrate multiple datasets. For individual readouts, they may use assay-specific tools, more general statistical software, or plain old spreadsheets. But when it’s time to bring those readouts together, file formats are only the first problem.
Initial readouts are typically structured in a way that makes sense for the specific experiment, but not for synthesis with others. They may use different gene vocabularies, different units, different column names. Readouts may be arranged like a plate layout in some of the spreadsheets. Samples may be split across columns in one dataset, and across rows in another.

Getting readouts like this into a compatible form requires custom transformation and logic. Moreover, because the details of experiments tend to vary between runs, this custom logic varies as well, even for the same underlying assay.
Computational biologists and data scientists are able to manage this complexity with code. Often, the transformations only require a few lines. But figuring out those lines without the bench team’s context takes time. And most biotechs just don’t have a big enough computational team to do this for every experiment, let alone do it with a reasonable turnaround.
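To make "a few lines" concrete, here is a minimal sketch in plain Python of the kind of transformation described above. Everything in it is hypothetical and made up for illustration: the two readouts, the column names ("Well ID", "conc_ug_per_ml"), the values, and the choice of ng/mL as the shared unit. One readout is arranged like a plate layout, the other has one row per well with different column names and units, and both are reshaped into the same long form so they can be combined.

```python
# Hypothetical readouts: every name, value, and unit here is invented for illustration.

# Readout 1: arranged like a plate layout (plate rows A-B, plate columns 1-3), in ng/mL.
plate_readout = {
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
}

# Readout 2: one row per well, different column names, in ug/mL.
row_readout = [
    {"Well ID": "A1", "conc_ug_per_ml": 0.0012},
    {"Well ID": "B3", "conc_ug_per_ml": 0.0055},
]


def plate_to_long(plate, experiment):
    """Unpivot a plate layout into one record per well."""
    records = []
    for row_label, values in plate.items():
        for col_number, value in enumerate(values, start=1):
            records.append({
                "experiment": experiment,
                "well": f"{row_label}{col_number}",
                "conc_ng_per_ml": value,
            })
    return records


def rows_to_long(rows, experiment):
    """Rename columns and convert ug/mL to ng/mL so units match readout 1."""
    records = []
    for row in rows:
        records.append({
            "experiment": experiment,
            "well": row["Well ID"],
            "conc_ng_per_ml": row["conc_ug_per_ml"] * 1000.0,
        })
    return records


# Both readouts now share one schema and can live in a single table.
combined = plate_to_long(plate_readout, "exp_1") + rows_to_long(row_readout, "exp_2")

for record in combined:
    print(record)
```

The code itself is short. The hard part is knowing that the first file is a plate layout, or that the second one reports ug/mL, and that knowledge usually sits with the bench team.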
So, left to their own devices, bench teams are faced with two choices: Do the slow, tedious manual work to transform the data into the right form, or don’t do it at all. And given the pace of most biotech startups, they usually choose the latter.
As a result, most biotech teams leave data on the table. After spending millions on equipment, lab supplies and labor to generate data to drive their high-stakes decisions, they only leverage a fraction of it. Usually that means the most recent data. And this recency bias can end up being even more expensive in the long run.
At Sphinx, we believe that enabling biotechs to maximally leverage their data starts with enabling bench scientists to transform and integrate it as easily as computational teams do today, but without writing a single line of code. That’s why we built the Collections module to help merge multiple datasets into a single longitudinal table. And you can leverage our AI agent, Metis, to do the heavy lifting for you.
Metis scans diverse datasets to infer their structure and automatically identify the necessary transformations. Bench scientists work alongside Metis to verify its work and fill in the small details that it can’t figure out itself.
Thanks to Metis, teams that use Sphinx go from manually analyzing the most recent dataset to seamlessly leveraging all relevant data. For these teams, the bottleneck is no longer the amount of data they can analyze. It’s the data they choose to generate.

