Report: Computer vision teams worldwide say projects are delayed due to insufficient data

According to new research from Datagen, 99% of computer vision (CV) teams have had a machine learning (ML) project canceled due to insufficient training data. Delays, meanwhile, appear to be ubiquitous, with 100% of teams reporting that they have experienced significant project delays for the same reason. The research also indicates that these training data challenges come in many forms and affect CV teams to an almost equal extent. The biggest problems that CV teams experience include poor annotation (48%), insufficient domain coverage (47%) and simple scarcity (44%).

The lack of robust, domain-specific training data is exacerbated by the fact that the field of computer vision lacks well-defined standards and best practices. When asked how training data is typically collected in their organizations, respondents revealed that a patchwork of sources and methods is in use, both across the field and within individual organizations. Whether the data is synthetic or real, collected internally or retrieved from public datasets, it seems that organizations are using all the data they can to train their computer vision models.

However, computer vision teams have already identified and begun to embrace synthetic data as a solution: 96% of CV teams reported that they had already used synthetic data to help train their AI/ML models. Nevertheless, the quality, source and proportion of synthetic data used remain highly variable across the field, with only 6% of teams currently using synthetic data exclusively.

Bar chart: Has your team experienced problems with training models? 52% cited wasted time or resources caused by the need to retrain the system frequently; 48% cited poor annotation resulting in quality issues; 47% cited poor coverage of their domain in the collection process; 44% cited a lack of sufficient data.

This wave of adoption of synthetic data is in line with a number of recent industry reports predicting that 2022 will be a breakout year for synthetic data. This growing consensus certainly bodes well for computer vision’s many, long-awaited applications. In fact, it is possible that these technologies are much closer to being realized than they may seem. Who knows? Maybe we’re just a few good datasets away from a driverless world.

The report is based on the results of an online survey of 300 computer vision professionals representing 300 unique companies.

Read the full report from Datagen.

