• Intelligent Systems

DataS: Software for Data Collaboration and Monetization

PI: Prof Ooi Beng Chin

Opportunity

The current policy landscape encourages unlocking the value of data and developing new productive forces by promoting the market circulation of data elements to drive innovation. However, the market faces a significant gap: high-quality data, which is crucial for AI development, is in short supply. This shortage arises because most companies lack the resources and expertise needed to develop and use their data effectively. In addition, extracting valuable insights from raw data is complex, time-consuming, and labor-intensive, involving steps such as data management, cleaning, extraction, and integration. Without a standardized approach, these challenges drive up costs and create inefficiencies that prevent businesses from realizing the full potential of their data assets.

The need for multi-party collaboration to combine datasets and increase their value is increasingly apparent, but significant barriers remain. Current data systems and AI models often operate in silos, which makes comprehensive analysis difficult. Data protection is also a major concern, because sharing sensitive information risks leaks. A collaborative federated learning framework addresses these issues by keeping data within each party's local environment while still enabling secure and efficient collaboration. By providing technology for seamless data management and AI-driven analysis, together with tools such as zero-knowledge proofs and blockchain for data validation and tracking, the framework lets companies unlock the value of their data through secure multi-party cooperation, paving the way for a new era of data-driven innovation.
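As a rough illustration of the data-tracking idea, the Python sketch below keeps a tamper-evident, hash-chained log of data events: each entry commits to the previous entry's hash, and datasets are referenced only by a SHA-256 fingerprint rather than by their contents. This is a simplified stand-in for a blockchain ledger (no consensus, no distribution across parties) and involves no zero-knowledge proof; the function names and event fields are hypothetical and are not part of DataS.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def fingerprint(data: bytes) -> str:
    """SHA-256 digest that identifies a dataset without embedding its contents."""
    return hashlib.sha256(data).hexdigest()

def _entry_hash(prev_hash: str, event: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so identical events always hash identically.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_event(log: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Append an event chained to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _entry_hash(prev, event)})

def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True

# Hypothetical usage: register a dataset, then grant access to another party.
log: List[Dict[str, Any]] = []
append_event(log, {"action": "register", "owner": "party-A",
                   "dataset": fingerprint(b"id,email\n1,alice@example.com\n")})
append_event(log, {"action": "grant", "owner": "party-A", "to": "party-B"})
print(verify(log))
```

Because the log stores only hashes, it records who registered or shared which dataset and exposes after-the-fact edits; note that a bare hash of small or easily guessed data can still be recovered by brute force.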

Technology

Recognizing the importance of high-quality data, the DataS project aims to transform data lifecycle management for AI, improving data accessibility, collaboration, and commercialization.

The solution enables (i) efficient cleaning, processing, and extraction of valuable data assets from large volumes of data, and (ii) the contribution and commercialization of high-quality data assets without disclosing the underlying data. DataS comprises three pillars: (1) GLASSDB serves as an end-user database with built-in tools for data cleaning, visualization, and security, helping data owners prepare their data for future transactions. (2) Apache SINGA provides a machine learning library that lets users efficiently apply or develop AI models on their data. (3) Falcon enables privacy-preserving federated learning, allowing multiple parties to develop AI applications on their joint data without compromising privacy.
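To make the federated learning idea concrete, the sketch below implements federated averaging (FedAvg), a widely used horizontal federated learning scheme, for a one-variable linear model. Each party runs gradient descent on its own (x, y) pairs, and only the resulting parameters go to a coordinator, which averages them weighted by sample count. This is a generic illustration under simplifying assumptions (toy model, noiseless data, no encryption or secure aggregation); it is not a description of Falcon's protocol, and all names are hypothetical.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (x, y) pairs held privately by one party

def local_update(w: float, b: float, data: Data,
                 lr: float = 0.3, epochs: int = 20) -> Tuple[float, float]:
    """Full-batch gradient descent on mean squared error using only local data."""
    n = len(data)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

def fed_avg(parties: List[Data], rounds: int = 50) -> Tuple[float, float]:
    """Coordinator loop: broadcast (w, b), collect local updates, average them."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in parties)
    for _ in range(rounds):
        # Only model parameters leave each party; raw (x, y) pairs stay local.
        updates = [local_update(w, b, d) for d in parties]
        w = sum(len(d) * uw for d, (uw, _) in zip(parties, updates)) / total
        b = sum(len(d) * ub for d, (_, ub) in zip(parties, updates)) / total
    return w, b

# Hypothetical example: two parties hold disjoint slices of y = 3x + 1.
party_a = [(i / 10, 3 * i / 10 + 1) for i in range(0, 5)]
party_b = [(i / 10, 3 * i / 10 + 1) for i in range(5, 11)]
w, b = fed_avg([party_a, party_b])
print(f"w={w:.3f}, b={b:.3f}")
```

Weighting by sample count gives a party with more data proportionally more influence on the shared model. When parties hold differently distributed data, plain FedAvg can converge more slowly or drift from the centralized solution, which practical systems counter with additional techniques.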

Technology Readiness Level (TRL)

4

Minimum viable product built in the laboratory

Applications & Advantages

  • 01

    One-Click Data Preparation: Built-in tools enable one-click data extraction, cleaning, encryption, and synthesis, significantly reducing the complexity and labor costs of data processing.

  • 02

    AI×DB Autonomous System: An AI-driven autonomous data system that aims to support AI-based design in every major system component and to provide AI-driven analytical capabilities within the database.

  • 03

    Privacy-Preserving Data Collaboration: A collaborative federated learning framework that enables clients to develop models jointly with one or more third parties without their data leaving the local environment.
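Item 01 refers to built-in data preparation tools. As a rough, hypothetical illustration of what a cleaning-and-integration step involves (not GLASSDB's actual tooling), the Python sketch below normalizes text fields, drops records missing required fields, removes exact duplicates, and then joins two cleaned sources on a shared key. The record layout and field names are assumptions made for the example.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

def clean(records: List[Record], required: List[str]) -> List[Record]:
    """Trim whitespace, lower-case emails, drop incomplete rows, de-duplicate."""
    seen = set()
    cleaned: List[Record] = []
    for rec in records:
        # Strip surrounding whitespace; treat empty strings as missing (None).
        row = {k: (v.strip() or None) if isinstance(v, str) else v
               for k, v in rec.items()}
        if row.get("email"):
            row["email"] = row["email"].lower()
        # Drop rows that lack any required field.
        if any(row.get(field) is None for field in required):
            continue
        # Exact-duplicate check on the normalized row (keys are unique, so
        # sorting the items never has to compare values).
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

def integrate(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner join on `key`; if `right` repeats a key, its last record wins."""
    index = {rec[key]: rec for rec in right}
    return [{**lrec, **index[lrec[key]]} for lrec in left if lrec[key] in index]

# Hypothetical sources: a customer list and an order list from another system.
customers = clean([{"id": "1", "email": " Alice@Example.com "},
                   {"id": "1", "email": "alice@example.com"},
                   {"id": "2", "email": ""}], required=["id", "email"])
orders = clean([{"id": "1", "total": "42.50"},
                {"id": "3", "total": "10.00"}], required=["id", "total"])
print(integrate(customers, orders, key="id"))
```

Listing the join key in `required` guarantees that every cleaned record carries it, so the join never encounters a missing key.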