DataS: Software for Data Collaboration and Monetization
Opportunity
The current policy landscape is pushing for the unlocking of data value and the development of new productive forces by promoting the market circulation of data elements to drive innovation. However, there is a significant gap in the market as high-quality data, which is crucial for AI development, is in short supply. This shortage stems from most companies lacking the necessary resources and expertise for effective data development and utilization. Additionally, the process of extracting valuable insights from raw data is complex, time-consuming, and labor-intensive, involving steps such as data management, cleaning, extraction, and integration. Without a standardized approach, these challenges lead to high costs and inefficiencies, preventing businesses from fully realizing the potential of their data assets.
The need for multi-party collaboration to combine datasets and enhance their value is increasingly apparent, but significant barriers remain. Current data systems and AI models often operate in silos, making it difficult to perform comprehensive analyses. Moreover, data protection is a major concern, as sharing sensitive information can lead to leaks. A collaborative federated learning framework addresses these issues by allowing data to remain within its local environment while enabling secure and efficient collaboration. By providing technology for seamless data management and AI-driven analysis, along with advanced tools such as zero-knowledge proofs and blockchain for data validation and tracking, the framework enables companies to unlock data value through secure, multi-party cooperation, paving the way for a new era of data-driven innovation.
Technology
Acknowledging the importance of high-quality data, the DataS project aims to revolutionize data lifecycle management in the AI era by improving data accessibility, collaboration, and commercialization.
The solution enables (i) efficient cleaning, processing, and extraction of valuable data assets from large volumes of raw data, and (ii) the contribution and commercialization of high-quality data assets without disclosing the actual data. DataS comprises three pillars: (1) GLASSDB serves as an end-user database with built-in tools for data cleaning, visualization, and security, helping data owners prepare data for future transactions. (2) Apache SINGA offers a powerful machine learning library that lets users efficiently apply or develop AI models on their data. (3) Falcon enables privacy-preserving federated learning, allowing multiple parties to develop AI applications on joint data without compromising privacy.
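To make the idea of training on joint data without moving it more concrete, the following is a minimal sketch of plain federated averaging in Python. It is an illustration only: it does not use Falcon's API or its actual protocol, it omits the cryptographic protections a privacy-preserving system would add, and every name in it (`local_update`, `federated_average`, `train`, the toy datasets) is hypothetical. Each party fits a small linear model on its own records, and only the model weights are sent for aggregation.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope, intercept) of y = slope * x + intercept
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one party


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one party's own data (mean squared error)."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine the parties' weights, weighting each by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(parties: List[Dataset], rounds: int = 200) -> Weights:
    """Coordinator loop: only weights cross party boundaries, never raw records."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in parties]
        weights = federated_average(updates, [len(d) for d in parties])
    return weights


# Hypothetical toy data: both parties hold points on the line y = 2x + 1.
party_a: Dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
party_b: Dataset = [(3.0, 7.0), (4.0, 9.0)]

slope, intercept = train([party_a, party_b])
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

Because both toy datasets lie on the same line, the averaged model should approach a slope of 2 and an intercept of 1 even though neither party ever sees the other's records. In a real deployment, the exchanged weights would themselves need protection, which is the role the text assigns to Falcon.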