TECHNOLOGIES USED
- Snowflake
- Snowpark Python
- Vigilant AI's proprietary API
OUR CLIENT
Imagine running a corporation where every accounting transaction can be automatically reviewed and verified. Now imagine being an auditing firm that receives a company's financial documents already checked against its accounting system for discrepancies, without anyone having to read a single document by hand. Vigilant AI is building a platform that empowers auditors to shift to a data-driven approach to auditing. We were lucky to meet them at a time when they were considering Snowflake to enhance their data querying, visualization, and sharing capabilities.
Snowflake appealed to Vigilant AI as a way to easily query tables of data processed by their API, and to integrate seamlessly with BI tools in the Snowflake ecosystem such as ThoughtSpot. Snowflake's data-sharing capabilities, both for customers already on Snowflake and for those without a Snowflake account, were also particularly interesting to Vigilant AI. However, they were new to the platform and needed to verify that it was a good fit for their business plans.
Up until now, Vigilant AI had built everything in-house from scratch. Snowflake's wide suite of data tools was a big draw because it would let their product connect easily and securely with the ways their customers, and their customers' clients, want to share and review data.
OUR IMPACT
Our proof of concept (POC) gave Vigilant AI a powerful sales tool and an exploration of what is possible with Snowflake. By demonstrating direct integration with Snowflake, Vigilant AI can now show prospective clients exactly how their service can operate within each client's existing data infrastructure.
Key outcomes from the POC include:
- A demo for prospective clients: Showcases seamless integration with Snowflake with examples of data visualization, attracting clients already invested or interested in the Snowflake ecosystem.
- Enhanced data accessibility: Delivers queryable data directly within Snowflake, whereas the current version requires users to export CSVs from the Vigilant AI web interface.
- Expanded BI tool compatibility: Demonstrates integration with Snowflake's native visualization features and with Snowflake BI partners like ThoughtSpot, letting clients analyze data in their preferred platforms or within Snowflake itself.
- Maintained existing workflows: Operates in parallel with the existing workflows in the Vigilant AI web interface. Users can work in either Snowflake or Vigilant AI and achieve the same goals, with no conflicting processes between the two.
Working closely with the Vigilant AI team, we delivered the proof of concept in three weeks and provided immediate value as a compelling sales demo for an upcoming conference. Built to run alongside existing systems, the solution lays the groundwork for future expansion of Snowflake integration capabilities without impacting existing customer workflows or adding significant extra development work to the Vigilant AI team.
THE CHALLENGE
Our central challenge for this project was time. The POC had a hard deadline and had to deliver specific functionality to meet Vigilant AI's needs. Throughout the project, we repeatedly had to weigh quicker, less scalable short-term solutions against our preference for building things for the long term.
Within this time constraint, there were other technical challenges we needed to face:
- Data ingestion complexity: Vigilant AI's API takes in and processes unstructured files such as PDFs before they are ready for analysis. Snowpipe is built for continuously loading structured and semi-structured files such as CSV or JSON, so loading these documents directly through Snowpipe was unsuitable. This demanded a more sophisticated approach to data loading and processing within Snowflake.
- Data structure variability: Every processed file is different, not only across obvious categories such as invoices and receipts but also within each category, such as invoices to or from different companies. This variability ruled out a fixed, persistent schema. Instead, building the result tables required a dynamic schema inference mechanism so that the data could be queried effectively within Snowflake.
- Seamless workflow integration: A critical requirement was to integrate the solution seamlessly with Vigilant AI's existing pipeline and API, including their OAuth implementation. This required us to understand their architecture well enough, and quickly enough given the deadline, to ensure compatibility and efficiency without significant changes to their existing processes.
Throughout this project, we prioritized finding ways to leverage Snowflake's strengths while working within Vigilant's established technical framework, rather than forcing change to their existing systems.
THE SOLUTION
To address the challenges of integrating Vigilant AI's solution with Snowflake, we designed and implemented a solution encompassing the following key components:
Automated data pipeline: We developed an automated data pipeline for new file uploads comprising the following steps:
- File uploads kick off orchestrated tasks and streams to send new files for processing via Vigilant AI's API.
- A dedicated "Snowflake" button within Vigilant AI's GUI enables users to effortlessly migrate processed data to Snowflake for analysis and further exploration.
- To save time for the POC, we reused the existing CSV export functionality to load processed data into Snowflake, relying on Snowflake's INFER_SCHEMA function for dynamic schema inference to generate queryable tables on the fly.
- Using a variation of the same process, we brought the results of audit tests into Snowflake for analysis and further exploration as well.
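The pipeline steps above can be sketched as the Snowflake SQL a Snowpark session would run. This is a minimal sketch under stated assumptions, not the POC's actual code: every stage, stream, task, warehouse, file format, and table name, and the `send_new_files_to_vigilant` stored procedure, are hypothetical placeholders. It assumes uploads land on a stage with a directory table, that a stream on that stage tracks new files, and that the exported CSVs are turned into tables with `INFER_SCHEMA` and `CREATE TABLE ... USING TEMPLATE`. Depending on how the stage is configured, its directory table may also need refreshing (`ALTER STAGE ... REFRESH`) before new uploads appear in the stream.

```python
"""Sketch: SQL for a file-upload pipeline and an INFER_SCHEMA-based table load.

Every object name here is a hypothetical placeholder, not Vigilant AI's setup.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineNames:
    upload_stage: str = "vigilant_uploads"
    upload_stream: str = "vigilant_uploads_stream"
    task: str = "send_new_files_task"
    warehouse: str = "poc_wh"
    # Hypothetical stored procedure that posts new files to Vigilant AI's API.
    # It must read the stream inside a DML statement (e.g. INSERT ... SELECT
    # FROM the stream) so that the stream's offset advances.
    procedure: str = "send_new_files_to_vigilant"
    export_stage: str = "vigilant_exports"
    file_format: str = "vigilant_csv"


def upload_pipeline_sql(n: PipelineNames) -> list[str]:
    """Stage with a directory table, a stream on it, and a task gated on the stream."""
    return [
        f"CREATE STAGE IF NOT EXISTS {n.upload_stage} DIRECTORY = (ENABLE = TRUE)",
        # A stream on a stage tracks changes to the stage's directory table.
        f"CREATE OR REPLACE STREAM {n.upload_stream} ON STAGE {n.upload_stage}",
        # Each scheduled run is skipped unless the stream has unconsumed rows.
        f"CREATE OR REPLACE TASK {n.task}\n"
        f"  WAREHOUSE = {n.warehouse}\n"
        "  SCHEDULE = '1 MINUTE'\n"
        f"  WHEN SYSTEM$STREAM_HAS_DATA('{n.upload_stream}')\n"
        "AS\n"
        f"  CALL {n.procedure}()",
        # Newly created tasks are suspended until explicitly resumed.
        f"ALTER TASK {n.task} RESUME",
    ]


def load_export_sql(n: PipelineNames, table: str, path: str) -> list[str]:
    """Create a table whose columns are inferred from CSV exports, then load them."""
    location = f"@{n.export_stage}/{path}"
    return [
        # PARSE_HEADER lets INFER_SCHEMA take column names from the header row.
        f"CREATE FILE FORMAT IF NOT EXISTS {n.file_format} TYPE = CSV PARSE_HEADER = TRUE",
        # USING TEMPLATE builds the column list from INFER_SCHEMA's output,
        # ordered as the columns appear in the files.
        f"CREATE OR REPLACE TABLE {table} USING TEMPLATE (\n"
        "  SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) WITHIN GROUP (ORDER BY ORDER_ID)\n"
        f"  FROM TABLE(INFER_SCHEMA(LOCATION => '{location}', FILE_FORMAT => '{n.file_format}'))\n"
        ")",
        # Load rows by matching CSV header names to the inferred column names.
        f"COPY INTO {table} FROM {location}\n"
        f"  FILE_FORMAT = (FORMAT_NAME = '{n.file_format}')\n"
        "  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE",
    ]
```

In a Snowpark notebook or stored procedure, each statement could be run with `session.sql(stmt).collect()`. Generating the DDL from one set of names keeps the stage, stream, and task consistent with each other; the exact file-format options for CSV loads with `MATCH_BY_COLUMN_NAME` are worth confirming against the current Snowflake documentation.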
Data visualization and exploration:
- We were able to demonstrate the ease of building dashboards within Snowflake to provide users with simple access to visualizations of the data. These dashboards facilitated data exploration and analysis, enabling Vigilant AI, and thus their customers, to quickly understand and interpret the results.
- Using out-of-the-box integrations that come with Snowflake, the same types of dashboards could be easily built in common BI tools like ThoughtSpot, Looker, and Power BI.
Seamless workflow integration: To ensure minimal disruption to Vigilant AI's existing processes, we integrated our solution directly into their workflow. This included:
- Leveraging our expertise in OAuth and Python to interact seamlessly with Vigilant AI's API.
- Developing in parallel with existing systems so that nothing we built affected existing customers' workflows.
- Providing comprehensive documentation and demonstrations to support a smooth transition and minimal learning curve for Vigilant AI's team. We talk more about the importance we place on handoffs in another post.
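To illustrate the kind of OAuth handling involved in calling Vigilant AI's API from Python, here is a small token cache. It is an illustrative sketch only: it assumes an OAuth 2.0 client-credentials grant (RFC 6749, section 4.4), which may not be the flow Vigilant AI's API actually uses, and the token URL, client ID, and secret are hypothetical. The HTTP call is injected as a function so the caching logic can be exercised without a network.

```python
import base64
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Injected HTTP call: (url, headers, form-encoded body) -> parsed JSON response.
PostFn = Callable[[str, dict, bytes], dict]


def urllib_post(url: str, headers: dict, body: bytes) -> dict:
    """Default transport using only the standard library."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


class ClientCredentialsToken:
    """Fetches and caches an OAuth 2.0 access token (client-credentials grant)."""

    def __init__(self, token_url: str, client_id: str, client_secret: str,
                 post: PostFn = urllib_post, skew_seconds: float = 60,
                 clock: Callable[[], float] = time.monotonic):
        self._token_url = token_url
        # RFC 6749 s2.3.1: form-encode the ID and secret, then use HTTP Basic auth.
        pair = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
        self._basic = base64.b64encode(pair.encode("utf-8")).decode("ascii")
        self._post = post
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, requesting a new one only when it is near expiry."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
            headers = {
                "Authorization": f"Basic {self._basic}",
                "Content-Type": "application/x-www-form-urlencoded",
            }
            payload = self._post(self._token_url, headers, body)
            self._token = payload["access_token"]
            # expires_in is only RECOMMENDED by RFC 6749; assume one hour if absent.
            self._expires_at = self._clock() + float(payload.get("expires_in", 3600))
        return self._token

    def auth_header(self) -> dict:
        """Header to attach to each (hypothetical) Vigilant AI API request."""
        return {"Authorization": f"Bearer {self.get()}"}
```

Refreshing slightly before expiry avoids a request failing mid-pipeline because its token lapsed in transit; the same object can be shared by every call a Snowpark procedure makes during one run.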
This solution effectively addressed the challenges of data ingestion, schema variability, and workflow integration, providing Vigilant AI with a powerful and user-friendly way to use Snowflake's capabilities alongside their existing environment. The result is a straightforward example of how Snowflake can enhance Vigilant AI's capabilities, and something they can show their customers as a sales tool going forward.
THE FUTURE
Our proof of concept demonstrates the core integration capabilities between Snowflake and Vigilant AI. Looking ahead, we envision a fully integrated solution that brings the full spectrum of Vigilant AI's features into the Snowflake environment. To enhance the user experience, we plan to implement smarter file routing and intuitive user interfaces, simplifying data loading and minimizing potential errors. This includes exploring the development of a native Snowflake application that would put the Vigilant AI GUI right in Snowflake itself.
Furthermore, Snowflake provides more advanced data security tools, such as row access policies and column-level data masking, that could protect sensitive information and support compliance with data privacy regulations. To meet clients' strict data security needs, we would also implement robust data governance policies and procedures, such as pipeline testing and data quality checks, to ensure data integrity and reliability.
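As an illustration of the row- and column-level controls mentioned above, the sketch below generates Snowflake DDL for a masking policy and a row access policy. It is hypothetical throughout: the table, column, role, and mapping-table names are placeholders, the masked column is assumed to be a string (a masking policy's input type must match the column's type), and a real deployment would derive its rules from each client's access requirements.

```python
def masking_policy_sql(table: str, column: str, allowed_role: str) -> list[str]:
    """Column-level: only allowed_role sees the real value; everyone else sees a placeholder."""
    policy = f"{table}_{column}_mask"
    return [
        f"CREATE OR REPLACE MASKING POLICY {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() = '{allowed_role}' THEN val ELSE '***MASKED***' END",
        f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy}",
    ]


def row_access_policy_sql(table: str, column: str, mapping_table: str) -> list[str]:
    """Row-level: a row is visible only if the (hypothetical) mapping table
    grants the current role access to that row's client."""
    policy = f"{table}_row_access"
    return [
        # The policy argument is deliberately named differently from the mapping
        # table's column so the subquery cannot resolve it to m.client_id.
        f"CREATE OR REPLACE ROW ACCESS POLICY {policy} AS (row_client_id STRING) RETURNS BOOLEAN ->\n"
        f"  EXISTS (SELECT 1 FROM {mapping_table} m\n"
        "          WHERE m.role_name = CURRENT_ROLE() AND m.client_id = row_client_id)",
        f"ALTER TABLE {table} ADD ROW ACCESS POLICY {policy} ON ({column})",
    ]
```

Keeping the access rules in a mapping table rather than hard-coding roles in the policy means onboarding a new client becomes a row insert instead of a policy change.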
Beyond batch processing, we also see the possibility of building real-time data pipelines that instantly detect discrepancies and notify key stakeholders. This evolution will enable proactive monitoring and immediate response to critical data events.
You can read Vigilant AI's blog post on our ongoing partnership and the ways we hope to expand the tool into customer Snowflake accounts here. We look forward to the many opportunities ahead and the interesting technologies we will get to build together.
TESTIMONIAL
"Polar Labs provided expert guidance on embedding our services into the Snowflake environment, allowing us to visualize our data findings for immediate client value."
- John Craig, CEO, Vigilant AI