Again, a frequently asked and pertinent question
This is the fifth of a ten-part series on contract migration and related data extraction, which will help your legal teams gain a better understanding of the information included in all of your contracts.
The first reaction is that EVERY document and EVERY data point needs to be extracted and everything migrated into a new CLM. This is great until the cost and effort are factored in. Migrating 300k contracts into a CLM, with an average of 10 basic attributes each, turns into 3 million data points that first need to be extracted!
As discussed in the previous blog (Can AI software handle it all?), the only way to get accurate results is to configure the AI software for the extraction and couple it with a legal team to check EVERY attribute against the original contract after extraction. A HUGE task with 3 million data points to be checked. Not to mention de-duplication of the documents, segregation into different contract types (MSAs, Order forms, SoWs, etc.), removal of unwanted documents, drafts, partially signed documents, creation of contractual hierarchy, etc.!
All of this makes the effort both cost- and time-prohibitive. Remember: don’t expect to throw AI software at it and see perfect results!
So how do you go about breaking down this volume of 300k docs and 3 million attributes? Here are some things to consider.
1. Are all the documents contractual documents that need to be in my CLM? Typically, only PDFs are signed documents. Can JPGs, MSG files, DOC and XLS files, etc. be ignored?
2. Are they all signed documents? Are there duplicates? Are there partially signed documents?
3. Is there a set of clients or contracts that I need to see in the system immediately?
4. Are they all current clients? Or are some from the 90s that don’t even exist anymore?
The answer: Take a subset of the documents based on the above. Then pare that down further: “Maybe I can start with my highest-spend clients” or “Maybe I can start from a certain year of contract signature and work forward to the present.”
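To make the questions above concrete, here is a minimal sketch of how a first-pass triage might look once you have a basic inventory of the documents. Everything in it is an assumption for illustration: the `Doc` fields, the `triage` function, and the idea that signature status and a content checksum are already known for each file. It is not tied to any particular CLM or AI tool.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical inventory record; field names are illustrative only.
    name: str       # file name, e.g. "msa_acme.pdf"
    client: str
    signed: str     # "full", "partial", or "none"
    year: int       # year of contract signature
    checksum: str   # stand-in for a content hash used to spot duplicates


def triage(docs, priority_clients, min_year):
    """Split docs into (extract_now, deferred) using the four questions."""
    seen = set()
    extract_now, deferred = [], []
    for d in docs:
        is_pdf = d.name.lower().endswith(".pdf")   # question 1: file type
        is_dup = d.checksum in seen                # question 2: duplicates
        seen.add(d.checksum)
        keep = (
            is_pdf
            and not is_dup
            and d.signed == "full"                 # question 2: fully signed
            and d.client in priority_clients       # question 3: needed now
            and d.year >= min_year                 # question 4: still current
        )
        (extract_now if keep else deferred).append(d)
    return extract_now, deferred
```

Calling `triage(docs, {"Acme", "Beta"}, 2015)` would return the fully signed, non-duplicate PDFs for those two clients from 2015 onward as the first extraction batch. What happens to the deferred documents is a separate decision for a later phase.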
Phase the project: at the document level, as well as in the number of attributes. You can start by extracting just the basic attributes across all the clients. If that is cost-prohibitive, then take a subset of attributes only for the high-spend clients. Or start with some important document types (maybe MSAs and SoWs are more important than NDAs). Then expand on the attributes and documents as the budget frees up.
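A quick back-of-the-envelope calculation shows why phasing matters. The sketch below reuses the 300k contracts and 10 attributes from the example in this post. The phase sizes are made-up numbers for illustration only, not recommendations; plug in your own counts.

```python
def data_points(num_docs, num_attributes):
    """Each attribute on each document is one data point to extract and check."""
    return num_docs * num_attributes


# Full scope from the example in this post: 300k contracts x 10 attributes.
full_scope = data_points(300_000, 10)
print(f"Everything at once: {full_scope:,} data points")

# Hypothetical phases: (label, documents in scope, attributes added in this phase).
phases = [
    ("Phase 1: high-spend clients, 5 basic attributes", 20_000, 5),
    ("Phase 2: same clients, 5 more attributes", 20_000, 5),
    ("Phase 3: next tier of clients, all 10 attributes", 80_000, 10),
]

running_total = 0
for label, docs, attrs in phases:
    points = data_points(docs, attrs)
    running_total += points
    print(f"{label}: {points:,} new data points ({running_total:,} so far)")
```

Each phase is a much smaller batch for the legal team to verify than the full scope, and later phases can be resized or dropped as the budget becomes clearer.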
For the “irrelevant” documents, you could simply push them into the agreement record field as supporting documents without any metadata extracted, so that at least all of them are stored in the system.