Migrate legacy contract metadata

Frequently Asked Questions

Learn more about contract data extraction and our solutions

What is abstraction/extraction and why do I need it?

Extraction is the process of pulling out relevant data points from contractual documents.  Once these are extracted and labeled accordingly, they can be put into a structured database (even Excel) for further reporting and analysis.

For example, you can extract Entity name, Counterparty, Expiration date, Autorenewal (Y/N), Workers Comp insurance liability limit (currency), etc.

Once the data is in a structured database, you can report on it. For example: “Give me the names of the counterparties whose Workers Comp liability limit is less than $1,000,000.”
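To make this concrete, here is a minimal sketch of that report as a SQL query, run through Python's built-in `sqlite3` module. The table name, column names and sample rows are hypothetical and exist only for illustration. A real CLM will have its own schema and reporting interface.

```python
import sqlite3

# Hypothetical schema: one row per contract, one column per extracted attribute.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE contracts (
           entity_name            TEXT,
           counterparty           TEXT,
           expiration_date        TEXT,    -- ISO 8601, e.g. 2025-06-30
           autorenewal            INTEGER, -- 1 = Yes, 0 = No
           workers_comp_limit_usd REAL     -- Workers Comp liability limit in US dollars
       )"""
)
# Illustrative sample rows, not real contract data.
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme Corp", "Globex LLC", "2025-06-30", 1, 500_000),
        ("Acme Corp", "Initech Inc", "2026-01-15", 0, 2_000_000),
        ("Acme Corp", "Umbrella Co", "2024-12-31", 1, 750_000),
    ],
)

# "Give me the names of the counterparties whose Workers Comp liability limit is < $1,000,000."
rows = conn.execute(
    "SELECT counterparty FROM contracts "
    "WHERE workers_comp_limit_usd < ? ORDER BY counterparty",
    (1_000_000,),
).fetchall()
low_limit_counterparties = [name for (name,) in rows]
print(low_limit_counterparties)  # ['Globex LLC', 'Umbrella Co']
```

Without extraction, answering the same question means opening every contract and reading its insurance clause.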

Why do I need extraction?

There are multiple reasons for extraction. At a 60,000-foot level, you need to be able to see easily what is in your contracts. Having the relevant information at your fingertips, through reports on a structured database, gives you what you need without opening every document to determine which ones match your query. Some of the reasons for extraction are:

  • Information – Much knowledge is stored in the old contracts. This is lost if it is not in the new CLM.
  • Reporting – You can set triggers for contracts coming up for renewal, supplier contracts with penalty clauses, etc.
  • Adoption – A CLM system is adopted only if all the current and older contracts are in the new system along with the relevant data. (It is important to note that a CLM is not simply a document repository.)
  • Revenue recovery – Police penalty clauses, etc.
  • Compliance – Track items that would be affected by changes in regulation.

Even though their dictionary meanings differ, we use these two terms synonymously. Strictly speaking, extraction means that the exact phrase from the contract or document is picked and copied, whereas abstraction is the essential value distilled from the extracted data.

For example, for the expiration date attribute, if the contract says “This contract expires on 3/4/19”, the extraction value for this would be “This contract expires on 3/4/19” whereas the abstraction value will be “3/4/19”, removing the irrelevant part from the phrase.
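The distinction can be sketched in Python with a regular expression. The whole matched sentence is the extraction value, and the captured date alone is the abstraction value. The pattern is an assumption that handles only this kind of phrasing. Real contracts word expiration clauses in many different ways.

```python
import re

# Illustrative pattern only: a sentence mentioning "expire(s)" followed by an m/d/yy date.
EXPIRY_SENTENCE = re.compile(
    r"[^.]*\bexpires?\b[^.]*?(\d{1,2}/\d{1,2}/\d{2,4})[^.]*", re.IGNORECASE
)


def extract_and_abstract(text: str):
    """Return (extraction, abstraction) for the expiration date, or (None, None)."""
    match = EXPIRY_SENTENCE.search(text)
    if match is None:
        return None, None
    extraction = match.group(0).strip()  # the exact phrase as it appears in the contract
    abstraction = match.group(1)         # only the value that matters
    return extraction, abstraction


print(extract_and_abstract("This contract expires on 3/4/19"))
# ('This contract expires on 3/4/19', '3/4/19')
```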

Data can be defined as a set of qualitative and quantitative values that can be further processed to derive meaningful information. The quality of the information derived from a set of data depends on the accuracy and completeness of the data. Data can further be divided into three types:

  1. Structured data – Data that has consistent formats and can easily be organized into a database, such as facts and figures entered into a pre-defined format, or well-drafted reports.
  2. Semi-structured data – Information that doesn’t reside in a pre-defined data format but does have some organizational properties that make it easier to analyse, for example Excel reports that do not conform to a formal structure.
  3. Unstructured data – Information that does not have a pre-defined data format and/or is not organized using a common layout. Examples of unstructured data include Word documents, PDFs, video files, presentations, emails, etc.

Broadly, data values can be classified into five categories:

  • Text – It includes script characters and expressions (words and sentences), like text written in different languages
  • Numeric – Numeric values consist of numbers, decimal, percentage values, etc.
  • Currency – This data type consists of monetary values like $ (US Dollar), € (Euro), ¥ (Yen), etc.
  • Date & Time – It includes dates and times written in different formats, e.g. times with AM/PM, and dates such as 20th June 1990, 20/06/1990 (dd/mm/yyyy) or 06/20/1990 (mm/dd/yyyy).
  • Boolean Expressions – it indicates true/false, yes/no or on/off values.
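Extracted values usually need to be normalized before they can be reported on. The following Python sketch uses only the standard library and handles three of the five categories: dates, currency and Boolean values. The accepted date formats, the currency-symbol table and the `day_first` switch are all assumptions. A date such as 03/04/2019 is ambiguous, so the dd/mm versus mm/dd convention has to be agreed for each contract set rather than guessed.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Assumption: "$" means US dollars; other dollar currencies would need an explicit code.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "¥": "JPY"}
TRUE_WORDS = {"true", "yes", "y", "on"}
FALSE_WORDS = {"false", "no", "n", "off"}


def parse_date(raw: str, day_first: bool = True) -> date:
    """Normalize '20th June 1990', '20/06/1990', '06/20/1990' or '1990-06-20' to a date."""
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", raw.strip())  # drop ordinal suffixes
    numeric = ["%d/%m/%Y", "%m/%d/%Y"] if day_first else ["%m/%d/%Y", "%d/%m/%Y"]
    # The first format that yields a valid date wins, so an ambiguous value
    # such as 03/04/2019 is resolved by day_first, not by the data itself.
    for fmt in ["%Y-%m-%d", "%d %B %Y", *numeric]:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def parse_currency(raw: str) -> tuple[str, Decimal]:
    """Split '$1,000,000' into ('USD', Decimal('1000000'))."""
    raw = raw.strip()
    code = CURRENCY_SYMBOLS.get(raw[:1])
    if code is None:
        raise ValueError(f"unknown currency symbol in {raw!r}")
    return code, Decimal(raw[1:].replace(",", "").strip())


def parse_boolean(raw: str) -> bool:
    """Map yes/no, true/false and on/off (any case) to True or False."""
    word = raw.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"not a Boolean value: {raw!r}")


print(parse_date("20th June 1990"), parse_date("06/20/1990"))  # 1990-06-20 1990-06-20
```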

Legacy data is data from your legacy or your executed contracts. These are contracts that have been executed by both parties. They may be current and still relevant or may have expired. It is important to track legacy data for the following reasons:

  • Information – Much knowledge is stored in the old contracts which may be applicable to future business decisions. This is lost if it is not in a CLM.
  • Reporting – You can run triggers on contracts coming up for renewal, supplier contracts with penalty clauses, etc.
  • Adoption – A CLM system is adopted only if all the current and older contracts are in the new system.
  • Revenue recovery – police penalty clauses, etc.
  • Compliance – track items that would be affected by changes in regulation

To learn more, read our whitepaper, 7 Reasons to Load Legacy Contracts.

Contract lifecycle management (CLM) is a system to manage contracts within an organization. A single, unified repository makes contracts easier to track, and the reports generated by these systems help businesses make better-informed decisions.

Most contract lifecycle management systems provide contract authoring, contract creation and approval, workflow management, digital signatures, and other related services.

Contracts are a written set of guidelines and commitments to undertake a task or deliver a product or service, and failure to comply with these commitments can attract legal or regulatory penalties. Effective contract management is therefore as necessary as entering into the contract.

Key benefits of a CLM are:

  • Helps in contract authorization and creation
  • A good tool for managing the contract workflow
  • It is a single and unified repository
  • Provides insight into key provisions at your fingertips
  • Helps in compliance monitoring and follow-up
  • Helps in identifying and mitigating risks
  • Can generate useful reports for better informed decisions
  • Helps in management reporting
  • Access to all information in one place from multiple concurrent locations

Migrating contracts into a CLM yields the benefits of a single, unified repository, which helps you identify and mitigate risks. Contracts carry an enormous amount of information, and knowing what has been committed in the past can help you make better business decisions. Migrating legacy contracts into a CLM creates a structured database that can help identify hidden or forgotten information. This may lead to additional revenue recognition, or reveal obligations that could otherwise cause heavy penalties for the organization. It is also an effective way to track compliance and regulatory changes.

Migration can further be segregated into two categories:

  • Document migration – It is the process of uploading the scanned or OCRed copies of your contracts onto the CLM
  • Metadata extraction and migration – Key data points from the contracts, also known as metadata, are extracted and uploaded onto the CLM for review and reporting.

It is a four-step process:

Step 1 – Scan all the paper documents into digital files

Step 2 – Convert the scanned images into text (OCR)

Step 3 – Extract data points from these contracts

Step 4 – Ingest documents and data points into the CLM
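Step 4 depends on your CLM's import mechanism. The following Python sketch assumes a CLM that accepts a bulk CSV “load file” pairing each document with its data points. This is a common pattern, but the column names and the file-linking convention shown here are hypothetical. Use whatever layout your CLM vendor specifies.

```python
import csv
import io

# Hypothetical load-file layout; use the column names your CLM vendor specifies.
FIELDS = [
    "document_file", "agreement_title", "agreement_type",
    "effective_date", "expiration_date", "renewal_option",
]


def build_load_file(records: list[dict]) -> str:
    """Return CSV text with one row per contract, linking the document to its data points."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="raise")
    writer.writeheader()
    for record in records:
        missing = [field for field in FIELDS if field not in record]
        if missing:  # fail loudly instead of silently loading blanks
            raise ValueError(f"{record.get('document_file', '?')}: missing {missing}")
        writer.writerow(record)  # unexpected extra keys also raise ValueError
    return buffer.getvalue()


print(build_load_file([{
    "document_file": "scans/0001.pdf", "agreement_title": "Master Services Agreement",
    "agreement_type": "MSA", "effective_date": "2019-03-04",
    "expiration_date": "2022-03-03", "renewal_option": "Yes",
}]))
```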

To learn more about the process, download Brightleaf’s whitepaper on Legacy Contract Management.

Because contracts are legal documents, they carry a lot of information, but not every bit of it needs to be tracked and monitored. It is important to understand what should and should not be extracted, as metadata extraction requirements vary from industry to industry. You can follow the guidelines below to identify which attributes to extract:

  • Divide the contractual documents into different types (NDAs, Procurement agreements, SLAs, etc.)
  • Work with each of the departments that touch the contract types to determine which elements are important to them and need to be extracted.
  • Look at extraction from a reporting standpoint – what reports do you want to run on the extracted data.
  • Determine how often each data point will need to be reported on. For example, if you would need the jurisdiction state only when something goes wrong with a contract, you may not want to extract it, since it may be needed for only 1% of the contracts in a year.

A cautionary note – if you think full clauses need to be extracted, first determine “why?”. You can report on the full clause, but it may be more beneficial to break the clause down into separate data points. For example, you may want to extract the Indemnification clause. However, it may be better to break it down into data points such as Indemnification for breach of confidential information for the receiving party (Y/N), etc.
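Breaking a clause into Yes/No data points turns a question that would otherwise require reading every clause into a simple filter. This Python sketch is illustrative only. Apart from the receiving-party confidentiality field mentioned above, the field names and the sample contracts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndemnificationTerms:
    """One Indemnification clause broken down into reportable Yes/No data points."""
    contract_id: str
    breach_of_confidentiality_receiving_party: bool  # the Y/N example from the text
    ip_infringement_covered: bool                    # hypothetical extra data point
    liability_capped: bool                           # hypothetical extra data point


# Illustrative records, not real contracts.
terms = [
    IndemnificationTerms("NDA-001", True, False, True),
    IndemnificationTerms("NDA-002", False, False, False),
    IndemnificationTerms("NDA-003", True, True, False),
]

# Which agreements indemnify for breach of confidentiality by the receiving party?
covered = [t.contract_id for t in terms if t.breach_of_confidentiality_receiving_party]
print(covered)  # ['NDA-001', 'NDA-003']
```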

Attribute Name                             Attribute Type      Data Type
Name/Title of Agreement                    Standard Attribute  Text
Type of Agreement                          Standard Attribute  Text
Contract Number                            Standard Attribute  Alphanumeric
Effective Date                             Standard Attribute  Date
Expiration Date                            Standard Attribute  Date
Renewal Option                             Standard Attribute  Yes/No
Initial Term                               Standard Attribute  Numeric (days or months)
Governing Law                              Standard Attribute  Text
Delivery Term                              Business Specific   Text
Free on Board (FOB) Charges (incurred by)  Business Specific   Text
Consumer Price Index Adjustment Date       Business Specific   Date
Most Favoured Customer                     Business Specific   Text
Currency Conversion Rate                   Business Specific   Alphanumeric
HIPAA Compliant                            Business Specific   Yes/No
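The Data Type column of the attribute list above can also serve as a validation schema. This Python sketch checks a record against a subset of those attributes and reports anything that is missing or has the wrong type. The checks are simplified assumptions, not any CLM's actual rules. For instance, allowing hyphens in an Alphanumeric contract number is a hypothetical choice.

```python
from datetime import date

# A subset of the attribute list, mapped to its Data Type (simplified).
SCHEMA = {
    "Name/Title of Agreement": "Text",
    "Contract Number": "Alphanumeric",
    "Effective Date": "Date",
    "Expiration Date": "Date",
    "Renewal Option": "Yes/No",
    "Initial Term": "Numeric",
}


def check_value(data_type: str, value) -> bool:
    """Return True if value matches the declared data type (a simplified check)."""
    if data_type == "Text":
        return isinstance(value, str) and value.strip() != ""
    if data_type == "Alphanumeric":  # assumption: hyphens are allowed, e.g. CN-2019-001
        return isinstance(value, str) and value.replace("-", "").isalnum()
    if data_type == "Date":
        return isinstance(value, date)
    if data_type == "Yes/No":
        return value in ("Yes", "No")
    if data_type == "Numeric":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unknown data type {data_type!r}")


def validate(record: dict) -> list[str]:
    """List the attributes in record that are missing or have the wrong type."""
    problems = []
    for name, data_type in SCHEMA.items():
        if name not in record:
            problems.append(f"{name}: missing")
        elif not check_value(data_type, record[name]):
            problems.append(f"{name}: expected {data_type}, got {record[name]!r}")
    return problems


record = {
    "Name/Title of Agreement": "Master Services Agreement",
    "Contract Number": "CN-2019-001",
    "Effective Date": date(2019, 3, 4),
    "Expiration Date": "3/4/22",  # not yet normalized to a date
    "Renewal Option": "Y",        # should be "Yes" or "No"
    "Initial Term": 36,
}
for problem in validate(record):
    print(problem)
```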

 

Contracts carry crucial information with legal and regulatory implications, so any missing or incorrect information can trigger penalties and other legal consequences. Failing to capture a payment date can result in additional interest on the payment, or even termination. An incorrect renewal date can result in a loss of revenue. Software-only solutions, though fast, can promise only 75-85% quality, so human intervention is essential to reach the highest level of quality. Read Brightleaf’s whitepaper to understand human intervention in contract review and abstraction.

Typically, software performs the first level of extraction of data points from the contracts. Then a team checks the results. “Review” has connotations of spot-checking, and that will NOT lead to accurate extraction results.

Brightleaf’s stringent process, which is embedded in the software, requires the lawyers who check the output to verify EACH AND EVERY data element against the original document. This is the ONLY way to get highly accurate results.

So ask the vendor: “Do you check every element, or does ‘review’ mean spot-checking? Can you provide an audit of every element checked by the lawyer, with time and date stamps?”
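What such an audit looks like depends on the tool. The underlying idea can be sketched as a per-element record with a reviewer and a timestamp, plus a coverage check that reaches 100% only when every element has been verified. The `DataPoint` class and its fields are hypothetical illustrations in Python. They are not Brightleaf's software.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DataPoint:
    contract_id: str
    attribute: str
    value: str
    verified_by: Optional[str] = None       # reviewer who checked it against the original
    verified_at: Optional[datetime] = None  # when it was checked (UTC)

    def verify(self, reviewer: str, corrected_value: Optional[str] = None) -> None:
        """Record that a reviewer checked this element against the source document."""
        if corrected_value is not None:
            self.value = corrected_value
        self.verified_by = reviewer
        self.verified_at = datetime.now(timezone.utc)


def unverified(points: list[DataPoint]) -> list[DataPoint]:
    """Every element not yet checked; a spot check leaves this list non-empty."""
    return [p for p in points if p.verified_at is None]


def coverage(points: list[DataPoint]) -> float:
    """Fraction of elements with a recorded verification."""
    return 1.0 - len(unverified(points)) / len(points) if points else 1.0


points = [
    DataPoint("C-1", "Expiration Date", "3/4/19"),
    DataPoint("C-1", "Renewal Option", "Yes"),
]
points[0].verify("reviewer_a")
print(coverage(points))                           # 0.5
print([p.attribute for p in unverified(points)])  # ['Renewal Option']
```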

  1. Manual abstraction – Interpretation and abstraction of important information done by humans. It requires planning, oversight, and lots of error checking.
  2. Automated abstraction – Abstraction of data purely done by software based on some pre-defined rules and algorithms. Fast but prone to errors.
  3. Hybrid abstraction – Best of both, a combination of manual and automated abstraction. Here the data is abstracted using software making the process fast, then vetted by experts for completeness and accuracy.

To understand the difference, download our whitepaper on Legacy Contract Management.

  1. Visualization of data is done through reports generated by a CLM system. Your CLM provider can help you with a sample report, which will help you identify any additional data you might need, as well as data you have already extracted that is of little use and can be dropped from the final extraction.

    You can follow the below process to get the sample reports:

    • Once you have determined the data elements that need to be extracted and uploaded, ask your CLM vendor to provide sample extractions for each contract type.
    • Ensure the data reflects how you would interpret each data point
    • Ensure that the test data set is uploaded to the test instance and that the reports generated on the extracted data points give you the benefit you expect
    • Run the extracted data through the CLM in small batches. This way you are able to see what is being uploaded/recorded properly and what may need to be changed.

    Do not do a “one-and-done” upload, as this may result in unforeseen errors with no way of fixing them.
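The batch-by-batch approach can be sketched in Python. The `upload` callback is a hypothetical stand-in for whatever import call your CLM provides, and it is assumed to return the records it rejected. The loop stops at the first batch with rejections, so the problem can be fixed before the rest of the data goes in.

```python
from typing import Callable, Iterator


def batches(records: list, size: int) -> Iterator[list]:
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def upload_in_batches(records: list, size: int, upload: Callable[[list], list]) -> int:
    """Upload batch by batch and return how many records were accepted.

    `upload` is hypothetical: it loads one batch and returns the rejected records.
    Processing stops after the first batch that has any rejections.
    """
    loaded = 0
    for number, batch in enumerate(batches(records, size), start=1):
        rejected = upload(batch)
        loaded += len(batch) - len(rejected)
        if rejected:
            print(f"batch {number}: {len(rejected)} rejected, stopping for review")
            break
    return loaded


# Stand-in upload that rejects records without an expiration date.
def fake_upload(batch: list) -> list:
    return [r for r in batch if not r.get("expiration_date")]


records = [{"id": i, "expiration_date": "2025-01-01"} for i in range(25)]
records[12]["expiration_date"] = ""
print(upload_in_batches(records, 10, fake_upload))  # stops in batch 2, returns 19
```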

  • Yes, but this requires a highly thoughtful process
  • Tables are not always consistent across agreements.

For example, on the buy side there is no consistency in the columns: SKUs, quantity, unit pricing, pricing per year, escalations per year.

  • This makes “normalization” of the data (i.e., defining fields that apply across all contracts) an intensive task.
  • Always look at it from a reporting standpoint. How would you want the data points shown and how do you want to report on them?
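One way to approach normalization is to map the column headings found in each agreement to a single set of canonical fields. The aliases in this Python sketch are hypothetical examples. In practice, the mapping is built up as new table layouts are encountered, and unknown headings are flagged for review rather than guessed.

```python
# Hypothetical mapping from header spellings seen in different agreements to canonical fields.
HEADER_ALIASES = {
    "sku": "sku", "item #": "sku", "part number": "sku",
    "qty": "quantity", "quantity": "quantity", "units": "quantity",
    "unit price": "unit_price", "price/unit": "unit_price", "price per unit": "unit_price",
    "annual escalation": "escalation_pct", "escalation %": "escalation_pct",
}


def normalize_table(headers: list[str], rows: list[list[str]]) -> list[dict]:
    """Map each row onto canonical field names; unknown headers raise so they get reviewed."""
    canonical = []
    for header in headers:
        key = header.strip().lower()
        if key not in HEADER_ALIASES:
            raise KeyError(f"no canonical field for column {header!r}")
        canonical.append(HEADER_ALIASES[key])
    return [dict(zip(canonical, row)) for row in rows]


# Two agreements with different table layouts for the same SKU (illustrative values).
a = normalize_table(["Item #", "Qty", "Price/Unit"], [["A-100", "50", "12.00"]])
b = normalize_table(["Part Number", "Unit Price", "Units", "Annual Escalation"],
                    [["A-100", "11.50", "40", "3%"]])
print(a[0]["unit_price"], b[0]["unit_price"])  # 12.00 11.50
```

Once both tables share field names, a report such as “unit price for SKU A-100 across all contracts” no longer depends on how each agreement labelled its columns.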

Yes, but…

  • The software can do a “bulk of the heavy lifting”
  • But no software is perfect! (No matter what vendors guarantee, the software will always make some errors.)

There are also hidden additional costs that you must account for: software training, additional modules to handle third-party paper, configuration, lawyers or interns to quality-check the output, etc.

  • Do they have software?
  • Do they have their own software?
  • Do they have their own people?
  • Are their people lawyers (not interns)?
  • Do they q/c each and every extracted element, or do they spot-check?
  • Can they guarantee Six Sigma levels of accuracy, a must when you are dealing with critical contractual data?
  • If they provide more than extraction services, how do they choose the extraction team? Is it whoever is on the bench?

  • Your contracts are the most confidential part of your business.
  • Some vendors use a network of part-time workers to process your contracts. This creates a security risk for your most sensitive data.
  • ISO 27001 is a location-based information security certification standard. You need to insist that the vendor have that certification, and ask about all the controls in place.

  • We have our own NLP/AI software.
  • We have our own team of lawyers.
  • We are a highly focused company, performing only the task of extraction.
  • We have a stringent Six Sigma process focused on extraction.
  • We control the people, process and technology dimensions.
  • We guarantee up to Six Sigma levels of extraction quality.
  • We q/c each and every element against the original document.
  • Our q/c lawyers are trained solely in q/c.
  • Our q/c lawyers sit next to the software configurators, who sit next to the software developers. This gives us Six Sigma quality.

When you are migrating legacy documents into a CLM system, these documents have many issues: missing dates, handwritten attributes, typos, bad document scans, misfiled master agreements, addendums that must be matched to the right master, etc. Software cannot solve all of these problems. Because the issues are not known in advance, the only way to get dependable data accuracy (a must when you are running your business on contracts) is to verify EACH and EVERY extracted data element against the original contract.

The questions to ask are:

  • Do you have your own software (get a demonstration)
  • Do you check EACH and EVERY data point extracted by the software against the original contract?
  • Does the software have the ability to track and see if the people have checked EACH and EVERY datapoint? (ask for a demonstration)
  • Is the team that is doing the verification of every data point comprised of lawyers (not interns)?

Some companies short-change the verification process and only “spot-check” the extracted output. This is cutting corners in a major way, and it gives rise to low-quality extraction, oftentimes with less than 70% accuracy.

Having 30 errors in 100 data points is very poor-quality data. What good is any analytics done on data that is inaccurate?
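The arithmetic behind these figures is straightforward: 30 errors in 100 data points is 70% accuracy. By convention, Six Sigma quality means 3.4 defects per million opportunities. That rate corresponds to a sigma level of 6 once the customary 1.5σ long-term shift is added to the normal-distribution quantile. This short Python sketch uses the standard library's `statistics.NormalDist` to show the gap between the two.

```python
from statistics import NormalDist


def accuracy(errors: int, data_points: int) -> float:
    """Share of data points extracted correctly."""
    return 1 - errors / data_points


def sigma_level(errors: int, opportunities: int, shift: float = 1.5) -> float:
    """Sigma level for a defect rate, using the conventional 1.5-sigma shift."""
    defect_rate = errors / opportunities
    return NormalDist().inv_cdf(1 - defect_rate) + shift


print(accuracy(30, 100))                      # 0.7
print(round(sigma_level(30, 100), 2))         # 2.02
print(round(sigma_level(34, 10_000_000), 1))  # 6.0 (3.4 defects per million)
```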

  • Yes, Brightleaf has full control over its software allowing full flexibility in the extraction process.
  • Any CLM system can be programmed to ingest any type of business specific data.
  • You can even determine how each attribute is tracked/extracted:

For example, for the Start date: if there is no explicit Start date in the contract, do you wish to extract the Effective date instead? Or the Date of Signature?
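Whichever rule you choose, it helps to record which attribute the value actually came from. This is a minimal Python sketch that assumes a hypothetical fallback order of Start date, then Effective date, then Date of Signature. The order is a business decision for each client, not a fixed rule.

```python
from datetime import date
from typing import Optional

# Hypothetical client-chosen fallback order for the Start date attribute.
START_DATE_FALLBACKS = ("start_date", "effective_date", "signature_date")


def resolve_start_date(
    fields: dict, order: tuple = START_DATE_FALLBACKS
) -> tuple[Optional[date], Optional[str]]:
    """Return the first available date and the attribute it came from (for the audit trail)."""
    for name in order:
        value = fields.get(name)
        if value is not None:
            return value, name
    return None, None


contract = {"effective_date": date(2019, 3, 4), "signature_date": date(2019, 3, 1)}
print(resolve_start_date(contract))  # (datetime.date(2019, 3, 4), 'effective_date')
```

Passing a different `order` (for example, Date of Signature before Effective date) changes the rule without touching the extraction code.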

Brightleaf provides a hybrid solution to handle all your abstraction needs. Using its own proprietary technology and a team of legal experts, Brightleaf delivers Six Sigma-level output quality.

  • Save the time and money of hiring expensive abstraction firms.
  • Empower you to gain strategic insights into overall legal obligations, risks, and opportunities.
  • Help you ensure compliance with existing contracts.
  • Enable you to keep your ERP or contract management system up-to-date.
  • Offer a custom-tailored service to clients.