The term “migration” is sometimes used loosely to describe getting contracts out of an existing system, or even out of file folders, and into a CLM system. More precisely, migration is the process of taking documents from older systems and loading them into a newer system.

  • Scan – Scan the paper documents into files on a computer. This yields an image-based file in .TIF, .GIF, .JPG, .BMP, or .PDF format. A .PDF produced this way is an “image-based” PDF: you cannot search for words within the file because it is a digital picture (image) of the document.
  • OCR – OCR stands for Optical Character Recognition, the process of converting images into text. The conversion is done with off-the-shelf software, and many excellent programs convert scanned images to text. This is a mature technology that has been in service for many years and has become highly accurate.
  • Extract – Data mining is a three-step process of metadata extraction, review and vetting:
    1. Extract the metadata elements, also known as attributes. From each contract, extract the items that you would like to track, query and report on, such as counterparty name(s), term(s), termination, jurisdiction(s), and full clauses such as an indemnity clause, or even obligations that are usually strewn across the contracts and their addenda. Using software makes this process much more accurate.
    2. Human oversight is essential to ensure quality. A team of lawyers should check what the software extracts and fix or fill in whatever the software could not capture (for example, because of OCR read errors or handwritten attributes such as signatures and dates).
    3. The output should always be verified, regardless of whether the work is done in-house or outsourced. Verification should initially check everything, and then move to spot-checking the most important elements.
  • Upload – The final output is a data file, often in .CSV or .XLS format. This file is then mapped into a CLM system (or another database program) so that each contract file itself, as well as its extracted attributes, is uploaded into the system in a structure that supports them. The upload also needs to be validated with a small sample test file to ensure accuracy.
  • Ingest (Repositories/Drives/Folders) – If the metadata elements are already available, one can migrate the data directly into the CLM system from existing document repositories, shared drives or folders. Otherwise, migration will usually follow the process above, starting from conversion (Stage #2).
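
The extract and upload stages above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular CLM product or extraction tool works: the field names, the regular expressions, the sample text and the `needs_review` column are all assumptions made for the example. Real contracts vary far more than a few patterns can cover, which is why the human review step matters.

```python
import csv
import io
import re

# Hypothetical patterns for three attributes; real extraction needs far
# richer rules or a dedicated tool.
PATTERNS = {
    "counterparty": re.compile(
        r"between\s+.+?\s+and\s+(.+?)\s*\(", re.IGNORECASE | re.DOTALL
    ),
    "effective_date": re.compile(
        r"effective\s+as\s+of\s+(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "jurisdiction": re.compile(
        r"governed\s+by\s+the\s+laws\s+of\s+([A-Za-z ]+?)[.,]", re.IGNORECASE
    ),
}

# Column order of the upload file (assumed layout, not a CLM standard).
FIELDNAMES = ["contract_id", *PATTERNS, "needs_review"]

# Made-up OCR output used only to exercise the sketch.
SAMPLE_TEXT = (
    'This Agreement is made between Acme Corp and Globex LLC (the "Supplier"), '
    "effective as of 2021-03-01. "
    "This Agreement is governed by the laws of New York."
)


def extract_attributes(contract_id, ocr_text):
    """Pull attributes out of OCR text and list the ones a reviewer must fill in."""
    record = {"contract_id": contract_id}
    missing = []
    for field, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            # Collapse line breaks and repeated spaces left over from OCR.
            record[field] = " ".join(match.group(1).split())
        else:
            record[field] = ""
            missing.append(field)
    record["needs_review"] = ";".join(missing)
    return record


def to_csv(records):
    """Serialize the extracted records as CSV text ready for mapping into a system."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    records = [
        extract_attributes("C-001", SAMPLE_TEXT),
        extract_attributes("C-002", "Illegible scan"),
    ]
    print(to_csv(records))
```

Any record with a non-empty `needs_review` value would go to the review team rather than straight to upload, and a small file built this way can serve as the sample test file for validating the upload mapping.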


  • Reporting and triggers for action are typically performed by the CLM system.
  • The above process may need to be rerun a few times to ensure that the full depth of data is extracted accurately and reliably.
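
One way to decide whether another pass is needed is to compare the output of two successive extraction runs. The sketch below assumes each run is held as a dictionary keyed by a contract ID, with one dictionary of attributes per contract. The function name `compare_runs` and the field names are hypothetical and chosen only for illustration.

```python
def compare_runs(previous, current, fields):
    """Compare two extraction passes.

    Both arguments map contract_id -> {field: value}. Returns two lists:
    fields that are still empty in the current pass, and fields whose
    value changed since the previous pass.
    """
    still_missing = []
    changed = []
    for contract_id, record in current.items():
        old = previous.get(contract_id, {})
        for field in fields:
            new_value = record.get(field, "")
            old_value = old.get(field, "")
            if not new_value:
                still_missing.append((contract_id, field))
            elif new_value != old_value:
                changed.append((contract_id, field, old_value, new_value))
    return still_missing, changed


if __name__ == "__main__":
    first_pass = {
        "C-001": {"jurisdiction": "", "term": "12 months"},
        "C-002": {"jurisdiction": "", "term": ""},
    }
    second_pass = {
        "C-001": {"jurisdiction": "New York", "term": "12 months"},
        "C-002": {"jurisdiction": "", "term": "24 months"},
    }
    missing, changed = compare_runs(first_pass, second_pass, ["jurisdiction", "term"])
    print("Still missing:", missing)
    print("Changed:", changed)
```

When the list of missing fields stops shrinking and nothing changes between passes, further reruns are unlikely to add data, and the remaining gaps are candidates for manual review.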