November 3, 2011 | Daisy

An Sfile Perspective | The Truth about Processing Speed

Smarter, Faster Processing Speeds

With almost perfect timing, EDDUpdate recently posted a blog entry titled “The Truth about Processing Speed.” It is great that Albert Barsocchini brought this important topic to light and posed some very good questions, as there is a lot of confusion and apprehension about the facts behind processing speed. The software engineers here at Sfile applauded his comments and wanted us to share with you how we address these questions.

1. Does the vendor extract and preserve all metadata during import?  Some leave important metadata behind in order not to sacrifice ingestion speed.

Sfile extracts and preserves all metadata during import and indexing of the data. Because Sfile reviews files in their native format rather than converting them to images first, as many competitors do, file metadata is not lost during import.

2. When a text layer is not found during import (images, vector graphic files, etc.), is the file OCR’d during import to extract text that would otherwise be left behind?

Sfile identifies image files that do not contain a text layer during ESI processing. After ESI processing is complete, Sfile consults with our partners and clients about which files need to be OCR’d, and exception reports are generated and shared with them. In many instances, OCR’ing every image file adds unnecessary cost to the processing.
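To make the idea of an exception report concrete, here is a minimal Python sketch of how image files without a text layer might be flagged for a later OCR decision. It is not Sfile's implementation: the `ProcessedFile` record, the `IMAGE_EXTENSIONS` set and the function names are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass

# Hypothetical list of extensions treated as image files.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".bmp", ".gif"}


@dataclass
class ProcessedFile:
    """Hypothetical record produced by an earlier processing step."""
    path: str
    extracted_text: str


def needs_ocr(item):
    """True when the file is an image and no text was extracted from it."""
    extension = os.path.splitext(item.path)[1].lower()
    return extension in IMAGE_EXTENSIONS and not item.extracted_text.strip()


def exception_report(items):
    """Return the paths of files that are candidates for OCR."""
    return [item.path for item in items if needs_ocr(item)]
```

A report like this only lists candidates. Whether any of them is actually OCR'd remains a decision made with the client, which is how the cost of OCR'ing every image is avoided.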

3. Does the vendor de-dupe via a distributed process, or do they use a single machine and require the entire data set to be loaded into one window for processing?

Sfile can process and flag duplicate files either on a single multi-processor server or through a distributed process across multiple servers. Because each ESI process leverages the latest multi-core processor technology and Sfile’s patent-pending multi-threaded parallel processing capabilities, de-duplication scales easily from one server to multiple servers.
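As a rough illustration of why de-duplication can be spread across workers, the Python sketch below hashes documents in parallel and then flags repeats in a single merge step. It is a generic hash-based approach under stated assumptions, not Sfile's patent-pending method: the choice of SHA-256, the in-memory `documents` mapping and the function names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def content_digest(data):
    """Hash the raw bytes of one document (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def flag_duplicates(documents, workers=4):
    """Map each document name to True if an earlier document has the same content.

    `documents` is a dict of name -> bytes. Hashing is the expensive step,
    so it is fanned out to a worker pool; the merge step is cheap.
    """
    names = sorted(documents)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(content_digest, (documents[n] for n in names)))

    seen = set()
    flags = {}
    for name, digest in zip(names, digests):
        flags[name] = digest in seen  # duplicates are flagged, not deleted
        seen.add(digest)
    return flags
```

Because each digest depends only on one file, the hashing step could just as well run on several machines, with only the small set of digests compared centrally.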

4. Does the vendor parallel process the data across several processors and machines? Just because a vendor says that the application is multi-threaded does not mean it’s scalable.

Sfile is multi-threaded and fully scalable: data processing can be scaled from one multi-core server to multiple servers. Sfile’s HPC processor leverages the latest multi-core processors to split the work of ESI processing between cores using a patent-pending multi-threaded technology. Sfile creates a master thread to track all other sub-processing threads, monitors system resources, and throttles the work accordingly. By monitoring and controlling the ESI process in this way, Sfile can maximize system utilization without overwhelming the server.
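The master-thread-plus-throttling pattern can be sketched in a few lines of Python. This is only an illustration of the general technique: the `run_throttled` function and its parameters are hypothetical, and a fixed cap on in-flight tasks stands in for whatever resource monitoring Sfile's engine actually performs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_throttled(tasks, worker, max_in_flight=2, pool_size=8):
    """Run worker(task) for each task while capping concurrent work.

    The calling thread acts as the master: it hands out tasks, tracks the
    resulting futures, and blocks whenever the in-flight limit is reached.
    Returns (results in task order, peak number of concurrent tasks).
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def guarded(task):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return worker(task)
        finally:
            with lock:
                state["active"] -= 1
            slots.release()  # free a slot so the master can submit more work

    futures = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for task in tasks:
            slots.acquire()  # throttle: wait here while the limit is reached
            futures.append(pool.submit(guarded, task))
    return [future.result() for future in futures], state["peak"]
```

In a real system the limit would presumably be adjusted from measured CPU, memory or disk load rather than fixed up front; the fixed `max_in_flight` keeps the sketch self-contained.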

5. Often a vendor completes processing during export to save time during import. For example, linking parents/children/families, TIFFing and de-duping are often done during export to save time and possibly money on import.

This is related to the last point that Barsocchini brings up: whether it matters if TIFFing, de-NISTing, de-duping, etc., happen during import. With Sfile, all imported data goes through our ESI analysis and processing engines for normalization and optimization, and because Sfile is a native review platform, the time other platforms spend on TIFFing is irrelevant. This processing includes ESI analysis, de-duplication, de-NISTing and cataloging all parent/child relationships. The robustness of Sfile’s engine means this work adds no extra processing time and delivers a better cost-benefit.
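For readers less familiar with these steps, the Python sketch below shows the general idea behind de-NISTing (dropping files whose hash appears in a list of known system files) and cataloging parent/child relationships (grouping attachments under their parent). It is a simplified illustration, not Sfile's engine: the SHA-1 choice, the `known_hashes` set and the record layout are assumptions.

```python
import hashlib
from collections import defaultdict


def de_nist(files, known_hashes):
    """Return names of files whose hash is NOT in the known-file list.

    `files` is a dict of name -> bytes; `known_hashes` is a set of hex
    digests for known system files (a hypothetical, pre-loaded list).
    """
    return [
        name
        for name, data in sorted(files.items())
        if hashlib.sha1(data).hexdigest() not in known_hashes
    ]


def group_families(records):
    """Build a parent -> children catalog.

    `records` is an iterable of (doc_id, parent_id) pairs, where
    parent_id is None for a top-level document.
    """
    families = defaultdict(list)
    for doc_id, parent_id in records:
        if parent_id is None:
            families.setdefault(doc_id, [])  # parent with no children yet
        else:
            families[parent_id].append(doc_id)
    return dict(families)
```

Both steps rely only on information gathered while the files are first read, which is why they fit naturally into import rather than being deferred to export.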

6. Does speed really matter in your case, or can you wait an extra day?

Processing quality is the overriding need of all ESI analysis and should not be sacrificed for the sake of speed alone.  With Sfile’s elegantly designed processing architecture, neither is forfeited.

We also think it is important to point out the main issue raised in the article: the lack of information, or rather metrics, about processing and about the metadata associated with the processed data. To address this lack of visibility, Sfile is finalizing a modern communication tool for reporting processing metrics. This upcoming tool will present, in simple terms, the metrics that matter to users.

We agree with Mr. Barsocchini: just as he has questioned the truth about processing speeds, consultants should question the processing speed claims that other vendors make.