Wednesday, September 28, 2011

Troubleshooting OSI PI compression

I just got back from a client where I was getting a copy of their PI server configuration. My customer offhandedly asked me about the size of his archives: "Is it normal to use 600 megabytes every 2 days?" Off the bat, I could tell there was something wrong with the data compression of this system. This PI server had fewer than 5000 points and was collecting data from about 20 production units.

Customers with similarly sized plants and run rates burn through 600 megabytes a MONTH. The largest cell culture facility west of the Mississippi goes through 1000 megabytes a month, so this particular client was definitely looking at something that is statistically outside of normal.

Here's how I troubleshot it:

Look for compressing = 0

The PI Point attribute that determines if the data to a point is to be compressed is the compressing attribute. This value ought to be 1. A lot of people like turning this off for low-frequency tags but it's like unprotected copulation - you're not necessarily going to get pregnant, but there's a chance that an errant configuration runs your system down.

Look for compdev = 0

The compdev point attribute determines what data makes it into the archive. Compdev ought to be set to half the instrument accuracy according to solid recommendations on PI data compression for biotech manufacturers. If you find yourself loath to define this number, I'd make compdevpercent = 0.1. What this does is eliminate repeats from the archive.
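The two checks above can be scripted once the point attributes are in a spreadsheet or CSV. Here is a minimal sketch in Python. It assumes a hypothetical export with the columns tag, compressing and compdev; the tag names and the export layout are made up for illustration and are not an OSIsoft format.

```python
import csv
import io


def find_compression_problems(rows):
    """Return (tag, reason) pairs for points whose compression looks misconfigured."""
    problems = []
    for row in rows:
        tag = row["tag"]
        if int(row["compressing"]) == 0:
            # Compression is switched off entirely for this point.
            problems.append((tag, "compressing = 0"))
        elif float(row["compdev"]) == 0.0:
            # Compression is on, but a zero deviation lets repeats through.
            problems.append((tag, "compdev = 0"))
    return problems


# Hypothetical export of point attributes; tag names are invented.
sample_export = """tag,compressing,compdev
TAG_A,1,0.5
TAG_B,0,0.5
TAG_C,1,0
"""

rows = list(csv.DictReader(io.StringIO(sample_export)))
problems = find_compression_problems(rows)
for tag, reason in problems:
    print(f"{tag}: {reason}")
```

On a real system you would feed it whatever export your own tools produce; the point is only that both attributes are cheap to audit in bulk.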

Use PI DataLink to look for diskspace hogs

The easiest way to identify which tags are the culprits is to pull them up in PI DataLink and use the Calculated Tag feature to find tags with a high event count. Start by looking at the last hour, then the last 12 hours, then the last day, then the last week. The blatant offenders should be obvious within even 1 minute.

In the case of my customer, he had 7 tags out of about 5000 that were uncompressed. Each of these 7 tags was collecting 64 events/sec. That's 3840 events/minute, or 5.5 million events per day. All told, these 7 tags were recording 39 million zeros into the archive per day... burning through diskspace faster than Nancy Pelosi likes to spend your income.
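For anyone who wants to check the arithmetic, here it is as a few lines of Python. The only inputs are the numbers from this customer (64 events/sec on 7 tags); the variable names are mine.

```python
EVENTS_PER_SECOND = 64        # event rate of each uncompressed tag
UNCOMPRESSED_TAGS = 7         # number of offending tags
SECONDS_PER_DAY = 24 * 60 * 60

events_per_minute = EVENTS_PER_SECOND * 60
events_per_tag_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
events_per_day = events_per_tag_per_day * UNCOMPRESSED_TAGS

print(f"{events_per_minute:,} events/minute per tag")
print(f"{events_per_tag_per_day:,} events/day per tag")
print(f"{events_per_day:,} events/day across all {UNCOMPRESSED_TAGS} tags")
```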

Modern hardware has made these problems insignificant, but burning through diskspace is a latent problem that rears its head at the most inopportune moment.



Tuesday, September 27, 2011

Version Control

GMP environments require very strict control. Whether or not regulations mandate them, controlling the process and the manufacturing formula is, frankly, a good idea.

The problem with controlling GMP documents and GMP control-system recipes is the onerous change-control process that has evolved over the years. And my observation of this change-control process is that it was designed by regulators and not computer scientists.

It's important to bring out computer scientists because managing source code is a core function of companies that develop software. In fact, version control is so sophisticated that it has become distributed and there are distributed version control systems (like Veracity DVCS) that can help cGMP-regulated companies manage their GMP documents and recipes.

I actually have yet to see version control software applied to GMP industries, probably because people don't understand it or how it works. In fact, only recently did I get a primer on it.

That primer came in the form of the beginner book on version control called "Version Control By Example" by a fellow named Eric Sink. And while he may not have written this book for QA managers in big pharma... every QA/CC manager in big pharma ought to have a copy of his book.

It goes through the evolution of change control. It talks about central repositories and how the industry is moving towards distributed repositories. It imbues the newbie reader with a shared vocabulary so that people who understand the importance of version control can express their needs to people who write version control software. Get a print version from Amazon here.

At Zymergi, we believe that the future of QA change control and document management is to turn to proven methods and technology. And looking to the technical folk in the software version control space is where I think the robust solution lies.

Monday, September 26, 2011

OSI PI streaming data compression

The purpose of data compression in the PI server is to save disk space. I heard a story from the CEO of OSIsoft that the first PI server used a 10 megabyte hard drive and, in the 80's, that hard drive cost $250,000.

And as hard drives became easier to make and the cost per megabyte plummeted, people came to think of data compression as a legacy component that isn't worth thinking about. In fact, I've had people think throwing money at the problem makes it go away. The problem doesn't go away, and here's why:

The value of PI comes from putting expert eyeballs on trends. If it takes longer to load trends because the archive is filled with uncompressed and redundant data, then those eyeballs are going to view fewer trends. The cost of curiosity increases ever so slightly and over time, you lose.

2 days to burn 600MB
From an IT perspective, liberal compression settings mean more hard disk consumption. I've seen a GMP plant use 300 megabytes per day. That's about 100 gigabytes a year. "Hold on," you say, "a 100 GB SSD will cost you $150... that's less than the cost of the Change Record!" True... but over time, keeping years of archive data online means you're going to need to keep upgrading the hardware.

Backing up the same amount of data will cost 10X the time. It's just unwieldy, especially when you're talking about simply setting:
  • compressing=1
  • compdev > 0.

Think about your data compression. Do some research on what they ought to be. In fact, get the Zymergi whitepaper on OSI PI compdev and excdev emailed to you for free.


PI data compression is a set-it-and-forget-it activity. Do it right the first time and you basically never have to think about it again.

Tuesday, September 20, 2011

Upgrading to PI Server 2010 for PI Batch Users

PI Server 2010 is the latest PI server offering from OSIsoft. I don't know for a fact, but this seems like marketing nomenclature meant to emulate Microsoft's Office 2007, Windows Server 2008... etc. It reminds me of the way my Office 97 makes me feel 14 years behind the times.

Whatever the case, the internal versioning system remains the same: PI Server 2010 is still version 3.4.385.59. What is drastically different is that PI Server 2010 requires (mandates/coerces) users to have PI Asset Framework (PI AF).

Ok, so what's PI AF? PI AF is essentially a scalable PI Module Database, and what makes it scalable is that it's built on Microsoft SQL Server. This means that you need to have SQL Server (or SQL Server Express) installed somewhere. Over time, the PI Module Database will be deprecated in favor of PI AF. So the default behavior of PI Server 2010 is to copy the ModuleDB to PI AF and put it in read-only mode.

The problem is that there are PI applications that use PI ModuleDB that have NOT been moved to PI AF... for us in biotech, that's PI Batch. So in order to keep these customers happy, OSIsoft provides an option for PI AF to be synchronized with PI ModuleDB, but this requires preparation. The PI MDB to AF Preparation Wizard is what achieves this, and this wizard comes with PI SMT 2010... which means you need to install PI SMT 2010 next.

Once the PI MDB to AF Preparation Wizard is run and all the errors fixed, you can proceed with upgrading your PI server to PI Server 2010.

How to Upgrade to PI Server 2010

This gives you the overview of upgrading to PI Server 2010. This upgrade is not as straightforward as previous upgrades because of the AF mandate. The devil is in the details and you should run through this process several times before applying it in the GMP environment.

Friday, September 16, 2011

OSI PI BatchDB: Batch Generator - part 2

So we know about the data structure storing time-windows in PI. How do we get the actual data into this data structure? And once we get it in, how do we fetch it in order to use it?

Well, if you have an older system with no batch manager, then the answer is the PI Batch Generator (PI BaGen), software that reads from a data source and sends the results to PI. In the case of PI BaGen, the data source is PI tags, and the computed results are sent to other PI tags.

Here's how it works:

You have a tag that reads 0 when a unit is not operating and 1 when the unit is operating. In the case of fermentation, you could use the pH controller mode because you only turn on pH control when there is either media or there are microbes in the bioreactor. This tag will be the Active Point for your unit.

Let's say you have another tag in which the operator inputs the batch identification... this is the UnitBatch ID Point. And again, when the PLC runs, the program name is written to another tag... this would be the Procedure Point.

With this information, you can fire up the PI System Management Tool (PI-SMT) and configure UnitBatches to be automatically generated for your unit.
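To make the idea concrete, here is a rough sketch in Python of what "generating UnitBatches from an Active Point" amounts to. This is not how PI BaGen is implemented and it does not call any OSIsoft API; the class, function and lot names are hypothetical, and timestamps are plain integers to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class UnitBatch:
    unit: str
    batchid: str
    procedure: str
    starttime: int
    endtime: Optional[int]  # None while the batch is still running


def generate_unit_batches(
    unit: str,
    active_events: List[Tuple[int, int]],
    batchid_at: Callable[[int], str],
    procedure_at: Callable[[int], str],
) -> List[UnitBatch]:
    """Turn time-ordered (timestamp, value) Active Point events into UnitBatches."""
    batches: List[UnitBatch] = []
    current: Optional[UnitBatch] = None
    previous = 0
    for timestamp, value in active_events:
        if previous == 0 and value == 1:
            # Active Point went 0 -> 1: a batch starts; read the ID and procedure tags.
            current = UnitBatch(unit, batchid_at(timestamp), procedure_at(timestamp), timestamp, None)
            batches.append(current)
        elif previous == 1 and value == 0 and current is not None:
            # Active Point went 1 -> 0: the running batch ends.
            current.endtime = timestamp
            current = None
        previous = value
    return batches


# Hypothetical tag histories for one bioreactor.
events = [(0, 0), (10, 1), (50, 0), (60, 1)]
batches = generate_unit_batches(
    "BIOREACTOR-1",
    events,
    batchid_at=lambda t: "LOT-001" if t < 55 else "LOT-002",
    procedure_at=lambda t: "FERMENTATION",
)
for batch in batches:
    print(batch)
```

The second batch has no endtime because the Active Point is still reading 1 at the end of the sample data.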


The purpose of the post is not to walk through a PI Batch Generator configuration, but to help you identify the pre-existing conditions conducive to using the PI Batch Generator interface. (The OSI documentation for PI BaGen is the right place to start).

OSI PI's BatchDB is an exceptional tool... especially for users in the biologics manufacturing space. Configuring PI Batch is a no-brainer, especially if you run a batch process and want to increase productivity by as much as 400%.

Wednesday, September 14, 2011

OSI PI Batch Database (BatchDB) for biologics lab and plant - part 1

Biologics manufacturing is a batch process, which means that process steps have a defined starttime and endtime.

CIPs start and end. SIPs start and end. Equipment preparations start and end. Fermentation, Harvest, Chromatography, Filtration, Filling are all process steps that start and end.

Even the lab experiments are executed in a batch manner with defined starts and ends.

Like the ModuleDB, the PI Batch Database (PI Batch) is a data structure within PI, and it describes batches. While it comes free, it does cost at least 1 tag per unit (PIUnit) to use.

The most important table is the UnitBatch table. The UnitBatch table contains the following fields:
  • starttime, endtime - when the batch happened
  • unit - where the batch happened (with which equipment)
  • batchid - who (name of the batch)
  • product - what was produced?
  • procedure - how was it produced?
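If it helps to see the fields as a record, here is a small Python sketch. It mirrors the field names listed above but is only an illustration, not the PI SDK object model; the unit, lot and product names are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Tuple


@dataclass(frozen=True)
class UnitBatch:
    starttime: datetime   # when the batch started
    endtime: datetime     # when the batch ended
    unit: str             # where it happened (which equipment)
    batchid: str          # who (name of the batch)
    product: str          # what was produced
    procedure: str        # how it was produced


def time_window(batches: Iterable[UnitBatch], unit: str, batchid: str) -> Tuple[datetime, datetime]:
    """Look up the start and end of a batch on a unit, e.g. to set a trend's time range."""
    for batch in batches:
        if batch.unit == unit and batch.batchid == batchid:
            return batch.starttime, batch.endtime
    raise KeyError(f"no batch {batchid!r} on unit {unit!r}")


# Hypothetical record.
history = [
    UnitBatch(
        starttime=datetime(2011, 9, 1, 8, 0),
        endtime=datetime(2011, 9, 13, 17, 30),
        unit="BIOREACTOR-1",
        batchid="LOT-001",
        product="EXAMPLE-PRODUCT",
        procedure="FERMENTATION",
    )
]
window = time_window(history, "BIOREACTOR-1", "LOT-001")
print(window)
```

The lookup is the whole point: given a unit and a batch name, you get back the time-window without anyone typing timestamps.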

In essence, the UnitBatch table describes everything there is to know about a process step that happens on a unit. Remember: units are defined in the PI ModuleDB, which means the PI BatchDB depends on a configured PI ModuleDB.

So why bother configuring yet another part of your PI server? The main reason is to increase the productivity of your PI users. In our experience, up to 50% of the time spent using PI ProcessBook goes to inputting timestamps into the trend dialog. Configuring PI Batch makes it so that your users can change time-windows in ProcessBook with just a click.

We have seen power-users put eyeballs on more trends in even less time than without PI Batch; and the more trends your team sees, the more process experience they gain.

In this dismal economic environment, simply configuring PI Batch on your PI server can make your team up to 400% more productive. This particular modification takes less than a day to accomplish.

Friday, September 9, 2011

Multivariate Analysis in Biologics Manufacturing

All these tools for data acquisition and trend visualization and search are nice. But at the end of the day, what we really want is process understanding and control of our fermentations, cell cultures and chromatographies.

Whether a process step performs poorly, well or within expectations, put simply, we want to know why. 

For biological systems, the factors that impact process performance are many and there are often interactions between factors for even simple systems such as viral inactivation of media.

One time, transferring media from the prep tank to the bioreactor left filters clogged with white residue. On several occasions, this clogging put the transfer on hold and stopped production.

After studying the data, we found that pH and temperature were the two main effects that significantly impacted clogging. If the pH was high AND the temperature was high, the solids would precipitate from the media. But if the pH or temperature during the viral inactivation was low, the media would transfer without exception.

After identifying the multiple variables and their interactions, we were able to change the process to eliminate clogging as well as simplify the process.

For even more complex systems like production fermentation, multivariate analysis produces results. In 2007, I co-published a paper with Rob Johnson describing how multivariate data analysis can save production campaigns. From the article is the regression pictured below.

Multiple Linear Regression

You can see that it isn't even that great a fit. Statisticians shrug all the time at RSquares less than 0.90. But from this simple model, we were able to turn around a lagging production campaign and achieve 104% Adherence To Plan (ATP).

The point is not to run into trouble and use these tools & know-how to fix the problem. Ideally, we understand the process ahead of time by designing in-process capability and then fine tune it at large-scale; we are less fortunate in the real world.

My point in all this is if you are buying tools and assembling a team without process understanding and control, then you won't know which are the right tools or what is the best training. Keeping your eye on the process understanding/multivariate analysis prize will put you in control of your bioprocesses and out of the spotlight of QA or the FDA.

Thursday, September 8, 2011

Process Capability (CpK)

From a manufacturing perspective, a capable process is one that can tolerate a lot of input variability. Said another way, a capable process produces the same end result despite large changes in material, controlled parameters or methods.

As the cornerstone of "planned, predictable performance," a robust/capable process lets manufacturing VPs sleep at night. Inversely, if your processes do not tolerate small changes in materials, parameters or methods, you will not make consistent product and ultimately end up making scrap.

To nerd out for a bit, the capability of a process parameter is computed by subtracting the lower specification limit (LSL) from the upper specification limit (USL) and dividing this by six times the standard deviation (stdev) measured from your at-scale process:

Cp = (USL - LSL) / (6 * stdev)

The greater the Cp, the more capable your process. There are many other measures of capability, but all involve specifications in the numerator and standard deviation in the denominator, and values of 1 or greater mean "capable."

A closer look at this metric shows why robust processes are rarely found in industry:

  • Development sets the specifications (USL/LSL)
  • Manufacturing controls the at-scale variables that determine standard deviation.

And most of the time, development is rewarded for specifications that produce high yields rather than wide specifications that increase process robustness.

Let's visualize a capable process:


Here, we have a product quality attribute whose specifications are 60 to 90 with 1 stdev = 3. So Cp is (90-60)/(6*3) = 30/18 = 1.67. The process has no problems meeting this specification and as you can see, the distribution is well within the limits.

Let's visualize an incapable process:


Again, USL = 90, LSL = 60. But this time, the standard deviation of the process measurements is 11 with a mean of 87.

Cp = (90 - 60)/(6 * 11) = 30/66 = 0.45. Cp is not a yield, though: assuming normally distributed measurements, a mean of 87 and a standard deviation of 11 put only about 60% of results inside the specification.

Closer examination shows that the process is also not centered and vastly overshoots the target; even if variability reduction initiatives succeeded, the process would still fail often because it is not centered.
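Here is a short Python sketch of the two examples. Cp follows the definition given earlier. Cpk, the standard companion index that penalizes an off-center mean, and the in-spec fraction are my additions; the in-spec fraction assumes the measurements are normally distributed, which real process data may not be.

```python
import math


def cp(usl: float, lsl: float, stdev: float) -> float:
    """Process capability: specification width over six standard deviations."""
    return (usl - lsl) / (6 * stdev)


def cpk(usl: float, lsl: float, mean: float, stdev: float) -> float:
    """Capability that accounts for centering: distance to the nearer limit over 3 stdev."""
    return min(usl - mean, mean - lsl) / (3 * stdev)


def fraction_in_spec(usl: float, lsl: float, mean: float, stdev: float) -> float:
    """Fraction of results between LSL and USL, assuming a normal distribution."""
    def normal_cdf(x: float) -> float:
        return 0.5 * (1 + math.erf((x - mean) / (stdev * math.sqrt(2))))
    return normal_cdf(usl) - normal_cdf(lsl)


USL, LSL = 90, 60

capable_cp = cp(USL, LSL, 3)                      # the capable process above
incapable_cp = cp(USL, LSL, 11)                   # the incapable process above
incapable_cpk = cpk(USL, LSL, 87, 11)             # same process, mean of 87
incapable_yield = fraction_in_spec(USL, LSL, 87, 11)

print(f"capable Cp    = {capable_cp:.2f}")
print(f"incapable Cp  = {incapable_cp:.2f}")
print(f"incapable Cpk = {incapable_cpk:.2f}")
print(f"incapable in-spec fraction = {incapable_yield:.0%}")
```

Cp only looks at spread, so it cannot see that the second process sits close to the upper limit; Cpk and the in-spec fraction both use the mean, which is why they tell the centering story.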

If you are having problems with your process reliably meeting its specifications, apply capability studies to assess your situation. If you are not having problems with your process, apply capability studies to see if you are at risk of failing.

The take-away is that process robustness is a joint manufacturing/development effort, and manufacturing managers must credibly communicate process capability to development in order to improve process robustness.


Wednesday, September 7, 2011

PI ProcessBook Is A Trend Visualization Tool, Not An Analysis Tool

ProcessBook is the trend visualization tool written by OSIsoft for their PI system. It is what is called a rich-client, which basically means that it is installed on your local computer and uses your computer's CPU to give the users a rich set of features. Because PI ProcessBook is how users interact with PI, this program is often confused for the PI system itself.

Our customers really like PI (the server) and ProcessBook (the client), and so do we. But they sometimes fall into the trap of thinking that PI should be everything to everyone. And why shouldn't they?

ProcessBook provides everything you need for real-time monitoring. One time, I was watching this oxygen flow control valve to my bioreactor flicker on and off. I verified this was abnormal behavior by checking the O2 flow control valve tag from history. I called to the plant floor and met up with the lead technician in the utilities space to walk down the line and found that oxygen was actually leaking from it. There were contractors welding in that space at the time and though risks were low, we got them to stop until we fixed the problem.

Another time using ProcessBook, we saw a fermentor demanding base (alkali) solution prior to inoculation... something that ought not happen since there were no cells producing carbonic acid that required pH control. We called into the floor to turn off pH control to stop more base from going in. We confirmed the failed probe and switched to the secondary. $24,000 of raw material costs were saved from looking at PI ProcessBook to see what the trends were saying.

The reason you don't put everything in PI (hence ProcessBook) is that ProcessBook is not an analysis tool. Analysis requires quantification. Good analysis applies statistics to let you know if the differences you are measuring are significant. ProcessBook does not do that. It is there to help you put eyeballs on trends.

Spending funds to make PI ProcessBook into an analysis tool has a diminishing ROI. Your money is better spent elsewhere.


Tuesday, September 6, 2011

OSI PI Module Database (ModuleDB)

The PI Module Database (ModuleDB) is a hierarchical data structure introduced by OSIsoft years ago. This hierarchical data structure comes free with every PI server and is often overlooked (people don't bother configuring it).

Example of PI ModuleDB

You see, the purpose of the ModuleDB is to account for the units of your physical plant. Perhaps you are a biologics manufacturing facility with bioreactors, mixing tanks, centrifuges and chromatography columns. Or perhaps you're a sulfuric acid plant with a blower, furnace and stack. You have equipment big and small whose I/O are sending data to the PLC/DCS that then send the data to PI. You can describe your physical units with the PI ModuleDB.

The big deal with the ModuleDB is that you get to associate these I/O (tags) with the unit to which they belong; and then you get to label (create an alias) that tag something other than the instrument address (which is gibberish to most people anyway). For example:


is not as memorable as
T447 Optical Density

The reason having unit/aliases is important is that it makes PI relevant; it brings PI tags closer to the community of users that talk about it. Walk around your plant and listen to the operators, supes, engineers and managers talk. Are they talking about the pH on Bay 1 or are they talking about AIC510A01.PV?

Chances are, their words refer to some parameter/measurement on the unit rather than to the tagname... which is mostly known by just the folks in automation or instrumentation.
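Stripped of the PI specifics, the unit/alias idea is just a two-level lookup: unit first, then the name people actually say. A minimal Python sketch follows. It is not the ModuleDB API. Pairing "Bay 1" / "pH" with AIC510A01.PV is my assumption for illustration, and the optical density tag name is a made-up placeholder.

```python
# unit -> {alias -> PI tag name}
MODULE_ALIASES = {
    "Bay 1": {"pH": "AIC510A01.PV"},                  # pairing assumed for illustration
    "T447": {"Optical Density": "EXAMPLE_OD_TAG"},    # placeholder tag name
}


def resolve_tag(unit: str, alias: str) -> str:
    """Translate what people say ("the pH on Bay 1") into the tag PI stores."""
    return MODULE_ALIASES[unit][alias]


print(resolve_tag("Bay 1", "pH"))
```

Operators ask for the alias; automation keeps owning the tag name underneath.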

Configuring the PI ModuleDB to represent your physical plant and then associating the relevant tags to those units via alias is a high-bang, low-buck activity that will pay dividends for years to come.