Showing posts with label data analysis. Show all posts

Tuesday, June 17, 2014

Securing the Process Data Historian Search Engine: ZOOMS

When I worked at an FDA-regulated commercial biologics manufacturing facility, access to data was limited to authorized individuals; 21 CFR 11.10(d) specifically calls for this control so that the authenticity, integrity, and, when appropriate, the confidentiality of electronic records are assured.

Needless to say, those who operate under cGMP regulations and worry about compliance with Part 11 have concerns about software that tries to democratize the data.

Fair enough.
[Image: Ben Franklin on freedom vs. security]

It turns out that you can have your cake and eat it too.  That is, you can have an easy-to-use, web-based interface for your trend data AND you can limit the access to authorized individuals.

Within Internet Information Services (IIS), the web-server on which ZOOMS runs, the default setting is to enable Anonymous Login and to disable everything else.

The way to secure ZOOMS is to click on the website under which ZOOMS is installed and select the Authentication feature:
[Screenshot: IIS Authentication feature]
When you double-click on IIS Authentication, you get a setting for how you want this web-server to be secured.  There are 4 options to enable/disable.  The way to ensure unauthorized access is not granted is to set Anonymous Authentication Status to Disabled.

And if you happen to be running Windows Active Directory and you want to use that as the method to control access, set Windows Authentication Status to Enabled.

[Screenshot: IIS with Anonymous Authentication disabled]

Assuming that you're in an environment with Active Directory, you can use Active Directory credentials to control access to ZOOMS.

Now when I attempt to access ZOOMS, here's what I get:
[Screenshot: ZOOMS password challenge]

(I'm on a Mac using Safari to access a Windows IIS server running ZOOMS)

And only when I input valid Windows credentials am I granted access to ZOOMS.  At this point, access to ZOOMS has been limited from everyone with network access to everyone with a valid domain account.

If you want further restrictions, you simply need to set up ASP.NET URL Authorization where you specify the exact role that you want to allow access.
[Screenshot: ASP.NET URL Authorization]
If you want to create an Active Directory group for just your users, you can do so.  If you want to grant access to an existing Active Directory group, you can do so.

The key here is to Allow first, Deny last.
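Why does the order matter? Here's a minimal Python sketch of the idea, assuming the rules are checked top-down and the first matching rule wins (which is how ASP.NET URL authorization is documented to behave). This is not IIS code; the `Rule` class, the `is_authorized` function and the role names are all hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str        # "allow" or "deny"
    roles: frozenset   # role names; "*" stands for everyone


def is_authorized(user_roles, rules):
    """Apply the first rule that matches any of the user's roles."""
    for rule in rules:
        if "*" in rule.roles or rule.roles & frozenset(user_roles):
            return rule.action == "allow"
    return False  # assumption for this sketch: no matching rule means no access


# Hypothetical role names, for illustration only.
allow_then_deny = [
    Rule("allow", frozenset({"ZoomsUsers"})),
    Rule("deny", frozenset({"*"})),
]
deny_then_allow = list(reversed(allow_then_deny))

print(is_authorized({"ZoomsUsers"}, allow_then_deny))  # group member gets in
print(is_authorized({"Finance"}, allow_then_deny))     # everyone else is denied
print(is_authorized({"ZoomsUsers"}, deny_then_allow))  # deny-all matches first
```

With the deny-all rule on top, even members of the allowed group get locked out, which is why the allow rule goes first.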

The point in all this is that you don't have to sacrifice the freedom of your information for the sake of security.

For more questions, contact Zymergi Technical Support at 650-646-4996.


Thursday, January 23, 2014

Multivariate Analysis: Pick Actionable Factors Redux

When performing multivariate analysis, say multiple linear regression, there's typically an objective (like "higher yields" or "troubleshoot campaign titers"). And there's typically a finite set of parameters that are within control of the production group (a.k.a. operators/supervisors/front-line managers).

This finite parameter set is what I call, "actionable factors," or "process knobs." For biologics manufacturing, parameters like

  • Inoculation density
  • pH/temperature setpoint
  • Timing of shifts
  • Timing of feeds
  • Everything your process flow diagram says is important
are actionable factors.

Examples of non-actionable parameters include:
  • Peak cell density
  • Peak lactate concentration
  • Final ammonium
  • etc.
In essence, non-actionable parameters are generally measured and cannot be changed during the course of the process.

Why does this matter to multivariate analysis? I pick on this one study I saw where someone built a model against a commercial CHO process and proved that final NH4+ levels inversely correlate with final titer.

What are we to do now?  Reach into the bioreactor with our ammonium-sponge and sop up the extra NH4+ ion?

With the output of this model, I can do absolutely nothing to fix the lagging production campaign. Since NH4+ is evolved as a byproduct of glutamine metabolism, this curious finding may lead you down the path of further examining CHO metabolism and perhaps some media experiments, but there's no immediate action nor medium-term action I can take.

On the other hand, had I discovered that initial cell density of the culture correlates with capacity-based volumetric productivity, I could radio into either the seed train group or scheduling and make higher inoc densities happen.


Friday, August 23, 2013

10 Ways to Tell If You Suck at Cell Culture Support

Here are 10 ways to tell if your support of large-scale cell culture, well, sucks:
  1. Key performance indicators.
    You don't know what the right KPIs are for cell culture, but you're 100% certain that it's titer.
  2. Proven Acceptable Limits.
    You don't have any defined for your critical process parameters and you failed to demand them of Process Development.
  3. Control charts. You're not using them or you don't know how to make them, and your bar graphs are just fine, thankyouverymuch. They're not just fine and it's because you can't identify:
  4. Special cause vs. common cause variability.
    You investigate common cause variability because that titer seemed too low or too high.
  5. Cpk. You don't know what process capability is and you're not calculating it.
  6. Histograms. You aren't looking at the distribution of your KPIs.
  7. Bi-variate Analysis.
    Linear regressions, ANOVA, Tukey-Kramer.  You have no idea what this stuff is; I might as well be writing in Chinese (我還不如寫中文).
  8. Multivariate Analysis.
    You're not doing these and when you do, Y-responses are treated as X-factors.
  9. Local Lab. You don't have a local MSAT lab to run satellite experiments to confirm the hypotheses generated from the plant.

    A lot of people assume that you can use the resources of a distant process development lab; but then again, a lot of people like blood sausage.
  10. Excel. You're still using Excel to store data. You're still using Excel to analyze data. If you're looking to play varsity cell culture support, you really need to be using a cell culture data management system.
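Since item 5 calls out process capability, here's a minimal Python sketch of the usual Cpk calculation: the distance from the mean to the nearer specification limit, divided by three standard deviations. The titers and spec limits below are made up for illustration, and the sketch assumes the data are roughly normal and the process is stable.

```python
from statistics import mean, stdev


def cpk(values, lsl, usl):
    """Process capability index: nearer spec limit vs. 3 sample std devs."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return min(usl - mu, mu - lsl) / (3 * sigma)


# Hypothetical final titers (g/L) and hypothetical spec limits.
titers = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
result = cpk(titers, lsl=0.4, usl=1.6)
print(round(result, 2))  # about 1.41
```

A higher Cpk means the process sits comfortably inside its limits; a value near or below 1 means the tails of the distribution are touching a spec.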


Thursday, July 25, 2013

Fermentation Analysis Software

There's this neat question on the Mathematical Modeling of Fermentation LinkedIn Group on software used in Fermentation.
I would like to ask about the software for the analysis of your fermentation processes. Software for analysis, but not for the fermentation control. Although, if you can say something about the control programs, it is welcome, too.

I suspect that the people in this group deal with small-scale or pilot plant-scale, but this question is actually worth answering for large-scale cell culture/fermentation.

In 1999, fermentation control software was basically a free-for-all.  No single company had a stranglehold on the market. Allen-Bradley PLCs were popular, Siemens was popular, Honeywell was a good option... But over a decade, the system that has really taken over the control layer is Emerson's DeltaV.

The reason this is worth talking about is that the data comes from instrument IO that is monitored by the control software. All analysis is preceded by data capture, archival and retrieval. DeltaV is the software that does the capture.
1) What software is used on your fermentation equipment?
Next up is the system that archives this instrument data for the long term. DeltaV has a historian, but the most popular data historian is OSIsoft's PI (a.k.a. OSI PI). The reason is that PI has stellar client tools and stellar support. PI client tools like DataLink and ProcessBook are good for generic process troubleshooting and support. More sophisticated analysis requires statistical programs.

Zymergi offers OSI PI consulting for biotech companies.

2) What software you prefer to analyze of your fermentations and for your future fermentation processes planning?

This is where there's a lot of differentiation in fermentation analysis software. My personal fave is SAS Institute's JMP software. This is desktop stats software that lets users explore the data and tease signal from noise or truth from perception. I've solved a ton of problems and produced answers to 7-figure problems with this software.

Zymergi offers MSAT consulting helping customers set up MSAT groups and execute MSAT functions.

There are others operating in this space, but I have yet to see any vendor make headway beyond trial installation and cursory usage.
3) Do you agree with the fact that the question of software for fermentation processes doesn't undergo a rapid development now?
None of these tools is fermentation-specific.  Each is superior in its respective category:

  • DeltaV is a superior control system
  • OSI PI is a superior data historian
  • JMP is a superior data analysis software
Where there is a gap in fermentation analysis is in how to link upstream factors to downstream responses.

Friday, April 5, 2013

How To Interpret Distributions (Histograms)

Here's a set of Y-Distributions (histograms) I saw on the data visualization sub-Reddit.

On the left side, we have Polish language scores. On the right, we have mathematics.

Each row is a year... 2010 through 2012.

According to the notes on the page, these are the high-school exit exam scores for which passing is to receive 30% of the total available points.

Most people know what a "bell-shaped" curve looks like and those Polish language scores don't look like bells. In fact, it looks like right around the 30% mark, someone took the non-passing scores that were "close enough" and just handed out the passing score.

We sometimes see this in biotech manufacturing... where in order to proceed to the next step, you need to take a sample and measure the result. If there is a specification, you'll see a lot of just-passing results. What is euphemistically called, "Wishful sampling."

The process is the process and if the sampling is random, you expect a bell-shaped curve. In the case of Polish high school students, their Polish skills are what they are. What you're seeing is an artifact of the people grading the tests. I would bet a fair amount of money that teachers or schools are rewarded according to the number of students who pass this test.

Let's look at the mathematics scores. This "wishful grading" is going on in mathematics, but is far less pronounced. What is crazy is how different the distributions look from year to year (compared to the language histograms).

It's hard for me to think that mathematics skills of students across Poland vary that much from year to year. Like the U.S. News & World Report rankings of schools, it's more likely that the difficulty of the test changes significantly from year to year... in this case with the 2011 tests having particularly difficult questions.

Histograms say quite a bit about your process. What they never tell you is that the histograms also tell you quite a bit about your process specifications and how truthful your measurement systems are.

If I were the FDA... and I wanted to be mean about it, I'd request a distribution of measurements for every single process specification, and if I saw something like this "Polish language" test, someone has some explaining to do.

Get Biotech Manufacturing Consulting

Tuesday, February 5, 2013

Ice Cream causes Swimming Pool Deaths!

I see this proverbial "Ice cream causes swimming pool drownings" statement made in the world of economics and politics all the time.

It's so prevalent that there's a Wikipedia article on spurious relationships.
[Ice cream] sales are highest when the rate of drownings in city swimming pools is highest.
You can look the data over and you'll see that this phenomenon happens like clockwork:
  • Low ice cream sales... fewer swimming pool deaths.
  • High ice cream sales... many swimming pool deaths.
So there's a correlation, right? Yes.

With that correlation, some go farther to allege that ice cream causes drownings or that drownings cause ice cream sales. (Ahem, no.)

To claim that ice cream sales are an indicator of drownings or vice versa also misses the point because ice cream sales and swimming pool deaths are both results of an underlying factor: a heat wave.
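Here's a minimal Python sketch of how a hidden factor manufactures a correlation between two symptoms. All the numbers are simulated and made up for illustration: temperature drives both ice cream sales and drownings, and neither one drives the other. The two symptoms correlate strongly, but once the temperature's contribution is regressed out of each, the leftover correlation is close to zero.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the simulation is repeatable
temperature = [rng.uniform(15, 40) for _ in range(1000)]          # the true factor
ice_cream = [10 * t + rng.gauss(0, 20) for t in temperature]      # symptom 1
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]      # symptom 2

r_raw = pearson(ice_cream, drownings)
r_after_heat = pearson(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))
print(f"raw correlation: {r_raw:.2f}")
print(f"after removing temperature: {r_after_heat:.2f}")
```

The same trick applies to final ammonium and final lactate: find the factor that drives both and model that instead.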

Unfortunately, this statement of two symptoms indicating one another is seen all the time in the world of cell culture analysis:
  • Final ammonium (NH4+) is an indicator of culture performance
    - or -
  • Final lactate (Lac) is an indicator of product titer


Seriously, who here doesn't already know that cell growth impacts culture performance?  Or that cell metabolism impacts culture performance?

Yet we are still publishing papers on how final lactate is an indicator of product titer and concluding that cell metabolism impacts culture performance.

Final ammonium or final lactate are symptoms of cell culture metabolic conditions that produce higher titers.

Unless you can recommend specific changes that the Production group can execute to improve culture conditions, such as:
  • Changing media components
  • Changing a parameter setpoint (pH, temp, dO2)
  • Changing the timing of culture operations (temp shift, pH shift, timing of feeds...)
you've simply uncovered a spurious relationship; there remains no action you can take to improve culture performance.

This is why it is best to start your multivariate analysis by picking actionable parameters to ensure that you have true factors.

When you pick actionable parameters to model as factors in your multivariate analysis, you have a shot at gaining control of an out-of-control campaign and meeting your Adherence-to-Plan, as Rob Johnson did.

If you're happy pontificating from ivory towers, keep making true-but-useless statements on how every time Y1 happens that Y2 also happens.


Monday, January 7, 2013

Moneyball for Manufacturing

I'm quite behind the times when it comes to watching movies. The last movie I saw was The Dark Knight Rises...

at a matinee...

so I don't get shot.

A few nights ago, I finally sat down and watched Moneyball, the movie with Brad Pitt and six Oscar nods. It is a "based on a true story" account of how the perennially under-budgeted Oakland A's baseball club builds a near-championship team only to lose not only playoff games, but also their best players to big-money baseball clubs when the players' contracts expire.

The Oakland A's general manager, Billy Beane, realizes his underfunded system will continue to produce good-enough results that will never win the championship. And to continue running his system the same way is insanity:
Doing the same thing over and over again and expecting different results. - Albert Einstein
To win, Beane decides to do something different, and that something different is focusing on the key performance indicators (KPIs) of winning and getting players that contribute positively to those KPIs... applying statistics and math to baseball is what they call, "Moneyball."

How many of us are in the same boat as this Oakland A's GM?
  • How many of us are getting by with under-funded budgets?
  • How many of us are managing our systems the same way they've been managed for years?
  • How many of us can improve our systems by applying data-driven statistics?
Moneyball is to baseball what Manufacturing Sciences is to manufacturing:
Biotech and pharma manufacturing is in a period of static or diminishing budgets. Do more with the same or make do with less is the general mantra as the dollars go towards R&D or to acquisitions. To make matters worse, biosimilars are coming on-line to drive revenues even farther down.

Questions I'm getting these days are:

What systems do I need to collect the right data?

What KPIs should I be monitoring?

What routine and non-routine analysis capabilities should I have?

Let's Play

p.s. - Watch the movie if you haven't seen it.  It's as good a movie as it is a good business case study.

Thursday, December 20, 2012

Gun Violence Is Not a Univariate Problem

The public seems to have a hard time debating multivariate problems.

I remember the Ford Explorer/Firestone Tires issue years back very distinctly.  Was driving a Ford Explorer the cause of the SUV flipping over?  Those who say Ford was culpable pointed to the fact that few other SUVs were flipping over.  Ford pointed out that there were Explorers that weren't flipping over... just the ones with Firestone Tires.

Firestone was saying that there were plenty of cars driving around on Firestone tires without issue and it was Ford's fault that their SUV sucked.

This debate went on and on.  What my boss when I worked at Genentech Vacaville, Jesse Bergevin, said to me at the time was that this was a classic multivariate problem with one interaction.

Likewise, gun violence in America is a classical multivariate problem: there are not one, not two, but many variables that contribute to these horrific events.  And like most complex systems, gun violence is many variables coming together (interacting) for a specific effect.

  • When it comes to gun violence, we know that guns are a factor... as in, were it not for guns, we wouldn't have gun violence. (Yes, we'd have some sort of other violence).
  • We also know that mental illness is a factor.  After all, not all gun owners are going around shooting up malls and elementary schools.
  • We also know gun-free zones are favorite targets for gunmen with bad intentions
We know of these factors.  And we know that they interact.  To treat this issue as a univariate problem will change the response.

The right thing to do is to model the system and optimize for least number of gun-related deaths.

In the meantime, I will be thinking often of the children who died at Sandy Hook Elementary.  When I think of them, there's this vacuous hole that fills my stomach and my skin feels numb.

We must solve the problem of violence in our society; but we can't afford to do it wrong and treat it as a univariate problem (i.e. ban guns and be done with it).

Tuesday, April 10, 2012

SPC - Univariate and Bivariate Analysis

The next tools in this SPC pocketbook are Histogram and Correlation.

In modern terms, these are called Univariate and Bivariate Analysis.

Histogram - aka Univariate Analysis

A histogram is one aspect of univariate analysis. According to the pocket book, the histogram is:
  1. A picture of the distribution: how scattered are the data?
  2. A view of the pattern of the data (evenly spread? normally distributed?)
  3. A way to compare the distribution to the specification

With modern computers, it is easy to create histograms with just a few clicks on your computer (with the $1,800 software JMP). In JMP, go to Analyze > Distribution.

You're going to get a dialog where you get to choose which columns you want to make into histograms. Select the columns and hit Y, Columns. Then click OK.

And voila, you get your histograms (plotted vertically by default) and more metrics than Ron Paul gets media coverage.

You get metrics like mean, standard deviation, standard error. And most importantly, you get visuals on how the data is spread.
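If you don't have JMP handy, the same basic metrics are easy to sketch in Python. This is a minimal illustration, not what JMP does under the hood; the titer values are made up, and the bin width of 0.5 is an arbitrary choice.

```python
from collections import Counter
from math import floor, sqrt
from statistics import mean, stdev


def summarize(values):
    """Mean, sample standard deviation and standard error of the mean."""
    n = len(values)
    sd = stdev(values)
    return {"n": n, "mean": mean(values), "std_dev": sd, "std_err": sd / sqrt(n)}


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical final titers (g/L), for illustration only.
titers = [1.8, 2.1, 2.4, 1.9, 2.2, 2.0, 2.6, 2.3, 2.1, 1.7]
stats = summarize(titers)
bins = histogram(titers, bin_width=0.5)

print(stats)
for start, count in bins.items():
    print(f"{start:.1f}-{start + 0.5:.1f} | {'#' * count}")
```

The text bars are a crude stand-in for the picture, but they answer the same question: how scattered are the data, and where do they pile up?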

Correlation - aka Bivariate Analysis

A correlation is also one specific type of bivariate analysis; the type where you plot numerical values against each other. Other types of bivariate analysis include means-comparisons and ANOVA. But yes, for SPC, the correlation is the most popular.

The pocketbook says that the correlation illustrates the relationship if it exists. From where I sit, the correlation feature is one of the most used functions in applying SPC to large-scale cell culture. Here's why:

While cell culture is complex, a lot of manufacturing phenomena are simple. Mass-balance across a system is a linear process. Media batching is a linear process. The logarithm of cell density against time is a linear process. Many things can be explored by plotting Y vs. X and seeing if there's a correlation.

To get correlations with JMP, go to Analyze > Fit Y by X on the menu bar

You're going to get a dialog where you can specify which columns to plot on the y-axis (click Y, Columns). Then you get to specify which columns to plot on the x-axis (click X, Factor).

When you click OK, you're going to get your result. If it turns out that your Y has nothing to do with X, you're going to get something like this: a scatter of points where the line for the mean and the fitted line are basically on top of each other.

If you get a response that does vary with the factor, you're going to get something like this:
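Outside of JMP, both outcomes can be sketched with a plain least-squares line. This is a minimal Python illustration with made-up numbers: the first fit regresses the logarithm of a simulated cell density against time, so the slope is the growth rate; the second uses a Y that has nothing to do with X, so the slope comes out zero and the fitted line sits right on the mean.

```python
from math import exp, log


def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x, plus R-squared.

    Assumes xs are not all identical and ys are not all identical.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


# Simulated exponential growth: 0.5e6 cells/mL growing at 0.03 per hour.
hours = [0, 24, 48, 72, 96]
density = [0.5 * exp(0.03 * t) for t in hours]
mu, ln_x0, r2_growth = linear_fit(hours, [log(d) for d in density])
print(f"growth rate: {mu:.3f} per hour, R^2: {r2_growth:.3f}")

# A response that does not vary with the factor at all.
flat_slope, flat_intercept, r2_flat = linear_fit([1, 2, 3, 4], [5, 7, 7, 5])
print(f"slope: {flat_slope:.3f}, intercept: {flat_intercept:.1f}, R^2: {r2_flat:.3f}")
```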

SPC in the information age is effortless. There really is no excuse to not have data-driven decisions that yield high-impact results.

Monday, April 9, 2012

SPC - Cause/Effect Diagrams and Run Charts

The next two tools were used constantly for large-scale manufacturing sciences support of cell culture: Cause Effect Diagram and the Run Chart.

Cause/Effect (Ishikawa) Diagram

The cause/effect diagram (aka Ishikawa diagram) is essentially a taxonomy for failure modes. You break down failures (effects) into 4 categories:

  1. Man
  2. Machine
  3. Method
  4. Materials

It's used as a brainstorming tool to put it all out there and to help visualize how an event can cause the effect. This was particularly helpful in contamination investigations. In fact, there's a "politically correct" Ishikawa diagram in my FREE case study on large-scale bioreactor contamination.

Get Contamination Cause/Effect Diagram

The cause/effect diagram helps clarify thinking and keeps the team on-task.

Run Chart

The Run Chart is basically what a chart-recorder spits out. In this day and age, it's what we call OSIsoft PI. You plot a parameter against time (called a trend), and when you do this, you get to see what's happening in sequential order. When you plot a lot of parameters on top of one another, you begin to understand sequence: things that happen later cannot cause events that happened earlier. Say your online dissolved oxygen reading spikes below 5% for 10 seconds, yet your pO2 remains steady and the following viability measurement shows no drop-off in cell viability. In that case, you can basically say that the dO2 spike was measurement error.

Here's an example of the modern-day run chart, it's called, "PI":

Run charts (i.e. PI) are crucial for solving immediate problems. A drifting pH probe can dump excess CO2 into a media-batched fermentor. Being able to see real-time data from your instruments and having the experience to figure out what is going on are key to troubleshooting large-scale cell culture and fixing the problem in real time so that the defect is not sent downstream.

Get #1 Biotech/Pharma PI Systems Integrator

As you can see, SPC concepts are timelessly applied today to cell culture and fermentation... albeit with new technology.

Friday, April 6, 2012

SPC - Process Flow Diagram/Pareto Charts

That little SPC book goes into 7 tools to use; the next page covers Process Flow Diagrams and Pareto charts.

Process Flow Diagram

The first tool appears to be the Process Flow Diagram[tm], where one is supposed to draw out the inputs and outputs of each process step. I suppose in the "Lean" world, this is the equivalent of value-stream mapping.

The text of the booklet calls it a

Pictorial display of the movement through a process. It simply shows the various process stages in sequential order.

Normally, I see this on a PowerPoint slide somewhere. And frankly, I've rarely seen it used in practice. More often, we show this to consultants to get them up to speed.

Pareto Chart

The pareto chart is essentially a pie chart in bar-format. The key difference is that pie charts are for the USA Today readership while pareto charts are for real engineers -- this is to say that if you're putting pie charts in Powerpoint and you're an engineer, you're doing it wrong.

Pareto charts are super useful because they help figure out your most pressing issue. For example, say you create a table of your fermentation failures:

So you have counted the number of observed failures alongside a weight of how devastating the failure is. Well, in JMP, you can simply create a pareto chart:

and out pops a pareto chart.

What this pareto chart shows you is the most important things to focus your efforts on. If you solve the top 2 items on this pareto chart, you will have solved 80% of your problems - on a weighted scale.
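For those without JMP, here's a minimal Python sketch of the arithmetic behind a weighted pareto: multiply each failure count by its severity weight, sort from biggest to smallest, and keep a running cumulative percentage. The failure modes, counts and weights below are made up for illustration and are not the table from my example.

```python
def pareto(failures):
    """Rank failure modes by count * weight and add a cumulative percent."""
    scored = sorted(
        ((name, count * weight) for name, count, weight in failures),
        key=lambda item: item[1],
        reverse=True,
    )
    total = sum(score for _, score in scored)
    table, running = [], 0
    for name, score in scored:
        running += score
        table.append((name, score, 100 * running / total))
    return table


# Hypothetical data: (failure mode, observed count, severity weight).
failures = [
    ("Operator error", 5, 2),
    ("Contamination", 5, 9),
    ("Other", 4, 1),
    ("Equipment failure", 7, 5),
    ("Raw material", 2, 3),
]
chart = pareto(failures)
for name, score, cumulative in chart:
    print(f"{name:<18} {score:>3} {cumulative:>6.1f}%")
```

In this made-up table, the top two rows account for 80% of the weighted total, which is exactly the kind of signal you want before committing scarce resources.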

The pareto is a great tool for metering out extremely limited resources and has been proven extremely effective in commercial cell culture/fermentation applications.

Thursday, March 29, 2012

Multiple Linear Regression (Multivariate Analysis)

Here's your process:

[Figure: generic black-box process]
It's a black box. All you know is that you have multiple process inputs (X) and at least one process output (Y) that you care about. Multivariate analysis is the method by which you analyze how Y varies with your multiple inputs (x1, x2,... xn). There are a lot of ways to go about figuring out how Y relates.

One way to go is to turn that black box into a transparent box where you try to understand the fundamentals from first principles. Say you identify x1 as cell growth and you believe that your cells grow exponentially, you can try to apply an equation like Y = Y0·e^(µ·x1).

But this is large-scale manufacturing. You don't have time for that. You have to supply management with an immediate solution followed by a medium-term solution. What you can do is assume that each parameter varies with Y linearly.

y = m·x + b
Just like we learned in 8th grade. How can we just say that Y relates to X linearly? Well, for one, I can say whatever I want (it's a free country). Secondly, all curves (exponential, polynomial, logarithmic, asymptotic...) are linear over small ranges... you know, like the proven acceptable range in which you ought to be controlling your manufacturing process.

Assuming everything is linear keeps things simple and happens to be rooted in manufacturing reality. What next?

y = m1·x1 + m2·x2 + b
Next you start adding more inputs to your equation... applying a different coefficient for each new input. And if you think that a few of your inputs may interact, you can add their interactions like this:

y = m1·x1 + m2·x2 + m12·(x1·x2) + b
You achieve interactions by multiplying the inputs and giving that product its own coefficient. So now you - the big nerd - have this humongous equation that you need solving. You don't know:
  • Which inputs (x's) to put in the equation
  • What interactions (x1 * x2) to put in the equation
  • What coefficients (m's) to put in the equation

What you're doing with multiple linear regression is picking the right inputs and interactions, then letting your statistical software package brute-force the coefficients (m's) so that the equation fits the data you have with the least error.

Here's the thing: The fewer rows you have in your data table, the fewer inputs you get to throw into your equation. If you have 10 samples, but 92 inputs, you're going to have to be very selective with what you try in your model.
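To make the brute-forcing concrete, here's a minimal Python sketch of a least-squares fit for two inputs plus their interaction. It solves the normal equations directly, which is fine for a toy example but is not how a stats package like JMP does it; the function names and the data are hypothetical. The responses were generated from known coefficients so you can see the fit recover them.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Typically means too few rows or inputs that move together.
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_mlr(rows, ys):
    """Least-squares [b, m1, m2, m12] for y = b + m1*x1 + m2*x2 + m12*x1*x2."""
    design = [[1.0, x1, x2, x1 * x2] for x1, x2 in rows]
    k = 4
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical inputs (x1, x2); responses built from y = 2 + 3*x1 - x2 + 0.5*x1*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [2.0, 5.0, 1.0, 4.5, 8.0, 4.0]
b, m1, m2, m12 = fit_mlr(rows, ys)
print(round(b, 3), round(m1, 3), round(m2, 3), round(m12, 3))
```

Notice that four coefficients need at least four well-spread rows; with fewer, the system is singular and there is no unique answer, which is the row-count problem in miniature.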

It's a tough job, but someone's got to do it. And when you finally do (i.e. explain the relationship between, say, cell culture titer and your cell culture process inputs), millions of dollars can literally roll into your company's coffers.

Your alternative is to hire Zymergi and skip that learning curve.


Tuesday, March 20, 2012

Manufacturing Sciences - Local Lab

The other wing of the Manufacturing Sciences group was a lab group.

Manufacturing Sciences Lab
Basically, you enter the virtuous cycle thusly:
  1. Design an experiment
  2. Execute the experiment
  3. Analyze the data for clues
  4. Go to Step 1.

You're thinking, "Gosh, that looks a lot like Process Sciences (aka Process R&D)." And you'd be right. That's exactly what they do; they run experiments at small scale to figure out something about the process.

Territorial disputes are common when it comes to local Manufacturing Sciences groups having local labs. From the Process Sciences perspective, you have these other groups that may be duplicating work, operating outside of your system, basically doing things out of your control. From the Manufacturing Sciences perspective, you need a local resource that works on the timetable of commercial campaigns to address very specific and targeted issues: people who can sit at a table to update the local plant on findings.

If your cashflow can support it, I recommend developing a local lab and here's why:

The lab counterpart of the Manufacturing Sciences group ran an experiment that definitively proved a physical bioreactor part was the true root cause of poor cell growth... this poor cell growth had delayed licensing of the 400+ million dollar plant by 10 months. The hypothesis was unpopular with the Process Science department at corporate HQ and there was much resistance to testing it. In the end, it was the local lab group that ended the political wrangling and provided the data to put the plant back on the tracks towards FDA licensure.

I do have to say that not everything is adversarial. We received quite a bit of help from Process Sciences when starting up the plant and a lot of our folks hailed from Process Sciences (after all, where do you think we got the know-how?). When new products came to our plant, we liaised with Process Science folk.

My point is: in more cases than not, a local manufacturing sciences group with laboratory capability is crucial to the process support mission.

Monday, March 19, 2012

Manufacturing Sciences - Local Data

My second job out of college was as the fermentation engineer at what was then the largest cell culture plant (by volume) in the United States. As it turns out, being "large" isn't the point, but this was 1999 and we didn't know that yet; we were trying to achieve the lowest per-gram cost of bulk product. But I digress.

I was hired into a group called Manufacturing Sciences, which reported into the local technology department that reported to the plant manager. My job was to observe the large-scale cell culture process and analyze the data.

Our paramount concern was quantifying process variability and trying to reduce it. The reason, of course, is to make the process stable so that manufacturing is predictable. Should special cause variability show up, the job was to look for clues to improve volumetric productivity.

The circle of life (with respect to data) looks like this:

[Figure: data flow for manufacturing support]

Data and observations came from the large-scale process. We applied statistical process control (SPC) and statistical analysis such as control charts and ANOVA. From our analysis, we were able to implement within-license changes to make the process more predictable. And should special cause signals arise, we stood ready with more statistical analysis/methods to increase volumetric productivity.
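As a taste of the SPC side, here's a minimal Python sketch of an individuals control chart: the limits are the baseline mean plus or minus 2.66 times the average moving range, and any new run outside those limits gets flagged as a possible special cause. The numbers are made up, and this only checks the simplest rule (a single point beyond the limits), not the full set of run rules.

```python
from statistics import mean


def control_limits(baseline):
    """Individuals-chart limits: mean +/- 2.66 * average moving range."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


# Hypothetical KPI values from past runs, then three new runs to judge.
baseline = [10, 12, 11, 13, 12, 11, 10, 12]
lcl, center, ucl = control_limits(baseline)
new_runs = [11.5, 16.0, 9.0]
special_cause = [x for x in new_runs if x < lcl or x > ucl]

print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}")
print("possible special cause:", special_cause)
```

Everything inside the limits is common cause variability and not worth an investigation on its own; only the flagged run earns a closer look.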

Get Contract Plant Support!