Shao Xueguang, Yu Zhengliang
(Department of Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, China)
Abstract A novel algorithm for fast resolution of two-dimensional multicomponent overlapping chromatograms is proposed, based on wavelet compression and the immune algorithm (IA). Because the wavelet transform (WT) is linear, the overlapping chromatogram (antigen) can be compressed by WT before it is input into the immune network, and the standard chromatogram of each component (antibody) is compressed by the same scheme. To speed up the computation further, each two-dimensional data matrix is rearranged into a one-dimensional vector before it is compressed. After the compressed information of each component has been extracted by the IA, the chromatogram is reconstructed by the inverse WT and rearranged back into matrix form. The result is almost the same as that obtained by the IA alone, but the calculation is much faster. Satisfactory quantitative results are also obtained.
-
INTRODUCTION
With the development of modern chemical instrumentation, multicomponent two-dimensional data matrices can be obtained easily. Several methods have been proposed for resolving multicomponent overlapping matrices, such as chemical factor analysis (CFA),[1,2] wavelet transform (WT)[3-5] and immune algorithm (IA).[6-8] Our previous works[6-10] have shown that the IA is an efficient tool for the resolution of overlapping multicomponent analytical signals. A multicomponent overlapping chromatogram can be resolved easily by an IA, and the calculation is faster than the conventional least-squares method.[7] However, when the data set is large and parameters need to be optimized, the computation time becomes too long for practical use, e.g., in the GA-IA[10] method, because the IA procedure is invoked repeatedly. One useful way to speed up the calculation is to compress the raw experimental data. Many efficient tools exist for analytical data compression, such as the binary coding method, the Adams and Black algorithm, Fourier transform, chemical factor analysis, and wavelet transform.[11-14]
In this paper, the matrices of both the standard chromatograms (antibodies) and the multicomponent overlapping chromatogram (antigen) are first converted into one-dimensional vectors and compressed by wavelet transform; the resolution is then performed by an IA. The conversion and compression were found to improve the calculation speed, so the approach can serve as a fast preprocessing tool for the GA-IA method.
ALGORITHM
The principle and application of the IA have been reported in our previous works.[6-10] In essence, the IA takes the signal of the multicomponent mixture as the antigen and the signals of the standard samples as antibodies, and extracts the information of each component in the multicomponent overlapping signal by a process of recognition, iterative elimination, etc. The calculation process of an IA can be described simply by the following formulae:
(1)
(2)
(3)
where T is the weight of the input layer, k is the iteration number, V is the overlapping chromatographic signal (antigen), V0i is the standard chromatographic signal of the ith component with known concentration (antibodies), ci is the relative concentration of the ith component, and VF is the feedback vector or matrix denoting the eliminated antigen. When dc(k) approaches zero, VF gives the information of each component in the overlapping chromatographic signal. In many cases, because of variations caused by experimental reproducibility etc., parameters of V0i such as the position and shape of the peaks may need to be optimized. The optimization is time-consuming when the number of data points in V and V0i is large. Therefore, an efficient way to compress V and V0i is necessary for speeding up the algorithm.
The wavelet transform has been proven to be an efficient technique for analytical data compression.[15,16] Its dual localization in both the frequency and time domains, its linearity, and the existence of a fast algorithm make the WT an ideal candidate for preprocessing data for the IA. In this paper, the multi-resolution signal decomposition (MRSD) algorithm[17,18] is used.
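The linearity property is what allows the antigen and the antibodies to be compressed separately and still be compared coefficient by coefficient. The following minimal sketch checks this numerically. It uses a hand-written orthonormal Haar transform rather than the MRSD implementation or the Symmlet filters used in this work, and the function name `haar_dwt` and the random test vectors are illustrative assumptions.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal multi-level Haar transform of a 1-D array.

    The length of x is assumed to be divisible by 2**levels.
    """
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)         # smoothed signal
    # Coarsest approximation first, then details from coarse to fine.
    return np.concatenate([approx] + details[::-1])

rng = np.random.default_rng(0)
a = rng.normal(size=64)        # stands in for one standard signal
b = rng.normal(size=64)        # stands in for another standard signal
mixture = 0.7 * a + 1.3 * b    # linear mixture of the two

lhs = haar_dwt(mixture, 3)
rhs = 0.7 * haar_dwt(a, 3) + 1.3 * haar_dwt(b, 3)
linear_ok = np.allclose(lhs, rhs)  # transform of the sum equals sum of transforms
print(linear_ok)
```

Because the relation holds for every coefficient, it also holds for any subset of coefficients, which is why the same coefficient positions can be retained for the antigen and for each antibody.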
Based on the IA and WT compression, a fast algorithm for the resolution of multicomponent overlapping chromatograms is proposed. The flowchart is shown in Figure 1 and includes the following steps:
Fig.1 The flowchart of the proposed algorithm
(1) Input the overlapping 2-D chromatographic data matrix as antigen.
(2) Estimate the possible components from the original chromatogram.
(3) Input the standard 2-D chromatographic data matrices as antibodies.
(4) Arrange all 2-D data matrices of the antigen and antibodies into one-dimensional vectors along the chromatographic direction.
(5) Apply the wavelet compression to the antigen vector, i.e., perform WT on the antigen vector, then suppress those coefficients whose magnitude is less than a threshold. The threshold is set by a predefined compression ratio, which is chosen by trial and by inspection of the reconstruction error. The retained coefficients are taken as the antigen for further calculation.
(6) Compress the antibody vectors by performing WT with the same parameters as in the previous step and retaining the coefficients at the same positions as in the compressed antigen.
(7) Extract the compressed information of each component from the compressed antigen by the IA described above, where the compressed antigen is taken as V and the compressed antibodies are taken as V0i.
(8) Reconstruct the extracted information of each component by the inverse WT algorithm to obtain the full extracted chromatographic data in vector form.
(9) Finally, re-arrange the extracted data vector back into matrix form.
After these calculations, the chromatogram of each component is resolved from the multicomponent overlapping data matrix. From the theory of the IA, ci is the concentration of each component relative to the concentration of the standard sample, i.e., if the concentration of the standard sample is known, the concentration of each component in the mixture can be calculated from ci. Therefore, this method gives both the resolution and the quantitative determination simultaneously.
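Steps (1) to (9) can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the implementation used in this work: a hand-written Haar transform replaces the Symmlet5 MRSD, an ordinary least-squares solve stands in for the IA iteration (whose formulae are not reproduced in the code), and the Gaussian peaks, matrix sizes, and all function names are hypothetical.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal multi-level Haar transform (stand-in for Symmlet5 MRSD)."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return np.concatenate([approx] + details[::-1])  # coarse to fine

def haar_idwt(coeffs, levels):
    """Inverse of haar_dwt."""
    coeffs = np.asarray(coeffs, dtype=float)
    size = coeffs.size >> levels
    approx, pos = coeffs[:size], size
    for _ in range(levels):
        detail = coeffs[pos:pos + size]
        out = np.empty(2 * size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx, pos, size = out, pos + size, 2 * size
    return approx

def peak(n_time, n_wave, t0, w0):
    """Synthetic 2-D peak (time x wavelength); purely illustrative data."""
    t = np.arange(n_time)[:, None]
    w = np.arange(n_wave)[None, :]
    return np.exp(-0.5 * ((t - t0) / 6.0) ** 2) * np.exp(-0.5 * ((w - w0) / 5.0) ** 2)

n_time, n_wave, levels, keep = 128, 16, 4, 200

# Steps 1-3: antigen (mixture) and antibodies (standards).
standards = [peak(n_time, n_wave, t0, w0) for t0, w0 in [(50, 6), (60, 8), (70, 10)]]
true_c = np.array([1.0, 0.75, 1.2])
antigen = sum(c * s for c, s in zip(true_c, standards))

# Step 4: unfold so that the chromatogram at each wavelength is contiguous
# (one reading of "along the chromatographic direction").
def unfold(m):
    return m.T.reshape(-1)

def refold(v):
    return v.reshape(n_wave, n_time).T

# Step 5: transform the antigen and keep the `keep` largest coefficients.
antigen_w = haar_dwt(unfold(antigen), levels)
idx = np.sort(np.argsort(np.abs(antigen_w))[-keep:])
v = antigen_w[idx]

# Step 6: transform each antibody and keep the same positions.
v0 = np.column_stack([haar_dwt(unfold(s), levels)[idx] for s in standards])

# Step 7: estimate relative concentrations (least squares, NOT the IA).
c_est, *_ = np.linalg.lstsq(v0, v, rcond=None)

# Steps 8-9: inverse transform each component and fold back into a matrix.
resolved = []
for i in range(len(standards)):
    full = np.zeros(antigen_w.size)
    full[idx] = c_est[i] * v0[:, i]
    resolved.append(refold(haar_idwt(full, levels)))

print(np.round(c_est, 4))
```

On noise-free synthetic data the estimated concentrations match the values used to build the mixture, since the transform is linear and the same positions are retained for every vector. With real chromatograms, peak shifts and noise would make the agreement approximate.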
-
EXPERIMENTAL
The experimental data sets were measured on an HPLC system comprising a Spectrasystem FL2000 (Spectra-Physics, USA) with the Spectra Focus multi-wavelength UV detector (Spectra-Physics) and a Spectrasystem workstation. The column was packed with 10 μm ODS silica (250 mm × 5 mm, Shimadzu). The mobile phase was 0.25 mol/L lactic acid (A.R., pH ~3.5) with 0.01 mol/L sodium dodecyl sulfonate. The post-column color developing reagent was 1.0×10^-4 mol/L arsenazo III (Fluka Chemie AG). The flow rate of the mobile phase was 1.0 mL/min, and the flow rate of the color developing reagent was 1.0 mL/min. The column temperature was 20 °C. The detection wavelength ranged from 580 nm to 720 nm. The sampling interval was 0.005 min.
Table 1 Composition of the samples (unit: mg/ml)
Sample No. | Er | Tm | Yb |
1 | 0.2000 | 0.1999 | 0.2001 |
2 | 0 | 0.1499 | 0.2001 |
3 | 0 | 0.1999 | 0.2001 |
Table 1 lists the composition of the three samples, which are mixtures of Er, Tm and Yb. Figure 2 shows the two-dimensional chromatogram of mixture sample No. 1 (antigen), and Figure 3 (a), (b) and (c) show the standard chromatograms of Er, Tm and Yb (antibodies), respectively, obtained by experiment.
Fig.2 The experimental multicomponent overlapping chromatogram (antigen) of the sample No.1
Fig.3 The standard chromatograms of single component (antibodies)
(a) Er (b) Tm (c) Yb
-
RESULTS AND DISCUSSION
4.1 Selection of the wavelet basis and the decomposition level
Figure 4 shows the coefficients obtained by the wavelet transform of the chromatogram in Figure 2 (arranged in vector form) at level 7 using the Symmlet5 (L=10) wavelet basis. The information of the chromatographic signal is concentrated mainly in only a few of the coefficients, so removing the smaller coefficients has little effect on the total information. To find the optimal wavelet basis and decomposition level, the reconstruction errors obtained with the Haar, Daubechies (L=4-20), Coiflet (L=4-20) and Symmlet (L=6-30) bases at different decomposition levels, where L is the filter length, were investigated with the data sets in Figures 2 and 3. Table 2 summarizes some of the results for a compression ratio of 1/58. The reconstruction error is calculated by
(4)
where X is the original signal, XR is the signal reconstructed from the retained coefficients, and n is the size of the original data set.
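The way the reconstruction error depends on the decomposition level can be explored with a short script. The sketch below assumes a root-mean-square form for the error, which is an assumption made for illustration and may differ from Eq. (4). It also uses a hand-written Haar transform and a synthetic two-peak signal instead of the wavelet bases and experimental data of this work; all names are hypothetical.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal multi-level Haar transform of a 1-D array."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return np.concatenate([approx] + details[::-1])

def haar_idwt(coeffs, levels):
    """Inverse of haar_dwt."""
    coeffs = np.asarray(coeffs, dtype=float)
    size = coeffs.size >> levels
    approx, pos = coeffs[:size], size
    for _ in range(levels):
        detail = coeffs[pos:pos + size]
        out = np.empty(2 * size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx, pos, size = out, pos + size, 2 * size
    return approx

def rms_error(x, xr):
    """ASSUMED error measure: root-mean-square difference between X and XR."""
    x, xr = np.asarray(x, dtype=float), np.asarray(xr, dtype=float)
    return np.sqrt(np.sum((x - xr) ** 2) / x.size)

def compress(x, levels, keep):
    """Keep the `keep` largest-magnitude coefficients and reconstruct."""
    w = haar_dwt(x, levels)
    idx = np.argsort(np.abs(w))[-keep:]
    kept = np.zeros_like(w)
    kept[idx] = w[idx]
    return haar_idwt(kept, levels)

# Synthetic two-peak signal of 1024 points (illustrative only).
t = np.arange(1024)
signal = np.exp(-0.5 * ((t - 400) / 30.0) ** 2) + 0.6 * np.exp(-0.5 * ((t - 520) / 40.0) ** 2)

# Same compression ratio (32 of 1024 coefficients) at each level.
errors = {lev: rms_error(signal, compress(signal, lev, keep=32)) for lev in range(1, 8)}
best_level = min(errors, key=errors.get)
for lev, err in errors.items():
    print(lev, round(err, 5))
```

Repeating such a scan for each candidate basis and each data set gives a table of the same shape as Table 2, from which the basis and level can be chosen.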
Fig.4 The wavelet coefficients obtained by WT of the chromatogram in Figure 2
Table 2 shows that the reconstruction error varies with the decomposition level in almost the same way for all four data sets. For every wavelet basis, the minimum reconstruction error generally appears at decomposition level 6 or 7. Comparing the reconstruction errors of the different wavelet bases, Symmlet 4, 5 and 6 give smaller errors. Therefore, Symmlet5 at decomposition level 7 is adopted in the following studies.
Table 2 Reconstruction errors by different wavelet bases at different decomposition levels
Wavelet basis | Data | Level 4 | Level 5 | Level 6 | Level 7 | Level 8 |
Haar | Mix. | 309.601 | 194.953 | 177.447 | 171.839 | 168.520 |
Haar | Er | 94.252 | 78.993 | 74.375 | 73.626 | 74.169 |
Haar | Tm | 69.741 | 58.817 | 55.637 | 55.588 | 56.266 |
Haar | Yb | 69.070 | 58.046 | 53.986 | 52.874 | 51.045 |
Db4 | Mix. | 276.538 | 114.605 | 92.823 | 88.633 | 95.276 |
Db4 | Er | 68.043 | 43.651 | 37.795 | 37.023 | 40.095 |
Db4 | Tm | 51.618 | 34.362 | 31.237 | 32.520 | 33.644 |
Db4 | Yb | 53.556 | 37.539 | 32.522 | 30.989 | 32.992 |
Db8 | Mix. | 277.929 | 109.555 | 89.389 | 88.817 | 99.135 |
Db8 | Er | 68.060 | 42.529 | 38.665 | 40.783 | 46.580 |
Db8 | Tm | 51.735 | 34.138 | 32.182 | 35.101 | 38.922 |
Db8 | Yb | 53.529 | 37.544 | 33.170 | 33.769 | 37.752 |
Sym3 | Mix. | 276.373 | 120.943 | 98.082 | 92.855 | 94.883 |
Sym3 | Er | 68.186 | 44.532 | 37.727 | 38.435 | 39.737 |
Sym3 | Tm | 51.816 | 35.000 | 31.922 | 32.380 | 34.307 |
Sym3 | Yb | 53.856 | 38.130 | 33.074 | 31.258 | 32.033 |
Sym4 | Mix. | 275.467 | 115.130 | 92.764 | 85.817 | 87.348 |
Sym4 | Er | 67.950 | 43.506 | 37.304 | 37.294 | 40.324 |
Sym4 | Tm | 51.568 | 34.594 | 30.649 | 31.270 | 32.425 |
Sym4 | Yb | 53.539 | 37.624 | 32.082 | 30.208 | 30.807 |
Sym5 | Mix. | 279.138 | 112.817 | 89.699 | 84.277 | 84.930 |
Sym5 | Er | 68.870 | 44.143 | 38.136 | 38.141 | 39.722 |
Sym5 | Tm | 52.365 | 35.039 | 31.369 | 31.726 | 31.908 |
Sym5 | Yb | 54.374 | 38.296 | 32.159 | 29.435 | 29.962 |
Sym6 | Mix. | 275.515 | 110.207 | 87.794 | 82.778 | 86.383 |
Sym6 | Er | 67.938 | 42.396 | 36.546 | 37.034 | 39.834 |
Sym6 | Tm | 51.502 | 33.856 | 30.895 | 31.911 | 33.564 |
Sym6 | Yb | 53.468 | 37.125 | 32.402 | 30.549 | 31.978 |
Sym7 | Mix. | 279.568 | 110.628 | 87.745 | 84.261 | 87.822 |
Sym7 | Er | 68.977 | 43.888 | 37.681 | 38.089 | 40.615 |
Sym7 | Tm | 52.404 | 34.883 | 31.168 | 31.642 | 34.493 |
Sym7 | Yb | 54.392 | 38.508 | 32.354 | 30.697 | 31.047 |
Coif2 | Mix. | 276.228 | 113.440 | 90.970 | 84.528 | 83.760 |
Coif2 | Er | 68.185 | 43.589 | 37.174 | 37.834 | 39.003 |
Coif2 | Tm | 51.813 | 34.415 | 30.914 | 31.251 | 31.713 |
Coif2 | Yb | 53.780 | 37.451 | 32.044 | 29.921 | 30.360 |
Coif4 | Mix. | 277.145 | 107.641 | 86.738 | 83.806 | 87.135 |
Coif4 | Er | 68.182 | 42.591 | 37.072 | 37.435 | 41.578 |
Coif4 | Tm | 51.813 | 33.722 | 31.162 | 33.030 | 35.006 |
Coif4 | Yb | 53.752 | 36.947 | 32.133 | 31.387 | 33.073 |
Fig.5 The retained wavelet coefficients after compression and the extracted results (wavelet basis: Symmlet 5, decomposition level: 7)
4.2 Resolved result by the proposed algorithm
To resolve the overlapping chromatogram by the proposed algorithm, both the antigen (the multicomponent overlapping chromatogram) and the antibodies (the standard chromatograms of each component) were compressed with the Symmlet5 wavelet basis at decomposition level 7. The number of data points was reduced from 52200 to 930. The solid line in Figure 5 shows the compressed result of the overlapping chromatogram in Figure 2, and the dotted lines show the result resolved by the IA. For clarity, three regions are enlarged in Figure 6, in which (a), (b) and (c) correspond to data points 140~190, 550~600 and 660~690, respectively. It can be seen that the IA gives a very good resolution of the compressed wavelet coefficients.
Fig.6 The enlargement of Figure 5
(a) 140~190 data point (b) 550~600 data point (c) 660~690 data point
Fig.7 The reconstructed 2-D chromatograms
(a) Er (b) Tm (c) Yb
The chromatograms reconstructed from the resolved coefficients in Figure 5 are shown in Figure 7, where (a), (b) and (c) correspond to the reconstructed chromatograms of Er, Tm and Yb, respectively. Comparison with the standard chromatogram of each component in Figure 3 shows that the overlapping chromatogram is well resolved and that the chromatogram of each component is recovered well. The residual is shown in Figure 8. Its intensity is very small compared with that of the overlapping chromatogram or the reconstructed chromatograms, which indicates that almost all the information contained in the overlapping chromatogram was extracted. The small remaining error is caused mainly by the irreproducibility of the experiment.
Fig.8 The reconstructed residual information
4.3 Comparison of the proposed algorithm with immune algorithm and WT-IA
In our previous works, a WT-IA using a two-dimensional wavelet compression algorithm was proposed to improve the calculation speed. To investigate the efficiency of the proposed algorithm, its computation time and the residual after resolution are compared with those of the IA and WT-IA, where the value of the residual is the sum over all data points of the residual matrix shown in Figure 8. The results are listed in Table 3. On average, the proposed algorithm is about 2.48 times as fast as the IA and about 2.16 times as fast as WT-IA. Its residual is also smaller than those of the IA and WT-IA.
Table 3 Comparison of conventional IA, WT-IA and the proposed algorithm*
Run No. | Time (s), Conv. IA | Time (s), WT-IA | Time (s), Proposed algorithm | Residual (×10^2), Conv. IA | Residual (×10^2), WT-IA | Residual (×10^2), Proposed algorithm |
1 | 18.56 | 16.37 | 7.47 | 2.1011 | 2.2326 | 2.0946 |
2 | 18.84 | 16.21 | 7.52 | | | |
3 | 18.84 | 16.36 | 7.58 | | | |
4 | 18.89 | 16.37 | 7.63 | | | |
5 | 18.51 | 16.37 | 7.58 | | | |
Aver. | 18.72 | 16.34 | 7.56 | | | |
* Programs were run on a Pentium(R) 233 MHz machine with 64 MB of memory.
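The speed ratios quoted for Table 3 follow directly from the averaged run times. As a quick check, the sketch below recomputes them from the five runs listed in the table; the variable names are illustrative.

```python
# Run times in seconds, copied from Table 3.
conv_ia = [18.56, 18.84, 18.84, 18.89, 18.51]
wt_ia = [16.37, 16.21, 16.36, 16.37, 16.37]
proposed = [7.47, 7.52, 7.58, 7.63, 7.58]

def mean(values):
    return sum(values) / len(values)

# Ratio of average times: how many times as fast the proposed algorithm is.
ratio_vs_ia = mean(conv_ia) / mean(proposed)
ratio_vs_wt_ia = mean(wt_ia) / mean(proposed)

print(round(ratio_vs_ia, 2))     # 2.48
print(round(ratio_vs_wt_ia, 2))  # 2.16
```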
Table 4 Quantitative results by the proposed algorithm
Sample | Component | Added conc. (mg/ml) | Calculated conc. (mg/ml) | Recovery (%) |
1 | Er | 0.2000 | 0.1977 | 98.85 |
1 | Tm | 0.1999 | 0.1914 | 95.75 |
1 | Yb | 0.2001 | 0.2070 | 103.45 |
2 | Er | 0 | 0 | 0 |
2 | Tm | 0.1499 | 0.1498 | 99.93 |
2 | Yb | 0.2001 | 0.2046 | 102.25 |
3 | Er | 0 | 0 | 0 |
3 | Tm | 0.1999 | 0.1966 | 98.35 |
3 | Yb | 0.2001 | 0.2041 | 102.00 |
4.4 Quantitative determination using the proposed algorithm
To investigate the ability of the proposed algorithm for quantitative determination, the three samples listed in Table 1 were analyzed, and the results are listed in Table 4. For the components present in the samples, all the recoveries lie within 100±5%, with a minimum of 95.75% and a maximum of 103.45%. The results are satisfactory.
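Recovery here is the calculated concentration expressed as a percentage of the added concentration. The short sketch below recomputes the recoveries from the concentrations in Table 4 for the components that were actually added; the data layout and names are illustrative.

```python
# (sample, component, added conc., calculated conc.) in mg/ml, from Table 4.
# Entries with zero added concentration are omitted because recovery is
# undefined when nothing was added.
rows = [
    (1, "Er", 0.2000, 0.1977),
    (1, "Tm", 0.1999, 0.1914),
    (1, "Yb", 0.2001, 0.2070),
    (2, "Tm", 0.1499, 0.1498),
    (2, "Yb", 0.2001, 0.2046),
    (3, "Tm", 0.1999, 0.1966),
    (3, "Yb", 0.2001, 0.2041),
]

recoveries = [round(100.0 * calc / added, 2) for _, _, added, calc in rows]

for (sample, comp, _, _), rec in zip(rows, recoveries):
    print(sample, comp, rec)

print(min(recoveries), max(recoveries))  # 95.75 103.45
```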
-
CONCLUSION
Based on wavelet compression and the immune algorithm, a fast algorithm for the resolution of 2-D multicomponent overlapping chromatograms is proposed. Its application to the resolution and quantitative determination of multicomponent 2-D overlapping chromatograms shows that the method is fast in calculation and accurate in quantitative determination. Therefore, the proposed algorithm may be an effective alternative method for the resolution of multicomponent 2-D overlapping chromatograms.