The following notes describe the installation and use of kvqc2 version 1.0.1.
Sources: https://svn.met.no/viewvc/kvoss/kvQc2/branches/kvqc2-1.0.1/
Algorithms in this release:
AlgoCode | Name | Status | Description |
---|---|---|---|
9 | SingleMinMaxAverage | kvqc2_1.0.1 | For a given paramid and its corresponding max and min paramids, calculates a correction for a single missing value as the average of the max and min. If no max or min is available or specified, it falls back to simple linear interpolation. |
10 | SingleLinear | kvqc2_1.0.1 | Replaces a single missing value with a linearly interpolated value. If max and min are specified, checks that the correction lies within the available max … min range; if not, sets it to the nearer of max or min. Runs for any paramid, with optional max and min specified in the configuration file. |
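The two algorithms in the table above can be sketched in a few lines. This is only an illustration under stated assumptions, not the kvqc2 implementation: the function names are hypothetical, the neighbours are assumed to be equally spaced in time (so the interpolated value is their midpoint), and the clamp is only applied when both a min and a max are supplied.

```python
from typing import Optional


def single_linear(before: float, after: float,
                  vmin: Optional[float] = None,
                  vmax: Optional[float] = None) -> float:
    """Fill one missing value from its two neighbours (hypothetical helper).

    Assumes equally spaced observations, so the linear interpolation is the
    midpoint. If both vmin and vmax are given, a result outside the
    min..max range is set to the nearer of the two bounds.
    """
    value = (before + after) / 2.0
    if vmin is not None and vmax is not None:
        value = min(max(value, vmin), vmax)
    return value


def single_min_max_average(before: float, after: float,
                           vmin: Optional[float] = None,
                           vmax: Optional[float] = None) -> float:
    """Average of max and min; falls back to simple linear if either is absent."""
    if vmin is None or vmax is None:
        return single_linear(before, after)
    return (vmin + vmax) / 2.0


if __name__ == "__main__":
    print(single_linear(7.6, 9.1))                       # midpoint of the neighbours
    print(single_linear(2.0, 10.0, vmin=1.0, vmax=5.0))  # clamped to the max
    print(single_min_max_average(7.6, 9.1, 6.0, 10.0))   # average of min and max
```

The neighbours 7.6 and 9.1 are taken from the database query shown later in these notes; their midpoint of about 8.35 is consistent with the corrected value 8.4 stored there, assuming rounding to one decimal.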
Step 1 | Installation |
---|
First put any existing configuration files on hold, for example:
$ cd /etc/kvalobs/Qc2Config
$ mv XXX.cfg XXX.hold
$ mv ProcessUnit.cfg ProcessUnit.hold
(Note: all the earlier config files have since been moved to /etc/kvalobs/Qc2Config/old.)
$ sudo apt-get install kvqc2
Alternatively, install the new kvqc2 directly from the directory where the Debian package was built on dev-vm101, i.e. using “sudo dpkg -i kvqc2_1.0.1-1_i386.deb”, as illustrated below.
paule@dev-vm101:~/kvqc2-1.0.1$ ls
kvqc2-1.0.1 kvqc2_1.0.1-1_i386.changes kvqc2_1.0.1.orig.tar.gz
kvqc2_1.0.1-1.diff.gz kvqc2_1.0.1-1_i386.deb kvqc2-1.0.1.tar.gz
kvqc2_1.0.1-1.dsc kvqc2_1.0.1-1_i386.upload
paule@dev-vm101:~/kvqc2-1.0.1$ sudo dpkg -i kvqc2_1.0.1-1_i386.deb
(Reading database ... 52403 files and directories currently installed.)
Preparing to replace kvqc2 1.0.0-1 (using kvqc2_1.0.1-1_i386.deb) ...
Unpacking replacement kvqc2 ...
Setting up kvqc2 (1.0.1-1) ...
paule@dev-vm101:~/kvqc2-1.0.1$
Bug Fix and Confirmation of apt-get install
Initial tests showed that some data were written back to the database at too high a precision. This is fixed in revision 1782: https://svn.met.no/viewvc/kvoss?view=rev&revision=1782 and led to a kvqc2-1.0.2 revision. This also provided the opportunity to check that installation via apt-get install works, i.e.:
paule@dev-vm101:~$ sudo apt-get install kvqc2
Reading package lists... Done
Building dependency tree... Done
The following packages will be upgraded:
  kvqc2
1 upgraded, 0 newly installed, 0 to remove and 78 not upgraded.
Need to get 288kB of archives.
After unpacking 0B of additional disk space will be used.
Get:1 http://repo.met.no etch/main kvqc2 1.0.2-1 [288kB]
Fetched 288kB in 0s (12.7MB/s)
(Reading database ... 52403 files and directories currently installed.)
Preparing to replace kvqc2 1.0.1-1 (using .../kvqc2_1.0.2-1_i386.deb) ...
Unpacking replacement kvqc2 ...
Setting up kvqc2 (1.0.2-1) ...
paule@dev-vm101:~$
Step 2 | Run with “kvstart” |
---|
kvalobs@dev-vm101:~$ kvstart
KVBIN=/usr/bin
KVPID=/var/run/kvalobs
TIMEOUT=60
Starter kvalobs dette kan ta noe tid!
Hvis det ikke skjer noe på MER enn 60 sekund
bruk CTRL-C for å avbryte!
Starter kvQabased ....running
Starter kvManagerd ....running
Starter kvDataInputd ....running
Starter kvServiced ....running
Starter kvAgregated ....running
Starter kvsynopd ....running
Starter norcom2kv ....running
Starter kvqc2 ....Ok!
(The Norwegian startup message reads: “Starting kvalobs, this may take some time! If nothing happens for MORE than 60 seconds, use CTRL-C to abort!” “Starter” means “Starting”.)
Step 3 | Run “kvqc2” from command line |
---|
kvalobs@dev-vm101:~$ kvqc2 # NB the s/w is installed under /usr/bin
Note the following features:
All of the above can be seen below!
kvalobs@dev-vm101:~$ which kvqc2
/usr/bin/kvqc2
kvalobs@dev-vm101:~$ kvqc2
INFO: -- Reading configuration from file </etc/kvalobs/kvalobs.conf>!
INFO: -- Configuration file loaded!
INFO: -- Using 'database.dbconnect' from configuration file
FROM CONFIGURATION FILE: pgdriver.so
Logging to file </var/log/kvalobs/Qc2.log>!
INFO: (Qc2 ...) -- kvqc2-1.0.1: starting ....
INFO: (Qc2 ...) -- Using <kvtest-dev-vm101/> as path in CORBA nameserver
INFO: (Qc2 ...) -- Using CORBA nameserver at: corbans.met.no
INFO: (Qc2 ...) -- Loading driver for database engine </usr/lib/kvalobs/db/pgdriver.so>!
INFO: (Qc2 ...) -- Driver <PostgreSQL> loaded!
INFO: (Qc2 ...) -- Writing pid to file </var/run/kvalobs/kvqc2-dev-vm101.pid>!
INFO: -- Qc2Work: starting work thread!
INFO: -- New database connection (PostgreSQL) created!
DEBUG: -- Created a new connection to the database!
INFO: -- %%%%%%%%%%%%%%%%%%%%%%%%
Scanning For Files
Scanning For Files
Scanning For Files
Step 3 | Check configuration files |
---|
# As user kvalobs
kvalobs@dev-vm101:~$ touch /etc/kvalobs/Qc2Config/BlankFile.cfg
Scanning For Files
Scanning For Files
Configuration File Found: /etc/kvalobs/Qc2Config/BlankFile.cfg
2010-06-22 20:10:17: 2010-06-22 20:00:00 -> 2010-06-22 20:00:00 /etc/kvalobs/Qc2Config/BlankFile.cfg
kvalobs@dev-vm101:~$ rm /etc/kvalobs/Qc2Config/BlankFile.cfg
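The check above relies on kvqc2 picking up any file ending in .cfg in the configuration directory, while files renamed to .hold are left alone. A minimal sketch of that convention follows, useful for listing what is currently active before starting a test. The function names are hypothetical, and the assumption that only the .cfg suffix matters is inferred from these notes rather than from the kvqc2 source.

```python
from pathlib import Path
from typing import List


def active_config_files(config_dir: str) -> List[Path]:
    """List the *.cfg files in config_dir (assumed to be the active ones)."""
    return sorted(p for p in Path(config_dir).glob("*.cfg") if p.is_file())


def put_on_hold(cfg: Path) -> Path:
    """Rename XXX.cfg to XXX.hold, mirroring the mv commands in these notes."""
    target = cfg.with_suffix(".hold")
    cfg.rename(target)
    return target


if __name__ == "__main__":
    for path in active_config_files("/etc/kvalobs/Qc2Config"):
        print("active:", path)
```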
Step 4 | kvstart |
---|
Step 5 | Install the default configuration files |
---|
# As user kvalobs
$ cd /etc/kvalobs/Qc2Config
$ wget https://svn.met.no/kvoss/kvQc2/branches/kvqc2-1.0.1/src/Reference/DailyProcessUnitMissing.cfg --no-check-certificate
$ wget https://svn.met.no/kvoss/kvQc2/branches/kvqc2-1.0.1/src/Reference/MissingLinear.hold --no-check-certificate
$ wget https://svn.met.no/kvoss/kvQc2/branches/kvqc2-1.0.1/src/Reference/SingleMaxMinAverage.hold --no-check-certificate
$ wget https://svn.met.no/kvoss/kvQc2/branches/kvqc2-1.0.1/src/Reference/TEST3036.hold --no-check-certificate
Step 6 | Deploy DailyProcessUnitMissing.cfg |
---|
DailyProcessUnitMissing.cfg is an example of the default configuration file used in operations. At a specified time each day it triggers an algorithm to process the last N days of data.
The default setting is the last 3 days.
$ tail -f /var/log/kvalobs/Qc2.log
...
RunAtHour=21
RunAtMinute=34
...
NB: Times are in UTC!
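Because the run time is given in UTC, it is easy to pick the wrong hour when working from a local clock. The sketch below computes the next UTC instant matching a RunAtHour/RunAtMinute pair from a timezone-aware "now". The function name is hypothetical, and the assumption that the run is triggered once per day at exactly that minute comes from the description above, not from the kvqc2 source.

```python
from datetime import datetime, timedelta, timezone


def next_run_utc(now: datetime, run_at_hour: int, run_at_minute: int) -> datetime:
    """Return the next UTC time matching RunAtHour/RunAtMinute.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=run_at_hour, minute=run_at_minute,
                                second=0, microsecond=0)
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate


if __name__ == "__main__":
    # 23:00 local time at UTC+2 is 21:00 UTC.
    local = datetime(2010, 6, 22, 23, 0, tzinfo=timezone(timedelta(hours=2)))
    print(next_run_utc(local, 21, 34))
```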
kvalobs@dev-vm101:~$ tail -f /var/log/kvalobs/Qc2.log
...
20100622221439: INFO --------------- %%%%%%%%%%%%%%%%%%%%%%%%
20100622235953: INFO --------------- Case 10: Single Linear
20100622235953: INFO --------------- Single Linear
Then … nothing else happened. Note that strict filters are being applied to the data. Now rerun with the A_fqclevel=0 setting removed, i.e. commented out in DailyProcessUnitMissing.cfg:
#A_fqclevel=0
--------------- Case 10: Single Linear
20100623000814: INFO --------------- Single Linear
20100623000815: INFO --------------- ProcessUnitT Writing Data 1.9 99754 2010-6-22 20:0:0
20100623000815: INFO --------------- ProcessUnitT Writing Data 13.1 17150 2010-6-22 20:0:0
20100623000816: INFO --------------- ProcessUnitT Writing Data 2.2 99754 2010-6-22 18:0:0
20100623000817: INFO --------------- ProcessUnitT Writing Data 15.2 17150 2010-6-22 17:0:0
20100623000818: INFO --------------- ProcessUnitT Writing Data 3.5 99754 2010-6-22 13:0:0
20100623000819: INFO --------------- ProcessUnitT Writing Data 15.5 17150 2010-6-22 12:0:0
20100623000819: INFO --------------- ProcessUnitT Writing Data 15.2 17150 2010-6-22 10:0:0
20100623000820: INFO --------------- ProcessUnitT Writing Data 5.5 76905 2010-6-22 7:0:0
20100623000825: INFO --------------- ProcessUnitT Writing Data 22.6 77007 2010-6-21 13:0:0
20100623000825: INFO --------------- ProcessUnitT Writing Data 8.4 76930 2010-6-21 13:0:0
20100623000827: INFO --------------- ProcessUnitT Writing Data 7.8 76928 2010-6-21 7:0:0
20100623000831: INFO --------------- ProcessUnitT Writing Data 7.2 76928 2010-6-20 23:0:0
kvalobs@dev-vm101:~$ psql kvalobs
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.
Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit
kvalobs=# select * from data where obstime='2010-06-21 13:00:00' and stationid=76930 and paramid=211;
 stationid |       obstime       | original | paramid |       tbtime        | typeid | sensor | level | corrected |   controlinfo    |     useinfo      | cfailed
-----------+---------------------+----------+---------+---------------------+--------+--------+-------+-----------+------------------+------------------+----------
     76930 | 2010-06-21 13:00:00 |   -32767 |     211 | 2010-06-21 13:30:06 |     22 |      0 |     0 |       8.4 | 0000001100000000 | 9894900000000000 | QC2d-2
(1 row)
kvalobs=# select * from data where obstime between '2010-06-21 12:00:00' and '2010-06-21 14:00:00' and stationid=76930 and paramid=211;
 stationid |       obstime       | original | paramid |       tbtime        | typeid | sensor | level | corrected |   controlinfo    |     useinfo      | cfailed
-----------+---------------------+----------+---------+---------------------+--------+--------+-------+-----------+------------------+------------------+----------
     76930 | 2010-06-21 12:00:00 |      7.6 |     211 | 2010-06-21 12:36:27 |     11 |      0 |     0 |       7.6 | 0110000000100000 | 7000000000000000 |
     76930 | 2010-06-21 13:00:00 |   -32767 |     211 | 2010-06-21 13:30:06 |     22 |      0 |     0 |       8.4 | 0000001100000000 | 9894900000000000 | QC2d-2
     76930 | 2010-06-21 14:00:00 |      9.1 |     211 | 2010-06-21 14:24:02 |     22 |      0 |     0 |       9.1 | 0110000000000000 | 7100000400000000 |
(3 rows)
kvalobs=#
Note: After seeing the above info I added CfailedString=“MIST” to the cfg file.
NB Lots of work to be done tuning the control flag settings etc!!!
NB: Currently the algorithms only work on data where “qclevel NOT EQUAL 0”. It is therefore vital that a Qc1 algorithm has already checked the data … otherwise there will be no Qc2 check. I suspect that this might sometimes be the case. How often? How can this be tuned better?
Step 7 | Opportunity for ad hoc testing |
---|
The algorithm checks, via «!CheckFlags.condition(is->controlinfo(),params.Aflag)», that none of the analysis flag settings is true, i.e. the settings in the config file prefixed with “A_”.
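The logic of that check can be illustrated with a small sketch. Everything here is hypothetical: the function names, the dictionary form of the “A_” settings, and above all the flag position (fqclevel taken as the first character of the 16-character controlinfo string) are assumptions for illustration and are not taken from the kvqc2 source.

```python
from typing import Dict

# ASSUMPTION: index of each named flag inside the controlinfo string.
# Only fqclevel is listed, and its position is a guess for illustration.
FLAG_POSITION: Dict[str, int] = {"fqclevel": 0}


def any_condition_true(controlinfo: str, settings: Dict[str, str]) -> bool:
    """True if any named flag currently holds one of the listed values."""
    return any(controlinfo[FLAG_POSITION[name]] in values
               for name, values in settings.items())


def should_process(controlinfo: str, a_settings: Dict[str, str]) -> bool:
    """A row is processed only if none of the A_ conditions is true."""
    return not any_condition_true(controlinfo, a_settings)


if __name__ == "__main__":
    # Hypothetical equivalent of A_fqclevel=0 in the config file.
    print(should_process("0000001100000000", {"fqclevel": "0"}))
    # With the setting commented out there is nothing to exclude the row.
    print(should_process("0000001100000000", {}))
```

If the assumed flag position holds, this would match the behaviour seen in Step 6, where rows were only written once A_fqclevel=0 was commented out.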
Step 8 | Bulk testing |
---|
On dev-vm101 the directory /home/kvalobs/TESTDATA contains test data for March 2036.
data-203603-GeneralSingleMissing-P7.dat | All of March 2036 |
ThreeDays.dat | Approximately the first three days |
On each day, for all stations, the “original” hourly temperature has been set to the missing value -32767 at 05:00, 11:00, 17:00 and 23:00 UT. The original value is kept for reference, in curly brackets, in the cfailed column. For example:
|12320|2036-03-01 11:00:00|-32767|211|2036-03-01 11:06:53|330|0|0|2.3|1111100000000010|7000000000000000| {2.3}
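A row of the test data can be split into named fields and the reference value recovered from the curly brackets as sketched below. The column order is taken from the psql output shown earlier in these notes; the function names are hypothetical, and the leading “|” on the sample line is simply tolerated, since it is not clear whether it is part of the file format.

```python
import re
from typing import Dict, Optional

COLUMNS = ["stationid", "obstime", "original", "paramid", "tbtime", "typeid",
           "sensor", "level", "corrected", "controlinfo", "useinfo", "cfailed"]


def parse_row(line: str) -> Dict[str, str]:
    """Split one '|'-delimited data row into a dict keyed by column name."""
    fields = [f.strip() for f in line.strip().lstrip("|").split("|")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def reference_value(cfailed: str) -> Optional[float]:
    """Return the value stored in curly brackets in cfailed, if any."""
    match = re.search(r"\{([^}]*)\}", cfailed)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    sample = ("|12320|2036-03-01 11:00:00|-32767|211|2036-03-01 11:06:53"
              "|330|0|0|2.3|1111100000000010|7000000000000000| {2.3}")
    row = parse_row(sample)
    print(row["stationid"], row["original"], reference_value(row["cfailed"]))
```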
To use the testdata, clean up old entries in the kvalobs db with commands like:
sql> DELETE FROM data WHERE obstime between '2036-02-28' and '2036-04-01';
Reload test data:
sql> \copy data FROM '/home/kvalobs/TESTDATA/data-203603-GeneralSingleMissing-P7.dat' WITH DELIMITER AS '|'
And capture results with
sql> select * from data WHERE obstime between '2036-02-28' and '2036-04-01' and cfailed like '%QC2d-2%' \g | cat >> FullMissingLinear.dat;
EXAMPLE TEST
8i | Clean up previous entries |
$ psql kvalobs | |
sql > DELETE FROM data WHERE obstime between '2036-02-28' and '2036-04-01'; | |
8ii | Load test data |
sql > \copy data FROM '/home/kvalobs/TESTDATA/data-203603-GeneralSingleMissing-P7.dat' WITH DELIMITER AS '|'
8iii | /etc/kvalobs/Qc2Config/MissingLinear.hold is set up for this test … |
edit the file to change run time to the next minute, and then: | |
$ mv /etc/kvalobs/Qc2Config/MissingLinear.hold /etc/kvalobs/Qc2Config/MissingLinear.cfg | |
monitor output | |
tail -f /var/log/kvalobs/Qc2.log | |
NB: This is a heavy test and the log file will fill and rotate, so interrupt the above command with Ctrl-C and restart it a few times; otherwise it looks as if the program has stalled | |
8iv | When finished |
select * from data WHERE obstime between '2036-02-28' and '2036-04-01' and paramid=211 and cfailed like '%QC2d-2%' \g | cat >> FullMissingLinear.dat;
Also, put the algorithm back on hold, since otherwise it will try to run every day … | |
$ mv /etc/kvalobs/Qc2Config/MissingLinear.cfg /etc/kvalobs/Qc2Config/MissingLinear.hold |
ANALYSIS
cat FullMissingLinear.dat | grep -v "rows)" | sed '/^$/d' | sed '/--/d' | sed '/stationid/d' | sed 's/^.\{103\}//' | sed 's/|.*{/ /' | sed 's/}.*//' | sed '/-32767/d' > XY.dat
The above command just strips out the two columns of data to compare, e.g. just looking at the first lines:
cat FullMissingLinear.dat | grep -v "rows)" | sed '/^$/d' | sed '/--/d' | sed '/stationid/d' | sed 's/^.\{103\}//' | sed 's/|.*{/ /' | sed 's/}.*//' | sed '/-32767/d' | head
gives:
3.2 5.4
8.1 8.7
2 2.3
6.9 6.7
5.8 6.5
1 0.9
6.6 6.9
6.8 6.63
6.9 6.86
6.5 6.68
Values to be rounded to 1 decimal place. Investigate and fix.
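Until the precision issue is fixed at the source, the pairs in XY.dat can be rounded to one decimal place before comparison. The sketch below does that and reports the mean absolute difference between the two columns. The function names are hypothetical, and the choice of round-half-up is an assumption, since these notes do not say which rounding rule kvqc2 should use.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, List, Tuple


def round1(value: str) -> Decimal:
    """Round a numeric string to 1 decimal place (half-up, by assumption)."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def compare(lines: Iterable[str]) -> Tuple[List[Tuple[Decimal, Decimal]], Decimal]:
    """Round each 'x y' pair and return the pairs and their mean |x - y|."""
    pairs = []
    for line in lines:
        if line.strip():
            x, y = line.split()
            pairs.append((round1(x), round1(y)))
    diffs = [abs(x - y) for x, y in pairs]
    return pairs, sum(diffs) / len(diffs)


if __name__ == "__main__":
    with open("XY.dat") as handle:
        pairs, mean_abs_diff = compare(handle)
    print(len(pairs), "pairs, mean absolute difference", mean_abs_diff)
```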
Then plot in R: (BE CAREFUL copying and pasting the long lines !!!)
$ R
FileName="XY.dat"
dim=100 # make dim smaller to zoom into the middle of the plot
UTD <- read.table(FileName, header=FALSE) # XY.dat has no header line; header=TRUE would swallow the first pair
#jpeg(filename=paste(FileName,"jpg",sep="."))
pdf()
plot(UTD[,1],UTD[,2],ylim=c(-dim,dim),xlim=c(-dim,dim),xlab='Qc2 Algorithm/Celsius',ylab='Original/Celsius', main='200803 Hourly Temperature', sub='Missing Value Correction',col=4) # pch=46 for dots
lines(c(-dim,dim),c(-dim,dim),col=3)
lines(c(-dim-10,dim+10),c(0,0))
lines(c(0,0),c(-dim-10,dim+10))
lines(c(-dim-10,dim+10),c(-dim-10,dim+10),col=3)
dev.off()
#To exit R:
quit()
# NB the script creates the plot "Rplots.pdf" (converted to jpg for the wiki).
More examples of other parameters available at:
— 2010/06/22 23:25