RAMADDA SeeSV
The RAMADDA SeeSV package provides 313 commands for manipulating CSV and other types of files.
Introduction
The RAMADDA SeeSV tools can be used both interactively from within RAMADDA and via the command line. Help with the interactive version can be viewed here. In-depth interactive examples are available here. For brevity, the documentation below assumes that you are using an alias:

alias seesv="sh /path/to/seesv/seesv.sh"

The examples here show command line use. If running interactively through the RAMADDA web interface the commands take the same form except for the seesv command call and the specification of the input file. So, for example, if you call the tool from the command line as:

seesv -columns column1,column2 -p file.csv > output.csv

The corresponding command interactively would be:
-columns column1,column2
Install
- Download the seesv.zip file from the RAMADDA download site
- Unzip the file
- Consult the README
- SeeSV runs using Java 1.8 or higher. Set the JAVA environment variable to point to the java runtime.
- Usage:
seesv <any number of commands> -p <input file> > output file
Basic Use
The general use of the SeeSV tools is as follows:

seesv <a number of processing commands> <some output command> file.csv > output.csv
#or
seesv <a number of processing commands> <some output command> -o output.csv file.csv

#To show help:
seesv -help
#To show help on a topic:
seesv -help:some_topic

#With no processing commands and just printing the header:
seesv -printheader file.csv
#Or a shortcut
seesv -ph file.csv

#With no processing commands and csv output:
seesv -p file.csv > output.csv

#With no processing commands and a record oriented output:
seesv -record file.csv > output.csv

#With the -columns command printing csv output
seesv -columns column1,column2 -p file.csv > output.csv
Column Identifiers
Note: many of these commands use one or more column names from the source file. These column names are converted to a canonical identifier format by lower-casing, changing spaces to "_", and removing non-alphanumeric characters. So, for example, if your source file is:

column 1, Column 2, Some "funny" column ...

Then the column ids are:

column_1,column_2,some_funny_column

You can always use the -ids command to list the ids:
seesv -ids -printheader test.csv
Some Examples
#### selecting columns

#Subset test.csv and only include column1 and column2
seesv -columns column1,column2 -p test.csv > newtest.csv

#You can also use numeric indices - zero based
seesv -columns 0,1 -p test.csv > newtest.csv

#Specify a range of columns
#either using column names:
seesv -columns column1-columnN -p test.csv > newtest.csv
#or indices:
seesv -columns 1-5,6,7,10-15 -p test.csv > newtest.csv
#or mix and match numeric indices and names
seesv -columns 0-1,columnA,columnB,columnF-columnZ -p test.csv > newtest.csv

#Specify a step:
#seesv -columns start-end:step
#e.g. below gives 0,3,6,9,12,15
seesv -columns 0-15:3

#include all columns except column1
seesv -notcolumns column1 -p test.csv > newtest.csv

#subset columns and then change the values
seesv -columns column1,column2 -change column1,column2 "pattern" "new value" -p test.csv > newtest.csv

#Only include column1 and then only pass through unique values matching exactly
seesv -columns column1 -unique 0 exact -p test.csv > newtest.csv

#Only pass through unique values in the first column and then sort the output
seesv -columns column1 -unique 0 -sort column1 -p test.csv > newtest.csv
Input
Specify the input. The default is assumed to be CSV, but HTML, JSON, XML, shapefiles, etc. are also supported.
seesv -p test.csv.gz
seesv -lines -p test.txt
#Process xml with the given path
seesv -xml "path.to.rows" -p test.xml
#Process JSON
seesv -json "path.to.array" "object_paths" -p test.json
- -delimiter: Specify the input delimiter
- -inputcomment: Input comment
- -tab: Use tabs. A shortcut for -delimiter tab
- -widths: Columns are fixed widths
- -quotesnotspecial: Don't treat quotes as special characters
- -cleaninput: Input is one text line per row. i.e., no new lines in a data row. Setting this can improve performance on large files
- -start: Start at pattern in source file
- -stop: End at pattern in source file
- -trimline: Trim the input line
- -bom: Input has a leading byte order mark (BOM) that should be stripped out
- -encoding: Specify the file encoding
- -header: Raw header
- -unitrow: Second row contains units
- -multifiles: Treat input files separately
- -json: Parse the input as json
- -jsonjoin: Join different arrays in the input JSON
- -jsonvalue: Extract a value from a JSON column
- -geojson: Parse the input as geojson
- -pdf: Read input from a PDF file.
- -xml: Parse the input as xml
- -shapefile: Parse the input shapefile
- -lines: Parse the input as text lines. Treat each line as one column
- -htmltable: Parse tables in the input html file
- -htmlpattern: Parse the input html file
- -harvest: Harvest links in web page. This results in a 2 column dataset with fields: label,url
- -text: Extract rows from the text
- -extractpattern: Extract rows from the text
- -tokenize: Tokenize the input from the pattern
- -sql: Read data from the given database
- -synthetic: Generate an empty file with the given number of rows
- -prune: Prune out the first N bytes
- -noheader: Strip off the header
- -deheader: Strip off the RAMADDA point header
- -headernames: Make the header proper capitalization
- -headerids: Clean up the header names
- -ids: Use canonical names
- -sheet: Set XLS sheet #
- -cat: Concat the columns in one or more csv files
- -append: Append the files, skipping the given rows in the latter files
- -chop: Write out the last N lines, including the header
- -filenamepattern: Extract strings from the file name and add them as new columns
- delimiter:
Use 'space' for space, 'tab' for tab, '?' to guess between tab and space
- widths:
w1,w2,...,wN
nameXXXvalue1XXXXXvalue2XXXXX
foo    1          2
bar    3          4

You would set the -widths to the width of each column, here 7,11,11:
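seesv -widths 7,11,11 -p test.txt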
Normally, when parsing a CSV file if a quote is encountered it is assumed to be a special character enclosing commas or new lines. If the quotes in your CSV file aren't special then use this command.
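For example (a minimal sketch):

seesv -quotesnotspecial -p test.csv > output.csv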
- encoding:
File encoding; see https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html values:UTF-8,UTF-16,UTF-16BE,UTF-16LE,UTF-32,UTF-32BE,UTF-32LE,CESU-8,IBM00858,IBM437,IBM775,IBM850,IBM852,IBM855,IBM857,IBM862,IBM866,ISO-8859-1,ISO-8859-13,ISO-8859-15,ISO-8859-2,ISO-8859-4,ISO-8859-5,ISO-8859-7,ISO-8859-9,KOI8-R,KOI8-U,US-ASCII,windows-1250,windows-1251,windows-1252,windows-1253,windows-1254,windows-1257,x-IBM737,x-IBM874,x-UTF-16LE-BOM,x-UTF-32BE-BOM,x-UTF-32LE-BOM
- template:
File template - ${file_shortname} ${file_name} ${count}
- arrayPath:
Path to the array e.g., obj1.arr[2].obj2
- objectPaths:
One or more paths to the objects e.g. geometry,features
{ "p1": { "p2": { "values": [ 1,2,3,4 ] } } }The command would be:
1 |
2 |
3 |
4 |
If you had a path that had an array component that you wanted to access, for example, the second element of the array, e.g.:
{ "p1": [ {"foo":"bar"}, { "p2": { "values": [ "1","2","3","4" ] } } ] }The command would be:
{ "p1": { "p2": { "values": [ { "name":"jim", "value":5 }, { "name":"jill", "value":10 } ] } } }The command is:
name | value |
jim | 5 |
jill | 10 |
If the object within the array is a nested object and you only want to access an inner object, e.g.:
{ "p1": { "p2": { "values": [ { "foo":"bar", "goodvalues": { "name":"jim", "value":5 } }, { "foo":"bar", "goodvalues": { "name":"jill", "value":10 } }, ] } } }You would use the second argument to specify an inner path to the object of interest:
{ "p1": { "p2": { "values": [ { "foo":"bar", "goodvalues": { "name":"jim", "value":5 }, "moregoodvalues": { "age":74, } }, { "foo":"bar", "goodvalues": { "name":"jill", "value":10 }, "moregoodvalues": { "age":80, } }, ] } } }Then the second argument can be a comma separated list of path specifications:
seesv -json "p1.p2.values" "goodvalues,moregoodvalues"Resulting in:
age | name | value |
74 | jim | 5 |
80 | jill | 10 |
- arrayPaths:
comma separated list of the array paths
- keys:
Comma separated list of keys to match on
- pattern:
Optional pattern to replace the key value with
- replace:
Pattern replace
- missing:
Missing value
- columns:
Column names
- arrayPath:
Path to the array e.g., obj1.arr[2].obj2
- includePolygon:
Include polygon values:true,false
The RAMADDA_TABULA environment variable needs to be set to the tabula.sh file found in this release
export RAMADDA_TABULA=/path/to/tabula.sh
- props:
"addPoints true addShapes false"
For each line in the input file this creates a row that contains one column which holds the line.
- skip:
Number of tables to skip
- pattern:
Pattern to skip to
- properties:
Other name value args:
- numTables N:Number of tables to process. Default is 1
- removeEntity true:remove HTML entities
- removePattern pattern
- extractUrls true
- columnN.extractUrls true: N=column number
- stripTags false: strip any HTML tags. Default =true
- columnN.stripTags false: N=column number. Set stripTags for the column
The properties are defined as:
- columns:
Column names
- startPattern:
- endPattern:
- pattern:
Row pattern. Use (...) to match columns
- pattern:
regexp to match
- comma separated header:
- chunk pattern:
- token pattern:
Assume you have a file test.txt
Hello there
how are you
Person: name: Jim age: 42
and some more text
Person: name: Joe age: 53
Person: name: Sally age: 93
and more

We are going to extract the names and ages with a command along these lines (the exact invocation is inferred from the -text arguments above):
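seesv -text "name,age" "Person:(.*?$)" "name:(.*?)age: *(\d+)" -p test.txt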
The (.*?$) in the first pattern is the group that is matched.
The second pattern: name:(.*?)age: *(\d+) has 2 groups, one for the name and one for the age.
- comma separated header:
- token pattern:
Assume you have a file test.txt
Person: name: Joe age: 53
Person: name: Sally age: 93

Extract the data with a command along these lines (inferred from the -extractpattern arguments above):
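seesv -extractpattern "name,age" "name:(.*?)age: *(\d+)" -p test.txt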
- header:
header1,header2...
- pattern:
- db:
The database id (defined in the environment) values:property:seesv_dbs
- table:
Comma separate list of tables to select from
- columns:
Comma separated list of columns to select
- where:
column1:expr:value;column2:expr:value;...
e.g.: name:like:joe;age:>:60
Where expr is: =|<|>|<>|like|notlike
- properties:
name space value properties. e.g., join col1,col2
e.g.:
seesv -sql test ...
export seesv_db_<db>_url=JDBC URL
export seesv_db_<db>_tables=List of tables that can be accessed

So for the db "test" in a Derby database:

export seesv_db_test_url=jdbc:derby:/Users/home/derby/db1;create=true;
export seesv_db_test_tables=test_table1,test_table2,test_table3
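A full invocation might then look like this (a sketch; the table, columns, and where clause are assumptions):

seesv -sql test test_table1 "name,age" "age:>:60" "" -p > output.csv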
- header:
comma separated header
- values:
comma separated values
- number_rows:
Number of rows
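For example, a plausible invocation (the original example command is not shown):

seesv -synthetic "name,value" "example name,example value" 5 -p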
name | value |
example name | example value |
example name | example value |
example name | example value |
example name | example value |
example name | example value |
Note: you don't have to specify all of the example values. For example, using this:
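(again a plausible invocation, here with the example values omitted)

seesv -synthetic "name,value" "" 5 -p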
name | value |
<blank> | <blank> |
<blank> | <blank> |
<blank> | <blank> |
<blank> | <blank> |
<blank> | <blank> |
This command converts the column names to a standardized format, changing the names to lower case and converting punctuation, etc., to "_".
test1.csv:
name1 | value1 |
foo | 1 |
bar | 2 |
test2.csv:
name2 | value2 | value3 |
xxx | 1 | 3 |
yyy | 2 | 7 |
Calling:
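A plausible invocation (the exact syntax of the original example is not shown; this assumes the extra file is given as an argument to -cat):

seesv -cat test2.csv -p test1.csv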
name1 | value1 | name2 | value2 | value3 |
foo | 1 | xxx | 1 | 3 |
bar | 2 | yyy | 2 | 7 |
- skip:
Number of rows to skip
- files:
The csv files to append

test1.csv:
name | value |
foo | bar |
test2.csv:
name | value |
x | y |
test3.csv:
name | value |
l | m |
Calling:
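A plausible invocation, skipping the header row of the latter files (the exact syntax is an assumption):

seesv -append 1 test2.csv test3.csv -p test1.csv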
name | value |
foo | bar |
x | y |
l | m |
- numlines:
Number of lines to leave
- file:
The csv file
- pattern:
Pattern to match
- columnnames:
Comma separated list of column names
- -skiplines: Skip number of raw lines.
- -maxrows: Set max rows to process
- -match: Pass through rows that the columns each match the pattern
- -notmatch: Pass through rows that don't match the pattern
- -if: Next N args specify a filter command followed by any change commands followed by an -endif.
- -rawlines:
- -inputnotcontains: Filter out input lines that contain any of the strings
- -min: Only pass through lines that have at least this number of columns. Specify blank to use the number of columns in the header
- -max: Only pass through lines that have no more than this number of columns. Specify blank to use the number of columns in the header
- -numcolumns: Remove or add values so each row has the number of columns
- -has: Only pass through anything if the data has the given columns
- -ifnumcolumns: Only pass through rows with number of columns passing the operator
- -fuzzypattern: Pass through rows that the columns each fuzzily match the pattern
- -lengthgreater: Pass through rows that the length of the columns is greater than
- -same: Pass through where the 2 columns have the same value
- -notsame: Pass through where the 2 columns don't have the same value
- -unique: Pass through unique values
- -dups: Pass through duplicate values
- -sample: Pass through rows based on probability
- -minvalue: Pass through the row that has the min value in the group of columns defined by the key column
- -maxvalue: Pass through the row that has the max value in the group of columns defined by the key column
- -eq: Pass through rows that the column value equals the given value
- -ne: Pass through rows that the column value does not equal the given value
- -gt: Pass through rows that the column value is greater than the given value
- -ge: Pass through rows that the column value is greater than or equals the given value
- -lt: Pass through rows that the column value is less than the given value
- -le: Pass through rows that the column value is less than or equals the given value
- -between: Extract rows that are within the range
- -notbetween: Extract rows that are not within the range
- -betweenstring: Extract rows that are between the given strings
- -notbetweenstring: Extract rows that are not between the given strings
- -groupfilter: One row in each group has to match
- -before: Pass through rows whose date is before the given date
- -after: Pass through rows whose date is after the given date
- -countvalue: No more than count unique values
- -decimate: Only include every Nth row
- -ifin: Pass through rows whose ID column value is in the given file
- -ifnotin: Pass through rows whose ID column value is not in the given file
- -ifmatchesfile: Pass through rows whose ID column value begins with something in the given file
- -ifnotmatchesfile: Pass through rows whose ID column value does not begin with something in the given file
- -skippattern: Skip any line that matches the pattern
- -skiprows: Skip number of processed rows.
- -ensurenumeric: Throw error if non-numeric
- columns:
Column indices. Can include ranges, e.g. 0-5
- pattern:
Regexp, or prefix with includes:s1,s2 to do a substring match
- columns:
Column indices. Can include ranges, e.g. 0-5
- pattern:
Regexp, or prefix with includes:s1,s2 to do a substring match
For example, the below command would change the value of column2 for all rows that match the -pattern column1 string command.
- filters:
Comma separated list of strings to filter on
- min # columns:
- max # columns:
- number:
use -1 to use the # of columns in the header
- columns:
Column indices. Can include ranges, e.g. 0-5
- operator:
<,<=,>,>=,=,!=
- number:
Number of columns
- threshold:
Score threshold 0-100. Default:85. Higher number better match
- columns:
Column indices. Can include ranges, e.g. 0-5
- pattern:
- columns:
Column indices. Can include ranges, e.g. 0-5
- length:
- column1:
- column2:
- columns:
Column indices. Can include ranges, e.g. 0-5
- mode:
What type of matching is done: exact (exact match), clean (lower case and remove whitespace), or fuzzy:threshold (do fuzzy matching with a threshold from 1: no similarity, to 100: exact match; use fuzzy:? to print out values) values:exact,clean,fuzzy:threshold
For example, fuzzy:? prints each value with the scores of its closest matches:

value: joey smith
  95 joe smith
value: sam jones
  44 joe smith
  42 joey smith
value: sue washington
  43 joe smith
  42 joey smith
  35 sam jones
- probability:
0-1 probability of passing through a row
- key column:
- value column:
- key column:
- value column:
- column:
- value:
- column:
- value:
- column:
- value:
- column:
- value:
- column:
- value:
- column:
- min value:
- max value:
- column:
- min value:
- max value:
- column:
- start string:
- end string:
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
bill | doe | 4 |
bill | doe | 4 |
Calling:
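A plausible invocation (the start and end strings are assumptions):

seesv -betweenstring name jane john -p test.csv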
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
- column:
- start string:
- end string:
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
bill | doe | 4 |
bill | doe | 4 |
Calling:
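The corresponding -notbetweenstring call (again, the strings are assumptions):

seesv -notbetweenstring name jane john -p test.csv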
name | value1 | value2 |
bill | doe | 4 |
bill | doe | 4 |
- column:
key column
- value_column:
Value column
- operator:
values:=,!=,~,<,<=,>,>=
- value:
- column:
Column in the file
- file:
The file
- column2:
Column in main file
precinct |
p1 |
p2 |
p3 |
And a file voters.csv:
name | voter precinct |
jim | p1 |
joe | p1 |
sally | p7 |
... |
If you only want voters whose precinct is in precincts.csv you would do:
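(a sketch; the argument order follows the column/file/column2 listing above)

seesv -ifin precinct precincts.csv voter_precinct -p voters.csv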
- column:
Column in the file
- file:
The file
- column2:
Column in main file
- pattern:
Pattern template, e.g. ^${value}
- column:
Column in the file
- file:
The file
- column2:
Column in main file
For example, say you have a file of addresses short.csv:
address | city | state |
123 oak | boulder | co |
345 main | denver | co |
... |
And your source data has a set of full addresses full.csv:
full address | city | state |
123 oak unit #5 | ||
789 pine apt 3 | ||
345 main | ||
... |
And you only want to pass through the rows in full.csv whose address begins with some address in the short.csv file. To do this call:
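(a sketch; the argument order follows the pattern/column/file/column2 listing above)

seesv -ifmatchesfile "^${value}.*" address short.csv full_address -p full.csv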
This creates a set of patterns from short.csv:

^123 oak.*
^345 main.*
...

and then only passes through the rows in full.csv that match one of these patterns.
The command -ifnotmatchesfile does the opposite. For a row to pass through none of the patterns can match.
- pattern:
Pattern template, e.g. ^${value}
- file:
The file
- column2:
Column in main file
- columns:
Column indices. Can include ranges, e.g. 0-5
- -columns: Only include the given columns
- -notcolumns: Don't include given columns
- -firstcolumns: Move columns to beginning
- -lastcolumns: Move columns to end
- -columnsbefore: Move columns before the given column
- -columnsafter: Move columns after given column
- -delete: Remove the columns
- -cut: Drop rows
- -include: Only include specified rows
- -rows_first: Move rows to the top that match the pattern
- -rows_last: Move rows to the end of list that match the pattern
- -copy: Copy column
- -add: Add new columns
- -insert: Insert new column values
- -concat: Create a new column from the given columns
- -concatrows: Concatenate multiple rows into a single row
- -combine: Combine columns with the delimiter, deleting the original columns
- -combineinplace: Combine columns with the delimiter
- -merge: Apply operators to columns
- -split: Split the column
- -splat: Create a new column from the values in the given column
- -roll: Roll columns down into rows
- -grab: Look for rows that match a pattern and grab column values and add them to subsequent rows
- -shift: Shift columns over by count for given rows
- -slice: Slide columns down and over to append new rows to the bottom
- -addcell: Add a new cell at row/column
- -deletecell: Delete cell at row/column
- -clone: Clone each row N times
- -appendrows: Only include specified rows
- -mergerows: Merge rows
- -rowop: Apply an operator to columns and merge rows
- -rotate: Rotate the data
- -flip: Reverse the order of the rows except the header
- -makefields: Make new columns from data values
- -melt: Use values in header to make new row
- -explode: Make separate files based on value of column
- -join: Join the 2 files together
- -fuzzyjoin: Join the 2 files together using fuzzy matching logic
- -cross: Make a cross product of 2 data files
- -normal: Normalize the strings
- -countunique: Count number of unique values
- -dissect: Make fields based on patterns
- -keyvalue: Make fields from key/value pairs, e.g. name1=value1 name2=value2 ...
- -firstchars: Extract first N characters and create new column
- -lastchars: Extract last N characters and create new column
- -between_indices: Extract characters between the 2 indices
- -fromheading: Extract column values from headings
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- column:
Column to move before
- columns:
Columns to move
- column:
Column to move after
- columns:
Columns to move
- columns:
columns to match on
- pattern:
Pattern
- columns:
columns to match on
- pattern:
Pattern
- names:
Comma separated list of new column names
- values:
Comma separated list of new values
- column:
Column to insert before
- name:
Name of new column
- values:
Value to insert. Use ${row} to add the row index
- columns:
Column indices. Can include ranges, e.g. 0-5
- delimiter:
- name:
Name of new columns
- num_rows:
Number of rows
name | value |
joe | 5 |
jill | 2 |
jack | 1 |
jim | 7 |
Running:
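A plausible invocation, concatenating every 2 rows (an assumption based on the num_rows argument):

seesv -concatrows 2 -p test.csv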
name | value | name | value |
joe | 5 | jill | 2 |
jack | 1 | jim | 7 |
- columns:
Column indices. Can include ranges, e.g. 0-5
- delimiter:
- column name:
New column name
- columns:
Column indices. Can include ranges, e.g. 0-5
- delimiter:
- column name:
New column name
- columns:
Columns to merge
- name:
New column(s) name
- operator:
Operator values:average,min,max,count
- column:
- delimiter:
What to split on
- names:
Comma separated new column names
- keycol:
Key column
- column:
- delimiter:
- name:
new column name
- column:
Column to match
- pattern:
Pattern to match
- columns:
Columns to grab
- names:
Comma separated list of new column names
- rows:
Rows to apply to
- column:
Column to start at
- count:
- columns:
Columns to move
- dest:
Dest column to move to
- fill:
Comma separated list of values to fill out the new row
- skip:
How many rows to skip
- count:
How many rows to merge
- delimiter:
The delimiter used when merging the rows
category | category | category |
fruit | vegetable | meat |
1 | 2 | 3 |
4 | 5 | 6 |
Running:
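A plausible invocation, merging the first 2 rows with " - " (an assumption based on the skip/count/delimiter arguments above):

seesv -mergerows 0 2 " - " -p test.csv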
category - fruit | category - vegetable | category - meat |
1 | 2 | 3 |
4 | 5 | 6 |
- keys:
Key columns
- values:
Value columns
- operator:
Operator values:average,min,max,count
- column:
Column to get the new column header from
- value columns:
Columns to get values from
- unique column:
The unique value, e.g. date
- other columns:
Other columns to include
- columns:
Column indices. Can include ranges, e.g. 0-5
- header label:
- value label:
- column:
- template:
e.g., file_${value}.csv
- key columns:
key columns the file to join with
- value_columns:
value columns
- file:
File to join with
- source_columns:
source key columns
- default_value:
default value - can be a comma separated list of defaults
- threshold:
Score threshold 0-100. Default:85. Higher number better match
- key columns:
Numeric column numbers of the file to join with
- value_columns:
numeric columns of the values to join
- file:
File to join with
- source_columns:
source key columns
- default_value:
default value
- file:
File to cross with
letter |
a |
b |
c |
d |
number |
1 |
2 |
3 |
The command:
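A plausible invocation (assuming letters.csv is the input and numbers.csv is crossed with it):

seesv -cross numbers.csv -p letters.csv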
letter | number |
a | 1 |
a | 2 |
a | 3 |
b | 1 |
b | 2 |
b | 3 |
c | 1 |
c | 2 |
c | 3 |
d | 1 |
d | 2 |
d | 3 |
- columns:
Column indices. Can include ranges, e.g. 0-5
- column:
- pattern:
e.g., "(field1:.*) (field2:.*) ..."
- column:
- name:
New column name
- number:
Number of characters
- column:
- name:
New column name
- number:
Number of characters
- column:
- name:
New column name
- start:
Start index
- end:
End index
- columns:
Columns of headings
- names:
Comma separated list of new column names
- pattern:
Regexp to apply to header with () defining column values
- -change: Change columns
- -changerow: Change the values in the row/cols
- -replace: Replace
- -set: Write the value into the cells
- -cleanwhitespace: Clean whitespace
- -macro: Look for the pattern in the header and apply the template to make a new column, template: '{1} {2} ...', use 'none' for column name for no header
- -setcol: Write the value into the write col for rows that match the pattern
- -copyif: Copy column 2 to column 3 if all of the columns match the pattern
- -copycolumns: Copy columns 1 to columns 2
- -filldown: Fill down with last non-null value
- -fillacross: Fill across with last non-null value
- -unfill: Set following cells to blank if the same as previous cell
- -priorprefix: Append prefix from the previous element to rows that match pattern
- -case: Change case of column - type:lower,upper,proper,capitalize
- -toid: Convert the column(s) into IDS (lowercase, no space, a-z0-9_)
- -padleft: Pad left with given character
- -padright: Pad right with given character
- -trim: Trim leading and trailing white space
- -trimquotes: Trim leading and trailing quotes
- -width: Limit the string size of the columns
- -prepend: Add the text to the beginning of the file. use _nl_ to insert newlines
- -pad: Add or remove columns to achieve the count
- -prefix: Add prefix to column
- -suffix: Add suffix to column
- -subst: Create a new column with the template
- -ascii: Convert non ascii characters
- -cleanphone: Clean the phone number
- -formatphone: Format the phone number
- -ismobile: Add a true/false if the string is a mobile phone
- -sms: Send a text message - only for command line
- -js: Define Javascript (e.g., functions) to use later in the -func call
- -func: Apply the javascript function. Use _colname or _col#
- -endswith: Ensure that each column ends with the string
- -truncate:
- -extract: Extract text from column and make a new column
- -urlarg: Extract URL argument and make a new column
- -extracthtml: Extract text from HTML
- -htmlinfo: Extract icon and description from input URL
- -checkmissing: Check for missing URL
- -xmlencode: Encode the value for XML
- -urlencode: URL encode the columns
- -urldecode: URL decode the columns
- -map: Change values in column to new values. Start values with '*' to do a string replace
- -format: Apply decimal format to the columns (see https://docs.oracle.com/javase/7/docs/api/java/text/DecimalFormat.html)
- -denormalize: Read the id,value from file and substitute the value in the dest file col idx
- -break: Break apart column values and make new rows
- -parseemail: Parse out name and email
- -makeids: Turn the header row into IDs (lowercase, no space, a-z0-9_)
- -faker: Fake up data. See the docs at https://ramadda.org/repository/userguide/seesv.html#-faker
- -edit: Hand edit a column (command line only). ESC-stop, BLANK-skip
- columns:
Column indices. Can include ranges, e.g. 0-5
- pattern:
Regexp pattern. Help
- substitution string:
use $1, $2, etc for pattern (...) matches
label |
xxxsome valueyyy |
You can strip out the xxx...yyy and add a label: prefix with:
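A sketch using -change (the pattern and substitution are assumptions):

seesv -change label "xxx(.*)yyy" "label:$1" -p test.csv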
label |
label:some value |
You can also specify a file that holds a number of patterns:
pattern1::replace_string1
pattern2::replace_string2

e.g.:

(?i).*foo.*::foobar
.*value.*::new value
- rows:
Row indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- pattern:
Regexp pattern. Help
- substitution string:
use $1, $2, etc for pattern (...) matches
- columns:
Column indices. Can include ranges, e.g. 0-5
- substitution string:
use $1, $2, etc for pattern (...) matches
use {value} for value
- columns:
Column indices. Can include ranges, e.g. 0-5
- rows:
Row indices. Can include ranges, e.g. 0-5
- value:
- pattern:
Regexp pattern. Help
- template:
- column label:
- column:
match col #
- pattern:
Regexp pattern. Help
- write column:
- value:
- columns:
Column indices. Can include ranges, e.g. 0-5
- pattern:
Regexp pattern. Help
- column1:
- column2:
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- rows:
One or more rows. -1 to the end. e.g., 0-3,5,10,-1
- columns:
Column indices. Can include ranges, e.g. 0-5
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
bill | doe | 4 |
bill | doe | 4 |
Calling:
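A plausible invocation (which columns to unfill is an assumption):

seesv -unfill name,value1 -p test.csv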
name | value1 | value2 |
jane | foo | 1 |
<blank> | bar | 2 |
john | doe | 3 |
bill | <blank> | 4 |
<blank> | <blank> | 4 |
- column:
- pattern:
Regexp pattern. Help
- delimiter:
- column:
- type:
values:lower,upper,proper,capitalize
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- character:
Character to pad to
- length:
Length
- columns:
Column indices. Can include ranges, e.g. 0-5
- character:
Character to pad to
- length:
Length
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- size:
- column_name:
New Column Name
- template:
Template - use ${column_name} ...
- columns:
Column indices. Can include ranges, e.g. 0-5
- substitution string:
use $1, $2, etc for pattern (...) matches
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
export NUMLOOKUPAPI_API_KEY=
or
export NUMVERIFY_API_KEY=

This command is only available when running from the command line or, if running within RAMADDA, when the user is logged in. We do this because the SeeSV service within RAMADDA can be run by any user.
A cache file, ismobile.txt, of past phone numbers will be created in the local working directory so this command can be run multiple times without resorting to the API calls for past phone numbers.
- column:
Phone number
- campaign:
Campaign
- message:
Message template
To enable this command create an account at Twilio and access your API keys. Add the following environment variables:
export TWILIO_ACCOUNT_SID=
export TWILIO_AUTH_TOKEN=
export TWILIO_PHONE=

This command takes a column ID which holds the phone number, a campaign identifier, and a message template. The campaign identifier is used to create a list of phone numbers that a message has already been sent to. For example if you first run:
- names:
New column names
- javascript:
javascript expression
- columns:
Column indices. Can include ranges, e.g. 0-5
- max length:
- suffix:
- column:
- pattern:
Regexp pattern. Help
- replace with:
use 'none' for no replacement
- name:
new column name
- column:
URL Column
- names:
Comma separated list of new column names
- pattern:
Regexp pattern. Help
- column:
URL Column
- replace_with:
Replace with
- column:
- new column name:
- value:
old_value new_value old_value new_value
- columns:
Column indices. Can include ranges, e.g. 0-5
- format:
Decimal format e.g. '##0.00'
- file:
From csv file
- from id idx:
- from value idx:
- to idx:
- new col name:
- mode replace add:
- label1:
- label2:
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
For example, if you had the below data:
Some header label with some punctuation-+/ | Some other header label |
... |
Running:
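A plausible invocation (assuming -makeids takes no arguments):

seesv -makeids -p test.csv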
some_header_label_with_some_punctuation | some_other_header_label |
... |
- what:
firstname|lastname|fullname|etc
- columns:
Columns to change. If none given then add the fake value
firstname fullname lastname name namewithmiddle prefix suffix title username address city country state stateabbr streetname timezone zipcode latitude longitude countrycode boolean asin ean13 ean8 gtin13 gtin8 imei isbn10 isbn13 isbngroup isbngs1 isbnregistrant color department material price productname promotioncode demonym educationalattainment maritalstatus race sex bic creditcard iban ssn digit digits:number_of_digits numberbetween:first:last randomdigit randomdigitnotzero randomDouble:maxNumberOfDecimals:min:max randomnumber cellphone phonenumber diseasename hospitalname medicinename symptoms
- column:
key column
- -md: Make a message digest of the column values
- -tob64: Base 64 Encode
- -fromb64: Base 64 Decode
- -rot13: Rot 13
- -encrypt: Encrypt using AES with SHA-256 key
- -decrypt: Encrypt using AES with SHA-256 key
- -uuid: Add a UUID field
- -number: Add 1,2,3... as column
- -letter: Add 'A','B', ... as column
- -soundex: Generate a soundex code
- -wikidesc: Add a description from Wikipedia
- -image: Do a Bing image search for an image
- -embed: Download the URL and embed the image contents
- -fetch: Fetch the URL and embed the contents
- -imagefill: Search for an image with the query column text if the given image column is blank. Add the given suffix to the search.
- -download: Download the URL
- -gender: Figure out the gender of the name in the column
- columns:
Column indices. Can include ranges, e.g. 0-5
- type:
values:MD5,SHA-1,SHA-256,SHA-512,SHA3-256,SHA3-512,
- columns:
Column indices. Can include ranges, e.g. 0-5
- password:
- columns:
Column indices. Can include ranges, e.g. 0-5
- password:
- columns:
Search string columns
- suffix:
Text to add after
- columns:
Column indices. Can include ranges, e.g. 0-5
- suffix:
Text to add after
- name:
Name of new column
- ignore_errors:
Ignore Errors e.g., true or false
- url:
URL template, e.g., https://foo.com/${column_name}
- querycolumn:
- suffix:
- imagecolumn:
- columns:
Column indices. Can include ranges, e.g. 0-5
- -indateformats: Specify one or more date formats for parsing
- -outdateformat: Specify date format for formatting
- -convertdate: Convert date
- -adddate: Add date
- -cleardate: Clear date components
- -extractdate: Extract date
- -formatdate: Format date
- -formatdateoffset: Format the date offset, e.g. the hours in year
- -elapsed: Calculate elapsed time (ms) between rows
- -msto: Convert milliseconds to
- -latest: Pass through rows whose date is the latest in the group of rows defined by the key column
- -datediff: Calculate elapsed time between columns column1-column2
- -datecompare: add a true/false column comparing the date values
- format:
e.g. yyyy-MM-dd HH:mm:ss. Use semi-colon separated formats for multiples. Help
- timezone:
- format:
e.g. yyyy-MM-dd HH:mm:ss. Help
- timezone:
- date_column:
Date Column
- value:
Value Column
- value_type:
Value type - millisecond,second,minute,hour,hour_of_day,week,month,year
- date_column:
Date Column
- component:
Date component values:millisecond,second,minute,hour_of_day,day_of_month,month
- date column:
- what:
What to extract, e.g., year, month, day_of_week, etc. values:era,decade,year,month,day_of_month,day_of_week,week_of_month,day_of_week_in_month,am_pm,hour,hour_of_day,minute,second,millisecond,days_in_year,hours_in_year,minutes_in_year,seconds_in_year
- column:
- what:
What type of offset, e.g., year, month, day_of_week, etc values:days_in_year, hours_in_year, minutes_in_year,seconds_in_year
- columns:
Key columns
- column:
Date column
- format:
Date Format, e.g. yyyy-MM-dd
- column1:
Column 1
- column2:
Column 2
- unit:
Unit-milliseconds,seconds,minutes,hours,days values:milliseconds,seconds,minutes,hours,days
- column1:
- column2:
- operator:
values:<,<=,=,!=,>=,>
- -scale: Set value={value+delta1}*scale+delta2
- -makenumber: Try to parse as number
- -generate: Add row values
- -decimals: Round decimals
- -fuzz: Fuzz the number. If num_places is less than zero then it is the number of decimals; otherwise it is the number of lower digits to fuzz out
- -ceil: Set the max value
- -floor: Set the min value
- -delta: Add column that is the delta from the previous step
- -runningsum: Make a running sum of the column values
- -trendcounter: Make a counter field that is incremented every time the value column decreases
- -operator: Apply the operator to the given columns and create new one
- -compare: Add a true/false column comparing the values
- -round: Round the values
- -abs: Make absolute values
- -clip: Clip the number to within the range
- -rand: make random value
- -even: Add true if the column starts with an even number
- -sum: Sum values keying on key column value. If no value columns specified then do a count
- -pivot: Make a pivot table
- -summary: count/sum/average/min/max values keying on key column value. If no value columns specified then do a count
- -histogram: Make a histogram with the given column and bins
- -percent: Add columns together. Replace with their percentage
- -increase: Calculate percent increase
- -diff: Difference from previous value
- -average: Calculate a moving average
- -ranges: Create a new column with the (string) ranges where the value falls in
- -bytes: Convert suffixed values (e.g., 2 MB) into the number
- -column_and: And values
- -column_or: Or values
- -column_not: Not value
- -check: Check that the values are numbers
- columns:
Column indices. Can include ranges, e.g. 0-5
- delta1:
- scale:
- delta2:
value = (value+delta1)*scale + delta2

You can specify random values for any of the arguments with:

#Generate a random number between 0-1
random:
#Generate a random number between 10 and 11
random:10
#Generate a random number between 10 and 30
random:10:30

For example, if you had a latitude value that you wanted to obfuscate by adding some random value between 1 and 2 you would do:
-scale latitude random:1:2 1 0
- columns:
Column indices. Can include ranges, e.g. 0-5
- num_decimals:
how many decimals to round to
- columns:
Column indices. Can include ranges, e.g. 0-5
- num_places:
how many places to round to. use <=0 for decimals
- num_random_digits:
how many random digits
The num_random_digits is used to generate a number with those random digits.
If the num_places argument is <= zero then
that is the number of decimal places the number is rounded to.
If there are fewer than num_places decimals then those decimals are converted to "0".
The randomized digits are then appended to the number.
For example with R = random digit:
12345.6789 -> 12345.67RR 145.678 -> 145.67RR 9.0 -> 9.RR
If num_places > 0 then the number is converted to an integer and
the num_places least significant digits are converted to 0.
If the number of digits is < num_places then the number is converted to 0.
The randomized digits are then added to the number.
For example with R = random digit:
12345.678 -> 123RR 145.678 -> 1RR 1.678 -> RR
- columns:
Column indices. Can include ranges, e.g. 0-5
- value:
Value
- columns:
Column indices. Can include ranges, e.g. 0-5
- value:
Value
- key columns:
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- column:
The value column
- name:
Name of counter column
- columns:
Columns
- new col name:
- operator:
Operator values:+,-,*,/,%,average
- column1:
- column2:
- operator:
values:<,<=,=,!=,>=,>
- columns:
Column indices. Can include ranges, e.g. 0-5
- min:
- max:
- column name:
- minrange:
Minimum range (e.g. 0)
- maxrange:
Maximum range (e.g. 1)
- columns:
Column indices. Can include ranges, e.g. 0-5
- key columns:
- value columns:
- carry over columns:
- key columns:
Columns to key on
- column columns:
The columns the values of which are used to make the new columns in the result
- value column:
The value column
- operator:
The operator to apply - count,sum,average,min,max
Region | Date | Units | Sales |
West | 2016 | 1 | 11.00 |
South | 2016 | 8 | 96.00 |
West | 2016 | 2 | 26.00 |
North | 2016 | 7 | 84.00 |
North | 2016 | 8 | 104.00 |
South | 2016 | 2 | 22.00 |
Running:
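A plausible invocation (the first table appears to use the count operator; the column names are assumptions):

seesv -pivot region date sales count -p test.csv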
Region | 2016 | 2017 | 2018 | 2019 |
West | 15.0 | 51.0 | 49.0 | 42.0 |
South | 25.0 | 23.0 | 31.0 | 14.0 |
North | 15.0 | 27.0 | 56.0 | 39.0 |
East | 10.0 | 24.0 | 18.0 | 13.0 |
You can also specify multiple operators:
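e.g. (a sketch):

seesv -pivot region date sales count,min,max -p test.csv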
Region | 2016 - count | 2016 - min | 2016 - max | 2017 - count | 2017 - min | 2017 - max | 2018 - count | 2018 - min | 2018 - max | 2019 - count | 2019 - min | 2019 - max |
West | 15.0 | 11.0 | 96.0 | 51.0 | 11.0 | 104.0 | 49.0 | 11.0 | 112.0 | 42.0 | 13.0 | 120.0 |
South | 25.0 | 12.0 | 96.0 | 23.0 | 11.0 | 104.0 | 31.0 | 11.0 | 120.0 | 14.0 | 12.0 | 112.0 |
North | 15.0 | 12.0 | 104.0 | 27.0 | 12.0 | 96.0 | 56.0 | 11.0 | 120.0 | 39.0 | 13.0 | 120.0 |
East | 10.0 | 24.0 | 96.0 | 24.0 | 11.0 | 104.0 | 18.0 | 11.0 | 112.0 | 13.0 | 12.0 | 105.0 |
- key columns:
Columns to key on
- value columns:
Columns to apply operators on
- carry over columns:
Extra columns to include
- ops:
any of count,sum,average,min,max
Region | Color | Units | Sales |
West | Red | 1 | 11.00 |
South | Blue | 8 | 96.00 |
West | Green | 2 | 26.00 |
North | Blue | 7 | 84.00 |
North | Green | 8 | 104.00 |
South | Red | 2 | 22.00 |
East | Blue | 5 | 60.00 |
West | Green | 2 | 26.00 |
You would do:
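A plausible invocation (argument order follows the key/value/carry over/ops listing above):

seesv -summary region units "" sum -p test.csv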
Region | Units sum |
West | 683.0 |
South | 399.0 |
North | 672.0 |
East | 294.0 |
If you wanted to get the sum and average of the sales:
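(again a sketch)

seesv -summary region sales "" sum,average -p test.csv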
Region | Sales sum | Sales avg |
West | 8873.0 | 56.51592356687898 |
South | 5044.0 | 54.236559139784944 |
North | 8709.0 | 63.56934306569343 |
East | 3730.0 | 57.38461538461539 |
The carry over column just gets the first value for the key column. So, if you wanted to include the color column do:
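(a sketch)

seesv -summary region sales color sum,average -p test.csv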
Region | Sales sum | Sales avg | Color |
West | 8873.0 | 56.51592356687898 | Red |
South | 5044.0 | 54.236559139784944 | Blue |
North | 8709.0 | 63.56934306569343 | Blue |
East | 3730.0 | 57.38461538461539 | Blue |
- column:
The column
- bins:
Comma separated set of bin values
- value columns:
Extra columns to sum up
- ops:
ops to apply to extra columns - any of count,sum,average,min,max
bins="18,30,40,50,60,70"This results in bins of:
0-18 18-30 30-40 40-50 etc.You can also specify a range with an optional step value. For example this:
bins="18-30"Will give you the bins:
18-19 19-20 20-21 ... 29-30 30-infinityOr you can specify a step
bins="18-30:2"Will give you the bins:
18-20 20-22 22-24 ...
- columns:
Column indices. Can include ranges, e.g. 0-5
- how far back:
- columns:
Column indices. Can include ranges, e.g. 0-5
- how far back (default 1):
- columns:
Column indices. Can include ranges, e.g. 0-5
- name:
New column name
- start:
Numeric start of range
- size:
Numeric size of range
- unit:
values:binary,metric
- columns:
Column indices. Can include ranges, e.g. 0-5
- name:
New column name
- columns:
Column indices. Can include ranges, e.g. 0-5
- name:
New column name
- columns:
Column indices. Can include ranges, e.g. 0-5
- columns:
Column indices. Can include ranges, e.g. 0-5
- what:
How strict values:strict,ramadda
- -geocode: Geocode using given columns
- -geocodeifneeded: Geocode if needed
- -geocodeaddressdb: Geocode for import into RAMADDA's DB. The lat/lon is one semi-colon delimited column
- -geocodejoin: Geocode with file
- -bounds: Geocode within bounds
- -inbounds: Subset lat/lon bounds
- -decodelatlon: Decode latlon
- -getaddress: Get address from lat/lon
- -statename: Add state name from state ID
- -geoname: Look up location name
- -geocontains: Check for containment
- -elevation: Look up elevation(using 1/3 arc-second DEM)
- -mercator: Convert x/y to lon/lat
- -region: Add the state's region
- -population: Add in population from address
- -neighborhood: Look up neighborhood for a given location
- columns:
Address columns
- prefix:
optional prefix e.g., state: or county: or country:
- suffix:
To use Google geocoding set the environment variable:
GOOGLE_API_KEY=...

To use geocod.io set the environment variable:

GEOCIDEIO_API_KEY=...

For here.com set:

HERE_API_KEY=...

To geocode, e.g., with the following file:
city | state |
Boulder | CO |
Laramie | WY |
Buffalo | NY |
Run:
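A plausible invocation, geocoding on the city and state columns with no prefix or suffix:

seesv -geocode city,state "" "" -p test.csv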
To use internal location tables set the prefix argument to one of:
seesv -geocode columns "county:" "" ...
seesv -geocode columns "state:" "" ...
seesv -geocode columns "zip:" "" ...
seesv -geocode columns "country:" "" ...
city |
Boulder |
Denver |
Grand Junction |
Pueblo |
You can geocode with:
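One possibility (a sketch; here the suffix argument appends the state for disambiguation):

seesv -geocode city "" ", Colorado" -p test.csv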
- columns:
Address columns
- prefix:
optional prefix e.g., state: or county: or country:
- suffix:
- latitude:
latitude column
- longitude:
longitude column
- columns:
columns
- prefix:
optional prefix e.g., state: or county: or country:
- suffix:
- column:
key column
- csv file:
File to get lat/lon from
- key idx:
- lat idx:
- lon idx:
precinct | latitude | longitude |
4171107648 | 40.197055 | -105.100351 |
2171207308 | 39.996986 | -105.084938 |
2173307410 | 40.039912 | -105.081393 |
The source file is a list of voters, e.g.:
name | precinct |
John Doe | 4171107648 |
Jane Doe | 4171107648 |
Jim Smith | 2171207308 |
To add the lat/lon:
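A plausible invocation (argument order follows the column/file/key/lat/lon listing above):

seesv -geocodejoin precinct precincts.csv precinct latitude longitude -p voters.csv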
- latitude:
latitude column
- longitude:
longitude column
- north:
North
- west:
West
- south:
South
- east:
East
- columns:
Lat or Lon column
latitude | longitude |
40° 1' 12 N | 107° 40' 10 W |
40:1:12 | 107:40:10 |
40:1:12 N | 107:40:10 W |
40:1 | 107:40 |
Running the commands:
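A plausible invocation:

seesv -decodelatlon latitude,longitude -p test.csv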
latitude | longitude |
40.0 | -107.0 |
40.019999999999996 | 107.66944444444445 |
40.019999999999996 | -107.66944444444445 |
40.016666666666666 | 107.66666666666667 |
If you wanted to have less precision then call:
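e.g. rounding to 5 decimals with -decimals (a sketch):

seesv -decodelatlon latitude,longitude -decimals latitude,longitude 5 -p test.csv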
latitude | longitude |
40.0 | -107.0 |
40.02 | 107.66944 |
40.02 | -107.66944 |
40.01667 | 107.66667 |
- latitude:
latitude column
- latitude:
latitude column
- lookup:
('counties' or 'states' or 'countries' or 'timezones')
- fields:
fields in shapefile
- latitude:
Latitude column
- longitude:
Longitude column
name | latitude | longitude |
City 1 | 40.197055 | -105.100351 |
City 2 | 39.996986 | -105.084938 |
City 3 | 40.039912 | -105.081393 |
You can add the country and timezone with:
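A sketch (the shapefile field names NAME and TZID are assumptions):

seesv -geoname countries NAME latitude longitude -geoname timezones TZID latitude longitude -p test.csv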
- lookup:
('counties' or 'states' or 'countries' or 'timezones')
- name:
new column name
- latitude:
Latitude column
- longitude:
Longitude column
- latitude:
Latitude column
- longitude:
Longitude column
name | latitude | longitude |
City 1 | 40.197055 | -105.100351 |
City 2 | 39.996986 | -105.084938 |
City 3 | 40.039912 | -105.081393 |
You can add the elevation with:
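A plausible invocation:

seesv -elevation latitude longitude -p test.csv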
- columns:
Columns with state name or abbrev.
Alaska=West
AK=West
Alabama=South
AL=South
Arkansas=South
...

So with the following data:

name
Wyoming
California
Colorado

The region is added with:
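(a sketch)

seesv -region name -p test.csv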
- columns:
Column indices. Can include ranges, e.g. 0-5
- prefix:
e.g., state: or county: or city:
- suffix:
- latitude:
Latitude column
- longitude:
Longitude column
- default:
Default value
PRECISELY_API_KEY=...

Call:
- -apply: Apply the commands to each of the columns
- -sortby:
- -count: Show count
- -alias: Set a field alias
- -value: Define a macro value for later use
- -filepattern: Extract a macro value from a filename
- -changeline: Change the line
- -changeraw: Change input text
- -crop: Crop last part of string after any of the patterns
- -strict: Be strict on columns. any rows that are not the size of the other rows are dropped
- -flag: Be strict on columns. any rows that are not the size of the other rows are shown
- -prop: Set a property
- -goeasy: Go easy on missing columns
- -verify: Verify that all of the rows have the same # of columns
- -ext: Execute the external program
- -exec: Execute the external program for every line
- columns:
Columns to expand with
- commands:
Commands. Use the macro ${column}. End with -endapply
- columns:
Column to sort on
- direction:
Direction - up or down values:up,down
- how:
How to sort - string, length, date, extract (number) values:string,number,length,date,extract
- name:
Name
- alias:
Alias
some_really_long_field_name |
1 |
2 |
You can use the alias command to provide a short name, e.g.:
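e.g. (a sketch):

seesv -alias some_really_long_field_name name -p test.csv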
- name:
Macro name
- pattern:
Regexp pattern. Help
- columns:
Column indices. Can include ranges, e.g. 0-5
- patterns:
Comma separated list of patterns
- program_id:
matches with seesv.ext.<program_id>=/path
seesv_ext_<id>=/path/to/executable

Any command line arguments given up to but not including an argument that equals "-" are passed to the executable. The executable is called with the given arguments. Each row that is processed is converted into a CSV string which is passed to the external command on its stdin. A CSV text string is then read back from the external command and passed on in the processing flow. If you want to skip a line then return the string "_null_".
- program_id:
matches with seesv.ext.<program_id>=/path
seesv_exec_<id>=/path/to/executable

Any command line arguments given up to but not including an argument that equals "-" are passed to the executable. For every row the executable is called with the given arguments. Each of the arguments can contain any number of macros of the form:

seesv -exec <id> ${url} ${name}

The macros get replaced with the corresponding values in the given row.
So for example, if you have a list of urls that you want to fetch using cURL, set the environment variable:

export seesv_exec_curl=/usr/bin/curl

Then assuming you have the file:

url
url1.html
url2.html

Call the command:

seesv -exec curl ${url} - -p

This produces a new file:

url,result
url1.html,contents of url1
url2.html,contents of url2
- -print: Print text output
- -printdelim: Print with delimited output
- -o: Write to the given file (command line only)
- -comment: Add a comment to the output
- -outputprefix: Specify text to add to the beginning of the file
- -highlight: Highlight the columns
- -background: Background the columns
- -printheader: Print header
- -raw: Print the file raw
- -table: Print HTML table and stats
- -cols: Set the width of the columns for output. Use with -p
- -stats: Print summary stats
- -torecord: Print records
- -script: Generate script
- -toxml: Generate XML
- -tojson: Generate JSON
- -togeojson: Generate GeoJSON
- -tourl: Generate DB publish urls
- -todb: Write to Database
- -template: Apply the template to make the output
- -subd: Subdivide into different files
- -maptiles: Tile the data on lat/lon
- -chunk: Make a number of output files with a max number of rows
- -addheader: Add the RAMADDA point properties
- -db: Generate the RAMADDA db xml from the header. See Help
- -dbprops: Print to stdout props for db generation
- -write: Write the contents of a row to a named file
- -fields: Print the fields
- -run:
- -progress: Show progress
- -debugrows: Debug # rows
- -pointheader: Generate the RAMADDA point properties
- -args: Generate the CSV file commands
- -typexml: Generate the RAMADDA type xml
- delimiter:
Delimiter - ,|^ etc. Use "tab" for tab
- text:
The text. Use '_nl_' to add a new line. Use '_bom_' to write out the byte order mark.
- columns:
Column indices. Can include ranges, e.g. 0-5
- color:
Color values:red,green,yellow,blue,purple,cyan
- latitude:
latitude column
- longitude:
longitude column
- columns:
property columns - use * for all
- db id:
- table:
table name
- columns:
database columns
- properties:
name value properties
- prefix:
- template:
Use ${column_name} or indices: ${0},${1}, etc for values
- row_delimiter:
Output between rows
- suffix:
name | value |
joe | 5 |
jill | 2 |
bill | 7 |
Calling:
PREFIX
name:joe value:5
ROW DELIMITER
name:jill value:2
ROW DELIMITER
name:bill value:7
SUFFIX

Note: the row delimiter is only printed between rows.
If you wanted to generate custom XML, for example, call:
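A sketch (the exact quoting is an assumption; the arguments follow the prefix/template/row_delimiter/suffix order above):

seesv -template "<mytags>_nl_" "<tag name=\"${name}\" value=\"${value}\"></tag>" "_nl_" "_nl_</mytags>" -p test.csv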
<mytags>
<tag name="joe" value="5"></tag>
<tag name="jill" value="2"></tag>
<tag name="bill" value="7"></tag>
</mytags>
- columns:
columns to subdivide on
- ranges:
Comma separated ranges min1;max1;step1,min2;max2;step2
- output_template:
Output template - use ${ikey} or ${vkey}, e.g., grid${ikey}.csv
The format of the ranges is min1;max1;number1,min2;max2;number2 where min/max give the range for each column value and number is the number of steps the range is divided into.
The output file argument is a template used to create the destination output files. You can use "${ikey}" for the indices or ${vkey} for the values.
For example, you can use this to do geographic tiling of a set of input data
that contains latitude and longitude. If your input.csv file has latitude and longitude
columns the below command:
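A sketch (the ranges here are assumptions; they divide the world into 1-degree tiles):

seesv -subd latitude,longitude "-90;90;180,-180;180;360" "grid${ikey}.csv" input.csv

This produces output files like: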
grid116_102.csv
grid116_103.csv
grid119_84.csv
grid119_85.csv
grid119_86.csv
...

The grid${ikey}.csv is the output file name macro where ${ikey} is made up of the indices. Using grid${vkey}.csv would instead give file names based on the values of the ranges:

grid30_31_-92_-91.csv
grid39_40_-76_-75.csv
grid38_39_-76_-75.csv
...
- columns:
lat/lon columns to subdivide on
- degrees:
Degrees per tile. Defaults to 1
- output_template:
Output template - use ${ikey} or ${vkey}, e.g., tile${vkey}.csv. Defaults to tile${vkey}.csv
- output_template:
Output template - use ${number}, e.g., output${number}.csv. Defaults to output${number}.csv
- number:
Number of rows in each file
- properties:
name1 value1 ... nameN valueN
Set default: default.type double
#fields=field1[prop1=value1 prop2=value2 ...],field2[],...,fieldN[propN=valueN ...]
v1,v2,v3
...

The addheader command is run as below, passing in one set of name/value arguments. If a value has spaces then group the text with {...} brackets.
For example, say you have a file like this:
some date | year | some field |
2018-09-01 | 2023 | 10 |
2019-10-01 | 2024 | 20 |
2020-09-01 | 2024 | 30 |
Running this command:
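A plausible invocation (inferred from the property descriptions that follow; the exact original command is not shown):

seesv -addheader "some_date.type date some_date.format yyyy-MM-dd year.type date year.format yyyy some_field.id temperature some_field.label {The Temperature} some_field.unit celsius" -p test.csv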
#fields=some_date[label="Some Date" type="date" format="yyyy-MM-dd" ],year[label="Year" type="date" format="yyyy" ],temperature[label="Some Field" type="integer" chartable="true" ]
2018-09-01,2023,10
2019-10-01,2024,20

This specifies:
- The type of the "some date" field is set to date, with the format given as a Java date format, yyyy-MM-dd. Other types include:
- "enumeration", e.g. field.type enumeration
- "integer", e.g. field.type integer
- "double", e.g. field.type double
- The year field is also a date with format yyyy
- The canonical id of "some field" field is changed to "temperature"
- The label of "some field" is changed to "The Temperature"
- The unit of "some field" is set to be celsius. This can be used when displaying the data.
- group, e.g. field.group {some group} - this is used to group a set of fields and is primarily used for display, e.g., providing a header in a table
- default.format, e.g. default.format {yyyy-MM-dd} - this is to set the default date format that is applied to all fields
- {some pattern}.{some property} - instead of the field name as the prefix you can specify a regular expression pattern. For example if you had data like:
value1,value2,value3 10,20,30
You can specify the type as "value.*.type integer". Adding a grouping to the pattern can be used to pull out text from the field name and use it in the value. For example, the below would set the label:
seesv -addheader "(value.*).*.label {Some label \$1}"
- properties:
Name value pairs:
table.id <new id>
table.name <new name>
table.cansearch false
table.canlist false
table.icon <icon>, e.g., /db/database.png
<column>.id <new id for column>
<column>.label <new label>
<column>.type <string|enumeration|double|int|date|latlon>
<column>.format <yyyy MM dd HH mm ss format for dates>
<column>.canlist false
<column>.cansearch false
db.install <true|false> - install the new db table
db.droptable <true|false> - careful! this deletes any prior created dbs
db.yesreallydroptable true - this double checks
The -db command takes the form:
-db "name value pairs" e.g.: -db " table.id <new id> table.name <new name> table.cansearch false table.canlist false table.icon <icon>, e.g., /db/database.png <column>.id <new id for column> <column>.label <new label> lt;column>.type <string|enumeration|double|int|date> <column>.format <yyyy MM dd HH mm ss format for dates> <column>.canlist false <column>.cansearch false install <true|false install the new db table> nukedb <true|false> careful! this deletes any prior created dbs yesreallynukethewholedb true - this double checks "In the interactive Convert Data SeeSV form if you press "Table" then the schema will be shown. Clicking on the column links brings up a popup menu that sets the id, label, type, etc for the column.
If you have the install flag defined as true then when you hit "Process" the db.xml file will be generated and the plugin will be loaded into RAMADDA. You can then go and add a new entry of the Database name from the "Pick a type..." menu
If you have nukedb and the yesreallynukethewholedb flags set to true then when you press Process the database table will first be dropped and then recreated. This capability is there because you can iteratively change the type of the columns. Note: any database entry and data that has been loaded will be removed.
- id pattern:
- suffix pattern:
- file name template:
- contents template:
- type_id:
Type ID. Prefix with type_, no spaces, lowercase, e.g. type_point_mypointdata
- type_label:
Type Label - human readable label
- database_columns:
List of database columns, one per line
e.g.: id,label,type,prop1,value1,prop2,value2
Types can be:
string,int,double,enumeration,enumerationplus
list,latlon,latlonbox,url,date,datetime
If enumeration then define the values with escaped commas, e.g.: fruit,Fruit,enumeration,values,banana\,apple\,orange
id,label,type,prop1,value1,prop2,value2,...
e.g.:
#A String value
some_field,Some Field,string
#An integer value
some_number,int,Some Integer Column,default=30,showinhtml,false
#A double value
some_double,double,Some Double number,cansearch,false
#An enumeration. Use \, to escape the commas
fruit,Fruit,enumeration,values,banana\,apple\,orange
#An enumeration that can also have new values added
fruit_plus,Fruit Plus,enumerationplus,values,banana\,apple\,orange
#Use id:label in the values
sport,Sport,enumeration,values,baseball:Baseball\,football:Football\,hockey:Hockey

The id will be the column identifier in the database so it should be lower case. Press Outputs - Print text output to generate the mytypes.xml file. Download it and copy it to your RAMADDA Home/plugins directory and restart your RAMADDA. The types.xml has default wiki text that you can change.