Introduction to retrieving data from your database - Perl DBI/DBD::ODBC Tutorial

Introduction
Pre-requisites
Assumptions
Simple methods of retrieving data
- Specifying what you want
- Simple prepare/execute
- Obtaining metadata from the result-set
- Using parameters
- Using bound columns
Hash and Reference methods of retrieving data
Special cases
- SQL_CHAR types and trialing spaces
- Long columns
- Handling NULL data
  - NULL in SQL
  - NULL in Perl

Introduction

Pre-requisites

Before you start part 2 of this tutorial you need to ensure you have satisfy all the pre-requisites:

Perl We used Perl 5.8 but you only need the minimum required by the DBI and DBD::ODBC modules which is currently 5.6. Use perl --version to see what version of Perl you have installed.
DBI module We used DBI 1.45 but this tutorial should work with anything after 1.40. To see if you have a recent enough version of DBI installed run:
```
perl -e 'use DBI 1.40;'
```
If you get an error like "DBI version 1.40 required--this is only version 1.30 at -e line 1." you need to upgrade DBI.
If you get an error saying DBI cannot be found in @INC you've probably not got DBI installed.
Go to CPAN to get an up to date version of the DBI module.
DBD::ODBC We used DBD::ODBC 1.11. You can use similar methods as above to determine if DBD::ODBC is installed and to see what version you have:
To check you have the DBD::ODBC module installed:
```
perl -e 'use DBD::ODBC;'
```
If you have not got DBD::ODBC installed you should see Enabling ODBC support in Perl with Perl DBI and DBD::ODBC for instructions.
To show the DBD::ODBC version:
```
perl -MDBD::ODBC -e 'print $DBD::ODBC::VERSION;'
```
To show all drivers DBI knows about and their versions:
```
perl -MDBI -e 'DBI->installed_versions;'
```
Go to CPAN to get an up to date version of the DBD::ODBC module.
ODBC driver and driver manager Unsurprisingly you will need an ODBC driver for most of this tutorial and we recommend you use an ODBC driver manager under DBD::ODBC

Assumptions

Previous tutorials in this series
This tutorial assumes you have read or understand all the concepts in the previous tutorial DBD::ODBC Tutorial Part 1 - Drivers, Data Sources and Connection.
Operating System
This tutorial was designed on UNIX and we have assumed you are using UNIX too. However, all the Perl examples should work equally well on Windows so long as minor alterations for the command line are made.
ODBC Driver Manager
We have assumed you are using the unixODBC driver manager. All discussion in this document relating to the location and definition of ODBC data sources is for unixODBC.

DBI and ODBC drivers

DBI, DBD::ODBC Architecture

Perl script using DBI methods
            |
            v
           API
            |
            v
           DBI
            |
            v
         DBD::ODBC
            |
            v
    ODBC Driver Manager (e.g. unixODBC)
            |
            v
        ODBC Driver

Simple methods of retrieving data

Specifying what you want

You specify the data you want to retrieve from the database using SQL select statements. This is not a SQL tutorial and we have attempted to keep the SQL simple. The basic form of a SQL select statement we use here is:

select <column_list> from <table>

where:

<column_list> is a comma separated list of columns, or * (for all columns). In fact it can be many more things; consult a SQL reference manual.
<table> is a table or view in the database.

You can qualify the rows you want back by adding a where clause like:

where <column> = 'text'

Your SQL select statement is passed to the DBI prepare method which in ODBC terms ends up in a call to the SQLPrepare or SQLExecDirect ODBC APIs.
Be careful when creating SQL from Perl variables as you can include characters which invalidate the SQL. e.g.

my $name = "can't";
my $sql = qq/select * from table where column = '$name'/;

$sql is now invalid because it is:

select * from table where column = 'can't'

Most databases would require the second ' to be doubled up. You can use the DBI quote method to do this for you with:

my $name = $dbh->quote("can't");
my $sql = qq/select * from table where column = $name/;

The other way (recommended) is to use bound parameters

Simple prepare/execute

The most simple way of obtaining all the rows in a table and displaying them is like this:

my $dbh = DBI->connect();               # connect
my $sql = qq/select * from table/;      # the query to execute
my $sth = $dbh->prepare($sql);          # prepare the query
$sth->execute();                        # execute the query
my @row;
while (@row = $sth->fetchrow_array) {  # retrieve one row
    print join(", ", @row), "\n";
}

Here we prepare the SQL, execute it, then use the fetchrow_array method to return an array of column values for each row. Repeatedly calling the fetchrow_array method until fetchrow_array returns undef, returns all the rows in the result-set.
NOTE that NULL fields are returned as undef in the array returned by fetchrow_array (more about NULL values later).
NOTE. Don't use fetchrow_array in a scalar context unless you really know what you are doing as it can return the first or last column (dependent on the driver) which could be a NULL value and hence undef. Also fetchrow_array can return undef if there are no more rows or if an error occurs.
There are other methods or obtaining the result-set - see later.

Obtaining metadata from the result-set

ODBC can describe the result-set for you listing the number of columns and the names of the columns. You can obtain this information with the statement attributes NUM_OF_FIELDS, NAME, NAME_uc, NAME_lc, NAME_hash, NAME_lc_hash and NAME_uc_HASH.
Assuming you have the table 'fred' with columns 'a' (integer), 'b' (char(10)) and 'c' (float):

my $sth = $dbh->prepare(select * from fred);

$sth->{NUM_OF_FIELDS} returns 3.
$sth->{NAME}->[0] returns 'a' although it could return 'A'
$sth->{NAME_uc}->[0] returns 'A'.
$sth->{NAME_lc}->[0] returns 'a'.

NOTE the NAME attribute can return lowercase or uppercase column names depending on the database. Some database will always uppercase column names in create statements if they are unquoted and some databases retain the case of column names if they are enclosed in the identifier quote character. TIP: For this reason it is best to use NAME_uc or NAME_lc.
NOTE. Some drivers may be unable to return a column name at all e.g. if the column was an expression like 1 or a function. e.g. "select 1 from table" in MS SQL Server returns an empty string for $sth-{NAME}->[0]. TIP: You can get around this using column aliases as in "select 1 as col1 from table".
NAME_hash, NAME_lc_hash and NAME_uc_hash are like NAME, NAME_lc and NAME_uc except the result is a hash instead of an array with the keys being the column names and the values being the column index starting at 0.
Similarly, the TYPE attribute returns an array reference of column types. For the 'fred' table above:

print join(", ", @{$sth->{TYPE}}), "\n";

returns 4, 1, 6. The column types are defined by international standards (see the DBI manual).

Using parameters

The main reasons for using parameters are:

You can prepare the SQL once then execute many times with different parameters thus saving the prepare parsing.
With bound parameters you don't need to bother about quoting issues.

Expanding the simple example in Simple prepare/execute to include a where statement we have:

my $dbh = DBI->connect();              # connect
my $var = $dbh->quote("value to search for");
my $sql = qq/select * from table where column = $var/;
                                        # the query to execute
my $sth = $dbh->prepare($sql);         # prepare the query
$sth->execute();                       # execute the query
my @row;
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}

This is fine but what if we want to execute the query multiple times with different values for $var. We can repeat the $sql assignement with different $var and re-run $dbh->prepare but this is inefficient because it causes the ODBC driver and database to re-parse the SQL each time and is unnecessary.
A better solution is the following:

my @row;                                # row data
my $dbh = DBI->connect();              # connect
my $var = "value to search for";
my $sql = qq/select * from table where column = ?/;
                                        # the query to execute with parameter
my $sth = $dbh->prepare($sql);         # prepare the query
$sth->execute(($var));                 # execute the query with parameter
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}
$var = "another value to search for";
$sth->execute(($var));                 # execute the query (no need to re-prepare)
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}

Here the SQL contains a parameter marker (the '?') indicating to the driver that we will provide this later. The SQL is passed to the prepare method where the database will parse the SQL and note a parameter is required. When the execute method is called we pass the parameters required. You can use multiple parameter markers e.g.

$sql = q/select * from table where col1 = ? and col2 = ?/;
.
.
$sth->execute(($param1, $param2));

Note You must pass the array of parameters in the order which you want them to match the parameter markers in the SQL. i.e. in the above example $param1 will substitute the first parameter marker in "col1 = ?" and $param2 will substitute the parameter marker in "col2 = ?".
Note You cannot use parameter markers in place of column names e.g. select ? from table or select * from table where ? = 'A'.
When passing the parameters to the execute method a default binding type is used. DBD::ODBC attempts to find out what the parameter markers represent and bind the parameters as the same type. If your ODBC driver does not have the ODBC API SQLDescribeParam then a string binding is used (SQL_VARCHAR). You can also bind the parameters yourself before the execute call and in this way you can specify how the parameters should be bound:

# import SQL types for use in bind_param method:
use DBI qw(:sql_types);

$sql = q/select * from table where col1 = ?/;
$sth = $dbh->prepare($sql);

# uses default bind type:
$sth->bind_param(1, $param1);
# Specify the parameter should be bound as an SQL_INTEGER
$sth->bind_param(1, $param1, {TYPE => SQL_INTEGER});

Note Once the parameters are bound you cannot change the type they are bound as although you can obviously change the bound values.

Using bound columns

In previous examples in this tutorial we have used the fetchrow_array method to retrieve the result-set. DBI also provides a method to bind the returned column data to perl variables using the bind_col and bind_columns methods:
Going back to the first example we had:

my $dbh = DBI->connect();             # connect
my $sql = qq/select * from table/;     # the query to execute
my $sth = $dbh->prepare($sql);         # prepare the query
$sth->execute();                       # execute the query
my @row;
while (@row = $sth->fetchrow_array) { # retrieve one row
    print join(", ", @row), "\n";
}

which can be rewritten to use bound columns as follows:

my $dbh = DBI->connect();             # connect
my $sql = qq/select * from table/;    # the query to execute
my $sth = $dbh->prepare($sql);        # prepare the query
$sth->execute();                      # execute the query
my ($col1, $col2, $col3);
# Note columns start at 1 (not 0).
$sth->bind_col(1, \$col1);
$sth->bind_col(2, \$col2);
$sth->bind_col(3, \$col3);
while ($sth->fetch) { # retrieve one row
    print "$col1, $col2, $col3\n";
}

Binding columns and using the fetch method is usually faster than using methods like fetchrow_array. As with bind_param you can specify the type the column is bound as.

$sth->bind_col(1, \$col1, {TYPE => SQL_INTEGER});
$sth->bind_col(2, \$col2, {TYPE => SQL_CHAR});$sth->bind_col(3, \$col3, {TYPE => SQL_FLOAT});

In ODBC there is usually no need to specify the type.
Instead of separate calls to bind_col you can bind all the columns in the above example in one go with:

$sth->bind_columns(\$col1, \$col2, \$col3);

$sth->bind_columns(\($col1, $col2, $col3));

Hash and Reference methods of retrieving data

DBI supports a number of methods returning the result-set (or part of it) in hash or array references. All the following examples assume you have already prepared and executed a select on a table containing two rows of two columns named a and b which look like:

column a | Column b
---------+---------
       1 |      one
       2 |      two

fetchrow_arrayref

my $ref;
while($ref = $sth->fetchrow_arrayref) {
    print join (", ", @{$ref}), "\n";
}

1, one  
2, two

fetchrow_hashref

my $ref;
while($ref = $sth->fetchrow_hashref) {
        print join (", ", keys %$ref), "\n";
        print join (", ", values %$ref), "\n";
}

a, b
1, one
a, b
2, two

fetchall_arrayref

#
# You must set RaiseError or check $sth->err because
# fetchall_arrayref returns the data fetched so far.
#
$sth->{RaiseError} = 1;
my $ref;
$ref = $sth->fetchall_arrayref;
print "Number of rows returned is ", 0 + @{$ref}, "\n";
foreach $r (@{$ref})
{
    print join(", ", @{$r}), "\n";
}

1, one
2, two

You can add a slice to fetchall_arrayref to specify which columns to return:

To return all columns as above:

$ref = $sth->fetchall_arrayref([]);

1, one
2, two

To return only the first column

$ref = $sth->fetchall_arrayref([0]);

1
2

To return only the last column

$ref = $sth->fetchall_arrayref([-1]);

one
two

To return each row as a hashref

$ref = $sth->fetchall_arrayref({});
print "Number of rows returned is ", 0 + @{$ref}, "\n";
print join (", ", keys %{$ref->[0]}), "\n";
foreach $r (@{$ref})
{
    print join(", ", (values %$r)), "\n";
}
print "Number of rows returned is ", 0 + @{$ref}, "\n";
print join (", ", keys %{$ref->[0]}), "\n";
foreach $r (@{$ref})
{
    print join(", ", (values %$r)), "\n";
}

Number of rows returned is 2
a
1
2

Note When specifing the names of the columns to return in this way the case of keys for the returned hashes always matches the case used in the parameter to fetchall_hashref regardless of what FetchKeyHashName is set to.

You can add a max_rows argument to fetchall_arrayref to restrict the number of rows:

$ref = $sth->fetchall_arrayref([], 1);

1, one

This method is particularly useful if your database engine does not support the "top n" SQL syntax or if you want the increased performance of fetchall_arrayref, want to keep memory down but also need to process all the rows because fetchall_arrayref can be called repeatedly on the same result-set to get rows in chunks.
The rather elegant way of doing this in the DBI documentation is:

$sth->{RaiseError} = 1;
my $rows = []; # cache for batches of rows
while(my $row = (shift(@$rows) || # get row from cache, or reload cache:
                 shift(@{$rows=$sth->fetchall_arrayref(undef,1)||[]}))) {
    print join(", ", @{$row}), "\n";
}

1, one       
2, two

Special cases

SQL_CHAR types and trialing spaces

Databases store char(n) columns as exactly n characters so if you have a char(10) column which you insert 'FRED' into when you retrieve it you will get 'FRED '. This often leads to confusion, especially in tests like $var eq 'FRED'. e.g. With the following table definition and inserts:

create table tut2_8 (a int, b char(10))
insert into tut2_8 values (1, 'one')
insert into tut2_8 values (2, 'two')

the following code never prints out "Found 'two'".

my ($col1, $col2);
$sth->bind_columns(\($col1, $col2));
while ($sth->fetch) { # retrieve one row
    print "$col1, $col2\n";
    print "Found 'two'" if ($col2 eq 'two');
}

The ChopBlanks attribute is provided to help you here. ChopBlanks may be set on the connection or statement handle and is inherited from connections. The default is for ChopBlanks to be false. If you insert "$sth->{ChopBlanks} = 1;" before the call to execute method call then the above test now works.
Note ChopBlanks only works on fixed-length CHAR columns.

Long columns

Suppose you have the table created with the following code:

my $dbh = DBI->connect();             # connect
$dbh->do(q/create table tut2_9 (a int, b text)/);
my $sth = $dbh->prepare(q/insert into tut2_9 values (1, ?)/);
my $p = 'x' x 500;
$sth->execute($p);

The 'text' column type here is a MS SQL Server long data type. Other databases have memo, blob etc.
The following retrieval code:

$sth = $dbh->prepare(q/select * from tut2_9/);
$sth->execute;
my ($col1, $col2);
$sth->bind_columns(\($col1, $col2));
while ($sth->fetch) { # retrieve one row
    print "$col1, $col2\n";
}

fails with

DBD::ODBC::st fetch failed: [unixODBC][][Microsoft]
[ODBC SQL Server Driver]
String data, right truncation (SQL-01004)
(DBD: st_fetch/SQLFetch 
(long truncated DBI attribute LongTruncOk not set 
and/or LongReadLen too small) err=-1) at ./tut2_9.pl line 19.

The attributes LongReadLen and LongTruncOk control how long columns are retrieved. By default, DBD::ODBC has LongReadLen set to 80 and LongTruncOk is false. You can find the current settings with:

print "LongReadLen is '", $h->{LongReadLen}, "'\n";
print "LongTruncOk is ", $h->{LongTruncOk}, "\n";

When LongTruncOk is false (as above), if you attempt to retrieve more than LongReadLen bytes from a column you will get an error like above. If you are not bothered about the column data being truncated then set LongTrunkOk to 1. If you need more than 80 bytes from the column then you need to set LongReadLen. If you want long columns ignored (i.e. no data fetched from them but undef returned instead) then you need to set LongReadLen to 0 and LongTruncOk to true.

Handling NULL data

NULL in SQL

Columns which have the NULL value are special. NULL is not the same as an empty string. You need to be careful using comparisons on columns which are nullable. For instance:
Given the table:
create table fred (a int, b char(1) null)

Col1 | Col2
-----+-----
   1 |    Y
   2 |    N
   3 | NULL

select * from fred where col2 = 'Y' returns:

col1 | col2
-----+-----
   1 |    Y

and select * from fred where col2 <> 'Y' returns:

col1 | col2
-----+-----
   1 |    N

Some people expect to see the result of the second query including row 3 with the NULL col2 because they think "col2 <> 'Y'" includes NULL; it doesn't. In relational operations, unlike normal binary, there are THREE states that a operation can result in, TRUE, FALSE, and UNKNOWN. The last state, is the one that needs a closer look.
The UNKNOWN state occurs for any logical operation where the result cannot be evaluated as either TRUE or FALSE, and is connected with the NULL concept.
A NULL in a SQL table, can be regarded as a placeholder for missing information, its not a VALUE as such, just a marker to indicate the lack of information. So in the query for all rows where col2 is not equal to 'Y' the rows containing a NULL are not returned, as the answer to the question "Does the col2 column contain values not equal to Y" returns false, as the column contains the unknown NULL and not a value that could or could not be equal to 'Y'.
The normal way to select NULL columns in SQL syntax is using "column is null" (the opposite being "column is not null"). So to find all the rows in table Fred where col2 has a NULL value you use:
select * from fred where col2 is null

NULL in Perl

In perl DBI, NULL column values are represented by undef (the undefined value). As a result, the following Perl run against the table above:

$sth = $dbh->prepare(q/select * from tut2_19/);
$sth->execute;

my @row;
while(@row = $sth->fetchrow_array) {
        print join(", ", @row), "\n";
}

1, Y
2, N
3,

and if you have warnings enabled (and you should) then you will see "Use of uninitialised value in join or string at XXX.pl line NNN". You can use either of the following examples to avoid this:

# print out row
print DBI::neat_list(\@row), "\n";

or

# change all NULLs to empty strings
foreach (@row) {$_ = '' unless defined};

If you are using a parameterised query you may expect to do:

$sql = q/select * from table where a = ?/;
$stmt = $dbh->prepare($sql);
$stmt->execute(undef);  # pass NULL as parameter

but with some databases this does not work. In ODBC terms, the above code results in a call to the ODBC API SQLBindParameter with a data pointer of NULL and a indicator of SQL_NULL_DATA. Some ODBC drivers and databases will not select all the rows where the column is NULL when this method is used. As shown above a standard way of selecting a column which is NULL is "where column is null" but you cannot parameterise this. This has been the subject of much discussion and one method put forward is:

my $test = undef;
$sql = q/select * from table where (col = ? or (col is null and ? = 1))/;
$stmt->execute($test,defined($test)?0:1)

Laddu Jo Sono Putra

Tuesday, September 2, 2014

Data Retrieval From Database