MSSQL, SQLSRV, PHP and UTF-8
Recently I was trying to set up Microsoft SQL Server 2005 for a PHP application with UTF-8 support. It looked like a trivial task, so I hoped to solve it within 1-2 hours. In fact, I spent 2 nights figuring out the solution... So here is everything about setting up MS SQL Server 2005 and 2008 for PHP with UTF-8 support.
Chaos around PHP drivers for MS SQL
Part of the problem is that several drivers for the same server are available under different names and distributed via different channels. Currently, the following drivers are available for MS SQL Server:
|Driver for Microsoft SQL Server||Supplied with||Last update||Supports UTF-8||Download link|
|DBLIB patched by MOODLE (also called FreeTDS)||none||2009||no||http://docs.moodle.org|
What's the right PHP driver for Microsoft SQL Server?
You should only use the driver maintained by Microsoft: SQLSRV or PDO_SQLSRV. It is the only driver that can store UTF-8 strings in two-byte columns (NVARCHAR instead of VARCHAR) and that Microsoft currently maintains and supports for PHP 5.2.x and 5.3.x.
Unfortunately, this driver is not supplied with the PHP package downloaded from the PHP.NET site. You can only get it from Microsoft's site (or the PECL site), which makes it harder for the community to find and causes confusion around the driver. Historically, all PHP related stuff has always been available from PHP sites.
On the other hand, it is understandable: maintaining all driver versions compiled for 5.2 and 5.3, non-PDO and PDO, VC6 and VC9, Apache and IIS takes a lot of effort, which only a big vendor can sustain efficiently over time.
Nevertheless, I believe it would be worth at least including a file like "mssql.readme" in PHP distributions to make things clearer to the community.
Fig. 1: Loaded SQLSRV driver for MS SQL Server
SQLSRV DSN connect string
SQLSRV introduces some new features for the DSN connect string. The following is an example of a valid DSN connect string:
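As a minimal sketch, a PDO_SQLSRV DSN starts with the sqlsrv: prefix followed by semicolon-separated key=value pairs. The helper buildSqlsrvDsn() is hypothetical (it is not part of the driver), and the server name, database name and credentials are placeholders:

```php
<?php
// Hypothetical helper (not part of the driver): assembles a PDO_SQLSRV DSN
// from a server, a database and optional extra key=value parameters.
function buildSqlsrvDsn($server, $database, array $options = array())
{
    $parts = array('Server=' . $server, 'Database=' . $database);
    foreach ($options as $key => $value) {
        $parts[] = $key . '=' . $value;
    }
    return 'sqlsrv:' . implode(';', $parts);
}

// Placeholder server and database names.
$dsn = buildSqlsrvDsn('localhost', 'mydb', array('MultipleActiveResultSets' => 'false'));
echo $dsn, "\n"; // sqlsrv:Server=localhost;Database=mydb;MultipleActiveResultSets=false

// Connecting requires the pdo_sqlsrv extension and a reachable server,
// so it is only attempted when the driver is loaded.
if (extension_loaded('pdo_sqlsrv')) {
    try {
        $pdo = new PDO($dsn, 'db_user', 'db_password'); // placeholder credentials
        $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        // Request UTF-8 explicitly for string data sent to and read from the server.
        $pdo->setAttribute(PDO::SQLSRV_ATTR_ENCODING, PDO::SQLSRV_ENCODING_UTF8);
    } catch (PDOException $e) {
        echo 'Connection failed: ', $e->getMessage(), "\n";
    }
}
```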
MultipleActiveResultSets (MARS) is a server-specific setting that controls whether a single connection may have more than one active result set (batch) at a time. I was not able to run standard transactions with MultipleActiveResultSets turned on, although it is possible that I missed some configuration. With MultipleActiveResultSets=false I was able to run all scripts originally written for the old standard drivers PDO_MSSQL and MSSQL, this time with correctly stored UTF-8 strings.
There are more newly introduced and less well-known DSN connect parameters, such as Encrypt, MultipleActiveResultSets (MARS), TransactionIsolation and TrustServerCertificate. You may also want to look at the MS SQL Server specific PDO attributes and constants, such as PDO::SQLSRV_ENCODING_UTF8 or PDO::SQLSRV_ATTR_DIRECT_QUERY.
Storing UTF-8 strings in MS SQL Server
Do the following to store UTF-8 strings via PHP in an MS SQL Server database:
- Download Microsoft Drivers for PHP for SQL Server.
- Unpack the files into a temporary directory. You will get a bunch of SQLSRV driver files for various PHP versions, along with the complete API documentation and manual.
- Identify the correct driver file. You need to know whether you want thread safe (ts) or non-thread safe (nts), compiled under VC6 or VC9, the target PHP version, and PDO or non-PDO, e.g. "php_pdo_sqlsrv_52_ts_vc6.dll".
- Copy your driver file into the extensions directory and add a line to php.ini, e.g. "extension=php_pdo_sqlsrv_52_ts_vc6.dll".
- Restart the web server and check phpinfo(): the SQLSRV or PDO_SQLSRV driver(s) should be loaded.
- Define the database schema with multibyte column types (NVARCHAR instead of VARCHAR, see MSSQL data types).
That's it. By default the PDO_SQLSRV driver treats all strings as UTF-8 and converts them internally to and from the native UCS-2 (Unicode) encoding. With the non-PDO SQLSRV driver, set the CharacterSet connection option to UTF-8 to get the same behaviour.
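The last steps can be sketched in PHP roughly as follows. This is only an illustration under assumptions: the helpers loadedSqlsrvDrivers() and roundTripUtf8(), the table name utf8_demo, the server and database names and the credentials are all made up for the example, and the database part only runs when pdo_sqlsrv is loaded and a server is reachable:

```php
<?php
// Report which of the two Microsoft drivers PHP has loaded
// (the same information phpinfo() shows).
function loadedSqlsrvDrivers()
{
    $loaded = array();
    foreach (array('sqlsrv', 'pdo_sqlsrv') as $name) {
        if (extension_loaded($name)) {
            $loaded[] = $name;
        }
    }
    return $loaded;
}

// Hypothetical helper: writes a UTF-8 string into an NVARCHAR column
// and reads the most recently inserted row back.
function roundTripUtf8(PDO $pdo, $text)
{
    $pdo->exec(
        "IF OBJECT_ID('utf8_demo', 'U') IS NULL
             CREATE TABLE utf8_demo (
                 id INT IDENTITY(1,1) PRIMARY KEY,
                 name NVARCHAR(100) NOT NULL
             )"
    );
    $insert = $pdo->prepare('INSERT INTO utf8_demo (name) VALUES (?)');
    $insert->execute(array($text));

    $select = $pdo->query('SELECT TOP 1 name FROM utf8_demo ORDER BY id DESC');
    return $select->fetchColumn();
}

$drivers = loadedSqlsrvDrivers();
echo 'Loaded drivers: ', $drivers ? implode(', ', $drivers) : 'none', "\n";

if (in_array('pdo_sqlsrv', $drivers)) {
    try {
        // Placeholder DSN and credentials.
        $pdo = new PDO('sqlsrv:Server=localhost;Database=mydb', 'db_user', 'db_password');
        $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

        $name = "Peter Nov\xC3\xA1k"; // "Peter Novák" as UTF-8 bytes
        var_dump(roundTripUtf8($pdo, $name) === $name);
    } catch (PDOException $e) {
        echo 'Database error: ', $e->getMessage(), "\n";
    }
}
```

A prepared statement is used for the insert so that the string is passed as a bound parameter rather than being pasted into the SQL text.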
Case sensitivity and accent sensitivity concerns
UTF-8 strings are truly converted into the native UCS-2 encoding by the SQLSRV driver when they are written to SQL Server. This ensures correct handling of case-sensitive and accent-sensitive searches. Unlike with other drivers, you don't need any encoding workarounds.
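To see what such a conversion does to the bytes, here is a small illustration using PHP's iconv(). It is not the driver's code, only a sketch of the same kind of re-encoding, assuming the two-byte little-endian form of UCS-2:

```php
<?php
$utf8 = "Nov\xC3\xA1k";                    // "Novák" encoded as UTF-8
$ucs2 = iconv('UTF-8', 'UCS-2LE', $utf8);  // little-endian UCS-2, no byte order mark

echo strlen($utf8), "\n";   // 6  (the accented letter takes two bytes)
echo strlen($ucs2), "\n";   // 10 (every character takes two bytes)
echo bin2hex($ucs2), "\n";  // 4e006f007600e1006b00
```

This is why a two-byte column type such as NVARCHAR is needed: every character, accented or not, occupies one 16-bit unit.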
Case sensitivity and accent sensitivity are determined by the collation, which you typically set when you create a new table. Choosing the right case and accent sensitivity may significantly influence search results.
- Case sensitivity (CS = Case Sensitive, CI = Case Insensitive)
  - If you want "Computer" and "computer" to be treated as duplicates, use a CI collation. For Slovak, choose a Slovak collation containing CI (e.g. Slovak_CI_AS).
- Accent sensitivity (AS = Accent Sensitive, AI = Accent Insensitive)
  - If you want "Peter Novák" and "Peter Novak" to be treated as duplicates, use an AI collation. For Slovak, choose a Slovak collation containing AI (e.g. Slovak_CS_AI).
- Combinations of AI and CI
  - You can combine case and accent sensitivity as you wish. A very common choice is CI together with AI to prevent duplicate records (e.g. Slovak_CI_AI).
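The naming scheme above can be sketched as follows. The helper collationName() and the table people are hypothetical; the resulting statement is plain T-SQL that you would pass to the driver, and the UNIQUE constraint is an assumption added here to show where duplicates would actually be rejected:

```php
<?php
// Hypothetical helper: builds a collation name such as Slovak_CI_AI
// from a language and the two sensitivity flags.
function collationName($language, $caseSensitive, $accentSensitive)
{
    return $language
        . '_' . ($caseSensitive ? 'CS' : 'CI')
        . '_' . ($accentSensitive ? 'AS' : 'AI');
}

$collation = collationName('Slovak', false, false);
echo $collation, "\n"; // Slovak_CI_AI

// Column-level collation: with a case- and accent-insensitive collation and a
// UNIQUE constraint, "Peter Novák" and "Peter Novak" count as the same value.
$ddl = 'CREATE TABLE people ('
     . ' id INT IDENTITY(1,1) PRIMARY KEY,'
     . ' name NVARCHAR(100) COLLATE ' . $collation . ' NOT NULL UNIQUE'
     . ')';
echo $ddl, "\n";
```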
Fig. 2: Correctly stored accented UTF8 characters in MS SQL database
Currently, the only maintained driver for Microsoft SQL Server (2005, 2008, Azure) is named SQLSRV and comes in two basic flavours, SQLSRV and PDO_SQLSRV. Both have some server-specific features and built-in support for converting UTF-8 strings into the native UCS-2 encoding of MSSQL. If you are writing portable cross-database applications, you should only use PDO_SQLSRV. You can download the drivers from the Microsoft site and read the documentation at the PHP.NET site.
- SQLSTATE[IMSSP]: This extension requires either the Microsoft SQL Server 2008 Native Client (SP1 or later) or the Microsoft SQL Server 2008 R2 Native Client ODBC Driver to communicate with SQL Server. Neither of those ODBC Drivers are currently installed. Access the following URL to download the Microsoft SQL Server 2008 R2 Native Client ODBC driver for x86: http://go.microsoft.com/fwlink/?LinkId=163712
- On Windows, the SQLSRV driver requires the SQL Server Native Client to be installed. You can download it from the Microsoft site.