Archive

Posts Tagged ‘SQL’

Optimised way how to get country by IP

January 19th, 2009

Some time ago I was developing project what reports all action of user on web site. I hit big number of problems, mainly performance problem. Biggest site had around 40,000 unique users each day. And one requirement is that it must recognize country by IP of the user. I get list of countries from MaxMind what I used for previous projects. I was able to insert the CSV file by BULK LOAD.


CREATE TABLE [dbo].[GeoIPCountry_Import] (
  [startIp] [varchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [endIp] [varchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [startIpNum] [varchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [endIpNum] [varchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [countryCode] [char](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [country] [nvarchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
);

CREATE PROCEDURE [dbo].[UpdateGeoCountryList]
  @file nvarchar(400)
AS
BEGIN
  SET NOCOUNT ON;

  TRUNCATE TABLE [dbo].[GeoIPCountry_Import]
  EXEC('BULK INSERT [dbo].[GeoIPCountry_Import]
  FROM N''' + @file + ''' WITH
  (
    FIELDTERMINATOR ='','',
    ROWTERMINATOR =''\n'',
    CODEPAGE = ''OEM'',
    KEEPNULLS
  )')
  DECLARE @count int
  SELECT @count = COUNT(*) FROM [dbo].[GeoIPCountry_Import]
  IF(@count < 90000)
  BEGIN
    RAISERROR('File is too small or in wrong format. It must contain at least 90k records, fieldterminator = '','', rowterminator = ''\\n''.',15,1)
  END

  UPDATE [dbo].[GeoIPCountry_Import]
  SET
    startIp = [staging].[Trim](REPLACE(startIp, '"', '')),
    endIp = [staging].[Trim](REPLACE(endIp, '"', '')),
    startIpNum = [staging].[Trim](REPLACE(startIpNum, '"', '')),
    endIpNum = [staging].[Trim](REPLACE(endIpNum, '"', '')),
    countryCode = [staging].[Trim](REPLACE(countryCode, '"', '')),
    country = [staging].[Trim](REPLACE(country, '"', ''))

  SELECT @count = COUNT(*) FROM [dbo].[GeoIPCountry_Import]
  WHERE startIpNum IS NULL OR startIpNum = '' OR endIpNum IS NULL OR endIpNum = ''
  IF(@count > 0)
  BEGIN
    RAISERROR('Some IP''s cannot be converted into ip number.',15,1)
  END
END

This is the first part and it’s possible to use this format to getting country representation for ip. What is realy faster than to have ip range like ‘from 45.10.8.50 to 45.10.8.138′ is to have them like integer. We will create scalar function which will convert ip into integer.


CREATE FUNCTION [staging].[IPtoInt]  (@strIP varchar(15))
RETURNS bigint
AS
BEGIN
DECLARE @intIPNum bigint
IF (LEN(@strIP)-LEN(REPLACE(@strIP,'.','')))/LEN('.') = 3 -- if there 3 periods
SET @intIPNum = (16777216 * CAST(PARSENAME(@strIP,4) as bigint)  + 65536 * PARSENAME(@strIP,3) + 256 * PARSENAME(@strIP,2) + PARSENAME(@strIP,1))
RETURN @intIPNum
END

When we have this function we can create better performing table for storing data about countries and IPs. It’s possible to use city list from MaxMind. We will prepare the import procedure to be able import new database at anytime, when newer version is released.


CREATE TABLE [dbo].[GeoIPCountry](
  [startIp] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [endIp] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [startIpNum] [bigint] NOT NULL,
  [endIpNum] [bigint] NOT NULL,
  [countryCode] [char](4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  [country] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
  CONSTRAINT [PK_GeoIPCountry] PRIMARY KEY CLUSTERED
  (
    [startIpNum] ASC,
    [endIpNum] ASC
  )
)

And add few lines at the end of [GeoIPCountry_Import] procedure.


TRUNCATE TABLE dbo.GeoIPCountry
INSERT INTO dbo.GeoIPCountry (startIp, endIp, startIpNum, endIpNum, countryCode, country)
SELECT
  startIp, endIp,
  staging.IPtoInt(startIp), staging.IPtoInt(endIp),
  countryCode, country
FROM [dbo].[GeoIPCountry_Import]

We prepared everything what is needed for this. I even created table with most frequent IPs which help a lot, because the site had 90% of visitors from one country. This table contains IPs for this country an make selecting much more faster with less records in table. This solution was more than 10 times faster than table with IPs. And if you need to parse 100,000 records each day, it will help a lot.


SELECT
  IP, CASE WHEN GeoIPCountry.Country IS NOT NULL THEN GeoIPCountry.Country ELSE 'NA' END AS Country
FROM
  WebLog WITH (NOLOCK)
LEFT JOIN staging.GeoIPCountry WITH (NOLOCK) ON
  GeoIPCountry.startIPNum <= staging.IPtoInt(WebLog .IPNum)
  AND GeoIPCountry.endIPNum >= staging.IPtoInt(WebLog .IPNum)

I’m not really sure why, but there is faster to use GeoIPCountry.startIPNum <= IP AND GeoIPCountry.endIPNum >= IP then to use BETWEEN. If anyone will be interested in the cached solution or wants codes for all part I can provide them.

Database , , , ,

Execute dynamic SQL in MSSQL

December 18th, 2008

I hit this need when I was developing own custom module into Commerce Server. Commerce Server have all tables separated into catalogs. Catalog is group of tables which contains records for particular client (site), but have same structure like all other catalogs. Lets say that we have two clients ‘Client1′, ‘Client2′. Each client has tables for products ‘Client1_Products’, ‘Client2_Products’. Now we can create two procedures for retrieving data from both tables or better we can create one procedure, which will have one input parameter - Catalog name.

We can create the procedure:


CREATE PROCEDURE [dbo].[GetProducts]
(
  @CatalogName nvarchar(85),
  @Language nvarchar(85) = 'en-US',
  @ProductID nvarchar(255)
)
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @sql nvarchar(4000)
  SET @sql = 'SELECT
      *
    FROM [dbo].[' + @CatalogName + '_Products] AS [Products]
    INNER JOIN [dbo].[' + @CatalogName + '_Products_' + @Language + '] AS [ProductsLang] ON
      [Products].[ProductID] = [ProductsLang].[ProductID]
    IF(@ProductID IS NOT NULL)
      SET @sql = @sql + ' WHERE [Products].[ProductID] = @ProductID '

  EXEC sp_executesql @sql,
    N'@ProductID nvarchar(85)',
    @ProductID
END

Now we are able to call this procedure for any client and as bonus get language related data for products. EXEC [dbo].[GetProducts] ‘Client1′, ‘en-US’; will return all products for ‘Client1′.

We can specify product ID if we want to receive only one product. Product ID condition will be added to where.


IF(@ProductID IS NOT NULL)
  SET @sql = @sql + ' WHERE [Products].[ProductID] = @ProductID '

At the end of stored procedure we will execute generated SQL.


EXEC sp_executesql @sql,
  N'@ProductID nvarchar(85)',
  @ProductID

First parameter of sp_executesql is a Unicode string that contains a Transact-SQL statement or batch. More complex Unicode expressions, such as concatenating two strings with the + operator, are not allowed at this point. On 64-bit servers, the size of the string is limited to 2 GB, the maximum size of nvarchar(max).

Second parameter is list of strings that contains the definitions of all parameters that have been embedded in first parameter. Every parameter must be defined in @params.

Third parameter are values for the parameters that is defined in the second parameter.

OUT or OUTPUT indicates that the parameter is an output parameter. By this, you can get last inserted sequence number for example.

Database , , , ,

Storing Database Connection String in Web.Config

October 9th, 2008

Today I found, that storing connection strings were little bit changed between .NET 2.0 and .NET 3.5.

Here is an example of how to store a database connection string in the application’s Web.Config file for .NET 2.0. The code for accessing the connection string is located in the C# code below.

.NET 2.0


<appSettings>
<add key="ConnectionString" value="server=localhost;database=TestDB;uid=sa;password=secret;" />
</appSettings>

And here is the C# code how to get it.


using System.Configuration;

public class TestGetConnection
{
  public TestGetConnection()
  {
    try
    {
      // Get connection string from Web.Config
      string strConnection = ConfigurationSettings.AppSettings("ConnectionString");
    }
  }
}

.NET 3.5

ConfigurationSettings.AppSettings is now deprecated.


<connectionStrings>
  <add name="YourConnectionString" connectionString="server=localhost;database=TestDB;uid=sa;password=secret;" providerName="System.Data.SqlClient" />
  <add name="NewString" connectionString="server=localhost;database=TestDB;uid=sa;password=secret;" providerName="System.Data.SqlClient" />
</connectionStrings>

And here is the C# code how to get it.


using System.Configuration;

public class TestGetConnection
{
  public TestGetConnection()
  {
    try
    {
      // Get connection string from Web.Config
      string strConnection = ConfigurationManager.ConnectionStrings["YourConnectionString"].ConnectionString;
    }
  }
}

You can use <remove /> to delete connection setting from web.config.


<connectionStrings>
  <remove name="YourConnectionString"/>
  <add name="YourConnectionString" connectionString="server=localhost;database=TestDB;uid=sa;password=secret;" providerName="System.Data.SqlClient" />
</connectionStrings>

Connection String Syntax Notes

  • Values may be delimited by single or double quotes, (for example, name=’value’ or name=”value”). Either single or double quotes may be used within a connection string by using the other delimiter, for example, name=”value’s” or name=’value”s’,but not name=’value’s’ or name=”"value”". The value type is irrelevant.
  • All blank characters, except those placed within a value or within quotes, are ignored.
  • Keyword value pairs must be separated by a semicolon ( ; ). If a semicolon is part of a value, it also must be delimited by quotes.
  • Names are not case sensitive. If a given name occurs more than once in the connection string, the value associated with the last occurence is used.
  • No escape sequences are supported.

If you need to connect to SQL Express, you shold use connection string with specifying the file you want to attach.


connectionString="Server=.\SQLExpress;AttachDbFilename=c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\SpamTestimonial.mdf;Database=SpamTestimonial;Trusted_Connection=Yes;"

If you need to find all possible connection string, go to web page dedicated to them for more databases than you can image. http://www.connectionstrings.com/

.NET , , , ,

Tree traversal data structure - stored procedures

October 8th, 2008

In the first part of this article I wrote about tree traversal database structure. Now I want to show you how to prepare all stored procedures and triggers to be able to work with it effectively. From the first part we have already database structure and we filled the table with data. Now we need to have these data always with correct informations.

What can happen

There are many scenarios what can happen and you need to recreate lft and rtg in the tree. For inserting new category, deleting old one and move category under another parent, we can use one procedure:


ALTER PROCEDURE dbo.RecreateCategoryTree
  @id INT = NULL,
  @cnt INT = 0
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @actId INT,
    @actLft INT,
    @actRtg INT;
  DECLARE categories_cursor CURSOR LOCAL
  FOR
    SELECT ID, lft, rtg
    FROM dbo.Categories
    WHERE (ParentID = @id) OR (@id IS NULL AND ParentID IS NULL);
  OPEN categories_cursor
  FETCH NEXT FROM categories_cursor
  INTO @actId, @actLft, @actRtg

  WHILE @@FETCH_STATUS = 0
  BEGIN
    SET @cnt = @cnt + 1;

    UPDATE dbo.Categories SET lft = @cnt WHERE ID = @actID;

    EXEC @cnt = [dbo].[RecreateCategoryTree] @actId, @cnt;
    SET @cnt = @cnt + 1;

    UPDATE dbo.Categories SET rtg = @cnt WHERE ID = @actID;

    FETCH NEXT FROM categories_cursor
    INTO @actId, @actLft, @actRtg
  END
  CLOSE categories_cursor
  DEALLOCATE categories_cursor

  RETURN @cnt;
END
GO

When we call this stored procedure it will recreate all lft and rtg information in Categories table. Second step will be to create trigger what will be fired after insert, update or delete.


CREATE TRIGGER dbo.RecreateCategories
ON  dbo.Categories
AFTER INSERT,DELETE,UPDATE
AS
BEGIN
  SET NOCOUNT ON;
  EXEC [dbo].[RecreateCategoryTree]
END
GO

This is realy the easiest way. It will not be fast with many records. For table with a few hundred record and occasional update it’s enought. Now we can try if everything is working as we need. Try to insert into category:


INSERT INTO dbo.Categories (CategoryName, ParentID) VALUES ('Celeron', 2);

C#, Database , , , , ,

Tree traversal data structure

September 29th, 2008

Tree traversal is a systematic way of ordering the nodes of a tree. When you are going throw three traversal structure, you will visit each node exactly once.

I use this method in work a few years ago and using it till now. Imagine that you have category table (i’ll be using MSSQL syntax for this post):


CREATE TABLE Categories
(
  ID int IDENTITY(1,1) NOT NULL,
  CategoryName NVARCHAR(100) NOT NULL,
  ParentID INT NULL,
  CONSTRAINT pk_CategoryID PRIMARY KEY (ID)
)

INSERT INTO Categories (CategoryName, ParentID) VALUES ('Computers', NULL); -- ID: 1
INSERT INTO Categories (CategoryName, ParentID) VALUES ('PC', 1); -- ID: 2
INSERT INTO Categories (CategoryName, ParentID) VALUES ('AMD', 2); -- ID: 3
INSERT INTO Categories (CategoryName, ParentID) VALUES ('Pentium', 2); -- ID: 4
INSERT INTO Categories (CategoryName, ParentID) VALUES ('Notebooks', 1); -- ID: 5
INSERT INTO Categories (CategoryName, ParentID) VALUES ('Books', NULL); -- ID: 6
INSERT INTO Categories (CategoryName, ParentID) VALUES ('Sci-fi', 6); -- ID: 7
INSERT INTO Categories (CategoryName, ParentID) VALUES ('Fantasy', 7); -- ID: 8

We have two parent categories ‘Computers’ and ‘Books’. Both of them have subcategories and ‘Computers’ category has subsubcategories as well. Now you will need to get all products from ‘Computers’ category and all subcategories. With this database structure it will be realy hard to get all products. You will need to get all category Ids and join them in IN clause. There is better way how to get them, it’s called tree traversal.

Tree traversal example

Here is image which shows previous data we inserted into category table.

Now we will add numbers to left and right side of each elipse, begining from left parent category, go to subcategories and continuing to right side.

Numbers on left side will be called ‘lft, on right side ‘rtg’. We can now update database structure to contain these informations.


CREATE TABLE Categories
(
  ID INT NOT NULL,
  CategoryName NVARCHAR(100) NOT NULL,
  ParentID INT NULL,
  lft INT NOT NULL,
  rtg INT NOT NULL,
  CONSTRAINT pk_CategoryID PRIMARY KEY (ID)
)

INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('Computers', NULL, 1, 10); -- ID: 1
INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('PC', 1, 2, 7); -- ID: 2
INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('AMD', 2, 3, 4); -- ID: 3
INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('Pentium', 2, 5, 6); -- ID: 4
INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('Notebooks', 1, 8, 9); -- ID: 5
INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('Books', NULL, 11, 16); -- ID: 6
INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('Sci-fi', 6, 12, 13); -- ID: 7
INSERT INTO Categories (CategoryName, ParentID, lft, rtg) VALUES ('Fantasy', 7, 14, 15); -- ID: 8

Get child categories

Now we can get all child categories of ‘Computers’ category.


SELECT * FROM Categories WHERE lft >= 1 AND rtg <= 10;

Get all parent categories

Another posibility is when we need to create breadcrumbs for actual product page. We know, that the product is in ‘Pentium’ category. To get all parent categories we will use similar sql to the one above


SELECT * FROM Categories WHERE lft <= 5 AND rtg >= 6 ORDER BY lft ASC;

Have the category any childs?

If you have the category data and need to know if the category have any child categories you will need to make another select which will search for all categories which have ParentID same as your category. With this you can only compare ‘lft’ and ‘rtg’. If ‘lft’ +1 = ‘rtg’ than the category doesn’t have any subcategories.

Database , , , ,