SQL SERVER Concepts

1. SQL SERVER:

SQL Server is a relational database management system (RDBMS) developed by Microsoft. It stores and manages data in structured tables using SQL (Structured Query Language). SQL Server is known for its scalability, performance, security features, and support for high availability. It is widely used for business applications, data warehousing, and reporting.

Key features:

Efficient data storage and query management.

Strong security and access control.

High availability with backup and disaster recovery options.

Integration with business intelligence tools.

Scalability for handling large data volumes.

2. Database:

A database in SQL Server is a logical container that holds and organizes data in a structured format. It is a discrete unit that manages and stores data, encompassing various database objects such as tables, views, stored procedures, and indexes. SQL Server can host multiple databases within a single instance, allowing independent management of each database. Although each database is isolated, they can be linked through relationships or cross-database queries. The database provides efficient access, modification, and querying capabilities to facilitate data management and manipulation using SQL.

Syntax:

CREATE DATABASE database_name;

Example:

CREATE DATABASE ShoppingDatabase;

3. Data Type:

In SQL Server, data types define the type of data that can be stored in a column of a table. Choosing the appropriate data types is crucial for efficient storage, retrieval, and manipulation of data.

1. Numeric Data Types

2. Character String Data Types

3. Date and Time Data Types

4. Binary Data Types

5. Other Data Types

6. Spatial Data Types

7. User-Defined Data Types

3.1. Numeric Data Types:

Numeric Data types include INT, BIGINT, SMALLINT, TINYINT, DECIMAL/NUMERIC, FLOAT.

1. INT:

Integer data type, typically used for whole numbers ranging from -2,147,483,648 to 2,147,483,647.

Scenarios:

A library tracks the number of books in its collection, with a total of 120,000 books, stored in an INT field.

2. BIGINT:

Large integer data type, for very large whole numbers ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

Scenarios:

A global e-commerce website tracks the number of orders placed worldwide. For example, it may have processed 1,234,567,890 orders over the course of its existence, stored as a BIGINT.

A financial institution tracks the total amount of transactions processed, which could be billions of dollars, such as $9,000,000,000, stored as a BIGINT.

3. SMALLINT:

Small integer data type, for smaller whole numbers ranging from -32,768 to 32,767.

Scenarios:

A local gym tracks the number of members in each branch. A particular branch has 2,500 members, which is stored in a SMALLINT field.

A regional office tracks the number of employees in each department, such as 150 employees in the finance department, using SMALLINT.

4. TINYINT:

Tiny integer data type, for very small whole numbers ranging from 0 to 255.

Scenarios:

An inventory system for a small shop tracks the number of items in stock for a particular product. If the store has fewer than 255 units of an item, this can be stored using TINYINT.

A survey system stores ratings from users on a scale of 0 to 10, and the number of ratings is under 255, stored in a TINYINT field.

5. DECIMAL(p, s) or NUMERIC(p, s):

Fixed precision and scale numeric data types. `p` specifies the maximum total number of digits that can be stored, and `s` specifies the number of digits to the right of the decimal point.

Scenarios:

An online store tracks the price of products, such as a product priced at $123.45, stored as DECIMAL(10, 2).

A real estate company records the prices of properties, with some properties priced up to $999,999.99, using DECIMAL(10, 2).

6. FLOAT:

Floating-point numeric data type for approximate numeric values, suitable for scientific calculations.

Scenario:

A scientific organization tracks temperature readings with floating-point precision, like 23.4567°C, stored using FLOAT.

A stock trading system tracks the daily price of stocks with decimal precision but does not require exact values, such as 342.57, stored in a FLOAT field.

3.2. Character String Data Types

1. CHAR(n):

Fixed-length character string where `n` specifies the maximum number of characters.

Scenarios:

A system stores ISO country codes, such as 'US' and 'IN', which are always two characters long, in a CHAR(2) field.

A library management system stores book identification numbers that always consist of 6 characters, such as 'B12345', in a CHAR(6) field.

2. VARCHAR(n):

Variable-length character string where `n` specifies the maximum number of characters.

Scenarios:

A user registration system stores the names of users, which can vary in length. For instance, "John Doe" (8 characters) or "Alexander Smith" (15 characters), stored in a VARCHAR(50) field.

An e-commerce site stores product descriptions, which can range from a few words to several sentences, in a VARCHAR(255) field.

3. VARCHAR(MAX):

Variable-length character string for large text data up to 2GB in size.

Scenarios:

A blog platform stores user-generated content, such as blog posts or articles, which can exceed 100,000 characters, using VARCHAR(MAX).

A content management system stores large documents or reports that can contain hundreds of pages, using VARCHAR(MAX).

4. TEXT:

Variable-length character string for very large text data, deprecated in favor of `VARCHAR(MAX)`.

Scenarios:

An older version of a website stores long user comments in a deprecated TEXT field. These comments can be several paragraphs long.

A legacy document management system stores full-text documents (e.g., PDFs converted to text) using TEXT fields.

3.3. Date and Time Data Types

1. DATE:

Date data type representing a date value (year, month, day).

Scenarios:

A CRM system stores customer birthdates, such as '1985-08-15', using DATE.

An event scheduling system stores the dates of upcoming events like '2024-12-01' for a product launch.

2. TIME:

Time data type representing a time of day (hour, minute, second, fraction of a second).

Scenarios:

A time-tracking system records the time employees clock in, such as '08:00:00', stored in TIME.

A retail store stores opening hours, like '09:00:00' for opening time, using TIME.

3. DATETIME:

Date and time data type representing date and time values from January 1, 1753, through December 31, 9999, accurate to 3.33 milliseconds.

Scenarios:

An order system tracks the exact time an order was placed, like '2024-11-25 14:30:22.123', stored in DATETIME.

A service logging system records the time a service request is made, such as '2024-11-25 09:45:01.456'.

4. DATETIME2:

Extended date and time data type with a larger date range and fractional seconds precision.

Scenarios:

A scientific experiment records time with precision to microseconds, like '2024-11-25 10:20:15.1234567', using DATETIME2.

An IoT device logs precise event times with microsecond accuracy, such as '2024-11-25 08:00:05.9876543', stored in DATETIME2.

5. SMALLDATETIME:

Date and time data type with smaller date range and less precision than `DATETIME`.

Scenarios:

A conference scheduling system stores appointment times like '2024-11-25 09:00:00', using SMALLDATETIME.

A simple logging system records events with minute-level precision, such as '2024-11-25 15:30:00', in SMALLDATETIME.

3.4. Binary Data Types

1. BINARY(n):

Fixed-length binary data where `n` specifies the maximum number of bytes.

Scenarios:

A system stores encrypted passwords as hash values, where each password hash is 256 bits, stored as BINARY(32).

A secure messaging system stores fixed-length encryption keys used for message encryption, such as a 128-bit key, in BINARY(16).

2. VARBINARY(n):

Variable-length binary data where `n` specifies the maximum number of bytes.

Scenarios:

A user profile picture is stored as binary data, such as a 2MB image, in a VARBINARY(5000) field.

A content management system stores video files as binary data, such as a 500KB thumbnail image, in a VARBINARY field.

3. VARBINARY(MAX):

Variable-length binary data for large binary data up to 2GB in size.

Scenarios:

A file-sharing system stores user-uploaded videos, which can range from 10MB to 1GB, using VARBINARY(MAX).

A music streaming service stores audio tracks in high-quality format, where files range from 5MB to 50MB, in VARBINARY(MAX).

3.5. Other Data Types

1. BIT:

Integer data type that can store 0, 1, or NULL.

Scenarios:

A website tracks whether a user has verified their email address. If verified, the field is set to 1; if not, it's set to 0, stored as BIT.

A system stores the active status of a user account, where 1 indicates an active account and 0 indicates a deactivated account, using BIT.

2. UNIQUEIDENTIFIER:

Globally unique identifier (GUID).

Scenarios:

Storing unique transaction IDs or session IDs.

Storing unique identifiers for records in a distributed database.

3. XML:

Stores XML data.

Scenarios:

Storing user preferences or configuration settings in XML format.

Storing structured document data like invoices or shipping information.

4. CURSOR:

A data type used to reference a cursor, which allows for iterative processing of query results row-by-row.

Scenarios:

Iterating over a result set in a stored procedure to perform specific actions on each row.

Processing a result set in an application that requires individual handling of each row.

3.6. Spatial Data Types

1. GEOMETRY:

Stores spatial data in a two-dimensional plane.

Scenarios:

Storing coordinates for locations on a map (e.g., city locations).

Storing the boundaries of a building or area for real estate applications.

2. GEOGRAPHY:

Stores spatial data that represents data in a round-earth coordinate system.

Scenarios:

A database generates a unique identifier for each user, such as '8f01ac2f-04a3-4f90-a8da-fab303b7de59', stored as UNIQUEIDENTIFIER.

A shipment tracking system generates unique GUIDs for each shipment, like 'e7627a91-1f4e-4321-bf92-04c9e431f8f7', stored in a UNIQUEIDENTIFIER column.

3.7. User-Defined Data Types

SQL Server allows you to create your own data types using the `CREATE TYPE` statement.

Scenarios:

Creating a custom data type to represent a "phone number" with specific formats and validation rules.

Creating a data type to represent a custom structured object, like a "product" with a price, description, and ID.

4. TEMP

In SQL Server, the term TEMP usually refers to temporary tables or temporary objects that are used for storing intermediate results during the execution of queries or sessions. These temporary objects are stored in the tempdb system database, and they only exist during the duration of a session or query.

4. 1. Temporary Tables

Temporary tables are used to store intermediate data and can be of two types.

4.1.1 Local Temporary Tables

4.1.2 Global Temporary Tables

4.1.1 Local Temporary Tables:

Local temporary tables are only visible to the session (or connection) in which they are created.

They are created using a single # symbol before the table name.

Once the session is closed or the connection is terminated, the local temporary table is automatically dropped.

Scope:

Available only to the current session or connection.

Lifetime:

The table is automatically dropped when the session ends.

Syntax:

CREATE TABLE #TempTable
(
Column1 INT,
Column2 NVARCHAR(100)
);

Example:

CREATE TABLE #TempCustomers (
CustomerID INT,
CustomerName NVARCHAR(100)
);
INSERT INTO #TempCustomers (CustomerID, CustomerName) VALUES (1, 'John Doe');
SELECT * FROM #TempCustomers;
DROP TABLE #TempCustomers;

4.1.2 Global Temporary Tables:

Global temporary tables are visible to all sessions and all connections after they are created.

They are created using two ## symbols before the table name.

A global temporary table persists until all sessions that are using it are closed.

Scope:

Available to all sessions and users.

Lifetime:

The table remains until all sessions using it are closed.

Syntax:

CREATE TABLE ##GlobalTempTable
(
Column1 INT,
Column2 NVARCHAR(100)
);

Example:

CREATE TABLE ##GlobalTemp (OrderID INT, ProductName NVARCHAR(100));
INSERT INTO ##GlobalTemp (OrderID, ProductName) VALUES (1001, 'Laptop');
SELECT * FROM ##GlobalTemp;
DROP TABLE ##GlobalTemp;

Scenarios:

Breaking Down Complex Queries:

Splitting complex queries into smaller, more manageable parts and storing intermediate results for easier handling and troubleshooting.

Improving Query Performance:

Storing subsets of large datasets in temporary tables, applying indexes, and performing operations like joins or aggregations more efficiently.

Handling Data Transformation Tasks:

Storing and transforming data before generating final reports, allowing modifications to be applied to cleaned data.

Data Warehousing (ETL):

Staging data during the Extract, Transform, and Load (ETL) process before loading it into final target tables.

Dealing with Large Dataset Filtering:

Storing filtered results of large datasets based on complex criteria, enabling more efficient further analysis.

Recursive Queries:

Managing hierarchical or recursive data by storing intermediate results during recursive query operations.

5. Functions:

In SQL Server, functions are named, reusable blocks of code that can accept parameters, perform calculations, and return a value. They help encapsulate logic, making it easier to perform operations that can be used within SQL statements. Functions are typically used for calculations, data manipulation, and querying, and they can be invoked in SELECT, WHERE, or other SQL clauses.

Functions in SQL Server improve code reuse, modularity, and clarity, making it easier to encapsulate logic and perform complex operations in SQL queries.

Types of Functions:

5.1. Scalar Functions:

Scalar functions return a single value based on the input parameters.

They are used in SQL queries where a single value is expected, such as in SELECT, WHERE, HAVING, or ORDER BY clauses.

Example:

SELECT UPPER(FirstName) AS UppercaseName FROM Employees;

5.2. Table-Valued Functions (TVFs):

TVFs return a table as a result set, allowing users to treat the function like a table in queries.

5.2.1Types:

Inline TVFs:

Defined with a single SELECT statement and no procedural logic. They are simpler and often used when the function simply needs to return a result set.

Example:

CREATE FUNCTION GetEmployeeDetails (@DeptID INT)

RETURNS TABLE

RETURN

(

SELECT EmployeeID, EmployeeName

FROM Employees

WHERE DepartmentID = @DeptID

);

Multi-statement TVFs:

Defined with procedural logic using BEGIN...END blocks. These can contain multiple SQL statements and are used when more complex operations are required.

Example:

CREATE FUNCTION CalculateSalary (@EmployeeID INT)

RETURNS TABLE

RETURN

BEGIN

DECLARE @Salary DECIMAL(10, 2);

SELECT @Salary = Salary FROM Employees WHERE EmployeeID = @EmployeeID;

RETURN (SELECT @Salary AS EmployeeSalary);

END;

5.3. System Functions:

System Functions are built-in functions provided by SQL Server for common operations, such as date manipulation, string operations, and mathematical calculations like LEN(), ABS(), ROUND(), DATEDIFF(), etx...

System functions are available globally and do not need to be explicitly created. They are often used for tasks such as formatting dates, calculating values, and querying system information.

Example:

SELECT GETDATE() AS CurrentDate, LEN('Hello') AS StringLength;

6. Normalization

Normalization in SQL Server is the process of organizing database tables to minimize redundancy and enhance data integrity. The primary objective is to ensure that data is stored in an efficient, non-redundant manner while maintaining its accuracy. Proper normalization structures data effectively, reducing duplication and improving consistency. However, it’s important to strike a balance, as excessive normalization can negatively impact performance due to complex joins.

OrderID	CustomerName	OrderDate	ProductID	ProductName	Quantity	UnitPrice	TotalPrice
1	Alice	01-11-2024	101	Laptop	2	1000	2000
1	Alice	01-11-2024	102	Mouse	1	25	25
2	Bob	02-11-2024	103	Keyboard	1	50	50
3	Carol	03-11-2024	101	Laptop	1	1000	1000
3	Carol	03-11-2024	104	Monitor	2	300	600

Normal Forms

First Normal Form (1NF):

Ensure all columns contain atomic (indivisible) values, and each row is unique.

Example:

OrderID	CustomerName	OrderDate	ProductID	ProductName	Quantity	UnitPrice	TotalPrice
1	Alice	01-11-2024	101	Laptop	2	1000	2000
1	Alice	01-11-2024	102	Mouse	1	25	25
2	Bob	02-11-2024	103	Keyboard	1	50	50
3	Carol	03-11-2024	101	Laptop	1	1000	1000
3	Carol	03-11-2024	104	Monitor	2	300	600

Although it meets 1NF, the redundancy issue (repetition of CustomerName, OrderDate, etc.) still exists, so we need to move to 2NF.

Second Normal Form (2NF):

2NF requires that the table is in 1NF and that all non-key columns depend on the entire primary key (not just part of it).

Remove partial dependencies—non-key columns must depend on the entire primary key, not just part of it.

Example:

To move to 2NF, we need to split the table into two:

1. Orders Table (storing order-level information):

OrderID	CustomerName	OrderDate
1	Alice	01-11-2024
2	Bob	02-11-2024
3	Carol	03-11-2024

2. OrderDetails Table (storing product-level details for each order):

OrderID	ProductID	ProductName	Quantity	UnitPrice	TotalPrice
1	101	Laptop	2	1000	2000
1	102	Mouse	1	25	25
2	103	Keyboard	1	50	50
3	101	Laptop	1	1000	1000
3	104	Monitor	2	300	600

Now, the Orders table holds only order-level information, and the OrderDetails table holds product-level information. There is no longer a partial dependency, so the tables are in 2NF.

Third Normal Form (3NF):

3NF requires that the table is in 2NF and that there are no transitive dependencies. This means that non-key columns must not depend on other non-key columns.

Eliminate transitive dependencies, meaning non-key columns should not depend on other non-key columns.

Boyce-Codd Normal Form (BCNF):

Be in 3NF.

Every determinant (attribute that determines others) must be a candidate key.

To move to 3NF, we separate the ProductName and UnitPrice into a Products table:

Example:

Orders Table (remains the same):

OrderID	CustomerName	OrderDate
1	Alice	01-11-2024
2	Bob	02-11-2024
3	Carol	03-11-2024

2. OrderDetails Table (stores order-product relationships):

OrderID	ProductID	Quantity	TotalPrice
1	101	2	2000
1	102	1	25
2	103	1	50
3	101	1	1000
3	104	2	600

3. Products Table (stores product details):

ProductID	ProductName	UnitPrice
101	Laptop	1000
102	Mouse	25
103	Keyboard	50
104	Monitor	300

Now, the non-key attributes (ProductName, UnitPrice) are stored in the Products table, and they are no longer transitively dependent on OrderID. This structure satisfies 3NF.

Fourth Normal Form (4NF):

Be in BCNF. 4NF requires that the table is in 3NF and that it has no multi-valued dependencies.

Eliminate multi-valued dependencies, where one attribute determines a set of independent values.

Fifth Normal Form (5NF):

Be in 4NF.

Ensure no join dependency exists, and the table cannot be decomposed further without losing data.

Example:

1. Orders Table (order-level information remains the same):

ProductID	ProductName	UnitPrice
101	Laptop	1000
102	Mouse	25
103	Keyboard	50
104	Monitor	300

2. OrderDetails Table (remains unchanged):

OrderID	ProductID	Quantity	TotalPrice
1	101	2	2000
1	102	1	25
2	103	1	50
3	101	1	1000
3	104	2	600

3. Products Table (remains unchanged):

ProductID	ProductName	UnitPrice
101	Laptop	1000
102	Mouse	25
103	Keyboard	50
104	Monitor	300

4. Payments Table (stores payment methods for each order):

OrderID	PaymentMethod
1	Credit Card
1	PayPal
2	Credit Card
3	PayPal

5. DeliveryAddresses Table (stores delivery addresses for each order)

OrderID	DeliveryAddress
1	123 Main St, NY
2	456 Oak St, CA
3	789 Pine St, TX

By splitting multi-valued dependencies into separate tables, we satisfy 4NF.

Normalization Process

Identify primary keys.

Ensure attributes are atomic (no multi-valued data).

Remove partial and transitive dependencies.

Split tables as needed to remove redundancy and dependencies.

Conclusion:

We have now normalized our Order and OrderDetails tables from the initial unnormalized form to 5NF:

1NF: Ensure atomic values, each row is unique.

2NF: Eliminate partial dependencies.

3NF: Remove transitive dependencies.

BCNF: Ensure that every determinant is a candidate key.

4NF: Eliminate multi-valued dependencies.

Denormalization

Although normalization enhances data integrity, it can lead to performance issues due to the complexity of joins. In such cases, denormalization may be used to improve performance, sacrificing some redundancy in exchange for faster queries.

Search This Blog