UNION vs. UNION ALL

UNION and UNION ALL-

when 2 select statements have the sames set of columns, the rows from the 2 select statements can be appended one after another.

The difference being that UNION remove duplicate rows, while UNION ALL does not. Another difference between them is that UNION ALL is faster than UNION but may look slow because it returns more data which takes more time to travel via network.

This keyword is very well supported by all major database e.g. Oracle, Microsoft SQL Server, MySQL and PostgreSQL. To do this in SQL Server, it must build a temporary index on all the columns returned by both queries. If the index cannot be build for the queries, you will get a SQL error. In this situation, UNION ALL may be the solution if you don’t have rights to do this. Alternatively, you may need to use a temporary table with INSERT queries to store the results of each query and use a standard SELECT query to group and get the unique values.

Let us delve in more detail:

SQL UNION /UNION ALLSyntax

SELECT column_name(s) FROM table1

UNION <ALL>

SELECT column_name(s) FROM table2;

 

 

 

SQL for Count of duplicate values Multiple columns

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

OUTPUT:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

if you want the IDs of the dups use this:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name and y.email=dt.email

OUTPUT:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)

to delete the duplicates try:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name and y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1
select * FROM @YourTable

OUTPUT:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)

 

Importance of SQL performance tuning

SQL performance is vital for driving value from a data warehouse. With today’s growing query complexity, optimizing SQL can be a daunting task. This series provides a fundamental method for SQL tuning that has been used and refined through thousands of tuning exercises. The structure and process in this method generates a context that minimizes the complexities and gets to the root of the performance issues, which enables faster remediation. Further, the method provides a consistent, repeatable process that can be leveraged across the organization from the end user to database administrator. In addition to the structured method, the presenters will share insight on typically seen performance problems and provide sample remedies. This provides the reader with insight to commence their tuning efforts.

SQL tuning can be a daunting task. This method leads an analyst to diagnose & see the root cause of a performance issue in the process of developing a solution.
The reader will receive a proven tuning method that has been used to systematically diagnose & fix 1000s of SQL tuning issues.
SQL tuning diagnostics with typical problem types & sample remedies are provided in this series.

SQL to return the first record in a group

Lets say we have a log table containing a record of page views for an intranet:

DateStamp IpAddress Page
20/06/2011 192.168.0.10 home
21/06/2011 192.168.0.11 about
21/06/2011 192.168.0.10 home
22/06/2011 192.168.0.12 home
22/06/2011 192.168.0.10 home
23/06/2011 192.168.0.11 about

To query this table to show the earliest date when each page was viewed requires just a simple group-by query:

SELECT Page, min(DateStamp) as FirstViewed
FROM LogTable
GROUP BY Page

which will give:

Page FirstViewed
about 21/06/2011
home 20/06.2011

But if we wanted to include the IP address associated with each of those first page views, then a simple join will no longer suffice and our query becomes significantly more complex.

SELECT detail.Page, detail.DateStamp as FirstViewed, detail.IpAddress
FROM LogTable detail
INNER JOIN (
   SELECT Page, min(DateStamp) as FirstViewed
   FROM LogTable
   GROUP BY Page) inside
ON inside.Page = detail.Page
AND inside.FirstViewed = detail.DateStamp

This returns:

Page FirstViewed IpAddress
home 20/06/2011 192.168.0.10
about 21/06/2011 192.168.0.11

Delete Duplicate Records using SQL

By duplicate record I mean that every field in one record is identical to every field in a different record, i.e. a duplicate is where there is no way of telling two or more records apart. If you just need to remove records which are similar (i.e. one or more fields are identical but there are one or more fields which are different) then instead refer to how to delete similar records.

To check that you have duplicate records in your table do the following:

select count(*) from MyTable

and

select distinct * from MyTable

unfortunately SQL does not permit a select count(distinct). You have duplicate records if the number of records returned by the second query is less than the number of records returned by the first.
Unfortunately there is no way in SQL to delete one of these duplicates without deleting all of them. They are identical after all, so there is no SQL query that you could put together which could distinguish between them.
What you can do is to copy all the distinct records into a new table:

select distinct *

into NewTable
from MyTable

This query will create a new table (NewTable in my example) containing all the records in the original table but without any records being duplicated. It will therefore preserve a single copy of those records which were duplicated.
Because this query creates (and populates) a new table, it will fail if the table already exists.

SQL Self Join – All employees in same country

self-join is joining a table to itself

Example
A query to find all pairings of two employees in the same country is desired. If there were two separate tables for employees and a query which requested employees in the first table having the same country as employees in the second table, a normal join operation could be used to find the answer table. However, all the employee information is contained within a single large table.

Consider Employee table such as the following:

Employee Table
EmployeeID LastName Country DepartmentID
123 Rafferty Australia 31
124 Jones Australia 33
145 Steinberg Australia 33
201 Robinson United States 34
305 Smith Germany 34
306 John Germany NULL

An example solution query could be as follows:

SELECT F.EmployeeID, F.LastName, S.EmployeeID, S.LastName, F.Country

FROM Employee F
INNER JOIN Employee S ON F.Country = S.Country
WHERE F.EmployeeID < S.EmployeeID
ORDER BY F.EmployeeID, S.EmployeeID;
Which results in the following table being generated.







Employee Table after Self-join by Country
EmployeeIDLastNameEmployeeIDLastNameCountry
123Rafferty124JonesAustralia
123Rafferty145SteinbergAustralia
124Jones145SteinbergAustralia
305Smith306JohnGermany