Friday, 19 July 2013

SQL

1) What is SQL?
A) SQL stands for Structured Query Language. SQL is used to communicate with a database. It is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database or retrieving data from a database.
Examples of relational database management systems that use SQL are:
  • Oracle
  • Sybase
  • Microsoft SQL Server
  • Access
2) What are OLTP and OLAP?
OLTP stands for Online Transaction Processing and OLAP stands for Online Analytical Processing.
OLTP is transactional.
OLAP is analytical.
OLTP provides the source data for the data warehouse.
OLAP helps to analyse that data.
OLTP is characterized by a large number of short online transactions (INSERT, UPDATE, DELETE).
OLAP is characterized by a relatively low volume of transactions.
In OLTP, queries are simple and are processed quickly.
In OLAP, queries are complex and involve aggregations.
OLTP maintains data integrity in multi-access environments.
In OLTP, effectiveness is measured by the number of transactions per second.
In OLAP, response time is the measure of effectiveness.
OLTP holds detailed, current data, stored in a schema designed for a transactional database.
OLAP holds aggregated, historical data.
3) What is an RDBMS?
Relational Database Management Systems (RDBMS) are database management systems that maintain data records and indices in tables. Relationships may be created and maintained across and among the data and tables.
An RDBMS can recombine data items from different tables, which provides powerful tools for working with data.
Relational tables have the following five properties:
Values are atomic.
Column values are of the same kind.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
Each column must have a unique name.
4) What is normalization? 
In relational database design, the process of organizing data to minimize redundancy is called normalization.
Normalization usually involves dividing database data into different tables and defining relationships between the tables. Database normalization is a data design and organizational process applied to data structures based on rules that help build relational databases.
The two main goals of normalization are eliminating redundant data and ensuring that data dependencies are logical, so that each piece of data is stored only in the table it describes.
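5) What are the different normalization forms?
1NF: Eliminate repeating groups (every column holds a single, atomic value).
2NF: Eliminate redundant data (no non-key column may depend on only part of a composite primary key).
3NF: Eliminate columns not dependent on the primary key (no non-key column may depend on another non-key column).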
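To make normalization concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of SQL Server. The tables and data are hypothetical. In the flat table, each customer's city is repeated on every order. Moving customer attributes into their own table removes that redundancy, and a join rebuilds the flat view when needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Un-normalized: the customer's city is repeated on every order row.
cur.execute("CREATE TABLE orders_flat (order_id INTEGER, customer TEXT, city TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Pune", 250.0), (2, "Asha", "Pune", 100.0), (3, "Ravi", "Delhi", 75.0)],
)

# Normalized: customer attributes are stored once, and orders reference them by key.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), amount REAL)"
)
cur.execute("INSERT INTO customers (name, city) SELECT DISTINCT customer, city FROM orders_flat")
cur.execute(
    "INSERT INTO orders (order_id, customer_id, amount) "
    "SELECT f.order_id, c.customer_id, f.amount "
    "FROM orders_flat f JOIN customers c ON c.name = f.customer"
)

# Moving Asha to another city is now a one-row update, not one update per order.
updated = cur.execute("UPDATE customers SET city = 'Mumbai' WHERE name = 'Asha'").rowcount

# A join reconstructs the flat view on demand.
rows = cur.execute(
    "SELECT o.order_id, c.name, c.city, o.amount "
    "FROM orders o JOIN customers c ON c.customer_id = o.customer_id ORDER BY o.order_id"
).fetchall()
print(rows)
```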
6) What is de-normalization?
De-normalization is the process of attempting to optimize the performance of a database by adding redundant data.
De-normalizing the database design reduces the number of joins and foreign key lookups needed at query time.
This method is commonly used for reporting and OLAP workloads.
7) SQL Commands:
SQL commands are instructions used to communicate with the database to perform specific tasks on data. SQL commands can be used not only for searching the database but also for various other functions. For example, you can create tables, add data to tables, modify data, drop tables, and set permissions for users. SQL commands are grouped into four major categories depending on their functionality:
Data Definition Language (DDL) - These SQL commands are used for creating, modifying, and dropping the structure of database objects. The commands are CREATE, ALTER, DROP, RENAME, and TRUNCATE.
Data Manipulation Language (DML) - These SQL commands are used for storing, retrieving, modifying, and deleting data. These commands are SELECT, INSERT, UPDATE, and DELETE.
Transaction Control Language (TCL) - These SQL commands are used for managing changes affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.
Data Control Language (DCL) - These SQL commands are used for providing security to database objects. These commands are GRANT and REVOKE.
The CREATE TABLE Statement is used to create tables to store data.
CREATE TABLE table_name
(column_name1 datatype,
column_name2 datatype,
… column_nameN datatype
);
The INSERT Statement is used to add new rows of data to a table.
INSERT INTO TABLE_NAME
[ (col1, col2, col3,...colN)]
VALUES (value1, value2, value3,…valueN);
The UPDATE Statement is used to modify the existing rows in a table.
UPDATE table_name
SET column_name1 = value1,
column_name2 = value2, ...
[WHERE condition]
NOTE: In the UPDATE statement, the WHERE clause identifies the rows that are affected. If you omit the WHERE clause, the column values in all rows are changed.
The DELETE Statement is used to delete rows from a table.
DELETE FROM table_name [WHERE condition];
TRUNCATE statement: This command is used to delete all the rows from a table and free the space those rows occupied. The table structure itself remains.
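The statements above can be tried end to end in a small sketch. It uses Python's sqlite3 module rather than SQL Server, and the `student` table is hypothetical. The sketch shows that an UPDATE without a WHERE clause touches every row. SQLite has no TRUNCATE statement; a DELETE without a WHERE clause is its closest equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: define the structure.
conn.execute("CREATE TABLE student (st_id INTEGER, name TEXT, marks INTEGER)")

# INSERT: add rows (the column list is optional when every column is supplied).
conn.executemany(
    "INSERT INTO student (st_id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 60), (2, "Ravi", 70), (3, "Meena", 80)],
)

# UPDATE with WHERE: only the matching row changes.
one = conn.execute("UPDATE student SET marks = 65 WHERE st_id = 1").rowcount

# UPDATE without WHERE: every row changes.
all_rows = conn.execute("UPDATE student SET marks = marks + 5").rowcount

# DELETE with WHERE: only rows meeting the condition are removed.
deleted = conn.execute("DELETE FROM student WHERE marks < 75").rowcount

remaining = conn.execute("SELECT st_id, name, marks FROM student ORDER BY st_id").fetchall()
print(one, all_rows, deleted, remaining)
```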
8) Difference between delete and truncate?
Once a transaction is committed, neither DELETE nor TRUNCATE can be rolled back. In SQL Server, both can be rolled back if they run inside an explicit transaction that has not yet been committed. In Oracle, TRUNCATE is a DDL statement that commits implicitly, so it cannot be rolled back.
DELETE removes rows one at a time and can take a WHERE condition. TRUNCATE removes all the data in a table but leaves the table structure in place.
TRUNCATE is faster and uses fewer system and transaction log resources, because it deallocates the table's data pages instead of logging each deleted row as DELETE does.
DELETE does not reset the identity column of the table, while TRUNCATE resets the identity column to its seed value.
NOTE:: If you want to remove the table definition as well as its data, use the DROP TABLE statement.
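The rollback rule can be checked with a short sketch in Python's sqlite3 module; the table `t` is hypothetical. SQLite has no TRUNCATE, so only the DELETE case is shown. In SQL Server, wrapping TRUNCATE TABLE in BEGIN TRAN ... ROLLBACK behaves the same way. The connection is opened in autocommit mode (`isolation_level=None`) so that BEGIN, COMMIT and ROLLBACK are issued explicitly.

```python
import sqlite3

# isolation_level=None: autocommit mode, so transactions are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

def row_count():
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Inside an open transaction, a DELETE can still be undone...
conn.execute("BEGIN")
conn.execute("DELETE FROM t")
during = row_count()          # 0: the delete is visible inside the transaction
conn.execute("ROLLBACK")
after_rollback = row_count()  # 3: the rows are back

# ...but once the transaction is committed, there is nothing left to roll back.
conn.execute("BEGIN")
conn.execute("DELETE FROM t WHERE id = 3")
conn.execute("COMMIT")
after_commit = row_count()    # 2
```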
9) Difference between DROP and TRUNCATE Statement:
If a table is dropped, all its relationships with other tables are no longer valid, its integrity constraints are dropped, and the grants or access privileges on the table are dropped too. To use the table again, it has to be recreated, and its integrity constraints, access privileges and relationships with other tables have to be established again. If a table is truncated, however, the table structure remains the same, so the constraints and relationships defined on its columns still exist.
10) What are primary keys and foreign keys?
A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table should have a primary key constraint to uniquely identify each row. Only one primary key constraint can be created per table, and primary key columns cannot be NULL. Primary key constraints are used to enforce entity integrity.
NOTE:: It is not possible to change the length of a column defined with a PRIMARY KEY constraint. If you need to change the length, you must first drop the existing PRIMARY KEY constraint and then re-create it with the new definition.
A FOREIGN KEY constraint prevents any actions that would destroy links between tables with the corresponding data values. A foreign key in one table points to a primary key in another table. Foreign keys prevent actions that would leave rows whose foreign key value has no matching primary key value. Foreign key constraints are used to enforce referential integrity.
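Here is a minimal sketch of both constraints in action, using Python's sqlite3 module with hypothetical `department` and `employee` tables. One SQLite-specific assumption applies: foreign keys are enforced only after `PRAGMA foreign_keys = ON`. SQL Server enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # valid: department 10 exists

errors = []
for sql in (
    "INSERT INTO department VALUES (10, 'Duplicate')",  # duplicate primary key value
    "INSERT INTO employee VALUES (2, 'Ravi', 99)",      # no department 99: would be an orphan
    "DELETE FROM department WHERE dept_id = 10",        # would orphan employee 1
):
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)
```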
11) What is a UNIQUE KEY Constraint?
A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no duplicate values are entered. Like primary key constraints, unique key constraints are used to enforce entity integrity.
12) What’s the difference between a primary key and a unique key?
Both a primary key and a unique key enforce the uniqueness of the column(s) on which they are defined. However, by default a primary key creates a clustered index on the column, whereas a unique key creates a non-clustered index. Another major difference is that a primary key doesn't allow NULLs, while a unique key allows one NULL only.
NOTE:: A primary key is also a unique key internally, but it cannot allow NULLs. In SQL Server, a unique key allows a single NULL but not multiple NULLs in its columns. Some other systems, such as Oracle and PostgreSQL, allow multiple NULLs in a unique column.
13) What is a CHECK constraint?
A CHECK constraint is used to limit the values that can be placed in a column. CHECK constraints are most often used to enforce domain integrity. (Read more here http://bit.ly/sqlinterview17)
14) What is a NOT NULL constraint?
A not null constraint enforces that the column will not accept null values. Not null constraints are used to enforce domain integrity.
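The UNIQUE, CHECK and NOT NULL constraints can be seen together in one table. This is a minimal sketch in Python's sqlite3 module, and the `account` table is hypothetical. Note one behaviour difference: SQLite's UNIQUE allows multiple NULLs, unlike SQL Server, so the sketch only tests a duplicate non-NULL value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE account (
           acc_no  INTEGER PRIMARY KEY,
           email   TEXT UNIQUE,                        -- no duplicate e-mail addresses
           balance REAL NOT NULL CHECK (balance >= 0)  -- domain rule: never negative
       )"""
)
conn.execute("INSERT INTO account VALUES (1, 'a@example.com', 100)")

def try_insert(row):
    """Return 'ok', or the constraint error message raised by SQLite."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    "unique": try_insert((2, "a@example.com", 50)),   # duplicate email
    "check": try_insert((3, "c@example.com", -5)),    # negative balance
    "not_null": try_insert((4, "d@example.com", None)),  # missing balance
    "valid": try_insert((5, "e@example.com", 0)),
}
print(results)
```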
Define candidate key, alternate key, composite key.
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them becomes the primary key, and the rest are called alternate keys. A key formed by combining two or more columns is called a composite key.
15) What is Identity?
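Identity (or AutoNumber) is a column property that automatically generates numeric values. A start (seed) and increment value can be set, but most DBAs leave both at 1. A GUID (uniqueidentifier) column can also be filled automatically, for example by a NEWID() default, but its values cannot be controlled the way an identity seed and increment can. Neither an identity nor a GUID column is indexed or guaranteed unique by itself; add a PRIMARY KEY or UNIQUE constraint when you need that.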
Note:: TRUNCATE TABLE resets the IDENTITY column to its base value. The DELETE command doesn’t do this.
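16) What are the different types of data types in SQL Server?
Commonly used SQL Server data types include:
int
smallint
bigint
tinyint
float
double precision (a synonym for float(53))
decimal
money
char
varchar
nvarchar
nchar
varbinary
varbinary(max)
smalldatetime
datetime
ntext
numeric
sql_variant
uniqueidentifier
nvarchar(max)
BLOB and CLOB are generic names for binary and character large objects. SQL Server has no types with those names; it uses varbinary(max) and varchar(max)/nvarchar(max) instead.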
17) SQL Operators
There are two types of operators, namely comparison operators and logical operators. These operators are used mainly in the WHERE and HAVING clauses to filter the data to be selected.
Comparison operators are used to compare column data with specific values in a condition: =, != or <>, <, >, <=, >=.
18) Logical operators:
There are three logical operators, namely AND, OR and NOT. AND and OR combine two conditions to determine whether a row can be selected for the output, while NOT negates a condition. When retrieving data using a SELECT statement, you can use logical operators in the WHERE clause, which allows you to combine more than one condition.
OR operator:: If you want to select rows that satisfy at least one of the given conditions, you can use the logical operator OR.
For example, to find the names of students who are studying either Maths or Science, the query would be:
SELECT first_name, last_name, subject FROM student_details
WHERE subject = 'Maths' OR subject = 'Science';
AND operator:: If you want to select rows that must satisfy all the given conditions, you can use the logical operator AND.
For example, to find the names of the students aged between 10 and 15, the query would be:
SELECT first_name, last_name, age FROM student_details
WHERE age >= 10 AND age <= 15;
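The OR and AND queries above, plus NOT, can be run against a small sample table. This is a sketch using Python's sqlite3 module; the rows in `student_details` are made up for illustration. The WHERE fragments are fixed literals in this demo. They are interpolated into the SQL string, which should never be done with user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_details (first_name TEXT, last_name TEXT, subject TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?, ?, ?)",
    [("Anil", "Rao", "Maths", 12), ("Bina", "Shah", "Science", 16),
     ("Chetan", "Iyer", "History", 14), ("Divya", "Nair", "Maths", 9)],
)

def names(where: str) -> list:
    """Run a SELECT with the given (trusted, literal) WHERE condition."""
    sql = f"SELECT first_name FROM student_details WHERE {where} ORDER BY first_name"
    return [row[0] for row in conn.execute(sql)]

either_subject = names("subject = 'Maths' OR subject = 'Science'")  # at least one condition true
age_10_to_15 = names("age >= 10 AND age <= 15")                       # both conditions true
not_maths = names("NOT subject = 'Maths'")                            # condition negated
print(either_subject, age_10_to_15, not_maths)
```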
19) Difference between where clause and having clause?
Aggregate functions cannot be used in the WHERE clause but can be used in the HAVING clause.
The WHERE clause can be used in SELECT, UPDATE and DELETE statements, while HAVING can be used only in SELECT statements.
HAVING is normally used with GROUP BY to filter groups after they have been aggregated, whereas WHERE is applied to each row before the rows are grouped by GROUP BY.
20) What is Group by function?
The SQL GROUP BY clause is used along with aggregate (group) functions to retrieve data grouped according to one or more columns.
The GROUP BY clause should contain all the columns in the select list except those used inside aggregate functions.
21) What is order by function?
The ORDER BY clause is used in a SELECT statement to sort results either in ascending or descending order.
NOTE: In SQL Server, the ORDER BY columns must appear in the SELECT list when SELECT DISTINCT or UNION is used. Otherwise, you can also sort by columns of the source tables that are not selected.
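The order in which WHERE, GROUP BY, HAVING and ORDER BY act can be seen in a single query. This is a sketch using Python's sqlite3 module with a hypothetical `sales` table: WHERE drops rows first, GROUP BY aggregates what is left, HAVING drops whole groups, and ORDER BY sorts the final result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "pen", 100), ("North", "book", 300), ("South", "pen", 50),
     ("South", "book", 40), ("East", "pen", 500)],
)

query = """
    SELECT region, SUM(amount) AS total      -- region is grouped, amount is aggregated
    FROM sales
    WHERE product <> 'book'                   -- WHERE filters rows before grouping
    GROUP BY region
    HAVING SUM(amount) >= 100                 -- HAVING filters groups after aggregation
    ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
print(result)
```

Moving the `product <> 'book'` test into HAVING would not work, because after grouping there is no single product value per group to compare.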
22) What is a join and explain different types of joins?
A join combines rows from two or more tables based on a related column between them.
Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINS, RIGHT OUTER JOINS and FULL OUTER JOINS.
23) Inner Join
A join that displays only the rows that have a match in both joined tables is known as inner Join.
This is the default type of join in the Query and View Designer.
24) Outer Join
A join that includes rows even if they do not have related rows in the joined table is an outer join. You can create three different outer joins to specify which unmatched rows are included:
Left Outer Join: It returns all the records from the left table (the one that appears first in the JOIN clause) and substitutes NULL values for the right table's columns where there is no matching record.
Right Outer Join: It returns all the records from the right table (the one that appears last in the JOIN clause) and substitutes NULL values for the left table's columns where there is no matching record.
Full Outer Join: All rows from both joined tables are included, whether they are matched or not.
25) Cross join
A cross join that does not have a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table.
26) Self Join
This is a particular case in which a table is joined to itself, using one or two aliases to avoid confusion. A self join can be of any type, as long as the joined tables are the same. A common example is a company with a hierarchical reporting structure in which one member of staff reports to another. A self join can be an outer join or an inner join.
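The join types above can be compared on the same two small tables. This is a sketch using Python's sqlite3 module with hypothetical `emp` and `dept` tables. RIGHT and FULL OUTER JOIN require SQLite 3.39 or later, so the sketch sticks to INNER, LEFT OUTER, CROSS and a self join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, manager_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'HR'), (3, 'IT');
    INSERT INTO emp VALUES (1, 'Asha', 1, NULL), (2, 'Ravi', 1, 1), (3, 'Meena', NULL, 1);
""")

def q(sql):
    return conn.execute(sql).fetchall()

# INNER JOIN: only employees whose dept_id matches a department.
inner = q("SELECT e.name, d.dept_name FROM emp e INNER JOIN dept d "
          "ON e.dept_id = d.dept_id ORDER BY e.emp_id")

# LEFT OUTER JOIN: every employee, with NULL where no department matches.
left = q("SELECT e.name, d.dept_name FROM emp e LEFT OUTER JOIN dept d "
         "ON e.dept_id = d.dept_id ORDER BY e.emp_id")

# CROSS JOIN: Cartesian product, 3 employees x 3 departments = 9 rows.
cross_count = q("SELECT COUNT(*) FROM emp CROSS JOIN dept")[0][0]

# SELF JOIN: emp joined to itself through two aliases to pair staff with managers.
self_join = q("SELECT e.name, m.name FROM emp e JOIN emp m "
              "ON e.manager_id = m.emp_id ORDER BY e.emp_id")
print(inner, left, cross_count, self_join)
```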
27) What is Sub-query?
Subquery or Inner query or Nested query is a SELECT statement that is nested within another T-SQL statement.
A subquery SELECT statement can return any number of values. It can appear in the column list of a SELECT statement, or in the FROM, WHERE, GROUP BY, HAVING and/or ORDER BY clauses of a T-SQL statement.
A subquery is called a correlated subquery when the inner query refers to columns of the outer query, so the inner query is evaluated once for each row of the outer query.
-- EXISTS is used rather than "=", because the inner query can return several rows for one product
SELECT p.product_name FROM product p
WHERE EXISTS (SELECT 1 FROM order_items o
              WHERE o.product_id = p.product_id);
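The correlated subquery can be run against sample data in a Python sqlite3 sketch; the `product` and `order_items` rows are hypothetical. 'Pen' appears in two order rows. That is exactly the case where comparing with "=" against the subquery fails in SQL Server ("Subquery returned more than 1 value"). The second query shows a correlated subquery used in the column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, qty INTEGER);
    INSERT INTO product VALUES (1, 'Pen'), (2, 'Book'), (3, 'Lamp');
    INSERT INTO order_items VALUES (100, 1, 2), (101, 1, 5), (102, 2, 1);
""")

# Correlated: the inner query refers to p.product_id, so it is evaluated per outer row.
ordered = conn.execute("""
    SELECT p.product_name FROM product p
    WHERE EXISTS (SELECT 1 FROM order_items o WHERE o.product_id = p.product_id)
    ORDER BY p.product_id
""").fetchall()

# A correlated scalar subquery in the column list: total quantity ordered per product.
totals = conn.execute("""
    SELECT p.product_name,
           (SELECT COALESCE(SUM(o.qty), 0) FROM order_items o
            WHERE o.product_id = p.product_id) AS total_qty
    FROM product p ORDER BY p.product_id
""").fetchall()
print(ordered, totals)
```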
28) Properties of sub-query:
• A subquery must be enclosed in parentheses.
• A subquery is usually placed on the right-hand side of the comparison operator.
• A subquery cannot contain an ORDER BY clause (in SQL Server, unless TOP, OFFSET or FOR XML is also specified).
• A query can contain more than one subquery.
• A subquery can also be used as a parameter to a function call.
29) Difference between join and subquery?
A join combines two or more tables on a field related to both tables (i.e., a primary key/foreign key relationship). It is typically easy to understand.
A subquery is a SELECT statement nested inside another statement. The outer query's result depends on the values that the subquery returns.
30) What is Union and Union all? What is difference between them?
UNION: If you need to combine two or more result sets into a single result set, use UNION. The number of columns must match, and their data types must be compatible.
UNION removes duplicate rows from the combined result set, while UNION ALL keeps them. Because UNION ALL skips the duplicate check, it is usually faster.
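The duplicate handling of UNION and UNION ALL is easiest to see side by side. This is a sketch using Python's sqlite3 module with two hypothetical single-column tables that share the value 'book'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_a (item TEXT);
    CREATE TABLE store_b (item TEXT);
    INSERT INTO store_a VALUES ('pen'), ('book');
    INSERT INTO store_b VALUES ('book'), ('lamp');
""")

# UNION removes duplicates from the combined result.
union_rows = conn.execute(
    "SELECT item FROM store_a UNION SELECT item FROM store_b ORDER BY item"
).fetchall()

# UNION ALL keeps every row, so 'book' appears twice.
union_all_rows = conn.execute(
    "SELECT item FROM store_a UNION ALL SELECT item FROM store_b ORDER BY item"
).fetchall()
print(union_rows, union_all_rows)
```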
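Difference between join and Union?
A join combines columns from two or more tables side by side, matching rows on related values. UNION appends the rows of one result set to another, so the result has the same columns but more rows.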
31) What is Variable?
A variable is an object that can hold a single data value of a specific type.
When a variable is first declared, its value is set to NULL. To assign a value to a variable, use the SET statement.
A variable can also have a value assigned by being referenced in a select list.
Syntax:
Declare @variable_name datatype
Set @variable_name = value
EX: Declare @st_id int
Set @st_id = 5
Select *
From student where st_id=@st_id
32) What are local variables and global variable?
A local variable must be declared (using DECLARE or as a formal parameter) before use. A local variable can be set using either the SET command or the SELECT command.
Global variables are system-supplied, predefined variables. They are distinguished from local variables by the two @ signs preceding their names, for example @@error. The two @ signs are considered part of the identifier used to define the global variable.
Users cannot create global variables and cannot update their values directly in a SELECT statement. If a user declares a local variable that has the same name as a global variable, that variable is treated as a local variable.
The difference is that a global variable is system-defined (predefined), while a local variable is user-defined and must be declared before it is used. Local variables are declared with a single @ preceding their names, while global variables have two @ signs preceding their names.
Some of the global variables are: @@error, @@rowcount
33) What are string functions?
These are system-defined functions.
LEFT – returns the leftmost n characters of a string, for example to extract the first four characters of a column. (Not to be confused with LTRIM, which removes leading spaces.)
EX: select customer_name, LEFT(customer_name, 4) from customer
RIGHT – returns the rightmost n characters of a string, for example to extract the last four characters of a column. (Not to be confused with RTRIM, which removes trailing spaces.)
EX: select customer_name, RIGHT(customer_name, 4) from customer
LEN – returns the number of characters in a string, excluding trailing spaces.
Ex: select customer_name, LEN(customer_name) from customer
DATALENGTH – returns the number of bytes used to store a value, including trailing spaces.
Ex: select customer_name, DATALENGTH(customer_name), LEN(customer_name) from customer
34) Difference between length and data length?
LEN does not count trailing spaces, while DATALENGTH counts every byte stored, including trailing spaces. For nvarchar values, DATALENGTH is also typically twice the character count, because nvarchar uses two bytes per character.
Ex: select customer_name, DATALENGTH(customer_name), LEN(customer_name) from customer
where DATALENGTH(customer_name) <> LEN(customer_name)
SUBSTRING – extracts part of a string, given a 1-based start position and a length. The example below returns four characters starting at position 5.
Ex: select customer_name, SUBSTRING(customer_name, 5, 4) from customer
CHARINDEX – returns the position of the first occurrence of one string within another (0 if it is not found).
Ex: select CHARINDEX('-', 'raja-testing')   -- returns 5
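The following Python sketch models the documented behaviour of these T-SQL functions, so that the LEN versus DATALENGTH difference can be checked without a server. The helper names (`t_left`, `t_len`, and so on) are hypothetical. Positions are 1-based as in T-SQL. DATALENGTH is modelled on the assumption of one byte per character for varchar and two for nvarchar. Edge cases such as NULL inputs or a start position below 1 are not modelled.

```python
def t_left(s: str, n: int) -> str:
    """LEFT(s, n): the first n characters."""
    return s[:n]

def t_right(s: str, n: int) -> str:
    """RIGHT(s, n): the last n characters (s[-0:] would return everything, hence the guard)."""
    return s[-n:] if n > 0 else ""

def t_len(s: str) -> int:
    """LEN(s): character count, ignoring trailing spaces."""
    return len(s.rstrip(" "))

def t_datalength(s: str, unicode: bool = False) -> int:
    """DATALENGTH(s): bytes stored, trailing spaces included.
    Assumes 1 byte/char for varchar and 2 bytes/char for nvarchar."""
    return len(s) * (2 if unicode else 1)

def t_substring(s: str, start: int, length: int) -> str:
    """SUBSTRING(s, start, length) with a 1-based start (assumes start >= 1)."""
    return s[start - 1:start - 1 + length]

def t_charindex(find: str, s: str) -> int:
    """CHARINDEX(find, s): 1-based position of the first match, 0 if absent."""
    return s.find(find) + 1
```

For example, `t_len("abc  ")` is 3 while `t_datalength("abc  ")` is 5, because only DATALENGTH counts the two trailing spaces.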
35) What are user defined function?
User-defined functions let you define your own T-SQL functions that accept zero or more parameters and return either a single scalar value or a table.
Scalar-valued function: a function that returns a single value as its result.
Parameters are passed in as inputs.
It is typically called in the SELECT list.
create function dbo.fun_name (@a int, @b int, @op char(1))
returns int
as
begin
    declare @output int
    -- illustrative body (an assumption; the original showed only a placeholder select statement)
    select @output = case @op when '+' then @a + @b
                              when '-' then @a - @b
                              when '*' then @a * @b
                              when '/' then @a / nullif(@b, 0)
                     end
    return @output
end
Ex: select dbo.fun_name(6, 3, '*')   -- returns 18 with the body above
Table-valued function: a function that returns its result set as a table.
It is called in the FROM clause.
create function dbo.tab_months()
returns table
as
return
(
    select stmnts   -- placeholder: an inline table-valued function body is a single SELECT statement
)
Ex: select * from dbo.tab_months()
The difference between scalar-valued and table-valued functions is that a scalar-valued function returns a single value and is called in the SELECT list, whereas a table-valued function returns a table and is called in the FROM clause. Both kinds can take input parameters.
36) What is Difference between Function and Stored Procedure?
UDFs can be used in SQL statements anywhere in the WHERE/HAVING/SELECT sections, whereas stored procedures cannot be. UDFs that return tables can be treated as another rowset, so they can be used in JOINs with other tables. Inline UDFs can be thought of as views that take parameters and can be used in JOINs and other rowset operations.
37) What is a View?
A simple view can be thought of as a subset of a table. It can be used for retrieving data, as well as updating or deleting rows. The results of using a view are not permanently stored in the database. Rows updated or deleted in the view are updated or deleted in the table the view was created with. It should also be noted that as data in the original table changes, so does data in the view, as views are the way to look at part of the original table.
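The point that a view stores its definition, not a copy of the data, can be demonstrated in a Python sqlite3 sketch; the `employee` table and `it_staff` view are hypothetical. The view exposes a subset of rows and columns (salary is hidden), and a change to the base table shows up the next time the view is queried. Unlike a simple SQL Server view, an SQLite view is read-only, so updating or deleting through the view is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Asha', 'IT', 50000), (2, 'Ravi', 'HR', 40000);
    -- The view stores only its SELECT definition, not a copy of the rows.
    CREATE VIEW it_staff AS SELECT emp_id, name FROM employee WHERE dept = 'IT';
""")

before = conn.execute("SELECT name FROM it_staff ORDER BY emp_id").fetchall()

# Change the base table, and the view reflects it the next time it is queried.
conn.execute("INSERT INTO employee VALUES (3, 'Meena', 'IT', 55000)")
after = conn.execute("SELECT name FROM it_staff ORDER BY emp_id").fetchall()
print(before, after)
```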
38) What is a stored procedure?
A stored procedure (SP) is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. Stored procedures reduce network traffic and improve performance.
Code reuse and improved security against SQL injection are some of the advantages of using SPs.
39) What is a Trigger?
A trigger is a SQL procedure that initiates an action when an event (like INSERT, DELETE or UPDATE) occurs on an object.
Triggers are stored in and managed by the DBMS.
Triggers can be used to maintain the referential integrity of data by changing the data in a systematic way.
A trigger cannot be called or executed directly.
The DBMS automatically fires the trigger as a result of a data modification to the associated table.
A trigger is called a nested trigger when it is fired off from another trigger.
Triggers are similar to stored procedures in that both consist of procedural logic that is stored at the database level.
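A common use of an AFTER UPDATE trigger is an audit log. Here is a minimal sketch in Python's sqlite3 module; the `account` and `audit_log` tables and the trigger name are hypothetical. The code never calls the trigger; the UPDATE statements fire it. One difference: SQLite supports only row-level (FOR EACH ROW) triggers, while SQL Server triggers fire once per statement and read the `inserted` and `deleted` pseudo-tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (acc_no INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Fires automatically after every UPDATE of balance, and is never called directly.
    CREATE TRIGGER trg_account_audit
    AFTER UPDATE OF balance ON account
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.acc_no, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100), (2, 200);
""")

conn.execute("UPDATE account SET balance = balance - 30 WHERE acc_no = 1")
conn.execute("UPDATE account SET balance = balance + 10")  # touches both rows
log = conn.execute("SELECT acc_no, old_balance, new_balance FROM audit_log").fetchall()
print(log)
```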
40) Difference between stored procedures and triggers?
Stored procedures are not event-driven and, unlike most triggers, are not attached to a specific table.
Stored procedures are explicitly executed by invoking a call to the procedure while triggers are implicitly executed by events.
In addition, triggers can also execute stored procedures.
41) What are the different types of triggers?
There are three types of triggers.
1) DML trigger
There are two kinds of DML triggers
a. Instead of Trigger
Instead of Triggers are fired in place of the triggering action such as an insert, update, or delete.
b. After Trigger
After triggers execute following the triggering action, such as an insert, update, or delete.
2) DDL trigger
This type of trigger is fired against DDL statements like Drop Table, Create Table, or Alter Table. DDL Triggers are always after Triggers.
3) Logon trigger
This type of trigger is fired against a LOGON event before a user session is established to the SQL Server.
42) What is a linked server?
A linked server configuration enables SQL Server to execute commands against OLE DB data sources on remote servers. With a linked server, you can create very clean, easy-to-follow SQL statements that allow remote data to be retrieved, joined, and combined with local data. The ability to issue distributed queries and perform commands with transactions on heterogeneous sources is one of the benefits of using linked servers.
The system supplied stored procedures sp_addlinkedserver and sp_addlinkedsrvlogin are used to add new linked server(s). The stored procedure sp_linkedservers is used to list all the linked servers defined on the server.
43) What is a cursor?
Cursors are used to hold data temporarily so that some logic can be applied to it. A cursor takes the data from a table and processes the records one at a time.
A cursor is a database object used by applications in procedural logic to manipulate data on a row-by-row basis, instead of the typical SQL commands that operate on all or parts of rows as sets of data.
A cursor is needed when an operation must be applied to the data one row at a time.
In order to work with a cursor, we need to perform these steps in the following order:
Declare a cursor
Open the cursor
Fetch a row from the cursor
Process the fetched row
Close cursor
Deallocate the cursor
44) What is Index?
An index is a physical structure containing pointers to the data. Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. The users cannot see the indexes; they are just used to speed up queries.
What is the difference between a clustered and a non-clustered index?
A clustered index is a special type of index that determines the order in which the records in the table are physically stored. Therefore, a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf nodes of a non-clustered index do not consist of the data pages. Instead, the leaf nodes contain index rows.
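The effect of an index on a lookup can be observed with SQLite's EXPLAIN QUERY PLAN. This is a sketch in Python's sqlite3 module, not SQL Server, where you would inspect the execution plan instead. The `customer` table and the index name are hypothetical. The exact plan wording varies between SQLite versions, so the sketch only checks whether the index name appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (customer_name, city) VALUES (?, ?)",
    [(f"cust{i}", "Pune" if i % 2 else "Delhi") for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable 'detail' string.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT customer_id FROM customer WHERE customer_name = 'cust42'"
plan_before = plan(query)   # full table scan
conn.execute("CREATE INDEX idx_customer_name ON customer (customer_name)")
plan_after = plan(query)    # search through the new index
print(plan_before, "|", plan_after)
```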
45) How to Enable/Disable Indexes?
-- Disable index
ALTER INDEX [IndexName] ON TableName DISABLE
GO
-- Enable index
ALTER INDEX [IndexName] ON TableName REBUILD
GO
46) Data model:
It is a type of data abstraction. It is a set of concepts that can be used to describe the structure of a database.
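47) What is an isolation level in SQL Server?
Isolation levels control how transactions in SQL Server are isolated from one another, i.e., they are used to manage concurrency across transactions. Each level determines which concurrency side effects, such as dirty reads, non-repeatable reads and phantom reads, a transaction can experience.
The isolation levels are:
-Read uncommitted
-Read committed
-Repeatable read
-Snapshot
-Serializable
What is the default isolation level for SQL Server?
Read committed
@@ERROR?
Returns the error number of the last T-SQL statement executed (0 if it succeeded). It is commonly checked in stored procedures.
RAISERROR?
Generates a user-defined error message and returns it to the caller.
@@ROWCOUNT?
Returns the number of rows affected by the last statement.
Stored procedures, functions and triggers can be nested up to 32 levels.
How many columns can an INSERT statement reference?
4,096 (a regular, non-wide table itself can have at most 1,024 columns).
A single INSERT ... VALUES statement can insert at most 1,000 rows.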
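48) TABLESAMPLE in SQL Server?
It is used to get a sample of rows from a table named in the FROM clause. The sample size can be given as a percentage (or as a number of rows). Because sampling works on data pages, the number of rows returned is approximate.
Syntax::
SELECT * FROM table_name TABLESAMPLE (10 PERCENT);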
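49) What is a CTE?
A CTE (common table expression) is a temporary named result set, defined with the WITH keyword, that exists only while the single statement that uses it runs. CTEs were introduced in SQL Server 2005.
It is mainly used for handling hierarchical and recursive data.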
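A recursive CTE that walks a reporting hierarchy is the classic example. This is a sketch using Python's sqlite3 module with a hypothetical `emp` table. SQLite requires the keyword WITH RECURSIVE. In T-SQL the same query is written with plain WITH, and the anchor and recursive members are likewise combined with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO emp VALUES (1, 'CEO', NULL), (2, 'VP', 1), (3, 'Manager', 2), (4, 'Engineer', 3);
""")

chain = conn.execute("""
    WITH RECURSIVE org(emp_id, name, level) AS (
        SELECT emp_id, name, 0 FROM emp WHERE manager_id IS NULL   -- anchor member: the top
        UNION ALL
        SELECT e.emp_id, e.name, o.level + 1                       -- recursive member
        FROM emp e JOIN org o ON e.manager_id = o.emp_id
    )
    SELECT name, level FROM org ORDER BY level
""").fetchall()
print(chain)
```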
50) What is difference between CTE and View?
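A CTE is not stored in the database; it exists only for the duration of the statement that defines it. A view is a database object whose definition is stored in the database and can be reused by many queries. The view's data itself is not stored, unless it is an indexed view.
Write a query to get the number of rows in a table without using COUNT.
The catalog view sys.partitions has a rows column that holds the row count of each partition of a table. Filter sys.partitions by the table's object_id (for example, from OBJECT_ID('table_name')) and by index_id 0 (heap) or 1 (clustered index), then sum the rows column to get the table's row count. SQL Server documents this value as approximate.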
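51) What are dynamic queries?
Dynamic queries are queries whose SQL text is built at run time, typically by concatenating strings held in variables, so their logic can change at run time. In SQL Server the resulting string is executed with EXEC or sp_executesql.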
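The same idea can be shown from client code. This is a Python sqlite3 sketch; the `student` table and `find_students` helper are hypothetical. The filter column is chosen at run time, and that part of the SQL text is built dynamically. Column names cannot be passed as bound parameters, so they are checked against a whitelist, while the value is still bound with `?`. In T-SQL, sp_executesql with parameters plays the same role as the bound value here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (st_id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meena", "Pune")],
)

ALLOWED_COLUMNS = {"st_id", "name", "city"}  # identifiers cannot be bound as parameters

def find_students(filter_column: str, value):
    """Build the query text at run time, choosing the filter column dynamically."""
    if filter_column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {filter_column}")
    sql = f"SELECT name FROM student WHERE {filter_column} = ? ORDER BY st_id"
    return [row[0] for row in conn.execute(sql, (value,))]

print(find_students("city", "Pune"))
print(find_students("st_id", 2))
```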
52) What is SQL Profiler?
SQL Profiler is a graphical tool that allows system administrators to monitor events in an instance of Microsoft SQL Server. You can capture and save data about each event to a file or SQL Server table to analyze later. For example, you can monitor a production environment to see which stored procedures are hurting performance by executing too slowly.
Use SQL Profiler to monitor only the events in which you are interested. If traces are becoming too large, you can filter them based on the information you want, so that only a subset of the event data is collected. Monitoring too many events adds overhead to the server and the monitoring process and can cause the trace file or trace table to grow very large, especially when the monitoring process takes place over a long period of time.
53) What is B-Tree?
The database server uses a B-tree structure to organize index information. A B-tree generally has the following types of index pages or nodes:
Root node: the single top-level node of the tree. It contains node pointers to branch nodes (or directly to leaf nodes in a small index).
Branch nodes: a branch node contains pointers to leaf nodes or to other branch nodes. There can be two or more branch nodes.
Leaf nodes: a leaf node contains index items and horizontal pointers to other leaf nodes. There can be many leaf nodes.
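The following Python sketch shows how root, branch and leaf nodes cooperate during a lookup. It is a simplified, read-only illustration, not how any real database lays out its pages. The tree is built by hand, keys are integers, and only the leaves hold row locators. That is a B+-tree layout, which is the structure SQL Server uses for its indexes. Insertion and page splitting are left out. `search` walks from the root down to one leaf. `range_scan` finds the first leaf and then follows the horizontal leaf pointers.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]                                       # separator keys (root/branch) or index keys (leaf)
    children: List["Node"] = field(default_factory=list)  # root/branch only
    row_ids: List[int] = field(default_factory=list)      # leaf only: locator for each key
    next_leaf: Optional["Node"] = None                    # leaf only: horizontal pointer

    @property
    def is_leaf(self) -> bool:
        return not self.children


def find_leaf(root: Node, key: int) -> Node:
    node = root
    while not node.is_leaf:
        # keys[i] is the smallest key stored under children[i + 1]
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root: Node, key: int) -> Optional[int]:
    """Point lookup: walk root -> branch -> leaf, then binary-search inside the leaf."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.row_ids[i]
    return None


def range_scan(root: Node, lo: int, hi: int) -> List[int]:
    """Range lookup: locate the first leaf, then follow the horizontal leaf pointers."""
    leaf: Optional[Node] = find_leaf(root, lo)
    found = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return found
            if k >= lo:
                found.append(k)
        leaf = leaf.next_leaf
    return found


# Hand-built tree: one root, two branch nodes, four leaves (row id = key // 10).
leaves = [Node([10, 20], row_ids=[1, 2]), Node([30, 40], row_ids=[3, 4]),
          Node([50, 60], row_ids=[5, 6]), Node([70, 80], row_ids=[7, 8])]
for left, right in zip(leaves, leaves[1:]):
    left.next_leaf = right
root = Node([50], children=[Node([30], children=leaves[:2]),
                            Node([70], children=leaves[2:])])

print(search(root, 40), search(root, 55), range_scan(root, 35, 75))
```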
54) Can SQL Server be linked to other servers, such as Oracle?
SQL Server can be linked to any server for which an OLE DB provider is available. For example, Microsoft provides an OLE DB provider for Oracle, so an Oracle server can be added as a linked server to SQL Server.
55) What is BCP? When is it used?
BCP (bulk copy program) is a command-line tool used to copy large amounts of data between SQL Server tables or views and data files. BCP copies only the data; it does not copy the table structure from the source to the destination. The BULK INSERT command imports a data file into a database table or view in a user-specified format.
Explain a Few of the New Features of SQL Server 2008 Management Studio
In SQL Server 2008, Microsoft upgraded SSMS with many new features and added a great deal of new functionality.
A few of the important new features are as follows:
a) IntelliSense for Query Editing: You no longer have to remember all the syntax or browse online references. IntelliSense offers a few additional features besides just completing keywords.
b) Multi Server Query: SSMS 2008 can run a query on different servers from one query editor window. First, make sure that all the servers are registered under Registered Servers. Once they are registered, right-click the server group name and click New Query.
c) Query Editor Regions: When T-SQL code runs to hundreds of lines, it becomes more and more confusing to navigate. Query editor regions let you collapse and expand blocks of code.
The regions are defined by the following hierarchy:
From the first GO command to the next GO command.
d) Object Explorer Enhancements: In Object Explorer Details, the new feature is Object Search. Enter any object name in the object search box, and the search results are displayed in the same Object Explorer Details window.
Additionally, there are new wizards that help you perform several tasks, from policy management to disk monitoring. One useful feature is that everything displayed in the Object Explorer Details screen can be copied and pasted directly into Excel without any formatting issues.
e) Activity Monitor: There are four graphs:
% Processor Time,
Waiting Tasks,
Database I/O,
Batch Requests/sec
All four provide very important information; however, the pane I refer to most is “Recent Expensive Queries.” Whenever I find my server running slow or having any performance-related issues, my first reaction is to open this tab and see which query is running slow.

SSIS Interview Questions

What is SQL Server Integration Services (SSIS)?
SQL Server Integration Services (SSIS) is a component of SQL Server 2005 and later versions. SSIS is an enterprise-scale ETL (Extraction, Transformation and Load) tool that allows you to develop data integration and workflow solutions. Apart from data integration, SSIS can be used to define workflows to automate updating multi-dimensional cubes and automating maintenance tasks for SQL Server databases.
How does SSIS differ from DTS?
SSIS is a successor to DTS (Data Transformation Services) and has been completely rewritten from scratch to overcome the limitations of DTS, which was available in SQL Server 2000 and earlier versions. A significant improvement is the separation of the control/work flow from the data flow, together with a buffer/memory-oriented architecture for data flows and transformations, which improves performance.
What is the Control Flow?
When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components. The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order.
What is the Data Flow Engine?
The Data Flow Engine, also called the SSIS pipeline engine, is responsible for managing the flow of data from the source to the destination and performing transformations (lookups, data cleansing etc.). Data flow uses memory oriented architecture, called buffers, during the data flow and transformations which allows it to execute extremely fast. This means the SSIS pipeline engine pulls data from the source, stores it in buffers (in-memory), does the requested transformations in the buffers and writes to the destination. The benefit is that it provides the fastest transformation as it happens in memory and we don’t need to stage the data for transformations in most cases.
What is a Transformation?
A transformation simply means bringing the data into a desired format. For example, you may be pulling data from the source and want to ensure that only distinct records are written to the destination, so duplicates are removed. Another example is when you have master/reference data and want to pull only related data from the source, so you need some sort of lookup. There are around 30 transformations available, and this can be extended further with custom-built components if needed.
What is a Task?
A task is very much like a method in a programming language: it represents or carries out an individual unit of work. There are broadly two categories of tasks in SSIS: Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are operational in nature except Data Flow tasks. Although there are around 30 control flow tasks that you can use in your package, you can also develop your own custom tasks in the .NET programming language of your choice.
What is a Precedence Constraint and what types of Precedence Constraint are there?
SSIS allows you to place as many tasks as you want in the control flow. You can connect these tasks using connectors called precedence constraints. Precedence constraints let you define the logical sequence in which the tasks should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.
• These are the types of precedence constraints. The condition evaluated can be the constraint, an expression, or a combination of both:
o Success (the next task is executed only when the previous task completed successfully)
o Failure (the next task is executed only when the previous task failed)
o Completion (the next task is executed regardless of whether the previous task succeeded or failed)
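The following hypothetical Python function makes the evaluation rules concrete by deciding whether the next task runs. The operation strings mirror the four evaluation operations offered by the Precedence Constraint Editor (Constraint, Expression, ExpressionAndConstraint, ExpressionOrConstraint). The function is only an illustration, not SSIS's implementation.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"


class ConstraintValue(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"
    COMPLETION = "Completion"


def should_run(previous: Outcome,
               value: ConstraintValue,
               expression: Optional[Callable[[], bool]] = None,
               operation: str = "Constraint") -> bool:
    """Decide whether the task after a precedence constraint is executed."""
    # Completion is satisfied by any outcome; Success/Failure must match the outcome
    constraint_ok = value is ConstraintValue.COMPLETION or value.value == previous.value
    # With no expression supplied, treat the expression part as satisfied
    expression_ok = expression() if expression is not None else True
    if operation == "Constraint":
        return constraint_ok
    if operation == "Expression":
        return expression_ok
    if operation == "ExpressionAndConstraint":
        return constraint_ok and expression_ok
    if operation == "ExpressionOrConstraint":
        return constraint_ok or expression_ok
    raise ValueError(f"unknown evaluation operation: {operation}")


rows_loaded = 0  # hypothetical value of a package variable
print(should_run(Outcome.FAILURE, ConstraintValue.SUCCESS))     # False
print(should_run(Outcome.FAILURE, ConstraintValue.COMPLETION))  # True
print(should_run(Outcome.SUCCESS, ConstraintValue.SUCCESS,
                 lambda: rows_loaded > 0, "ExpressionAndConstraint"))  # False
```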
What is a container and how many types of containers are there?
• A container is a logical grouping of tasks which allows you to manage the scope of the tasks together.
• These are the types of containers in SSIS:
o Sequence Container – Used for grouping logically related tasks together
o For Loop Container – Used to repeat a flow in the package for as long as a condition evaluates to true
o For Each Loop Container – Used for enumerating each object in a collection; for example a record set or a list of files.
• Apart from the above mentioned containers, there is one more container called the Task Host Container which is not visible from the IDE, but every task is contained in it (the default container for all the tasks).
What are variables and what is variable scope?
A variable is used to store values. There are two types of variables. System variables (such as ErrorCode, ErrorDescription, PackageName, etc.) have values you can read but cannot change. User variables are ones you create, assign values to and read as needed. A variable holds a value of the data type you chose when you defined it.
Variables can have different scopes depending on where they are defined. For example, package-level variables are accessible to all the tasks in the package, whereas container-level variables are accessible only to the tasks within that container.
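Variable lookup can be sketched as a walk from the current scope out to the package. The `Scope` class and the variable names are hypothetical. The `User::` prefix follows the SSIS namespace convention for user variables.

```python
from typing import Any, Optional


class Scope:
    """Hypothetical scope node (package, container or task) with its own variables."""

    def __init__(self, name: str, parent: Optional["Scope"] = None):
        self.name = name
        self.parent = parent
        self.variables: dict = {}

    def get(self, var_name: str) -> Any:
        """Look in this scope first, then walk outwards to the package."""
        scope = self
        while scope is not None:
            if var_name in scope.variables:
                return scope.variables[var_name]
            scope = scope.parent
        raise KeyError(f"{var_name} is not visible from {self.name}")


package = Scope("Package")
package.variables["User::BatchId"] = 42
container = Scope("Sequence Container", parent=package)
container.variables["User::FileName"] = "sales.csv"
inner_task = Scope("Data Flow Task", parent=container)
outer_task = Scope("Execute SQL Task", parent=package)

print(inner_task.get("User::BatchId"))   # 42: package level, visible everywhere
print(inner_task.get("User::FileName"))  # sales.csv: visible inside the container
# outer_task.get("User::FileName") raises KeyError: it lives outside the container
```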
What are SSIS Connection Managers?
When we talk of integrating data, we are actually pulling data from different sources and writing it to a destination. But how do you connect to the source and destination systems? This is where connection managers come into the picture. A connection manager represents a connection to a system. It includes the data provider information, the server name, the database name, the authentication mechanism, etc.
What is the RetainSameConnection property and what is its impact?
Whenever a task uses a connection manager to connect to a source or destination database, a connection is opened and then closed with the execution of that task. Sometimes you might need to open a connection, execute multiple tasks over it and close it at the end of the execution. This is where the RetainSameConnection property of the connection manager helps. When you set this property to TRUE, the connection is opened the first time it is used and remains open until execution of the package completes.
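The effect can be reproduced outside SSIS with Python's built-in sqlite3 module. SQLite TEMP tables, like SQL Server #temp tables, live only as long as the connection that created them. A table created by one task is therefore invisible to the next task unless both share the connection. The `ConnectionManager` class below is a hypothetical stand-in for the SSIS connection manager, not its API.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class ConnectionManager:
    """Hypothetical stand-in for an SSIS connection manager."""

    def __init__(self, path: str, retain_same_connection: bool = False):
        self.path = path
        self.retain_same_connection = retain_same_connection
        self._conn: Optional[sqlite3.Connection] = None

    def acquire(self) -> sqlite3.Connection:
        if self.retain_same_connection:
            if self._conn is None:  # opened the first time it is used
                self._conn = sqlite3.connect(self.path)
            return self._conn
        return sqlite3.connect(self.path)  # a new connection for every task

    def release(self, conn: sqlite3.Connection) -> None:
        if not self.retain_same_connection:
            conn.close()  # closed when the task finishes

    def close(self) -> None:
        # Called when package execution completes
        if self._conn is not None:
            self._conn.close()
            self._conn = None


def run_tasks(cm: ConnectionManager) -> bool:
    """Task 1 creates a connection-scoped TEMP table; task 2 tries to read it."""
    conn = cm.acquire()
    conn.execute("CREATE TEMP TABLE staging (id INTEGER)")
    conn.execute("INSERT INTO staging VALUES (1)")
    cm.release(conn)

    conn = cm.acquire()
    try:
        conn.execute("SELECT COUNT(*) FROM staging").fetchone()
        return True
    except sqlite3.OperationalError:  # no such table: staging
        return False
    finally:
        cm.release(conn)
        cm.close()


with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "demo.db")
    print(run_tasks(ConnectionManager(db)))                               # False
    print(run_tasks(ConnectionManager(db, retain_same_connection=True)))  # True
```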
What are source and destination adapters?
A source adapter indicates a source in the Data Flow to pull data from. The source adapter uses a connection manager to connect to the source. Along with it, you also specify the query method and the query used to pull data from the source.
Similar to a source adapter, a destination adapter indicates a destination in the Data Flow to write data to. Again like the source adapter, the destination adapter uses a connection manager to connect to the target system. Along with that, you specify the target table, the writing mode (i.e. write one row at a time or do a bulk insert) and several other properties.
Please note, the source and destination adapters can both use the same connection manager if you are reading and writing to the same database.
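The following sketch shows a source adapter and a destination adapter sharing one connection. An in-memory SQLite database stands in for a real source and target. The adapter functions are hypothetical. `executemany` stands in for a batched or bulk write (in SSIS, for example, the OLE DB Destination's fast-load mode), and the row-by-row branch for writing one row at a time. The table name is interpolated into the SQL, so it must come from trusted configuration, never from user input.

```python
import sqlite3


def source_adapter(conn: sqlite3.Connection, query: str) -> list:
    """Hypothetical source adapter: runs the configured query through the connection."""
    return conn.execute(query).fetchall()


def destination_adapter(conn: sqlite3.Connection, table: str,
                        rows: list, bulk: bool = True) -> int:
    """Hypothetical destination adapter: writes rows one at a time or as one batch."""
    sql = f"INSERT INTO {table} (id, amount) VALUES (?, ?)"  # table name from trusted config
    if bulk:
        conn.executemany(sql, rows)  # one batched call
    else:
        for row in rows:             # one statement per row
            conn.execute(sql, row)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # one connection shared by both adapters
conn.execute("CREATE TABLE src (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE dst (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 7.25)])

rows = source_adapter(conn, "SELECT id, amount FROM src WHERE amount > 8 ORDER BY id")
written = destination_adapter(conn, "dst", rows, bulk=True)
print(written)  # 2
print(conn.execute("SELECT id, amount FROM dst ORDER BY id").fetchall())  # [(1, 10.0), (2, 25.5)]
```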
What is the Data Path and how is it different from a Precedence Constraint?
A data path is used in a Data Flow task to connect the components of the data flow and show the transition of data from one component to another. A data path carries the metadata of the data flowing through it, such as the columns, data types, sizes, etc. The difference is that the data path is used in the data flow and shows the flow of data. The precedence constraint is used in the control flow and shows the flow of control, i.e. the transition from one task to another.
What is a Data Viewer utility and what is it used for?
The data viewer utility is used in Business Intelligence Development Studio during development or when troubleshooting an SSIS package. A data viewer is placed on a data path to show what data is flowing through that specific data path during execution.
What is an SSIS breakpoint? How do you configure it? How do you disable or delete it?
A breakpoint allows you to pause the execution of the package in Business Intelligence Development Studio during development or when troubleshooting an SSIS package. Right-click the task in the control flow and click the Edit Breakpoints menu item. In the Set Breakpoints window, specify when you want execution to be halted/paused, for example on the OnPreExecute, OnPostExecute or OnError events. To toggle a breakpoint, delete all breakpoints or disable all breakpoints, go to the Debug menu and click the respective menu item. You can even specify different conditions for hitting the breakpoint.
What is SSIS event logging?
Like many modern programming environments, SSIS raises different events during the package execution life cycle. You can log these events to trace the execution of your SSIS package and its tasks, and you can also write your own messages as custom log entries. You can enable event logging at the package level as well as at the task level, and you can choose the specific events of a task or package to be logged. This is essential when you are troubleshooting your package and trying to understand a performance problem or the root cause of a failure.
What are the different SSIS log providers?
There are several places where you can write the execution data generated by SSIS event logging:
o SSIS log provider for Text files
o SSIS log provider for Windows Event Log
o SSIS log provider for XML files
o SSIS log provider for SQL Profiler
o SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version.
How do you enable SSIS event logging?
SSIS gives you granular control over what to log and where to log it. To enable event logging for an SSIS package, right-click in the control flow area of the package and click Logging. In the Configure SSIS Logs window, all the tasks of the package are listed in the tree view on the left side, where you can choose which tasks to enable logging for. On the right side there are two tabs. On the Providers and Logs tab you specify where you want to write the logs; you can write to one or more log providers together. On the Details tab you specify which events you want to log for the selected task.
Please note, enabling event logging is immensely helpful when you are troubleshooting a package, but it also adds overhead to SSIS, which has to log the events and information. Hence you should enable event logging only when needed and choose only the events you want to log. Avoid logging all events unnecessarily.
What is the LoggingMode property?
SSIS packages and all of their associated tasks and components have a property called LoggingMode. This property accepts three possible values:
Enabled – to enable logging for that component
Disabled – to disable logging for that component
UseParentSetting – to use the parent container's setting to decide whether or not to log the data
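Resolving UseParentSetting amounts to walking up the container hierarchy until an explicit setting is found. The following Python sketch is hypothetical. Treating a hierarchy with no explicit setting as "not logging" is an assumption of the sketch.

```python
from enum import Enum
from typing import Optional


class LoggingMode(Enum):
    ENABLED = "Enabled"
    DISABLED = "Disabled"
    USE_PARENT_SETTING = "UseParentSetting"


class Component:
    """Hypothetical package, container or task with a LoggingMode setting."""

    def __init__(self, name: str,
                 mode: LoggingMode = LoggingMode.USE_PARENT_SETTING,
                 parent: Optional["Component"] = None):
        self.name = name
        self.mode = mode
        self.parent = parent

    def is_logging(self) -> bool:
        node = self
        while node is not None:  # walk up until an explicit setting is found
            if node.mode is LoggingMode.ENABLED:
                return True
            if node.mode is LoggingMode.DISABLED:
                return False
            node = node.parent
        return False  # assumption: no explicit setting anywhere means no logging


package = Component("Package", LoggingMode.ENABLED)
container = Component("Sequence Container", parent=package)
task = Component("Data Flow Task", parent=container)
quiet_task = Component("Script Task", LoggingMode.DISABLED, parent=container)
print(task.is_logging(), quiet_task.is_logging())  # True False
```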
What is the transaction support feature in SSIS?
By default, the tasks of a package do not run in a shared transaction, so the work done by one task is not undone if a later task fails. What if you want to execute two or more tasks in a single transaction? This is where the transaction support feature helps. You can group all your logically related tasks in a single container and then set the transaction properties appropriately, so that all the tasks in the container run in a single transaction. This way you can ensure that either all of the tasks complete successfully or, if any of them fails, the whole transaction is rolled back.
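The all-or-nothing behavior can be illustrated with a single local SQLite transaction in Python. This is only an analogy: SSIS coordinates its transactions through MS DTC, and the task functions here are hypothetical. The second task fails after both inserts, so neither table keeps its row.

```python
import sqlite3
from typing import Callable


def run_in_single_transaction(conn: sqlite3.Connection,
                              tasks: list[Callable[[sqlite3.Connection], None]]) -> bool:
    """Run all tasks in one transaction: commit only if every task succeeds."""
    try:
        for task in tasks:
            task(conn)
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # undo the work of every task, not just the failing one
        return False


def load_customers(conn: sqlite3.Connection) -> None:
    conn.execute("INSERT INTO customer VALUES (1, 'Acme')")


def load_orders(conn: sqlite3.Connection) -> None:
    conn.execute("INSERT INTO orders VALUES (10, 1)")
    raise RuntimeError("simulated failure in the second task")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.commit()

ok = run_in_single_transaction(conn, [load_customers, load_orders])
customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok, customers, orders)  # False 0 0
```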
What properties do you need to configure in order to use the transaction feature in SSIS?
Suppose you want to execute 5 tasks in a single transaction. In this case you can place all 5 tasks in a Sequence Container and set its TransactionOption and IsolationLevel properties appropriately.
o The TransactionOption property expects one of these three values:
▪ Supported – The container/task does not create a transaction of its own, but if the parent object has already started a transaction, it participates in it
▪ Required – The container/task participates in the transaction started by the parent object if one exists; otherwise it starts a new transaction
▪ NotSupported – The container/task neither creates a transaction nor participates in any transaction started by the parent object
o The IsolationLevel property dictates how two or more transactions maintain consistency and concurrency when they run in parallel
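The three TransactionOption values can be summarized as a small decision function. This Python sketch is hypothetical and follows the documented meaning of each value. In particular, Required joins the parent's transaction when one exists and starts a new one only when it does not. Integer ids stand in for real transactions.

```python
from enum import Enum
from itertools import count
from typing import Optional


class TransactionOption(Enum):
    SUPPORTED = "Supported"
    REQUIRED = "Required"
    NOT_SUPPORTED = "NotSupported"


_next_txn_id = count(1)  # hypothetical transaction ids


def effective_transaction(option: TransactionOption,
                          parent_txn: Optional[int]) -> Optional[int]:
    """Return the transaction a container/task runs in (None means no transaction)."""
    if option is TransactionOption.NOT_SUPPORTED:
        return None                  # never starts or joins a transaction
    if parent_txn is not None:
        return parent_txn            # Supported and Required both join the parent's
    if option is TransactionOption.REQUIRED:
        return next(_next_txn_id)    # Required starts one only if none exists
    return None                      # Supported: nothing to join, so nothing is started


outer = effective_transaction(TransactionOption.REQUIRED, None)  # the container starts one
print(effective_transaction(TransactionOption.SUPPORTED, outer) == outer)  # True
print(effective_transaction(TransactionOption.REQUIRED, outer) == outer)   # True
print(effective_transaction(TransactionOption.NOT_SUPPORTED, outer))       # None
print(effective_transaction(TransactionOption.SUPPORTED, None))            # None
```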
When I enabled transactions in an SSIS package, it failed with this exception:
“The Transaction Manager is not available. The DTC transaction failed to start.” What caused this exception and how can it be fixed?

SSIS uses the MS DTC (Microsoft Distributed Transaction Coordinator) Windows Service for transaction support. As such, you need to ensure this service is running on the machine where you are actually executing the SSIS packages or the package execution will fail with the exception message as indicated in this question.
What is event handling in SSIS?
Like many other programming languages, SSIS and its components raise different events during the execution of the code. You can write an event handler to capture an event and handle it in a few different ways. For example, consider a data flow task. Before it executes, you may want to make some environmental changes, such as creating a table to write data into or deleting/truncating the table you want to write to. Along the same lines, after the data flow task has executed you may want to clean up some staging tables. In this situation you can write an event handler for the OnPreExecute event of the data flow task, which is executed before the actual execution of the data flow. Similarly, you can write an event handler for the OnPostExecute event of the data flow task, which is executed after the data flow task has executed. Please note that not all tasks raise the same events. Some events are specific to a particular task and can be used with that object but not with others.
How do you write an event handler?
First, open your SSIS package in Business Intelligence Development Studio (BIDS) and click the Event Handlers tab. Next, select the executable/task from the combo box on the left side, then select the event you want to write the handler for from the combo box on the right side. Finally, click the hyperlink to create the event handler. So far you have only created the event handler; you have not specified any action. To do that, simply drag the required task from the toolbox onto the event handler designer surface and configure it appropriately.
What is the DisableEventHandlers property used for?
Consider you have a task or package with several event handlers, but for some reason you do not want event handlers to be called. One simple solution is to delete all of the event handlers, but that would not be viable if you want to use them in the future. This is where you can use the DisableEventHandlers property. You can set this property to TRUE and all event handlers will be disabled. Please note with this property you simply disable the event handlers and you are not actually removing them. This means you can set this value to FALSE and the event handlers will once again be executed.
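The event-handler life cycle described above, including the effect of DisableEventHandlers, can be sketched in a few lines of Python. The `Executable` class and its methods are hypothetical. Only the event names OnPreExecute and OnPostExecute and the idea of a disable flag come from SSIS. Setting the flag skips the handlers without removing them.

```python
from collections import defaultdict
from typing import Callable


class Executable:
    """Hypothetical task that raises OnPreExecute / OnPostExecute around its work."""

    def __init__(self, name: str, work: Callable[[], None]):
        self.name = name
        self.work = work
        self.handlers: dict = defaultdict(list)
        self.disable_event_handlers = False  # mirrors DisableEventHandlers

    def on(self, event: str, handler: Callable[[str], None]) -> None:
        self.handlers[event].append(handler)

    def _raise(self, event: str) -> None:
        if self.disable_event_handlers:
            return  # handlers stay attached but are not called
        for handler in self.handlers[event]:
            handler(self.name)

    def execute(self) -> None:
        self._raise("OnPreExecute")
        self.work()
        self._raise("OnPostExecute")


log: list = []
dft = Executable("Load Sales", work=lambda: log.append("data flow runs"))
dft.on("OnPreExecute", lambda name: log.append(f"truncate target before {name}"))
dft.on("OnPostExecute", lambda name: log.append(f"clean up staging after {name}"))
dft.execute()
print(log)
# ['truncate target before Load Sales', 'data flow runs', 'clean up staging after Load Sales']

log.clear()
dft.disable_event_handlers = True
dft.execute()
print(log)  # ['data flow runs']
```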
What is SSIS validation?
SSIS validates the package and all of its tasks to ensure they have been configured correctly, so that with the given set of configurations and values the package and all its tasks can execute successfully. In other words, during the validation process SSIS checks whether the source and destination locations are accessible. It also checks whether the metadata about the source and destination tables stored with the package is correct, so that the tasks will not fail when executed. The validation process reports warnings and errors depending on the validation failure detected. For example, if the source/destination tables or columns have been changed or dropped, this is reported as an error. If you read more columns from the source than are used to write to the destination, this is flagged as a warning.
Define design time validation versus run time validation.
• Design time validation is performed when you are opening your package in BIDS whereas run time validation is performed when you are actually executing the package.
Define early validation (package level validation) versus late validation (component level validation).
• When a package is executed, it goes through the validation process: all of the components/tasks of the package are validated before the package execution actually starts. This is called early validation or package-level validation. During execution of the package, SSIS validates each component/task again before executing that particular component/task. This is called late validation or component-level validation.
What is DelayValidation and what is the significance?
As I said before, during early validation all of the components of the package are validated along with the package itself. If any component/task fails to validate, SSIS will not start the package execution. In most cases this is fine, but what if the second task depends on the first task? For example, say you are creating a table in the first task and referring to the same table in the second task. When early validation starts, it will not be able to validate the second task, as the dependent table has not been created yet. Keep in mind that early validation is performed before the package execution starts. So what should we do in this case? How can we ensure that the package executes successfully and that its logical flow is correct? This is where you can use the DelayValidation property. In the above scenario you should set the DelayValidation property of the second task to TRUE. Early validation (package-level validation) is then skipped for that task, and it is validated only during late validation (component-level validation). Please note that with the DelayValidation property you can only skip early validation for that specific task; there is no way to skip late or component-level validation.
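The interplay between early validation, late validation and DelayValidation can be simulated in Python. In this hypothetical sketch, a task is valid only if every table it refers to already exists. The `Task` class and its fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task described only by the tables it needs and creates."""
    name: str
    needs_tables: set = field(default_factory=set)
    creates_tables: set = field(default_factory=set)
    delay_validation: bool = False

    def validate(self, existing: set) -> bool:
        # Valid only if every table the task refers to already exists
        return self.needs_tables <= existing


def execute_package(tasks: list[Task], existing_tables: set) -> list[str]:
    existing = set(existing_tables)
    # Early (package-level) validation: skipped for tasks with delay_validation
    for task in tasks:
        if not task.delay_validation and not task.validate(existing):
            return [f"early validation failed: {task.name}"]
    log = []
    # Execution: late (component-level) validation right before each task runs
    for task in tasks:
        if not task.validate(existing):
            log.append(f"late validation failed: {task.name}")
            return log
        existing |= task.creates_tables
        log.append(f"ran {task.name}")
    return log


create = Task("Create stage_sales", creates_tables={"stage_sales"})
load = Task("Load stage_sales", needs_tables={"stage_sales"})
print(execute_package([create, load], set()))
# ['early validation failed: Load stage_sales']

load.delay_validation = True
print(execute_package([create, load], set()))
# ['ran Create stage_sales', 'ran Load stage_sales']
```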
What are the different components in the SSIS architecture?
The SSIS architecture comprises four main components:
o The SSIS runtime engine manages the workflow of the package
o The data flow pipeline engine manages the flow of data from source to destination and in-memory transformations
o The SSIS object model is used for programmatically creating, managing and monitoring SSIS packages
o The SSIS windows service allows managing and monitoring packages
How is the SSIS runtime engine different from the SSIS dataflow pipeline engine?
The SSIS Runtime Engine manages the workflow of the packages during runtime, which means its role is to execute the tasks in a defined sequence. As you know, you can define the sequence using precedence constraints. This engine is also responsible for providing support for event logging, breakpoints in the BIDS designer, package configuration, transactions and connections. The SSIS Runtime Engine has been designed to support concurrent/parallel execution of tasks in the package.
The Dataflow Pipeline Engine is responsible for executing the data flow tasks of the package. It creates a dataflow pipeline by allocating in-memory structures for storing data in transit. This means the engine pulls data from the source, stores it in memory, executes the required transformations on the data stored in memory and finally loads the data into the destination. Like the SSIS runtime engine, the dataflow pipeline engine has been designed to do its work in parallel, by creating multiple threads and enabling them to run multiple execution trees/units in parallel.
How is a synchronous (non-blocking) transformation different from an asynchronous (blocking) transformation in SQL Server Integration Services?
A transformation changes the data into the required format before loading it to the destination or passing it down the path. Transformations can be categorized into synchronous and asynchronous transformations.
A transformation is called synchronous when it processes each incoming row and passes it down the hierarchy/path. It modifies the data in place, so that the layout of the result set remains the same. This means output rows are synchronous with input rows (a 1:1 relationship between input and output rows). Hence the transformation uses the same allocated buffer set/memory and does not require additional memory. Please note, these kinds of transformations have lower memory requirements because they work on a row-by-row basis (and hence run faster), and they do not block the data flow in the pipeline. Some examples are the Lookup, Derived Column, Data Conversion, Copy Column, Multicast and Row Count transformations.
A transformation is called asynchronous when it requires all incoming rows to be stored locally in memory before it can start producing output rows. For example, an Aggregate transformation requires all the rows to be loaded and stored in memory before it can aggregate them and produce the output rows. As you can see, input rows are not in sync with output rows. More memory is required to store the whole set of data (no memory reuse) for both the input and the output. These kinds of transformations have higher memory requirements (and there is a high chance of buffers spooling to disk if insufficient memory is available) and generally run slower. Asynchronous transformations are also called “blocking transformations” because they block the output rows until all input rows have been read into memory.
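Python generators make the difference easy to observe. In this hypothetical sketch, `derived_column` behaves synchronously: it produces one output row per input row, modified in place. `aggregate` behaves asynchronously and blocks: it reads the whole input before emitting its first row. The `events` list records each read from the source so the two behaviors can be compared.

```python
from typing import Iterable, Iterator

events: list = []


def source() -> Iterator[dict]:
    """Hypothetical source that records every row it reads."""
    for i, region in enumerate(["east", "west", "east"], start=1):
        events.append(f"read {i}")
        yield {"region": region, "qty": i, "price": 2.0}


def derived_column(rows: Iterable[dict]) -> Iterator[dict]:
    """Synchronous: one output row per input row, emitted immediately."""
    for row in rows:
        row["total"] = row["qty"] * row["price"]  # same row, modified in place
        yield row


def aggregate(rows: Iterable[dict]) -> Iterator[dict]:
    """Asynchronous (blocking): emits nothing until every input row has been read."""
    totals: dict = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["total"]
    for region, total in totals.items():  # new rows with a different layout
        yield {"region": region, "total": total}


first_sync = next(derived_column(source()))
print(events)  # ['read 1']: the first output row needs only one read

events.clear()
first_async = next(aggregate(derived_column(source())))
print(events)       # ['read 1', 'read 2', 'read 3']: the whole input was read first
print(first_async)  # {'region': 'east', 'total': 8.0}
```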
What is the difference between a partially blocking transformation versus a fully blocking transformation in SQL Server Integration Services?
Asynchronous transformations, as discussed in the last question, can be further divided into two categories depending on their blocking behavior:
o Partially Blocking Transformations do not wait for all of the input to be read before producing output. However, they require new buffers/memory to be allocated to store the newly created result set, because the output from these kinds of transformations differs from the input set. For example, the Merge Join transformation joins two sorted inputs and produces a merged output. In this case the data flow pipeline engine creates two input sets of memory. The merged output from the transformation requires another set of output buffers, because the structure of the output rows differs from that of the input rows. This means the memory requirement for this type of transformation is higher than for synchronous transformations, where the transformation is completed in place.
o Fully Blocking Transformations, apart from requiring an additional set of output buffers, also block the output completely until the whole input set has been read. For example, the Sort transformation requires all input rows to be available before it can start sorting and pass the rows down the output path. These kinds of transformations are the most expensive and should be used only as needed. For example, if you can get sorted data from the source system, use that instead of a Sort transformation to sort the data in transit/memory.
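The two categories can again be illustrated with Python generators. In this hypothetical sketch, `merge_join` needs two inputs already sorted on the key (the first field) and emits each joined row as soon as the keys match, creating new output rows. `sort_transform` must read its entire input before it can emit anything. The sketch assumes the join key is unique in the right input (a many-to-one join).

```python
from itertools import count
from typing import Iterable, Iterator


def merge_join(left: Iterable[tuple], right: Iterable[tuple]) -> Iterator[tuple]:
    """Partially blocking inner join of two inputs sorted on the key (field 0).
    Assumes the key is unique in the right input."""
    left_it, right_it = iter(left), iter(right)
    lrow, rrow = next(left_it, None), next(right_it, None)
    while lrow is not None and rrow is not None:
        if lrow[0] < rrow[0]:
            lrow = next(left_it, None)
        elif lrow[0] > rrow[0]:
            rrow = next(right_it, None)
        else:
            yield (lrow[0], lrow[1], rrow[1])  # a new row with a new layout
            lrow = next(left_it, None)


def sort_transform(rows: Iterable[tuple]) -> Iterator[tuple]:
    """Fully blocking: the whole input is held in memory before the first row is emitted."""
    buffered = sorted(rows, key=lambda row: row[0])
    yield from buffered


print(list(merge_join([(1, "a"), (2, "b"), (3, "c")], [(2, "x"), (3, "y")])))
# [(2, 'b', 'x'), (3, 'c', 'y')]

# Merge join can start producing output from an unbounded sorted input ...
endless_left = ((i, f"L{i}") for i in count(1))
print(next(merge_join(endless_left, [(2, "x")])))  # (2, 'L2', 'x')
# ... whereas sort_transform on an unbounded input would never emit a row.
print(list(sort_transform([(3, "c"), (1, "a"), (2, "b")])))
# [(1, 'a'), (2, 'b'), (3, 'c')]
```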
What is an SSIS execution tree and how can I analyze the execution trees of a data flow task?
The dataflow pipeline engine divides the work to be done in the data flow task into multiple chunks, called execution units, each representing a group of transformations. An individual execution unit is called an execution tree, and it can be executed by a separate thread in parallel with other execution trees. The memory structure, called a data buffer, is created by the data flow pipeline engine and is scoped to each individual execution tree. An execution tree normally starts at either the source or an asynchronous transformation and ends at the first asynchronous transformation or a destination. During execution of an execution tree, the source reads the data and stores it in a buffer. The transformations are then executed on that buffer, which is passed down the path to the next component by passing pointers to the buffer.
To see how many execution trees are created and how many rows are stored in each buffer for an individual data flow task, you can enable logging of these data flow task events: PipelineExecutionTrees, PipelineComponentTime, PipelineInitialization, BufferSizeTuning, etc.
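The rule for where execution trees begin and end can be sketched for a simple linear data flow. This hypothetical Python function is a simplification. Real data flows can branch (for example through a Multicast or Conditional Split), and the engine determines the actual trees itself, which is what the PipelineExecutionTrees log event reports. Following the description above, an asynchronous component closes one tree and starts the next, so it appears in both.

```python
def execution_trees(path: list[tuple[str, str]]) -> list[list[str]]:
    """Split a linear data flow path into execution trees.

    path is a list of (component_name, kind) pairs, where kind is one of
    'source', 'sync', 'async' or 'destination'; the path must start with a source."""
    trees: list[list[str]] = []
    for name, kind in path:
        if kind == "source" or kind == "async":
            if trees:
                trees[-1].append(name)  # an async component ends the current tree ...
            trees.append([name])        # ... and starts a new one with new buffers
        else:
            trees[-1].append(name)      # sync components and destinations extend the tree
    return trees


flow = [
    ("OLE DB Source", "source"),
    ("Derived Column", "sync"),
    ("Data Conversion", "sync"),
    ("Sort", "async"),
    ("Lookup", "sync"),
    ("OLE DB Destination", "destination"),
]
for tree in execution_trees(flow):
    print(" -> ".join(tree))
# OLE DB Source -> Derived Column -> Data Conversion -> Sort
# Sort -> Lookup -> OLE DB Destination
```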
How can an SSIS package be scheduled to execute at a defined time or at a defined interval per day?
You can configure a SQL Server Agent job with a job step of type SQL Server Integration Services Package; the job invokes the dtexec command-line utility internally to execute the package. You can run the job (and in turn the SSIS package) on demand, or you can create a schedule for a one-time need or on a recurring basis.
What is an SSIS proxy account and why would you create it?
When we try to execute an SSIS package from a SQL Server Agent job, it may fail with the message “Non-SysAdmins have been denied permission to run DTS Execution job steps without a proxy account”. This error message is generated when the account under which the SQL Server Agent service runs and the job owner are not sysadmins on the instance, and the job step is not set to run under a proxy account associated with the SSIS subsystem. A proxy account is based on a SQL Server credential and granted access to the SSIS package execution subsystem. It lets such a job step run under the Windows identity stored in that credential.
How can you configure your SSIS package to run in 32-bit mode on 64-bit machine when using some data providers which are not available on the 64-bit platform?
In order to run an SSIS package in 32-bit mode, the SSIS project property Run64BitRuntime needs to be set to “False”. The default value of this property is “True”. This setting instructs BIDS to load the 32-bit runtime environment rather than the 64-bit one, and your packages will still run without any additional changes. The property can be found under SSIS Project Property Pages -> Configuration Properties -> Debugging. Because it is a debugging property, it only affects execution inside BIDS. Outside BIDS, run the package with the 32-bit version of dtexec or, in a SQL Server Agent job step, select the option to use the 32-bit runtime.