In the digital age, businesses are flooded with data, and the pace is only accelerating. To keep up with market changes, make smarter decisions, and drive growth, organizations need to analyze data from multiple sources effectively. But how do you capture, move, and store these massive datasets in relational databases without bottlenecks?
Enter Apache Hadoop, a powerhouse for big data processing and storage. Its HDFS (Hadoop Distributed File System) is designed to handle petabytes of data across distributed environments. In this article, we’ll show you how to migrate large-scale data between Microsoft SQL Server and HDFS using SSIS, bridging the gap between traditional databases and modern big data solutions.
In this guide, you'll learn how to seamlessly transfer data between a Microsoft SQL Server database and HDFS using SSIS (SQL Server Integration Services).
First, we check both the Hadoop Web Interface and the Linux environment.
Figure: 1-Hadoop Web Interface
Figure: 2-Linux
Both views clearly show that there is no data file yet in the HDFS folder /data/data.
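The same check can be scripted against the WebHDFS REST API (the same endpoint SSIS uses). Below is a minimal sketch that parses the JSON returned by a `LISTSTATUS` call such as `http://namenode:9870/webhdfs/v1/data/data?op=LISTSTATUS`; the host name is a placeholder for your own NameNode, and the sample response is hard-coded rather than fetched, so this runs anywhere:

```python
import json

def hdfs_entries(liststatus_json: str) -> list[str]:
    """Extract file/directory names from a WebHDFS LISTSTATUS response."""
    payload = json.loads(liststatus_json)
    statuses = payload["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses]

# An empty HDFS directory comes back with an empty FileStatus array:
empty_response = '{"FileStatuses": {"FileStatus": []}}'
print(hdfs_entries(empty_response))  # -> []
```

In a real check you would fetch the URL first (for example with an HTTP client) and pass the response body to `hdfs_entries`; an empty list confirms the folder has no data files.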
1. Create a New Project
Create a new project in Visual Studio; ours is named Migrate_Data_between_HDFS_and_SQL. For detailed steps on creating an SSIS project, see my earlier article, Import and Export Excel data into SQL Server using SQL Server Integration Services (SSIS).
SSIS packages enable efficient data exchange between HDFS and SQL Server using two Hadoop data flow components: the HDFS File Source and the HDFS File Destination.
2. Create SQL Server Connection
For detailed steps on creating a SQL Server connection, see my earlier article, Import and Export Excel data into SQL Server using SQL Server Integration Services (SSIS).
3. Create a Data Flow Task
With the Data Flow Task, you can transfer data from one place to another, whether from databases, files, or other sources, and transform it along the way to meet your requirements.
Drag the Data Flow Task from the SSIS Toolbox onto the Control Flow surface, then double-click it to open the Data Flow tab.
Figure: 3-SSIS Toolbox and Data Flow Task
Now drag and drop an OLE DB Source into the Data Flow Task and connect it to the SalesOrderDetail table in the SampleDB database. Next, drag and drop the HDFS File Destination component and double-click it to open the HDFS File Destination Editor dialog box.
Figure: 4-HDFS File Destination Editor
Upon clicking the 'New...' button, the following dialog box appears.
Figure: 5-Hadoop Connection Manager Editor
In this dialog box, select WebHDFS as the connection type, fill in all required fields, and click the Test Connection button to verify the configuration, as shown in Figure: 5. Then click OK.
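Under the hood, a WebHDFS connection test is roughly an HTTP request to the NameNode's REST endpoint. The sketch below builds such a URL (a `GETFILESTATUS` call) so you can see what the connection manager needs; the host, user, and path are placeholders, and note that the default WebHDFS port is 9870 on Hadoop 3 (50070 on Hadoop 2):

```python
def webhdfs_url(host: str, port: int, path: str, op: str, user: str) -> str:
    """Build a WebHDFS REST URL of the kind the Hadoop connection uses."""
    return f"http://{host}:{port}/webhdfs/v1{path}?op={op}&user.name={user}"

# Placeholder values; substitute your own NameNode host and Hadoop user.
url = webhdfs_url("namenode.example.com", 9870, "/data/data", "GETFILESTATUS", "hadoop")
print(url)
```

Opening the printed URL in a browser (or with curl) is a quick independent way to confirm the same connectivity that Test Connection verifies.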
Configure the HDFS File Destination Editor (Figure: 4) with the following settings:
4. Execution of Package
To ensure the data export process is successful:
Execute the SSIS package and wait for completion.
Figure: 6-Executed Data Flow
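Besides running the package from Visual Studio, a saved package can be executed from the command line with the dtexec utility (`dtexec /F <package file>`). A small sketch of the command line, with a placeholder package path:

```python
def dtexec_args(package_path: str) -> list[str]:
    """Command-line arguments to run a saved SSIS package with dtexec."""
    return ["dtexec", "/F", package_path]

# Placeholder path; substitute the location of your own .dtsx file.
print(" ".join(dtexec_args(r"C:\SSIS\Migrate_Data_between_HDFS_and_SQL.dtsx")))
```

This is handy for scheduling the export with SQL Server Agent or Windows Task Scheduler once the package works interactively.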
Once the SSIS package execution completes successfully, you can verify the successful export of data by checking both the Hadoop Web Interface and the Linux environment.
Figure: 7-Data export verified using the Hadoop Web Interface.
Also, the text file is exported successfully into the Linux Environment.
Figure: 8-File exported successfully into Linux.
Next, we load data from the SalesOrder.txt file (stored in HDFS) into the SalesOrder table
within the SampleDB SQL Server database.
Figure: 9-SalesOrder table
1. Create a New Package:
2. Add a Data Flow Task:
3. Configure the Data Flow:
4. Configure the HDFS File Source:
5. Configure the OLE DB Destination:
6. Verify the Configuration:
7. Execute the Package:
8. Verify the Results:
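For comparison, the import steps above could also be scripted outside SSIS, loading the parsed rows into SQL Server with pyodbc (a third-party package: `pip install pyodbc`, plus the Microsoft ODBC Driver for SQL Server). This is a hedged sketch, not the article's method; the connection string, table, and column names are placeholders:

```python
def insert_statement(table: str, columns: list[str]) -> str:
    """Parameterised INSERT suitable for cursor.executemany."""
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

def load_rows(conn_str: str, table: str, columns: list[str], rows: list[tuple]) -> None:
    """Bulk-insert parsed rows; requires a reachable SQL Server instance."""
    import pyodbc  # third-party; not needed just to build the statement
    with pyodbc.connect(conn_str) as conn:
        cur = conn.cursor()
        cur.executemany(insert_statement(table, columns), rows)
        conn.commit()

# Column names here are placeholders for your SalesOrder table's schema.
print(insert_statement("dbo.SalesOrder", ["SalesOrderID", "ProductID"]))
# -> INSERT INTO dbo.SalesOrder (SalesOrderID, ProductID) VALUES (?, ?)
```

Parameterised statements with `executemany` avoid string-concatenated SQL and handle quoting safely, which matters when the text file contains free-form values.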
Conclusion
In this article, we explored how to leverage the SSIS Hadoop components at the data flow
level to move data in and out of an on-premises Hadoop cluster,
with a step-by-step walkthrough for both exporting and importing data.

Image: Freepik