This is a read-only archive. Find the latest Linux articles, documentation, and answers at the new Linux.com!

Linux.com

Feature: Reviews

Explore your database with Talend Open Profiler

By James F. Koopmann on July 18, 2008 (7:00:00 PM)

Share    Print    Comments   

Over time, organizations replicate, migrate, or add complexity within database systems, often times losing control of the quality of their data. When applications begin to fail because of invalid, corrupted, or out-of-date data, the free, GPL-licensed Talend Open Profiler can give data analysts, database administrators (DBA), and business users the ability to research data structures and improve data quality. Through the use of Open Profiler, users can be alerted to hidden inconsistencies and incompatibilities between data sources and target applications. Through data analysis, business users and technical analysts can communicate both data structure and content needs.

Open Profiler offers functions for both technical and business users. It can quickly build statistics that reflects the usability of the information within a database table. As it finds corrupt or inconsistent data, it can scrub bad information from database structures. Additionally, Open Profiler simplifies the repetitive nature of statistical analysis while reducing labor costs and errors.

Getting familiar with Open Profiler

I found it easy to use Open Profiler to find the information needed to perform an analysis of columns. For only a few items did I find myself looking at documentation, and unfortunately was unable to find specifics on how certain aspects fit together. For instance, a metadata repository is supposed to keep a history of profiles, but nowhere did I find how this works.

Data profiling with Open Profiler is easy and is accomplished within a few simple steps. Talend provides executables for Windows, Solaris, Mac OS X, Linux, and AIX. Since I was using Linux I invoked TalendOpenProfiler-linux-gtk-x86 from the command prompt. When the application loads it displays a three-column panel. The left column shows a simple tree structure that allows you to explore and select past data profiling runs and drill down into database connections. The middle work panel is where you build, through selection of databases and objects, the analysis work for a data profiling run. The right help panel provides wizards for defining database connections and analysis runs.

To get started, the first thing you need to do is create a connection to a database by traversing the tree structure in the left panel from Metadata -> DB Connections. Right-click on DB Connections and click on "Create a new connection" to bring up a connection definition screen. Here you can enter the standard connection information for your database. For Oracle, for instance, you must enter in a login, password, hostname, port, and system ID (SID). Once you enter the DB Connection information, Open Profiler allows you to begin drilling down the tree structure, depending on your privileged access to the database. You can traverse through owners, tables, columns, attributes, and views for the specified database connection.

Once you've made a database connection you can being analyzing data in tables, which is the main purpose of this tool. As you traverse the tree structure you find the table you want to analyze, expand the tables columns, then highlight the columns to include for analysis. Right-click on them to bring up the option to analyze, then the Create New Analysis screen. Here you define the type of statistics to run on your selected columns by selecting indicators for each column, which will enable you to choose the type of statistics to use (Simple, Text, Summary, or Advanced). You can then run the analysis. For assistance in these steps, Talend provides the Open Profiler Getting Started Guide, which walks you through these steps and gives you an understanding of the options involved. I wish this manual went a bit more in depth, but it will get you started quickly.

I tested Talend Open Profiler on a CentOS 5 system configured with Java version 1.6.0_06, Perl v5.8.8, and an Oracle 11g database. The download and installation process was quick and easy; from download to GUI took about 10 minutes.

In my testing, the refresh rate could have been a bit better. I noticed some table, view, and column counts that did not seem to be updated until I traversed out and then back into the tree structure. And I'd like to see Talend add support for database connections besides Oracle and MySQL.

Data analysts, DBAs, and business professionals constantly investigate the validity of data. Usually the investigation requires not only a specialized skill set but also a highly paid technical professional or expensive tool set. Talend Open Profiler effectively brings the data closer to business users through a common interface with analysts and DBAs.

Share    Print    Comments   

Comments

on Explore your database with Talend Open Profiler

Note: Comments are owned by the poster. We are not responsible for their content.

Explore your database with Talend Open Profiler

Posted by: Yves de Montcheuil on July 24, 2008 05:57 PM
James - thanks for this great review. I would like to comment on a few of your points, if you don't mind.
First, let's keep in mind that this is still a version 1.0 of Talend Open Profiler, that was released not even a moth ago. This explains its meager documentation, but we are in the process of fixing this.
You also said you wished there were more databases supported. We fully agree with this. We simply started with the ones that were the most in demand from our community. Others are in the works, and BTW if anyone wants a specific database support, please ask it through Talend's BugTracker: http://www.talendforge.org/bugs. We are user driven and will do what our users need!
Regards,

Yves de Montcheuil
Talend

#

Explore your database with Talend Open Profiler

Posted by: Anonymous [ip: 72.19.171.102] on July 31, 2008 02:28 AM
Yves - Agreed, this is a v1. And I'll add a very nice v.1, not buggy at all, and look forward to v.2

#

Free Webinar: Leverage open source for cleansing your data with a demo of Talend Open Profiler

Posted by: Anonymous [ip: 88.176.176.60] on August 01, 2008 12:54 PM
Do you want to learn more about Talend Open Profiler? Are you interested in seeing a live demo of the tool? Reserve your Webinar seat now at: https://www1.gotomeeting.com/register/472014645

To view a full calendar of Talend Webinars:
http://www.talend.com/campaign/campaign.php?id=125&src=PostCalendarWebinar

#

This story has been archived. Comments can no longer be posted.



 
Tableless layout Validate XHTML 1.0 Strict Validate CSS Powered by Xaraya