January, 2007
The Newspaper of the NYSSCPA
Vol. 10, No.1

Data Analysis: Tools and Techniques
Reducing the Time Needed for Fraud Detection

By Mudit Gupta, CPA

Imagine a scenario in which a CPA firm is providing annual audit services to a financial services company. As part of the fraud considerations, the firm is reviewing the adjusting and closing entries posted by the company’s controller’s office. These entries could be manual or systematic and could amount to approximately 100,000 entries that are above the materiality threshold. If firm’s audit team spends five minutes per entry, they will have to spend approximately 1,000 days just reviewing these entries. However, if the audit team uses the right data-analysis tools, it should take an average of just seven days to complete the entries’ review.

Data Analysis Versus Data Mining

Data analysis (DA) is an efficient method of analyzing large sets of data in a variety of fields, for internal, external and forensic audits. Most DA engagements involve working on existing data extracted by the IT departments of the audit client. Preparing the data for analysis can be a time-intensive task.

Data mining (DM) is defined as the process of automatically searching large volumes of data for patterns such as association rules. It is a generic term used to explain a variety of tasks involving the analysis of data.

DA, as it pertains to technology assurance, is an analytical and problem-solving process that identifies and interprets relationships among variables. It is used primarily to analyze data based on predefined relationships, while DM, as it pertains to computer science, is used to identify new relationships in an otherwise bland dataset.

More often than not, DA is considered as the knowledge to operate one of the DA tools (e.g., Microsoft Excel, etc.). Like auditing, DA needs a specific mindset as opposed to having merely the capability to use a given tool. It requires an analytical and problem-solving mindset with the ability to identify and interpret the relationships among the variables.

Successfully solving a DA problem requires a deep understanding of the definition and application of various elements of DA. Following are some of the key elements of DA.

The Nature of the Data

Knowing the nature of the data is necessary to determine the field formats (e.g., numeric, alphanumeric, Boolean, etc) to be used to store the data. Some programs truncate the leading zeroes from a numeric field automatically. Preprocessing knowledge of the data and the field formats will therefore allow the professional to define the field as an alphanumeric field.

The Size of the Data

Size is necessary to determine the DA tool that should be used for the problem at hand. For example, Microsoft Excel can hold a maximum of 65,536 records. Other tools, like ACL, can process approximately more than 1 million records.

Certain data types have a maximum limit on the length of data they can hold. For example, SQL decimal data type can hold a maximum value precision value of 38 and a maximum scale value of 8. Default values are the values stored in fields at the time of field creation. These values can be changed or overwritten by processing logic, but any values that remain unchanged will retain the default value of the field. For example, numeric fields are initialized to a default value of 0 to enforce data integrity. Some fields can be restricted to store only certain constants that can be used as a key to look up other fields in the dataset. For example, a basis field can contain “LC” or “C” denoting Lower of Cost/Market or Cost, respectively.

The Source of the Data

The source of the data is important, as it facilitates understanding its nature. For example, data originating from a business source is likely to contain more functional information, whereas IT data might include administrative information like user details and other system security parameters. Knowing the source helps in clearly defining field types, length, etc. The source of the data also becomes very important when combined with the end result. If the task at hand is to perform trade reconciliation, the expectation should be that the source data containing trade and account details will be primarily numeric. This helps in saving time, as the data can be reobtained in case it’s incorrect, earlier in the process and before too many resources are spent on the project.

Once data is obtained from the client source, it is important to perform control totals (record counts and dollar amounts) to validate that the data received is complete.

Problem Logic

Problem logic is the process with which the professional defines the issues and the objectives of the engagement. Naturally, many analysts see the problem logic as the center of any DA engagement. Generally, understanding problem logic includes defining the filtration criteria, the nature and attributes of key fields used to uniquely identify records, summarization criteria, elimination criteria and external relationships to other data bases.

Expected Results

DA for auditing is used to independently reperform the client’s procedures and reconcile back to the client output. Discrepancies may include audit differences and/or issues. Having a sample client output is important, as it indicates what the expected result should be and allows for effective designing of the logic and procedures. Performing DA for nonaudit work involves preparing some analytical output. Knowledge of granularity, classifications and presentations therefore helps in effectively designing the logic and procedures to be performed.

Examples of Data Analysis Tools

DA tools are out-of-the-box software that can be used to analyze data in a way to identify and interpret the relationship amongst variables, as described earlier. Knowledge of a DA tool alone cannot allow a person to effectively perform DA and related audit test procedures.

Professionals looking for prepackaged software can start by learning how to use one of several off-the-shelf products, such as ACL, IDEA, SAS, or Tableau. For a higher level and more granular control over the DA process, traditional programming languages can also be used to facilitate DA, including SQL, C# or C++. In addition, basic desktop software can be used to perform basic tasks. Typically, these are spreadsheets and database environment, such as Microsoft Excel and Access.

The Benefits of Data Analysis

A clear understanding of data analysis and some of its baseline rules can help attest professionals perform their work and reach their objectives with greater efficiency and effectiveness. The use of data analysis services adds value in both external and internal audit engagements. Proper application of data analysis helps professionals who perform audit work by streamlining resources while maintaining an effective audit process. Simultaneously, auditors can address the risk associated with the specific audit areas as required by professional standards.

Mudit Gupta, CPA, a senior IS auditor with Ernst & Young, is a member of the NYSSCPA’s Technology Assurance Committee. He can be reached at mudit.gupta@ey.com.

Home | Print Story | E-mail Story


Home
| About Us | Continuing Education | Future CPAs | Government Affairs | Professional Resources | Publications | Sound Advice | Tax Resources

Chapters | Committees | Member Center | Events Calendar | Classifieds | Careers | E-zine Subscriptions | The Trusted Professional | The CPA Journal



Search | Site Map | Become a Member | Jobs | Press Room | Contact Us | Feedback

©1997 - 2009 New York State Society of Certified Public Accountants. Legal Notices