Publications

Intelligent
SAS®
Log
Manager
The SAS log generated during program execution is one of the richest sources of application health, especially when dealing with macros which build and invoke code dynamically. Explicitly sending messages to the Log Manager at critical steps throughout the application one may disable all low-level SAS feedback and present only relevant, highly specialized information to the SAS log. One is led immediately to the source of both compile-time and run-time errors. Resulting is a clean, well-structured and specific trace of application flow throughout the entire job submitted.
presentation  article     macro code  (  SAS  )  examples
Presented at the Twenty-sixth International SAS User Group Conference, Long Beach, California.
Presented at the Thirteenth Regional Western SAS User Group Conference, San Jose, California.

Creating
Efficient
SQL
Schema Query Language is a very flexible, general grammar for fetching and processing large amounts of data directly at the database level. Given this flexibility, there are many ways of achieving the same result. Some approaches yield terribly poor database performance especially when simple processing steps are overlooked. Most notable are incorrect uses of the ORDER BY and DISTINCT statements, whose often desired results almost always lead to sub-optimal database behavior. It is instructive to think about any “table”, which are named in the FROM clause, as just another type of “object” in an object-oriented sense. Tables consist of one or more fields or columns, each having well defined types, and when related or joined to each other hold references between themselves. An SQL statement is thus nothing more than an object model diagram of sorts. In that regard, I recommend the following sequence of four simple steps when building any SQL query statement.
presentation  article
Presented at the Twenty-seventh International SAS User Group Conference, Orlando, Florida.
Presented at the Twelfth Regional Western SAS User Group Conference, Pasadena, California.

Demeter
in
the
Database
Relational database normal form is nothing more than variable dereference. Used properly by following the two most important rules of Object-Oriented design, separation and encapsulation, one's database model will always work well. Violating the Law of Demeter enforces a concrete implementation on an otherwise abstract model, effectively multiplying in cartesian fashion weakly related variables and needlessly increasing table object size with redundant information. Instead, refer to table objects no more than one level away from each other. Empower each level in the model heirarchy with the privaledge and responsibility it deserves, giving each table suitably named verbs (methods), behavior and characteristics (fields). In the following example, an idea of Consumption & Replenish, we will create a dynamically adapting database model, helping Farmer Jack predict when he needs to order more fertilizer to feed each of his many stalks.
presentation  article
Presented at the Twenty-eighth International SAS User Group Conference, Seattle, Washington.

Dynamic
Macro
Invocation
Utility
Often times it is necessary to repeatedly call a macro with only minor differences in its argument list. Most common is the application of producing multiple graphs panelized into an aggregate. The following simple macro %invoke() builds an argument list with a collection of index number suffixed positional parameters and calls a desired macro with a subset of these parameters as needed until all parameters have been accomodated. Resulting is a set of output having identical structure with customized content, all for the price of a single macro call.
presentation  article
Presented at the Twenty-ninth International SAS User Group Conference, Montreal, Canada.

Union Join
without
the
Union Clause
Following good object oriented practice of building query models we address the issue of poor query performance during use of the UNION clause. Although column-wise joins are rather well understood, the notion of the UNION or row-wise join is often perceived incorrectly as a gluing together of independent fullselect result sets rather than via model relations through higher levels of abstraction. This perception is excusable because there often does not exist suitable abstraction tables or normalization key values, and we must think creatively how a necessary abstraction can be derived. In the example to follow we help farmer Jack analyze taste and laboratory test data to determine the quality of his bean stalks.
presentation  article     handout  (  SQL  DDL  )
Presented at the Twenty-ninth International SAS User Group Conference, Montreal, Canada.

The
De-normalize
Transpose
Normalization and warehousing are battle cries of two waring factions. Although vertical or row-wise designs ultimately minimize storage access and retrieval time, the law of the relational model precludes simultaneous access from more than one table row. Quite often, we desire knowledge of two variables for the same aggregate observation, when performing high level arithmetic or reporting spreadsheet like tablature. This article presents an efficient query method for turning rows into columns at SQL run time, which farmer Jack uses for his bean quality audit policy, and discusses some finer points which make the transposition process database system independent.
presentation  article     handout
Presented at the 2005 Regional Pharmaceutical SAS User Group Conference, Phoenix, Arizona.

WHERE
The Statement
Is Not
The Clause
Improving clock time and running smaller are goals for any programmer. In the SAS® Data step, it is well known that certain filtering expressions placed in a WHERE statement often lead to better processing efficiency. These expressions are simple "static" equalities, those without arithmetic operations or function calls. The Data step engine processes these statements at a very low level – as observations are being read in from the input dataset – so that later and more complicated Data step operations have as few rows as possible to deal with. Therefore a SAS programmer learns early on to do their easy work – first things, first – in the WHERE statement.

This is unfortunate. Folks in the RDBMS camp know that the FROM clause is the first point of contact to one's data model. Only after processing all specifications and expressions in the FROM clause will the database system go on to the WHERE clause. In SAS Proc SQL code, position of the data filters (e.g., WHERE gndr eq "M";) can affect how fast the query runs and how much disk space it takes while running. Proper coding of the data filter allows Proc SQL to "delegate" the filtering to a "lower level" internal SAS subroutine called the Data Engine.

We learn in this paper that
   in SAS, the data step WHERE is a statement, and processed first
   but in SQL, the query WHERE is a clause, and processed last.

This paper suggests, and justifies, that SAS SQL programmers can often improve performance by adopting a few simple rules and coding style. Many annotated demonstrations, and a few humerous examples, are included as well as on the companion web site.
presentation  article
Presented at the Thirteenth Regional Western SAS User Group Conference, San Jose, California.

Presentations  Intelligent SAS Log Manager  Creating Efficient SQL  Demeter in the Database  Dynamic Macro Invocation Utility  Union Join without the UNION Clause  The De-normalize Transpose  WHERE The Statement Is Not The Clause