Data Analysis as an Operational Resource
Over the past decade, the volume of data collected by organizations has increased enormously. The information comes both from traditional sources, such as computerized systems for tracking operations, customers, and sales, and from new sources such as website visits, social network chatter, and public records accessible over the Internet in digital form. This data explosion is an untapped asset at most organizations, which lack the tools and skills to exploit it. The challenge is to use data for fine-grained analysis of markets, customer behavior, and operations, and to move business operations toward evidence-based decision-making. This market is a huge opportunity for software companies.
SAS Business Analytics
For the better part of 40 years, SAS Institute has been the leader in business intelligence software. Historically, this has been a niche market in which SAS software is used to analyze huge datasets and generate predictive statistical models for large corporations and government agencies. According to leading market research firms, over ninety percent of the 100 largest companies worldwide use SAS software. Developed in response to industry-specific operational requirements, SAS has a significant advantage in its software technology, programming language, and tools. SAS has always invested heavily in research and development. The SAS stronghold is advanced analytics and predictive modeling software, which uses historical and current data to model and predict future outcomes. However, SAS's comparative advantage has for the most part been derived from the legacy world of statisticians and programmers.
Developed in the 1970s for the mainframe platform, the SAS Enterprise Intelligence Platform (EIP) comprises a business intelligence toolset, an analytics and data mining toolset, and a data integration stack. Versions are also available on other platforms: MS Windows and Linux variants.
The challenge facing SAS is to develop and deploy software that can be used by information technology and business professionals in a global market. Toward that end, SAS has been moving to the Internet model of software delivery, in which software is a service that customers access over the web; some initial products have
Advances in Business Intelligence Software
Business intelligence software has become increasingly mainstream; both free open source and commercial alternatives to SAS are now emerging in the market. Until recently, SAS was slow to recognize the challenge from free, open source alternatives to some of its products. R is a free programming language and set of software tools for statistical computing; it has become increasingly popular at universities and labs. Programs written in R can integrate with SAS technology, and management at SAS has made a long-term commitment to work with the open source community. Red Hat is working with SAS Institute to optimize performance and I/O throughput for SAS software and applications running on the Red Hat Enterprise Linux operating system. Preliminary results have shown excellent scalability in tests on up to 64 cores on a single system.
In response to the growth and success of SAS software, major software companies are emerging as new competitors, seeking market share with lower prices and substitute technology. Oracle, SAP, Microsoft, and especially IBM have invested considerable resources in bringing competitive business intelligence software to market. Oracle bought Hyperion and SAP bought Business Objects. In the summer of 2009, IBM bought SPSS, a maker of predictive modeling software, having earlier purchased Cognos. IBM has placed SPSS and Cognos into a new business analytics and optimization group. That business will be supported by 200 scientists, and the company has said it will retrain or hire 4,000 consultants and analysts to work in the group. IBM has stated that business intelligence software is part of its strategy for growth.
Apache Hadoop and SAS Platform
The SAS platform software can be used to work with Hadoop. SAS combines analytics with Hadoop's ability to use commodity-based storage and perform distributed processing. Hadoop data can be accessed with the SAS language. SAS abstracts the complexity of Hadoop by making it function as a data source, so data stored in Hadoop can be consumed across the SAS software stack. SAS support for Hadoop provides a framework for the information management life cycle, with metadata, lineage, monitoring, federation, and security capabilities. Data stored in Hadoop can be integrated with other sources. SAS text mining, predictive analytics, and business intelligence can be applied to Hadoop data, and a SAS Metadata Server creates and manages metadata.
SAS/ACCESS makes Hive-based tables appear native to SAS. Analytics or data processes can be developed with SAS tools and executed at run time within either Hadoop or the SAS environment.
The advantages associated with SAS/ACCESS Interface to Hadoop include:
- Integrating data stored in Hadoop with data from other sources.
- Directly accessing data with native interfaces.
- Using the LIBNAME statement to make Hive tables look like SAS datasets.
- Using PROC SQL to pass explicit HiveQL commands to Hadoop.
- Leveraging Hadoop data for existing SAS capabilities.
- Improving performance by minimizing data movement.
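The points above about passing explicit queries to the data source and minimizing data movement can be illustrated outside SAS. The following Python sketch is only an analogy, not SAS or Hive code: an in-memory SQLite database stands in for Hive, and the `sales` table, its columns, and the sample rows are all hypothetical. It contrasts pulling every row to the client with sending an aggregate query to the data source so that only the results travel back.

```python
import sqlite3

# Stand-in for a Hive table: an in-memory SQLite database (an assumption
# made so the sketch runs anywhere; the table and rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0), ("west", 25.0), ("west", 10.0)],
)

def totals_client_side(conn):
    """Pull every row to the client, then aggregate locally."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)

def totals_pushed_down(conn):
    """Send the aggregate query to the data source; only results come back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)

client_totals, client_rows_moved = totals_client_side(conn)
pushed_totals, pushed_rows_moved = totals_pushed_down(conn)

print(client_totals == pushed_totals)        # both approaches agree on the totals
print(client_rows_moved, pushed_rows_moved)  # rows transferred by each approach
```

Both functions return the same totals, but the pushed-down version transfers one row per region instead of one row per sale, which is where the performance benefit comes from on large tables.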
Hadoop Distributed Processing from within SAS
Hadoop functionality can be executed from within the SAS environment through MapReduce programming, scripting, and HDFS commands. This complements SAS/ACCESS and Hive and extends support to Pig, MapReduce, and HDFS commands. By running a thin-layer SAS process on each Hadoop node, SAS commands achieve efficient throughput for large analytic workloads. Hadoop provides the distributed processing, memory management, and job flow control.
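To make the MapReduce processing model concrete, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases that Hadoop distributes across nodes. It is not Hadoop or SAS code: the function names and the sample log records are hypothetical, and the sketch leaves out the distribution, memory management, and job flow control that Hadoop itself provides.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical job: count page visits per customer from web log lines.
def visit_mapper(line):
    customer, _page = line.split(",")
    yield customer, 1

def count_reducer(_key, values):
    return sum(values)

log_lines = ["c1,/home", "c2,/home", "c1,/cart", "c1,/checkout"]
visit_counts = reduce_phase(
    shuffle_phase(map_phase(log_lines, visit_mapper)),
    count_reducer,
)
print(visit_counts)
```

In a real cluster the mapper and reducer run in parallel on the nodes that hold the data; only the structure of the computation is the same as in this sketch.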
Base SAS integration with Hadoop includes support for external file references to Hadoop files, parameters for file processing, and the following procedures: PROC FREQ, PROC MEANS, PROC RANK, PROC REPORT, PROC SORT, PROC SUMMARY, and PROC TABULATE.
SAS Information Management provides:
- An interface to Hadoop which uses Pig, Hive, MapReduce, and HDFS commands inline.
- A visual editor and built-in syntax checker for Pig and MapReduce code.
- Submission of Hive queries through PROC SQL, Base SAS, and other SAS components.
- The capability to create user-defined functions (UDFs) for deployment within HDFS.
- The ability to apply data quality capabilities to Hadoop data.
- Data Integration Studio transformations for Hadoop data and for building job flows.
SAS Training by SYS-ED
SYS-ED SAS courses teach industry standard subject matter in the interrelated information technologies required for efficient data management, data mining, and report generation, in applications such as:
- Credit card companies detecting unusual buying patterns in real time and spotting potentially fraudulent charges.
- Retail chains tailoring pricing and product offerings down to the store level.
- Telecommunications companies identifying the few thousand customers, among millions, who are most likely to switch to another cellphone carrier.
- Energy companies parsing sensor signals from oil rigs and combining that information with weather and structural data to predict the failure of parts.
- Analysis of SMF data to determine bottlenecks, I/O activity, CPU utilization, and contention issues.
SYS-ED's multi-platform SAS training teaches the Base SAS procedures and the language enhancements:
- Invoking Remote Function Call (RFC) or RFC-compatible functions on an SAP system from a SAS program.
- Creating, testing, and storing SAS functions and subroutines before using them in other SAS procedures.
- Providing diagnostic information to the user about the Java environment that SAS is using.
- Registering, in batch mode, external functions that are written in the C or C++ programming languages.
- Specifying a filename or fileref that will contain the output of the SAS Code Analyzer, and writing output to the file.
- Reading XML input from a file that has a fileref and writing XML output to another file that has a fileref. The procedure PROC SOAP can run on any platform.
- Invoking a Web service that issues requests, using the HTTP procedure.
SYS-ED's system consultant instructors average 30 years of experience in information technology. We provide the insights required for effective maintenance coding and web-based utilization of SAS analytics with a variety of databases (IBM DB2, MS SQL Server, and Oracle) and operating system environments (IBM mainframe, MS Windows, and Red Hat Linux).
Few independent training companies have been providing SAS training longer or better than SYS-ED.
Qualifying a Request for SAS Training
A telephone consultation with our Director of Education is required before SYS-ED accepts a SAS training assignment. As part of this process, we will review the subject matter in relation to the system software installation standards and provide a training plan in writing. Client-specific sample data, examples, and exercises are used to address query and reporting requirements with SAS software in commercial and open source operating environments.
- How to code and utilize SAS procedures for data manipulation, information storage, information retrieval, statistical analysis, and report writing.
- How to utilize the Java client, XML Mapper, for importing and exporting XML documents to the SAS platform.
- How to use ODS (Output Delivery System) for reporting, report formatting, and report delivery, including:
  1- Capturing data output from a Base procedure into a SAS dataset for additional processing or reporting.
  2- Creating and separating datasets for a subgroup.
It is our prerogative to ensure that the employee has the prerequisite background for the training. Not everyone is allowed to enroll in a SYS-ED course.
Upon completion of an instructor-led course at the client location, it is standard policy to organize the subject matter for future use in a web-based training infrastructure.