About Me

I now work for Microsoft Federal in Chevy Chase, MD.

Dedicated to providing customer-driven, results-focused solutions to the complex business problems of today... and tomorrow.

At SQLTrainer.com, LLC, we understand that the technical challenges facing businesses today are much greater in both scope and complexity than they have ever been. Businesses must understand both local IT infrastructures and cloud-based technologies.

What is SQLTrainer.com?

Founded in 1998 by Ted Malone, SQLTrainer.com is a technical consulting, training and content development firm dedicated to the following core principles:

  • Technology Alone is NOT the Answer! - Implementing a particular technology because it is interesting or "cool" will not solve customer problems.
  • Technology Solutions do NOT need to be Overly Complex! - Many times developers and technical practitioners will attempt to build the most clever solution possible. While this serves to stroke the egos of those involved, it doesn't provide a maintainable solution.
  • Consultants Should be Mentors First! - When looking to hire an external consultant, businesses should look to the consultant who's willing to train themselves out of a paycheck.

Why the name, SQLTrainer.com?

SQL (pronounced See-Quell) stands for Structured Query Language, which is at the heart of every modern relational database system. Since many technology solutions today rely on some form of database storage or interaction, it was only logical to incorporate SQL into the name of the organization. Given that one of our core principles is to be a mentor and trainer above everything else, the name SQLTrainer made sense. Since we also wanted to represent our embrace of the cloud, it seemed logical to add the ".com", referring to the biggest "cloud" of them all.

Live Feeds

Thursday, August 21, 2014 11:30:00 AM

I am pleased to announce previews of new NoSQL database and search services and the evolution of our Hadoop-based service. Available as previews today are Azure DocumentDB, a fully-managed transactional NoSQL document database-as-a-service, and Azure Search, which enables developers to easily add search capabilities to mobile and cloud applications. Generally available today, Azure HDInsight, our Hadoop-based solution for the cloud, now supports Apache HBase clusters.

With these new and updated services, we’re continuing to make it easier for customers to work with data of any type and size – using the tools, languages and frameworks they want – in a trusted cloud environment. From Microsoft products like Azure Machine Learning, Azure SQL Database and Azure HDInsight to data services from our partners, we’re committed to supporting the broadest data platform so our customers get data benefits, in the cloud, on their terms.

Preview of Azure DocumentDB

Applications today must support multiple devices, multiple platforms with rapid iterations from the same data source, and also deliver high-scale and reliable performance. NoSQL has emerged as the leading database technology to address these needs. According to Gartner inquiries, flexible data schemas and application development velocity are cited as primary factors influencing adoption. Secondary factors attracting enterprises are global replication capabilities, high performance and developer interest.*

However, while NoSQL technologies address some document database needs, we’ve been hearing feedback that customers want a way to bridge document database functionality with the transactional capabilities of relational databases. Azure DocumentDB is our answer to that feedback – it’s a NoSQL document database-as-a-service that provides the benefits of a NoSQL document database but also adds the query processing and transaction semantics common to relational database systems.

Built for the cloud, Azure DocumentDB natively supports JSON documents, enabling easy object mapping and rapid iteration of data models during application development. Azure DocumentDB offers programming libraries for several popular languages and platforms, including .NET, Node.js, JavaScript, and Python. We will be contributing the client libraries to the open source community, so they can incorporate improvements into the versions published on Azure.com.
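
To give a flavor of the SQL-like grammar DocumentDB exposes over JSON documents, here is a minimal, hypothetical query against a collection of customer feedback documents; the collection alias and property names (productId, rating) are illustrative assumptions, not part of the service:

SELECT f.id, f.customerName, f.rating
FROM feedback f
WHERE f.productId = "bike-123" AND f.rating < 3

Results come back as JSON documents, so they map directly onto the application's objects.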

One DocumentDB customer, Additive Labs, builds online services to help their customers move to the cloud. "DocumentDB is the NoSQL database I am expecting today,” said Additive Labs Founder Thomas Weiss. “The ease and power of SQL-like queries had me started in a matter of minutes. And the ability to augment the engine’s behavior with custom JavaScript makes it way easier to adapt to our customers’ new requirements.”

Preview of Azure Search

Search has become a natural way for users to interact with applications that manage volumes of data. However, managing search infrastructure at scale can be difficult and time consuming, and it often requires specialized skills and knowledge. Azure Search is a fully-managed search-as-a-service that customers can use to integrate complete search experiences into applications and connect search results to business objectives through fine-tuned ranking profiles. Customers do not have to worry about the complexities of full-text search or about deploying, maintaining or managing a search infrastructure.

With Azure Search developers can easily provision a search service, quickly create and tune one or more indexes, upload data to be indexed and start issuing searches. The service offers a simple API that’s usable from any platform or development environment and makes it easy to integrate search into new or existing applications. With Azure Search, developers can use the Azure portal or management APIs to increase or decrease capacity in terms of queries per second and document count as load changes, delivering a more cost effective solution for search scenarios. 

Retail platform provider Xomni is already using Azure Search to help the company manage its cloud infrastructure. "We have the most technically advanced SaaS solution for delivering product catalogue data to the retail industry in the market today,” said Xomni CTO Daron Yondem. “Integrating Azure Search into our platform will help solidify our leadership as datasets and faceted search requirements evolve over time."

General availability of Apache HBase for HDInsight

In partnership with Hortonworks, we’ve invested in the Hadoop ecosystem through contributions across projects like Tez, Stinger and Hive. Azure HDInsight, our Hadoop-based service, is another outcome of that partnership.

Azure HDInsight combines the best of Hadoop open source technology with the elasticity and manageability that enterprises require. Today, we’re making HBase generally available as a managed cluster type inside HDInsight. HBase clusters are configured to store data directly in Azure Blob storage. For example, customers can use HDInsight to analyze large datasets in Azure Blobs generated from highly interactive websites, or to analyze sensor and telemetry data from millions of end points.

Microsoft data services

Azure data services provide unparalleled choice for businesses, data scientists, developers and IT pros, with a variety of managed services from Microsoft and our partners that work together seamlessly and connect to our customers’ data platform investments – from relational to non-relational data, structured to unstructured data, and constant to evolving data models. I encourage you to try out our new and expanded Azure data services and let us know what you think.

*Gartner, Hype Cycle for Information Infrastructure, 2014, Mark Beyer and Roxane Edjlali, 06 August 2014

 


Thursday, July 31, 2014 10:00:00 AM

PASS VP of Marketing Denise McInerney – a SQL Server MVP and Data Engineer at Intuit – began her career as a SQL Server DBA in 1998 and attended her first PASS Summit in 2002. The SQL Server Team caught up with her ahead of this year’s event, returning to Seattle, WA, Nov. 4-7, to see what she’s looking forward to at the world’s largest conference for SQL Server and BI professionals.

For those who’ve never attended or who’ve been away for a while, what is PASS Summit?
PASS Summit is the world’s largest gathering of Microsoft SQL Server and BI professionals. Organized by and for the community, PASS Summit delivers the most technical sessions, the largest number of attendees, the best networking, and the highest-rated sessions and speakers of any SQL Server event.

We like to think of PASS Summit as the annual reunion for the #sqlfamily. With over 200 technical sessions and 70+ hours of networking opportunities with MVPs, experts and peers, it’s 3 focused days of SQL Server. You can take hands-on workshops, attend Chalk Talks with the experts, and get the answers you need right away at the SQL Server Clinic, staffed by the Microsoft CSS and SQLCAT experts who build and support the features you use every day. Plus, you can join us early for 2 days of pre-conference sessions with top industry experts and explore the whole range of SQL Server solutions and services under one roof in the PASS Summit Exhibit Hall.

Nowhere else will you find over 5,000 passionate SQL Server and BI professionals from 50+ countries and 2,000 different companies connecting, sharing, and learning how to take their SQL Server skills to the next level.

What’s on tap this year as far as sessions?
We’ve announced a record 160+ incredible community sessions across 5 topic tracks: Application and Database Development; BI Information Delivery; BI Platform Architecture, Development and Administration; Enterprise Database Administration and Deployment; and Professional Development. And watch for over 60 sessions from Microsoft’s top experts to be added to the lineup in early September.

You can search by speaker, track, session skill level, or session type – from 10-minute Lightning Talks, to 75-minute General Sessions, to 3-hour Half-Day Sessions and our full-day pre-conference workshops.

And with this year’s new Learning Paths, we’ve made it even easier to find the sessions you’re most interested in. Just use our 9 Learning Path filters to slice and dice the lineup by everything from Beginner sessions to Big Data, Cloud, Hardware Virtualization, and Power BI sessions to SQL Server 2014, High Availability/Disaster Recovery, Performance, and Security sessions.

Networking is at the heart of PASS Summit – what opportunities do you have for attendees to connect with each other?
PASS Summit is all about meeting and talking with people, sharing issues and solutions, and gaining knowledge that will make you a better SQL Server professional. Breakfasts, lunches, and evening receptions are all included and are designed to offer dedicated networking opportunities. And don't underestimate the value of hallway chats and the ability to talk to speakers after their sessions, during lunches and breaks, and at the networking events.

We have special networking activities for first-time attendees, for people interested in the same technical topics at our Birds of a Feather luncheon, and at our popular annual Women in Technology luncheon, which connects 600+ attendees interested in advancing the role of women in STEM fields. Plus, our Community Zone is THE place to hang out with fellow attendees and community leaders and learn how to stay involved year-round.

You mentioned the networking events for first-time attendees. With everything going on at Summit, how can new attendees get the most out of their experience?
Our First-Timers Program takes the hard work out of conference prep and is designed specifically to help new attendees make the most of their time at Summit. We connect first-timers with conference alumni, take them inside the week with community webinars, help them sharpen their networking skills through fun onsite workshops, and share inside advice during our First Timers orientation meeting.

In addition, in our “Get to Know Your Community Sessions,” longtime PASS members share how to get involved with PASS and the worldwide #sqlfamily, including encouraging those new to PASS to connect with their local SQL Server communities through PASS Chapters and continue their learning through Virtual Chapters, SQLSaturdays, and other free channels.

How can you learn more about sessions and the overall PASS Summit experience?
A great way to get a taste of Summit is by watching PASS Summit 2013 sessions, interviews, and more on PASStv. You can also check out the best of last year’s Community blogs.

Plus, stay tuned for 24 Hours of PASS: Summit Preview Edition on September 9 to get a free sneak peek at some of the top sessions and speakers coming to PASS Summit this year. Make sure you follow us on Twitter at @PASS24HOP / #pass24hop for the latest updates on these 24 back-to-back webinars.

Where can you register for PASS Summit?
To register, just go to Register Now – and remember to take advantage of the $150 discount code from your local or Virtual PASS Chapter. We also have a great group discount for companies sending 5 or more employees. And don’t forget to purchase the session recordings for year-round learning on all aspects of SQL Server.

Once you get a taste for the learning and networking waiting for you at PASS Summit, we invite you to join the conversation by following us on Twitter (watch the #sqlpass and #summit14 hashtags) and joining our Facebook and LinkedIn groups. We’re looking forward to an amazing, record-breaking event, and can’t wait to see everyone there!

Please stay tuned for regular updates and highlights on Microsoft and PASS activities planned for this year’s conference. 

Wednesday, July 30, 2014 10:00:00 AM

This blog post was authored by: Sahaj Saini, PM on the Microsoft Analytics Platform System (APS) team.

In this blog post, we’ll provide a quick overview of Symmetric Multi-Processing (SMP) vs. Massively Parallel Processing (MPP) systems, how to identify triggers for migrating from SMP to MPP, key considerations when moving to Microsoft Analytics Platform System (APS), and a discussion about how to take advantage of the power of an MPP solution such as APS.

Let us begin with a scenario. Emma is the Database Administrator at Adventure Works Cycles, a bicycle manufacturing company. At Adventure Works, Emma and her team are using a traditional SQL Server SMP deployment as their data warehousing solution. The company has been growing rapidly, and with growing competition in the bicycle industry, the business analysts at Adventure Works Cycles would like quicker insight into their data. Emma is now facing the following challenges with the SMP deployment:

  • High Data Volume and Data Growth: With increasing sales and a growing customer base, the data volume has grown rapidly to cross 10 TB.
  • Longer Data Loading/ETL times: With the need to produce daily reports to management, Emma finds the current ETL speed inadequate to intake and process the increasing quantity of data flowing from other OLTP and non-relational systems.
  • Slow Query Execution: Query execution times are slowing down due to the increase of data and it is becoming increasingly difficult to generate insights for daily reporting in a timely manner.
  • Long Cube Processing Time: With the current cube processing time, it is difficult to meet the real-time reporting needs of the company.

In order to overcome these challenges, Emma and her team evaluate adding a larger, more expensive, and more powerful set of server and storage hardware to their datacenter. This approach would solve their problem, but only for the short term, as data growth is expected to explode in the next 12 months. With the data growth that Adventure Works is expecting, even the bigger and more powerful SMP solutions would hit a wall very quickly. Emma would like a solution that scales as their data needs grow.

What’s the difference between SMP and MPP?

Before we jump into solving Emma’s problems, let’s quickly define what SMP and MPP are. Symmetric Multi-Processing (SMP) is a tightly coupled multiprocessor system in which the processors share resources – a single instance of the operating system (OS), memory, and I/O devices – and are connected by a common bus. SMP is the primary parallel architecture employed in servers and is depicted in the following image.

Massively Parallel Processing (MPP) is the coordinated processing of a single task by multiple processors, each processor using its own OS and memory and communicating with the others through some form of messaging interface. MPP can be set up with a shared-nothing or shared-disk architecture.

In a shared nothing architecture, there is no single point of contention across the system and nodes do not share memory or disk storage. Data is horizontally partitioned across nodes, such that each node has a subset of rows from each table in the database. Each node then processes only the rows on its own disks. Systems based on this architecture can achieve massive scale as there is no single bottleneck to slow down the system. This is what Emma is looking for.

MPP with shared-nothing architecture is depicted in the following image.

Microsoft Parallel Data Warehouse (PDW), running on a Microsoft Analytics Platform System appliance, is implemented as an MPP shared-nothing architecture. It consists of one control node and storage-attached compute nodes interconnected by Ethernet and InfiniBand. The control node hosts the PDW engine – the brains of the MPP system – which creates parallel query plans, coordinates query execution on the compute nodes, and aggregates data across the entire appliance. All nodes, including control and compute, host a Data Movement Service (DMS) to transfer data between nodes.

For more details on PDW architecture, you can read the Architecture of the Microsoft Analytics Platform System post.

Transitioning to MPP

To realize the value offered by MPP, Emma and her team purchase a Microsoft APS appliance and begin transitioning to MPP. Let’s take a look at how they adapt their solution to take full advantage of APS’s shared nothing MPP architecture.

Table Design

As previously mentioned, APS is based on a shared nothing MPP architecture which means that nodes are self-sufficient and do not share memory or disks. The architecture, therefore, requires you to distribute your large tables across nodes to get the benefits of the massively parallel processing. APS allows the definition of a table as either distributed or replicated. The decision to choose one versus the other depends on the volume of data and the need for access to all of the data on a single node.

Distributed Tables

A distributed table is one whose rows are spread across the nodes of the appliance to allow for massive scale. Each row ends up in one distribution on one compute node, as depicted by the image below.

To take advantage of the distributed nature of APS, Emma modifies the large tables, typically Fact and large dimension tables, to be distributed in APS as follows:

CREATE TABLE [dbo].[FactInternetSales]
(
  [ProductKey] [int] NOT NULL,
  [OrderDateKey] [int] NOT NULL,
  .
  .
  [ShipDate] [datetime] NULL
) 
WITH
(
  DISTRIBUTION = HASH(ProductKey),
  CLUSTERED COLUMNSTORE INDEX
);

As you can see, this is a typical DDL statement for table creation, with a minor addition for distributed tables. Tables are distributed by a deterministic hash function applied to the distribution column chosen for that table. Emma chooses ProductKey as the distribution column in the FactInternetSales table because of its high cardinality and absence of skew, which distributes the table evenly across nodes.
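
Before settling on a distribution column, a quick check of cardinality and skew can be done with ordinary T-SQL against the source data; this is a minimal sketch using the table and column from the example above:

-- How many distinct values does the candidate distribution column have?
SELECT COUNT(DISTINCT ProductKey) AS DistinctKeys,
       COUNT(*) AS TotalRows
FROM [dbo].[FactInternetSales];

-- Do a handful of values account for most of the rows (skew)?
SELECT TOP 10 ProductKey, COUNT(*) AS RowsPerKey
FROM [dbo].[FactInternetSales]
GROUP BY ProductKey
ORDER BY RowsPerKey DESC;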

Replicated Tables

If all tables were distributed, however, joins would require a great deal of data movement between nodes. Therefore, for smaller dimension tables such as languages or countries, it makes sense to replicate the entire table on each compute node. That is to say, the benefit of enabling local join operations with these tables outweighs the cost of the extra storage consumed. A replicated table is one that is copied in full to every compute node, as depicted below.

Emma designs the small tables, typically dimension tables, to be replicated as follows:

 CREATE TABLE [dbo].[DimDate](
  [DateKey] [int] NOT NULL,
  .
  .
  [SpanishDayNameOfWeek] [nvarchar](10) NOT NULL
)
WITH
(
CLUSTERED COLUMNSTORE INDEX
);

By appropriately designing distributed and replicated tables, Emma aligns her solution with common MPP design best practices and enables efficient processing of high volumes of data. For example, a query against 100 billion rows in a SQL Server SMP environment would require processing all of the data in a single execution space. With MPP, the work is spread across many nodes, breaking the problem into smaller, more manageable tasks. In a four-node appliance (see the picture above), each node is only asked to process roughly 25 billion rows – a much quicker task. As a result, Emma observes significant improvements in query execution time, and her business can now make better decisions, faster. Additionally, Emma can grow the data warehouse from a few terabytes to over 6 petabytes of data by adding “scale units” to APS.
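
As a hypothetical illustration of how that work is divided, the aggregate below runs in parallel on every compute node against its local distributions of FactInternetSales, and the PDW engine merges the partial results before returning them; no special syntax is required (the SalesAmount column is assumed from the standard AdventureWorks schema):

-- Each compute node aggregates only the rows it owns; the control node
-- combines the per-node partial aggregates into the final result set.
SELECT OrderDateKey,
       COUNT(*) AS OrderLines,
       SUM(SalesAmount) AS TotalSales
FROM [dbo].[FactInternetSales]
GROUP BY OrderDateKey;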

Data Loading

With SQL Server SMP, Emma and her team were using ETL processes, via a set of SSIS packages, to load data into the data warehouse: (1) extracting data from the OLTP and other systems; (2) transforming the data into dimensional format; and (3) loading the data into target dimension or fact tables in the data warehouse. With increasing volumes of data, the SSIS server in the middle becomes a bottleneck while performing transformations, resulting in slow data loading.

With APS, Emma and her team can use ELT instead: Extract the data from the OLTP and other systems and Load it to a staging location on APS. The data can then be Transformed into dimensional format, not with SSIS but with the APS engine, utilizing the distributed nature of the appliance and the power of parallel processing. In a four-node appliance, four servers perform the transformations on subsets of the data, versus the single SSIS server.

This parallel processing results in a significant boost in data loading performance. Emma can then use the Create Table As Select (CTAS) statement to create the table from the staging table as follows.

CREATE TABLE [dbo].[DimCustomer] 
WITH
(
  CLUSTERED COLUMNSTORE INDEX,
  DISTRIBUTION = HASH (CustomerKey)
)
AS
SELECT * FROM [staging].[DimCustomer];

By switching to an ELT process, Emma utilizes the parallel processing power of APS to see performance gains in data loading.

In conclusion, Emma and her team have found answers to their SMP woes with MPP. They can now feel confident handling the data volume and growth at Adventure Works with the ability to scale the data warehouse as needed. With ELT and the power of parallel processing in APS, they can load data into APS faster and within the expected time-window. And by aligning with APS’s MPP design, they can achieve breakthrough query performance, allowing for real-time reporting and insight into their data.

Visit the Analytics Platform System page to access more resources, including a datasheet, video, solution brief, and more.

To learn more about migration from SQL Server to the Analytics Platform System

Thursday, July 24, 2014 1:00:00 PM

Today, we are making available a new version of SQL Server Migration Assistant (SSMA), a free tool to help customers migrate their existing Oracle databases to SQL Server 2014. Microsoft released SQL Server 2014 earlier this year, after months of customer testing, with features such as In-Memory OLTP to speed up transaction performance, In-Memory Columnstore to speed up query performance, and other great hybrid cloud features such as backup to cloud directly from SQL Server Management Studio and the ability to utilize Azure as a disaster recovery site using SQL Server 2014 AlwaysOn.

Available now, SQL Server Migration Assistant version 6.0 for Oracle databases greatly simplifies the database migration process from Oracle databases to SQL Server. SSMA automates all aspects of migration, including migration assessment analysis, schema and SQL statement conversion, data migration, and migration testing, to reduce the cost and risk of database migration projects. Moreover, SSMA version 6.0 for Oracle databases brings additional features such as automatically moving Oracle tables into SQL Server 2014 in-memory tables, the ability to process 10,000 Oracle objects in a single migration, and increased performance in database migration and report generation.

Many customers have realized the benefits of migrating their databases to SQL Server using previous versions of SSMA.

SSMA for Oracle is designed to support migration from Oracle 9i or later to all editions of SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012, and SQL Server 2014. The SSMA product team is also available to answer your questions and provide technical support at ssmahelp@microsoft.com.

To download SSMA for Oracle, go here. To evaluate SQL Server 2014, go here.  

Thursday, July 24, 2014 10:00:00 AM

If you’re considering backing up your SQL Server database to the cloud, there are many compelling reasons to do so. Not only will you have an offsite copy of your data for business continuity and disaster recovery purposes, but you can save on CAPEX by using Microsoft Azure for cost-effective storage. And now you can choose to back up to Microsoft Azure even for databases that aren’t running the latest version of SQL Server, creating a consistent backup strategy across your database environment.

SQL Server has these tools and features to help you back up to the cloud:

  • In SQL Server 2014, Managed Backup to Microsoft Azure manages your backups for you, setting backup frequency based on data activity. It is available inside SQL Server Management Studio in SQL Server 2014.
  • In SQL Server 2012 and 2014, Backup to URL provides backup to Microsoft Azure using T-SQL and PowerShell scripting (see the sketch after this list).
  • For prior versions, the SQL Server Backup to Microsoft Azure Tool enables you to back up databases from all supported versions of SQL Server, including older ones, to the cloud. It can also provide encryption and compression for your backups – even for versions of SQL Server that don’t support these functions natively.
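
As a rough sketch of what Backup to URL looks like in T-SQL on SQL Server 2014 – the storage account, container, credential, and database names below are placeholders you would replace with your own:

-- One-time setup: a credential holding the storage account name and access key.
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',       -- storage account name (placeholder)
     SECRET = '<storage access key>';     -- access key from the Azure portal

-- Back up directly to a blob in the target container.
BACKUP DATABASE AdventureWorks2014
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/AdventureWorks2014.bak'
WITH CREDENTIAL = 'AzureBackupCredential',
     COMPRESSION,
     STATS = 10;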

To show you how easy it is to get started with SQL Server Backup to Microsoft Azure Tool, we’ve outlined the four simple steps you need to follow:

Prerequisites: Microsoft Azure subscription and a Microsoft Azure Storage Account.  You can log in to the Microsoft Azure Management Portal using your Microsoft account.  In addition, you will need to create a Microsoft Azure Blob Storage Container:  SQL Server uses the Microsoft Azure Blob storage service and stores the backups as blobs. 

Step 1: Download the SQL Server Backup to Microsoft Azure Tool, which is available on the Microsoft Download Center.

Step 2: Install the tool. From the download page, download the MSI (x86/x64) to your local machine that has the SQL Server Instances installed, or to a local share with access to the Internet. Use the MSI to install the tool on your production machines. Double click to start the installation. 

Step 3: Create your rules. Start the Microsoft SQL Server Backup to Microsoft Azure Tool Service by running SQLBackup2Azure.exe. Going through the wizard to set up the rules tells the program which backup files should be encrypted, compressed, or uploaded to Azure storage. The Tool does not do job scheduling or error tracking, so you should continue to use SQL Server Management Studio for that functionality.

On the Rules page, click Add to create a new rule. This will launch a three-screen rule entry wizard.

The rule will tell the Tool what local folder to watch for backup file creation. You must also specify the file name pattern that this rule should apply to.

To store the backup in Microsoft Azure Storage, you must specify the name of the account, the storage access key, and the name of the container.  You can retrieve the name of the storage account and the access key information by logging into the Microsoft Azure management portal.

At this time, you can also specify whether or not you wish to have the backup files encrypted or compressed.

Once you have created one or more rules, you will see the existing rules and the option to Modify or Delete the rule.

Step 4: Restore a Database from a Backup Taken with SQL Server Backup to Microsoft Azure Tool in place. The SQL Server Backup to Microsoft Azure Tool creates a ‘stub’ file with some metadata to use during restore.  Use this file like your regular backup file when you wish to restore a database.  SQL Server uses the metadata from this file and the backup on Microsoft Azure storage to complete the restore. 

If the stub file is ever deleted, you can recover a copy of it from the Microsoft Azure storage container in which the backups are stored.  Place the stub file into a folder on the local machine where the Tool is configured to detect and upload backup files.

That’s all it takes!  Now you’re up and running with Backup to and Restore from Microsoft Azure.

To learn more about why to back up to the cloud, join Forrester Research analyst Noel Yuhanna in a webinar on Database Cloud Backup and Disaster Recovery.  You’ll find out why enterprises should make database cloud backup and DR part of their enterprise database strategy. 

The webinar takes place on Tuesday, 7/29 at 9 AM Pacific time; register now.

Thursday, July 17, 2014 10:00:00 AM

In this overview and demo, we will show you what sentiment analysis is and how to build a quick mashup that combines real-time access to multiple data sources using tools from Microsoft.

Sentiment analysis is one of the hottest topics in the Big Data space. Sentiment analysis is the process of analyzing customer comments and feedback from Facebook, Twitter, email, and more. The purpose of the analysis is to understand the overall sentiment the customer is trying to convey: negative sentiment, when the customer is unhappy with a company or its product; neutral sentiment, when the customer merely mentions a company or product in passing, without a good or bad feeling; and positive sentiment, when a customer is happy or excited about a company or its product.

Traditionally, sentiment analysis was complicated because it required a mixture of very complex platforms and tools. Each component required for sentiment analysis was offered by a different company and required a large amount of custom work. The difficulty is further exacerbated by hard-to-achieve business requirements. When we discuss sentiment analysis, there are 3 key business requirements we see repeated:

  • Real-time access
  • Full granular data set (structured & unstructured)
  • BI and SQL front-end

Real-time Access

In the case of real-time access, business users need access to fresh data. In the world of social media, customer sentiment can change rapidly. With images and videos quickly being posted with re-tweets and Facebook ‘like’ capabilities, a good or bad aspect of a company’s product can go viral in minutes. Business users need to have the ability to analyze data as it comes in, in real-time. We will show in our overview video and demo, how we can utilize Microsoft’s StreamInsight technology for real-time data analysis and complex-event processing.

Full Granular Data Set

In the case of full granular data, in practice we have seen that using a traditional database system can hinder development. This is because a lot of the data that comes in for sentiment analysis such as email, is in a semi-structured or unstructured format. This means the data is not easily modeled into a database. The data does not come in a simple row/column format. Thus we utilize our Big Data technology that is meant for this type of data:  HDInsight (Hadoop). HDInsight is essentially Hortonworks Data Platform running on Windows. In our case we utilize HDInsight to land all of the data, in its raw original format, into the distributed file system HDFS. This allows us to ingest any kind of data, regardless of structure, and store that data online for further analysis at low cost. The Hadoop software is open-source and readily available.

BI and SQL Front-End

The most important area in delivering sentiment analysis to the business is access: making sure we are able to provide the data, both in real time and at full fidelity, within the tools that our business users know and love. Previously, when our customers were doing sentiment analysis on Hadoop systems, BI and SQL access was not available – not because the tools could not integrate with Hadoop systems, but because they could not scale or offer the same level of functionality. Some BI users have chosen Hive ODBC in Hadoop, which many claim to be slow and ‘buggy’. Instead, here we utilize one of our flagship technologies: PolyBase. With PolyBase we expose the data in Hadoop, and in relational SQL Server, with one T-SQL query. This means users can use BI tools like Excel, SSAS, or other third-party tools. They can then utilize PolyBase within Analytics Platform System (APS) to query data either in Hadoop, or in Parallel Data Warehouse (SQL Server), or to mash up the data from both systems!
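
As a rough, hypothetical sketch of such a mashup, the statements below put a schema over delimited tweet data in Hadoop and join it to a relational sales table; all object names are invented, and the DDL follows the external data source/file format style of later PolyBase releases, so treat it as illustrative rather than exact APS syntax:

-- Point PolyBase at the Hadoop cluster and describe the file layout (placeholders).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://hadoop-head-node:8020');

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- Schema over the raw tweet files stored in HDFS.
CREATE EXTERNAL TABLE [dbo].[TweetSentiment]
(
  TweetId BIGINT,
  ProductName NVARCHAR(200),
  Sentiment FLOAT
)
WITH (LOCATION = '/data/tweets/', DATA_SOURCE = HadoopCluster, FILE_FORMAT = PipeDelimitedText);

-- One T-SQL query mashing up Hadoop data with relational sales data.
SELECT s.ProductName,
       AVG(t.Sentiment) AS AvgSentiment,
       SUM(s.SalesAmount) AS TotalSales
FROM [dbo].[FactProductSales] AS s
JOIN [dbo].[TweetSentiment] AS t ON t.ProductName = s.ProductName
GROUP BY s.ProductName;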

How It Works

Now we will show you how to use all of the tools from the SQL Server data platform to achieve sentiment analysis. This will allow you to quickly deploy and meet all 3 business requirements through a set of tools and platforms that are very easy to use, fully integrated, and ‘just work’ together.

Let’s get started with the first video (~5 minutes) where we present sentiment analysis using Microsoft technologies. We show you how sentiment analysis works, and how the Microsoft products fit. We then follow up by discussing the architecture in detail surrounding StreamInsight, HDInsight, and Analytics Platform System.

Watch the overview video:

Demo

In the second video (~7 minutes), we show you sentiment analysis in action. The demo includes a full sentiment-analysis engine running in real time against Twitter data, along with a web dashboard. We then stream Twitter data to both HDInsight and Parallel Data Warehouse. Finally, we end the demo by showcasing PolyBase, our flagship technology. With PolyBase we can do data mashups combining data from relational and non-relational systems. We will use PolyBase to write standard T-SQL queries against this data to determine tweet analytics and how social sentiment is faring for our marketing campaigns and products.

Watch the demo video:

Wednesday, July 16, 2014 4:14:00 PM

We are pleased to see Microsoft acknowledged by Forrester Research as a Leader in The Forrester Wave™: Agile Business Intelligence Platforms, Q3 2014.  

We are happy to see what we believe to be an affirmation of our approach and of the strength of our technologies. Our placement in this report reflects both high scores from our clients for product vision and positive client feedback collected as part of the customer survey. Forrester notes that “Microsoft received high client feedback scores for its agile, business user self-service and [advanced data visualization] ADV functionality. Clients also gave Microsoft BI a high score for its product vision”. This feedback from our customers is especially gratifying to see.

Microsoft is delivering on our vision of making business intelligence more agile and accessible through the tools that people use every day. With the accessibility of Excel and the recent release of Power BI for Office 365, we aim to lower the barrier of entry for users and reduce the complexity of deploying business intelligence solutions for IT. Using Microsoft’s business intelligence solution, companies such as MediaCom have reduced time to reporting from weeks to days, Carnegie Mellon is using data to reduce energy consumption by 30%, and Helse Vest is combining hospital data to visualize trends in real time.

We appreciate the recognition of our software in this report. Above all, we value our customers’ voices in helping shape and validate this approach.

Thursday, July 10, 2014 12:35:00 PM

Microsoft SQL Server 2014 brings new performance and scalability gains by introducing In-Memory OLTP. In-Memory OLTP provides memory-optimized tables and indexes, transactions execute under lock-free algorithms to provide linear scalability, and Transact-SQL stored procedures can be compiled to native machine code for maximum processing efficiency.
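
As a minimal, hypothetical sketch of those building blocks – it assumes a database that already has a MEMORY_OPTIMIZED_DATA filegroup (like the ASPStateInMemory database created later in this post), and the table and procedure names are illustrative only:

-- A memory-optimized, non-durable table with a hash index on its key.
CREATE TABLE dbo.SessionCache
(
  SessionId INT NOT NULL
    PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
  Payload VARBINARY(8000) NULL,
  ExpiresAt DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- A natively compiled stored procedure that reads from the table.
CREATE PROCEDURE dbo.usp_GetSession @SessionId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
  SELECT Payload, ExpiresAt
  FROM dbo.SessionCache
  WHERE SessionId = @SessionId;
END;
GO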

Working with SQL Server customers on In-Memory OLTP engagements, a common pattern emerged around the desire for increased performance and scalability when using ASP.NET session state. Some early adopters modified their SQL Server objects to take advantage of In-Memory OLTP for ASP.NET session state, with great success. To learn more, read the bwin.party case study “Gaming site can scale to 250,000 requests per second and improve player experience”. To further enhance this scenario, we have created a new provider to make it easier for customers to take advantage of SQL Server In-Memory OLTP when using ASP.NET session state.

This ASP.NET session state provider is fully optimized for In-Memory OLTP by calling natively compiled Transact-SQL stored procedures and by creating all tables as memory-optimized. The functionality of the provider was tested both internally and by external customers. The results showed the implementation was able to provide some significant gains at scale levels which would have previously exhibited a bottleneck on the database.

NOTE: While some testing has been done before the release, we recommend executing your own testing and validation to understand how this implementation behaves in your specific environment.

Getting Started

Setting up the provider requires two steps, installing the provider into the ASP.NET application and creating the In-Memory OLTP database and objects in Microsoft SQL Server 2014.

The provider and scripts can be accessed in two ways:

1. The package has been uploaded to NuGet: https://www.nuget.org/packages/Microsoft.Web.SessionState.SqlInMemory/

2. The source code is also accessible through CodePlex: https://msftdbprodsamples.codeplex.com/releases/view/125282

NuGet Installation

Download the ASP.NET Session State Provider for SQL Server In-Memory OLTP from the NuGet gallery by running the following command from the Visual Studio Package Manager Console:

PM> Install-Package Microsoft.Web.SessionState.SqlInMemory

More information about the NuGet package can be found here:

https://www.nuget.org/packages/Microsoft.Web.SessionState.SqlInMemory/

Installing the package will do the following things:

  • Add references to the ASP.NET Session State Provider assembly.
  • Add a custom provider named "SqlInMemoryProvider" to the web.config file; you will need to update the connectionString attribute with your own connection string.
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <system.web>
        <sessionState mode="Custom" customProvider="SqlInMemoryProvider">
          <providers>
            <add name="SqlInMemoryProvider"
                 type="Microsoft.Web.SessionState.SqlInMemoryProvider"
                 connectionString="data source=sqlserver;initial catalog=ASPStateInMemory;User ID=user;Password=password;" />
          </providers>
        </sessionState>
      </system.web>
    </configuration>
  • Add an ASPStateInMemory.sql file that includes the script for creating the SQL Server database configured to support In-Memory OLTP.

Setting up In-Memory OLTP Database and objects

Open the T-SQL script file "ASPStateInMemory.sql" and update the 'CREATE DATABASE' statement, replacing the 'FILENAME' attributes with paths that exist on your SQL Server machine where the memory-optimized filegroup should live. For further considerations on the placement of this filegroup, see the Books Online section Creating and Managing Storage for Memory-Optimized Objects.

CREATE DATABASE [ASPStateInMemory]
ON PRIMARY (
  NAME = ASPStateInMemory, FILENAME = 'D:\SQL\data\ASPStateInMemory_data.mdf'
),
FILEGROUP ASPStateInMemory_xtp_fg CONTAINS MEMORY_OPTIMIZED_DATA (
  NAME = ASPStateInMemory_xtp, FILENAME = 'D:\SQL\data\ASPStateInMemory_xtp'
)
GO

After updating the 'FILENAME' attributes, run the entire script for creating the In-Memory tables and the natively compiled stored procedures.

Additionally, create a periodic task in SQL Server to run the stored procedure 'dbo.DeleteExpiredSessions'. This procedure removes expired sessions and frees up the memory they consumed.
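
For example, a SQL Server Agent job along the following lines would run the cleanup every 10 minutes; this is a minimal sketch, and the job and schedule names are arbitrary:

-- Create an Agent job that purges expired sessions every 10 minutes.
EXEC msdb.dbo.sp_add_job @job_name = N'Purge expired ASP.NET sessions';

EXEC msdb.dbo.sp_add_jobstep
  @job_name = N'Purge expired ASP.NET sessions',
  @step_name = N'Run DeleteExpiredSessions',
  @subsystem = N'TSQL',
  @database_name = N'ASPStateInMemory',
  @command = N'EXEC dbo.DeleteExpiredSessions;';

EXEC msdb.dbo.sp_add_schedule
  @schedule_name = N'Every 10 minutes',
  @freq_type = 4,               -- daily
  @freq_interval = 1,
  @freq_subday_type = 4,        -- repeat in units of minutes
  @freq_subday_interval = 10;

EXEC msdb.dbo.sp_attach_schedule
  @job_name = N'Purge expired ASP.NET sessions',
  @schedule_name = N'Every 10 minutes';

EXEC msdb.dbo.sp_add_jobserver @job_name = N'Purge expired ASP.NET sessions';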

NOTE: The memory-optimized tables are created with a durability of SCHEMA_ONLY to optimize for performance. If session data durability is required, then change the 'DURABILITY' attribute from 'SCHEMA_ONLY' to 'SCHEMA_AND_DATA'. More information can be found in Books Online sections Defining Durability for Memory-Optimized Objects and Durability for Memory-Optimized Tables.

Conclusion

SQL Server In-Memory OLTP has been shown to greatly improve the performance of ASP.NET session state applications. This provider makes it easy for customers to optimize ASP.NET web farms to take advantage of SQL Server In-Memory OLTP using a packaged solution.

For further considerations on session state with In-Memory OLTP, along with other solution patterns which have shown success with SQL Server In-Memory OLTP, please reference the whitepaper: In-Memory OLTP – Common Workload Patterns and Migration Considerations.  

Download the Microsoft SQL Server 2014 Evaluation and see how in-memory processing built into SQL Server 2014 delivers breakthrough performance.

Tuesday, July 1, 2014 10:00:00 AM

This blog post was authored by: Murshed Zaman, AzureCAT PM and Sumin Mohanan, DS SDET

With the advent of SQL Server Parallel Data Warehouse (the MPP version of SQL Server) V2 AU1 (Appliance Update 1), PDW got a new name: the Analytics Platform System [Appliance] or APS. The name changed with the addition of Microsoft’s Windows distribution of Hadoop (HDInsight or HDI) and PDW sharing the same communication fabric in one appliance. Customers can buy an APS appliance with PDW or with PDW and HDI in configurable combinations.

PolyBase, available in current versions of PDW, is a technology that allows PDW users to query HDFS data. SQL users can quickly get results from Hadoop data without learning Java or C#.

Features of PolyBase include:

  1. Schematization of Hadoop data in PDW as external tables
  2. Querying Hadoop data
  3. Querying Hadoop data and joining with PDW tables
  4. High speed export and archival of PDW data into Hadoop
  5. Creating persisted tables in PDW from Hadoop data 

In V2 AU1, PolyBase improvements include:

  1. Predicate push-down for queries in Hadoop as Map/Reduce jobs
  2. Statistics on Hadoop data in PDW

Another new feature introduced in PDW V2 AU1 is the capability to query data that resides in Microsoft Azure Storage accounts. Just as with HDFS data, PDW can place a schema on data in Azure Storage accounts and move data from PDW to Azure and back.
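
As a rough, hypothetical sketch of that capability – written in the external data source style of later PolyBase releases, with placeholder account, container, and object names – the first statements place a schema over blob data so it can be queried from PDW, and the last one archives a PDW table back out to Azure (access key configuration is omitted here; in practice the storage key must be made available to PolyBase):

-- Data source and file format pointing at an Azure Storage container (placeholders).
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- Schema over existing blob data, queryable from PDW with ordinary T-SQL.
CREATE EXTERNAL TABLE [dbo].[WebClicks]
(
  ClickTime DATETIME2,
  UserId BIGINT,
  Url NVARCHAR(400)
)
WITH (LOCATION = '/clicks/', DATA_SOURCE = AzureStorage, FILE_FORMAT = CsvFormat);

-- Export (archive) a PDW table back to Azure Storage.
CREATE EXTERNAL TABLE [dbo].[FactSalesArchive]
WITH (LOCATION = '/archive/sales/', DATA_SOURCE = AzureStorage, FILE_FORMAT = CsvFormat)
AS SELECT * FROM [dbo].[FactSales];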

With these new features and improvements, APS has become a first-class citizen in analytics for any type of data. Any company that has Big Data requirements and wants a highly scalable, scale-out data warehouse appliance can use APS.

Here are four cases that illustrate how different industries are leveraging APS:

One: Retail brand vs. Name brand

Retail companies that use PDW often also want to harvest and curate data from their social analytics sites. This data provides insight into their products and into the behavior of their customers. Using APS, a company can offer the right promotion at the right time to the right demographics. The data also shows that a brand recommendation coming from a friend, relative, or trusted support group can be much more effective than marketing literature alone. By monitoring and profiling social media, these companies can also gain a competitive advantage.

Today’s empowered shoppers want personalized offers that appeal to their emotional needs. Using social media, retailers offer promotions that are tailored to individuals using real-time analytics. This process starts by ranking blogs, forums, Twitter feeds, and Facebook posts against predetermined KPIs revealed in those posts and conversations. Retail organizations analyze and use the data to profile shoppers and personalize future marketing campaigns. Measurable sales data reveals the effectiveness of each campaign, and the whole process starts again with the insight gained.

In this example, PDW houses the relational sales data and Hadoop houses the social sentiment. PDW with a built-in HDI region gives the company the arsenal to analyze both data sources in a timely manner and to react and make changes.

Retail store APS diagram:

Two: Computer Component Manufacturing

Companies that generate massive amounts of electronic test data can get valuable insights from APS. Test data is usually a good candidate for Hadoop due to its key-value (JSON or XML) structure.

One example in this space is a computer component manufacturer. Due to the volume, velocity, and variety of this data (i.e., Sort/Class data), a conventional ETL process can be very resource-intensive. Using APS, companies can gain insight from their data by putting the semi-structured (key-value pair) data into an HDI region and other complementary structured data sources (i.e., Wafer Electrical Test) into PDW. With the PolyBase query feature, these two types of data can easily be combined and evaluated for success/failure rates.

Computer Component Manufacturing Diagram:

Three: Game Analytic Platform for online game vendors

PDW with an HDI region can offer a complete solution for online game companies to derive insights from their data. MMORPGs (Massively Multiplayer Online Role Playing Games) are good examples where APS can deliver value. Game engines produce a large amount of transactional data (events such as which avatar got killed in the currently active game) and a lot of semi-structured data such as activity logs containing chat data and historical logs. APS is well suited to loading the transactional data into the PDW workload and the semi-structured data into the HDI region. The data can then be used to derive insights such as:

  1. Customer retention - Discovering when to give customers offers and incentives to keep them in the game
  2. Improving game experience - Discovering where customers are spending more time in the game, and improving in-game experience
  3. Detecting fraudulent gaming activities

Currently these companies deal with multiple solutions and products to achieve these goals. APS provides a single solution to power both their transactional and non-transactional analytics.

Four: Click stream analysis of product websites for targeted advertisement.

In the past, a relational database system was sufficient to satisfy the data requirements of a medium-scale production website. Ever-increasing competition and advancements in technology have changed the way in which websites interact with customers. Apart from storing data that customers explicitly provide the company, sites now record how customers interact with their website.  As an example, when a registered user browses a particular car model, additional targeted advertisements and offers can be sent to the user.

This scenario can be captured using collected clickstream data and the Hadoop ecosystem. APS acts as a complete solution for these companies by offering the PDW workload to store and analyze transactional data, combined with the HDI region to derive insights from the clickstream data.

This solution also applies to third-party companies that specialize in targeted advertising campaigns for their clients.

While “Big Data” is a hot topic, we very often receive questions from customers about the actual use cases that apply to them and how they can derive new business value from “Big Data.” Hopefully these use cases highlight how various industries can truly leverage their data to mine insights that deliver business value, in addition to showcasing how traditional data warehouse capabilities work together with Hadoop.

Visit the Microsoft Analytics Platform System page to learn more. 

Wednesday, June 25, 2014 10:00:00 AM

This blog post will detail how APS gives users the ability to:

  • Leverage Power Query, Power Pivot, and Power Map at massive scale
  • Iteratively query APS, adding BI on the fly
  • Combine data seamlessly from PDW, HDI, and Azure using PolyBase

The Microsoft Analytics Platform System (APS) is a powerful scale out data warehouse solution for aggregating data across a variety of platforms. In Architecture of the Microsoft Analytics Platform System and PolyBase in APS - Yet another SQL over Hadoop solution?, the base architecture of the platform was defined. Here we’ll build on this knowledge to see how APS becomes a key element of your BI story at massive scale.

Let’s first start with a business case. Penelope is a data analyst at a US-based restaurant chain with hundreds of locations across the world. She is looking to use the power of the Microsoft BI stack to get insight into the business – both in real time and in aggregate form for the last quarter. With the integration of APS into the Microsoft BI stack, she is able to extend her analysis beyond simple querying. Penelope is able to utilize the MOLAP data model in SQL Server Analysis Services (SSAS) as a front end to the massive querying capabilities of APS. Using the combined tools, she is able to:

  • Quickly access data in stored aggregations that are compressed and optimized for analysis
  • Easily update these aggregations based on structured and unstructured data sets
  • Transparently access data through Excel’s front-end

Using Excel, Penelope has quick access to all of the aggregations she has stored in SSAS with analysis tools like Power Query, Power Pivot, and Power Map. Using Power Map, Penelope is able to plot the growth of restaurants across America, and sees that lagging sales in two regions, the West Coast and Mid-Atlantic, are affecting the company as a whole.

After Penelope discovers that sales are disproportionately low on the West Coast and in the Mid-Atlantic regions, she can use the speed of APS’ Massively Parallel Processing (MPP) architecture to iteratively query the database, create additional MOLAP cubes on the fly, and focus on the issues driving down sales with speed and precision using Microsoft’s BI stack. By isolating the regions in question, Penelope sees that sales are predominantly being affected by two states – California and Connecticut. Drilling down further, she uses Power Chart and Power Pivot to break down sales by menu item in the two states, and sees that the items with low sales in those regions are completely different.

While querying relational data stored in APS can get to the root of an issue, by leveraging PolyBase it becomes simple to also take advantage of the world of unstructured data, bringing additional insight from sources such as sensors or social media sites. In this way Penelope is able to incorporate the text of tweets relating to menu items into her analysis. She can use PolyBase’s predicate pushdown ability to filter tweets by geographic region and mentions of the low selling items in those regions, honing her analysis. In this way, she is able to discover that there are two separate issues at play. In California she sees customers complaining about the lack of gluten free options at restaurants, and in Connecticut she sees that many diners find the food to be too spicy.

Iterative Analytics

So how did Penelope use the power of APS to pull structured data such as Point of Sale (POS), inventory and ordering history, website traffic, and social sentiment into a cohesive, actionable model? By using a stack that combines the might of APS with the low time-to-insight of Excel. Let’s break down the major components:

  • Microsoft Analytics Platform System (APS)
  • Microsoft HDInsight
  • Microsoft SQL Server Analysis Services (SSAS)
  • Microsoft Excel with Power Query, Power Pivot and Power Map

Loading Data in APS and Hadoop

Any analytics team is able to quickly load data into APS from many relational data sources using SSIS. By synchronizing the data flow between their production inventory and POS systems, APS is able to accurately capture and store trillions of transactional rows from within the company. By leveraging the massive scale of APS (up to 6 PB of storage), Penelope doesn’t have to create the data aggregates up front. Instead she can define them later.

Concurrently, her team uses an HDInsight Hadoop cluster running in Microsoft Azure to aggregate all of the individual tweets and posts about the company alongside its menus, locations, public accounts, customer comments, and sentiment. By storing this data in HDInsight, the company is able to utilize the elastic scale of the Azure cloud, and continually update records with real-time sentiment from many social media sites. With PolyBase, Penelope is able to join transactional data with the external tables containing social sentiment data using standard TSQL constructs.

Creating the External Tables

Using the power of PolyBase, the development team can create external tables in APS connected to the HDInsight instance running in Azure. In two such tables, Tweets and WordCloud, Twitter data is easily collected and aggregated in HDFS. Here, the Tweets table holds raw data with an additional sentiment value, and the WordCloud table is an aggregate of all words used in posts about the company.

Connecting APS and SSAS to Excel

Within Excel, Penelope has the ability to choose how she would like to access the data. At first she uses the aggregations that are available to her via SSAS – typical sales aggregates like menu item purchases, inventory, etc. – through Power Query.

But how does Penelope access the social sentiment data directly from APS? Simple: using the same data connection tab, Penelope can directly connect to APS and pull in the sentiment data using PolyBase.

Once the process is complete, tables pulled into Excel, as well as their relationships, are shown as data connections.

Once the data connection is created, Penelope is able to create a report using Power Pivot with structured data from the Orders table and the unstructured social sentiment data from HDInsight in Azure.

With both data sets combined in Excel, Penelope is able to then create a Power Map of the sales data layered with the social sentiment. By diving into the details, she can clearly see issues with sentiment from customers in Connecticut and California.

To learn more about APS, please visit http://www.microsoft.com/aps.

Drew DiPalma – Program Manager – Microsoft APS
Drew is a Program Manager working on Microsoft Analytics Platform System.  His work on the team has covered many areas, including MPP architecture, analytics, and telemetry.  Prior to starting with Microsoft, he studied Computer Science and Mathematics at Pomona College in Claremont, CA. 
