quinta-feira, 21 de junho de 2018

The Developers Conference - São Paulo - Brasil 2018

One of the largest Software conference in Brasil, with a lot of content dedicated to DATA and Analytics in July, 2018. For more info, click the mage bellow.

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

terça-feira, 19 de junho de 2018

ETL vs. ELT. How two letters make such a difference...

Since the rise of Data Warehouses, the ETL process has evolved and reached it's maturity, and has been the "go-to" technology for BI and Analytics solutions.

Althought the ETL process provides a great solution to so many different problems, It also creates a few problems of its own. If you are here, reading this post, you probably have done a few ETL (or many) processes yourself, but I will outline the conceps a bit before laying out the pros and cons of each technique.

An ETL is a data process, defined by three steps: Extract, Transform and Load. So, if you need to import an csv file into a database, converting the column values to proper database columns, you would be using an ETL tool like Sql Server Integration Services (SSIS) or Pentaho. 

Your transformation would look something like this in SSIS:

You can clear see the ETL process steps: A flat file components Extracts the csv file data, a derived columnn and a data conversion Transform the so that it can be Loaded into a SQL Server Database (or any other Database for that matter).

Ok, that is great, your data is loaded into a Database using a simple enough software and everybody is happy. So whats so wrong with that, that a new fancy word cames to "improve it"?

A simple ETL like that (simple csv file into database), wouldn't be much of a problem, BUT supose you need to do that using thousands of files with hundred of GB each, on a daily basis, and the destination database is heavily queried by hundreds of of users and third part applications? Your ETLs become long and dragging, and your users and your "data consumers" need to wait until the whole process is done to access the data. Boring!

The ETL is designed so that the Extract and Load process may occur in different mantainance window time frames, so that both source and destination don't suffer any downtime, minimizing the corporate strees.

But with the rise of NoSQL and Cloud appliances, that architecture is no longer needed. We can load the data into a centralized scalable data repository, and make all that data available to different users and applications.

This centralized data repository is called "Data Lake", and once the extract is loaded into it, your consumers can access it in a timely fashion. Later on, if that is the purpose, this data can be loaded into another applicance such as a Data Warehouse, Database Applications, etc.

In the image above, the Data Lake is noted as Hadoop, but you can use any technology you want: HBASE, CouchDB, Cassandra...even relational cloud databases such as Postgres, MySQL, SQL Server, etc. As far as technology goes, the Data Lake needs only "elastic" hardware resources and 24 X 7 X 365 availability.

The key idea, is that once the data is extracted, it can be made readily available to the consumers by loading into the Data Lake.

Pros of using and ELT process:

  • Time: once the data is extracted it is soon loaded and made available.
  • Cost: since the destination is usually a unstructered data repositoty, and NoSQL is usually cheaper, it is cost effective.
  • Flexibility: It is very useful for AD-HOC analysis, and its simplicity makes it closer to developers and data consumers. 

So what are your thoughts on it? Have you used ELT before?

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

segunda-feira, 18 de junho de 2018

Connecting with SFTP using SSIS

As you might already know, you can connect with a FTP server using SSIS FTP component. Unfortunately, SSIS cannot connect with SFTP but with a litle bit of creativity and some googling you can beat that limitation.

First STEP, download the latest psftp.exe version here. Once you download it, create a new folder on your hard drive and copy it over there. Say something like "c:\tmpdestino".

Second STEP, add a EXECUTE PROCESS TASK on your transformation and into the EXECUTABLE field add the full path to the psftp utility: "c:\tmpdestino\psftp.exe".

On your arguments field, add a command line with all the arguments needed to your task:

[user]@[sftp_HOSTNAME] -pw [YOURPASSOWRD] -be -batch -b [YOURBATCHFILE]

Third STEP, create a batch file that executes all your tasks in the SFTP host. That will be the last reference in your arguments field. So if you need to browse to a FOLDER like "Myfolder" and copy all the files from it, your bat would be like this:

cd Myfolder
mget *

And that is it! Add the task to your package and do your thing.

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

sexta-feira, 15 de junho de 2018

SQL SERVER Unused Indexes query

Checking for unused indexes that could compromise the execution plan creation in SQL SERVER instance, I needed to list and delete all the unused indexes.

Found the perfect query for it @ SQL  Authority:

SELECT TOP 25 o.NAME                                     AS ObjectName, 
              i.NAME                                     AS IndexName, 
              i.index_id                                 AS IndexID, 
              dm_ius.user_seeks                          AS UserSeek, 
              dm_ius.user_scans                          AS UserScans, 
              dm_ius.user_lookups                        AS UserLookups, 
              dm_ius.user_updates                        AS UserUpdates, 
              'DROP INDEX ' + Quotename(i.NAME) + ' ON ' 
              + Quotename(s.NAME) + '.' 
              + Quotename(Object_name(dm_ius.object_id)) AS 'drop statement' 
FROM   sys.dm_db_index_usage_stats dm_ius 
       INNER JOIN sys.indexes i 
               ON i.index_id = dm_ius.index_id 
                  AND dm_ius.object_id = i.object_id 
       INNER JOIN sys.objects o 
               ON dm_ius.object_id = o.object_id 
       INNER JOIN sys.schemas s 
               ON o.schema_id = s.schema_id 
       INNER JOIN (SELECT Sum(p.rows) TableRows, 
                   FROM   sys.partitions p 
                   GROUP  BY p.index_id, 
                             p.object_id) p 
               ON p.index_id = dm_ius.index_id 
                  AND dm_ius.object_id = p.object_id 
WHERE  Objectproperty(dm_ius.object_id, 'IsUserTable') = 1 
       AND dm_ius.database_id = Db_id() 
       AND i.type_desc = 'nonclustered' 
       AND i.is_primary_key = 0 
       AND i.is_unique_constraint = 0 
ORDER  BY ( dm_ius.user_seeks + dm_ius.user_scans 
            + dm_ius.user_lookups ) ASC 


Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

segunda-feira, 11 de junho de 2018

[From Technet] SQL Server Automatic Tuning around the world…

..and in the cloudsWhen I first saw early builds of SQL Server 2017, one of the features that caught my eye immediately was Automatic Tuning with an option called Automatic Plan Correction. SQL Server 2017 was released in October of 2017 right on the heels of one of my favorites releases, SQL Server 2016. In SQL Server 2016, we brought to the product a new feature called Query Store. When Query Store is enabled for a database via ALTER DATABASE, the SQL Server engine will start collecting query performance telemetry in memory and system tables in the database. No longer do you need to poll Dynamic Management Views and store them into your own tables. This performance telemetry is collected by the SQL Server engine itself when queries are compiled and executed.

Like many features we build into SQL Server, you can use T-SQL queries to find out the details of this performance data through a series of catalog views. Query store opens up all types of cool performance insights and we have documented some of the key usage scenarios. One of these scenarios is called a query plan regression (also known as plan choice regressions).

Imagine this scenario. You have a stored procedure that takes a single integer parameter. This integer parameter is used in the WHERE clause of a SELECT statement in the stored procedure. The first time the stored procedure is compiled, the plan for this procedure is inserted into cache based on the value of the first execution of the procedure. And this plan may be a good plan for most users. Now for unexpected reasons, perhaps memory pressure, the plan is evicted from cache. Lets say a user then executes the procedure through an application but this time with a different integer parameter value. This could result in a different query plan that leads to poor performance. Compiling a plan for a stored procedure based on the parameter value is called parameter sniffing. This concept is discussed in our Query Processing Architecture Guide in the documentation (which in itself is a cool read). Parameter sniffing is designed to be a good thing, but in some situations where the data in the table associated with the parameter is skewed, a performance problem could occur.

So, in SQL Server 2016, you can use our reports in SQL Server Management Studio or run queries against Query Store catalog views to see whether a query plan regression has caused a performance problem. Now comes along SQL Server 2017 with some automation. Why not bake into the engine some automation behind the rich telemetry of Query Store? Turns out the folks in our engineering team that own the Query Store feature were already working on these kinds of features in the cloud for Azure SQL Database. Using our cloud-first approach for engineering we started working on these features in Azure, tested and verified their functionality, and then brought them to SQL Server 2017.

SQL Server 2017 on Windows

The approach we took for SQL Server 2017 for Automatic Tuning is recommended then automate. There is no better way to show you how this works then with a demo. Watch this video on our SQL YouTube channel about how Automatic Tuning works on SQL Server 2017 on Windows. I use the popular WideWorldImporters sample database for this demonstration.

Want to run the demo yourself? Download it from my GitHub repo.

Almost everyone Ive shown this demo to has been amazed. My recommendation to use this feature is the following:

  • Enable Query Store and configure it to your needs. See our documentation on best practices.
  • Monitor any recommendations we have for you by examining the Dynamic Management View (DMV), dm_db_tuning_recommendations.
  • If you are comfortable with our recommendations, experiment with turning on Automatic Plan Correction using this T-SQL syntax.

  • Whetheryou just look at recommendations or use automation, I always recommend you find the cause of the query plan regression problem and take more long-term corrective action.

SQL Server 2017 on Linux

One of the key points Ive been making to our customers about SQL Server on Linux is that the core database engine is the same as on Windows. Since Automatic Tuning is built into the core database engine, this feature works exactly the same on SQL Server on Linux.

As you watched my demo of Automatic Tuning for SQL Server on Windows, I used the Windows Performance Monitor to show the workload performance when automatic tuning corrects a query plan regression problem. But what about Linux? Windows Performance Monitor will not work with that platform.

Turns out I have some smart people working with me at Microsoft. I got help from Pedro Lopes from our Tiger Team and the engineering team who built SQL Operations Studio to show automatic tuning using that tool.

  • Pedro helped me build scripts that would query the DMV, sys.dm_os_performance_counters, which runs on SQL Server on Linux as it does on Windows (because it is built into the core engine). These scripts store results from this DMV in a temp table, and then query the temp table.
  • The folks at SQL Operations Studio showed me how to collect the result set from the temp table in a manner that would allow me to show the data in a Time Series chart in the tool. Kind of like a static perfmon chart of SQL Server counters.

The result is the same demo for Automatic Tuning I did for SQL Server on Windows but run against SQL Server on Linux. How did I get the WideWorldImporters sample database to work against Linux for this demo? I just downloaded it from the web into my Linux server and restored it!

The following is an example of the chart from SQL Operations Studio after Automatic Tuning corrected a query plan regression problem from the demo:

Check out this demo of Automatic Tuning on SQL Server on Linux from this video on our SQL Server YouTube channel.

You can grab the demo scripts here.

Azure SQL Database Managed Instance

Ive now tuned the world of SQL Server instances on Windows and Linux. Why not tune the cloud as well? I have been speaking at many customer events boasting that we have a great consistency story with SQL Server including the cloud. This story includes features, functionality, tools, and the T-SQL language across SQL Server and our Azure Data Services. One of these services that was just announced is Azure SQL Database Managed Instance. One of the promises about this service is that I can just take a database backup from SQL Server and restore it to Azure (can you see a pattern here for compatibility? Backup, Restore, and done!).

A quick search on the web led me to this article.Seems pretty simple. Watch the video to see how I restored my WideWorldImporters backup and then ran the same Automatic Tuning demo as I did for SQL Server on Linux.

Automatic Tuning in the real world

Does this feature really make a difference? Do any customers use this? I recently heard from one of the leading SQL Community leaders, Tracy Boggiano from Channel Advisor, on the topic. auto plan correction in query store is awesome we are seeing a 20% reduction on CPU utilization on 98% of our servers. Automatic Tuning is not just a shiny new feature. It works!

I think I accomplished my mission. I wanted to see how Automatic Tuning works with SQL Server everywhere it runs. And because of our great compatibility story I was able to do this with SQL Server on Windows, Linux, and in the clouds with Azure SQL Database Managed Instance all with the same database backup.

If you want to read more about Automatic Tuning check out our documentation.

Subscribe to our new SQL Server YouTube channel to keep up with more great videos and demos about SQL Server.

from SQL Server Blog https://ift.tt/2MkfEnd

segunda-feira, 4 de junho de 2018

[From Technet] Microsoft releases the latest update of Analytics Platform System

Microsoft is pleased to announce that the Analytics Platform System (APS) appliance update 7 (AU7) is now generally available. APS is Microsofts scale-out Massively Parallel Processing (MPP) system based on SQL Server for data warehouse specific workloads on-premises.

Customers will get significantly improved query performance and enhanced security features with this release. APS AU7 builds on appliance update 6 (APS 2016) release as a foundation. Upgrading to APS appliance update 6 is a prerequisite to upgrade to appliance update 7.

Faster performance

APS AU7 now provides the ability to automatically create statistics and update of existing outdated statistics for improved query optimization. APS AU7 also adds support for setting multiple variables from a single select statement reducing the number of redundant round trips to the server and improving overall query and ETL performance time. Other T-SQL features include HASH and ORDER GROUP query hints to provide more control over improving query execution plans.

Better security

APS AU7 also includes latest firmware and drivers along with the hardware and software patch to address the Spectre/Meltdown vulnerability from our hardware partners.

Management enhancements

Customers already on APS2016 will experience an enhanced upgrade process to APS AU7 allowing a shorter maintenance window with the ability to uninstall and rollback to a previous version.
AU7 also introduces a section called Feature Switch in configuration manager giving customers the ability to customize the behavior of new features.

Flexibility of choice with Microsofts data warehouse portfolio

The latest update is an addition to already existing data warehouse portfolio from Microsoft, covering a range of technology and deployment options that help customers get to insights faster. Customers exploring data warehouse products can also consider SQL Server with Fast Track for Data Warehouse or Azure SQL Data Warehouse, a cloud-based fully managed service.

Next Steps

For more details about these features, please visit our online documentation.

from SQL Server Blog https://ift.tt/2sLFS9t

terça-feira, 29 de maio de 2018

[From Technet] SQL Server at the Red Hat Summit 2018

This blog post is authored byBob Ward, Principal Architect, Microsoft; Travis Wright, Principal Program Manager, Microsoft; and Jamie Reding, Senior Program Manager, Microsoft.

A few weeks ago, developers from around the world gathered for the Microsoft Build Conference. It was an amazing display of Microsofts products and cloud services to meet the needs of all types of applications. I missed the //build event this year because I found myself in San Francisco at the Red Hat Summit 2018. I cant even imagine five years ago someone telling me I would represent Microsoft and SQL Server at an open-source based event.

Travis Wright, Jamie Reding, and I travelled to the event to speak and show amazing demos of SQL Server running on Red Hat Enterprise Linux (RHEL) and OpenShift. We were part of the Microsoft team attending the event to continue to show the great partnership we have and are building with Red Hat. Get an overview of the Microsoft presence at the summit from Robin Ginns blog post.

This was not the first time for Travis or Jamie attending this event. Microsoft demonstrated SQL Server on Linux at the Red Hat Summit 2016. Travis and Jamie both presented at the 2017 Summit for SQL Server on RHEL and Open Shift. I thought it would be interesting to write about our experiences and what we learned at this years Summit.

The presentation Travis and I gave on SQL Server was rich with demos. We worked hard on these demos, so we thought why not show everyone, not just the people at the Summit. We have packaged them all on the brand-new SQL Server YouTube channel. Ive included links to these videos throughout the rest of this blog post.

Its no longer Why are you here?

As the event ended, I asked Jamie and Travis what was one of the key differences between previous Summit events and this year. Travis summarized it as Customers at the booth have switched from asking us Why are you here? and What is SQL Server? two years ago to specific questions about running SQL Server on RHEL/OpenShift.”

The most common question we received was How do you install it? I think I must have demonstrated at our booth 20 or more times the SQL Server deployment experience, along with the great set of tools we have available for SQL Server on Linux including SQL Server Operations Studio and mssql-cli. Check out this video that shows the complete deployment experience on RHEL as well as a survey of our tools. Cross-platform and open-source tools is how we roll these days!

Is it the same as SQL Server on Windows?

Part of the above video shows how to restore a database and connect with SQL Server tools. Linux admins and architects want to be more comfortable with how we integrate with Linux, and I met many folks at the Summit who didnt know much SQL Server but knew Linux. Many of their colleagues who manage SQL Server in their environment couldnt attend so they wanted to know if SQL Server is the same as on Windows?

My first answer to this question was Tell your SQL friends to install SQL Server on Linux, backup their database on Windows, and just restore it on Linux. Then fire up SQL Server Management Studio (SSMS), connect, and start running queries. When I showed this simple experience, it made the Linux folks incredibly happy. They saw our deployment experience aligned with package managers like yum. But they also could tell their SQL Server colleagues Hey, its pretty much just SQL Server. Since many of them didnt use Windows, I then showed them SQL Operations Studio and mssql-cli which runs native on Linux and macOS as well as Windows. I heard comments like Now that is impressive.

We used these opportunities to talk beyond just the basics. We showed customers how fast SQL Server can be with technologies like Columnstore Indexes. Check out this demo with PowerBi and Columnstore.

We showed new SQL Server 2017 capabilities of intelligence performance like Automatic Tuning powered by Query Store. Check out this demo which includes built in performance telemetry through SQL Server Dynamic Management Views and charting capabilities in SQL Operations Studio.

We also talked about security and authentication. SQL Server supports both SQL Server authentication and integration with Active Directory. Watch this demo of SQL Server connecting with Active Directory Authentication.

What about new capabilities?

One of the most compelling new features of SQL Server 2016 and 2017 is in database Machine Learning Services. Keeping data associated with data science models and projects together with SQL Server provides security, data freshness, and scalability.
While SQL Server 2017 on Linux supports Native Scoring, support for R and Python did not make it into the release. But we are committed to bringing this type of feature for SQL Server on Linux. Check out this demo to see SQL Server on Linux with Python and Native Scoring with a real-world prediction example.

Does it perform?

As we described the architecture of SQL Server on Linux, the first question most people asked us was Does it perform?. Sometimes it is always best to answer questions like these with data.

The current top two 1TB TPC-H benchmarks are SQL Server 2017 on Linux 1 2. We talked to customers about how SQL Server can scale from your laptop to the biggest servers in the market. And we especially love showing how fast it can run and scale on enterprise-class machines. Check out this demo where Travis Wright shows how SQL Server in Linux can scan billions of rows and run aggregation functions in seconds on an HPE Superdome computer with 12TB of RAM and 480 CPUs.

How does SQL Server support Containers?

I have come to realize over the last few months how popular containers are spreading. It is fast becoming not just an interest, but part of production implementation plans. SQL Server is ready to be a part of this wave. Deploying in a container itself can be an amazingly easy way to get up to speed fast on SQL Server on Linux. In fact, I brought along my MacBook Pro with me and demonstrated to attendees the SQL Server on Mac challenge. For those who know me in the SQL Server community Im sure you fell off your chair over this.

Running a single container is interesting, but to run containers in a production environment, you need something bigger. This is where Kubernetes comes in. And this is also where Red Hats Kubernetes-based system called OpenShift can make a difference. At the Summit, Microsoft announced a new managed OpenShift service in Azure and SQL Server fits right into this offering. Kubernetes and OpenShift have built-in high availability capabilities with shared persistent storage. SQL Server works well within this model. Try it yourself with this tutorial.

Always on Availability Groups (AG) is the flagship feature for SQL Server High Availability. So, we are working on new capabilities for SQL Server to integrate AGs with environments like OpenShift. Watch this demo to see this in action.


One last interesting topic from attendees was around licensing and offers. Attendees wanted to know if SQL Server on Linux licensing were the same as with Windows. First, it is important to know the Editions of SQL Server are the same, I found many attendees did not know this:

  • SQL Server Evaluation A full-featured version of SQL Server for evaluation only purposes with a 180-time limit.
  • SQL Server Express A free, entry-level version of SQL Server for learning or building small desktop applications.
  • SQL Server Developer A full-featured version of SQL Server license only for development and testing.
  • SQL Server Standard The basic version of SQL Server for departments and small organizations. Limits exist for this edition compared to Enterprise, but many features previously only available in Enterprise are in Standard today.
  • SQL Server Enterprise The premium version of SQL Server for mission critical and applications that need maximum scalable performance and high-availability.

For a complete breakdown of editions, check out our documentation.

The licensing of these editions is the same as SQL Server on Windows. Licensing for containers was also a big question we saw at the summit and our SQL Server Licensing Guide shows licenses for containers is similar to licensing for virtual machines. There are some unique offers today for customers looking at SQL Server on Linux including Migrating from ORACLE, SQL Server on Linux subscription, and special offer with Red Hat Enterprise Linux.

The experience at the Red Hat Summit for me personally was humbling and rewarding. It is the first time for me to be at a big event where Microsoft was not the central focus. It forced me to work harder explaining to attendees the value of SQL Server on Linux and not assume they already knew SQL Server. I saw a new perspective from customers that want SQL Server to work within the natural ecosystem of Linux, while also not losing the excellent features and tools that have made SQL Server a force in the industry.

We also had an opportunity to meet several engineers from Red Hat. We shared ideas on how to make SQL Server and RHEL a better experience including discussions on performance monitoring and OpenShift.

I look forward to continuing getting the word out and showing off SQL Server on Linux. It was a remarkable achievement to launch last year, but it is the satisfaction of seeing it now in the mainstream of customer conversations that is most rewarding. It is no longer a what conversation. It is now a when project plan.

Get started with SQL Server on Linux or dive deeper with our free Virtual Academy Training for SQL Server on Linux.

from SQL Server Blog https://ift.tt/2JikRNX

quinta-feira, 17 de maio de 2018

Converting a ASC character column to UTF-8

On a recent task, I had to import a simple text file for data enrichment purposes, and the "name" column when imported to SQL SERVER, got messed up, event after I changed the file encoding to UTF-8.

So basically, the data was like this:

"Jão Pedro Valênciano"

When it should be like this:

"João Pedro Valenciano"

After some googling, I found some dead end alleys and a good answer. On the SQL TEAM forum there is thread on the subject, proposing a function that solves the issue.

Bellow is the function and the link:

CREATE FUNCTION dbo.Utf8_to_utf16 (@s VARCHAR(8000)) 
returns NVARCHAR(4000) 
      IF @s IS NULL 
        RETURN NULL 

      DECLARE @n      INT, 
              @r      NVARCHAR(4000), 
              @cn     INT, 
              @octets INT, 
              @ch     NVARCHAR(2) 

      SET @r = N'' 

      WHILE 1 = 1 
            -- dubious: unexpected octets (0x80-0xBF, 0xF8-0xFF) are treated like 0x00-0x7F 
            SET @n = Patindex('%[ร€-รท]%', @s COLLATE latin1_general_bin 

            IF @n = 0 
                  SET @r = @r + @s 

                  SET @r = @r + Substring(@s, 1, @n-1) 
                  SET @cn = Ascii(Substring(@s, @n, 1)) 

                  IF @cn <= 0xDF 
                        SET @octets = 2 
                        SET @ch = Nchar(( @cn & 0x1F ) * 0x40 + ( Ascii( 
                                        Substring(@s, @n + 1, 1)) 
                                                                  0x3F )) 
                  ELSE IF @cn <= 0xEF 
                        SET @octets = 3 
                        SET @ch = Nchar(( @cn & 0x0F ) * 0x1000 + ( Ascii( 
                                        Substring(@s, @n + 1, 1) 
                                                                    & 0x3F ) * 
                                                  Ascii(Substring(@s, @n + 2, 1) 
                        -- code point in a supplementary plane: output UTF-16 surrogate pair 
                        SET @octets = 4 
                        SET @ch = Nchar((@cn & 0x07) * 0x100 + (Ascii(Substring( 
                                  , @n 
                                  , 1) 
                                  ) * 0x04 + ( 
                                  Ascii(Substring(@s, @n+2, 1)) & 0x30) / 0x10 + 
                                  + Nchar((Ascii(Substring(@s, @n+2, 1)) & 0x0F) 
                                  + ( 
                                  Substring(@s, @n+3, 1)) & 0x3F) + 0xDC00) 

                  SET @r = @r + @ch 
                  SET @s = Substring(@s, @n + @octets, 8000) 

      RETURN @r 


Link to it: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=62406

Use it like this:
SELECT dbo.utf8_to_utf16 ('João Pedro Valênciano')

The result is:
João Pedro Valênciano

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06