Thursday, February 15, 2018

[From Technet] The February release of SQL Operations Studio is now available

This post is authored by Alan Yu, Program Manager, SQL Server.

We are excited to announce the February release of SQL Operations Studio is now available.

Download SQL Operations Studio and review the Release Notes to get started.

SQL Operations Studio is a data management tool that enables you to work with SQL Server, Azure SQL DB and SQL DW from Windows, macOS and Linux. To learn more, visit our GitHub.

SQL Operations Studio was announced for Public Preview on November 15th at Connect(), and this February release is the third major update since the announcement. If you missed it, the January release announcement is available here.

The February release includes several major repo updates and feature releases, including:

  • Added Auto-Update Installation feature
  • Added Connection Dialog ‘Database’ Drop-down
  • Added functionality for new query tabs keeping active connection
  • Fixed bugs in SQL Editor and auto-completion

For complete updates, refer to the Release Notes.

Auto-Update Installation

We want to use the February Insiders builds to test the auto-update feature. The 0.26.2 build will be released as an auto-update to 0.26.1 (assuming there are no issues that require publishing a new build to successfully support auto-update).

To discover updates faster, we have added auto-update functionality to ensure you have the latest version of SQL Operations Studio. A notification will pop up on the gear icon on the bottom left as shown in the image above. In addition, you can also check for updates by clicking Help on the toolbar.

Connection Dialog Database Drop-down

The Connection Dialog Database field is now a drop-down list that is dynamically populated with the databases available on the specified server.

New query tabs keeping active connection

A top voted feature by our users was for new query tabs to keep their active connection. In the February release, users can now click on a server name and press Ctrl + N, which will create a new query tab and retain the active connection instead of having to manually set the connection. This can be seen in the gif below.

Contact us

If you have any feature requests or issues, please submit them to our GitHub issues page. For any questions, feel free to comment below or tweet us @SQLOpsStudio.



from SQL Server Blog http://ift.tt/2EKVZvS

Best way to count words in a string using T-SQL

On a recent project, I needed to count words in a string in order to measure its general "quality" for a Data Quality project.

It sounded simple enough, but this is a massive table with millions of rows, so performance is a requirement.

I created a function of my own, using this answer on StackOverflow: https://stackoverflow.com/questions/15352990/get-word-count-of-a-column-using-sql

CREATE FUNCTION [dbo].[Wordcount] (@InputString VARCHAR(4000)) 
returns INT 
AS 
  BEGIN 
      DECLARE @Index INT 
      DECLARE @Char CHAR(1) 
      DECLARE @PrevChar CHAR(1) 
      DECLARE @WordCount INT 

      SET @Index = 1 
      SET @WordCount = 0 

      -- Walk the string character by character: a new word starts whenever a
      -- non-space character follows a space (or the start of the string).
      WHILE @Index <= Len(@InputString) 
        BEGIN 
            SET @Char = Substring(@InputString, @Index, 1) 
            SET @PrevChar = CASE 
                              WHEN @Index = 1 THEN ' ' 
                              ELSE Substring(@InputString, @Index - 1, 1) 
                            END 

            IF @PrevChar = ' ' 
               AND @Char != ' ' 
              SET @WordCount = @WordCount + 1 

            SET @Index = @Index + 1 
        END 

      RETURN @WordCount 
  END 

go 

Although it is a correct answer to the problem, it doesn't perform well. So after some more googling, I found this much simpler answer: https://stackoverflow.com/questions/41952250/sql-string-counting-words-inside-a-string

-- Test table variable (not shown in the snippet as quoted): 
DECLARE @t TABLE (id INT IDENTITY(1, 1), string VARCHAR(100)); 

INSERT INTO @t 
VALUES      ('I am not your father.'), 
            ('Where are your brother,sister,mother?'), 
            ('Where are your brother, sister and mother?'), 
            ('Who are     you?'), 
            (''); 

-- Replacing each space with '><' and then stripping '<>' collapses runs of spaces
-- into a single '><' marker; the length difference after mapping the markers back
-- to single spaces gives the word count.
WITH cte 
     AS (SELECT id, 
                Replace(Replace(string, ' ', '><'), '<>', '') string 
         FROM   @t) 
SELECT id, 
       Len(' ' + string) - Len(Replace(string, '><', ' ')) CountWords 
FROM   cte 


After tweaking it to handle null and empty values, my function looked like this:

CREATE FUNCTION [dbo].[Fn_wordcount] (@InputString VARCHAR(4000)) 
returns INT 
WITH schemabinding 
AS 
  BEGIN 
      DECLARE @WordCount INT = 0; 

      IF @InputString IS NULL 
        RETURN 0 

      -- Collapse each run of spaces into a single '><' marker.
      SELECT @InputString = Replace(Replace(@InputString, ' ', '><'), '<>', ''); 

      -- Turning each marker back into a single space shortens the string by one
      -- character per word separator; the prepended space accounts for the final
      -- word, and LEN ignoring trailing spaces makes an empty string return 0.
      SELECT @WordCount = Len(' ' + @InputString) - Len( 
                          Replace(@InputString, '><', ' ')); 

      SELECT @WordCount = Isnull(@WordCount, 0); 

      RETURN @WordCount 
  END 

go 
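A quick sanity check against the sample strings used above (expected counts in the comments):

SELECT dbo.Fn_wordcount('I am not your father.') AS SimpleSentence,  -- 5
       dbo.Fn_wordcount('Who are     you?')      AS RepeatedSpaces,  -- 3
       dbo.Fn_wordcount('')                      AS EmptyString,     -- 0
       dbo.Fn_wordcount(NULL)                    AS NullInput;       -- 0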

I'm still observing the performance, but the first run looked OK. I will get back to this post after properly measuring a full load.
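The plan for the full-load measurement is roughly this, where dbo.BigTable and its TextCol column are placeholders for the real table:

-- Compare both implementations over the same data; SET STATISTICS TIME reports
-- the CPU and elapsed time of each query.
SET STATISTICS TIME ON;

SELECT SUM(CAST(dbo.Wordcount(TextCol) AS BIGINT))    AS TotalWordsLoop    FROM dbo.BigTable;
SELECT SUM(CAST(dbo.Fn_wordcount(TextCol) AS BIGINT)) AS TotalWordsReplace FROM dbo.BigTable;

SET STATISTICS TIME OFF;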

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

[From Technet] Sync SQL data in large scale using Azure SQL Data Sync

This post is authored by Xiaochen Wu, Program Manager, SQL Server.  

Azure SQL Data Sync allows users to synchronize data between Azure SQL Databases and SQL Server databases, either one-directionally or bi-directionally. This feature was first introduced in 2012. At that time, people didn't host many large databases in Azure, and some size limitations were applied when we built the data sync service, including up to 30 databases (five on-premises SQL Server databases) in a single sync group, and up to 500 tables in any database in a sync group.

Today, there are more than two million Azure SQL Databases and the maximum database size is 4TB, but those data sync limitations are still there. This is mainly because syncing data is a size-of-data operation: without an architectural change, we can't ensure the service can sustain the heavy load when syncing at large scale. We are working on improvements in this area, and some of these limitations will be raised or removed in the future. In this article, we are going to show you how to use data sync to sync data between a large number of databases and tables, including some best practices and how to temporarily work around the database and table limitations.

Sync data between many databases

Large companies and ISVs use data sync to distribute data from a central master database to many client databases. Some customers have hundreds or even thousands of client databases in the whole topology. Users may hit one of the following issues when trying to sync between many databases:

  1. Hit the 30 databases per sync group limitation.
  2. Hit the five on-premises SQL Server databases per sync group limitation.
  3. Since all member databases sync with the hub database, there is a significant performance impact on the workload running in the hub database.

To work around the 30-database or five-on-premises-database per sync group limitation, we suggest you use a multi-level sync architecture. You can create a sync group to sync your master database with several member databases, and those member databases can become the hub databases of sub sync groups that sync data to other client databases. Depending on your business and cost requirements, you can use the databases in the middle layers as client databases or as dedicated forwarders.

This multi-level sync architecture has benefits even if you don't hit the 30 databases per sync group limitation:

  • You can group clients based on certain attributes (location, brand…) and use different sync schema and sync frequency.
  • You can easily add more clients when your business is growing.
  • The forwarders (member databases in the middle layers) can share the sync overhead from the master database.

To make this multi-level sync topology work in your system, you will need a good balance between how many client databases are in a single sync group and how many levels are in the overall system. The more databases in a single sync group, the higher the impact on the overall performance of the hub database. The more levels you have in your system, the longer it takes for data changes to be broadcast to all clients.

When you are adding more member databases to the system, you need to closely monitor the resource usage in the hub databases. If you see consistently high resource usage, you may consider upgrading your database to a higher SLO. Since the hub database is an Azure SQL database, you can upgrade it easily without downtime.
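One simple way to keep an eye on the hub is sys.dm_db_resource_stats, run against the hub Azure SQL database itself; a minimal sketch:

-- Recent resource usage of the hub database. Azure SQL DB keeps roughly one hour
-- of 15-second samples in this DMV.
SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent
FROM   sys.dm_db_resource_stats
ORDER  BY end_time DESC;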

Sync data between databases with many tables

Currently, data sync can only sync between databases with less than 500 tables. You can work around this limitation by creating multiple sync groups using different database users. For example, you want to sync two databases with 900 tables. First, you need to define two different users in the database where you load the sync schema from. Each user can only see 450 (or any number less than 500) tables in the database. Sync setup requires ALTER DATABASE permission which implies CONTROL permission over all tables so you will need to explicitly DENY the permissions on tables which you don’t want a specific user to see, instead of using GRANT. You can find the exact privilege needed for sync initialization in the best practice guidance. Then you can create two sync groups, one for each user. Each sync group will sync 450 tables between these two databases. Since each user can only see less than 500 tables, you will be able to load the schema and create sync groups! After the sync group is created and initialized, we recommend you follow the best practice guidance to update the user permission and make sure they have the minimum privilege for ongoing sync.
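As a rough sketch of the two-user setup (the user, login, and table names below are hypothetical, and the exact permissions required for sync setup should be taken from the best practice guidance):

-- Hypothetical example: SyncUser1 is the user the first sync group uses to load the schema.
CREATE USER SyncUser1 FOR LOGIN SyncLogin1;

-- Grant whatever the sync setup requires (see the best practice guidance), then
-- explicitly DENY on every table that belongs to the other sync group's slice.
-- DENY takes precedence over granted permissions, so those tables are not visible
-- when the sync schema is loaded as SyncUser1.
DENY SELECT, VIEW DEFINITION ON OBJECT::dbo.Table451 TO SyncUser1;
DENY SELECT, VIEW DEFINITION ON OBJECT::dbo.Table452 TO SyncUser1;
-- ...repeat (or generate dynamically) for tables 451 through 900.

-- SyncUser2 gets the mirror-image DENYs (tables 1 through 450) and is used to create the second sync group.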

Optimize the sync initialization

After the sync group is created, the first time you trigger the sync it will create all tracking tables and stored procedures and load all data from the source to the target database. The initial data loading is a size-of-data operation. Initializing sync between large databases could take hours or even days if it is not set up properly. Here are some tips to optimize the initialization performance:

  1. Data sync will initialize the target tables using bulk insert if the target tables are empty. If you have data on both sides, even if data in source and target databases are identical (data sync won’t know that!), data sync will do a row-by-row comparison and insertion. It could be extremely slow for large tables. To gain the best initialization performance, we recommend you consolidate data in one of your databases and keep the others empty before setting up data sync.
  2. Currently, the data sync local agent is a 32-bit application, so it can only use up to 4GB of RAM. When you are trying to initialize large databases, especially when initializing multiple sync groups at the same time, it may run out of memory. If you encounter this issue, we recommend you add part of the tables to the sync group first, initialize with those tables, and then add more tables. Repeat this until all tables are added to the sync group.
  3. During initialization, the local agent loads data from the database and stores it as temp files in your system temp folder. If you are initializing a sync group between large databases, make sure your temp folder has enough space before you start the sync. You can move the temp folder to another drive by setting the TEMP and TMP environment variables; you will need to restart the sync service after updating them. You can also add and initialize tables in the sync group in batches; make sure the temp folder is cleaned up between each batch.
  4. If you are initializing data from an on-premises SQL Server to Azure DB, you can upgrade your Azure DB temporarily before the initialization and downgrade it back to the original SLO once the initialization is done; the extra cost will be minimal (see the sketch after this list). If your target database is SQL Server running in a VM, adding more resources to the VM will do the same.
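For item 4 above, the temporary upgrade can be scripted in T-SQL (it can also be done from the portal); a sketch with a hypothetical database name and SLOs:

-- Hypothetical example: temporarily scale the target Azure SQL database up before
-- the initial sync, then back down afterwards.
ALTER DATABASE [TargetDb] MODIFY (SERVICE_OBJECTIVE = 'S6');   -- before initialization

-- ...run the initial sync...

ALTER DATABASE [TargetDb] MODIFY (SERVICE_OBJECTIVE = 'S2');   -- back to the original SLO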

Experiment of sync initialization performance

The following is the result of a simple experiment. I created a sync group to sync data from a SQL Server database in an Azure VM to an Azure SQL database. The VM and the SQL database are in the same Azure region, so the impact of network latency can be ignored. The sync group synced one table with 11 columns and about 2.1M rows, for a total data size of 49.1GB. I did three runs with different source and target database configurations:

In the first run, the target database was S2 (50 DTU) and the source database was running in a D4S_V3 VM (4 vCPU, 16GB RAM). It took 50 min to extract the data to the temp folder and 471 min to load the data from the temp folder into the target database.

I upgraded the target database to S6 (400 DTU) and the Azure VM to D8S_V3 (8 vCPU, 32GB RAM) for the second run. This reduced the loading time to 98 min! Data extraction surprisingly took longer in this run; I can't explain the regression since I didn't capture the local resource usage during the run, but it might be a disk I/O issue. Even so, upgrading the target database to S6 reduced the total initialization time from 521 min to 267 min.

In the third run, I upgraded the target database to S12 (3000 DTU) and used the local SSD as the temp folder. This reduced the data extract time to 39 min, the data loading time to 56 min and the total initialization time to 95 min. It was 5.5 times faster than the first configuration, for the extra cost of a cup of coffee!

Conclusion

  1. Upgrading the target database (Azure DB) to a higher SLO helps improve the initialization time significantly, with manageable extra cost.
  2. Upgrading the source database doesn't help much, since data extraction is an I/O-bound operation and the 32-bit local agent can only use up to 4GB of RAM.
  3. Using an attached SSD as the temp folder helps data extract performance, but the ROI is not as high as upgrading the target database. You also need to consider whether the temp files can fit on the SSD disk.

Run | Target database SLO (Azure DB) | Source database SLO (VM) | Total initialization time | Data extract time | Data load time
1   | S2                             | D4S_V3                   | 521 min                   | 50 min            | 471 min
2   | S6                             | D8S_V3                   | 267 min                   | *169 min          | 98 min
3   | S12                            | D8S_V3, attached SSD     | 95 min                    | 39 min            | 56 min

In this article, we provided some best practices for syncing data with the Azure SQL Data Sync service between many databases and between databases with many tables. You can find more information about data sync in the online documentation. More data sync best practices are available at Best Practices for Azure SQL Data Sync.



from SQL Server Blog http://ift.tt/2C0Ke3c

Friday, February 9, 2018

Reading SQL SERVER ERROR LOG

I had to run an update on a massive table in SQL Server, but ended up giving up on the task as it was taking too long and keeping me from more important tasks.

After trying, unsuccessfully, to kill the process, I restarted the service and the database came up in recovery after the restart.

SQL Server 2008 and later versions can handle recovery on their own, but to check on the status of the task, you can read the SQL Server error log by executing this procedure:

EXEC sp_readerrorlog;
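sp_readerrorlog also takes optional parameters, which helps when the log is busy; as far as I know, the usual pattern is log number, log type (1 = SQL Server, 2 = SQL Agent) and a search string:

-- 0 = current error log, 1 = SQL Server error log (2 would be the Agent log),
-- plus a string to filter the messages on.
EXEC sp_readerrorlog 0, 1, 'Recovery of database';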

I know we have to be patient when doing database stuff, but it's hard to be assertive with other people breathing down your neck.

Your SQL Server error log will show the progress through the three recovery phases, as follows:

2018-02-09 11:27:46.540 spid27s      Recovery of database 'dbx' (10) is 23% complete (approximately 2090 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2018-02-09 11:28:06.540 spid27s      Recovery of database 'dbx' (10) is 24% complete (approximately 2072 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2018-02-09 11:28:26.550 spid27s      Recovery of database 'dbx' (10) is 25% complete (approximately 2040 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.



Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

Monday, February 5, 2018

DELETING ROWS FROM VERY LARGE TABLES

Ever had a hard time deleting rows from a very large table? Try doing it in smaller chunks in a loop, like this:

-- Demo setup: copy sys.sysobjects into a temp table so there is something safe to delete from.
SELECT * 
INTO   #tmp 
FROM   sys.sysobjects;


DECLARE @COUNTX BIGINT;

SELECT @COUNTX =Count(*) 
FROM   #tmp;


-- Delete in batches of 10 until the table is empty, printing the remaining count as we go.
WHILE EXISTS 
( 
       SELECT 0 
       FROM   #tmp) 
BEGIN 
  DELETE TOP (10) 
  FROM   #tmp; 
   
  SET @COUNTX=@COUNTX-10; 


  PRINT '10 DOWN! '+try_cast(@COUNTX as varchar)+' TO GO!' 
END;


DROP TABLE #tmp;
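On a real table the same pattern works with a bigger batch size and a filter; here is a sketch where dbo.BigTable, the OrderDate cutoff and the batch size are all placeholders to adjust:

-- Hypothetical example: delete old rows in batches so each DELETE keeps its
-- transaction (and lock footprint) small.
DECLARE @BatchSize INT = 10000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM   dbo.BigTable
    WHERE  OrderDate < '2015-01-01';

    -- Stop when the last batch deleted fewer rows than requested (nothing left to do).
    IF @@ROWCOUNT < @BatchSize
        BREAK;

    -- Optional: brief pause so log backups / replication can keep up between batches.
    WAITFOR DELAY '00:00:01';
END;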

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06

Monday, January 22, 2018

[From Technet] SQL Server 2017 on Linux webcast series

The world's leading database is now available on Linux. By bringing Microsoft SQL Server to Linux, Microsoft continues to embrace open source solutions.

SQL Server 2017 brings the best features of the Microsoft relational database engine to the enterprise Linux ecosystem, including SQL Server Agent, Azure Active Directory (Azure AD) authentication, best-in-class high availability/ disaster recovery, and unparalleled data security.

Note that SQL Server on Linux is not a port or rewrite. This is the same world-class Microsoft relational database management system (RDBMS) now available on more operating systems (like Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Ubuntu) and more cloud and container platforms (like Docker).

Join us for one or all of a three-part webcast series now available on demand as we explore how SQL Server 2017 brings the industry-leading Microsoft relational database engine to the enterprise Linux ecosystem with our partners from Intel, Red Hat and HPE.

Session One:

SQL Server 2017 on Linux- #1 in price and performance—with massive scale
Learn how you can get record breaking performance with SQL Server on Linux. SQL Server consistently leads in the TPC-E OLTP workload, the TPC-H data warehousing workload, and real-world application performance benchmarks.

Presented by Bob Ward, Microsoft; Nicholas Gerasimatos, Red Hat; Wendy Harms, HPE.

Session Two:

SQL Server 2017 on Linux, Providing Industry leading security
Learn how Security innovations in SQL Server 2017 help secure data for mission-critical workloads with layers of protection, adding Always Encrypted technology along with row-level security, dynamic data masking, transparent data encryption (TDE), and robust auditing.

Presented by Joachim Hammer, Microsoft; Ken LeTourneau, Intel; Nicholas Gerasimatos, Red Hat.

Session Three:

SQL Server 2017 on Linux, In-memory technologies
Learn how in-memory technologies in SQL Server help to optimize the performance of your transaction processing, analytics, as well as hybrid (HTAP) workloads.

Presented by Jos de Bruijn, Microsoft; Ken LeTourneau, Intel; Wendy Harms, HPE.

Register now>>



from SQL Server Blog http://ift.tt/2rwiwqD

Thursday, January 18, 2018

[From Technet] Webinar: Modernize your applications with cloud and on-premises data solutions from Microsoft

Customers today demand the latest innovations in every solution you deliver. How can you make sure your data infrastructure not only keeps up, but drives innovation?

Data is the core of modern applications. Two key trends that help organizations extract the most from their data are the adoption of cloud technologies and the ability to drive new customer experiences with artificial intelligence. Organizations that modernize and harness data, cloud, and AI outperform their competition and are becoming leaders in their field. The most digitally transformed enterprises earn an additional $100 million in operating income!

Join our speakers Claudia Backus, Prem Prakash, and Frederico Rezende for a webinar on how you can transform your applications and enable new customer experiences using the Microsoft data platform.

In this webinar, you’ll learn:

  • How to leverage the performance, security and flexibility of the entire Microsoft database portfolio from SQL Server 2017 and Azure SQL Database to open-source databases like Azure Database for MySQL and Azure Database for PostgreSQL.
  • How to accelerate your move towards a cloud-based application with the new Azure Database Migration Service.
  • How the Microsoft Data Accelerator program can help you modernize your apps across on-premise and cloud.

Register now for this webinar to find out how Microsoft can help you successfully transform your existing applications and ultimately become a SaaS provider with cloud technologies.



from SQL Server Blog http://ift.tt/2DLBGeT

Wednesday, January 17, 2018

[From Technet] The January release of SQL Operations Studio is now available

This blog post was authored by Alan Yu, Program Manager, Microsoft SQL Server.

We are excited to announce the January release of SQL Operations Studio is now available.

Download SQL Operations Studio and review the release notes to get started. SQL Operations Studio is a data management tool that enables you to work with SQL Server, Azure SQL DB, and SQL DW from Windows, macOS, and Linux. To learn more, visit our Github.


SQL Operations Studio was announced for public preview on November 15, 2017 at Connect(), and this January release is the second major update since the announcement. If you missed the December release announcement, you can learn more on the SQL Server blog.

The January release includes several major repo updates and feature releases, including:

  • Enable the HotExit feature to automatically reopen unsaved files.
  • Add the ability to access saved connections from Connection Dialog.
  • Set the SQL editor tab color to match the Server Group color.
  • Fix the broken Run Current Query command.
  • Fix the broken pinned Windows Start Menu icon.

For a complete list of updates, please refer to the release notes.

HotExit feature


A highly requested feature for SQL Operations Studio is to remember unsaved changes when the program is exited, similar to VSCode and other editors. We are excited to announce that HotExit has been enabled as demonstrated in the GIF above. SQLQuery2 represents an unsaved query, and the user can quit SQL Operations Studio and simply reload it again to retain the unsaved query.

This feature is not enabled by default. To enable HotExit, open Settings with Ctrl + , and copy "files.hotExit": "onExit" into your user settings.

Saved connections available in Connection Dialog


The Connection Dialog now contains a “Saved Connections” tab next to “Recent Connections”. This contains a list of saved server connections from the server’s “Object Explorer” viewlet, as well as the Server Group.

SQL editor tab color to match Server Group color


SQL query editor and dashboard windows will have the same color as the Server Group used to open the window. This feature is not enabled by default. To enable tab colors, open Settings with Ctrl + , and copy "sql.enableTabColors": true into your user settings.

Contact us

If you have any feature requests or issues, please submit them to our Github issues page. For any questions, feel free to comment below or tweet us @sqlopsstudio.



from SQL Server Blog http://ift.tt/2EO3QFl

Monday, January 15, 2018

T-SQL Query to list all indexes and their filegroups

I was fishing for a query to list all indexes and their FileGroups, so I could locate the ones that were created incorrectly and re-create them in their right place.

I ended up finding the query below on Technet, by a Mr. Olaf Helper. Mr. "Helper" was a great help indeed (ba dum tsss).


SELECT DS.NAME              AS DataSpaceName, 
       AU.type_desc         AS AllocationDesc, 
       AU.total_pages / 128 AS TotalSizeMB, 
       AU.used_pages / 128  AS UsedSizeMB, 
       AU.data_pages / 128  AS DataSizeMB, 
       SCH.NAME             AS SchemaName, 
       OBJ.type_desc        AS ObjectType, 
       OBJ.NAME             AS ObjectName, 
       IDX.type_desc        AS IndexType, 
       IDX.NAME             AS IndexName 
FROM   sys.data_spaces AS DS 
       INNER JOIN sys.allocation_units AS AU 
               ON DS.data_space_id = AU.data_space_id 
       INNER JOIN sys.partitions AS PA 
               ON ( AU.type IN ( 1, 3 ) 
                    AND AU.container_id = PA.hobt_id ) 
                   OR ( AU.type = 2 
                        AND AU.container_id = PA.partition_id ) 
       INNER JOIN sys.objects AS OBJ 
               ON PA.object_id = OBJ.object_id 
       INNER JOIN sys.schemas AS SCH 
               ON OBJ.schema_id = SCH.schema_id 
       LEFT JOIN sys.indexes AS IDX 
              ON PA.object_id = IDX.object_id 
                 AND PA.index_id = IDX.index_id 
ORDER  BY DS.NAME, 
          SCH.NAME, 
          OBJ.NAME, 
          IDX.NAME 



Using this query I was able to locate the indexes (clustered and nonclustered) that were placed incorrectly and move them (actually, drop and re-create them) into their rightful location.

Although creating multiple files is not, per se, a guaranteed performance improvement, you should test it in your case: http://community.idera.com/blog/b/community_blog/posts/increase-sql-server-performance-using-multiple-files

Messy work, but indeed necessary if your boss has OCD. You can create another FileGroup and add files to it using this:


ALTER DATABASE dbx ADD FILEGROUP dbx_file_group; 


ALTER DATABASE dbx ADD FILE (NAME='dbx_file_group1', FILENAME= 
'N:\MSSQL13.MDM\MSSQL\DATA\dbx_file_group1.ndf') TO FILEGROUP dbx_file_group; 
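To double-check that the new filegroup and file are in place (names match the example above):

-- List the files that belong to the new filegroup (size is stored in 8KB pages).
SELECT fg.name        AS FileGroupName, 
       df.name        AS LogicalFileName, 
       df.physical_name, 
       df.size / 128  AS SizeMB 
FROM   sys.filegroups AS fg 
       INNER JOIN sys.database_files AS df 
               ON df.data_space_id = fg.data_space_id 
WHERE  fg.name = 'dbx_file_group';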


In order to move an index to a new FileGroup, you need to drop and recreate it (yeah, messy indeed):


DROP INDEX [CIX_INDEX1] ON table1 WITH ( online = OFF ) 

go 



CREATE CLUSTERED INDEX [CIX_INDEX1] 
  ON table1 ( [col1] ASC ) 
  ON [dbx_file_group] 

go 
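A variation worth knowing: CREATE INDEX with DROP_EXISTING = ON rebuilds the index directly on the new FileGroup in a single statement, so the table is never left without its clustered index. A sketch using the same hypothetical names as the example above:

CREATE CLUSTERED INDEX [CIX_INDEX1] 
  ON table1 ( [col1] ASC ) 
  WITH ( drop_existing = ON ) 
  ON [dbx_file_group] 

go 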

Check out my new book about R Language http://www.amazon.com/dp/B00SX6WA06