In September 2017, Microsoft shared details of a new query language called U-SQL. It is designed for use with Azure Data Lake Store, which is expected to launch in preview by the end of 2017.
Azure Data Lake Store
Microsoft has been promoting Azure Data Lake Store as a service for analyzing large-scale unstructured data. Data Lake Store is built to support tools that work with the Hadoop Distributed File System (HDFS), but it is meant to be easier to manage than running a Spark or Hadoop cluster on-site.
Azure Data Lake was announced in April, and U-SQL answers one of the many open questions about the service: how to extract useful information from the enormous volume of data a data lake collects. Microsoft has clearly been thinking about ways to simplify the process of analyzing big data.
Easier Data Management
Most businesses and organizations have big data in some form, such as purchasing records or log files, but not all of them have the people and programming tools to do much with it. Many of those companies do, however, have technology staff who are well versed in Microsoft products.
Oliver Chiu, Microsoft product marketing manager for big data, Hadoop, and data warehousing, had this to say: “Microsoft’s goal is to make big data technology simpler and more accessible to the greatest number of people possible.”
SQL + C#
With that goal in mind, Microsoft built U-SQL to fuse the familiar SQL with the capabilities of a programming language (Microsoft’s C#). In theory, this gives the best of both worlds in one tidy package.
Chiu explains, “We’ve heard that many data engineers struggle to process data with today’s tools. Code-based solutions offer great power, but are complex to learn; SQL-based tools are easy to start with but difficult to extend.”
U-SQL combines the keywords of SQL with the expression syntax of C#, so that a single script can impose a schema on data from an entirely unstructured source, use SQL to aggregate that data as desired, and then write the output to a table or file. Because the data is processed in stages, each building on the result of the previous one, more complex analysis becomes possible.
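The three stages described above (apply a schema, aggregate, write the output) can be sketched in ordinary Python. This is only an analogy for the shape of the pipeline, not U-SQL itself and not Microsoft's implementation. The sample log lines, the column names, and the function names extract, aggregate, and output are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input: tab-separated log lines with no declared schema.
RAW_LOG = "alice\ten-us\t30\nbob\ten-gb\t12\ncarol\ten-us\t45\n"


def extract(raw):
    """Stage 1: impose a schema (user, region, duration) on raw lines."""
    rows = []
    for line in raw.splitlines():
        user, region, duration = line.split("\t")
        rows.append({"user": user, "region": region, "duration": int(duration)})
    return rows


def aggregate(rows):
    """Stage 2: roughly SELECT region, SUM(duration) ... GROUP BY region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["duration"]
    return sorted(totals.items())


def output(totals):
    """Stage 3: write the aggregated rows out as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["region", "total_duration"])
    writer.writerows(totals)
    return buffer.getvalue()


# Each stage consumes the result of the previous one.
result = output(aggregate(extract(RAW_LOG)))
print(result)
```

In a U-SQL script, the extraction and aggregation steps would be written declaratively, with C# expressions available for custom logic. The sketch only shows how each stage feeds the next.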
Of course, it will take time to determine whether U-SQL works as intended. However, some businesses are already working with the system and offering some insight into what it is capable of.
One of those companies is Codit, which has been testing the technology with a specific job in mind. It is building a system that combines usage data from smart energy meters with energy market prices to highlight the best times to turn the power down and save money.
Codit’s chief technology officer, Sam Vanhoutte, explained that the technology saves time. “If you know both languages, it’s indeed rather easy to get up to speed.”
While U-SQL has a number of benefits, there are a few limitations to note. One is that Microsoft has not yet revealed whether U-SQL will be available on non-Azure platforms. It also will not be ideal for every big data use case. Those who do stream processing or machine learning within the Microsoft cloud will still need to be familiar with Stream Analytics and Machine Learning in Azure.
It remains to be seen how this new technology will pan out for customers, and plenty of questions are still waiting to be answered. Still, it looks like an exciting new option for those whose needs it fits, and it could be the start of something big.