Question:- Does Puppet run on Windows?
Answer:- Yes. Beginning with Puppet 2.7.6, Puppet can run on Windows, and Windows remains a supported platform in subsequent releases.
Question:- What type of organizations can use Puppet?
Answer:- There is no strict rule about the type of organization that can benefit from Puppet, but an organization with only a few servers is less likely to benefit from it. An organization with a large number of servers benefits the most, since Puppet eliminates the need to manage those servers manually.
Question:- Can Puppet run on servers that are unique?
Answer:- Puppet can run on servers that are unique, although in practice truly unique servers are rare: within an organization there are usually many similarities between servers, such as the operating system they run, and so on.
Question:- What is Puppet Labs?
Answer:- Puppet Labs is the company that develops and maintains Puppet and is focused on solving the automation problems that Puppet addresses.
Question:- How to upgrade Puppet and Facter?
Answer:- You can upgrade Puppet and Facter through your operating system package management system. You can do this either through the vendor’s repository or through the Puppet Labs’ public repositories.
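As a minimal sketch, assuming the vendor's or Puppet Labs' repository is already configured on the machine, the upgrade is an ordinary package-manager operation:

```shell
# On Debian/Ubuntu systems (apt):
sudo apt-get update
sudo apt-get install --only-upgrade puppet facter

# On RHEL/CentOS systems (yum):
sudo yum update puppet facter
```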
Question:- What are the characters permitted in a class and module name?
Answer:- The characters permitted in a class or module name are lowercase letters, numbers, and underscores, and the name must begin with a lowercase letter. You can use “::” as a namespace separator. Variable names can include alphanumeric characters and underscores, and they are case-sensitive.
Question:- How are variables like $operatingsystem set?
Answer:- These variables are set by Facter. You can get the complete list of facts and their values by running facter by itself in a shell.
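For example, on a node with Facter installed, you can list every fact or query the one behind $operatingsystem directly:

```shell
# Print all facts and their values:
facter

# Print a single fact, the one exposed to manifests as $operatingsystem:
facter operatingsystem
```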
Question:- Why do we need Azure Data Factory?
Answer:- The amount of data generated today is huge, and it comes from many different sources. When we move this data to the cloud, a few things need to be taken care of. Because the data comes from different sources, those sources transfer or channelize it in different ways, and it can be in different formats. When we bring this data to the cloud, or to a particular storage system, we need to make sure it is well managed: we need to transform the data and delete the unnecessary parts. As far as moving the data is concerned, we need to make sure it is picked up from the different sources, brought to one common place, stored, and, if required, transformed into something more meaningful. A traditional data warehouse can do this as well, but it has certain disadvantages. Sometimes we are forced to build custom applications that deal with each of these processes individually, which is time-consuming, and integrating all these sources is a huge pain. We need a way to automate this process, or to create proper workflows. Data Factory helps orchestrate this complete process in a more manageable and organized manner.
Question:- What is the integration runtime?
Answer:- The integration runtime is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities across different network environments. There are three types of integration runtimes:
• Azure Integration Runtime: can copy data between cloud data stores, and can dispatch activities to a variety of compute services, such as Azure HDInsight or SQL Server, where the transformation takes place.
• Self-Hosted Integration Runtime: software with essentially the same code as the Azure Integration Runtime, but installed on an on-premises machine or on a virtual machine in a virtual network. A self-hosted IR can run copy activities between a public cloud data store and a data store in a private network, and it can dispatch transformation activities against compute resources in a private network. We use a self-hosted IR because Data Factory cannot directly access on-premises data sources, as they sit behind a firewall. It is sometimes possible to establish a direct connection between Azure and on-premises data sources by configuring the firewall in a specific way; if we do that, we don’t need a self-hosted IR.
• Azure-SSIS Integration Runtime: lets you natively execute SSIS packages in a managed environment. So when we lift and shift SSIS packages to Data Factory, we use the Azure-SSIS Integration Runtime.
Question:- What is the limit on the number of integration runtimes?
Answer:- There is no hard limit on the number of integration runtime instances you can have in a data factory. There is, however, a limit on the number of VM cores that the integration runtime can use per subscription for SSIS package execution.
Question:- What is the difference between Azure Data Lake and Azure Data Warehouse?
Answer:- The data warehouse is a traditional way of storing data that is still widely used. The data lake is complementary to the data warehouse: data held in a data lake can be stored in the data warehouse as well, but certain rules need to be followed.
DATA LAKE
• Complementary to the data warehouse
• Holds detailed or raw data, which can be in any form; you just take the data and dump it into your data lake
• Schema on read (not structured; you can define your schema in any number of ways)
• One language to process data of any format (U-SQL)
DATA WAREHOUSE
• May be sourced from the data lake
• Holds data that is filtered, summarised, and refined
• Schema on write (data is written in a structured form, i.e. in a particular schema)
• Uses SQL
Question:- What is blob storage in Azure?
Answer:- Azure Blob Storage is a service for storing large amounts of unstructured object data, such as text or binary data. You can use Blob Storage to expose data publicly to the world or to store application data privately. Common uses of Blob Storage include:
• Serving images or documents directly to a browser
• Storing files for distributed access
• Streaming video and audio
• Storing data for backup and restore, disaster recovery, and archiving
• Storing data for analysis by an on-premises or Azure-hosted service
Question:- What are the steps for creating ETL process in Azure Data Factory?
Answer:- Suppose we are extracting data from an Azure SQL Server database; whatever has to be processed is processed and then stored in the Data Lake Store. Steps for creating the ETL process:
• Create a linked service for the source data store, which is the SQL Server database
• Assume that we have a cars dataset
• Create a linked service for the destination data store, which is Azure Data Lake Store
• Create a dataset for the data saving
• Create the pipeline and add a copy activity
• Schedule the pipeline by adding a trigger
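The steps above can be sketched with the Azure CLI’s datafactory extension. This is only an outline: the resource names (my-rg, my-adf, etc.) and the JSON property files holding the linked-service, dataset, pipeline, and trigger definitions are hypothetical, and flag names may vary between extension versions.

```shell
# Install the Data Factory CLI extension:
az extension add --name datafactory

# Create the data factory itself:
az datafactory create \
    --resource-group my-rg \
    --factory-name my-adf \
    --location eastus

# Linked services for the source (SQL Server) and sink (Data Lake Store);
# connection details live in the JSON property files:
az datafactory linked-service create \
    --resource-group my-rg --factory-name my-adf \
    --linked-service-name SqlServerLS --properties @sqlserver-ls.json
az datafactory linked-service create \
    --resource-group my-rg --factory-name my-adf \
    --linked-service-name DataLakeLS --properties @datalake-ls.json

# Pipeline containing the copy activity:
az datafactory pipeline create \
    --resource-group my-rg --factory-name my-adf \
    --name CopyCarsPipeline --pipeline @copy-pipeline.json

# Trigger to schedule the pipeline:
az datafactory trigger create \
    --resource-group my-rg --factory-name my-adf \
    --name DailyTrigger --properties @daily-trigger.json
```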
Question:- What is the difference between HDinsight & Azure Data Lake Analytics?
Answer:-
HDInsight
• HDInsight is Platform as a Service
• To process a dataset, we first have to configure the cluster with predefined nodes, and then we use a language like Pig or Hive to process the data
• Since we configure the cluster ourselves with HDInsight, we can create it as we want and control it as we want, and all Hadoop subprojects such as Spark and Kafka can be used without any limitation
Azure Data Lake Analytics
• Azure Data Lake Analytics is Software as a Service
• It is all about submitting queries written to process data; Azure Data Lake Analytics creates the necessary compute nodes on demand, as per our instructions, and processes the dataset
• Azure Data Lake Analytics does not give much flexibility in provisioning the cluster, but Microsoft Azure takes care of that: we don’t need to worry about cluster creation, and nodes are assigned based on the instructions we pass. In addition, we can make use of U-SQL, which takes advantage of .NET, for processing data
