Sapling Data

… because it's all about the data.

Presenting at the Toronto SPUG

For anyone in the GTA and interested in SharePoint data protection, I’m presenting at the Toronto SharePoint User Group meeting on Wednesday December 21st 2011. Details of the session are available at the following link:

http://www.meetup.com/TorontoSPUG/events/35162912/

I look forward to seeing everyone there. Please bring your questions.

TechDays Montreal Details

For those of you attending TechDays 2011 in Montreal on Nov.29 and Nov.30, I am presenting at 4:15 on Nov.30. My session code is ‘CLB277’ and the session is titled ‘Why SharePoint Data Protection is not as Easy as it Looks’. It is a 200-level session intended for the technical and business owners responsible for the recovery of SharePoint data and services in the event of a failure. I’ll be presenting this session in English; however, the slides will be displayed in both English and French, and what I say will be available in French through a live translation service.

I’ll also be presenting at the Partner Stage at 2:20 on Nov.30 ( in English ), and will be spending the rest of my time between the Collaboration Lounge and the AvePoint booth.

I hope to see you there.

TechDays Vancouver Details

For those of you attending TechDays 2011 in Vancouver on Nov.15 and Nov.16, I am presenting at 4:15 on Nov.16. My session code is ‘CLB277’ and the session is titled ‘Why SharePoint Data Protection is not as Easy as it Looks’. It is a 200-level session intended for the technical and business owners responsible for the recovery of SharePoint data and services in the event of a failure.

I’ll also be presenting at the Partner Stage at 2:20 on Nov.16, and will be spending the rest of my time between the Collaboration Lounge and the AvePoint booth.

I hope to see you there.

DevTeach Ottawa Details

For those of you attending DevTeach in Ottawa from Nov.02 to Nov.04, I am presenting at 4:30 on Nov.04. I’m presenting my TechDays session titled ‘Why SharePoint Data Protection is not as Easy as it Looks’. It is a 200-level session intended for the technical and business owners responsible for the recovery of SharePoint data and services in the event of a failure.

I hope to see you there.

TechDays Toronto Details

For those of you attending TechDays 2011 in Toronto on Oct.25 and Oct.26, I am presenting at 10:30 on Oct.26. My session code is ‘CLB277’ and the session is titled ‘Why SharePoint Data Protection is not as Easy as it Looks’. It is a 200-level session intended for the technical and business owners responsible for the recovery of SharePoint data and services in the event of a failure.

I’ll also be presenting at the Partner Stage at 2:30 on Oct.26, and will be spending the rest of my time between the Collaboration Lounge and the AvePoint booth.

I hope to see you there.

Thank You to Everyone Who Voted, See You at TechDays

Thank you to everyone who cast their vote and helped select the sessions for this year’s TechDays. My session on Why SharePoint Data Protection is not as Easy as it Looks has been selected, so I’ll be presenting in all three cities ( Toronto, Vancouver, and Montreal ).

See you at TechDays.

 

Call to Action :: Vote for the Topics You Would Like to See at TechDays

Background:

Last year at TechDays, I attended all 8 cities and presented several sessions on SharePoint, Group Policy, and Customizing Windows 7. For TechDays 2011, I submitted the following sessions for consideration:

  1. SharePoint Storage: Go big or stay home. This is a 200-level session intended for the architects and technical leads responsible for the design and implementation of SharePoint. It discusses data externalization strategies, the implications of SharePoint 2010 SP1, the updated database recommendations from Microsoft and what each of these mean to organizations with SharePoint implementations being scaled for more than 1 TB of data.
  2. Enterprise Collaboration in the Cloud. This is a 200-level session intended for the architects and technical leads responsible for the design and implementation of SharePoint and the corporate cloud strategy. It discusses the key differences between on-premises and off-premises SharePoint, the limitations of each, and the key factors to consider when developing a cloud strategy for collaboration.
  3. Why SharePoint Data Protection is not as Easy as it Looks. This is a 200-level session intended for the architects and technical leads responsible for the design and implementation of SharePoint backups, the SharePoint disaster recovery solution and high availability planning. It discusses the different components of a complete SharePoint backup, how to achieve high availability for SharePoint, the different options for SharePoint data protection and how to select a data protection strategy to meet organizational requirements.

Of the three sessions submitted, the third one ( Why SharePoint Data Protection is not as Easy as it Looks ) has been short-listed.

Call to Action:

The sessions presented at TechDays this year will be selected by vote. All sessions short-listed for TechDays are up for consideration. The URL to cast your vote for the sessions you would like to see is:

http://bit.ly/tdcan2011vote

Please take a minute to vote for the sessions you would like to see at TechDays this year. Also, for more information on TechDays, please visit the TechDays website at:

http://www.microsoft.com/canada/techdays/2011/

Finally, a shout-out to Damir, who did a fantastic job ‘quarterbacking’ TechDays last year. It looks like this year’s events will be equally exceptional.

Technologies for Externalizing SharePoint Content

Background and Context:

In order for SharePoint to function as an enterprise solution, it must be capable of supporting large quantities of data. SharePoint uses SQL databases for its data storage, which is both good and bad – good because SQL is a capable and reliable database engine, but bad because much of the data a typical organization will store in SharePoint ( Office documents, PDF files, media files, etc. ) is not well suited to database storage and retrieval. At the same time, corporate data growth – especially during the early stages of SharePoint adoption – tends to cause SharePoint databases to grow quickly, straining database systems that have not been sized or expanded properly to accommodate this growth. Finally, regardless of the growth rate, large databases require more system resources to run, are more expensive to maintain, and are more difficult to back up and restore.

There are many articles available on the internet highlighting and / or explaining both the features and complexities of externalizing SharePoint content. There are also many articles cautioning against the use of SharePoint externalization because of its complexities; however, very few articles offer solutions for organizations with business-critical implementations of SharePoint and large quantities of SharePoint data. For these organizations, it isn’t practical to store, say, 3 TB of data in content databases. For these organizations, externalizing content and other storage optimization techniques are critical to the success of SharePoint as an enterprise solution.

This article attempts to address this gap by providing a business-centric explanation of how and why externalization works, the differences between the two Microsoft providers and how to select one. It also discusses the implications of externalizing content so the individuals who do choose to use this technology have enough information to (a) ask the right questions, and (b) plan and implement externalization properly within their organizations**.

Why Externalize SharePoint Content:

There are two reasons for choosing to externalize SharePoint content:

  1. Capacity, and
  2. Performance.

In terms of capacity, externalizing documents ( more specifically their BLOBs – explained in more detail in the next section ) allows an organization to store a much larger quantity of data within the same amount of SharePoint content database space.

In terms of performance, SQL Server is very good at storing and retrieving small items such as data records but it is not nearly as good at storing and retrieving large items. From a SharePoint perspective, leaving small items ( such as text files ) inside SharePoint content databases and externalizing large items ( such as big MS Office documents, media files, and images ) so storage and retrieval of these documents happens largely outside SQL will improve SharePoint performance for users of the system and for the administrators managing it.
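The idea of keeping small items in the content database and externalizing large ones can be sketched as a simple size rule. This is only an illustration: the 1 MB cut-off, the function name, and the sample files are assumptions made for the example, not values taken from EBS, RBS, or any Microsoft guidance.

```python
# Hypothetical cut-off, chosen only for illustration. A real value should come
# from the provider's documentation and from testing under load.
EXTERNALIZE_THRESHOLD_BYTES = 1 * 1024 * 1024  # assumed 1 MB


def should_externalize(blob_size_bytes: int,
                       threshold: int = EXTERNALIZE_THRESHOLD_BYTES) -> bool:
    """Return True when a BLOB is large enough to store outside the content database."""
    return blob_size_bytes >= threshold


# Made-up uploads: file name -> size in bytes.
uploads = {
    "notes.txt": 4 * 1024,
    "budget.xlsx": 3 * 1024 * 1024,
    "training-video.wmv": 250 * 1024 * 1024,
}

for name, size in uploads.items():
    location = "BLOB store" if should_externalize(size) else "content database"
    print(f"{name}: {location}")
```

With these assumed numbers, the small text file stays in the content database while the spreadsheet and the video go to the BLOB store.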

Understanding BLOBs:

Before examining Microsoft’s externalization technologies, it’s important to understand the concept of a BLOB. ‘BLOB’ is an acronym which stands for ‘Binary Large OBject’ and refers to the non-metadata, non-attribute part of a document … in other words, the actual ‘document’ or ‘content’ part of the object.

By way of example, suppose a user uploads a 2 MB MS Word file into SharePoint. In this case the properties of the document ( name, size, description, etc. ) and any SharePoint metadata associated with the document are not part of the BLOB but the contents of the document ( text, embedded images, formatting, etc. ) are.

BLOBs and SharePoint Content Databases:

There are two important things to realize about BLOBs:

  1. Unless the document is very small, the bulk of the space consumed by the document is taken up by the BLOB … in other words, the size of the BLOB is usually very large compared to the size of the metadata and properties. Of the 2 MB document from our earlier example, it is reasonable to expect that the document attributes and properties would be 100 – 500 KB in size. As the size of the document increases, the increase occurs only in the BLOB.
  2. The BLOB part of the document is irrelevant from a database perspective. I say this because the BLOB doesn’t contribute to storing, indexing, or locating the document. These database functions are dependent upon the document’s properties and metadata but not the BLOB.

Because of these factors, it is both possible and advantageous to store document BLOBs outside SharePoint content databases.

In order for SharePoint to be used as an enterprise content management system, it is necessary for it to support large quantities of data. To facilitate this Microsoft has provided two technologies for externalizing BLOBs uploaded into SharePoint. ( Aside: So have several 3rd party vendors, to be discussed in a subsequent article. ) These technologies are EBS and RBS.

Understanding EBS:

EBS ( External BLOB Storage ) was introduced with Service Pack 1 for SharePoint 2007 and supports both SharePoint 2007 and SharePoint 2010. It is a SharePoint-specific technology configured on the Web Front End servers ( WFEs ) of the SharePoint environment where it is enabled. Keeping things simple, EBS works as follows:

Suppose a user uploads a document into a SharePoint environment with EBS enabled. The document is received by the SharePoint Object Model and passed to the storage system ( for the techies reading this article, more specifically the Storage Access Stack of the storage sub-system ) so it can be transferred to SQL and saved in the content database. At this point, the document passes to the EBS provider, which splits the document, separating the metadata and properties from the BLOB. It sends the BLOB to the BLOB store and sends the document properties, metadata, and a pointer to SQL, which stores them in the content database. The pointer – also referred to as a document stub – contains the information necessary for the BLOB to be matched to its corresponding properties and metadata during document retrieval.

When a user requests this document, that request is received by the SharePoint Object Model and again passed to the EBS provider through the storage sub-system. The EBS provider requests the information from SQL, which queries the content database and returns the document’s properties, metadata, and the pointer to the BLOB. Next the EBS provider follows the pointer to the BLOB store and retrieves the document’s BLOB. It then combines the BLOB with its metadata and returns the ‘re-assembled’ document to the SharePoint Object Model for delivery to the requesting user.
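The split-on-save and re-assemble-on-read behaviour described above can be modelled in a few lines. To be clear, this is a toy model and not the EBS interface: the class name, the dictionaries standing in for the content database and the BLOB store, and the use of a random ID as the pointer are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    metadata: dict
    blob: bytes


class ToyBlobProvider:
    """Illustrative stand-in for an externalization provider (not the real EBS API)."""

    def __init__(self):
        self.content_db = {}  # name -> {"metadata": ..., "pointer": ...}
        self.blob_store = {}  # pointer -> raw bytes

    def save(self, doc: Document) -> str:
        # Split the document: the BLOB goes to the BLOB store, while the
        # metadata and a pointer (the 'stub') go to the content database.
        pointer = uuid.uuid4().hex
        self.blob_store[pointer] = doc.blob
        self.content_db[doc.name] = {"metadata": dict(doc.metadata), "pointer": pointer}
        return pointer

    def load(self, name: str) -> Document:
        # Read the metadata and pointer, follow the pointer to the BLOB,
        # then hand back the re-assembled document.
        row = self.content_db[name]
        blob = self.blob_store[row["pointer"]]
        return Document(name=name, metadata=dict(row["metadata"]), blob=blob)


provider = ToyBlobProvider()
original = Document("report.docx", {"author": "A. User"}, b"document contents")
provider.save(original)
restored = provider.load("report.docx")
print(restored == original)
```

The point of the sketch is that the content database row never holds the document contents, only the metadata and the pointer, yet the caller still gets a complete document back.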

Understanding RBS:

RBS ( Remote BLOB Storage ) is a SQL-specific technology introduced with SQL Server 2008. It supports SharePoint 2010 and, again keeping it simple, it works as follows:

Suppose our user uploads a second document into a different SharePoint environment, this one with RBS enabled. The document is received by the SharePoint Object Model which passes the document to the storage sub-system so it can be saved in the content database. The storage sub-system passes the document to SQL, at which point SQL leverages the RBS provider ( for the techies reading this article, more specifically the RBS Client Library ) to split the document into metadata / properties and BLOB, save the BLOB to the BLOB store and the document properties, metadata, and pointer to the content database.

When a user requests this document, the request is received by the SharePoint Object Model and passed to SQL through the storage sub-system. The RBS provider obtains the document’s properties, metadata, and the pointer from the content database and the BLOB from the BLOB store. It combines the two and returns the document to the SharePoint Object Model for delivery to the requesting user.

Selecting an Externalization Technology:

EBS and RBS cannot be used in conjunction with each other, so if externalizing content it is necessary to select one or the other. Microsoft has indicated that EBS is on the deprecation list for SharePoint, so the long-term externalization technology Microsoft is choosing to support is RBS.

Since RBS requires SQL 2008 and SharePoint 2010, EBS is the only option for externalization in SharePoint 2007. For clients who wish to externalize data but haven’t upgraded to SharePoint 2010 yet, this doesn’t mean they shouldn’t externalize with EBS; however, it does mean they will need to go through an EBS-to-RBS conversion before EBS becomes unsupported. ( Aside: Microsoft has provided a supported conversion path using PowerShell, as have several 3rd party vendors such as AvePoint. )

Also, because RBS is a SQL-specific technology, unlike EBS which is a SharePoint-specific technology, RBS can be used to externalize BLOBs from any SQL database, not just SharePoint content databases.

In summary, if an organization meets the requirements to use RBS, it is the recommended first choice.
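The selection rules in this section can be written down as a small decision function. This is a simplified sketch: the function name is hypothetical, versions are represented by their release year, and service pack and edition prerequisites are deliberately ignored.

```python
def choose_externalization(sharepoint_version: int, sql_server_version: int) -> str:
    """Pick EBS or RBS using only the rules stated in this article.

    Versions are given as release years (e.g. 2007, 2010, 2008).
    Service pack and edition requirements are not modelled here.
    """
    # RBS needs both SharePoint 2010 and SQL Server 2008, and is preferred when available.
    if sharepoint_version == 2010 and sql_server_version >= 2008:
        return "RBS"
    # EBS supports SharePoint 2007 and 2010, and is the only option on 2007.
    if sharepoint_version in (2007, 2010):
        return "EBS"
    raise ValueError("SharePoint version not covered by this article")


print(choose_externalization(2010, 2008))  # RBS
print(choose_externalization(2007, 2008))  # EBS
```

An organization that lands on EBS should still plan for the EBS-to-RBS conversion mentioned above.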

Implications of Externalizing Content:

For organizations considering externalization, with either EBS or RBS, the following key implications should be considered before making a decision.

  1. Architecture. Externalization of SharePoint data requires some significant changes to the overall architecture of the SharePoint environment. While externalization does reduce the load on the database layer ( i.e. SQL servers ), it will increase the load on the storage systems responsible for hosting BLOB stores. Also keep in mind that externalizing SharePoint BLOBs requires a very close coupling between the database layer and the BLOB storage systems, so it will also increase the load on whatever system is facilitating the communication between these two ( fibre channel or iSCSI infrastructure, network connections, etc. ). Generally speaking the redistribution of load is still advantageous – the point here is to realize the infrastructure supporting SharePoint needs to be designed and implemented differently for externalized data than if all SharePoint data resides in content databases.
  2. Configuration. Neither EBS nor RBS is enabled by default, meaning that before they can be used, all prerequisites must be met and one of the providers enabled. While not overly complicated, this configuration is not trivial either. It should be tested in a non-production environment and only implemented during a maintenance or outage window. Implementation activities should also be carefully documented to ensure consistency of configuration and to help in the event troubleshooting is required. Also worth noting is that several 3rd party solutions are capable of performing the necessary configuration and will simplify the deployment process.
  3. Performance. In most cases this is a positive implication, given that overall system performance is usually better once data is externalized**. That said, when testing storage optimization, be sure to test the system under load. A content database with only 10 documents, all between 1 and 5 MB in size, will perform the same with or without externalization enabled. To see performance benefits, most organizations require a larger and more varied set of data. Also, when considering performance be sure to consider all aspects of overall system performance. Backup and restore times of SharePoint and SQL systems are often overlooked when considering SharePoint system performance.
  4. Data Protection. For most organizations, this will be the most significant implication. When all SharePoint data resides in content databases, backing up and restoring SharePoint content is very straightforward. Once that data exists in two places – content databases and BLOB stores – the backup and restore process is not as straightforward. A complete backup of SharePoint data requires backing up both items. Many enterprise backup tools understand and support externalized data and can provide a complete backup of SharePoint data even if externalized. That said, this is a very important consideration – with many applications this functionality needs to be enabled or activated before it can be used. If the enterprise backup solution used in a particular organization doesn’t support externalized BLOBs, consider a 3rd party SharePoint-specific backup solution such as AvePoint’s Data Protection Suite within DocAve.
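To make the data protection point concrete, here is a minimal sketch of the kind of consistency check a complete backup implies: every pointer in the content database should have a matching BLOB in the BLOB store. The function name and the sample IDs are hypothetical, and real backup tools perform this check in their own ways.

```python
def find_backup_gaps(content_db_pointers, blob_store_ids):
    """Compare pointers held in a content database backup with BLOB IDs in a BLOB store backup."""
    pointers = set(content_db_pointers)
    blobs = set(blob_store_ids)
    return {
        # Pointers with no BLOB: documents that could not be re-assembled after a restore.
        "missing_blobs": sorted(pointers - blobs),
        # BLOBs with no pointer: data that nothing in the content database refers to.
        "orphaned_blobs": sorted(blobs - pointers),
    }


# Made-up IDs for illustration only.
gaps = find_backup_gaps(["a1", "b2", "c3"], ["a1", "c3", "d4"])
print(gaps)  # {'missing_blobs': ['b2'], 'orphaned_blobs': ['d4']}
```

A backup is only complete when the first list is empty. A restore that brings back the content database without its BLOBs would leave pointers like `b2` pointing at nothing.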

Summary:

Let’s return to our earlier example of the 2 MB MS Word document. Considering just this one document, it is insignificant whether it is externalized or not. But consider an organization that has 10,000 MS Office documents, each of which is 2 MB in size. This equates to approximately 20 GB of MS Office documents and, in a SharePoint content database, the size of the database will be ( as expected ) about 20 GB. Assuming a generous estimate of 200 KB per document for properties, metadata, and EBS or RBS pointers, with externalization the size of this content database becomes about 2 GB.

The overall quantity of data remains approximately the same ( externalization does add a small overhead due to the addition of EBS or RBS pointers ) but the distribution of data has changed, and for most organizations having their critical data in BLOB format on a file system is easier to manage than in SharePoint content databases, even given the slight increase caused by the overhead of externalization.
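The arithmetic in this summary can be checked with a short calculation. The figures are the ones used in the example above; the variable names and the use of decimal units ( 1 GB = 1,000 MB ) are assumptions made to keep the numbers round, and the small pointer overhead is ignored.

```python
DOCUMENT_COUNT = 10_000
DOCUMENT_SIZE_MB = 2
IN_DATABASE_KB_PER_DOCUMENT = 200  # properties, metadata, and pointer (generous estimate)

# Without externalization, everything lives in the content database.
db_without_gb = DOCUMENT_COUNT * DOCUMENT_SIZE_MB / 1000

# With externalization, only the properties, metadata, and pointers stay in the database.
db_with_gb = DOCUMENT_COUNT * IN_DATABASE_KB_PER_DOCUMENT / 1_000_000

# The remainder moves to the BLOB store.
blob_store_gb = db_without_gb - db_with_gb

print(f"Content database without externalization: {db_without_gb:.0f} GB")
print(f"Content database with externalization:    {db_with_gb:.0f} GB")
print(f"BLOB store:                               {blob_store_gb:.0f} GB")
```

Under these assumptions the content database drops from about 20 GB to about 2 GB, with roughly 18 GB moving to the BLOB store.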

———-

Notation:

** In discussing this topic, it is important to understand that the principles of storage optimization and externalizing SharePoint content assume a ‘normal’ set of corporate data. For example, companies with very large numbers of tiny documents or with very small quantities of data relative to the number of employees may not be good candidates for these solutions and technologies.

As is the ‘best practice’ when implementing any enterprise solution, a full analysis of an organization’s data and data usage patterns is required to assess the suitability and benefits before choosing to externalize SharePoint content.

Externalizing SharePoint Content

Corporate Data Landscape:

In most organizations data is distributed across multiple systems and locations. The most typical locations where corporate data resides are:

  • File systems,
  • Messaging systems such as Microsoft Exchange,
  • ECM ( Enterprise Content Management ) systems such as EMC Documentum, IBM FileNet and SAP,
  • Custom line-of-business systems.

Typically each different system is accessed and searched separately, meaning that users need to know which system to search before they even begin looking for the data they require. This usually also means that cross-system integration is limited and any system integration which does exist is completed manually and / or limited in its capability.

The SharePoint Opportunity:

SharePoint is unique in the sense that it provides an excellent system for both hosting corporate data and providing a point of integration for data stored in disparate systems.

SharePoint, especially when combined with FAST search, has the capability and capacity to reach out to other enterprise systems, index them and provide users with a single interface to the corporate data they require. It also has the capability and capacity to host data migrated from other data storage locations for organizations that prefer to migrate their data to SharePoint and reduce the number of enterprise data repositories.

Centralizing Corporate Data Access with SharePoint:

With SharePoint there are three strategies for centralizing corporate data access:

  1. Integration through SharePoint Search. SharePoint search can be configured to reach out and index other data systems. There are also 3rd party search connectors which can be used to extend SharePoint’s ability to gather information from other data systems and present it to users. If the objective is to locate data across multiple systems, this is a good solution but it doesn’t provide true integration. SharePoint search will provide links to data in other systems, but the integration stops there. Other SharePoint functionality, for example workflows and integrated security won’t extend to the data which continues to reside in other systems.
  2. Data migration to SharePoint. This is the most involved of the three strategies but is the one option which provides an opportunity to reduce the number of enterprise data systems, thus reducing the complexity of the corporate data landscape. This option involves either manually moving data by exporting it from its current location and importing it into SharePoint or leveraging a 3rd party migration tool.
  3. Connecting external data. This can be done through Microsoft’s Business Connectivity Services ( BCS ), formerly known as Business Data Connectivity ( BDC ) Services, through connectors ( usually provided by the vendor of the external data repository ) or through connector tools provided by 3rd party vendors. The Microsoft BCS / BDC connectors are designed to be generic and may or may not provide the desired degree of integration. Specialized connectors are usually the best option, often providing both the benefits of SharePoint and the benefits of the external system, but may or may not be available for the system and version being connected. When using external connectors, ones which leverage Microsoft’s EBS and / or RBS providers are strongly preferred.

SharePoint as an Enterprise Data Repository:

For the reasons given above and the different implementation strategies available, SharePoint is certainly a good choice for an enterprise data repository, subject, of course, to a business needs analysis. That said, in order for SharePoint to handle this responsibility, a corresponding commitment is required from the organization in order to ensure all components of the SharePoint system are architected and implemented ‘properly’.

Addressing SharePoint Erosion

Definition:

SharePoint ‘erosion’ refers to the gradual deterioration of SharePoint, in particular to the configuration, security, topology, and data stored within SharePoint.

Background:

SharePoint tends to be more prone to erosion than many other Microsoft enterprise technologies, partly because of the broad range of capabilities SharePoint offers, partly because of the way in which organizations use SharePoint and partly because of the way security is managed and delegated.

At the time SharePoint is deployed within an organization, the implementation should be ‘perfect’ in the sense that it should be properly designed, be properly implemented, and meet all the needs of the organization. Unfortunately, this is when users begin using the SharePoint environment.

Unless all users understand and adhere to the architecture, governance, and ‘best practices’** for the use of SharePoint, the environment will begin to ‘erode’ as content is uploaded to the wrong place, files are duplicated across sites instead of linked, permissions are set which do not adhere to corporate policies, the needs of the organization change while SharePoint does not, and so on.

The Impact of Erosion:

The impact of erosion is often underestimated because on their own, each change to the environment which doesn’t adhere to the architecture, governance and best practices is insignificant but collectively they can represent a significant impact to the overall quality and efficiency of SharePoint.

For fortunate organizations, erosion results in a level of inconvenience for users and / or SharePoint administrators. For less fortunate organizations, the impact of erosion can be thousands of dollars each year in additional administration and remediation costs.

Unmanaged erosion can result, and has resulted, in failed security audits, uncontained growth in sites and storage, and various degrees of system failure … so it certainly can become very significant.

Addressing Erosion:

Unfortunately, in an enterprise deployment of SharePoint erosion is unavoidable. The best analogy for this challenge compares SharePoint to a leaking boat – the key is to make sure water is being bailed out faster than it is coming in.

Fortunately, the steps required to combat erosion are very straightforward. Some reduce the rate at which erosion occurs while others help in finding and remediating the erosion that does occur. In a nutshell the strategy is to realise that, despite all possible efforts, SharePoint erosion is going to happen – plan for it and be proactive in addressing it.

More specifically, an organization needs to:

  1. Understand the organization’s business requirements. The remainder of the SharePoint implementation derives from the business requirements and they also provide the measures of success and acceptance criteria against which SharePoint is measured.
  2. Invest in developing a complete SharePoint architecture. Many organizations either do not realize or underestimate the importance of a complete architecture to the long-term success of SharePoint.
  3. Implement governance. Governance is the foundation for the set of processes and administrative practices which will address erosion over time.
  4. Require documentation. IT professionals are notoriously bad at documentation; however, understanding how and why things were configured and what changes have been made is extremely important. Requiring documentation at all levels of SharePoint administration makes it easier to remediate erosion and makes individuals more accountable for their changes.
  5. Leverage 3rd party tools. ‘Some’ if not ‘many’ of the manual activities involved in remediating erosion can be facilitated through 3rd party tools.

Closing Thoughts:

For most organizations, the key to addressing SharePoint erosion is awareness. The organizations which are aware of the extent to which their SharePoint environments are eroding are in the best position to address it. Also worth mentioning is that almost all of the organizations which have calculated the cost and impact of erosion have determined that the organizational benefits of having SharePoint far outweigh the effort required to combat erosion.

———-

Notation:

** The term ‘best practices’ is used on this site, both in this and all subsequent contexts, to refer to the set of proven practices and guidance available through experienced technical architects, administrators, and other technical professionals. Despite all best intentions to do so, the use of the term ‘best practices’ does not necessarily guarantee the absolute best possible practice in all situations. The terms ‘proven practice’ and ‘reasonably accepted industry standard’ are better descriptions of what is meant in this case. Despite this, the term ‘best practices’ continues to be used where it conveys the proper sentiment because of its industry-wide recognition and the connotation it carries.
