Wednesday, January 30, 2019

Resetting Search Index in Team Foundation Server

Original Post: 9/11/2017 10:55:53 AM

Search indexing for entities (Code, Work Item, WIKI) works in two phases:
  • Bulk Indexing (BI), where all code and work item artifacts in every project/repository under a Collection are indexed. This is a time-consuming operation whose duration depends on the size of the artifacts under the collection.

  • Continuous Indexing (CI), which handles all incremental updates to the artifacts (add/update/delete) and indexes them. This is a notification-based model in which the indexer listens to TFS events and acts on those notifications. CI handles almost all update operations, including CRUD operations at the Project/Repository/Collection layer (such as repository renames, project adds/deletes, etc.). The time a CI operation takes again depends on the size of the incremental update. BI always precedes CI: CI never executes on a project/repository until BI has completed for it.

In addition, the Search indexer has built-in resilience measures that monitor for and run patch operations to cover missed notifications, or any other case where the index is out of date with the actual state of artifacts in TFS. These patch operations run at fixed intervals (typically every 12 hours) on each Collection.

With that background in mind, in most cases you should never need to run a BI operation (i.e. a re-index) on a Collection. However, certain planned or unplanned situations do require re-indexing, for reasons such as:
  1. Index shard corruption that the indexer does not auto-recover from. Any such index-level error requires a clean re-index.

  2. The index data folder (or specific index folders within it) was deleted, manually or accidentally. The indexer does not auto-recover from such actions.

  3. The index is being moved to another machine, for example as part of an upgrade. In this case the configuration upgrade step takes care of re-indexing internally, but re-indexing still happens.

  4. Search queries return stale results. This could be due to a persistent error, or because BI simply aborted or crashed, so the collection now requires a full re-index.

That brings us to the primary focus of this post: how to do a clean re-index. This applies to Code as well as to other entities such as Work Item or WIKI.

IMPORTANT UPDATE (02 May 2019) :

The Azure DevOps Search team has lately been working on a new set of automated Search Repair scripts. These PowerShell scripts provide a range of auto-analysis and mitigation features (including the re-indexing mitigations described further below). I strongly suggest trying these scripts for any Search-related issue. Do provide feedback to the CSS or PG team, and report any issues you encounter.

Currently the scripts support only Azure DevOps Server 2019 and TFS 2018 (including all Updates). To access and run the scripts:
  1. Download/clone the entire GitHub repository.
  2. Open PowerShell as admin.
  3. Navigate to the <Product name>\Troubleshooting directory in the GitHub repository. For example, Azure_DevOps_Server_2019\Troubleshooting for troubleshooting an Azure DevOps Server 2019 installation.
  4. Execute the script .\Repair-Search.ps1 with the appropriate parameters. It looks for problems in the system and, where possible, suggests a mitigation.
  5. For more information about this script, execute Get-Help .\Repair-Search.ps1.

Clean-up Index Data and Re-index

This applies to scenarios such as [1] and [2] listed above, where there are index-level errors and the index needs to be rebuilt.
  • Pause indexing for all collections: run the PauseIndexing.ps1 script against the TFS Configuration DB.
  • Log in to the machine where Elasticsearch (ES) is running

  • Stop the ES service

  • Delete the entire Search Index folder (something like C:\TfsData\Search\IndexStore, or wherever you configured it)

  • Restart the TFS Job Agent service(s) on the AT machines

  • Get the list of associated JobIds from each Collection DB by running the following query against every collection:
      SELECT [AssociatedJobId]
      FROM [<CollectionDB>].[Search].[tbl_IndexingUnit]
      WHERE [AssociatedJobId] IS NOT NULL

    (Save these JobIds; they are needed for the delete command on tbl_JobQueue in a later step. The sequence of steps is very important, so do not reorder any of them.)
  • Delete all rows from the following tables in each of the collection databases:
      DELETE FROM [Search].[tbl_IndexingUnit]
      DELETE FROM [Search].[tbl_IndexingUnitChangeEvent]
      DELETE FROM [Search].[tbl_IndexingUnitChangeEventArchive]
      DELETE FROM [Search].[tbl_JobYield]
      DELETE FROM [Search].[tbl_TreeStore]
      DELETE FROM [Search].[tbl_DisabledFiles]
      DELETE FROM [Search].[tbl_ItemLevelFailures]
      DELETE FROM [Search].[tbl_ResourceLockTable]
  • Delete the indexing jobs from the JobQueue using the command below, supplying the JobIds saved from the earlier query. Note: this must be run against the configuration database.
      DELETE FROM [Tfs_Configuration].[dbo].[tbl_JobQueue]
      WHERE JobSource = '<CollectionHostId from tbl_ServiceHost>'
        AND JobId IN (<list of Associated JobIds from the tbl_IndexingUnit>)
        AND AgentId IS NULL
  • [This step applies ONLY to TFS 2018 Update 2 and earlier]

    • Open "%Program Files%\Microsoft Team Foundation Server 2018\Search\ES\%ESVersionFolder%\config\elasticsearch.yml"

    • Insert the line: "action.auto_create_index": "false"

    • Save the elasticsearch.yml file.
Try the last script on a smaller collection first (one with fewer repositories) so that you can verify that indexing completed correctly and the results are queryable.
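The hand-off between the saved JobIds and the two sets of DELETE statements above is easy to get wrong when done by hand across several collections. The following is a minimal Python sketch of one way to generate the SQL text, not an official tool: the function names (quote_name, collection_cleanup_sql, jobqueue_delete_sql) are hypothetical, the database name and GUIDs in the usage lines are made-up examples, and the script only builds text. It never connects to a database, so the statements still have to be reviewed and run manually, in the order given above.

```python
import uuid

# Search tables to empty in each collection database (taken from the step above).
SEARCH_TABLES = [
    "tbl_IndexingUnit",
    "tbl_IndexingUnitChangeEvent",
    "tbl_IndexingUnitChangeEventArchive",
    "tbl_JobYield",
    "tbl_TreeStore",
    "tbl_DisabledFiles",
    "tbl_ItemLevelFailures",
    "tbl_ResourceLockTable",
]


def quote_name(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def collection_cleanup_sql(collection_db):
    """Return the DELETE statements for one collection database (hypothetical helper)."""
    db = quote_name(collection_db)
    return "\n".join(f"DELETE FROM {db}.[Search].[{t}];" for t in SEARCH_TABLES)


def jobqueue_delete_sql(collection_host_id, job_ids):
    """Return the tbl_JobQueue DELETE for the JobIds saved earlier (hypothetical helper).

    Every value is parsed as a GUID first, so a malformed value raises
    ValueError instead of ending up inside the statement.
    """
    if not job_ids:
        raise ValueError("no JobIds supplied; nothing to delete")
    host = str(uuid.UUID(str(collection_host_id)))
    # Normalise and de-duplicate while keeping the original order.
    ids = list(dict.fromkeys(str(uuid.UUID(str(j))) for j in job_ids))
    in_list = ", ".join(f"'{j}'" for j in ids)
    return (
        "DELETE FROM [Tfs_Configuration].[dbo].[tbl_JobQueue]\n"
        f"WHERE JobSource = '{host}'\n"
        f"  AND JobId IN ({in_list})\n"
        "  AND AgentId IS NULL;"
    )


if __name__ == "__main__":
    # Made-up example values; substitute your own collection DB, host id and JobIds.
    print(collection_cleanup_sql("Tfs_DefaultCollection"))
    print(jobqueue_delete_sql(
        "11111111-1111-1111-1111-111111111111",
        ["22222222-2222-2222-2222-222222222222"],
    ))
```

Generating the text first and pasting it into a query window keeps the manual review step, which matters here because the deletes are not reversible.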

Re-index at Collection level

This applies to the scenario where the index configuration and health are good, but the search results are not as expected and you need to refresh the index for a specific collection (along the lines of scenario [4] above).

There are two approaches to re-indexing here:

(A) Extension Uninstall and Install
  • Uninstall the extension cleanly (Refer to the detailed guidance in the post here)
  • Install the specific entity extension for the collection from the Local Gallery (http://<Server>/tfs/_gallery)
  • [This step applies ONLY to TFS 2017 Update 3 and beyond]

    • Verify that the Account Fault-In Job triggered by the entity extension install is not continuously re-queueing itself for an extended period (say, more than 15 minutes):
      SELECT [StartTime], [Result], [ResultMessage]
      FROM [Tfs_Configuration].[dbo].[tbl_JobHistory] AS JobHistory
      INNER JOIN [Tfs_Configuration].[dbo].[tbl_ServiceHost] AS ServiceHost
          ON JobHistory.JobSource = ServiceHost.HostId
      WHERE JobId = 'Entity-AccountFaultInJobId'
          -- replace the placeholder above with the JobId for the entity:
          -- for Code = '02F271F3-0D40-4FA0-9328-C77EBCA59B6F'
          -- for WorkItem = '03CEE4B8-ECC1-4E57-95CE-FA430FE0DBFB'
          -- for WIKI = '27B11FD5-1DA5-48B4-A732-761CE99F5A5F'
          AND ResultMessage LIKE '%Requeue the Account Fault-In job since Extension Uninstall sequence is still in progress%'
      ORDER BY StartTime DESC

If you continue to see a ResultMessage such as "Requeue the Account Fault-In job since Extension Uninstall sequence is still in progress", it implies that the entry #\Service\ALMSearch\Settings\IsExtensionOperationInProgress\%EntityType%\Uninstalled was not reset correctly (where EntityType = Code, WorkItem or WIKI, depending on the extension that was uninstalled in the step above). Refer to the uninstall extension post here for the mitigation to clean this up.
  • Depending on the code/work item volume in the collection, the re-indexing will take some time. To monitor the indexing progress, check the blog post here.
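The "more than 15 minutes" check on the Account Fault-In Job can be applied mechanically to the job history rows. The sketch below is one possible reading of that check, not the product's own logic: it assumes the rows were exported as (StartTime, ResultMessage) pairs, and, as an assumption of this sketch, that the ResultMessage filter was dropped from the query so that a run with any other result ends the streak. The names requeue_streak and looks_stuck are hypothetical, and the threshold default simply mirrors the 15 minutes suggested above.

```python
from datetime import datetime, timedelta

# Message text quoted from the job history check above.
REQUEUE_TEXT = (
    "Requeue the Account Fault-In job since Extension Uninstall "
    "sequence is still in progress"
)


def requeue_streak(rows):
    """Return how long the most recent unbroken run of requeue results spans.

    rows: iterable of (start_time, result_message) tuples, in any order.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    streak = []
    for start_time, message in ordered:
        if REQUEUE_TEXT not in (message or ""):
            break  # a different result ends the streak
        streak.append(start_time)
    if not streak:
        return timedelta(0)
    # Newest requeue minus oldest requeue in the current streak.
    return streak[0] - streak[-1]


def looks_stuck(rows, threshold=timedelta(minutes=15)):
    """True when the job has been requeueing for longer than the threshold."""
    return requeue_streak(rows) > threshold


if __name__ == "__main__":
    # Made-up sample: five requeues, five minutes apart (a 20-minute streak).
    t0 = datetime(2019, 1, 30, 10, 0)
    sample = [(t0 + timedelta(minutes=5 * i), REQUEUE_TEXT) for i in range(5)]
    print(looks_stuck(sample))
```

If the filtered query is used unchanged, every row matches the requeue message and the streak is simply the span between the oldest and newest rows returned, so limit the rows to the period since the extension was installed.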
(B) Collection re-indexing through script.

Re-index at Repository level

This applies to the scenario where the index configuration and health are good, but the search results are not as expected for a specific repository and you need to refresh the index for that repository (along the lines of scenario [4] above). Currently this applies to Code Search only.

A couple of important points about re-indexing a collection/repository:
  • Bulk indexing is a costly operation. Depending on the volume of code/work item data in the collection, it can take anywhere from a few minutes to a few days to complete. Hence, if search queries return stale data, it is advisable to wait 12-24 hours for the indexer's scheduled patch operation to run and auto-patch the index.

  • For all scripts, do ensure you are picking up the correct version from the appropriate TFS release folder in GitHub.
