Lately we ran into Solr stability issues in production, losing nodes out of our 3-node cluster.
We are running Solr 4.6 and ZooKeeper 3.5.8 on Linux CentOS machines, with about 15 Sitecore cores/collections configured on the Solr instances.
3-node cluster setup:
16 GB of RAM per server, JVM heap set at 4 GB
Java 1.7
We noticed that the sitecore_analytics_index core has a lot of records, which is not a big deal for Solr. However, we are using the standard out-of-the-box configuration. Is there anything we need to configure for the Sitecore cores to perform better? What are the best practices for Sitecore with Solr (if any)?
Also, Sitecore is running very heavy queries (we are not sure which core is doing this), which eat a lot of JVM resources. This causes our Solr instance to go down.
Some errors we are experiencing:
- ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: Broken pipe
- SolrCore [sitecore_marketingdefinitions_web] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
- ERROR java.lang.OutOfMemoryError: Requested array size exceeds VM limit
- ERROR SolrCmdDistributor org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
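The "Overlapping onDeckSearchers" and "exceeded limit of maxWarmingSearchers" errors both point at commits arriving faster than Solr can warm new searchers. A minimal solrconfig.xml sketch of the knobs involved (maxWarmingSearchers and the cache autowarm settings are standard Solr 4.x elements; the values here are illustrative, not a tested recommendation):

```xml
<query>
  <!-- How many searchers may warm concurrently; the errors above show this limit being hit -->
  <maxWarmingSearchers>2</maxWarmingSearchers>
  <!-- Lower autowarmCount makes a new searcher ready faster after each commit -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
</query>
```

Raising maxWarmingSearchers mostly hides the symptom; reducing the commit frequency (see question 3 below) addresses the cause.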
Solutions applied:
We increased the memory footprint on all 3 Linux servers to 32 GB RAM and raised the JVM heap to 12 GB:
1. Log in on the Linux server as 'root'
2. sudo service tomcat stop
3. sudo kill -9 pid (tomcat) – only if Tomcat did not stop gracefully
4. Navigate to /opt/tomcat/bin
5. Create a new setenv.sh file
6. Edit the file and add (note the space before each trailing backslash, so the options are not concatenated into one token):

   export JAVA_OPTS="$JAVA_OPTS \
   -Xms12288m \
   -Xmx12288m \
   -XX:+HeapDumpOnOutOfMemoryError \
   -XX:HeapDumpPath=/var/log/ \
   -XX:MaxPermSize=1024m \
   -XX:MaxNewSize=2048m \
   -XX:NewSize=2048m"

7. Save the file
8. sudo service tomcat start
9. Navigate to solr1.environment.pmi.org:8080/solr
10. Ensure the Physical Memory and JVM-Memory values have changed.
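As a quick sanity check that the heap flags are what you think they are, the configured -Xmx can be grepped out of setenv.sh before restarting Tomcat. A small sketch (it writes a trimmed copy of the step 6 content to a temp file so it is self-contained; on the real server you would point SETENV at /opt/tomcat/bin/setenv.sh directly):

```shell
# Trimmed copy of the setenv.sh content from step 6 (illustration only;
# on the server, set SETENV=/opt/tomcat/bin/setenv.sh instead).
SETENV=$(mktemp)
cat > "$SETENV" <<'EOF'
export JAVA_OPTS="$JAVA_OPTS \
-Xms12288m \
-Xmx12288m \
-XX:+HeapDumpOnOutOfMemoryError"
EOF

# Extract the configured maximum heap size
XMX=$(grep -o '\-Xmx[0-9]*m' "$SETENV")
echo "configured max heap: $XMX"   # configured max heap: -Xmx12288m
rm -f "$SETENV"
```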
Solr was restored and remained stable for a few days, but the errors started creeping back and we saw the same behavior.
We captured the Catalina logs from the tomcat/logs folder and did an in-depth analysis.
We found that all 5 Sitecore web instances were configured to update the analytics index. Therefore Solr was getting bombarded with requests (and the resulting exceptions), and the sitecore_analytics_index was growing enormously (1 GB every week).
Some of the most common issues we noticed in the production logs:
1. Why is Sitecore retrieving the MAX rows available (rows=2147483647)? This is very costly for Solr.
454893853 [http-bio-8080-exec-4226]
INFO org.apache.solr.core.SolrCore – [sitecore_web_index]
webapp=/solr path=/select params={q=(slug_s:(\/certifications\/types\/pmp)+AND+_latestversion:(True))&fq=_indexname:(sitecore_web_index)&version=2.2&rows=2147483647}
hits=0 status=0 QTime=0
454877084 [http-bio-8080-exec-4197]
INFO org.apache.solr.core.SolrCore – [sitecore_web_index]
webapp=/solr path=/select
params={q=(slug_s:[*+TO+*]+AND+_group:(25e875fea5fa4ff3821a1413d4ee3bc1))&fq=_indexname:(sitecore_web_index)&version=2.2&rows=2147483647}
hits=2 status=0 QTime=7
454877096 [http-bio-8080-exec-4225]
INFO org.apache.solr.core.SolrCore – [sitecore_web_index]
webapp=/solr path=/select
params={q=(slug_s:[*+TO+*]+AND+_group:(25e875fea5fa4ff3821a1413d4ee3bc1))&fq=_indexname:(sitecore_web_index)&version=2.2&rows=2147483647}
hits=2 status=0 QTime=0
The code base that generates this query has been amended; it now requests only 1 row. Additionally, it will query more selectively to reduce the number of calls. The fix will be distributed in the next code delivery.
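For illustration, the difference is just the rows parameter on the request. A sketch of the amended call (hostname, core, and query are taken from the logs above; the real client code is Sitecore's, so this only shows the resulting query string):

```shell
SOLR="http://solr1.environment.pmi.org:8080/solr"
CORE="sitecore_web_index"
# The logged query asked for rows=2147483647; the amended code asks for a single row:
QUERY="${SOLR}/${CORE}/select?q=slug_s:(/certifications/types/pmp)+AND+_latestversion:(True)&fq=_indexname:(${CORE})&rows=1"
echo "$QUERY"
```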
2. Why is Sitecore retrieving all fields (fl=*,score, the equivalent of SELECT *) and also sorting on date fields? This is very costly for Solr.
454877030 [http-bio-8080-exec-4220]
INFO org.apache.solr.core.SolrCore – [sitecore_web_index]
webapp=/solr path=/select params={facet=true&sort=publicationdate_tdt+desc&fl=*,score&start=0&q=*:*&f.contenttype_facet_s.facet.mincount=1&facet.field=contentsources_facet_sm&facet.field=topicsfacet_sm&facet.field=contenttype_facet_s&fq=((((((((_templates:(51fe426158da421da104f3cbc23e328d)+AND+-_templates:(216357ebc69e46ceb912707c2dba28a1))+AND+_path:(110d559fdea542ea9c1c8a5df7e70ef9))+AND+-excludefromsearch_b:(True))+AND+_latestversion:(True))+AND+_language:(en))+AND+publicationdate_year_tl:[-2147483648+TO+2012])+AND+publicationdate_year_tl:[2005+TO+2147483647])+AND+((topicsfacet_sm:("Portfolio+Management")+AND+topicsfacet_sm:("Scope+Management"))+AND+topicsfacet_sm:("Program+Management")))&fq=_indexname:(sitecore_web_index)&version=2.2&f.topicsfacet_sm.facet.mincount=1&f.contentsources_facet_sm.facet.mincount=1&rows=10}
hits=2 status=0 QTime=5
The default behaviour of Sitecore is to retrieve all the fields of the document. In the sample above this type of call is made with rows=10, so it is not expensive. The sorting is necessary because this query feeds the KAS articles listing, and it is still cheaper to sort in Solr than to sort in memory in Sitecore. Alternatively, a new core could be set up with fewer fields.
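If a trimmed core is not an option, the same query could also be narrowed with an explicit fl list instead of fl=*,score. A hypothetical parameter set (the field names are guesses based on the logged query; the real list depends on what the KAS listing actually renders):

```shell
# Hypothetical narrowed field list -- fetch only what the listing renders
FL="_uniqueid,slug_s,publicationdate_tdt,score"
PARAMS="q=*:*&fq=_indexname:(sitecore_web_index)&fl=${FL}&sort=publicationdate_tdt+desc&rows=10"
echo "$PARAMS"
```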
3. Why is Sitecore doing hard commits directly on the index? Solr does not recommend this, because each commit opens a new searcher, and frequent commits can exceed the warming-searcher limit (as the error below shows).
54956494 [http-bio-8080-exec-4178]
INFO org.apache.solr.update.processor.LogUpdateProcessor –
[sitecore_analytics_index] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2&softCommit=false}
{} 0 13566
454956495 [http-bio-8080-exec-4178]
ERROR org.apache.solr.core.SolrCore –
org.apache.solr.common.SolrException: Error opening new searcher. exceeded
limit of maxWarmingSearchers=2, try again later.
This is managed internally by Sitecore. We have added configuration fixes, as the logs reveal that all 5 servers are currently committing to sitecore_analytics_index. After the fix, only the CM server will do this.
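On the Solr side, commit pressure can also be reduced by letting the server schedule commits instead of honoring a hard commit from every client. A minimal solrconfig.xml sketch (autoCommit/autoSoftCommit are standard Solr 4.x settings; the intervals here are assumptions to be tuned):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to disk periodically without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: make new documents visible to searches every 15s -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>
</updateHandler>
```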
4. Solr does not recommend optimizing the index on demand, especially for cores like sitecore_analytics_index.
455450829 [http-bio-8080-exec-4167]
INFO org.apache.solr.update.processor.LogUpdateProcessor –
[sitecore_analytics_index] webapp=/solr path=/update params={optimize=true&waitSearcher=true&maxSegments=1&wt=javabin&expungeDeletes=false&commit_end_point=true&version=2}
{optimize=} 0 155892
This happens on two occasions: automatically, when a full rebuild is executed, or via a Sitecore scheduled agent, which is currently triggered on sitecore_master_index.
sitecore_analytics_index indexes all visitor interactions; it is not populated from Sitecore content tree data. It looks like a lot of empty documents were produced in the index due to the 4x deletions at the time.
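Rather than explicit optimize calls (the log above shows one taking over 155 seconds), background segment merging can be left to reclaim the space from deleted documents. A hedged sketch of the relevant Solr 4.x index settings (TieredMergePolicy is the 4.x default; the values are illustrative):

```xml
<indexConfig>
  <!-- Background merging gradually rewrites segments, dropping deleted documents,
       without the full-index rewrite an explicit optimize performs -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```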