Thursday, October 16, 2014

Lync Management Shell Error "Automatic Collection of Configuration data failed"

I recently ran into an error when building another pool in an environment. Building a pool was nothing new; we had built four other pools with no issues. This pool was to be placed in the APJ region for the company. From an OS perspective nothing was different: the same prerequisites installed, the same VM configuration, and the same networking gear.

When installing Lync I came across the following error:

[Screenshot of the installation error not preserved]

My first thought was latency, since this site had higher latency than any other site in our environment: we were seeing about 280ms of round-trip latency. That is not a lot for traditional SIP traffic (though it is too high for AV), but it seemed a likely culprit, so I used the trick of exporting the configuration and importing it on the new servers. As expected, this went right in and completed on all four FEs in this region with no problem.
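The export and import trick can be sketched roughly as follows. This is a minimal sketch, not the exact commands used in this build: the file path is a placeholder, and the replication check at the end is just one way to do the "checking" afterwards.

```powershell
# On a server that can already reach the Central Management Store:
# export the current configuration to a zip file (path is a placeholder).
Export-CsConfiguration -FileName 'C:\Temp\CsConfig.zip'

# Copy the zip to the new Front End, then on that server import it
# into the local replica so setup does not have to pull it over the WAN.
Import-CsConfiguration -FileName 'C:\Temp\CsConfig.zip' -LocalStore

# After the install, confirm each replica reports as up to date.
Get-CsManagementStoreReplicationStatus | Format-Table ReplicaFqdn, UpToDate
```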

After the install I did some checking and all CMS replication looked good, so I proceeded onward with the install. Next, I had my four FEs installed and ready to update. When trying to run the database update I continually came across the following error:

[Screenshot of the database update error not preserved]

Not 100% sure what caused this error, I turned to the internet. Not finding much there, we talked to a friend at MSFT, who found an older case that referenced updating .NET. We were running Windows Server 2012 (non-R2) with .NET 4.5 installed, so we decided to trial the .NET update:

http://www.microsoft.com/en-us/download/details.aspx?id=42643

To our relief, we were now able to execute our PowerShell commands successfully and update the databases.

NOTE - After digging around I found this blog with the same situation and I figured I would pass some credit his way:

http://todayitnotes.blogspot.com/2014/05/automatic-collection-of-configuration.html

YMMV

Monday, July 14, 2014

Lync Mobility Issues - Event IDs 1309, 5011, 20002

Environment:
TMG Array
ACE Load Balancer (I know it's not supported, we have bypassed and also see issue behind F5)
VMWare Environments
McAfee Anti-Virus

POC
3 FE servers with collocated Mediation servers

Production
7 FE servers
5 ME Servers
Dedicated VM gear

Users are losing their connections to the Lync environment on mobile devices. This happens every 24 hours. The UCWA service crashes on the box, leaving everyone on that particular server unable to connect. This ONLY affects users on the server where the crash happened. You can find which server a user is on by running Get-CsUserPoolInfo -Identity <user>.
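As a minimal sketch, the lookup looks like this. The SIP address is a placeholder, not a user from this environment.

```powershell
# Show the pool information for one affected user.
# 'sip:jdoe@contoso.com' is a hypothetical identity used for illustration.
Get-CsUserPoolInfo -Identity 'sip:jdoe@contoso.com' | Format-List *
```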

[Screenshots not preserved]


This corresponds with the following error on the iPhone:

[Screenshot of the iPhone error not preserved]

Findings:

We found that the W3WP worker process fails on the UCWA application pool. The pool is hitting too many exceptions, which causes the .NET application domain to shut down.

Running "C:\windows\system32\inetsrv\appcmd.exe List WP" will show the processes running on the box.

[Screenshot of the appcmd output not preserved]

Next, go to Task Manager. You will see that process 2332 (the PID from the appcmd output) is not listed, because the worker process has shut down. The only way to get it back up and running is to perform an IIS reset.
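The manual comparison between the appcmd output and Task Manager can be scripted. The following is a rough sketch, assuming appcmd prints lines in the form WP "2332" (applicationPool:LyncUcwa); the function name Get-WorkerProcessFromAppCmd is made up for this example.

```powershell
# Hypothetical helper: turn "appcmd list wp" output lines into objects.
function Get-WorkerProcessFromAppCmd {
    param([string[]]$AppCmdOutput)
    foreach ($line in $AppCmdOutput) {
        # Expected line shape: WP "2332" (applicationPool:LyncUcwa)
        if ($line -match 'WP "(\d+)" \(applicationPool:(.+)\)') {
            [pscustomobject]@{
                ProcessId = [int]$Matches[1]
                AppPool   = $Matches[2]
            }
        }
    }
}

# Only query IIS when appcmd is actually present on this machine.
$appcmd = "$env:windir\System32\inetsrv\appcmd.exe"
if (Test-Path $appcmd) {
    foreach ($wp in Get-WorkerProcessFromAppCmd -AppCmdOutput (& $appcmd list wp)) {
        # Get-Process returns nothing when the PID no longer exists.
        $running = Get-Process -Id $wp.ProcessId -ErrorAction SilentlyContinue
        if (-not $running) {
            Write-Warning "Pool $($wp.AppPool): PID $($wp.ProcessId) is not running"
        }
    }
}
```

A warning for the LyncUcwa pool would match the symptom described above, where the PID reported by appcmd does not appear in Task Manager.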

We tried just recycling the application pool for UCWA, but this would sometimes work and sometimes not. We also tried IISReset -noforce, which was equally hit or miss. From time to time the w3wp.exe process would not restart, and we had to "End Task" on it.

Working with Microsoft, we took many logs. We started with the DebugDiag.exe tool, which caused us other issues: within 12 hours the boxes would be crippled by the tool, since it consumed 28GB of the 32GB available. We also took logs with Procdump.exe. This was unhelpful too; the crash would happen with Procdump running and it would not catch it.

The DebugDiag logs that did get captured showed a few things. First, they showed that we had memory issues, which was to be expected since the tool had consumed most of the memory in the box. The frustrating part was that we had only run it for 8 hours when the crash happened. Second, these logs showed a TON of exceptions; the million-dollar question was what was causing them. We spent about 2 weeks on this issue with MSFT with no resolution. So, the decision to rebuild was made, because it was suspected that the issue was 2012 R2 with a new patch that fixed issues with Windows Update Services.

Possible Fixes Tried:
Anti-virus Exclusions
Windows 2012 R2 and Windows 2012 both showing symptoms of this issue
Bypassing Load Balancers
Moved Servers to dedicated VM gear
Installed F5 to replace the Cisco ACE
We rebuilt the entire pool to eliminate 2012 R2 as a possible issue

Final Resolution:
For the resolution we started working with MSFT and McAfee together. The initial fix was to add the following anti-virus exclusions and change some IIS settings:
 
  • Ensure exclusions for Anti-Virus programs include the following:
    • %systemdrive%\Windows\Microsoft.NET
    • %systemdrive%\Windows\assembly
    • %systemdrive%\Windows\system32\inetsrv
    • %systemdrive%\inetpub\temp

Also, include all sub folders and the exclusions specified in http://technet.microsoft.com/en-us/library/dn440138.aspx.

The following commands need to be run on the Lync servers:


  • To ensure UCWA continues to work following a recycle event run the following command on Lync machines:
    • C:\Windows\System32\inetsrv>appcmd set config /section:applicationPools /[name='LyncUcwa'].recycling.disallowOverlappingRotation:true

Ensure IIS is logging all recycle events by running the following on Lync servers:

  • C:\Windows\System32\inetsrv>appcmd set config /section:applicationPools /[name='LyncUcwa'].recycling.logEventOnRecycle:Time,Requests,Schedule,Memory,IsapiUnhealthy,OnDemand,ConfigChange,PrivateMemory

The disallowOverlappingRotation setting was used to ensure that the PID didn't overlap when the UCWA application pool restarted.
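To confirm the settings took effect, the current values can be read back. This is a sketch that assumes appcmd's /text: switch accepts these nested attribute names on the LyncUcwa pool; it is not part of the fix itself.

```powershell
# Read back the two recycling settings on the LyncUcwa application pool.
$appcmd = "$env:windir\System32\inetsrv\appcmd.exe"

# Should print "true" once overlapping rotation has been disallowed.
& $appcmd list apppool "LyncUcwa" /text:recycling.disallowOverlappingRotation

# Should list the recycle events that IIS will now log.
& $appcmd list apppool "LyncUcwa" /text:recycling.logEventOnRecycle
```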

So, the final verdict came in: the issue is two-fold. First, McAfee is scanning and manipulating files for the UCWA application pool, which causes UCWA to fail. MSFT provided the following command, which resolves the failure:
 

C:\Windows\System32\inetsrv>appcmd set config /section:applicationPools /[name='LyncUcwa'].recycling.disallowOverlappingRotation:true

In short, this allows the UCWA pool to recycle properly when this event happens during the scan. Both McAfee and MSFT are working on permanent fixes: MSFT is coming out with a hotfix for this situation, and McAfee is coming out with a fix to leave the UCWA directory files alone. So, for now, setting disallowOverlappingRotation to true is the fix to solve the frustration.


Lync 2013 WAC Issue "Sorry, PowerPoint Web App Ran into a problem opening this presentation."

    Issue: When users try to share PowerPoint files in meetings, the files (mainly PPT) upload fine, but the user receives the error "Sorry, PowerPoint Web App ran into a problem opening this presentation. To view this presentation please open it in Microsoft PowerPoint."

    Environment:
    (2) WAC servers behind an F5 Load Balancer
    TMG Cluster with 4 TMG servers
    (7) Lync 2013 Enterprise servers

    Trials:
    I have tried a number of things to resolve this issue:
    • Bypass TMG by trying internal - Fail
    • Bypass F5 using host files and targeting each WAC server to ensure it wasn't an issue with one - Fail
    • Tried loading PPT on other environments to ensure it wasn't a corrupt file - Success
    • Loaded PPT on the WAC farm in our DR site - Success
    • WAC SP1 - Fail
    • Rebuilt both WAC servers - Fail

    What I have used to troubleshoot this issue:
    • Fiddler was the most detailed in helping pinpoint issues. It showed the 200 during the upload but then the 500 during the failed download
    • IIS Logs - showed me pretty much what Fiddler showed
    • ULS Logs - didn't show much other than the 500 error until verbose logging was enabled, which then showed the following error relating to cache:

    DiskCacheReader: TimeoutException [Machine: http://lyncWACServer01:809/diskcache/DiskCache.svc, Exception:System.TimeoutException: The HTTP request to 'http://lyncWACServer01:809/diskcache/DiskCache.svc' has exceeded the allotted timeout of 00:00:02. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.WebException: The request was aborted: The request was canceled.   
    DocumentInfoCache.GetDocumentCacheItem: Item found, 0 minutes old
    SetCompleted - Completed with unthrown exception Microsoft.Office.Server.Powerpoint.Pipe.Interface.PipeApplicationException: Exception of type 'Microsoft.Office.Server.Powerpoint.Pipe.Interface.PipeApplicationException' was thrown.   

    • Event Viewer on WAC server - found this kind of useless

    Resolution:
    In working on this a few commands came in handy:

    • Set-OfficeWebAppsFarm -OpenFromUrlEnabled - This setting lets you open a PowerPoint right on the server itself. This is very useful for eliminating Lync from the equation, as well as any network-related issues. To get to this tool you simply browse straight to the server itself (or the VIP).

    To do this I simply created a shared folder on the desktop, placed the path to the PowerPoint file (\\testserver\c$\user\test\PPT\Test.pptx) in the first line, and used "Create Link". Then, using "Test This Link", I was able to see whether the PowerPoint would render on the screen.

    In my case no such luck: same old error as above. Since I knew I was failing locally, I figured why not turn up the logging to see if I could find something. This command allowed me to do that:

    • "Set-OfficeWebAppsFarm -LogVerbosity Verbose" - this turns logging up high in OWAS; it also requires a service restart to take effect.

    With logging set to verbose I ran back through the same tests with the same result, not seeing much other than cache issues. Speaking with MSFT PSS, we were informed that whenever a PowerPoint fails it is never cleaned from the cache and will no longer display. The only way to clean this up is to clear the cache. To do this, it's back to stopping services on the servers, browsing to "C:\ProgramData\Microsoft\OfficeWebApps\Working\d", and removing ALL of the contents from the "d" folder.
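The cache cleanup can be sketched as below, run on each WAC server. This assumes the Office Web Apps service is registered under the name WACSM, which is an assumption on my part rather than something stated above; check the service name on your own servers before running it.

```powershell
# Path taken from the text above; everything inside "d" is removed.
$cachePath = 'C:\ProgramData\Microsoft\OfficeWebApps\Working\d'

# Assumption: the Office Web Apps service is named WACSM.
Stop-Service -Name WACSM

# Delete the contents of the cache folder but keep the folder itself.
Get-ChildItem -Path $cachePath -Force | Remove-Item -Recurse -Force

# Bring the service back so the farm can rebuild its cache.
Start-Service -Name WACSM
```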
    With the "d" folder clear, restart your services and run another test. In my case I was now able to render the PowerPoint file just fine. With everything back to normal, you can now disable the logging:

    • "Set-OfficeWebAppsFarm -LogVerbosity """

Wednesday, April 30, 2014

Lync 2013 Front End Services Not Starting. Lync Running on Windows 2012 Server.

This past week I ran into an issue with my lab that I thought was worth a blog post. Out of nowhere my lab VM host froze and required me to hard restart the box. This isn't the first time I have had this issue, since my hardware is on its last legs. It's a lab, what do you expect! After bringing back my VM host and all the guests, just like I have done many times before, I noticed something different: the Front End service on my Standard Lync 2013 server would not start.

To give a bit of background on my lab, I am running Lync 2013 Standard on a Windows 2012 Standard box. This solution rides on a Windows 2008 R2 domain with an internal private CA.

I was also seeing some consistent event errors each time I tried to restart the service:

Event ID: 32178 Source: LS User Services

Event ID: 32174 Source: LS User Services

Event ID: 57006 Source: LS User Store Sync Agent

Steps Spent Troubleshooting:
Starting down my troubleshooting path, I was led to a Google results page full of resolutions that had me deleting non-self-signed certificates from the "Trusted Root Certificate Store" on my computer. I spent a bit of time scratching my head on this one, trying to see what I was missing, since I thought this had to be the issue. Everything was lining up perfectly from an event viewer standpoint. After a few different looks in the store I decided to move on.

Next I moved to more posts stating that I needed to rebuild my fabric for Microsoft Lync. Since I was using a Standard server I thought it was kind of odd, but what the heck, it's my lab, let's try it all. As with the last possible resolution, I spent some time trying a few of the different commands to see if they would have different outcomes, and still had no success.

I then thought: what about updates? Maybe running the CU would help fix my issue. So, I ran the latest CU updates over again and re-published my databases, hoping that if it was a DB issue this would resolve it. Once again, no success.

Resolution:

Finally I thought it was time for the next step: rebuild the solution. I really didn't want to rebuild my entire lab, so I took the next best option, reinstalling the databases to see if I could get this resolved. Again, this is my lab and I didn't have a ton of data, but it is still data, so I proceeded with caution.

To start, I wanted to ensure I wouldn't lose my information. Thanks to Elan Shudnow's blog - http://www.shudnow.net/2012/10/09/dbimpexp-exe-functionality-integrated-into-lync-2013-preview-management-shell/ - I used:

Export-CsUserData
Export-CsConfiguration - if my XDS database went bad I had a copy
Export-CsLisConfiguration - again, I just wanted all my data

I was able to export my data, which gave me some confidence that I could bring it back online after my reinstall.
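For reference, a minimal sketch of the three exports with their parameters filled in. The backup folder and the pool FQDN are placeholders, not the values from my lab.

```powershell
# Placeholder backup location; create it if it does not exist yet.
$backup = 'C:\LyncBackup'
New-Item -ItemType Directory -Path $backup -Force | Out-Null

# User data for the pool ('lync01.contoso.local' is a hypothetical FQDN).
Export-CsUserData -PoolFqdn 'lync01.contoso.local' -FileName "$backup\UserData.zip"

# Topology and configuration from the Central Management Store (XDS).
Export-CsConfiguration -FileName "$backup\CsConfig.zip"

# Location Information Service (E9-1-1) configuration.
Export-CsLisConfiguration -FileName "$backup\LisConfig.zip"
```

Keeping the three files together in one folder makes it easier to copy them off the server before touching the databases.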

Running the following command did a clean install of my databases. Since I thought the problem was related to my database, I was left with no other option.

Install-CsDatabase -ConfiguredDatabases -SqlServerFqdn <SQL FQDN> -UseDefaultSqlPaths -Clean

Once the reinstall was done, I reimported my user data to ensure I would have all my info, and my Front End service started with no issues. Luckily I had some hotel time while staffed on a project out of town, so I was free and clear to mess with the lab and get this resolved. In the end I fixed the issue with no data loss.


YMMV

Friday, March 28, 2014

Lync 2013 Lab with SIP Trunks on the Cheap - 1

Hi Blog World,

First of all, since this is my first blog post I figured I would spend a few minutes introducing myself. My name is Tony Larsen. I have been working in the IT space for over a decade in numerous roles, and for most of that time my passion has aligned me with Microsoft UC. I started in the space back in Live Communications Server 2007. I then started my journey down the road of UC, waiting for the day that big business would see the light of the power of MSFT UC. These days I work for Avanade as a Group Manager within the Lync Solutions group. We focus on enterprise rollouts of Enterprise Voice. This is an amazing time, since big business has finally latched on and we are going full steam ahead.

So, my first blog topic feels fitting. Since I started working within UC I have always found it challenging to have a functional lab. Other than the bits of Lync, the gear is hard to get, and having a functional PBX isn't the easiest, especially for someone new to the field. For the next few posts I am going to show how I get my lab up and running, connecting to the SIP world and ringing Cisco phones (flashed with a generic SIP firmware), Lync Polycom phones, and SIP trunks to the real world. What does all this require, one might ask?

  • Virtual instance of Lync 2013 Standard
  • FreePBX – 5.2.11
  • SIP trunks from any carrier - since getting a Lync-certified SIP trunk isn't the cheapest, I prefer to use FreePBX to terminate cheaper pay-as-you-go SIP trunks
    • Teliax
    • Les.Net
  • Cisco IP Phone 7960
  • Polycom CX500


This solution gives someone the opportunity to lab out the complete power of Lync 2013 with Enterprise Voice. As I get the lab set up and working I will continue to post my findings. Hope to see you all back soon.

Tony