Friday, October 30, 2009

Dependency Analysis of .NET and SQL Server Applications

Recently I starting making some small updates to a very cool dependency analysis application that I designed last year. The basic premise is to be able to trace dependencies from the database up through the user interface or vice versa in order to answer questions such as, if we change table XYZ, what stored procedures and ASP.NET forms will be affected?

The application is actually very good at answering those questions. Field level analysis was more difficult to achieve, so left that out, but at the object level (table/view/proc/class/form) it works pretty well.

I'll describe here, at a high level, the approach I took for each type of data. Let me know if you are interested in code samples for specific implementations!

To analyze .NET, I analyze the actual assemblies using disassembly and reflection. This was tricky but very do-able. Essentially I grab a copy of all the assemblies used by an application and load them "for reflection only" into a separate AppDomain. I gather some dependencies by looking at the types references in member signatures (using reflection) as well as referenced in code (using the disassembler). I used Lutz Roeder's .NET Reflector for the disassembler (http://blog.lutzroeder.com/), which RedGate now has available for free (http://reflector.red-gate.com/).

Also, on the .NET side, I parse configuration files for connection strings and I have "framework" specific code to look for particular attributes or other means that various projects I am analyzing have used to map .NET code to database objects.

To analyze SQL Server database schema, we used the RedGate tools and parsed the xml output. To analyze SSIS packages, we used the SQL Server 2005 Metadata Toolkit (which is actually on codeplex at http://code.msdn.microsoft.com/SqlServerMetadata/, contrary to what the download link at Microsoft http://www.microsoft.com/downloads/details.aspx?FamilyID=182bd330-0189-450c-a2fe-df5c132d9da9&displaylang=en says.).

To analyze stored procedure, functions, and views T-SQL code for dependencies, we used http://www.sqlparser.com/.

The basic idea is to record the objects I find and the relationships between those objects, and then provide ways to explore the data. I came up with a system to record two dimensions, or types, of relationships -- "contains" (e.g., a database contains a table) and "uses" (e.g. a stored procedure "uses" a table, or a .NET class "uses" a stored procedure, etc.).

Normalizing the Day of the Week in SQL Server

There is a very interesting in article in SQL Server Magazine about normalizing the day of the week -- this is very relevant if you need to write code that performs business logic based on, say, Fridays, and you want that code to work properly no matter where in the world the SQL Server is installed:

Normalizing the First Day of the Week, September 2009
http://www.sqlmag.com/Articles/ArticleID/102470/102470.html?Ad=1

Friday, October 23, 2009

Capturing StandardOutput AND StandardError from a Process

I recently revived a program I had written to do an automated build and deploy of a solution containing a Visual Studio Database Edition project as well as a client UI in WPF. The build program runs a bunch of command-line programs, capturing all the output along the way and keeping track of errors, etc. When I was going through the code, I noticed a bug. I am using ProcessStartInfo and Process and I thought I was capturing both StandardOutput and StandardError but I wasn’t. My code looked like this:

      psi.UseShellExecute = false;
psi.RedirectStandardOutput
= true;
psi.RedirectStandardError
= true;
Process p
= Process.Start(psi);
string output = p.StandardOutput.ReadToEnd();
string error = p.StandardOutput.ReadToEnd();
p.WaitForExit();


So I “fixed” the code so now it looked like this:



      ...
string error = p.StandardError.ReadToEnd();
...


But when I ran it, one of the programs seemed to hang forever. I was able to determine that the program did in fact finish, and was actually my code that hung forever. It turns out that if you use ReadToEnd() on both StandardOutput and StandardError, you can reach a blocking situation that hangs your code. This is even documented on MSDN here:




“A deadlock condition results if the parent process calls p.StandardOutput.ReadToEnd followed by p.StandardError.ReadToEnd and the child process writes enough text to fill its error stream. The parent process would wait indefinitely for the child process to close its StandardOutput stream. The child process would wait indefinitely for the parent to read from the full StandardError stream.”




This is exactly what was happening in my code. I was up until 4am updating the Build program, and at this point I wanted a quick solution. The MSDN documentation hinted at the solution (asynchronous threads) but did not provide code outright. I did some searching on the Internet and found some solutions out there, but they were all much heavier than I wanted. I wanted something simple, preferably just a few lines of code and very modular, so after I couple hours of sleep and some coffee, I came up with a solution. First, I created a very simple class, which could be called from a new Thread, to capture the output of a stream:



  class MyStreamReader
{
StreamReader _sr
= null;
string _text = null;
public string Text { get { return _text; } }

public MyStreamReader(StreamReader sr)
{
_sr
= sr;
}

public void Go()
{
_text
= _sr.ReadToEnd();
}
}


Then I used that class to capture the standard error and standard output from my newly launched process, as follows:



      ProcessStartInfo psi = new ProcessStartInfo(commandToRun);
psi.UseShellExecute
= false;
psi.RedirectStandardOutput
= true;
psi.RedirectStandardError
= true;
Process p
= Process.Start(psi);

// Create my objects to capture output asynchronously
MyStreamReader msr_stdout = new MyStreamReader(p.StandardOutput);
MyStreamReader msr_stderr
= new MyStreamReader(p.StandardError);

// Create the thread objects to run the code asynchronously
Thread t_stdout = new Thread(msr_stdout.Go);
Thread t_stderr
= new Thread(msr_stderr.Go);

// Launch both threads
t_stdout.Start();
t_stderr.Start();

// Wait for both output and error streams to finish
t_stdout.Join();
t_stderr.Join();

p.WaitForExit();

// retrieve the output and error text
string output = msr_stdout.Text;
string error = msr_stderr.Text;

Console.WriteLine(output);
Console.WriteLine(error);


I tested out my new code now everything works properly – now I just need to thank the developer who checked in the post deployment scripts that caused the errors, otherwise I might not have caught this problem to begin with!

Friday, October 16, 2009

Opening a Visual Studio Solution from a Source Control Repository

Whether you are using a new machine that you have just setup, or a new developer is joining the project, the situation often comes up where a Visual Studio solution or project is already in source control and now you want to open it in Visual Studio and have all the source control bindings work properly.

This seems like such a simple task, but over the years I have seen people lose hours of productivity running into problems getting the bindings to work properly. Over the course of those years, I have found the following steps to work really well. I usually use SourceGear’s Vault product, but these general steps apply to just about any source control system that integrates with Visual Studio, including Visual SourceSafe, Team Foundation Server, and Ankh SVN.

You need to get the bindings to source control setup for a solution on your local development machine once, and then after that you can easily just click on the SLN file or open the solution from Recent Projects in Visual Studio, and the bindings will work properly (even using the correct source control plug-in if you use multiple plug-ins).

The following are the steps for the one-time initialization process – or re-initialization if your bindings have somehow gone awry. These steps assume that you do not have the source control program (if it has a standalone client) oepn or Visual Studio open when you start the steps.

  1. Open your source control client outside of Visual Studio
  2. Set the working folder, if it is not already set, at solution level or higher, make sure there are no overrides lower in the folder tree

    1. Especially for Vista or Win7, make sure the working folder doesn't require admin rights – e.g. in C:\Users and not C:\ etc.
  3. Get Latest Version on Solution, in Overwrite mode (this might be called “check out” in SVN, but we are just retrieving here, not locking).
  4. Open Visual Studio
  5. Tools/Options - Choose source control client to be the appropriate source control client, e.g.:
    image
  6. Open <MySolution>.sln from Source Control using File -> Source Control -> Open from Source Control (note that the exact command name varies by source control plug-in, but you get the idea)
  7. Verify that entire solution loaded correctly
  8. Close Visual Studio
  9. Close your source control client, if you had one open
  10. For now on, open solution by opening <MySolution>.sln on your hard drive. (then it won't matter if you set your default source control provider to something else, because the <MySolution>.sln is bound to the one you chose above)

VSDB: Schema Compare and “Update Action” Override Persistence

On my team, we frequently use the Schema Compare tools to push local database changes into the project (and then into source control). In the process, we often override the Update Action to Skip for items which correspond to changes someone else has made to the project since we last did a deploy.

For example, let’s say Bob does a deploy and then changes his local database. In the meantime, Jane adds a new stored procedure to the project. Now, when Bob does a Schema Compare, he will not only see his changes, but the Schema Compare will suggest Dropping the new stored procedure that Jane created since it does not exist in Bob’s database. Bob switches his view to Non Skip objects and notices this, and changes it from Drop to Skip. Well, the next time Bob opens that same Schema Compare, this same stored procedure will be skipped again – but Bob may want to get a “fresh” schema compare and see all differences (and thus “Non-Skip” items). Bob could create a new Schema Compare, but then he would have to carefully select all the Options again….

When you open a Schema Compare session and you choose to override an "Update Action" – e.g., SKIP one of the proposed updates, it actually remembers that overridden setting for that object even when you refresh, or save and re-open, the Schema Compare. This could actually be really useful at times but it can also cause chaos and misery if you don't know about it or have forgotten that you are overriding something to Skip that you actually want to update now.

image

These Update Action overrides are persisted in the .scmp file. If you want to clear them all, you close the schema compare and then open the .scmp file in a text editor (such as the Visual Studio XML Editor):

image

Then delete all the nodes between the <ExcludedSourceElements> and </ExcludedSourceElements> as well between <ExcludedTargetElements> and </ExcludedTargetElements> (note that below I collapsed the SourceModelProvider, TargetModelProvider, and SchemaCompareSettingsService nodes for brevity):

image

Save this version of your schema compare and check it into source control. Then in the future if you want clear your overrides you can simply do an Undo Checkout from source control, or choose not to Save your Schema Compare when you close it.

VSDB: Tracking down Deploy errors

I’m working on a project that has thousands of lines of post-deployment scripts. Once I have done a successful build, I still sometimes get errors in the deploy, usually indicating a problem in one of my post-deployment scripts. Unfortunately, Visual Studio Database Edition rarely gives me enough information to easily identify the location or source of the error.

I believe the problem is two-fold. First off, any output from PRINT statements that may help you locate the error might be useless – I believe that the PRINT output is buffered and that when the error occurs, the buffer might not be flushed. So there may still be a bunch of PRINT output between what you see and when the error really occurs (hey, if anyone from the VSDB team is reading this, please fix this in VS.NET 2010!). The second problem is that VSDB does not locate the error for you and tell you which file and what line the error is on. That’s probably because it concatenates all of the post deployment scripts together with the database creation script into one huge SQLCMD file.

So, what's a developer to do?

Once of the fastest ways to figure out the issue is to leverage another tool that will show you ALL of the PRINT output as well as tell you what line of code the error occurs on: SQL Server Management Studio.

When you do a deploy, VSDB creates a big script that creates the entire database – this script also includes all of the contents of the post-deployment scripts. You want to run this script in SSMS to find out where the error is. Then, find that same piece of code in the project (probably in the post-deployment scripts section). Hopefully you will quickly recognize it once you see it.

Here's what to do:

  1. Locate the database project's folder (your working folder on your PC). If you're not sure you can find it here:
    image

  2. Navigate to the "sql" folder:
    image

  3. Then navigate to the folder corresponding to the Build Configuration you used, e.g. "debug":
    image

  4. Find the .sql file that matches the name of your deployment database (you specified this in your database project's properties' Deploy tab at some point, otherwise it might default to the name of the project):
     image

  5. Open that file in SQL Server Management Studio.

  6. Note that this .SQL file is a "SQLCMD" file, using special syntax with commands that start with ":" to set variables etc. Don't worry about this too much, but once the script is open in SSMS, be sure to toggle on SQLCMD in the Query menu:
    image

  7. Make sure you are connected to the correct SQL Server\Instance before you run the script. It will probably drop and re-create the database you had specified, just like the Deploy command would, so keep that in mind. Note that you can tweak a couple things if you want – changing the DatabaseName variable for example, and removing the specification of the MDB/LDF file names from the database creation.

  8. You can now run the script (F5).

  9. Now hopefully you will get the same error message again, but this time when you double-click on the error message you will get to the exact script line that is causing the error, so that you can correct it in the project. Also, you will probably get all of the PRINT output leading up to that, which may help.

VSDB: Deployments

Over the course of working on a couple VSDB projects with multiple developers, I have found that productivity loss and confusion can result from developers’ settings producing unexpected results during a deploy – for example, having some changes to the database project in source control not get applied to the developer’s database; or the developer deploying to a different database than expected.

I have found that the following settings provide consistent results.

For the Database Project, edit your settings (right click the Database Project and select Properties):

image

Now make sure you UNCHECK "Block incremental deployment If data loss might occur" (shown checked here)

CHECK "Always re-create database" and of course "Back up database before deployment“.

image

These settings will ensure that your local development database is always exactly what is checked into source control – nothing less and nothing more. That should help avoid issues where one developer inadvertently undoes the changes of another developer, etc.

Using Visual Studio Database Edition (VSDB)

Database Edition is designed so that each team member works with their own isolated database. Some folks will also be making changes to the database, and others may not be.

If you are planning to make changes to the database, here are the steps I recommend:  

  1. Check with other team members to see if you might be changing some of the same objects or not. Coordinate so that you do not overwrite each other's changes.  
  2. Get latest version of at least the database project files from source control.  
  3. Update your database from the latest database project files. There are a couple ways to handle this. Let's assume you do NOT have any changes or data in your database that you need to keep. Simply do a DEPLOY of the database project to get the latest version. (If you do have changes you need to keep, then skip to Step 5 and be VERY careful about what you sync up.) If you are not sure, you can run a schema compare to see the differences.
  4. Make your changes.  
  5. Sync your changes back to the project.  

    1. Run a Schema Compare

      1. Source: your isolated DEV database
      2. Target: Project
      3. Options: see my previous email and sure the correct Object Types are being ignored
    2. Carefully review the Schema Compare output

      1. Filter to Non-Skip Objects
        image
      2. Scroll through the list of Objects
      3. For any objects that you did not intend to change, delete, or add, change the Update Action to Skip

        1. Be sure that you are not deleting stored procs, etc., that someone else recently added – e.g. if you have done another Get Latest Version on the database project since you deployed your isolated database
    3. Write Updates to the target – this will update all non-skip items you reviewed above
       image
  6. Check In your changes to the Project in  
  7. Let the team know that you checked in database changes and what they might affect. The team may need to update other code and get your changes before they make DB changes. 

Problems are likely to occur if folks get careless about reviewing schema compare output and what will get updated. For example, let's take the following example:

  1. Person A gets latest and deploys
  2. Person B gets latest and deploys
  3. Person A creates new proc MyProcA, syncs back to the project, and checks in
  4. Person B creates new proc MyProcB
  5. Person B does schema compare to project (maybe a get latest version happened in the project, too – for example, by adding a new object the solution file will need to be checked out)

    1. NOTE: Schema compare will want to ADD MyProcB and DELETE MyProcA (because MyProcA does not exist in Person B's isolated database)
  6. Person B ignores output and does Write Updates and then Checks In … now MyProcA has been removed from the project

So that's more or less why the steps outlined above are in general good practice and will hopefully keep you out of trouble.

Also note that Scripts are not touched by Schema Compare (e.g. post-deployment scripts which seed data).

For more reading… (this is from the DB edition documentation, available at):

http://msdn.microsoft.com/en-us/library/aa833404(VS.100).aspx

Or download the whole doc at (see Documentation.zip at the end in the Downloads section):

http://www.microsoft.com/downloads/details.aspx?FamilyID=bb3ad767-5f69-4db9-b1c9-8f55759846ed&displaylang=en